LLM-based Model Deployment
This module teaches the complete lifecycle of LLM-based model deployment—from understanding foundational research (like transformers and RAG) to training, quantization, and deploying both standard and multimodal LLMs. It emphasizes hands-on implementation, evaluation techniques, production readiness, and real-world deployment challenges across various frameworks and model types.
NOTE: The deployment pattern is the same for every group, but different groups may pick different types of models. Candidate models are listed at the end of this section; the deliverables remain the same regardless of the model chosen.
Day 1-15:
- Summary report of the research paper “Attention Is All You Need”, plus an analysis of BERT and the Transformer architecture (see the attention sketch after the deliverables)
Deliverables
- Summary report of the techniques, tutorial code with proper functioning, blog for publication, video demo (optional but recommended)
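To ground the paper summary, here is a minimal sketch of scaled dot-product attention, the core operation of “Attention Is All You Need”, written in PyTorch; the toy tensor shapes are illustrative assumptions, not part of the assignment.

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)      # (batch, seq_q, seq_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)                     # attention weights sum to 1
    return weights @ v, weights

# Toy example: batch of 2, sequence length 4, model dimension 8 (illustrative only).
q = torch.randn(2, 4, 8)
k = torch.randn(2, 4, 8)
v = torch.randn(2, 4, 8)
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape, attn.shape)  # torch.Size([2, 4, 8]) torch.Size([2, 4, 4])
```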
Day 16-30:
- Summary report for the reference videos on tokenization and on building GPT from scratch (see the tokenization sketch after the deliverables)
Deliverables
- Summary report of the techniques, tutorial code with proper functioning, blog for publication, video demo (optional but recommended)
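As a companion to the tokenization videos, the sketch below runs GPT-2's byte-level BPE tokenizer through Hugging Face Transformers; the model choice and example sentence are assumptions made purely for illustration.

```python
from transformers import AutoTokenizer

# GPT-2 uses byte-level BPE; any tokenizer from the Hub follows the same interface.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Tokenization splits text into subword units."
token_ids = tokenizer(text)["input_ids"]             # integer IDs fed to the model
tokens = tokenizer.convert_ids_to_tokens(token_ids)  # human-readable subword pieces

print(tokens)
print(token_ids)
print(tokenizer.decode(token_ids))  # decoding round-trips back to the original text
```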
Day 31-45:
- Summary report for crucial research techniques:
- Vector Databases and Vectorization
- RAG
- Advanced RAG
- Cache RAG
- Implementation of HuggingFace models and the Inference API (see the retrieval-and-inference sketch after the deliverables)
Deliverables
- Summary report of the techniques, tutorial code with proper functioning, blog for publication, video demo (optional but recommended)
- Implementation code and report for HuggingFace models and inference API
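The sketch below ties the two topics together: a tiny in-memory vector store for retrieval and a call to the HuggingFace Inference API for generation. The embedding model, hosted model ID, and placeholder token are assumptions, and sentence-transformers is assumed to be installed alongside the listed stack.

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from huggingface_hub import InferenceClient

# Tiny in-memory "vector database": embed documents once, retrieve by cosine similarity.
docs = [
    "RAG augments a prompt with documents retrieved from a vector store.",
    "GPTQ is a post-training quantization method for LLM weights.",
    "Llama 3.1 is an open-weight LLM released by Meta.",
]
embedder = SentenceTransformer("all-MiniLM-L6-v2")           # assumed embedding model
doc_vecs = embedder.encode(docs, normalize_embeddings=True)  # unit vectors -> dot = cosine

def retrieve(query: str, k: int = 2):
    q_vec = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q_vec
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

question = "How does RAG work?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"

# Hosted generation through the HuggingFace Inference API (model ID is an assumption;
# replace the placeholder token with your own).
client = InferenceClient(model="mistralai/Mistral-7B-Instruct-v0.3", token="hf_...")
print(client.text_generation(prompt, max_new_tokens=128))
```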
Day 46-60:
- Summary report on the complete LLM pipeline and its auxiliary systems, covering the following techniques and their purpose (Phase 1; a minimal training-step sketch follows the deliverables):
- LLM data preprocessing
- Different types of datasets: text, Q/A, etc., along with examples of such datasets in use
- LLM training
- LLM Loss functions
- LLM Evaluation metrics
- LLM Guardrails
- Additional elements required for LLM production
Deliverables
- Summary report of the techniques, tutorial code with proper functioning, blog for publication, video demo (optional but recommended)
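To make the training and loss-function items concrete, here is a minimal single training step for a causal LM; GPT-2 stands in for a larger model, and the two example sentences are assumptions for illustration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

batch = tokenizer(
    ["LLM training minimizes next-token cross-entropy.",
     "Preprocessed text is tokenized into fixed-length sequences."],
    return_tensors="pt", padding=True,
)
labels = batch["input_ids"].clone()
labels[batch["attention_mask"] == 0] = -100  # padding positions are ignored by the loss

# With labels provided, the model shifts targets internally and returns the
# standard next-token cross-entropy loss.
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
print(f"cross-entropy loss: {outputs.loss.item():.3f}")
```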
Day 61-75:
- Summary report on the complete LLM pipeline and its auxiliary systems, covering the following techniques and their purpose (Phase 2; an evaluation-metrics sketch follows the deliverables):
- LLM data preprocessing
- Different types of datasets: text, Q/A, etc., along with examples of such datasets in use
- LLM training
- LLM Loss functions
- LLM Evaluation metrics
- LLM Guardrails
- Additional elements required for LLM production
Deliverables
- Summary report of the techniques, tutorial code with proper functioning, blog for publication, video demo (optional but recommended)
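For the evaluation-metrics item in Phase 2, the sketch below uses Hugging Face Evaluate (from the tech stack) for a reference-based metric and shows how perplexity falls out of the evaluation loss; the example strings and the loss value are placeholders.

```python
import math
import evaluate  # Hugging Face Evaluate, listed in the tech stack

# Reference-based metric: ROUGE compares generated text against gold references.
rouge = evaluate.load("rouge")
predictions = ["The model was quantized to 4 bits before deployment."]
references = ["The model was quantized to 4-bit precision prior to deployment."]
print(rouge.compute(predictions=predictions, references=references))

# Intrinsic metric: perplexity is exp(mean cross-entropy loss) over the evaluation set.
mean_eval_loss = 2.31  # placeholder; use the averaged loss from your own eval loop
print(f"perplexity: {math.exp(mean_eval_loss):.2f}")
```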
Day 76-90:
- Deployment of an LLM model chosen from the models list below (Llama 3.1 is recommended); see the FastAPI serving sketch after the deliverables
Deliverables
- Summary report of the techniques, tutorial code with proper functioning, blog for publication, video demo (optional but recommended)
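A minimal FastAPI serving sketch is shown below, assuming a Transformers text-generation pipeline; the Llama 3.1 model ID is an assumption and requires accepting Meta's license on the Hub, so a smaller open model can be substituted while prototyping.

```python
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

# Assumed model ID; swap in any open model from the models list while prototyping.
generator = pipeline(
    "text-generation", model="meta-llama/Llama-3.1-8B-Instruct", device_map="auto"
)

app = FastAPI(title="LLM inference service")

class GenerateRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 128

@app.post("/generate")
def generate(req: GenerateRequest):
    out = generator(req.prompt, max_new_tokens=req.max_new_tokens, do_sample=True)
    return {"completion": out[0]["generated_text"]}

# Run with: uvicorn app:app --host 0.0.0.0 --port 8000
```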
Day 91-105:
- Summary report on the different types of quantization, with sample quantization applied to the deployed LLM model (Phase 1; a GPTQ sketch follows the deliverables)
Deliverables
- Working implementations of different quantization techniques at various bit widths, an analysis of their advantages and disadvantages, and application of these techniques to locally deployed LLM models
- Summary report of the techniques, tutorial code with proper functioning, blog for publication, video demo (optional but recommended)
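As one concrete starting point for Phase 1, the sketch below quantizes a small model with GPTQ through the Transformers GPTQConfig integration; it assumes optimum and auto-gptq are installed and a GPU is available, and the model ID and calibration dataset are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

# Small model used purely to keep the example quick; replace with the deployed LLM.
model_id = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Post-training GPTQ quantization; bits can be 8, 4, 3, or 2 for comparison.
gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=gptq_config, device_map="auto"
)

# The quantized weights save and reload like any other checkpoint.
model.save_pretrained("opt-125m-gptq-4bit")
tokenizer.save_pretrained("opt-125m-gptq-4bit")
```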
Day 106-120:
- Summary report on the different types of quantization, with sample quantization applied to the deployed LLM model (Phase 2; a bit-width comparison sketch follows the deliverables)
Deliverables
- Working implementations of different quantization techniques at various bit widths, an analysis of their advantages and disadvantages, and application of these techniques to locally deployed LLM models
- Summary report of the techniques, tutorial code with proper functioning, blog for publication, video demo (optional but recommended)
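To compare the effect of different bit widths independently of any library, here is a small self-contained sketch of symmetric per-tensor quantization; the weight matrix is random and purely illustrative.

```python
import torch

def quantize_dequantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric per-tensor quantization: round weights to signed `bits`-bit integers,
    then map back, so the round-trip error shows what each bit width costs."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    w_q = torch.clamp(torch.round(w / scale), min=-qmax - 1, max=qmax)
    return w_q * scale

w = torch.randn(4096, 4096)  # stand-in weight matrix (illustrative size)
for bits in (8, 4, 3, 2):
    err = (w - quantize_dequantize(w, bits)).abs().mean()
    print(f"{bits}-bit mean absolute reconstruction error: {err:.5f}")
```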
Day 121-135:
- Research report on multimodal LLMs (text-to-image, text-to-video); see the text-to-image sketch after the deliverables
Deliverables
- Review of multimodal LLMs, detailed report of their functioning
- Summary report of the techniques, tutorial code with proper functioning, blog for publication, video demo (optional but recommended)
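As a small text-to-image example for the research report, the sketch below uses the diffusers library (an assumed addition to the listed stack); the checkpoint and prompt are illustrative and a CUDA GPU is assumed.

```python
import torch
from diffusers import StableDiffusionPipeline

# Text-to-image generation with a Stable Diffusion checkpoint (assumed choice).
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

image = pipe("a watercolor painting of a lighthouse at sunset").images[0]
image.save("lighthouse.png")
```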
Day 136-150:
- Deployment of multimodal LLM models (Phase 1); see the Gradio demo sketch after the deliverables
Deliverables
- Actual deployment of multimodal LLMs along with detailed internal doc, public blog, working code and testing results
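One hedged way to approach the Phase 1 deployment deliverable is to wrap a LLaVA checkpoint in a Gradio UI, as sketched below; the llava-hf model ID, prompt template, and port are assumptions based on the public checkpoints.

```python
import torch
import gradio as gr
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed public LLaVA checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

def describe(image, question):
    # Prompt template used by the llava-hf 1.5 checkpoints.
    prompt = f"USER: <image>\n{question} ASSISTANT:"
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(
        model.device, torch.float16
    )
    output = model.generate(**inputs, max_new_tokens=200)
    answer = processor.decode(output[0], skip_special_tokens=True)
    return answer.split("ASSISTANT:")[-1].strip()

demo = gr.Interface(
    fn=describe,
    inputs=[gr.Image(type="pil"), gr.Textbox(label="Question about the image")],
    outputs=gr.Textbox(label="Answer"),
    title="LLaVA demo",
)
demo.launch(server_name="0.0.0.0", server_port=7860)
```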
Day 151-165:
- Deployment of multimodal LLM models (Phase 2)
Deliverables
- Actual deployment of multimodal LLMs along with detailed internal doc, public blog, working code and testing results
Day 166-180:
- Summary report of real-world challenges for multimodal LLMs
- Character consistency across different generations
- Character movements and actions that stay consistent with other characters and the background, without distorting the characters' real-life shapes
Deliverables
- Detailed report along with possible suggested solutions and future scope of improvement
- Case-study docs for products that have solved these challenges, along with properly documented testing and observation reports
Tech Stack
- Python
- HuggingFace
- TensorFlow, PyTorch
- LangChain
- pandas
- DeepSpeed - Microsoft’s framework for optimizing large-scale model training
- FSDP (Fully Sharded Data Parallel) - Efficient training for large models
- LoRA (Low-Rank Adaptation) - Efficient fine-tuning method for reducing training time
- PEFT (Parameter-Efficient Fine-Tuning) - Techniques for optimizing LLM fine-tuning (see the LoRA sketch after this list)
- Megatron-LM - NVIDIA’s large-scale model parallel training framework
- AWS, DigitalOcean
- Ubuntu shell commands, Docker
- ONNX Runtime
- TensorRT
- GPTQ - Post-training weight quantization method for LLMs
- FastAPI
- Flask
- Gradio/Streamlit
- LAVIS (Multimodal Vision-Language Framework) - Model support for multimodal architectures
- Guardrails (e.g., Guardrails AI or NeMo Guardrails, used with LangChain) - Tools for controlling LLM outputs
- OpenAI Evals / Hugging Face Evaluate - Standardized benchmarks for LLMs
- MT-Bench (Multi-turn Benchmark for Chat Models) - Chat-based LLM performance benchmarking
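Since LoRA and PEFT appear in the stack, here is a minimal configuration sketch; the GPT-2 base model and the target module name are assumptions (target modules differ per architecture).

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in base model

lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling applied to the update
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only a fraction of a percent of weights are trainable
```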
Models list
- LLaMA 3.1 (Large Language Model Meta AI)
- DeepSeek R1
- Mistral 7B
- BLOOM
- Falcon 180B
- GPT-NeoX
Multimodal
- LLaVA (Large Language and Vision Assistant)
- https://paperswithcode.com/task/multimodal-large-language-model
- https://paperswithcode.com/paper/mme-a-comprehensive-evaluation-benchmark-for
- https://paperswithcode.com/paper/shapellm-universal-3d-object-understanding
- https://paperswithcode.com/paper/kosmos-2-grounding-multimodal-large-language
- https://paperswithcode.com/paper/timechat-a-time-sensitive-multimodal-large
- https://paperswithcode.com/paper/minigpt4-video-advancing-multimodal-llms-for
- https://paperswithcode.com/paper/a-survey-on-multimodal-large-language-models
- https://paperswithcode.com/paper/mplug-docowl-modularized-multimodal-large
- https://paperswithcode.com/paper/finvis-gpt-a-multimodal-large-language-model