AMD

Place AMD, San Jose, CA, USA
Keywords Quantization, GPU Parallelism, Mixture-of-Experts
Role AI Frameworks Intern
Timeline May 2025 - Aug 2025

Model Optimization

  • Proposed and implemented multi-GPU solutions for GPTQ quantization of MoE models and for model evaluation
  • Bypassed Python's GIL to quantize experts concurrently, distributing the computation across GPU devices via model parallelism (see the first sketch after this list)
  • Achieved 1.65x and 10x quantization speedups on Qwen1.5-MoE-A2.7B and DeepSeek-R1, respectively, with a 0.5% improvement in perplexity
  • Implemented data parallelism, partitioning the evaluation dataset across ranks to achieve a 5.01x speedup in evaluation time (see the second sketch below)
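
A minimal sketch of the GIL-bypass pattern described above, not the actual AMD implementation: torch.multiprocessing spawns one OS process per GPU, and each process quantizes a disjoint, strided shard of an MoE layer's experts. Here quantize_expert is a hypothetical placeholder for a per-expert GPTQ routine.

```python
import torch
import torch.multiprocessing as mp


def quantize_expert(expert, device):
    """Hypothetical stand-in for per-expert GPTQ weight quantization."""
    expert.to(device)
    # ... collect calibration activations, then quantize this expert's
    # linear layers with GPTQ ...


def worker(rank, world_size, experts):
    # One process per GPU: separate interpreters mean the GIL never
    # serializes the per-expert quantization work.
    device = torch.device(f"cuda:{rank}")
    for expert in experts[rank::world_size]:  # strided shard of the experts
        quantize_expert(expert, device)


if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    experts = [torch.nn.Linear(64, 64) for _ in range(8)]  # toy MoE experts
    # Processes (unlike threads) sidestep the GIL entirely.
    mp.spawn(worker, args=(world_size, experts), nprocs=world_size)
```

Because each expert's GPTQ pass is independent, the strided assignment needs no inter-process synchronization; the only shared state is the read-only expert list copied to each worker at spawn time.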
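A minimal sketch of the data-parallel evaluation scheme, assuming a single-node torchrun launch and a toy stand-in language model: each rank scores a strided shard of the shared evaluation set, and a single all_reduce combines the per-rank loss sums into a global perplexity. The model and dataset here are placeholders, not the originals.

```python
import math

import torch
import torch.distributed as dist
import torch.nn.functional as F


def evaluate(model, dataset):
    rank, world_size = dist.get_rank(), dist.get_world_size()
    device = torch.device(f"cuda:{rank}")  # assumes one GPU per rank, single node
    model.to(device).eval()

    # Running [sum of token NLLs, token count] over this rank's shard only.
    totals = torch.zeros(2, device=device)
    with torch.no_grad():
        for batch in dataset[rank::world_size]:  # this rank's partition
            batch = batch.to(device)
            logits = model(batch)  # (batch, seq, vocab)
            nll = F.cross_entropy(
                logits[:, :-1].reshape(-1, logits.size(-1)),
                batch[:, 1:].reshape(-1),  # next-token targets
                reduction="sum",
            )
            totals[0] += nll
            totals[1] += batch[:, 1:].numel()

    # Sum partial losses/counts across all ranks; every rank sees the totals.
    dist.all_reduce(totals, op=dist.ReduceOp.SUM)
    return math.exp((totals[0] / totals[1]).item())


if __name__ == "__main__":
    dist.init_process_group("nccl")  # launch via: torchrun --nproc_per_node=N
    torch.manual_seed(0)  # every rank builds the identical dataset, then shards it
    vocab = 100
    model = torch.nn.Sequential(torch.nn.Embedding(vocab, 32),
                                torch.nn.Linear(32, vocab))  # toy stand-in LM
    data = [torch.randint(0, vocab, (4, 16)) for _ in range(32)]
    ppl = evaluate(model, data)
    if dist.get_rank() == 0:
        print(f"global perplexity: {ppl:.3f}")
    dist.destroy_process_group()
```

Reducing the summed loss and token count (rather than per-rank perplexities) keeps the result exact even when shards end up with unequal token counts.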