AI systems + high-performance computing
Making machine learning models run fast on real hardware.
I am a Member of Technical Staff at Microsoft AI, where I work across inference and training software stacks for large-scale AI systems. My recent work includes Triton kernel optimizations, FP8 KV cache integration, and checkpointing and fault-tolerance infrastructure for distributed training.
Before joining Microsoft AI, I worked on DeepSpeed inference optimization for OpenAI's DALL·E and large language models, on edge AI acceleration at Blaize, and on the Larq Compute Engine at Plumerai. My work sits at the intersection of high-performance computing, machine learning systems, and architecture-aware optimization.