AI systems + high-performance computing

Making machine learning models run fast on real hardware.

I am a Member of Technical Staff at Microsoft AI, where I work across inference and training software stacks for large-scale AI systems. My recent work includes Triton kernel optimizations, FP8 KV cache integration, and checkpointing and fault-tolerance infrastructure for distributed training.

Before joining Microsoft AI, I worked on DeepSpeed inference optimization for OpenAI DALL·E and large language models, edge AI acceleration at Blaize, and Larq Compute Engine at Plumerai. My work sits at the intersection of high-performance computing, machine learning systems, and architecture-aware optimization.

Experience

Microsoft AI

Contributing to inference and training software stacks, including Triton kernel optimizations, FP8 KV cache integration, and checkpointing and fault-tolerance infrastructure for large-scale training.

Microsoft DeepSpeed

Optimized OpenAI DALL·E inference latency and added DeepSpeed Inference support for large language models including OPT, Falcon, and Phi-2.

Blaize

Optimized computer vision deep learning models such as ResNet, OpenPose, and Mask R-CNN for the Blaize Graph Streaming Processor architecture.

Plumerai

Led the technical design and development of Larq Compute Engine, an optimized inference engine for binarized neural networks on mobile and embedded devices.

Intel

Benchmarked Intel Deep Learning Boost instructions and implemented Control-Flow Enforcement Technology support in the GNU Debugger.