I am a Staff Software Engineer at Blaize, where I work on optimizing deep learning models for the Blaize Graph Streaming Processor (GSP) architecture. The Blaize GSP is a novel graph-native computing architecture designed from the ground up for efficient artificial intelligence applications at the edge. My work lies at the intersection of High Performance Computing (HPC) and Machine Learning (ML). I am particularly interested in researching parallel, scalable ML algorithms and developing systems to accelerate these models on various hardware architectures.

Prior to joining Blaize, I was the technical lead, architect, and main developer of Larq Compute Engine (LCE) at Plumerai. LCE is a highly optimized inference engine for Binarized Neural Networks (BNNs) on mobile and embedded devices. Before that, I was a software engineer at Intel, where I worked on new x86 architecture extensions, such as Intel Deep Learning Boost instructions (AVX-512 VNNI) and Control-Flow Enforcement Technology.

I received my PhD in HPC and parallel algorithms from the Institute for Advanced Study at the Technical University of Munich, under the supervision of George Biros and Hans-Joachim Bungartz. You can see a full list of my peer-reviewed and preprint manuscripts on my Google Scholar page.