PyTorch JIT Compiler
[1/22/26] It's been a few weeks of work (and travelling), but I'm here to report my current progress. Frankly, these weeks involved a lot of reading and learning. The major development is my Python frontend & Rust setup. This project is actually only my 2nd time ever using Rust: for the setup, I familiarized myself with Cargo (Rust's package manager and build system), and I'm getting the hang of Rust's ownership and lifetime model. I also played around with Rust bindings for Python (provided by PyO3) and Rust bindings for MLIR (via Melior). What's next? I have to get my hands dirty with MLIR itself; I have been working through the Toy example to learn the implementation fundamentals of an MLIR-based compiler.
[Figure: sopt API usage]
Above, I have a very simple 2-layer network defined in PyTorch. Mirroring torch.compile(), I expose a compile() decorator under my own sopt library. Taking a step back from the technical side, I also wanted to share how fascinating I found this step: I essentially built my own importable Python library that funnels into the binary for my Rust-based compiler! Many libraries that I use on a daily basis follow some form of what I've just implemented here, which was eye-opening to me.
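To make this concrete (the layer sizes and the exact decorator signature below are illustrative, not the final API), the usage looks roughly like this:

```python
import torch
import torch.nn as nn

import sopt  # my Python frontend package, backed by the Rust compiler

class TwoLayerNet(nn.Module):
    """A minimal 2-layer MLP, just to give the compiler something to chew on."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(16, 8)
        self.fc2 = nn.Linear(8, 4)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model = TwoLayerNet()

# The decorator traces the wrapped callable with torch.fx and ships the
# resulting graph off to the Rust backend; the exact signature is a sketch.
@sopt.compile
def run(x):
    return model(x)

out = run(torch.randn(1, 16))
```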

Anyway, during execution, sopt.compile() converts the torch.fx graph into a list of JSON objects that can easily be received by the Rust backend. I set up a class, PyNode, that encapsulates the important data fields from each FX node.
[Figure: sopt API usage]
[Figure: sopt compiler receiver endpoint for sopt.compile()]
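As a rough sketch of the Python side of that handoff (the real PyNode fields may differ slightly), the conversion looks something like this:

```python
import json
from dataclasses import asdict, dataclass, field

from torch import fx

@dataclass
class PyNode:
    """Flattened view of a torch.fx.Node; the exact fields here are illustrative."""
    name: str                                       # unique node name in the graph
    op: str                                         # "placeholder", "call_function", "output", ...
    target: str                                     # e.g. the callee for call_function nodes
    args: list[str] = field(default_factory=list)   # names of the input nodes

    @classmethod
    def from_fx(cls, node: fx.Node) -> "PyNode":
        return cls(
            name=node.name,
            op=node.op,
            target=str(node.target),
            args=[a.name for a in node.args if isinstance(a, fx.Node)],
        )

def graph_to_json(gm: fx.GraphModule) -> str:
    # One JSON object per FX node, emitted in topological order
    return json.dumps([asdict(PyNode.from_fx(node)) for node in gm.graph.nodes])
```

On the Rust side, a matching struct can then deserialize the same list (serde makes that part painless), which is what made JSON a convenient interchange format.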
From this point, I am working on setting up my first dialect/IR on the MLIR skeleton; I call it the "soptfx" dialect. The idea is to lower the PyNodes to MLIR operations in order to build the dataflow graph; since the FX nodes I handle come in 3 main flavors (placeholder, call_function, and output), I handle the lowering separately for each case. My goal here is to emit a correct .mlir file so I can gauge whether my current logic holds up.
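A quick way to see those three flavors is to trace a purely functional toy network with stock torch.fx (nothing sopt-specific here) and print each node's opcode:

```python
import torch
from torch import fx

def toy(x, w1, w2):
    # Functional 2-layer net, so the trace only contains the three opcodes
    # discussed above: placeholder, call_function, and output
    return torch.relu(x @ w1) @ w2

gm = fx.symbolic_trace(toy)
for node in gm.graph.nodes:
    print(f"{node.op:15} {node.name:10} -> {node.target}")
# placeholder      x, w1, w2            (the graph inputs)
# call_function    matmul, relu, matmul_1
# output           output
```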




[1/4/26] So, I've been working on this project for the past few weeks. Up until yesterday, I spent most of my time reading up on the basic theory behind ML/DL compilers. I plan to add much more to this blog to help me reinforce whatever I learn and to log my journey into this field (in case it inspires anyone).

I'll get started with some preliminaries: my related background up to this point involves GPU programming (CUDA), systems programming, ML systems, and a bit of compiler construction (LLVM). All of these were picked up through courses at my school, UIUC. Over the past few weeks, I read up on a few more technologies:
  • MLIR
  • TVM
  • Triton
  • TensorRT
  • nvFuser
Now, I somewhat understand where these technologies fit into a compiler stack. MLIR serves as compiler infrastructure, the "skeleton" of the compiler we build; TVM is a full-stack compiler that emphasizes loop-level/tiling optimizations and code generation for heterogeneous devices; Triton is a language/IR/JIT compiler (used by PyTorch 2.0) that helps write extensible GPU kernels; TensorRT is an inference engine that optimizes models for NVIDIA GPUs; and nvFuser is a "Fusion Code Generator" that generates code optimized for NVIDIA GPUs.

I will revisit these definitions the more I explore :)

Now, I've finalized the design I want to use for this project. I call it "soptRT" (still need to think of a better name). I've broken the project down into 2 phases:
  • Phase I: To better learn ML optimizations at a low level, I want to introduce my own compile() trigger in PyTorch that calls a Rust-based compiler built on MLIR. This compiler will run optimization passes (fusion, quantization, memory mapping, etc.) and funnel into an existing backend (Triton/TVM); see the sketch after this list.
  • Phase II: Next, I want to design my own kernel generator. After reading a little, I realized that code generation is a very interesting problem with several competing approaches: TVM uses an "ML for ML" paradigm, while engineers handcraft kernels for libraries like NVIDIA's cuDNN. Details for this part are TBD and will require me to read a bit more.
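For Phase I, one plausible way to wire up that compile() trigger (not necessarily the final design) is a custom torch.compile backend, which receives the traced FX graph and is the natural place to hand things off to the Rust compiler:

```python
import torch

def sopt_backend(gm: torch.fx.GraphModule, example_inputs):
    """Custom torch.compile backend: the hook where the FX graph could be
    serialized and shipped to the Rust/MLIR compiler. For now it just
    prints the graph and falls back to eager execution."""
    gm.graph.print_tabular()
    return gm.forward  # run the original graph unmodified

model = torch.nn.Sequential(
    torch.nn.Linear(16, 8), torch.nn.ReLU(), torch.nn.Linear(8, 4)
)
compiled = torch.compile(model, backend=sopt_backend)
compiled(torch.randn(1, 16))
```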
Great! Now that the plan is out of the way, I'll update this page over time with the challenges I run into. For now, I am working on building the "bridge" between my own PyTorch compile() and my Rust backend with MLIR.

Stay tuned!




Cool links: I'd also like to highlight some links I found throughout this project. There are a lot of interesting startups and innovations in the ML compiler space that I want to explore further:
  • Tile IR: NVIDIA released this somewhat recently. This is a "low-level tile virtual machine" allowing a developer to work in terms of tiles.
  • Compiler Optimization Advent Calendar: I saw this on LinkedIn; as the name suggests, the author goes over some interesting compiler designs.
  • Modular: really cool company led by Chris Lattner, whom I've been following for a while. Their language, Mojo, seems very well-designed, and I am curious to see how its compiler works.