C++ Optimized ML Orderbook

Microsecond-latency limit orderbook engine with an integrated ML signal layer, built in C++17. The matching engine and inference pipeline share the same memory space — no serialization overhead, no Python GIL.
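To make the matching-engine side concrete, here is a minimal price-time-priority book sketch. This is illustrative only: the struct names, integer-tick prices, and `std::map`-per-side layout are assumptions for the example, not the project's actual data structures (a latency-tuned engine would typically use flat arrays and pooled allocation instead of node-based maps).

```cpp
#include <algorithm>
#include <cstdint>
#include <deque>
#include <functional>
#include <map>

// Illustrative sketch of a price-time-priority limit order book.
// Prices are integer ticks to avoid floating-point comparisons.
struct Order { uint64_t id; uint32_t qty; };

class OrderBook {
public:
    void add_bid(int64_t px, Order o) { bids_[px].push_back(o); }
    void add_ask(int64_t px, Order o) { asks_[px].push_back(o); }

    // Cross an incoming market buy against resting asks, best price first,
    // FIFO within each price level. Returns the quantity actually filled.
    uint32_t market_buy(uint32_t qty) {
        uint32_t filled = 0;
        while (qty > 0 && !asks_.empty()) {
            auto lvl = asks_.begin();            // lowest ask = best price
            auto& queue = lvl->second;
            while (qty > 0 && !queue.empty()) {
                uint32_t take = std::min(qty, queue.front().qty);
                queue.front().qty -= take;
                qty -= take;
                filled += take;
                if (queue.front().qty == 0) queue.pop_front();
            }
            if (queue.empty()) asks_.erase(lvl);
        }
        return filled;
    }

    int64_t best_bid() const { return bids_.empty() ? 0 : bids_.begin()->first; }
    int64_t best_ask() const { return asks_.empty() ? 0 : asks_.begin()->first; }

private:
    // Bids sorted highest-first, asks lowest-first, so begin() is best price.
    std::map<int64_t, std::deque<Order>, std::greater<int64_t>> bids_;
    std::map<int64_t, std::deque<Order>> asks_;
};
```

Because book state and any signal layer live in one process, an inference call can read `best_bid()`/`best_ask()` directly rather than through a serialized snapshot.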

C++17 · CUDA · Python · CMake | Status: in progress
Full Case Study
CUDA Learning Tutorial Series

A structured, hands-on GPU programming curriculum that bridges the gap between "hello world" and advanced ML kernel optimization. Covers SIMT execution, shared memory, warp divergence, and matrix multiplication from scratch.
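The blocking idea behind the shared-memory matmul lesson can be sketched on the CPU: multiply in TILE x TILE blocks so each block of A and B is reused while it is hot in cache, exactly the reuse a CUDA kernel gets by staging tiles in shared memory. This is a stand-in sketch, not code from the tutorial series itself.

```cpp
#include <algorithm>
#include <vector>

// CPU analogue of shared-memory tiling: loop over TILE x TILE blocks so the
// working set of A and B fits in cache and each loaded value is reused.
constexpr int TILE = 16;

std::vector<float> matmul_tiled(const std::vector<float>& A,
                                const std::vector<float>& B, int n) {
    std::vector<float> C(n * n, 0.0f);
    for (int i0 = 0; i0 < n; i0 += TILE)
        for (int k0 = 0; k0 < n; k0 += TILE)
            for (int j0 = 0; j0 < n; j0 += TILE)
                for (int i = i0; i < std::min(i0 + TILE, n); ++i)
                    for (int k = k0; k < std::min(k0 + TILE, n); ++k) {
                        float a = A[i * n + k];   // reused across the j tile
                        for (int j = j0; j < std::min(j0 + TILE, n); ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
    return C;
}
```

In the CUDA version the same structure appears as cooperative loads into `__shared__` tiles with `__syncthreads()` between load and compute phases.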

CUDA · C++ · NVCC · Python | Status: in progress
Full Case Study
NYU GPU Cluster — ML Infrastructure

Enabled NYU's access to 8- and 16-GPU AI training clusters built on repurposed Meta server hardware. Designed the workload-scheduling architecture and documented the infrastructure for student researchers.
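The scheduling problem reduces to admitting jobs against a fixed GPU pool. The toy FIFO admitter below is purely illustrative of that idea; the names and policy are assumptions for the example and not the cluster's actual scheduler (production clusters typically run something like Slurm).

```cpp
#include <queue>
#include <string>
#include <vector>

// Toy FIFO GPU admitter: a job asks for N GPUs and runs when enough are free.
// Illustrative sketch only, not the cluster's real scheduling system.
struct Job { std::string name; int gpus_needed; };

class GpuScheduler {
public:
    explicit GpuScheduler(int total_gpus) : free_(total_gpus) {}

    // Returns true if the job was admitted immediately, false if queued.
    bool submit(const Job& j) {
        if (j.gpus_needed <= free_) {
            free_ -= j.gpus_needed;
            running_.push_back(j);
            return true;
        }
        waiting_.push(j);
        return false;
    }

    // Release a finished job's GPUs, then admit waiting jobs in FIFO order.
    void finish(const std::string& name) {
        for (auto it = running_.begin(); it != running_.end(); ++it)
            if (it->name == name) {
                free_ += it->gpus_needed;
                running_.erase(it);
                break;
            }
        while (!waiting_.empty() && waiting_.front().gpus_needed <= free_) {
            free_ -= waiting_.front().gpus_needed;
            running_.push_back(waiting_.front());
            waiting_.pop();
        }
    }

    int free_gpus() const { return free_; }

private:
    int free_;
    std::vector<Job> running_;
    std::queue<Job> waiting_;
};
```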

Python · CUDA · Linux · Docker | Status: complete

Currently building

The C++ orderbook and CUDA tutorial series are both active. Case studies will be updated with benchmarks, architecture diagrams, and code walkthroughs as they ship.

View GitHub