
C++ Optimized ML Orderbook

High-performance limit order book engine with an integrated ML signal layer, built in C++17 for microsecond-latency market simulation. The matching engine and ML inference run in the same memory space.

C++17 · CUDA · Python · CMake · Status: In Progress


Problem Statement

Traditional backtesting frameworks are bottlenecked by Python's GIL and interpreter overhead. Real-world HFT systems operate at sub-millisecond latency, a target a pure-Python order book cannot realistically reach. This project explores how far order book performance can be pushed when ML inference runs next to the matching engine in the same memory space, with no serialization boundary between them.

Architecture

System Architecture

Order Ingestion (lock-free SPSC queue; atomic compare-and-swap)
→ Matching Engine (price-level sorted maps; arena allocator, zero-heap hot path)
→ ML Signal Layer (LightGBM via ONNX Runtime; <2 μs inference, same memory space)
→ Decision Gate (the ML signal gates order placement based on predicted short-horizon mid-price direction)
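The decision gate can be pictured as a threshold on the model's output. The sketch below is a minimal illustration, assuming the LightGBM model emits a probability `p_up` that the short-horizon mid-price rises; `DecisionGate`, `decide`, and the 0.55 threshold are hypothetical, not the project's code or tuned values.

```cpp
#include <cassert>
#include <cstdint>

enum class Side : std::uint8_t { Buy, Sell };
enum class Action { Place, Hold };

// Hypothetical gate. p_up is assumed to be the model's probability that the
// short-horizon mid-price moves up. A buy is placed only when p_up clears the
// threshold, a sell only when (1 - p_up) does; otherwise the order is held.
struct DecisionGate {
  double threshold = 0.55;  // assumed value; would be tuned on held-out data

  Action decide(Side side, double p_up) const noexcept {
    assert(p_up >= 0.0 && p_up <= 1.0);
    const double p_favorable = (side == Side::Buy) ? p_up : 1.0 - p_up;
    return p_favorable >= threshold ? Action::Place : Action::Hold;
  }
};
```

Setting the threshold above 0.5 leaves a band of low-confidence predictions in which every order is held rather than submitted.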

Price-level sorted maps with custom arena allocators, so the order book hot path performs zero heap allocations
ML component: a LightGBM model trained on L2 order book features, loaded and served through the ONNX Runtime C++ API
ML signal feeds a pre-submission decision layer: the predicted mid-price direction gates whether an order is placed or held
Benchmarking: micro-benchmarks time individual match cycles; profiling uses perf and Valgrind (callgrind)
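The price-level structure can be sketched with two `std::map` instances, one per side, each holding a FIFO queue per price. This is a minimal sketch under stated assumptions: prices are integer ticks (the `Order` struct on this page uses `double`), only incoming limit buys are shown (sells are symmetric), and default allocators stand in for the project's custom allocators. `PriceLevelBook`, `Resting`, and `Fill` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <map>
#include <vector>

// Hypothetical resting order and fill records. Prices are integer ticks here,
// an assumption for exact comparisons; the Order struct on this page uses double.
struct Resting { std::uint64_t id; std::uint32_t qty; };
struct Fill { std::uint64_t maker_id; std::int64_t price; std::uint32_t qty; };

class PriceLevelBook {
  // One FIFO queue per price level. Bids iterate highest price first,
  // asks lowest price first, so begin() is always the best level.
  std::map<std::int64_t, std::deque<Resting>, std::greater<std::int64_t>> bids_;
  std::map<std::int64_t, std::deque<Resting>> asks_;

 public:
  void add_bid(std::int64_t px, Resting o) { bids_[px].push_back(o); }
  void add_ask(std::int64_t px, Resting o) { asks_[px].push_back(o); }

  // Matches an incoming limit buy against the asks with price-time priority;
  // any unfilled remainder rests on the bid side at the limit price.
  std::vector<Fill> buy(std::uint64_t id, std::int64_t limit, std::uint32_t qty) {
    assert(qty > 0);
    std::vector<Fill> fills;
    while (qty > 0 && !asks_.empty() && asks_.begin()->first <= limit) {
      auto level = asks_.begin();
      std::deque<Resting>& queue = level->second;
      Resting& maker = queue.front();  // oldest order at the best ask
      const std::uint32_t traded = std::min(maker.qty, qty);
      fills.push_back({maker.id, level->first, traded});
      maker.qty -= traded;
      qty -= traded;
      if (maker.qty == 0) queue.pop_front();
      if (queue.empty()) asks_.erase(level);  // drop exhausted price levels
    }
    if (qty > 0) add_bid(limit, {id, qty});
    return fills;
  }

  bool has_bid(std::int64_t px) const { return bids_.count(px) != 0; }
  std::size_t ask_levels() const { return asks_.size(); }
};
```

Because `begin()` is always the best level, each match step touches only the front of one queue; erasing empty levels keeps that invariant.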

Key Design Decisions

C++17 over Rust or Go: direct use of the STL (std::map, standard algorithms) and native integration with ONNX Runtime's C++ API, with no FFI layer in between
Memory management: RAII throughout; an arena allocator for order objects and a custom pool allocator for price-level nodes eliminate allocation jitter
ML model choice: LightGBM over neural nets. Each prediction is a bounded walk of comparisons through each tree, so latency is predictable and the access pattern is cache-friendly; inference stays consistently under 2 μs, unlike transformer-style models
Concurrency: a lock-free SPSC queue for order ingestion feeds a single-writer order book, avoiding mutex contention on the critical path
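The ingestion queue can be sketched as a bounded ring buffer. This is a minimal sketch, assuming exactly one producer thread and one consumer thread and 64-byte cache lines; `SpscQueue`, `try_push`, and `try_pop` are hypothetical names. The architecture lists atomic compare-and-swap; in a strict SPSC queue each index has a single writer, so acquire/release loads and stores are sufficient and no CAS loop is needed, which is what this version uses.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Bounded single-producer/single-consumer ring buffer. Only the producer
// writes tail_ and only the consumer writes head_, so acquire/release
// ordering on plain loads and stores is enough to hand slots across threads.
template <typename T>
class SpscQueue {
  std::vector<T> slots_;
  std::size_t mask_;
  alignas(64) std::atomic<std::size_t> head_{0};  // next slot to read (consumer-owned)
  alignas(64) std::atomic<std::size_t> tail_{0};  // next slot to write (producer-owned)

 public:
  explicit SpscQueue(std::size_t capacity) : slots_(capacity), mask_(capacity - 1) {
    assert(capacity >= 2 && (capacity & (capacity - 1)) == 0);  // power of two
  }

  // Producer thread only. Returns false if the queue is full.
  bool try_push(const T& value) {
    const std::size_t tail = tail_.load(std::memory_order_relaxed);
    if (tail - head_.load(std::memory_order_acquire) == slots_.size()) return false;
    slots_[tail & mask_] = value;
    tail_.store(tail + 1, std::memory_order_release);  // publish the filled slot
    return true;
  }

  // Consumer thread only. Returns std::nullopt if the queue is empty.
  std::optional<T> try_pop() {
    const std::size_t head = head_.load(std::memory_order_relaxed);
    if (head == tail_.load(std::memory_order_acquire)) return std::nullopt;
    T value = slots_[head & mask_];
    head_.store(head + 1, std::memory_order_release);  // hand the slot back
    return value;
  }
};
```

Head and tail are monotonically increasing counters masked into the buffer, so "full" is simply `tail - head == capacity`; placing them on separate cache lines limits false sharing between the two threads.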

Code Highlight — Arena Allocator Hot Path

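// Zero-allocation order insertion via a fixed-capacity ring arena
#include <cstddef>
#include <cstdint>
#include <new>

enum class Side : std::uint8_t { Buy, Sell };
struct Order { std::uint64_t id; double price; std::uint32_t qty; Side side; };

class ArenaAllocator {
  static constexpr std::size_t CAPACITY = 4096;  // slots; must be a power of two
  static constexpr std::size_t MASK = CAPACITY - 1;
  alignas(64) std::byte buf_[CAPACITY * sizeof(Order)];
  std::size_t offset_ = 0;
public:
  // Placement-constructs an Order in the next slot. Slots are reused after
  // CAPACITY calls, so orders must be retired before the ring wraps.
  Order* alloc() noexcept {
    return new (buf_ + (offset_++ & MASK) * sizeof(Order)) Order{};
  }
};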
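The arena above hands out slots in a ring, so a slot comes back into use after a fixed number of allocations whether or not its order is still live. For price-level nodes, which the design notes assign to a custom pool allocator, a free-list pool is one common way to get explicit, O(1) reuse. This is a minimal sketch; `ObjectPool`, `create`, and `destroy` are hypothetical names, and plugging it into `std::map` would additionally require a standard-conforming allocator adapter, which is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

// Hypothetical fixed-capacity object pool with an intrusive free list.
// Free slots store a pointer to the next free slot; in-use slots store a T.
// Allocation and release are O(1) and never touch the heap.
template <typename T, std::size_t N>
class ObjectPool {
  union Slot {
    Slot* next;                                   // valid while the slot is free
    alignas(T) unsigned char storage[sizeof(T)];  // holds a T while in use
  };
  Slot slots_[N];
  Slot* free_ = nullptr;

 public:
  ObjectPool() noexcept {
    // Thread every slot onto the free list, lowest index first.
    for (std::size_t i = N; i-- > 0;) { slots_[i].next = free_; free_ = &slots_[i]; }
  }

  // Constructs a T in a free slot; returns nullptr when the pool is exhausted.
  template <typename... Args>
  T* create(Args&&... args) {
    if (free_ == nullptr) return nullptr;
    Slot* s = free_;
    free_ = s->next;
    return new (s->storage) T(std::forward<Args>(args)...);
  }

  // Destroys the object and pushes its slot back onto the free list.
  void destroy(T* p) noexcept {
    assert(p != nullptr);
    p->~T();
    Slot* s = reinterpret_cast<Slot*>(p);
    s->next = free_;
    free_ = s;
  }
};
```

Because the most recently released slot is handed out next, hot nodes tend to stay in cache; the trade-off against the ring arena is the extra `destroy` call on every removal.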

Results & Metrics

P99 match latency: in progress (benchmarking underway)
Orders/sec: in progress (throughput test pending)
Speedup vs Python: in progress (baseline comparison pending)
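P99 match latency is typically reported from per-cycle timing samples. The sketch below assumes `std::chrono::steady_clock` timing and the nearest-rank percentile definition; `time_each` and `percentile_ns` are hypothetical helpers, not the project's benchmark harness. Timing each call individually folds the clock-read overhead into every sample, which is significant at sub-microsecond scales.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Runs fn `iterations` times, timing each call separately with steady_clock.
// Returns one latency sample per call, in nanoseconds.
template <typename Fn>
std::vector<std::int64_t> time_each(Fn&& fn, std::size_t iterations) {
  std::vector<std::int64_t> samples;
  samples.reserve(iterations);
  for (std::size_t i = 0; i < iterations; ++i) {
    const auto t0 = std::chrono::steady_clock::now();
    fn();
    const auto t1 = std::chrono::steady_clock::now();
    samples.push_back(
        std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count());
  }
  return samples;
}

// Nearest-rank percentile for an integer percent in [1, 100]: the smallest
// sample such that at least pct% of all samples are less than or equal to it.
std::int64_t percentile_ns(std::vector<std::int64_t> samples, std::size_t pct) {
  assert(!samples.empty() && pct >= 1 && pct <= 100);
  std::sort(samples.begin(), samples.end());
  const std::size_t rank = (pct * samples.size() + 99) / 100;  // ceil(pct * n / 100)
  return samples[rank - 1];
}
```

Integer arithmetic for the rank avoids floating-point rounding at exact boundaries such as 99% of 100 samples.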

Latency benchmarks against a pure-Python order book are in progress; a 100–500× speedup is expected based on prior C++/Python comparisons
Directional accuracy of the ML signal and a P&L simulation on held-out L2 data are planned for the next milestone
Memory footprint profiled with Valgrind massif; the arena allocator removes malloc/free latency jitter from the hot path

Tech Stack

C++17 · CUDA · Python · CMake · ONNX Runtime · LightGBM · Custom Arena Allocator
