deep learning · pure c# · no python

Predictable ML inference
for .NET production.

Overfit is a from-scratch deep learning engine for .NET 10. Train in PyTorch or in-process, then run zero-allocation inference with no native dependencies and no Python runtime. AOT-compatible single-binary deployment.

0 B
allocated per GPT-2 token
(KV-cache decode)
7.6×
faster than ONNX Runtime
(Linear 784→10 inference)
0
native deps. no Python.
AOT compatible.

What it does

01 · zero-alloc inference

Run trained models with no GC pressure.

Pre-allocated buffers, span-based math, no per-call allocations on the hot path. Predictable tail latency under concurrent load.

using var engine = InferenceEngine.FromSequential(model, 784, 10);

Span<float> input  = stackalloc float[784];
Span<float> output = stackalloc float[10];
engine.Run(input, output);  // zero-allocation
02 · transformers in pure C#

GPT-2 / Llama / Qwen / Mistral inference, no Python.

Load weights from HuggingFace or directly from GGUF files (F32, F16, BF16, Q8_0, Q4_K, Q6_K). KV-cache decode at 0 bytes per token. Top-10 logit overlap 10/10 vs PyTorch reference (maxAbsDiff = 0.000107).

using var gpt2    = Gpt2.LoadSmall("./models/gpt2");
using var session = gpt2.CreateSession();

session.Reset(gpt2.Tokenizer.Encode("Hello, world."));

for (int i = 0; i < 32; i++)
{
    var token = session.GenerateNextToken(in sampling);  // 'sampling': your decoding parameters, configured elsewhere
    Console.Write(gpt2.Tokenizer.DecodeToken(token));
}
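
Loading quantized weights straight from a GGUF file might look like the sketch below. This is illustrative only: `GgufFile.Open` and `Llama.FromGguf` are hypothetical names standing in for whatever the actual loader API is; the source only confirms that GGUF files in F32/F16/BF16/Q8_0/Q4_K/Q6_K are readable.

```csharp
// Hypothetical sketch — method names are illustrative, not the confirmed API.
// The engine reads quantized tensors (e.g. Q4_K) directly from the GGUF file.
using var file    = GgufFile.Open("./models/llama.Q4_K.gguf");
using var llama   = Llama.FromGguf(file);
using var session = llama.CreateSession();
```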
03 · onnx import

Bring PyTorch-trained models, run them natively.

Direct ONNX import. 14 operators supported (Conv, Gemm, ReLU, Tanh, BatchNorm, MaxPool, Add for skip connections, etc.). Branching DAG topologies are supported (ResNet, DenseNet, EfficientNet).

var model = OnnxImporter.Load("classifier.onnx");
model.Eval();

using var engine = InferenceEngine.FromSequential(model, 784, 10);
var prediction = engine.Predict(input);  // zero-alloc; 'input' is a caller-supplied float span

Benchmarks

AMD Ryzen 9 9950X3D · Windows 11 · .NET 10 · BenchmarkDotNet 0.15.8. Reproducible from the repo.

Single inference — Linear(784→10)

Engine                         Mean       Allocated   vs ONNX
Overfit                        250.7 ns   0 B         7.6× faster
ONNX Runtime (pre-allocated)   1,899 ns   224 B       baseline
ONNX Runtime (standard)        3,388 ns   952 B       0.56×

GPT-2 Small (124M params) — KV-cache decode, 64 tokens

Path                              Mean       Allocated
Legacy (full forward per token)   6,318 ms   62.0 MB · grows
KV-cache                          973 ms     0 B / token *
* One-time session creation cost (~74 MB constant). Per-token allocation verified zero by a CI assertion that fails the build if anything allocates during decode.
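
A zero-allocation guard of this kind can be built on the runtime's own counters. The sketch below is not the project's actual CI code, just a minimal illustration using `GC.GetAllocatedBytesForCurrentThread()` (available since .NET Core 3.0); the action is run once first so JIT and tiering allocations are excluded.

```csharp
using System;

static class AllocationGuard
{
    // Throws if 'action' allocates any managed bytes on this thread.
    public static void AssertZeroAlloc(Action action)
    {
        action();  // warm-up: JIT/tiered-compilation allocations happen here
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        long after = GC.GetAllocatedBytesForCurrentThread();
        if (after != before)
            throw new InvalidOperationException(
                $"Hot path allocated {after - before} B — expected 0.");
    }
}
```

In a decode test, the guarded action would be one `session.GenerateNextToken` call; if anything on the path allocates, the build fails.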

Concurrent inference — 8 threads × 1,000 calls

Engine         Mean       Allocated   vs ONNX
Overfit        522 ms     0 B         3.6× faster
ONNX Runtime   1,894 ms   117 MB      baseline

Built for

infra

Kubernetes anomaly detection

Self-hosted pod-level anomaly detection on AKS / EKS. Replace Datadog or Dynatrace anomaly modules without sending metrics out of cluster. Sub-minute detection latency, no SaaS bill.

.net enterprise

Embedded ML in production services

Add ML inference to existing .NET services without shipping a Python sidecar. One process, one runtime, one deploy pipeline, one security audit.

low-latency

High-frequency / ad-tech inference

Sub-microsecond P99.9 latency (0.80 µs vs 5.70 µs for ONNX Runtime). Zero GC pauses under sustained concurrent load, for workloads where deterministic tail latency matters more than peak throughput.
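
Percentile figures like these can be reproduced with a simple harness. A sketch (the repo's actual numbers come from BenchmarkDotNet, not this code) using `Stopwatch.GetTimestamp`, which is allocation-free per sample:

```csharp
using System;
using System.Diagnostics;

static class TailLatency
{
    // Times 'action' n times, returns the p-th percentile in microseconds.
    public static double PercentileMicros(Action action, int n, double p)
    {
        var samples = new double[n];
        for (int i = 0; i < n; i++)
        {
            long t0 = Stopwatch.GetTimestamp();
            action();
            long t1 = Stopwatch.GetTimestamp();
            samples[i] = (t1 - t0) * 1_000_000.0 / Stopwatch.Frequency;
        }
        Array.Sort(samples);
        int idx = Math.Min(n - 1, (int)Math.Ceiling(p / 100.0 * n) - 1);
        return samples[idx];
    }
}
```

For a P99.9 estimate you need enough samples for the tail to be populated, e.g. `PercentileMicros(() => engine.Run(input, output), 1_000_000, 99.9)`.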

embedded

Edge & on-device deployment

Native AOT single-binary deployment. No CUDA libraries to ship, no Python runtime to install. Industrial automation, gaming anti-cheat, IoT.

What it is NOT

Where this is the wrong tool. Read this before deciding.

Pricing

Dual-licensed: free under AGPLv3 for open-source projects and research. Commercial licenses for closed-source production use:

Self-Service
$4,800
/ year
  • Commercial license, perpetual for the licensed version
  • NuGet package + private GitHub support
  • Email response within 5 business days
  • Self-managed deployment
  • Minor version updates included
Get started
Enterprise
$48,000
/ year
  • Everything in Self-Service
  • 24h SLA + on-call support
  • Dedicated retraining pipeline
  • Custom feature development
  • Architecture review and roadmap input
Talk to us

Get in touch

Tell us about your workload. We respond within one business day. If we are not a fit, we will say so.

Or email directly: devonbike@gmail.com