BenchEdge helps AI developers automatically shrink, speed up, and benchmark ML models for edge devices — without fighting hardware-specific optimization tools. The beta focuses on ONNX models with INT8 quantization; pruning, hardware kernels, and PyTorch/TensorFlow support are rolling out.
The mental model behind BenchEdge — from a raw model to a deployable, benchmarked build.
Drop in your ONNX model with a small calibration sample. We detect the architecture automatically.
Pick a target device and set your accuracy budget — e.g. Jetson Orin with a max 1% drop.
BenchEdge tests quantization, pruning, and kernel options, then benchmarks each build side by side.
Download the recommended build, packaged for your runtime and ready to deploy on-device.
Getting a model to run fast on real hardware is a research project on its own. Most teams can't afford it.
A model that flies on an A100 often won't fit in the memory of a phone or a Jetson Nano. Shrinking it by hand is slow and error-prone.
TensorRT, TFLite, QNN, OpenVINO — each device needs its own toolchain, flags, and kernels. Expertise doesn't transfer.
Companies pay specialists six figures to compress models, or burn weeks of engineer time guessing at accuracy/latency tradeoffs.
No manual tuning. No reading vendor docs at 2 AM. Just upload and ship.
Drop in an .onnx file plus a small calibration sample. We auto-detect the architecture and input shapes. (PyTorch and TensorFlow conversion is in beta.)
BenchEdge applies INT8 quantization with auto-calibration, then benchmarks the build on your target device profile.
Get the fastest model that stays inside your accuracy budget, packaged for your runtime, with a full benchmark report you can trust.
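For readers curious what INT8 quantization with calibration involves, here is a minimal sketch in plain Python. It assumes a symmetric, per-tensor scheme in which the calibration sample's largest magnitude maps to 127. The function names are hypothetical and this is not BenchEdge's actual implementation, which may use a different scheme.

```python
def calibrate_scale(calibration_values):
    """Symmetric per-tensor scale (assumed scheme): largest magnitude maps to 127."""
    max_abs = max(abs(v) for v in calibration_values)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(values, scale):
    """Round each value to the nearest INT8 step and clamp to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Map INT8 codes back to approximate float values."""
    return [q * scale for q in q_values]


# Illustrative calibration sample; a real one would be activations or weights
# collected by running representative inputs through the model.
calibration_sample = [-1.27, -0.5, 0.0, 0.25, 1.0]
scale = calibrate_scale(calibration_sample)

# 2.0 lies outside the calibrated range, so it saturates at 127.
q = quantize([0.5, -1.27, 2.0], scale)
restored = dequantize(q, scale)
print(q, restored)
```

Values outside the calibrated range saturate, which is one reason a representative calibration sample matters for accuracy.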
A look at the BenchEdge workspace — manage models, run optimizations, and grab the recommended export.
Interface preview — the beta workspace is rolling out to early-access users.
Pick a device and see how a ResNet-50 would be optimized for size, latency, and accuracy.
Model: ResNet-50 · 25.6M params
Sample report based on the expected optimization flow. Live numbers come from real hardware benchmarks during the beta.
A full sample run: upload a model, set a device and budget, and let BenchEdge test the techniques and pick the winner.
| Build tested | Size change | Latency | Speedup | Accuracy change (mAP) | Result |
|---|---|---|---|---|---|
| FP16 | −31% | 14.2 ms | 1.9× | −0.1% | within budget |
| INT8 (full) | −68% | 8.1 ms | 3.4× | −2.1% | over 1% budget |
| INT8 + pruning (mixed) | −42% | 9.8 ms | 2.8× | −0.7% | ★ recommended |
Sample benchmark report — figures are simulated for preview. Live numbers come from real hardware runners during the beta.
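The recommendation in the sample table can be read as a simple rule: discard any build whose accuracy drop exceeds the budget, then take the lowest latency among the rest. The sketch below applies that rule to the simulated figures from the table. The `Build` class and `recommend` function are hypothetical names used for illustration, not part of any BenchEdge API.

```python
from dataclasses import dataclass


@dataclass
class Build:
    name: str
    latency_ms: float
    accuracy_drop_pct: float  # positive number = accuracy lost


def recommend(builds, max_drop_pct):
    """Return the lowest-latency build within the accuracy budget, or None."""
    within_budget = [b for b in builds if b.accuracy_drop_pct <= max_drop_pct]
    if not within_budget:
        return None
    return min(within_budget, key=lambda b: b.latency_ms)


# Simulated figures copied from the sample report above.
builds = [
    Build("FP16", 14.2, 0.1),
    Build("INT8 (full)", 8.1, 2.1),
    Build("INT8 + pruning (mixed)", 9.8, 0.7),
]

best = recommend(builds, max_drop_pct=1.0)
print(best.name)  # INT8 + pruning (mixed)
```

Full INT8 is the fastest build in the table, but it is excluded because its 2.1% drop exceeds the 1% budget.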
Pick a device profile and we handle the rest — no toolchain wrangling.
Everything specialists do by hand — done automatically and benchmarked honestly.
Post-training and quantization-aware INT8/INT4 quantization with automatic calibration. We pick the scheme that holds accuracy.
Remove redundant channels and heads, then fine-tune to recover accuracy — smaller models with no custom runtime.
Device-specific operator fusion and kernel selection for Jetson (TensorRT), Snapdragon (QNN), and more.
Every technique tested on real target hardware. No more guessing which combination is actually fastest.
Set a max accuracy drop (e.g. 1%) and BenchEdge will never ship a build that crosses it. You stay in control.
Download as ONNX, TensorRT engine, TFLite, or Core ML, packaged for your exact runtime and ready to deploy.
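To make the channel pruning mentioned above more concrete, here is a minimal sketch of one common heuristic: rank channels by the L1 norm of their weights and drop the weakest fraction. The heuristic and the function names are assumptions for illustration. The page does not say how BenchEdge selects channels, and the fine-tuning step that recovers accuracy is not shown.

```python
def rank_channels_by_l1(weights):
    """Return channel indices sorted from smallest to largest L1 norm.

    `weights` is a list of channels, each a flat list of floats.
    """
    norms = [sum(abs(w) for w in channel) for channel in weights]
    return sorted(range(len(weights)), key=lambda i: norms[i])


def prune_channels(weights, fraction):
    """Drop the given fraction of channels with the smallest L1 norms."""
    n_drop = int(len(weights) * fraction)
    drop = set(rank_channels_by_l1(weights)[:n_drop])
    return [ch for i, ch in enumerate(weights) if i not in drop]


# Toy layer with four channels; two of them carry almost no weight.
conv_weights = [
    [0.9, -0.8],
    [0.01, 0.02],
    [0.5, 0.4],
    [-0.03, 0.0],
]

kept = prune_channels(conv_weights, fraction=0.5)
print(kept)  # [[0.9, -0.8], [0.5, 0.4]]
```

Because whole channels are removed, the pruned layer stays a dense tensor with a smaller shape, which is why this style of pruning needs no custom runtime.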
We're shipping the core ONNX optimization flow first, then expanding formats, devices, and automation. Here's exactly where we are.
Built first for ONNX, the format every major framework can export to.
An open CLI for the basics. A cloud platform when you need real hardware benchmarks.
Install the CLI, point it at your model, name a device. That's the whole workflow.
We're onboarding ML engineers shipping models to the edge. Join the waitlist for early access, free optimization credits, and a say in the device roadmap.
We'll email your early-access invite soon. Welcome to BenchEdge.