
ONNX Runtime is a high-performance inference and training engine for machine learning models in the Open Neural Network Exchange (ONNX) format. Developed by Microsoft and released under the MIT license, it enables developers to deploy models trained in PyTorch, TensorFlow, scikit-learn, and other frameworks across diverse hardware targets with minimal code changes.
The runtime abstracts hardware complexity through a pluggable execution provider architecture. CPU inference uses ONNX Runtime's own optimized kernel library (MLAS), with optional execution providers for Intel oneDNN and the Arm Compute Library. GPU acceleration is available via CUDA and TensorRT on NVIDIA hardware and via DirectML on Windows. Edge and mobile deployments leverage CoreML on Apple devices, NNAPI and the Qualcomm QNN SDK on Android, and WebAssembly and WebGPU for browser-based inference through ONNX Runtime Web.
ONNX Runtime powers production workloads at scale for companies including Adobe, Autodesk, Hugging Face, NVIDIA, Oracle, and Teradata. It is the default inference engine for ONNX models in Azure Machine Learning and integrates with the Olive model optimization toolchain for quantization, compression, and hardware-specific tuning.
ONNX Runtime natively supports models exported from PyTorch via the ONNX format. PyTorch models can be converted to ONNX using torch.onnx.export() and then optimized and deployed through ONNX Runtime for production inference.
ONNX Runtime supports TensorFlow models converted to ONNX format through the tf2onnx converter. This enables TensorFlow-trained models to leverage ONNX Runtime's hardware acceleration and cross-platform deployment capabilities.