Inference server frameworks are the ultimate wingmen for machine learning models: they provide the infrastructure models need to shine in production, and they handle load balancing, caching, and dynamic batching, so your models can focus on inference while the server takes care of the workload.
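To illustrate the batching idea, here is a toy sketch in plain Python (not code from any of these frameworks): pending requests are grouped into a single batch so the model runs once per batch instead of once per request. The queue contents and `max_batch_size` are made up for the example.

```python
from collections import deque

def batch_requests(queue: deque, max_batch_size: int) -> list:
    """Drain up to max_batch_size pending requests into one batch
    so the model can process them in a single forward pass."""
    batch = []
    while queue and len(batch) < max_batch_size:
        batch.append(queue.popleft())
    return batch

# Five queued requests, batched four at a time.
pending = deque({"id": i, "input": [float(i)]} for i in range(5))
batch = batch_requests(pending, max_batch_size=4)  # 4 requests; 1 left queued
```

Real servers add a time budget as well (flush a partial batch after a few milliseconds), trading a little latency for much higher throughput.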

We currently have open-sourced inference server frameworks available for two kinds of use cases:

For high-performance and flexible model serving in a production environment, we have Mosec. It's well suited to building ML model-enabled backends and microservices that can handle large volumes of requests and scale effortlessly.

For quick and simple prototyping of ML model-enabled web UI apps, we have Gradio and Streamlit. Using these frameworks, you can build and deploy your machine learning models in a web application in minutes and easily share it with other users.

You can also use other frameworks to deploy your models. All of these frameworks integrate with popular machine learning libraries such as PyTorch, TensorFlow, and JAX, and they support a wide range of model formats, including ONNX, TensorFlow SavedModel, and PyTorch TorchScript.