Limits and Limitations

This reference lists the limits and limitations that apply to ModelZ.

Please contact us on Discord if you have any questions about the current limits.

General limits

Each deployment can have at most 5 replicas.

Please contact us on Discord or via email if you need more.


Startup time

The maximum startup time is 600 seconds. Inference fails if the deployment takes longer than 600 seconds to start.
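
As an illustration of working within the startup limit, the sketch below polls a readiness check until it succeeds or 600 seconds have elapsed. It is a minimal sketch, not a ModelZ API: `wait_until_ready`, `probe`, and the 5-second poll interval are hypothetical names and values chosen for the example, and `probe` stands for whatever readiness check you supply yourself.

```python
import time
from typing import Callable

MAX_STARTUP_SECONDS = 600  # startup limit stated in this reference


def wait_until_ready(
    probe: Callable[[], bool],
    deadline_seconds: float = MAX_STARTUP_SECONDS,
    poll_interval: float = 5.0,  # assumption: any reasonable interval works
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it returns True or the deadline passes.

    `probe` is a hypothetical caller-supplied readiness check.
    Returns True if it succeeded in time, False otherwise.
    """
    start = clock()
    while True:
        if probe():
            return True
        if clock() - start >= deadline_seconds:
            return False
        sleep(poll_interval)
```

The `clock` and `sleep` parameters exist only so the loop can be exercised in tests without actually waiting.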

Inference time

The maximum inference time per request is 60 seconds.

Request body size

The maximum request body size is 5 MB.
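
A client can check the inference-time and body-size limits before sending a request. The following is a minimal sketch under stated assumptions: the endpoint URL is supplied by the caller, the JSON payload format is illustrative, `encode_body` and `send` are hypothetical helper names, and 5 MB is read as 5,000,000 bytes (the stricter of the two common interpretations, since this reference does not say which applies).

```python
import json
import urllib.request

MAX_BODY_BYTES = 5 * 1000 * 1000  # assumption: decimal megabytes (stricter reading)
MAX_INFERENCE_SECONDS = 60        # per-request inference limit stated above


def encode_body(payload: dict) -> bytes:
    """Serialize the payload and reject it locally if it exceeds the size limit."""
    body = json.dumps(payload).encode("utf-8")
    if len(body) > MAX_BODY_BYTES:
        raise ValueError(
            f"request body is {len(body)} bytes; the limit is {MAX_BODY_BYTES}"
        )
    return body


def send(url: str, payload: dict, opener=urllib.request.urlopen) -> bytes:
    """POST the payload to `url` (a caller-supplied, hypothetical endpoint).

    The timeout applies to blocking socket operations, so it approximates
    the 60-second limit on the client side rather than enforcing it exactly.
    """
    request = urllib.request.Request(
        url,
        data=encode_body(payload),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(request, timeout=MAX_INFERENCE_SECONDS) as response:
        return response.read()
```

Failing fast on an oversized body avoids uploading data that would be rejected anyway.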



Images

Currently, we only support public images. Private images are not supported yet.


Hugging Face models

If you deploy from the Hugging Face Hub, only public models are supported.


Logs

At the moment, only the most recent 30 minutes of logs are displayed. We are working on extending this to a longer history.