Before creating a Deployment, you need to build the inference server image and push it to an OCI-compliant registry (e.g. Docker Hub or GitHub Container Registry).

Alternatively, you can use our pre-built Templates directly, which simplify and speed up the deployment process.


Currently we support four inference server frameworks:

  • Mosec: a high-performance and flexible model serving framework for building ML model-enabled backends and microservices.
  • Streamlit: a framework for building ML model-enabled web apps.
  • Gradio: a simple and flexible framework for building ML model-enabled web apps.
  • Other: you can also use your own framework to deploy your models.

Here we take Mosec as an example to show how to build a Docker image for Stable Diffusion.

Building an inference server with our inference framework Mosec is straightforward. You need to provide three key components:

  • A Python server file: contains the code for making predictions.
  • A requirements.txt file: lists all the dependencies required for the server code to run.
  • A Dockerfile or a simpler build.envd: contains instructions for building a Docker image that packages the server code and its dependencies.

The modelz-template-stable-diffusion template is used as the example here.

In the server file, you need to define a class that inherits from mosec.Worker and implements the forward method. With dynamic batching enabled, the forward method takes a list of inputs and returns a list of outputs, one per input. You can find more details on the Mosec page.

from io import BytesIO
from typing import List

import torch  # type: ignore
from diffusers import StableDiffusionPipeline  # type: ignore
from mosec import Server, Worker, get_logger
from mosec.mixin import MsgpackMixin

logger = get_logger()


class StableDiffusion(MsgpackMixin, Worker):
    def __init__(self):
        # Load the model once per worker process.
        self.pipe = StableDiffusionPipeline.from_pretrained(
            "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
        )
        device = "cuda" if torch.cuda.is_available() else "cpu"
        self.pipe = self.pipe.to(device)
        self.example = ["useless example prompt"] * 4  # warmup (bs=4)

    def forward(self, data: List[str]) -> List[memoryview]:
        # `data` is a batch of prompts; return one encoded image per prompt.
        logger.debug("generate images for %s", data)
        res = self.pipe(data)
        logger.debug("NSFW: %s", res[1])
        images = []
        for img in res[0]:
            dummy_file = BytesIO()
            img.save(dummy_file, format="JPEG")
            images.append(dummy_file.getbuffer())
        return images


if __name__ == "__main__":
    server = Server()
    server.append_worker(StableDiffusion, num=1, max_batch_size=4)
    server.run()
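
The batching contract of forward can be illustrated without Mosec or a GPU. The following is a minimal sketch, not Mosec's implementation: run_in_batches and fake_forward are hypothetical names introduced only for this illustration, and the real framework performs the grouping for you. It shows the one rule your forward method has to respect: it receives up to max_batch_size inputs and must return exactly one output per input, in the same order.

```python
from typing import Callable, List


def run_in_batches(
    forward: Callable[[List[str]], List[bytes]],
    requests: List[str],
    max_batch_size: int = 4,
) -> List[bytes]:
    """Call `forward` on chunks of at most `max_batch_size` requests.

    Hypothetical helper for illustration only; with Mosec you never
    write this loop yourself.
    """
    results: List[bytes] = []
    for start in range(0, len(requests), max_batch_size):
        batch = requests[start : start + max_batch_size]
        outputs = forward(batch)
        # The contract: one output per input, in the same order.
        if len(outputs) != len(batch):
            raise ValueError("forward must return one output per input")
        results.extend(outputs)
    return results


def fake_forward(prompts: List[str]) -> List[bytes]:
    """Stand-in for StableDiffusion.forward that returns fake image bytes."""
    return [f"image for {p}".encode() for p in prompts]


# Six requests with max_batch_size=4 are served as one batch of 4 and one of 2.
outputs = run_in_batches(fake_forward, [f"prompt {i}" for i in range(6)])
print(len(outputs))
```

Because each caller only ever sees its own output, a forward method that drops or reorders items would silently return the wrong image to a request, which is why the length check is worth keeping in mind.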


In the requirements.txt file, you need to list all the dependencies required for the server code to run.

torch --extra-index-url
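
A common build failure is a module that the server file imports but requirements.txt does not list. The sketch below is one possible sanity check, assuming that import names match package names (which does not hold for every package, e.g. PIL is provided by Pillow). The function names, the sample source string, and the sample requirement lines are all hypothetical and chosen only for illustration.

```python
import ast
from typing import Iterable, Set


def imported_top_level_modules(source: str) -> Set[str]:
    """Return the top-level module names imported by a Python source string."""
    modules: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    return modules


def required_package_names(requirements: Iterable[str]) -> Set[str]:
    """Extract lowercase project names from requirements.txt-style lines."""
    names: Set[str] = set()
    for line in requirements:
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and option-only lines
            continue
        # Cut at the first extras, version, marker, or option separator.
        for sep in ("[", "=", "<", ">", "!", "~", ";", " "):
            line = line.split(sep, 1)[0]
        names.add(line.lower())
    return names


# Hypothetical inputs: a shortened server file and an incomplete requirements list.
server_source = (
    "from io import BytesIO\n"
    "import torch\n"
    "from diffusers import StableDiffusionPipeline\n"
    "from mosec import Server\n"
)
requirements = ["torch", "diffusers"]
stdlib = {"io", "typing"}  # standard-library modules need no requirement entry

required = required_package_names(requirements)
missing = {
    m for m in imported_top_level_modules(server_source)
    if m not in stdlib and m.lower() not in required
}
print(sorted(missing))  # ['mosec']
```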


In the Dockerfile, you need to define the instructions for building a Docker image that packages the server code and its dependencies.

In most cases, you can use the following template:

ARG base=nvidia/cuda:11.6.2-cudnn8-devel-ubuntu20.04
FROM ${base}
ENV PATH /opt/conda/bin:$PATH
ARG CONDA_VERSION=py310_22.11.1-1
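# Install wget, which the Miniconda download below relies on.
RUN apt-get update && \
    apt-get install -y --no-install-recommends wget ca-certificates && \
    rm -rf /var/lib/apt/lists/*
# Download Miniconda for the current architecture, verify its checksum,
# install it to /opt/conda, and clean up files that are not needed at runtime.
RUN set -x && \
    UNAME_M="$(uname -m)" && \
    if [ "${UNAME_M}" = "x86_64" ]; then \
        MINICONDA_URL="https://repo.anaconda.com/miniconda/Miniconda3-${CONDA_VERSION}-Linux-x86_64.sh"; \
        SHA256SUM="00938c3534750a0e4069499baf8f4e6dc1c2e471c86a59caa0dd03f4a9269db6"; \
    elif [ "${UNAME_M}" = "s390x" ]; then \
        MINICONDA_URL="https://repo.anaconda.com/miniconda/Miniconda3-${CONDA_VERSION}-Linux-s390x.sh"; \
        SHA256SUM="a150511e7fd19d07b770f278fb5dd2df4bc24a8f55f06d6274774f209a36c766"; \
    elif [ "${UNAME_M}" = "aarch64" ]; then \
        MINICONDA_URL="https://repo.anaconda.com/miniconda/Miniconda3-${CONDA_VERSION}-Linux-aarch64.sh"; \
        SHA256SUM="48a96df9ff56f7421b6dd7f9f71d548023847ba918c3826059918c08326c2017"; \
    elif [ "${UNAME_M}" = "ppc64le" ]; then \
        MINICONDA_URL="https://repo.anaconda.com/miniconda/Miniconda3-${CONDA_VERSION}-Linux-ppc64le.sh"; \
        SHA256SUM="4c86c3383bb27b44f7059336c3a46c34922df42824577b93eadecefbf7423836"; \
    fi && \
    wget "${MINICONDA_URL}" -O miniconda.sh -q && \
    echo "${SHA256SUM} miniconda.sh" > shasum && \
    if [ "${CONDA_VERSION}" != "latest" ]; then sha256sum --check --status shasum; fi && \
    mkdir -p /opt && \
    bash miniconda.sh -b -p /opt/conda && \
    rm miniconda.sh shasum && \
    ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
    echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
    echo "conda activate base" >> ~/.bashrc && \
    find /opt/conda/ -follow -type f -name '*.a' -delete && \
    find /opt/conda/ -follow -type f -name '*.js.map' -delete && \
    /opt/conda/bin/conda clean -afy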
RUN conda create -y -n envd python=3.9
ENV ENVD_PREFIX=/opt/conda/envs/envd/bin
RUN update-alternatives --install /usr/bin/python python ${ENVD_PREFIX}/python 1 && \
    update-alternatives --install /usr/bin/python3 python3 ${ENVD_PREFIX}/python3 1 && \
    update-alternatives --install /usr/bin/pip pip ${ENVD_PREFIX}/pip 1 && \
    update-alternatives --install /usr/bin/pip3 pip3 ${ENVD_PREFIX}/pip3 1
COPY requirements.txt /
RUN pip install -r requirements.txt
RUN mkdir -p /workspace
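# Replace <your-server-file>.py with the name of your Mosec server script.
COPY <your-server-file>.py workspace/
WORKDIR workspace
RUN python <your-server-file>.py --dry-run
ENTRYPOINT [ "python", "<your-server-file>.py" ]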
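
The template verifies the Miniconda installer against a pinned SHA-256 checksum before running it, which guards against a corrupted or tampered download. The same idea, sketched in Python for readers who want to reuse it for other downloaded artifacts such as model weights; sha256_matches is a hypothetical helper name, and the temporary file stands in for a real download.

```python
import hashlib
import os
import tempfile


def sha256_matches(path: str, expected_hex: str) -> bool:
    """Return True if the file's SHA-256 digest equals `expected_hex`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()


# Demonstration with a temporary file in place of a downloaded installer.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"installer bytes")
demo_path = tmp.name

expected = hashlib.sha256(b"installer bytes").hexdigest()
ok = sha256_matches(demo_path, expected)   # checksum of the real content
bad = sha256_matches(demo_path, "0" * 64)  # deliberately wrong checksum
os.remove(demo_path)
print(ok, bad)
```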


A build.envd, on the other hand, is a simplified alternative to a Dockerfile. It provides a Python-based interface for configuring how the image is built.

It is easier to use than a Dockerfile because you specify only the dependencies of your machine learning model, not the instructions for installing CUDA, Python, and other system-level dependencies.

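# syntax=v1


def basic():
    # Assumed setup mirroring the Dockerfile: CUDA 11.6.2 with cuDNN 8,
    # Python 3.9, and the packages listed in requirements.txt.
    install.cuda(version="11.6.2", cudnn="8")
    install.python(version="3.9")
    install.python_packages(requirements="requirements.txt")


def build():
    # Assumed: start from a non-development base image and apply basic().
    base(dev=False)
    basic()
    # Replace <your-server-file>.py with the name of your Mosec server script.
    io.copy("<your-server-file>.py", "/")
    run(["python <your-server-file>.py --dry-run"])
    config.entrypoint(["python", "<your-server-file>.py"])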

Pushing the image to the registry

After building the image, you can push it to the registry.

docker push <your-image-name>:<your-image-tag>
# or
envd build --output type=image,name=<your-image-name>:<your-image-tag>,push=true
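
If you script the build and push, for example in CI, it can help to assemble the commands in one place so that the image name and tag stay consistent. The following is a minimal sketch under stated assumptions: the function names and the example image name are hypothetical, and the docker build step assumes a Dockerfile in the build context directory. It only constructs the argument lists; running them is left to subprocess.run or your CI system.

```python
import shlex
from typing import List


def image_ref(name: str, tag: str) -> str:
    """Join an image name and tag into the `name:tag` form."""
    if not name or not tag:
        raise ValueError("both image name and tag are required")
    return f"{name}:{tag}"


def docker_commands(name: str, tag: str, context: str = ".") -> List[List[str]]:
    """Argument lists for building with Docker and then pushing."""
    ref = image_ref(name, tag)
    return [
        ["docker", "build", "-t", ref, context],
        ["docker", "push", ref],
    ]


def envd_command(name: str, tag: str) -> List[str]:
    """Argument list for building and pushing in one step with envd."""
    output = f"type=image,name={image_ref(name, tag)},push=true"
    return ["envd", "build", "--output", output]


# Hypothetical image name and tag, used only for illustration.
for cmd in docker_commands("docker.io/example-user/stable-diffusion", "v1"):
    print(shlex.join(cmd))
print(shlex.join(envd_command("docker.io/example-user/stable-diffusion", "v1")))
```

Passing each list to subprocess.run(cmd, check=True) stops the script at the first failing step, so a failed build is never followed by a push.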

You can then deploy the inference server to ModelZ.