In our system, each "Model" points to an executable code repository (repo) and a specific version, identified by the repo's commit SHA. Users also need to specify the GPU memory required to execute this Model. Additionally, they have the option to make the Model publicly visible to all users.
When creating a Model, users are required to fill in an "entry_point_function" field. This specifies a filename and function name within the repo, outlining how the Model will be run when users call the API for execution.
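As a sketch, here is what a minimal entry-point module might look like. The file and function names match the default entry point (model_invoke.py/invoke); the body is a placeholder, not a real model:

```python
# model_invoke.py -- minimal sketch of an entry-point module.
# The module name and function name follow the default
# entry_point_function value, model_invoke.py/invoke.

def invoke(input_text: str) -> str:
    # A real Model would load its weights once at import time and run
    # inference here; this placeholder just echoes the prompt.
    return f"Generated text for prompt: {input_text}"
```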
This model normally takes around 2 GB of GPU memory to run, so we set required_gpu_memory to 5G. The job can then be distributed to low-GPU workers, such as an NVIDIA RTX 3060, for lower cost.
When calling the invoke API, users can pass in the input data. If the input is more than just a text prompt, it can be defined as stringified JSON. In this example, the code can take either a plain text prompt or a JSON payload containing an image URL; which path is taken depends on whether the input_text can be parsed as JSON. This is entirely up to the user.
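A sketch of this flexible parsing inside invoke() (the "prompt" and "image_url" field names are illustrative, not mandated by the API):

```python
import json

def invoke(input_text: str) -> str:
    """Accept either a plain text prompt or a stringified JSON payload."""
    try:
        payload = json.loads(input_text)
        prompt = payload.get("prompt", "")
        image_url = payload.get("image_url")  # hypothetical field name
    except (json.JSONDecodeError, AttributeError):
        # Input is not a JSON object -- treat the whole string as the prompt.
        prompt, image_url = input_text, None
    return f"prompt={prompt!r}, image_url={image_url!r}"
```

Passing `'{"prompt": "a cat", "image_url": "https://…/cat.png"}'` takes the JSON path, while a bare string like `"a cat"` falls back to being used as the prompt directly.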
Example 2 - stable-diffusion-xl-1-0
In this example of stable-diffusion-xl-1.0, the model type is defined in our API as "text_to_image." Unlike the "text_to_text" model type, which is designed for generating textual content, "text_to_image" is tailored for models that produce large content, such as images.
For "text_to_text" models, the invoke() function in the code repository directly returns the generated text. For "text_to_image" models, however, the invoke() function should save the generated image to a local file and return the file name.
Once the file name is returned, our service agent will automatically upload the image from the worker machine to our Content Delivery Network (CDN). The URL of the generated content is then returned to the user.
When creating a new Model via our console or API, here are the parameters:
Required Parameters:
name: Model name; it may only contain uppercase and lowercase letters and digits
model_type: One of text_to_text, text_to_image, or text_to_blob
model_code_repo_url: Model repo URL
model_code_version: Commit SHA to use
Optional Parameters:
entry_point_function: The filename and function to execute (default is model_invoke.py/invoke)
runtime_docker_image: The runtime Docker environment (only nvidia/cuda:11.6.2-runtime-ubuntu20.04 is supported currently)
example_input: An example input for the invoke function, helpful for understanding what parameters to pass, especially for public models.
description: Description of the model within 1000 words.
visibility: Either public or private. If public, the model will be visible to all users and others can create InferenceJobs for it. Otherwise, only the owner can create InferenceJobs for it.
required_gpu_memory: Maximum GPU memory required. It is advisable to set this below 24GB, as most workers are consumer-grade machines.
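The parameters above might be combined into a create-Model request along these lines. The endpoint URL and the lack of an auth header are placeholders (check the console/API docs for the real base URL and authentication scheme), and the required_gpu_memory value is illustrative:

```python
import json
import urllib.request

# Hypothetical endpoint -- replace with the real API base URL.
API_URL = "https://api.example.com/v1/models"

payload = {
    # Required parameters
    "name": "falcon7bInstruct",  # letters and digits only
    "model_type": "text_to_text",
    "model_code_repo_url": "https://github.com/ClustroAI/falcon7b-instruct",
    "model_code_version": "842b0f5934f7cba93405eec1e429bed1e5f2fbb5",
    # Optional parameters
    "entry_point_function": "model_invoke.py/invoke",
    "required_gpu_memory": "16G",  # illustrative value
    "visibility": "private",
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(request)  # uncomment once real credentials are set
```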
{
  "model_code_repo_url": "https://github.com/ClustroAI/falcon7b-instruct",
  "created_at": "Tue, 18 Jul 2023 04:57:25 GMT",
  "description": "Falcon-7B-Instruct is a 7B parameters causal decoder-only model built by TII based on Falcon-7B and finetuned on a mixture of chat/instruct datasets. It is made available under the Apache 2.0 license. Reference - https://huggingface.co/tiiuae/falcon-7b-instruct",
  "example_input": "{\"input\": \"{\\\"prompt\\\": \\\"This is an essay about the universe\\\", \\\"max_length\\\": \\\"100\\\", \\\"do_sample\\\": \\\"False\\\"}\"}",
  "id": "e6c9ae22-a14c-4ed6-a577-c4000e1b4580",
  "entry_point_function": "model_invoke.py/invoke",
  "model_type": "text_to_text",
  "name": "falcon7b-instruct",
  "runtime_docker_image": "nvidia/cuda:11.6.2-runtime-ubuntu20.04",
  "updated_at": "Tue, 29 Aug 2023 05:00:02 GMT",
  "user_id": "aab9ff07-3c0b-4584-b5ec-f1a5b288f6e3",
  "model_code_version": "842b0f5934f7cba93405eec1e429bed1e5f2fbb5",
  "visibility": "public",
  "default_inference_job": "4324fb1c-52b7-47da-babb-b7b1b7fe18rg",
  "username": "clustrodemousername",
  "model_image_url": "https://cdn.clustro.ai/static/blip-large.png"
}
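Note that the example_input field above is doubly stringified: the outer string is a JSON object whose "input" field is itself a JSON-encoded string. A quick sketch of unpacking it:

```python
import json

# example_input exactly as it appears in the Model record above.
example_input = (
    '{"input": "{\\"prompt\\": \\"This is an essay about the universe\\", '
    '\\"max_length\\": \\"100\\", \\"do_sample\\": \\"False\\"}"}'
)

outer = json.loads(example_input)   # -> {"input": "{\"prompt\": ...}"}
inner = json.loads(outer["input"])  # -> {"prompt": ..., "max_length": ...}
print(inner["prompt"])              # This is an essay about the universe
```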