InferenceJob

An InferenceJob serves as the API endpoint for a specific Model. When you create an InferenceJob, you must provide the model_id. This captures a snapshot of the current Model configuration, including parameters such as model_code_repo_url, model_code_version, model_required_gpu_memory, and model_entry_point_function.

You also have the option to set the visibility of the InferenceJob to either private or public, determining whether it can be invoked by you alone or by anyone.

If you don't have a dedicated set of workers, simply set job_assignment_type to "auto". You can also specify the minimum, maximum, and initial desired number of workers, as in the sketch below. Our system will then automatically allocate the most suitable backend workers to fulfill the job's requirements.
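For example, the worker-assignment portion of a creation request might look like the following sketch. The POST /v1/inference_jobs path and the Authorization header are assumptions inferred from the run endpoints shown later on this page, not confirmed details:

```bash
# Sketch: auto worker assignment in an InferenceJob creation request.
# The endpoint path and auth header are assumptions.
curl -X POST https://api.clustro.ai/v1/inference_jobs \
  -H "Authorization: Bearer $CLUSTRO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "AutoAssignedJob1",
    "model_id": "<your_model_id>",
    "job_assignment_type": "auto",
    "min_workers": 1,
    "max_workers": 5,
    "desired_workers": 2
  }'
```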


Once you create a job, ClustroAI first uses its managed workers to take it on for initial deployment, usability, and security verification. After the job passes these checks, it is distributed to the general worker fleet, letting you take advantage of our low-cost inference API service.

You can run an InferenceJob in two ways: synchronously (sync) and asynchronously (async).

  • Synchronous task: In our previous example with SD-XL, a synchronous task was invoked using run_sync. The call is blocking and waits for the task to complete before returning the result.

    curl -X POST https://api.clustro.ai/v1/inference_jobs/5ee9fb5a-3cfc-47b4-abc3-aa8411b41b21/run_sync

  • Asynchronous task: Alternatively, you can call invoke, which returns a task_id immediately. You can then use this ID to check the status of the task (see the polling sketch after this list).

    curl -X POST https://api.clustro.ai/v1/inference_jobs/5ee9fb5a-3cfc-47b4-abc3-aa8411b41b21/run
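A minimal sketch of the asynchronous flow follows. It assumes the run response is JSON containing a task_id, that task status can be fetched from a /v1/tasks/<task_id> endpoint, and that "completed" is the terminal status value; all three, along with the Authorization header, are assumptions rather than confirmed API details:

```bash
# Submit the task asynchronously; the response is assumed to contain a task_id.
TASK_ID=$(curl -s -X POST \
  -H "Authorization: Bearer $CLUSTRO_API_KEY" \
  https://api.clustro.ai/v1/inference_jobs/5ee9fb5a-3cfc-47b4-abc3-aa8411b41b21/run \
  | jq -r '.task_id')

# Poll the (assumed) task-status endpoint until the task finishes.
# The "completed" status value is also an assumption.
while true; do
  STATUS=$(curl -s -H "Authorization: Bearer $CLUSTRO_API_KEY" \
    "https://api.clustro.ai/v1/tasks/$TASK_ID" | jq -r '.status')
  echo "status: $STATUS"
  [ "$STATUS" = "completed" ] && break
  sleep 5
done
```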

When creating an InferenceJob, you can set it as the default for the provided model by setting the parameter set_as_model_default to true. With this setting applied, you can then run tasks directly using the model ID or model name.
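With set_as_model_default enabled, a run call addressed by model might look like the sketch below. The model-scoped path is our reading of the behavior described above, not a confirmed endpoint:

```bash
# Sketch: running a task via the model's default InferenceJob.
# The /v1/models/<model_id_or_name>/run_sync path is an assumption.
curl -X POST https://api.clustro.ai/v1/models/<model_id_or_name>/run_sync
```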

All tasks are queued and processed by workers in sequential order.

When creating an InferenceJob via our console or API, the following parameters are available:

Required Parameters:

  • name: The name of the InferenceJob. It may only contain uppercase letters, lowercase letters, and numbers.

  • model_id: The ID of the Model whose configuration will be snapshotted.

Optional Parameters:

  • description: A description of the job, limited to 1000 words.

  • min_workers: The minimum number of workers.

  • max_workers: The maximum number of workers.

  • desired_workers: The initial desired number of workers.

  • set_as_model_default: Set the current InferenceJob as the default for the provided model.
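Putting the parameters together, a full creation request might look like the sketch below. The JSON fields are the parameters documented in this section; the POST /v1/inference_jobs path and the Authorization header are assumptions based on the run endpoints shown above:

```bash
# Sketch of creating an InferenceJob with all documented parameters.
# The endpoint path and auth header are assumptions.
curl -X POST https://api.clustro.ai/v1/inference_jobs \
  -H "Authorization: Bearer $CLUSTRO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "SDXLJob1",
    "model_id": "<your_model_id>",
    "description": "Stable Diffusion XL inference job",
    "min_workers": 1,
    "max_workers": 3,
    "desired_workers": 1,
    "set_as_model_default": true
  }'
```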
