Model Management

Endpoints for loading, downloading, and unloading models on the node.


Load Model

POST /v1/app/models/load

Load a downloaded model into GPU memory for inference.

Authentication

Optional. Required when allow_network_access is enabled.
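
The source does not say how credentials are sent when allow_network_access is enabled, so the following is only a sketch of a request builder that attaches a caller-supplied header when one is given. The function name build_request, the auth_header parameter, and the header name and value used in the comment are all hypothetical.

```python
import json
import urllib.request
from typing import Optional, Tuple


def build_request(
    url: str,
    payload: dict,
    auth_header: Optional[Tuple[str, str]] = None,
) -> urllib.request.Request:
    """Build a JSON POST request, adding a credential header only if one is given."""
    req = urllib.request.Request(
        url=url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    if auth_header is not None:
        # Assumption: the header name and value format are not documented in
        # the source; the caller supplies whatever the node actually expects,
        # e.g. ("Authorization", "Bearer <token>") as a placeholder.
        name, value = auth_header
        req.add_header(name, value)
    return req
```

With auth_header left as None the request carries no credentials, which matches the default case where authentication is optional.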

Request Body

Field    Type    Required  Description
modelID  string  Yes       ID of the model to load

{
  "modelID": "llama-3.1-8b-q4"
}

Response

{
  "status": "loaded",
  "modelID": "llama-3.1-8b-q4"
}

Example

curl -X POST http://localhost:11435/v1/app/models/load \
  -H "Content-Type: application/json" \
  -d '{"modelID": "llama-3.1-8b-q4"}'
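
The same call can be sketched in Python using only the standard library. The host, port, path, and body come from the curl example; the function names build_load_request and load_model are illustrative and not part of the API.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11435"  # host and port taken from the curl example


def build_load_request(model_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request for /v1/app/models/load."""
    body = json.dumps({"modelID": model_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/app/models/load",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def load_model(model_id: str, base_url: str = BASE_URL) -> dict:
    """Send the load request and return the decoded JSON response."""
    with urllib.request.urlopen(build_load_request(model_id, base_url)) as resp:
        return json.load(resp)
```

Calling load_model("llama-3.1-8b-q4") against a running node should return a dictionary shaped like the response shown above. Building the request separately from sending it keeps the payload easy to inspect without a live node.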

Download Model

POST /v1/app/models/download

Download a model to local storage. A model must be downloaded before it can be loaded with the Load Model endpoint.

Authentication

Optional. Required when allow_network_access is enabled.

Request Body

Field    Type    Required  Description
modelID  string  Yes       ID of the model to download

{
  "modelID": "llama-3.1-8b-q4"
}

Response

{
  "status": "downloading",
  "modelID": "llama-3.1-8b-q4"
}

Example

curl -X POST http://localhost:11435/v1/app/models/download \
  -H "Content-Type: application/json" \
  -d '{"modelID": "llama-3.1-8b-q4"}'
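
A minimal Python sketch of starting a download and checking the reply against the documented response shape follows. The names post_json and start_download are illustrative. The "downloading" status suggests the call returns before the transfer finishes, but the source does not document how to detect completion, so this sketch only confirms that the download was accepted.

```python
import json
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:11435"  # host and port taken from the curl example


def post_json(path: str, payload: dict) -> dict:
    """POST a JSON payload to the node and decode the JSON reply."""
    req = urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def start_download(
    model_id: str,
    send: Callable[[str, dict], dict] = post_json,
) -> bool:
    """Request a download; return True if the node reports it has started.

    `send` is injectable so the check can be exercised without a running node.
    """
    reply = send("/v1/app/models/download", {"modelID": model_id})
    # The documented response echoes the model ID with status "downloading".
    return reply.get("status") == "downloading" and reply.get("modelID") == model_id
```

A caller would run start_download("llama-3.1-8b-q4") first and only attempt the load call once the model is fully on disk.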

Unload Model

POST /v1/app/models/unload

Unload the currently loaded model from GPU memory.

Authentication

Optional. Required when allow_network_access is enabled.

Request Body

None required.

Response

{
  "status": "unloaded"
}

Example

curl -X POST http://localhost:11435/v1/app/models/unload
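
For comparison, a Python sketch of the unload call is given below. Because the endpoint takes no request body, the request is built with no data and the method is set to POST explicitly. The function names build_unload_request and unload_model are illustrative, not part of the API.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11435"  # host and port taken from the curl example


def build_unload_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the body-less POST request for /v1/app/models/unload."""
    # Without method="POST", urllib would default to GET when data is None.
    return urllib.request.Request(url=f"{base_url}/v1/app/models/unload", method="POST")


def unload_model(base_url: str = BASE_URL) -> bool:
    """Send the unload request; return True if the node reports "unloaded"."""
    with urllib.request.urlopen(build_unload_request(base_url)) as resp:
        reply = json.load(resp)
    return reply.get("status") == "unloaded"
```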