VisionCardConfig

Vision configuration attached to a model card for VLM support.

Populated from the [vision] section of a TOML model card or auto-detected from config.json during card creation.

imageTokenId object

Token id the model uses as the image placeholder in the prompt. Required by the MLX vision path, which splices image embeddings at this token. None is allowed for a llama.cpp-only vision GGUF, whose chat handler inserts image features itself and never reads this field. MLX cards always set it (from config.json).

anyOf
integer
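The requirement above can be expressed as a small check. This is a minimal sketch: the backend labels "mlx" and "llama.cpp" and the function name are hypothetical, chosen only to mirror the rule that the MLX path needs the id while a llama.cpp-only card may leave it unset.

```python
from typing import Optional


def check_image_token_id(image_token_id: Optional[int], backend: str) -> None:
    """Raise if the MLX path would be missing its placeholder token id.

    `backend` is a hypothetical label ("mlx" or "llama.cpp") for this sketch.
    """
    if backend == "mlx" and image_token_id is None:
        raise ValueError("MLX vision cards must set imageTokenId")
    # For llama.cpp the field is never read, so None is accepted.
```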
modelType (string)

Vision model-type tag (from config.json's vision_config), selecting the image processor (MLX) or chat handler (llama.cpp). Empty when a bare GGUF repo only signals vision via its mmproj projector; the llama.cpp runner then falls back to its general multimodal handler.

Default value: "" (empty string)
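The fallback for an empty modelType might look like the sketch below. The handler registry, the handler names, and the choice to raise on an unknown non-empty tag are all assumptions for illustration; only the "empty tag selects the general multimodal handler" behavior comes from the description above.

```python
def pick_handler(model_type: str, handlers: dict[str, str],
                 general_handler: str = "general-multimodal") -> str:
    """Select a handler name from a vision model-type tag.

    `handlers` is a hypothetical registry mapping tags to handler names.
    """
    if not model_type:
        # Bare GGUF repo that only signals vision via its mmproj projector.
        return general_handler
    # Assumption: an unknown, non-empty tag is an error (KeyError).
    return handlers[model_type]
```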
weightsRepo (string)

Repo holding the vision-tower weights when separate from the LM; empty if bundled with the main weights.

Default value: "" (empty string)
imageToken object

The literal image placeholder string, when distinct from image_token_id.

anyOf
string
processorRepo object

Repo providing the image processor/preprocessor config, if not the main repo.

anyOf
string
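Both weightsRepo and processorRepo describe an optional override of the main repo. A minimal sketch of that resolution follows; main_repo and resolve_repos are hypothetical names, and treating an unset processorRepo as "use the main repo" is read from the field descriptions rather than from the implementation.

```python
from typing import Optional


def resolve_repos(main_repo: str, weights_repo: str = "",
                  processor_repo: Optional[str] = None) -> tuple[str, str]:
    """Return (vision_weights_repo, processor_repo) after applying fallbacks."""
    # Empty weightsRepo means the vision tower is bundled with the main weights.
    vision_weights = weights_repo or main_repo
    # Unset processorRepo means the main repo provides the preprocessor config.
    processor = processor_repo or main_repo
    return vision_weights, processor
```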
boiTokenId object

Begin-of-image token id, for families that bracket image spans.

anyOf
integer
eoiTokenId object

End-of-image token id, for families that bracket image spans.

anyOf
integer
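To show what "bracketing an image span" could mean in token terms, the sketch below builds a placeholder run and wraps it in the begin and end ids when they are set. Repeating the placeholder n_image_tokens times is an assumption of this example (families differ), and image_span is a hypothetical helper.

```python
from typing import Optional


def image_span(image_token_id: int, n_image_tokens: int,
               boi_token_id: Optional[int] = None,
               eoi_token_id: Optional[int] = None) -> list[int]:
    """Build the token ids for one image, bracketed when boi/eoi are set."""
    span = [image_token_id] * n_image_tokens
    if boi_token_id is not None:
        span.insert(0, boi_token_id)
    if eoi_token_id is not None:
        span.append(eoi_token_id)
    return span
```

For example, image_span(9, 3, boi_token_id=1, eoi_token_id=2) gives [1, 9, 9, 9, 2], while a family without bracket tokens gets only the placeholder run.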
VisionCardConfig
{
  "imageTokenId": 0,
  "modelType": "",
  "weightsRepo": "",
  "imageToken": "string",
  "processorRepo": "string",
  "boiTokenId": 0,
  "eoiTokenId": 0
}
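A client consuming this payload could model it with a small dataclass, as sketched below. The class mirrors the field names and types listed above; using None as the default for the nullable fields, and the parse_vision_config helper itself, are assumptions of this sketch rather than part of the documented API.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class VisionCardConfig:
    # Field names follow the JSON schema; None defaults are assumed.
    imageTokenId: Optional[int] = None
    modelType: str = ""
    weightsRepo: str = ""
    imageToken: Optional[str] = None
    processorRepo: Optional[str] = None
    boiTokenId: Optional[int] = None
    eoiTokenId: Optional[int] = None


def parse_vision_config(payload: str) -> VisionCardConfig:
    """Parse a JSON object into a VisionCardConfig (unknown keys raise TypeError)."""
    return VisionCardConfig(**json.loads(payload))
```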