VisionCardConfig

Vision configuration attached to a model card for VLM support.

Populated from the [vision] section of a TOML model card or auto-detected from config.json during card creation.

imageTokenId integer or null

Token ID the model uses as the image placeholder in the prompt. Required by the MLX vision path, which splices image embeddings at this token. null is allowed for a llama.cpp-only vision GGUF, whose chat handler inserts image features itself and never reads this field. MLX cards always set it (from config.json).
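The backend-dependent requirement on imageTokenId can be sketched as a small check. This is an illustration only: the function name and the backend strings "mlx" and "llama.cpp" are hypothetical and are not taken from the actual card loader.

```python
from typing import Optional


def check_image_token_id(image_token_id: Optional[int], backend: str) -> None:
    """Raise ValueError when the MLX path is requested without an image token id.

    `backend` is a hypothetical label ("mlx" or "llama.cpp") used only in this sketch.
    """
    if backend == "mlx" and image_token_id is None:
        # MLX splices image embeddings at this token, so it cannot be missing.
        raise ValueError("imageTokenId is required for the MLX vision path")
    # llama.cpp: the chat handler inserts image features itself, so None is accepted.
```

Calling check_image_token_id(None, "llama.cpp") passes silently, while the same call with "mlx" raises.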

modelType string

Vision model-type tag (from config.json's vision_config), selecting the image processor (MLX) or chat handler (llama.cpp). Empty when a bare GGUF repo only signals vision via its mmproj projector; the llama.cpp runner then falls back to its general multimodal handler.

Default value: ""
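A minimal sketch of the fallback described for an empty modelType. The registry contents, the handler names, and the function name are placeholders invented for this example; raising KeyError on an unknown non-empty tag is a choice of the sketch, not documented behaviour.

```python
from typing import Dict

# Hypothetical registry: tags and handler names are placeholders, not real identifiers.
HANDLERS: Dict[str, str] = {"example-vlm": "example-vlm-handler"}
GENERAL_HANDLER = "general-multimodal-handler"


def pick_chat_handler(model_type: str) -> str:
    """Return the handler for a model-type tag, or the general handler when the tag is empty."""
    if not model_type:
        # Bare GGUF repo that only signals vision through its mmproj projector.
        return GENERAL_HANDLER
    # Unknown non-empty tags raise KeyError in this sketch.
    return HANDLERS[model_type]
```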

weightsRepo string

Repo holding the vision-tower weights when separate from the LM; empty if bundled with the main weights.

Default value: ""

imageToken string or null

The literal image placeholder string, when distinct from imageTokenId.

processorRepo string or null

Repo providing the image processor/preprocessor config, if not the main repo.
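Both weightsRepo and processorRepo act as optional overrides of the main repo. A consumer might resolve them as below; the helper name and the repo names in the usage note are hypothetical, and treating null and the empty string the same way is an assumption of this sketch.

```python
from typing import Optional


def resolve_repo(override: Optional[str], main_repo: str) -> str:
    """Return the override repo when one is set, otherwise the main repo.

    None and "" are both treated as "not set" in this sketch.
    """
    return override if override else main_repo
```

For example, resolve_repo("", "org/main-model") yields the main repo, while resolve_repo("org/vision-tower", "org/main-model") yields the override.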

boiTokenId integer or null

Begin-of-image token id, for families that bracket image spans.

eoiTokenId integer or null

End-of-image token id, for families that bracket image spans.
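To illustrate what bracketing an image span means, the sketch below builds the token ids for one image. The function name and the number of placeholder tokens per image are assumptions; the real count depends on the model family and is not given here.

```python
from typing import List, Optional


def build_image_span(
    image_token_id: int,
    num_placeholders: int,
    boi_token_id: Optional[int] = None,
    eoi_token_id: Optional[int] = None,
) -> List[int]:
    """Return placeholder token ids for one image, bracketed when boi/eoi ids are set."""
    span = [image_token_id] * num_placeholders
    if boi_token_id is not None:
        span.insert(0, boi_token_id)  # begin-of-image marker goes first
    if eoi_token_id is not None:
        span.append(eoi_token_id)  # end-of-image marker goes last
    return span
```

Families that do not bracket image spans leave both ids null, and the span is just the repeated placeholder.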

VisionCardConfig
{
  "imageTokenId": 0,
  "modelType": "",
  "weightsRepo": "",
  "imageToken": "string",
  "processorRepo": "string",
  "boiTokenId": 0,
  "eoiTokenId": 0
}
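As a rough illustration of consuming this object, the sketch below maps the JSON keys onto a Python dataclass. The class layout, the snake_case attribute names, and the None defaults for nullable fields are assumptions of this example, not the project's actual model definition.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class VisionCardConfig:
    # Nullable fields default to None here; that default is an assumption.
    image_token_id: Optional[int] = None
    model_type: str = ""
    weights_repo: str = ""
    image_token: Optional[str] = None
    processor_repo: Optional[str] = None
    boi_token_id: Optional[int] = None
    eoi_token_id: Optional[int] = None

    @classmethod
    def from_json_dict(cls, data: Dict[str, Any]) -> "VisionCardConfig":
        """Build a config from a dict that uses the camelCase keys shown above."""
        return cls(
            image_token_id=data.get("imageTokenId"),
            model_type=data.get("modelType", ""),
            weights_repo=data.get("weightsRepo", ""),
            image_token=data.get("imageToken"),
            processor_repo=data.get("processorRepo"),
            boi_token_id=data.get("boiTokenId"),
            eoi_token_id=data.get("eoiTokenId"),
        )


# Usage: parse a payload with only some fields present.
config = VisionCardConfig.from_json_dict(json.loads('{"imageTokenId": 0, "modelType": ""}'))
```

Missing keys fall back to the defaults, so a llama.cpp-only card can omit imageTokenId entirely.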