PlacementCardConfig
Hardware/routing constraints the planner reads from a model card (#149).
The only card section the planner consults directly. Defaults describe the
current implicit assumption (an MLX model with no extra memory floor), so
cards without a [placement] section behave exactly as before.
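A minimal sketch of how these defaults could be modelled, assuming the card has already been parsed into a dict and that the [placement] keys use snake_case field names like the compatible_backends name used in the descriptions. The other snake_case names are inferred from the camelCase JSON keys. PlacementSketch and placement_from_card are hypothetical names, not the project's API. A card with no placement section yields the all-default config, which is the "behaves exactly as before" case.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class PlacementSketch:
    """Hypothetical mirror of PlacementCardConfig; defaults follow the documented ones."""

    compatible_backends: tuple[str, ...] = ("mlx",)  # implicit MLX assumption
    min_vram_gib: Optional[float] = None  # None: no extra memory floor
    max_context_tokens: Optional[int] = None  # None: no cap on the KV budget check
    backend_preference: tuple[str, ...] = ()  # empty: use the node's default


def placement_from_card(card: Mapping[str, Any]) -> PlacementSketch:
    """Build the config from a parsed card (assumed snake_case keys).

    A missing [placement] section, or a missing key, falls back to the defaults.
    """
    section = card.get("placement", {})
    return PlacementSketch(
        compatible_backends=tuple(section.get("compatible_backends", ("mlx",))),
        min_vram_gib=section.get("min_vram_gib"),
        max_context_tokens=section.get("max_context_tokens"),
        backend_preference=tuple(section.get("backend_preference", ())),
    )
```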

compatibleBackends
Hard constraint: the planner only routes to nodes whose advertised backends
intersect this set. Making the implicit {"mlx"} explicit is what enables future
heterogeneous (llama_cpp / rocm / cuda) routing.
Type: array of strings
Default: ["mlx"]

minVramGib
Hard constraint: when set, the planner only routes to nodes whose available
memory is at least this many GiB.
Type (one of):
- number
- null
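The two hard constraints can be read as a single node filter. The sketch below assumes a hypothetical Node record carrying the node's advertised backend tags and its available memory in GiB, and it compares backend names as exact strings; how the real planner represents nodes, and whether it normalizes tags, is not stated here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical node advertisement: backend tags plus available memory in GiB."""

    name: str
    backends: frozenset[str]
    available_gib: float


def eligible_nodes(
    nodes: Iterable[Node],
    compatible_backends: Iterable[str] = ("mlx",),
    min_vram_gib: Optional[float] = None,
) -> list[Node]:
    """Apply both hard constraints; a node failing either is never routed to."""
    compatible = frozenset(compatible_backends)
    result = []
    for node in nodes:
        # compatibleBackends: the node must advertise at least one compatible backend.
        if not (node.backends & compatible):
            continue
        # minVramGib: when set, the node needs at least this much available memory.
        if min_vram_gib is not None and node.available_gib < min_vram_gib:
            continue
        result.append(node)
    return result
```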

maxContextTokens
Soft constraint: when set, caps the number of context tokens the
placement-time KV budget check (see #145) has to account for.
Type (one of):
- integer
- null
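To illustrate what "caps" means, here is a sketch assuming the KV budget check sizes the cache with the common estimate of 2 (keys and values) × layers × KV heads × head dimension × bytes per element × tokens. The check in #145 may compute it differently, and both function names are hypothetical.

```python
from typing import Optional


def kv_budget_tokens(model_max_context: int, max_context_tokens: Optional[int]) -> int:
    """Tokens the KV check budgets for; the card's cap, when set, bounds the model limit."""
    if max_context_tokens is None:
        return model_max_context
    return min(model_max_context, max_context_tokens)


def kv_cache_bytes(
    tokens: int, layers: int, kv_heads: int, head_dim: int, bytes_per_elem: int = 2
) -> int:
    """Common estimate: keys and values (factor 2) per layer, per KV head, per token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens
```

For a hypothetical model with 32 layers, 8 KV heads, head dimension 128 and an fp16 cache, budgeting all 131072 tokens needs 16 GiB, while a maxContextTokens of 32768 brings the check down to 4 GiB.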

backendPreference
Soft, ordered preference among the node's backend tags, e.g.
("llama_cpp-vulkan", "llama_cpp-rocm").
Unlike compatible_backends (a hard filter on which nodes are eligible),
this only ranks eligible nodes/devices: the planner prefers a node that
advertises an earlier-listed tag, and the runner picks the earliest-listed
backend the chosen node actually has. The same model runs on any compatible
backend, but which backend is fastest varies from model to model, so this
captures "fastest on Vulkan, ROCm is an acceptable fallback" while still
degrading gracefully to a node that only offers the fallback. Order is
significant and preserved; an empty tuple means no preference (use the
node's default).
Type: array of strings
Default: []

Example:
{
  "compatibleBackends": [
    "string"
  ],
  "minVramGib": 0,
  "maxContextTokens": 0,
  "backendPreference": [
    "string"
  ]
}
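The backendPreference semantics (the planner ranks eligible nodes, the runner picks a backend on the chosen node) can be sketched as a few small functions. This assumes the nodes have already passed the hard compatibleBackends filter and are given as a hypothetical mapping from node name to advertised tags; the function names are illustrative only.

```python
from typing import AbstractSet, Optional, Sequence


def preference_rank(node_backends: AbstractSet[str], preference: Sequence[str]) -> int:
    """Index of the earliest preferred tag the node has; len(preference) if it has none."""
    for rank, tag in enumerate(preference):
        if tag in node_backends:
            return rank
    return len(preference)


def order_nodes(nodes: dict[str, AbstractSet[str]], preference: Sequence[str]) -> list[str]:
    """Planner side: lower rank first; sorted() is stable, so ties keep input order."""
    return sorted(nodes, key=lambda name: preference_rank(nodes[name], preference))


def pick_backend(node_backends: AbstractSet[str], preference: Sequence[str]) -> Optional[str]:
    """Runner side: earliest-listed backend the node has; None means use the node default."""
    for tag in preference:
        if tag in node_backends:
            return tag
    return None
```

Because sorted() is stable, an empty preference leaves the eligible nodes in their incoming order, and pick_backend returns None to signal "use the node's default".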