PlacementCardConfig

Hardware/routing constraints the planner reads from a model card (#149).

The only card section the planner consults directly. Defaults describe the current implicit assumption (an MLX model with no extra memory floor), so cards without a [placement] section behave exactly as before.

compatibleBackends string[]

Hard constraint: only route to nodes whose advertised backends intersect this set. Making the implicit {"mlx"} explicit is what enables future heterogeneous (llama_cpp / rocm / cuda) routing.

Default value: ["mlx"]
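The hard filter above amounts to a set intersection between the card's backends and each node's advertised backends. A minimal sketch follows, assuming a hypothetical `Node` record and `eligible_nodes` helper; neither name comes from the planner itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical node record; the planner's real node type is not shown here.
    name: str
    backends: frozenset[str]


def eligible_nodes(nodes: list[Node], compatible_backends: list[str]) -> list[Node]:
    """Keep only nodes that advertise at least one compatible backend."""
    allowed = set(compatible_backends)
    return [node for node in nodes if node.backends & allowed]


nodes = [
    Node("mac-studio", frozenset({"mlx"})),
    Node("linux-box", frozenset({"llama_cpp", "rocm"})),
]

# With the default ["mlx"], only the MLX node stays eligible.
default_choice = eligible_nodes(nodes, ["mlx"])

# Listing a second backend makes the heterogeneous node eligible too.
hetero_choice = eligible_nodes(nodes, ["mlx", "llama_cpp"])
```

A node with no overlapping backend is dropped outright rather than ranked lower, which is what distinguishes this constraint from a soft preference.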
minVramGib object

Hard constraint: when set, the planner gates placement on the node's available memory, treating this value (in GiB) as a floor. When unset, no memory floor applies.

anyOf
number
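As an illustration of the memory gate, the sketch below assumes an unset value is represented as `None` and that a node passes when its available memory is greater than or equal to the floor. The function name and the exact comparison are assumptions, not the planner's confirmed behavior.

```python
from typing import Optional


def passes_memory_gate(available_gib: float, min_vram_gib: Optional[float]) -> bool:
    """Return True when the node clears the card's memory floor, or no floor is set."""
    if min_vram_gib is None:
        # Assumed representation of "not set": no extra memory floor applies.
        return True
    return available_gib >= min_vram_gib
```

For example, a node with 24 GiB available would pass a floor of 16 but fail a floor of 48, while any node passes when the field is left unset.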
maxContextTokens object

Soft: caps the placement-time KV budget check (see #145) when set.

anyOf
integer
backendPreference string[]

Soft, ordered preference among the node's backend tags (e.g. ("llama_cpp-vulkan", "llama_cpp-rocm")).

Unlike compatibleBackends (a hard filter on which nodes are eligible), this only ranks eligible nodes/devices: the planner prefers a node that advertises an earlier-listed tag, and the runner picks the earliest-listed backend the chosen node actually has. The same model runs on any compatible backend, but performance varies by backend and by model, so this captures "fastest on Vulkan, ROCm is an acceptable fallback" while still degrading gracefully to a node that only offers the fallback. Order is significant and preserved; an empty list means no preference (use the node's default).

Default value: []
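The ranking and selection described above can be sketched as follows. All function names, the dict-of-sets node representation, and the `node_default` parameter are hypothetical; the sketch only assumes that earlier-listed tags win and that an empty preference leaves the order and the node's default untouched.

```python
def preference_rank(node_backends: set[str], backend_preference: list[str]) -> int:
    """Index of the earliest preferred tag the node has; len(preference) if none."""
    for index, tag in enumerate(backend_preference):
        if tag in node_backends:
            return index
    return len(backend_preference)


def rank_nodes(nodes: dict[str, set[str]], backend_preference: list[str]) -> list[str]:
    """Order eligible node names by preference; ties keep their input order."""
    # sorted() is stable, so an empty preference leaves the order unchanged.
    return sorted(nodes, key=lambda name: preference_rank(nodes[name], backend_preference))


def pick_backend(node_backends: set[str], backend_preference: list[str], node_default: str) -> str:
    """Earliest-listed backend the node actually has, else the node's default."""
    for tag in backend_preference:
        if tag in node_backends:
            return tag
    return node_default


preference = ["llama_cpp-vulkan", "llama_cpp-rocm"]
nodes = {
    "rocm-only": {"llama_cpp-rocm"},
    "vulkan-and-rocm": {"llama_cpp-vulkan", "llama_cpp-rocm"},
}

ordered = rank_nodes(nodes, preference)
chosen_backend = pick_backend(nodes[ordered[0]], preference, node_default="llama_cpp-rocm")
```

Here the node offering Vulkan ranks first and runs on Vulkan, while the ROCm-only node remains a usable fallback rather than being excluded.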
PlacementCardConfig
{
  "compatibleBackends": [
    "string"
  ],
  "minVramGib": 0,
  "maxContextTokens": 0,
  "backendPreference": [
    "string"
  ]
}