Manifest Reference
A manifest source file contains a top-level models list. Each item describes
one publishable vindex and the slice shape Skulk can later place on runtime
hardware.
Manifests are source files. The catalog is the merged view built from the
packaged Foxlight manifest plus any operator manifests listed in
skulk-weights.yaml.
Example Source Entry
models:
- key: gemma-3-4b-full-q4-k
source_model: google/gemma-3-4b-it
quant: q4k
tier: smoke
slices:
- full
output_name: gemma-3-4b-it-full-q4-k.vindex
hf_repo: FoxlightAI/gemma-3-4b-it-full-q4-k-vindex
hf_collection: FoxlightAI/vindexes-6a124406dd5fb439c431b051
If that source is loaded under the foxlight namespace, the effective
catalog key is:
foxlight/gemma-3-4b-full-q4-k
If an operator source is loaded under my-org, the same short key becomes:
my-org/gemma-3-4b-full-q4-k
Field Reference
Vindex fields
| Field | Meaning |
|---|---|
key | Stable short selector before the catalog namespace is added |
source_model | Hugging Face model ID passed to larql extract |
quant | Quantization passed to LARQL; currently only q4k is accepted by validation (8-bit models are auto-detected as q8k but rejected — see below) |
tier | Publication group, currently smoke or moe |
slices | LARQL slice mode, currently full or expert-server; this is the runtime placement shape |
output_name | Local vindex directory basename under scratch storage |
hf_repo | Hugging Face repository passed to larql publish |
hf_collection | Optional Hugging Face collection slug updated after publish succeeds |
MTP sidecar fields
These two fields must both be set together or both omitted.
| Field | Meaning |
|---|---|
mtp_source_repo | Hugging Face model ID of the original BF16 checkpoint that contains mtp.* tensor keys |
mtp_sidecar_repo | Hugging Face repository where the bf16 mtp.safetensors will be uploaded |
The MTP heads ship at full precision (bf16, unquantized): they are the
speculative drafter, and one bf16 sidecar serves every quantization of the base
model. There is one sidecar per base model, so mtp_sidecar_repo carries no
quant suffix.
The mtp_source_repo is often different from source_model. source_model is typically an
mlx-converted or community checkpoint; mtp_source_repo must be the original PyTorch BF16 release
because mlx-lm's sanitize() strips mtp.* keys during conversion.
Assistant model field
| Field | Meaning |
|---|---|
assistant_model_repo | Optional Hugging Face model ID (owner/name) of a Gemma-4-style companion assistant used for speculative decoding. Mutually exclusive with the MTP sidecar fields |
A model uses either an MTP sidecar or a companion assistant for speculative
decoding, never both. catalog add writes assistant_model_repo automatically
when the base model has no mtp.* keys but a {model}-assistant companion
exists.
Vision sidecar fields
These two fields must both be set together or both omitted.
| Field | Meaning |
|---|---|
vision_source_repo | Hugging Face model ID whose vision weights and configs are mirrored byte-for-byte |
vision_sidecar_repo | Hugging Face repository where the mirrored vision weights are uploaded |
Example entry with MTP sidecar:
models:
- key: qwen3-6b-full-q4-k
source_model: acme/qwen3-6b-mlx-q4k
quant: q4k
tier: smoke
slices:
- full
output_name: qwen3-6b-full-q4-k.vindex
hf_repo: acme/qwen3-6b-full-q4-k-vindex
mtp_source_repo: Qwen/Qwen3-6B
mtp_sidecar_repo: acme/qwen3-6b-mtp
vision_source_repo: acme/qwen3-6b-vl
vision_sidecar_repo: acme/qwen3-6b-vision
A model that uses a companion assistant instead of an MTP sidecar declares
assistant_model_repo on its own:
models:
- key: gemma-3-4b-full-q4-k
source_model: google/gemma-3-4b-it
quant: q4k
tier: smoke
slices:
- full
output_name: gemma-3-4b-it-full-q4-k.vindex
hf_repo: FoxlightAI/gemma-3-4b-it-full-q4-k-vindex
assistant_model_repo: google/gemma-3-4b-it-assistant
Validation Rules
keymust be lowercase kebab-case and unique within its source- effective catalog keys must be unique after namespaces are applied
source_modelmust look likeowner/namequantcurrently supportsq4konly (ALLOWED_QUANTSis q4k-only). 8-bit models are auto-detected asq8kbycatalog add, but validation rejects them — adding an 8-bit model fails with a quant error until 8-bit support is enabledtiermust besmokeormoeslicesmust be non-empty- each slice must be one of
fullorexpert-server fullcannot be combined with other slicesoutput_namemust be a.vindexbasename and unique in the merged cataloghf_repomust look likeowner/nameand be unique in the merged catalog- operator
hf_repoowners must match the sourcehf_ownerinskulk-weights.yaml hf_collectionmust look likeowner/slug- operator
hf_collectionowners must match the sourcehf_ownerinskulk-weights.yaml mtp_source_repoandmtp_sidecar_repomust look likeowner/namemtp_source_repoandmtp_sidecar_repomust both be set together or both omittedmtp_sidecar_repoowner must match thehf_repoownerassistant_model_repomust look likeowner/nameassistant_model_repois mutually exclusive withmtp_source_repo(a model uses an MTP sidecar or a companion assistant, never both)vision_source_repoandvision_sidecar_repomust look likeowner/namevision_source_repoandvision_sidecar_repomust both be set together or both omittedvision_sidecar_repoowner must match thehf_repoowner
Generated Commands
This entry:
key: gemma-3-4b-full-q4-k
source_model: google/gemma-3-4b-it
quant: q4k
slices:
- full
output_name: gemma-3-4b-it-full-q4-k.vindex
hf_repo: FoxlightAI/gemma-3-4b-it-full-q4-k-vindex
hf_collection: FoxlightAI/vindexes-6a124406dd5fb439c431b051
produces this command shape:
larql extract google/gemma-3-4b-it \
-o .scratch/gemma-3-4b-it-full-q4-k.vindex \
--quant q4k
larql publish .scratch/gemma-3-4b-it-full-q4-k.vindex \
--repo FoxlightAI/gemma-3-4b-it-full-q4-k-vindex \
--slices none
After the LARQL publish command succeeds, the publisher adds the repository to the configured collection using the Hugging Face Hub API.
For slices: [full], the publisher sends --slices none because LARQL treats
that as the complete vindex publish path.
For slices: [expert-server], the output is meant for MoE expert weight
serving from CPU/high-memory LARQL servers instead of forcing those weights into
GPU memory.