Quickstart

This guide gets you from a clean checkout to your first publisher dry-runs — one for a LARQL vindex, one for an MTP sidecar.

If you are new to Skulk and LARQL, read How Skulk Works before you run any commands. It explains the cluster architecture, what a vindex is, what MTP sidecars are for, and why the publisher exists.

If you already have that context, keep the core model in mind:

  • SWP publishes three artifact types: LARQL vindexes, MTP sidecars, and vision sidecars.
  • A vindex is a vector-index directory that LARQL can query, run, and publish. It lets Skulk split weight-serving work across GPU nodes and CPU/high-memory servers.
  • An MTP sidecar is a full-precision (bf16, unquantized) mtp.safetensors file extracted from the BF16 checkpoint of a model with native multi-token prediction heads. One sidecar per base model serves every quantization.
  • A vision sidecar is a byte-for-byte mirror of a VLM's vision-encoder weights, for mlx-community quants that omit them.
  • Some models (e.g. Gemma 4) skip embedded MTP heads entirely and ship a companion {model}-assistant drafter model; SWP records the pairing in the catalog instead of extracting tensors.
  • Every real publish also uploads a self-describing README.md model card and files the artifact into its per-type Hugging Face collection.

A dry-run is the best first command because it prints the full publication plan — source model, output path, target repo, commands — without touching disk or network.

Requirements

  • Python 3.11 or newer
  • uv for dependency management
  • this repository checked out locally
  • Node.js 20 or newer only if you are editing the documentation site
  • LARQL and a Hugging Face token when you are ready to publish for real

1. Install The CLI

uv sync --extra dev

This installs the skulk-weights command from the current checkout. Run it via uv run skulk-weights .... Add --extra mtp for MTP/vision real-publish support and --extra ui for the GUI.

2. Validate The Catalog

uv run skulk-weights catalog validate
uv run skulk-weights catalog sources
uv run skulk-weights catalog list --tier smoke

The Foxlight catalog is included automatically. The smoke tier contains the smaller entries that are safest for first publication tests. Keys are namespaced, so Foxlight entries begin with foxlight/.
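Because keys are namespaced, a script that handles catalog keys can split one at the first slash. The sketch below assumes a key has the form namespace/entry with a single leading namespace, as in the Foxlight keys; the variable names are illustrative and are not part of skulk-weights.

```shell
# Assumption: a catalog key looks like "<namespace>/<entry>".
key="foxlight/gemma-3-4b-full-q4-k"

namespace="${key%%/*}"   # everything before the first slash
entry="${key#*/}"        # everything after the first slash

echo "namespace=$namespace entry=$entry"
```

This uses only POSIX parameter expansion, so it behaves the same in sh and bash.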

To add your own catalog later, create a starter config:

uv run skulk-weights catalog init

Then add a source file under your own namespace and run commands with --config skulk-weights.yaml. The built-in Foxlight entries are still included.

3. Check Your Machine

uv run skulk-weights doctor

The doctor command checks the local Python environment, scratch directory, and catalog. Use the stricter publishing checks on the machine that will actually run LARQL:

uv run skulk-weights doctor --publish

4. Dry-Run One Vindex

uv run skulk-weights publish --model foxlight/gemma-3-4b-full-q4-k --artifact vindex --dry-run

You should see a summary like:

model key: foxlight/gemma-3-4b-full-q4-k
artifact: vindex
source model: google/gemma-3-4b-it
output path: .scratch/gemma-3-4b-it-full-q4-k.vindex
target repo: hf://FoxlightAI/gemma-3-4b-it-full-q4-k-vindex
vindex collection: https://huggingface.co/collections/FoxlightAI/vindexes-6a124406dd5fb439c431b051
extract command: larql extract ...
publish command: larql publish ...

That output is the contract. If the source model, output path, slice mode, target repository, or collection is wrong, fix the catalog source before publishing.
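One way to treat the summary as a contract is to compare individual fields against the values you expect before publishing. The sketch below assumes the dry-run prints one "key: value" pair per line, as in the sample above. The summary_field helper and the temporary file are hypothetical and not part of skulk-weights; in practice you would redirect the real dry-run output into the file.

```shell
# Hypothetical helper: print the value of one "key: value" line from a saved summary.
# $1 = field name, $2 = file holding the summary text
summary_field() {
  awk -v key="$1" 'index($0, key ": ") == 1 { print substr($0, length(key) + 3); exit }' "$2"
}

# Sample lines copied from the summary above.
summary_file="$(mktemp)"
cat > "$summary_file" <<'EOF'
model key: foxlight/gemma-3-4b-full-q4-k
artifact: vindex
source model: google/gemma-3-4b-it
output path: .scratch/gemma-3-4b-it-full-q4-k.vindex
target repo: hf://FoxlightAI/gemma-3-4b-it-full-q4-k-vindex
EOF

expected_repo="hf://FoxlightAI/gemma-3-4b-it-full-q4-k-vindex"
actual_repo="$(summary_field "target repo" "$summary_file")"

if [ "$actual_repo" = "$expected_repo" ]; then
  echo "target repo OK"
else
  echo "target repo mismatch: $actual_repo" >&2
fi
```

The same helper can be pointed at "source model" or "output path" to check the other fields that matter for a publish.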

One thing that looks surprising: entries with slices: [full] show --slices none in the generated larql publish command. That is correct — LARQL uses none to mean "publish the complete vindex." The catalog field is full; the LARQL flag is none. They refer to the same thing.
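The naming difference can be written down as a tiny lookup. This is only an illustration of the mapping described above: larql_slices_value is a hypothetical helper, not something skulk-weights or LARQL provides, and it deliberately refuses any value other than full because no other mapping is documented here.

```shell
# Hypothetical helper: map a catalog `slices` value to the value
# that appears after `--slices` in the generated larql publish command.
larql_slices_value() {
  case "$1" in
    full)
      echo "none"
      ;;
    *)
      # Only full -> none is described in this guide; do not guess others.
      echo "unmapped slices value: $1" >&2
      return 1
      ;;
  esac
}

larql_slices_value full
```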

5. Dry-Run One MTP Sidecar

For catalog entries that have MTP fields configured, dry-run the sidecar step separately to verify the source repo, sidecar repo, precision, and output path before any download starts:

uv run skulk-weights publish --model my-org/my-model --artifact mtp --dry-run

--artifact also accepts vision (mirror a VLM's vision encoder) and all (every configured artifact for the entry):

uv run skulk-weights publish --model my-org/my-vlm --artifact vision --dry-run

For the MTP dry-run, you should see something like:

model key: my-org/my-model
artifact: mtp
mtp source repo: hf://Qwen/Qwen3-6-7B
mtp sidecar repo: hf://my-org/qwen3-6-7b-mtp/mtp.safetensors
mtp precision: bf16 (unquantized)
mtp output path: .scratch/my-org--qwen3-6-7b-mtp-mtp.safetensors

If the built-in Foxlight entries do not have MTP configured yet, the output will say mtp step: not configured for this entry. Refer to the MTP sidecar guide to add an MTP-capable catalog entry, or the Vision sidecar guide for vision encoders.
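If you script the dry-run, you can detect the unconfigured case by searching the saved output for that notice. The sketch below assumes the notice appears verbatim as quoted above. The mtp_configured helper and the two sample files are hypothetical stand-ins for real dry-run output.

```shell
# Hypothetical helper: succeeds when the saved dry-run text does NOT
# contain the "not configured" notice.
# $1 = file holding the dry-run output
mtp_configured() {
  ! grep -qF "mtp step: not configured for this entry" "$1"
}

# Stand-in for output from an entry with MTP fields configured.
configured="$(mktemp)"
printf '%s\n' "artifact: mtp" "mtp precision: bf16 (unquantized)" > "$configured"

# Stand-in for output from an entry without MTP fields.
unconfigured="$(mktemp)"
printf '%s\n' "mtp step: not configured for this entry" > "$unconfigured"

for f in "$configured" "$unconfigured"; do
  if mtp_configured "$f"; then
    echo "MTP step is configured"
  else
    echo "MTP step is not configured; see the MTP sidecar guide"
  fi
done
```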

6. Go Deeper

  • How Skulk Works explains the end-to-end cluster architecture, why the vindex format exists, and what MTP sidecars enable.
  • MTP Sidecar covers the full extraction workflow, catalog entry format, and troubleshooting for MTP publication.
  • Skulk, LARQL, and Vindexes covers vindex structure and extraction levels in more detail.