First Publish

Your first real publish should prove the whole path with one small vindex: catalog entry, runner, LARQL, scratch storage, Hugging Face token, upload, and collection update.

Use a smoke entry first. It gives you the same workflow shape as larger vindexes with less disk and time risk. Once the path works, you can publish larger full and expert-server entries for the real cost goal: keeping weight-heavy model state off expensive GPU memory and on CPU/high-memory LARQL servers that can host it instead.

1. Validate The Catalog

uv run skulk-weights catalog validate

This proves the effective catalog is structurally safe before the runner starts.

2. Check Publication Prerequisites

uv run skulk-weights doctor --publish

This checks the pieces needed for a real vindex publish: larql, PyYAML, HF_TOKEN, huggingface_hub, scratch storage, and the catalog.
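For a quick manual look at a few of the same prerequisites, a minimal shell sketch follows. It is not how doctor --publish is implemented. The function name check_publish_env is hypothetical, and the sketch assumes that larql is expected on PATH and that scratch storage is the directory named by SKULK_WEIGHTS_SCRATCH. It does not cover PyYAML, huggingface_hub, or the catalog.

```shell
# Hypothetical helper, not part of skulk-weights.
# Reports each missing prerequisite on stderr and returns non-zero if any is absent.
check_publish_env() {
    rc=0

    # Assumption: the larql binary is expected on PATH.
    if ! command -v larql >/dev/null 2>&1; then
        echo "missing: larql on PATH" >&2
        rc=1
    fi

    # HF_TOKEN must be set and non-empty.
    if [ -z "${HF_TOKEN:-}" ]; then
        echo "missing: HF_TOKEN" >&2
        rc=1
    fi

    # Assumption: scratch storage is the directory in SKULK_WEIGHTS_SCRATCH.
    scratch="${SKULK_WEIGHTS_SCRATCH:-}"
    if [ -z "$scratch" ] || [ ! -d "$scratch" ] || [ ! -w "$scratch" ]; then
        echo "missing: writable SKULK_WEIGHTS_SCRATCH directory" >&2
        rc=1
    fi

    return "$rc"
}
```

Running check_publish_env before doctor --publish surfaces the obvious gaps early. The doctor command remains the authoritative check.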

It does not check numpy or safetensors, so a passing doctor --publish does not guarantee that MTP or vision artifacts are ready for a real publish. Those artifacts need the mtp extras (uv sync --extra mtp). MTP extraction is pure numpy and cross-platform: it requires neither mlx nor a macOS Apple Silicon host. See the MTP sidecar and Vision sidecar guides.
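Because doctor --publish leaves numpy and safetensors unchecked, you may want to confirm them yourself. The sketch below is one way to do that. The function name have_py_modules is hypothetical, and it assumes python3 resolves to the interpreter of the environment that uv sync --extra mtp populated (for example, with the project's virtual environment activated).

```shell
# Hypothetical helper, not part of skulk-weights.
# Succeeds only if python3 can locate every named top-level module.
have_py_modules() {
    python3 - "$@" <<'PY'
import importlib.util
import sys

# Collect the requested modules that cannot be found.
missing = [name for name in sys.argv[1:] if importlib.util.find_spec(name) is None]
if missing:
    print("missing modules: " + ", ".join(missing), file=sys.stderr)
    sys.exit(1)
PY
}
```

Call it as have_py_modules numpy safetensors. A non-zero exit means at least one module is missing from that interpreter's environment.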

3. Review The Dry-Run

uv run skulk-weights publish --model foxlight/gemma-3-4b-full-q4-k --dry-run

Read the source model, output path, target repository, collection, and slice mode in the dry-run output. They should match the vindex you intend to publish and the runtime role it is supposed to support.

4. Publish

export HF_TOKEN=...
export SKULK_WEIGHTS_SCRATCH=/fast/scratch/skulk-weights
uv run skulk-weights publish --model foxlight/gemma-3-4b-full-q4-k

The command refuses to overwrite an existing output path. Use --force only when you intentionally want to replace a local extraction output.

Alongside the vindex, the publish uploads a self-describing README.md model card to the target repo. The card inherits the source model's license unchanged and adds a Foxlight provenance block that pins the source SHA. The publish also files the repo into the Vindexes Hugging Face collection.
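Steps 1 through 4 can be chained so that a failure in any earlier step stops the run before anything is uploaded. The following is a minimal sketch, not a supported command: the function name first_publish and the confirmation prompt are hypothetical, and only the skulk-weights invocations come from the steps above. It assumes HF_TOKEN and SKULK_WEIGHTS_SCRATCH are already exported.

```shell
# Hypothetical wrapper, not part of skulk-weights.
# Runs validate, doctor, and dry-run, then publishes only after confirmation.
first_publish() {
    model="${1:-}"
    if [ -z "$model" ]; then
        echo "usage: first_publish <model>" >&2
        return 2
    fi

    # Stop at the first failing step.
    uv run skulk-weights catalog validate || return 1
    uv run skulk-weights doctor --publish || return 1
    uv run skulk-weights publish --model "$model" --dry-run || return 1

    # Illustrative prompt: give yourself a chance to read the dry-run output.
    printf 'Dry-run matches the intended vindex? [y/N] '
    read -r answer
    if [ "$answer" != "y" ]; then
        echo "aborted before publish" >&2
        return 1
    fi

    uv run skulk-weights publish --model "$model"
}
```

For example, first_publish foxlight/gemma-3-4b-full-q4-k runs the same commands shown above in order. The wrapper never passes --force, so an existing output path still stops the publish.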

5. Record The Result

After publication, record the catalog key, source model, target repository, collection, slice mode, and runner used. That gives Skulk operators a concrete vindex to inspect when they start assigning GPU inference nodes and CPU/high-memory LARQL servers.
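One lightweight way to keep that record is a small helper that prints the six fields as plain lines you can append to a log. This is only a sketch: the function name publish_record, the field names, and the key: value layout are hypothetical, not a format that skulk-weights reads or writes.

```shell
# Hypothetical helper, not part of skulk-weights.
# Prints one publish record as "key: value" lines from six positional arguments.
publish_record() {
    if [ "$#" -ne 6 ]; then
        echo "usage: publish_record <catalog-key> <source-model> <target-repo> <collection> <slice-mode> <runner>" >&2
        return 2
    fi
    printf 'catalog_key: %s\n' "$1"
    printf 'source_model: %s\n' "$2"
    printf 'target_repository: %s\n' "$3"
    printf 'collection: %s\n' "$4"
    printf 'slice_mode: %s\n' "$5"
    printf 'runner: %s\n' "$6"
}
```

Fill the arguments from the dry-run output and the publish run, then redirect the result to wherever your operators keep notes, for example publish_record ... >> publishes.log.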