Skulk Model Store
The model store is one of the biggest additions Skulk makes on top of upstream EXO.
In a normal cluster without a model store, each node may need to download model data for itself. With the model store enabled, one node becomes the shared store host and other nodes stage from it over the LAN.
Why You Would Use It
Use the model store when:
- you have more than one node
- your models are large
- you want fewer repeated downloads
- you want nodes to load models without internet access after the first download
- you want model files to live on a dedicated large disk or volume
What Changes When It Is Enabled
Without the model store:
- nodes download model data independently
- cold starts can be slower across the cluster
- repeated downloads are more common
With the model store:
- one node hosts the shared model store
- other nodes stage needed files from that host
- Skulk keeps the same cluster and inference architecture, but changes where model artifacts come from
GGUF repositories download only the pinned quantization
A GGUF repository often ships several quantizations of the same model (for
example Q4_K_M, Q5_K_M, Q8_0, bf16). The store downloads only the
quantization a model card pins (its gguf_file), plus the multimodal projector
for a vision model, rather than every quant in the repository. This keeps a
single-quant download to roughly the size of that one file instead of the whole
repo.
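The selection rule above can be sketched as a small filter over a repository file listing. This is only an illustration, not Skulk's actual code: the function name select_gguf_files is hypothetical, the file names are made up, and matching the projector by an "mmproj" substring is an assumption based on a common GGUF naming convention.

```python
from typing import Iterable, List


def select_gguf_files(
    repo_files: Iterable[str], gguf_file: str, is_vision: bool = False
) -> List[str]:
    """Return only the files worth downloading from a multi-quant GGUF repo.

    Hypothetical helper: keeps the pinned quantization and, for vision
    models, any file that looks like a multimodal projector.
    """
    selected = []
    for name in repo_files:
        lower = name.lower()
        if name == gguf_file:
            selected.append(name)  # the quantization pinned by the model card
        elif is_vision and lower.endswith(".gguf") and "mmproj" in lower:
            selected.append(name)  # assumed projector naming convention
    return selected


# Made-up repository listing with several quantizations of one model.
repo = [
    "model-Q4_K_M.gguf",
    "model-Q5_K_M.gguf",
    "model-Q8_0.gguf",
    "model-bf16.gguf",
    "mmproj-model-f16.gguf",
    "README.md",
]
print(select_gguf_files(repo, "model-Q4_K_M.gguf"))
print(select_gguf_files(repo, "model-Q4_K_M.gguf", is_vision=True))
```

With a text-only model, only the pinned file is kept; with a vision model, the projector file is kept as well.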
The store host advertises a routable address
The store host broadcasts the address other nodes use to reach it. Even when you
configure store_host as a hostname, the store host resolves and advertises its
own best routable IPv4 (a private LAN address is preferred). This avoids a
failure mode on a Thunderbolt-meshed fleet, where a bare hostname could resolve
through mDNS to a link-local Thunderbolt address (169.254.x.x) that a peer
without a direct Thunderbolt link cannot reach, even though the LAN path works.
An operator-supplied routable IP in store_http_host is still honored as-is.
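To make the preference order concrete, here is a minimal sketch of choosing a routable IPv4 from a list of candidate addresses. It is an illustration under stated assumptions rather than Skulk's implementation: pick_routable_ipv4 is a hypothetical name, and the exact ranking Skulk applies may differ. Note that Python's ipaddress module reports link-local and loopback addresses as private, so they are excluded first.

```python
import ipaddress
from typing import Iterable, Optional


def pick_routable_ipv4(candidates: Iterable[str]) -> Optional[str]:
    """Pick an address peers can reach, preferring private LAN addresses."""
    private, other = [], []
    for text in candidates:
        try:
            addr = ipaddress.ip_address(text)
        except ValueError:
            continue  # not an IP literal, ignore it
        if addr.version != 4:
            continue  # this sketch only considers IPv4
        if (
            addr.is_loopback
            or addr.is_link_local  # 169.254.0.0/16, e.g. Thunderbolt links
            or addr.is_unspecified
            or addr.is_multicast
        ):
            continue
        (private if addr.is_private else other).append(text)
    if private:
        return private[0]
    return other[0] if other else None


print(pick_routable_ipv4(["169.254.10.2", "192.168.1.20", "127.0.0.1"]))
```

If only link-local or loopback addresses are available, the function returns None, which is the case where an operator-supplied address would be needed.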
What Does Not Change
- the libp2p mesh, election, master, and worker model stay the same
- the main Skulk API stays the same
- the dashboard remains your main control surface
- single-node Skulk still works fine without the model store
Before You Start
Make sure:
- all nodes are running the same Skulk build
- you know which machine should be the store host
- that machine has enough storage for the models you want to share
- the chosen store_path is mounted and writable
The store server uses port 58080 by default.
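A quick pre-flight check of the storage requirements can be scripted on the store host. The sketch below uses only the Python standard library; check_store_path and the min_free_gb threshold are hypothetical and not part of Skulk.

```python
import os
import shutil
import tempfile
from typing import List


def check_store_path(store_path: str, min_free_gb: float = 0.0) -> List[str]:
    """Return a list of problems found with the intended store directory."""
    problems = []
    if not os.path.isdir(store_path):
        # An unmounted volume usually shows up as a missing directory.
        problems.append("store_path does not exist or is not a directory")
        return problems
    if not os.access(store_path, os.W_OK | os.X_OK):
        problems.append("store_path is not writable by this user")
    free_gb = shutil.disk_usage(store_path).free / 1e9
    if free_gb < min_free_gb:
        problems.append(f"only {free_gb:.1f} GB free, wanted {min_free_gb:.1f} GB")
    return problems


# Demonstration against a temporary directory instead of a real volume.
with tempfile.TemporaryDirectory() as demo_dir:
    print(check_store_path(demo_dir))
```

On a real store host you would pass the configured store_path and a min_free_gb value that matches the models you plan to share.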
Recommended Setup: Dashboard First
This is the simplest path for most people.
- Start Skulk on all nodes with uv run skulk.
- Open the dashboard on the node you want to become the store host.
- Go to Settings.
- Enable the store host toggle.
- Choose the store path.
- Save the config.
- Restart Skulk on all nodes if the dashboard tells you a restart is required.
After that, use the dashboard or API normally. When models are available in the store, worker nodes stage from the store host instead of downloading independently.
Manual Setup with skulk.yaml
If you prefer to configure the model store manually, put the same skulk.yaml file on each node.
Minimal example:
model_store:
  enabled: true
  store_host: mac-studio-1
  store_path: /Volumes/ModelStore/models
For most users:
- store_host should be the hostname of the store machine
- store_path should be an absolute path on that host
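Before copying the file to every node, it can help to sanity-check the two fields above. The following sketch validates an already-parsed configuration dictionary; validate_model_store is a hypothetical helper, and the checks reflect only the rules stated in this document, not Skulk's own validation.

```python
import posixpath
from typing import List


def validate_model_store(cfg: dict) -> List[str]:
    """Check the minimal model_store fields described in this guide."""
    store = cfg.get("model_store")
    if not isinstance(store, dict):
        return ["missing model_store section"]
    if not store.get("enabled", False):
        return []  # store is off, nothing else to check
    problems = []
    if not store.get("store_host"):
        problems.append("model_store.store_host is empty")
    path = str(store.get("store_path") or "")
    if not posixpath.isabs(path):
        problems.append("model_store.store_path must be an absolute path")
    return problems


# The minimal example from this section, written as a Python dict.
minimal = {
    "model_store": {
        "enabled": True,
        "store_host": "mac-studio-1",
        "store_path": "/Volumes/ModelStore/models",
    }
}
print(validate_model_store(minimal))
```

An empty list means the minimal fields look plausible; it does not prove the host is reachable or the path is mounted.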
Example Full Configuration
model_store:
  enabled: true
  store_host: mac-studio-1
  store_port: 58080
  store_path: /Volumes/ModelStore/models
  download:
    allow_hf_fallback: true
  staging:
    enabled: true
    node_cache_path: ~/.skulk/staging
    cleanup_on_deactivate: false
  node_overrides:
    mac-studio-1:
      staging:
        node_cache_path: /Volumes/ModelStore/models
        cleanup_on_deactivate: false
How to Think About It
There are two important paths:
- store_path: the shared source of truth on the store host
- node_cache_path: the local staging area where a node prepares files before loading them
For worker nodes, node_cache_path is usually a fast local path such as ~/.skulk/staging.
For the store host, you often point node_cache_path at the same directory as store_path so the store host can load directly from the shared volume without making another copy.
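One way to picture how the two paths interact is to compute the effective staging settings for each node. The sketch below assumes that node_overrides sits inside model_store and that a node's override simply replaces matching keys from the shared staging section; both are assumptions for illustration, and effective_staging and the worker name mac-mini-2 are hypothetical.

```python
from copy import deepcopy


def effective_staging(model_store: dict, node_name: str) -> dict:
    """Merge a node's staging override over the shared staging defaults."""
    staging = deepcopy(model_store.get("staging", {}))
    override = model_store.get("node_overrides", {}).get(node_name, {})
    staging.update(override.get("staging", {}))  # assumed shallow merge
    return staging


# The full example configuration, written as a Python dict.
model_store = {
    "store_path": "/Volumes/ModelStore/models",
    "staging": {
        "enabled": True,
        "node_cache_path": "~/.skulk/staging",
        "cleanup_on_deactivate": False,
    },
    "node_overrides": {
        "mac-studio-1": {
            "staging": {
                "node_cache_path": "/Volumes/ModelStore/models",
                "cleanup_on_deactivate": False,
            }
        }
    },
}

for node in ("mac-studio-1", "mac-mini-2"):
    print(node, effective_staging(model_store, node)["node_cache_path"])
```

Under these assumptions the store host stages from the shared volume itself, while a worker without an override keeps the local default.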
Important Fields
model_store.enabled
Turns the model store on or off without deleting the config file.
model_store.store_host
The hostname or node ID of the store host.
For most users, hostname is the easiest and most reliable choice.
model_store.store_port
HTTP port used for store transfers.
Default: 58080
model_store.store_path
Absolute path on the store host where shared models live.
model_store.download.allow_hf_fallback
Controls what happens if a requested model is not already in the store.
| Value | Behavior |
|---|---|
| true | Fall back to Hugging Face download when needed |
| false | Fail instead of downloading from Hugging Face |
Use false if you want stricter offline or air-gapped behavior.
model_store.staging.node_cache_path
Where a node stages files before loading them.
model_store.staging.cleanup_on_deactivate
If true, staged files are cleaned up when instances are shut down. The
recommended default is false, which keeps the local staging cache warm so
large models do not need to be copied from the store again on every placement.
Use POST /store/purge-staging when you intentionally want to reclaim disk
space.
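As a rough sketch, the purge call can be prepared with the standard library. The API base http://localhost:52415 is taken from the registry check later in this guide; sending an empty body is an assumption, since the request schema is not described here, and build_purge_request is a hypothetical helper.

```python
import urllib.request


def build_purge_request(
    api_base: str = "http://localhost:52415",
) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the staging purge endpoint."""
    url = api_base.rstrip("/") + "/store/purge-staging"
    # Assumption: the endpoint accepts an empty request body.
    return urllib.request.Request(url, data=b"", method="POST")


req = build_purge_request()
print(req.get_method(), req.full_url)
# To actually send it on a running cluster:
#   urllib.request.urlopen(req, timeout=30)
```

Building the request separately from sending it makes it easy to inspect the target URL before reclaiming disk space.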
Typical Flow
First time a model is needed
If the model is not already in the store and fallback is enabled:
- Skulk requests the model.
- The store-aware download path checks the store.
- If the model is missing, Skulk falls back to Hugging Face.
- The model lands in the appropriate local or store-managed path.
Later requests
Once the model exists in the store:
- worker nodes ask the store host for the needed files
- files are staged locally
- inference loads from the staged path
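The two flows above reduce to a simple decision. The sketch below models it in isolation; it is not Skulk's download path, and resolve_model_source, ModelUnavailable, and the example model IDs are all hypothetical.

```python
from typing import Set


class ModelUnavailable(Exception):
    """Raised when a model is not in the store and fallback is disabled."""


def resolve_model_source(
    model_id: str, store_models: Set[str], allow_hf_fallback: bool
) -> str:
    """Decide where a node should get model files from."""
    if model_id in store_models:
        return "store"  # stage from the store host over the LAN
    if allow_hf_fallback:
        return "huggingface"  # first-time download path
    raise ModelUnavailable(
        f"{model_id} is not in the store and allow_hf_fallback is false"
    )


in_store = {"example-org/model-a"}
print(resolve_model_source("example-org/model-a", in_store, True))
print(resolve_model_source("example-org/model-b", in_store, True))
```

With allow_hf_fallback set to false, the second call would raise instead of reaching out to Hugging Face, which is the stricter offline behavior described earlier.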
Useful Store Endpoints
These are exposed through the main Skulk API:
- GET /store/health
- GET /store/registry
- GET /store/downloads
- POST /store/models/{model_id}/download
- GET /store/models/{model_id}/download/status
- DELETE /store/models/{model_id}
- POST /store/purge-staging
- POST /store/models/{model_id}/optimize
The dashboard's Store Registry view combines these registry entries with model
metadata so it can show capability-derived tags for downloaded models. Today
that includes vision, thinking, embedding, tensor, and optiq when the
underlying model card exposes enough metadata for Skulk to derive them.
Common meanings:
- 503 Store not configured: the cluster is not configured to use a model store
- 503 Store unreachable: the store is configured, but the API cannot reach it
- 404: the model or job does not exist
- 409: a conflicting operation is already in progress
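A client script can translate these responses into readable messages. The sketch below assumes the 503 detail text contains the phrases listed above; the exact response body format is not specified here, and describe_store_response is a hypothetical helper.

```python
def describe_store_response(status: int, detail: str = "") -> str:
    """Map a store endpoint status code to the meaning given in this guide."""
    text = detail.lower()
    if status == 503 and "not configured" in text:
        return "The cluster is not configured to use a model store."
    if status == 503 and "unreachable" in text:
        return "The store is configured, but the API cannot reach it."
    if status == 404:
        return "The model or job does not exist."
    if status == 409:
        return "A conflicting operation is already in progress."
    if 200 <= status < 300:
        return "OK"
    return f"Unexpected status {status}"


print(describe_store_response(503, "Store unreachable"))
print(describe_store_response(409))
```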
Troubleshooting
The store host seems unreachable
Check:
- that the store host is running
- that store_host matches the real hostname
- that port 58080 is reachable on your LAN
Useful check:
curl http://STORE_HOST:58080/health
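The same check can be scripted from a worker node, for example to run it against several hosts. This sketch mirrors the curl command using only the standard library; store_health_url and store_is_healthy are hypothetical helpers, and treating any 2xx response as healthy is an assumption.

```python
import http.client
import urllib.request


def store_health_url(host: str, port: int = 58080) -> str:
    """Build the store server health URL used in the curl check."""
    return f"http://{host}:{port}/health"


def store_is_healthy(host: str, port: int = 58080, timeout: float = 3.0) -> bool:
    """Return True if the store server answers the health URL with 2xx."""
    try:
        with urllib.request.urlopen(
            store_health_url(host, port), timeout=timeout
        ) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # Covers DNS failures, refused connections, timeouts, HTTP errors.
        return False


print(store_health_url("mac-studio-1"))
```

Calling store_is_healthy with your store_host value from a worker node returns False for an unresolvable hostname, a closed port, or a timeout, which matches the three items in the checklist above.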
The model is on disk but does not appear in the store registry
Check:
- that the model is in the configured store_path
- that the registry knows about it
- that the dashboard Store Registry view shows it
Useful check:
curl http://localhost:52415/store/registry
Nodes still download from Hugging Face
Check:
- whether the model is already present in the store
- whether allow_hf_fallback is still true
- whether the store host is reachable from worker nodes
A multimodal model is in the store but the UI does not show vision support
Check:
- that the model card includes the vision capability
- that the dashboard is running a current Skulk build
- that GET /v1/models returns a vision tag for that model
Remember that store registration only tracks artifacts and metadata. Actual image understanding still depends on launching the model and sending a multimodal request through the chat APIs.
Placements are slow even though the model is already in the store
Check whether cleanup_on_deactivate is enabled. If it is true, each model
deactivation removes the local staged copy, so the next placement must copy the
model from the store host again before MLX can load it. Set it to false for
normal clusters and purge staging caches explicitly when disk pressure requires
it.
Staged files are not being cleaned up
This is usually expected. Skulk keeps staged files by default so repeated
placements are warm. If you need disk space back, either enable
cleanup_on_deactivate for that node or use the model-store purge action.
Good Defaults for Most Clusters
- use the dashboard to manage the store config
- choose one machine with the most storage as the store host
- keep allow_hf_fallback: true while you are getting started
- use a fast local staging path on worker nodes
- point the store host's node_cache_path at the store itself