NodeDiagnostics

Read-only diagnostic bundle for one Skulk node.

generatedAt (string) required

UTC timestamp when this bundle was built.

runtime object required

Runtime identity and config.

nodeId (string) required

Local node ID.

hostname (string) required

Local hostname.

friendlyName object

Friendly node name from gathered identity data.

anyOf
string
isMaster (boolean) required

Whether this node is the current master.

masterNodeId object

Current master node ID, when known.

anyOf
string
cwd (string) required

Current working directory of the API process.

configPath (string) required

Config path resolved by this API process.

configFileExists (boolean) required

Whether the resolved config path exists when checked from this process's cwd.

skulkVersion (string) required

Installed Skulk package version.

skulkCommit (string) required

Git commit reported by node identity.

libp2PNamespace object

Configured libp2p namespace environment value, if set.

anyOf
string
pythonUnbuffered (boolean) required

Whether PYTHONUNBUFFERED is enabled for this process.

tracingEnabled (boolean) required

Current cluster runtime tracing state as seen by this API node.

structuredLoggingConfigured (boolean) required

Whether config enables centralized structured logging.

loggingIngestUrl object

Configured centralized logging ingest URL, when present.

anyOf
string
identity object

Last gathered node identity data.

anyOf
modelId (string)
Default value: Unknown
chipId (string)
Default value: Unknown
friendlyName (string)
Default value: Unknown
osVersion (string)
Default value: Unknown
osBuildVersion (string)
Default value: Unknown
skulkVersion (string)
Default value: Unknown
skulkCommit (string)
Default value: Unknown
resources object required

Resource readings.

gatheredMemory object

Last event-sourced memory reading for this node.

anyOf
ramTotal object required
inBytes (integer)
Default value: 0
ramAvailable object required
inBytes (integer)
Default value: 0
swapTotal object required
inBytes (integer)
Default value: 0
swapAvailable object required
inBytes (integer)
Default value: 0
currentMemory object

Live memory reading from the API process.

anyOf
ramTotal object required
inBytes (integer)
Default value: 0
ramAvailable object required
inBytes (integer)
Default value: 0
swapTotal object required
inBytes (integer)
Default value: 0
swapAvailable object required
inBytes (integer)
Default value: 0
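
As a rough illustration of how the event-sourced and live readings can be compared, the sketch below computes the fraction of RAM in use from a gatheredMemory or currentMemory object. The helper name ram_used_fraction and the sample byte counts are hypothetical, and the code assumes that a ramTotal of 0 (the documented default) means no usable reading.

```python
from typing import Optional

GIB = 2 ** 30


def ram_used_fraction(reading: Optional[dict]) -> Optional[float]:
    """Return used RAM as a fraction of total, or None without a usable reading."""
    if not reading:
        return None  # gatheredMemory / currentMemory may be absent
    total = reading["ramTotal"].get("inBytes", 0)
    available = reading["ramAvailable"].get("inBytes", 0)
    if total <= 0:
        return None  # assumption: the default of 0 means "not measured"
    return (total - available) / total


# Hypothetical sample shaped like the "resources" object.
resources = {
    "gatheredMemory": {
        "ramTotal": {"inBytes": 64 * GIB},
        "ramAvailable": {"inBytes": 32 * GIB},
        "swapTotal": {"inBytes": 0},
        "swapAvailable": {"inBytes": 0},
    },
    "currentMemory": {
        "ramTotal": {"inBytes": 64 * GIB},
        "ramAvailable": {"inBytes": 16 * GIB},
        "swapTotal": {"inBytes": 0},
        "swapAvailable": {"inBytes": 0},
    },
}

gathered = ram_used_fraction(resources.get("gatheredMemory"))
current = ram_used_fraction(resources.get("currentMemory"))
# A large gap suggests the event-sourced reading is stale relative to the live one.
drift = current - gathered
print(f"gathered={gathered:.2f} current={current:.2f} drift={drift:+.2f}")
```

Comparing the two values is only a hint: the gathered reading reflects the last event, while the current reading is taken when the bundle is built.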
currentWired object

Live OS-level wired (unpageable) memory in use (macOS only). It is read locally by this endpoint and deliberately not carried on the gossiped MemoryUsage, whose schema rides extra=forbid events. Its purpose is to detect leaked wired memory after an abnormal Metal termination (#239).

anyOf
inBytes (integer)
Default value: 0
disk object

Last event-sourced disk reading for this node.

anyOf
total object required
inBytes (integer)
Default value: 0
available object required
inBytes (integer)
Default value: 0
system object

Last event-sourced system performance reading.

anyOf
gpuUsage (number)
Default value: 0
temp (number)
Default value: 0
sysPower (number)
Default value: 0
pcpuUsage (number)
Default value: 0
ecpuUsage (number)
Default value: 0
accelerator object
anyOf
vendor (string)

Possible values: [apple, amd, nvidia, intel, cpu, unknown]

Default value: unknown
name (string)
Default value: Unknown
utilizationRatio object
anyOf
number
vramTotalBytes object
anyOf
integer
vramUsedBytes object
anyOf
integer
gttTotalBytes object

GPU-mappable host (GTT) memory, for unified-memory APUs (e.g. AMD Strix Halo). On such a node the GPU addresses system RAM beyond the BIOS VRAM carve-out through GTT, so the usable GPU pool is far larger than vramTotalBytes; placement uses this value to admit large models on a UMA node. Null on discrete GPUs and on collectors that do not report it.

anyOf
integer
powerWatts object
anyOf
number
temperatureCelsius object
anyOf
number
clockMhz object
anyOf
integer
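
To make the gttTotalBytes note concrete, here is a minimal sketch of how a client might estimate the GPU-addressable memory of a node from the accelerator object. The function name gpu_addressable_bytes is hypothetical, and taking the larger of the two values is an assumption made for illustration; the exact rule Skulk's placement logic applies is not described in this schema.

```python
from typing import Optional

GIB = 2 ** 30


def gpu_addressable_bytes(accelerator: dict) -> Optional[int]:
    """Estimate GPU-addressable memory from an accelerator reading.

    Assumption: on a UMA node gttTotalBytes dominates the small VRAM
    carve-out, so the larger reported value is used as the estimate.
    """
    reported = [
        value
        for value in (
            accelerator.get("vramTotalBytes"),
            accelerator.get("gttTotalBytes"),
        )
        if value is not None
    ]
    return max(reported) if reported else None


# Hypothetical readings: a UMA APU with a small carve-out, and a discrete GPU.
uma_node = {"vendor": "amd", "vramTotalBytes": GIB // 2, "gttTotalBytes": 96 * GIB}
discrete = {"vendor": "nvidia", "vramTotalBytes": 24 * GIB, "gttTotalBytes": None}

uma_pool = gpu_addressable_bytes(uma_node)
discrete_pool = gpu_addressable_bytes(discrete)
print(uma_pool // GIB, discrete_pool // GIB)
```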
network object

Last event-sourced network interface reading.

anyOf
interfaces object[]
  • Array [
  • name (string) required
    ipAddress (string) required
    interfaceType (string)

    Possible values: [wifi, ethernet, maybe_ethernet, thunderbolt, unknown]

    Default value: unknown
  • ]
  • processes object[]

    Relevant local OS processes.

  • Array [
  • pid (integer) required

    Operating-system process ID.

    parentPid object

    Operating-system parent process ID, when visible.

    anyOf
    integer
    role (string) required

    Best-effort Skulk role inferred from process lineage and command line.

    Possible values: [skulk, runner, vector, python, other]

    command (string) required

    Joined process command line.

    status object

    Operating-system process status such as running or sleeping.

    anyOf
    string
    cpuPercent object

    Recent CPU percentage reported by psutil.

    anyOf
    number
    memoryPercent object

    Percent of physical memory used by this process.

    anyOf
    number
    rss object

    Resident set size for this process, when available.

    anyOf
    inBytes (integer)
    Default value: 0
    elapsedSeconds object

    Seconds since process creation, when available.

    anyOf
    number
    isChildOfSkulk (boolean)

    Whether this process is in the current Skulk API process tree.

    Default value: false
  • ]
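
One way the processes array can be used is to look for runner-like processes that are no longer part of the current Skulk API process tree. The sketch below is illustrative only: the helper name possible_orphans and the sample entries are hypothetical, and because role is a best-effort inference the result should be read as a hint, not a verdict.

```python
from typing import Dict, List


def possible_orphans(processes: List[Dict]) -> List[Dict]:
    """Return runner-role processes outside the current Skulk process tree."""
    return [
        proc
        for proc in processes
        if proc["role"] == "runner" and not proc.get("isChildOfSkulk", False)
    ]


# Hypothetical sample shaped like the "processes" array.
processes = [
    {"pid": 100, "role": "skulk", "command": "skulk", "isChildOfSkulk": True},
    {"pid": 101, "parentPid": 100, "role": "runner", "command": "runner a",
     "isChildOfSkulk": True},
    {"pid": 202, "parentPid": 1, "role": "runner", "command": "runner b",
     "isChildOfSkulk": False},
]

orphans = possible_orphans(processes)
for proc in orphans:
    print(f"pid={proc['pid']} parent={proc.get('parentPid')} cmd={proc['command']}")
```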
  • supervisorRunners object[]

    Live local runner-supervisor diagnostics.

  • Array [
  • runnerId (string) required

    Runner ID.

    instanceId (string) required

    Instance ID.

    nodeId (string) required

    Node ID that owns this runner.

    modelId (string) required

    Model assigned to this runner.

    deviceRank (integer) required

    Distributed device rank.

    worldSize (integer) required

    Distributed world size.

    startLayer (integer) required

    Inclusive first model layer on this shard.

    endLayer (integer) required

    Exclusive final model layer on this shard.

    nLayers (integer) required

    Total number of model layers.

    pid object

    Runner subprocess PID, when started.

    anyOf
    integer
    processAlive (boolean) required

    Whether the runner subprocess is alive.

    exitCode object

    Runner subprocess exit code, when exited.

    anyOf
    integer
    statusKind (string) required

    Current runner status variant.

    statusSince (string) required

    UTC timestamp for the current status.

    secondsInStatus (number) required

    Wall-clock seconds spent in the current runner status.

    phase (string) required

    Last runner phase reported.

    Possible values: [created, idle, connect_group, load_model, warmup, task_submission, task_agreement, prompt_build, vision_preprocess, kv_cache_lookup, prefill_barrier, prefill_pipeline, prefill_stream, decode_barrier, decode_wait_first_token, decode_stream, parser, cancel_requested, cancel_observed, completion, error, shutdown_cleanup]

    phaseStartedAt (string) required

    UTC timestamp when the current phase started.

    secondsInPhase (number) required

    Wall-clock seconds spent in the current phase.

    lastProgressAt object

    UTC timestamp for the last flight-recorder update.

    anyOf
    string
    activeTaskId object

    Task ID associated with the current phase, when known.

    anyOf
    string
    activeCommandId object

    Command ID associated with the current phase, when known.

    anyOf
    string
    phaseDetail object

    Compact human-readable detail for the current phase.

    anyOf
    string
    lastMlxMemory object

    Most recent MLX memory snapshot reported by the runner.

    anyOf
    generatedAt (string) required

    UTC timestamp when the snapshot was taken.

    active object

    Currently active MLX memory, when the runtime exposes it.

    anyOf
    inBytes (integer)
    Default value: 0
    cache object

    MLX cache memory, when the runtime exposes it.

    anyOf
    inBytes (integer)
    Default value: 0
    peak object

    Peak MLX memory since the last reset, when available.

    anyOf
    inBytes (integer)
    Default value: 0
    wiredLimit object

    Configured MLX wired memory limit when known. Current MLX releases do not expose a getter on all platforms, so this may be null.

    anyOf
    inBytes (integer)
    Default value: 0
    source (string) required

    Runtime module that supplied the measurement, such as mlx.core.

    flightRecorder object[]

    Last 128 local-only runner diagnostic events.

  • Array [
  • at (string) required

    UTC timestamp when the runner emitted the update.

    phase (string) required

    Runner phase at this entry.

    Possible values: [created, idle, connect_group, load_model, warmup, task_submission, task_agreement, prompt_build, vision_preprocess, kv_cache_lookup, prefill_barrier, prefill_pipeline, prefill_stream, decode_barrier, decode_wait_first_token, decode_stream, parser, cancel_requested, cancel_observed, completion, error, shutdown_cleanup]

    event (string) required

    Short event name within the phase.

    detail object

    Compact human-readable detail for diagnostics.

    anyOf
    string
    attrs object

    Structured low-cardinality diagnostic attributes.

    property name* object
    anyOf
    string
    context object required

    Stable runner identity fields for this entry.

    nodeId (string) required

    Node ID that owns this runner.

    runnerId (string) required

    Runner ID.

    pid object

    Runner subprocess PID.

    anyOf
    integer
    instanceId (string) required

    Instance ID.

    modelId (string) required

    Model assigned to this runner.

    rank (integer) required

    Distributed rank for this runner.

    worldSize (integer) required

    Distributed world size.

    startLayer (integer) required

    Inclusive first layer on this shard.

    endLayer (integer) required

    Exclusive final layer on this shard.

    nLayers (integer) required

    Total model layers.

    taskId object

    Task ID associated with the entry, when known.

    anyOf
    string
    commandId object

    Command ID associated with the entry, when known.

    anyOf
    string
    mlxMemory object

    MLX memory snapshot captured with this entry, when present.

    anyOf
    generatedAt (string) required

    UTC timestamp when the snapshot was taken.

    active object

    Currently active MLX memory, when the runtime exposes it.

    anyOf
    inBytes (integer)
    Default value: 0
    cache object

    MLX cache memory, when the runtime exposes it.

    anyOf
    inBytes (integer)
    Default value: 0
    peak object

    Peak MLX memory since the last reset, when available.

    anyOf
    inBytes (integer)
    Default value: 0
    wiredLimit object

    Configured MLX wired memory limit when known. Current MLX releases do not expose a getter on all platforms, so this may be null.

    anyOf
    inBytes (integer)
    Default value: 0
    source (string) required

    Runtime module that supplied the measurement, such as mlx.core.

  • ]
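
Because each flightRecorder entry carries a timestamp and a phase, consecutive entries can be turned into a rough per-phase time breakdown. The sketch below is a hedged illustration: it assumes the at field is an ISO 8601 string (the schema only says "UTC timestamp"), and the helper names and the event names in the sample are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List


def parse_utc(timestamp: str) -> datetime:
    # Assumption: ISO 8601 text; a trailing "Z" is normalised so that
    # datetime.fromisoformat also accepts it on older Python versions.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def seconds_per_phase(entries: List[Dict]) -> Dict[str, float]:
    """Attribute the gap between consecutive entries to the earlier entry's phase."""
    ordered = sorted(entries, key=lambda entry: parse_utc(entry["at"]))
    totals: Dict[str, float] = defaultdict(float)
    for earlier, later in zip(ordered, ordered[1:]):
        gap = parse_utc(later["at"]) - parse_utc(earlier["at"])
        totals[earlier["phase"]] += gap.total_seconds()
    return dict(totals)


# Hypothetical entries; only "at" and "phase" are used here.
entries = [
    {"at": "2025-01-01T00:00:00Z", "phase": "load_model", "event": "begin"},
    {"at": "2025-01-01T00:00:30Z", "phase": "warmup", "event": "begin"},
    {"at": "2025-01-01T00:00:35Z", "phase": "idle", "event": "ready"},
]

breakdown = seconds_per_phase(entries)
print(breakdown)
```

The final entry contributes no duration because nothing follows it; secondsInPhase on the runner covers the phase that is still in progress. Since only the last 128 events are retained, the breakdown covers a recent window rather than the runner's full lifetime.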
  • pendingTaskIds string[]

    Tasks sent to the supervisor but not acknowledged by the runner.

    inProgressTasks object[]

    Tasks currently known as in progress by the supervisor.

  • Array [
  • taskId (string) required

    Skulk task ID.

    taskKind (string) required

    Concrete task model name.

    taskStatus (string) required

    Current event-sourced task status.

    instanceId (string) required

    Instance associated with the task.

    commandId object

    External command ID for user-facing inference tasks.

    anyOf
    string
    runnerId object

    Runner assigned to the task, if known.

    anyOf
    string
    modelId object

    Model associated with the task, if known.

    anyOf
    string
  • ]
  • completedTaskCount (integer) required

    Number of tasks completed by this supervisor.

    cancelledTaskIds string[]

    Task IDs cancelled through this supervisor.

    lastTaskSentAt object

    UTC timestamp for the last task submitted to the runner.

    anyOf
    string
    lastEventReceivedAt object

    UTC timestamp for the last event received from the runner.

    anyOf
    string
    lastEventType object

    Class name of the last event received from the runner.

    anyOf
    string
    milestones object[]

    Recent lifecycle milestones retained by the supervisor.

  • Array [
  • at (string) required

    UTC timestamp when the milestone was recorded.

    name (string) required

    Short milestone name.

    detail object

    Optional compact detail for the milestone.

    anyOf
    string
  • ]
  • ]
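
The supervisorRunners fields lend themselves to a simple "is anything stuck?" check. The following sketch flags runners whose subprocess is not alive, or that have spent a long time in a non-resting phase. The set of resting phases, the 120-second threshold, and the function name are all assumptions chosen for illustration, not values defined by Skulk.

```python
from typing import Dict, List, Tuple

# Assumption: phases in which a long dwell time is not suspicious.
RESTING_PHASES = {"created", "idle"}


def suspicious_runners(
    runners: List[Dict], max_phase_seconds: float = 120.0
) -> List[Tuple[str, str]]:
    """Return (runnerId, reason) pairs for runners that deserve a closer look."""
    findings = []
    for runner in runners:
        if not runner["processAlive"]:
            reason = f"process not alive (exitCode={runner.get('exitCode')})"
            findings.append((runner["runnerId"], reason))
        elif (
            runner["phase"] not in RESTING_PHASES
            and runner["secondsInPhase"] > max_phase_seconds
        ):
            reason = f"in {runner['phase']} for {runner['secondsInPhase']:.0f}s"
            findings.append((runner["runnerId"], reason))
    return findings


# Hypothetical sample with only the fields this check reads.
runners = [
    {"runnerId": "r0", "processAlive": True, "exitCode": None,
     "phase": "idle", "secondsInPhase": 900.0},
    {"runnerId": "r1", "processAlive": True, "exitCode": None,
     "phase": "prefill_barrier", "secondsInPhase": 300.0},
    {"runnerId": "r2", "processAlive": False, "exitCode": -9,
     "phase": "error", "secondsInPhase": 5.0},
]

findings = suspicious_runners(runners)
for runner_id, reason in findings:
    print(runner_id, reason)
```

A flagged runner is a starting point: phaseDetail, pendingTaskIds, and the flightRecorder entries for the same runner give the surrounding context.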
  • placements object[]

    Event-sourced placement analysis for current instances.

  • Array [
  • instanceId (string) required

    Instance ID.

    modelId (string) required

    Placed model ID.

    masterNodeId object

    Current master node ID, when known.

    anyOf
    string
    masterIsPlacementNode (boolean) required

    Whether the current master is part of this model placement.

    localNodeIsPlacementNode (boolean) required

    Whether the API node is part of this model placement.

    placementNodeIds string[]

    Node IDs participating in the placement.

    runners object[]

    Per-runner placement details.

  • Array [
  • runnerId (string) required

    Runner ID.

    nodeId (string) required

    Node ID assigned to this runner.

    friendlyName object

    Friendly node name, when known.

    anyOf
    string
    statusKind object

    Current event-sourced runner status variant.

    anyOf
    string
    deviceRank (integer) required

    Distributed device rank.

    worldSize (integer) required

    Distributed world size.

    startLayer (integer) required

    Inclusive first model layer on this shard.

    endLayer (integer) required

    Exclusive final model layer on this shard.

    nLayers (integer) required

    Total number of model layers.

    isLocal (boolean) required

    Whether this assignment is on the API node.

    isMaster (boolean) required

    Whether this assignment is on the master node.

    tasks object[]

    Event-sourced tasks associated with this runner assignment.

  • Array [
  • taskId (string) required

    Skulk task ID.

    taskKind (string) required

    Concrete task model name.

    taskStatus (string) required

    Current event-sourced task status.

    instanceId (string) required

    Instance associated with the task.

    commandId object

    External command ID for user-facing inference tasks.

    anyOf
    string
    runnerId object

    Runner assigned to the task, if known.

    anyOf
    string
    modelId object

    Model associated with the task, if known.

    anyOf
    string
  • ]
  • ]
  • warnings string[]

    Heuristic warnings that may help explain a stuck placement.

  • ]
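
Since startLayer is inclusive and endLayer is exclusive, the runners of one placement describe half-open layer ranges. The sketch below checks whether those ranges tile [0, nLayers) without gaps or overlaps. It assumes pipeline-style sharding in which each runner holds a distinct contiguous slice; other sharding schemes may legitimately fail this check. The function name and sample data are hypothetical.

```python
from typing import Dict, List


def layer_coverage_problems(runners: List[Dict]) -> List[str]:
    """Describe gaps or overlaps in the half-open [startLayer, endLayer) ranges."""
    if not runners:
        return ["placement has no runners"]
    ordered = sorted(runners, key=lambda runner: runner["startLayer"])
    n_layers = ordered[0]["nLayers"]
    problems = []
    expected_start = 0
    for runner in ordered:
        if runner["startLayer"] != expected_start:
            problems.append(
                f"{runner['runnerId']}: starts at {runner['startLayer']}, "
                f"expected {expected_start}"
            )
        # endLayer is exclusive, so the next shard should start exactly here.
        expected_start = runner["endLayer"]
    if expected_start != n_layers:
        problems.append(f"coverage ends at {expected_start}, expected {n_layers}")
    return problems


# Hypothetical two-runner placements of a 32-layer model.
complete = [
    {"runnerId": "r0", "startLayer": 0, "endLayer": 16, "nLayers": 32},
    {"runnerId": "r1", "startLayer": 16, "endLayer": 32, "nLayers": 32},
]
gapped = [
    {"runnerId": "r0", "startLayer": 0, "endLayer": 16, "nLayers": 32},
    {"runnerId": "r1", "startLayer": 20, "endLayer": 32, "nLayers": 32},
]

print(layer_coverage_problems(complete))
print(layer_coverage_problems(gapped))
```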
  • warnings string[]

    Top-level diagnostic warnings for this node.

NodeDiagnostics
{
  "generatedAt": "string",
  "runtime": {
    "nodeId": "string",
    "hostname": "string",
    "friendlyName": "string",
    "isMaster": true,
    "masterNodeId": "string",
    "cwd": "string",
    "configPath": "string",
    "configFileExists": true,
    "skulkVersion": "string",
    "skulkCommit": "string",
    "libp2PNamespace": "string",
    "pythonUnbuffered": true,
    "tracingEnabled": true,
    "structuredLoggingConfigured": true,
    "loggingIngestUrl": "string"
  },
  "identity": {
    "modelId": "Unknown",
    "chipId": "Unknown",
    "friendlyName": "Unknown",
    "osVersion": "Unknown",
    "osBuildVersion": "Unknown",
    "skulkVersion": "Unknown",
    "skulkCommit": "Unknown"
  },
  "resources": {
    "gatheredMemory": {
      "ramTotal": {
        "inBytes": 0
      },
      "ramAvailable": {
        "inBytes": 0
      },
      "swapTotal": {
        "inBytes": 0
      },
      "swapAvailable": {
        "inBytes": 0
      }
    },
    "currentMemory": {
      "ramTotal": {
        "inBytes": 0
      },
      "ramAvailable": {
        "inBytes": 0
      },
      "swapTotal": {
        "inBytes": 0
      },
      "swapAvailable": {
        "inBytes": 0
      }
    },
    "currentWired": {
      "inBytes": 0
    },
    "disk": {
      "total": {
        "inBytes": 0
      },
      "available": {
        "inBytes": 0
      }
    },
    "system": {
      "gpuUsage": 0,
      "temp": 0,
      "sysPower": 0,
      "pcpuUsage": 0,
      "ecpuUsage": 0,
      "accelerator": {
        "vendor": "unknown",
        "name": "Unknown",
        "utilizationRatio": 0,
        "vramTotalBytes": 0,
        "vramUsedBytes": 0,
        "gttTotalBytes": 0,
        "powerWatts": 0,
        "temperatureCelsius": 0,
        "clockMhz": 0
      }
    },
    "network": {
      "interfaces": [
        {
          "name": "string",
          "ipAddress": "string",
          "interfaceType": "unknown"
        }
      ]
    }
  },
  "processes": [
    {
      "pid": 0,
      "parentPid": 0,
      "role": "skulk",
      "command": "string",
      "status": "string",
      "cpuPercent": 0,
      "memoryPercent": 0,
      "rss": {
        "inBytes": 0
      },
      "elapsedSeconds": 0,
      "isChildOfSkulk": false
    }
  ],
  "supervisorRunners": [
    {
      "runnerId": "string",
      "instanceId": "string",
      "nodeId": "string",
      "modelId": "string",
      "deviceRank": 0,
      "worldSize": 0,
      "startLayer": 0,
      "endLayer": 0,
      "nLayers": 0,
      "pid": 0,
      "processAlive": true,
      "exitCode": 0,
      "statusKind": "string",
      "statusSince": "string",
      "secondsInStatus": 0,
      "phase": "created",
      "phaseStartedAt": "string",
      "secondsInPhase": 0,
      "lastProgressAt": "string",
      "activeTaskId": "string",
      "activeCommandId": "string",
      "phaseDetail": "string",
      "lastMlxMemory": {
        "generatedAt": "string",
        "active": {
          "inBytes": 0
        },
        "cache": {
          "inBytes": 0
        },
        "peak": {
          "inBytes": 0
        },
        "wiredLimit": {
          "inBytes": 0
        },
        "source": "string"
      },
      "flightRecorder": [
        {
          "at": "string",
          "phase": "created",
          "event": "string",
          "detail": "string",
          "attrs": {},
          "context": {
            "nodeId": "string",
            "runnerId": "string",
            "pid": 0,
            "instanceId": "string",
            "modelId": "string",
            "rank": 0,
            "worldSize": 0,
            "startLayer": 0,
            "endLayer": 0,
            "nLayers": 0
          },
          "taskId": "string",
          "commandId": "string",
          "mlxMemory": {
            "generatedAt": "string",
            "active": {
              "inBytes": 0
            },
            "cache": {
              "inBytes": 0
            },
            "peak": {
              "inBytes": 0
            },
            "wiredLimit": {
              "inBytes": 0
            },
            "source": "string"
          }
        }
      ],
      "pendingTaskIds": [
        "string"
      ],
      "inProgressTasks": [
        {
          "taskId": "string",
          "taskKind": "string",
          "taskStatus": "string",
          "instanceId": "string",
          "commandId": "string",
          "runnerId": "string",
          "modelId": "string"
        }
      ],
      "completedTaskCount": 0,
      "cancelledTaskIds": [
        "string"
      ],
      "lastTaskSentAt": "string",
      "lastEventReceivedAt": "string",
      "lastEventType": "string",
      "milestones": [
        {
          "at": "string",
          "name": "string",
          "detail": "string"
        }
      ]
    }
  ],
  "placements": [
    {
      "instanceId": "string",
      "modelId": "string",
      "masterNodeId": "string",
      "masterIsPlacementNode": true,
      "localNodeIsPlacementNode": true,
      "placementNodeIds": [
        "string"
      ],
      "runners": [
        {
          "runnerId": "string",
          "nodeId": "string",
          "friendlyName": "string",
          "statusKind": "string",
          "deviceRank": 0,
          "worldSize": 0,
          "startLayer": 0,
          "endLayer": 0,
          "nLayers": 0,
          "isLocal": true,
          "isMaster": true,
          "tasks": [
            {
              "taskId": "string",
              "taskKind": "string",
              "taskStatus": "string",
              "instanceId": "string",
              "commandId": "string",
              "runnerId": "string",
              "modelId": "string"
            }
          ]
        }
      ],
      "warnings": [
        "string"
      ]
    }
  ],
  "warnings": [
    "string"
  ]
}
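
A client consuming this bundle might start with a light sanity check before reading nested fields. The sketch below parses a bundle, reports which of the top-level fields marked required in the schema (generatedAt, runtime, resources) are missing, and computes the bundle's age. The helper names and the sample payload are hypothetical, and the code assumes generatedAt is an ISO 8601 string, which the schema does not state explicitly.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List, Optional

# Top-level fields marked "required" in the schema above.
REQUIRED_TOP_LEVEL = ("generatedAt", "runtime", "resources")


def missing_required(bundle: Dict) -> List[str]:
    """List required top-level fields that are absent from the bundle."""
    return [key for key in REQUIRED_TOP_LEVEL if key not in bundle]


def bundle_age_seconds(bundle: Dict, now: Optional[datetime] = None) -> float:
    """Seconds elapsed since generatedAt (assumed to be ISO 8601, UTC)."""
    generated = datetime.fromisoformat(bundle["generatedAt"].replace("Z", "+00:00"))
    if generated.tzinfo is None:
        # The schema says the timestamp is UTC, so treat naive values as UTC.
        generated = generated.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (now - generated).total_seconds()


# Hypothetical, deliberately incomplete payload.
raw = """
{
  "generatedAt": "2025-01-01T00:00:00Z",
  "runtime": {"nodeId": "node-a", "isMaster": true},
  "warnings": []
}
"""

bundle = json.loads(raw)
missing = missing_required(bundle)
age = bundle_age_seconds(
    bundle, now=datetime(2025, 1, 1, 0, 1, 0, tzinfo=timezone.utc)
)
print(f"missing={missing} age={age:.0f}s")
```

Passing now explicitly keeps the age calculation deterministic in tests; in normal use it can be omitted so the current UTC time is used.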