RunnerSupervisorDiagnostics

Live runner-supervisor state that is not event-sourced.

runnerId (string, required)

Runner ID.

instanceId (string, required)

Instance ID.

nodeId (string, required)

Node ID that owns this runner.

modelId (string, required)

Model assigned to this runner.

deviceRank (integer, required)

Distributed device rank.

worldSize (integer, required)

Distributed world size.

startLayer (integer, required)

Inclusive first model layer on this shard.

endLayer (integer, required)

Exclusive end layer of this shard (one past the last layer it holds).

nLayers (integer, required)

Total number of model layers.
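
Because startLayer is inclusive and endLayer is exclusive, the shard covers a half-open range. The following is a minimal sketch of how a client might work with that range; the `ShardRange` class and its helpers are hypothetical names introduced here for illustration, and treating an empty range as invalid is an assumption rather than a documented rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardRange:
    """Half-open layer range [start_layer, end_layer) out of n_layers (hypothetical helper)."""

    start_layer: int
    end_layer: int
    n_layers: int

    def layer_count(self) -> int:
        # endLayer is exclusive, so no +1 is needed.
        return self.end_layer - self.start_layer

    def layers(self) -> range:
        # Python's range is also half-open, so the fields map directly.
        return range(self.start_layer, self.end_layer)

    def is_valid(self) -> bool:
        # Assumption: a shard must hold at least one layer and stay within the model.
        return 0 <= self.start_layer < self.end_layer <= self.n_layers


def shard_from_diagnostics(diag: dict) -> ShardRange:
    """Build a ShardRange from a RunnerSupervisorDiagnostics payload."""
    return ShardRange(diag["startLayer"], diag["endLayer"], diag["nLayers"])
```

For example, a payload with startLayer 8, endLayer 16, and nLayers 32 describes a shard holding layers 8 through 15, eight layers in total.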

pid (anyOf: integer)

Runner subprocess PID, when started.

processAlive (boolean, required)

Whether the runner subprocess is alive.

exitCode (anyOf: integer)

Runner subprocess exit code, when exited.

statusKind (string, required)


Current runner status variant.

statusSince (string, required)

UTC timestamp at which the runner entered the current status.

secondsInStatus (number, required)

Wall-clock seconds spent in the current runner status.

phase (string, required)

Last runner phase reported.

Possible values: [created, idle, connect_group, load_model, warmup, task_submission, task_agreement, prompt_build, vision_preprocess, kv_cache_lookup, prefill_barrier, prefill_pipeline, prefill_stream, decode_barrier, decode_wait_first_token, decode_stream, parser, cancel_requested, cancel_observed, completion, error, shutdown_cleanup]

phaseStartedAt (string, required)

UTC timestamp when the current phase started.

secondsInPhase (number, required)

Wall-clock seconds spent in the current phase.

lastProgressAt (anyOf: string)

UTC timestamp for the last flight-recorder update.

activeTaskId (anyOf: string)

Task ID associated with the current phase, when known.

activeCommandId (anyOf: string)

Command ID associated with the current phase, when known.

phaseDetail (anyOf: string)

Compact human-readable detail for the current phase.

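
The phase and timing fields above lend themselves to a simple stall check. The sketch below is one possible heuristic, not behavior defined by this schema: the `looks_stalled` and `describe` functions, the 120-second threshold, and the choice of which phases count as idle are all assumptions made for illustration.

```python
# Assumption: these phases are expected to last indefinitely without progress.
IDLE_PHASES = {"created", "idle"}


def looks_stalled(diag: dict, max_phase_seconds: float = 120.0) -> bool:
    """Heuristic: a live runner stuck in a non-idle phase for too long."""
    if not diag["processAlive"]:
        # A dead subprocess is a different failure; check exitCode instead.
        return False
    if diag["phase"] in IDLE_PHASES:
        return False
    return diag["secondsInPhase"] > max_phase_seconds


def describe(diag: dict) -> str:
    """One-line summary that skips fields whose value is missing or null."""
    parts = [f'{diag["runnerId"]}: {diag["phase"]} for {diag["secondsInPhase"]:.1f}s']
    if diag.get("activeTaskId") is not None:
        parts.append(f'task={diag["activeTaskId"]}')
    if diag.get("phaseDetail") is not None:
        parts.append(f'({diag["phaseDetail"]})')
    return " ".join(parts)
```

The check relies only on secondsInPhase, so it does not need to parse any timestamp string. A sensible threshold likely depends on the phase, since load_model may legitimately take far longer than decode_stream.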
lastMlxMemory (anyOf: object)

Most recent MLX memory snapshot reported by the runner.

    generatedAt (string, required)

    UTC timestamp when the snapshot was taken.

    active (anyOf: object)

    Currently active MLX memory, when the runtime exposes it.

        inBytes (integer)
        Default value: 0

    cache (anyOf: object)

    MLX cache memory, when the runtime exposes it.

        inBytes (integer)
        Default value: 0

    peak (anyOf: object)

    Peak MLX memory since the last reset, when available.

        inBytes (integer)
        Default value: 0

    wiredLimit (anyOf: object)

    Configured MLX wired memory limit when known. Current MLX releases do not expose a getter on all platforms, so this may be null.

        inBytes (integer)
        Default value: 0

    source (string, required)

    Runtime module that supplied the measurement, such as mlx.core.
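
Since the snapshot itself and each of its memory fields may be absent, a reader has to guard every level before using inBytes. The following sketch shows one way to do that; the helper names are hypothetical, and treating a missing key the same as a null value is an assumption.

```python
from typing import Optional

MEMORY_KEYS = ("active", "cache", "peak", "wiredLimit")


def bytes_of(snapshot: Optional[dict], key: str) -> Optional[int]:
    """Return inBytes for one memory field, or None when it is unavailable."""
    if snapshot is None:
        return None
    entry = snapshot.get(key)
    if entry is None:
        return None
    # The schema gives inBytes a default of 0.
    return entry.get("inBytes", 0)


def format_gib(n: Optional[int]) -> str:
    """Render a byte count in GiB, or 'n/a' when the runtime did not report it."""
    if n is None:
        return "n/a"
    return f"{n / 2**30:.2f} GiB"


def summarize_memory(snapshot: Optional[dict]) -> str:
    """Compact one-line summary of an MLX memory snapshot."""
    return ", ".join(f"{k}={format_gib(bytes_of(snapshot, k))}" for k in MEMORY_KEYS)
```

Keeping "not reported" distinct from zero avoids showing a wired limit of 0 bytes on platforms where the limit is simply unknown.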

flightRecorder object[]

Last 128 local-only runner diagnostic events.

  • Array [

    at (string, required)

    UTC timestamp when the runner emitted the update.

    phase (string, required)

    Runner phase at this entry.

    Possible values: [created, idle, connect_group, load_model, warmup, task_submission, task_agreement, prompt_build, vision_preprocess, kv_cache_lookup, prefill_barrier, prefill_pipeline, prefill_stream, decode_barrier, decode_wait_first_token, decode_stream, parser, cancel_requested, cancel_observed, completion, error, shutdown_cleanup]

    event (string, required)

    Short event name within the phase.

    detail (anyOf: string)

    Compact human-readable detail for diagnostics.

    attrs object

    Structured low-cardinality diagnostic attributes.

        property name* (anyOf: string)

    context (object, required)

    Stable runner identity fields for this entry.

        nodeId (string, required)

        Node ID that owns this runner.

        runnerId (string, required)

        Runner ID.

        pid (anyOf: integer)

        Runner subprocess PID.

        instanceId (string, required)

        Instance ID.

        modelId (string, required)

        Model assigned to this runner.

        rank (integer, required)

        Distributed rank for this runner.

        worldSize (integer, required)

        Distributed world size.

        startLayer (integer, required)

        Inclusive first layer on this shard.

        endLayer (integer, required)

        Exclusive final layer on this shard.

        nLayers (integer, required)

        Total model layers.

    taskId (anyOf: string)

    Task ID associated with the entry, when known.

    commandId (anyOf: string)

    Command ID associated with the entry, when known.

    mlxMemory (anyOf: object)

    MLX memory snapshot captured with this entry, when present.

        generatedAt (string, required)

        UTC timestamp when the snapshot was taken.

        active (anyOf: object)

        Currently active MLX memory, when the runtime exposes it.

            inBytes (integer)
            Default value: 0

        cache (anyOf: object)

        MLX cache memory, when the runtime exposes it.

            inBytes (integer)
            Default value: 0

        peak (anyOf: object)

        Peak MLX memory since the last reset, when available.

            inBytes (integer)
            Default value: 0

        wiredLimit (anyOf: object)

        Configured MLX wired memory limit when known. Current MLX releases do not expose a getter on all platforms, so this may be null.

            inBytes (integer)
            Default value: 0

        source (string, required)

        Runtime module that supplied the measurement, such as mlx.core.

  • ]
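
A list capped at the last 128 entries behaves like a bounded ring buffer: once it is full, each new entry displaces the oldest one. The sketch below illustrates that behavior with `collections.deque`; it is not the supervisor's actual implementation, and the `FlightRecorder` class and `entries_for_task` helper are hypothetical names.

```python
from collections import deque
from typing import Optional

# Capacity taken from the flightRecorder field description.
FLIGHT_RECORDER_CAPACITY = 128


class FlightRecorder:
    """Bounded buffer that keeps only the most recent entries (illustrative)."""

    def __init__(self, capacity: int = FLIGHT_RECORDER_CAPACITY) -> None:
        # deque with maxlen silently drops the oldest item when full.
        self._entries: deque = deque(maxlen=capacity)

    def record(self, entry: dict) -> None:
        self._entries.append(entry)

    def snapshot(self) -> list:
        # Oldest entry first, newest last.
        return list(self._entries)


def entries_for_task(entries: list, task_id: Optional[str]) -> list:
    """Filter flight-recorder entries down to those tagged with one task ID."""
    return [e for e in entries if e.get("taskId") == task_id]
```

One consequence for readers of this field is that a long-running task may have already lost its earliest entries by the time the diagnostics are fetched.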
pendingTaskIds string[]

Tasks sent to the supervisor but not acknowledged by the runner.

inProgressTasks object[]

Tasks currently known as in progress by the supervisor.

  • Array [

    taskId (string, required)

    Skulk task ID.

    taskKind (string, required)

    Concrete task model name.

    taskStatus (string, required)

    Current event-sourced task status.

    instanceId (string, required)

    Instance associated with the task.

    commandId (anyOf: string)

    External command ID for user-facing inference tasks.

    runnerId (anyOf: string)

    Runner assigned to the task, if known.

    modelId (anyOf: string)

    Model associated with the task, if known.

  • ]
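
The pending and in-progress lists together describe the supervisor's current backlog. As a rough sketch of how a dashboard might condense them, the hypothetical `task_backlog` function below counts both lists and groups in-progress tasks by taskKind; the task kind names in the usage note are invented for illustration.

```python
from collections import Counter


def task_backlog(diag: dict) -> dict:
    """Summarize pending and in-progress tasks from a diagnostics payload."""
    # Neither list is marked required, so tolerate a missing or null value.
    pending = diag.get("pendingTaskIds") or []
    in_progress = diag.get("inProgressTasks") or []
    return {
        "pending": len(pending),
        "inProgress": len(in_progress),
        "byKind": dict(Counter(task["taskKind"] for task in in_progress)),
    }
```

Given two in-progress tasks of a made-up kind "KindA" and one of "KindB", `byKind` would be `{"KindA": 2, "KindB": 1}`.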
completedTaskCount (integer, required)

Number of tasks completed by this supervisor.

cancelledTaskIds string[]

Task IDs cancelled through this supervisor.

lastTaskSentAt (anyOf: string)

UTC timestamp for the last task submitted to the runner.

lastEventReceivedAt (anyOf: string)

UTC timestamp for the last event received from the runner.

lastEventType (anyOf: string)

Class name of the last event received from the runner.

milestones object[]

Recent lifecycle milestones retained by the supervisor.

  • Array [

    at (string, required)

    UTC timestamp when the milestone was recorded.

    name (string, required)

    Short milestone name.

    detail (anyOf: string)

    Optional compact detail for the milestone.

  • ]
RunnerSupervisorDiagnostics

{
  "runnerId": "string",
  "instanceId": "string",
  "nodeId": "string",
  "modelId": "string",
  "deviceRank": 0,
  "worldSize": 0,
  "startLayer": 0,
  "endLayer": 0,
  "nLayers": 0,
  "pid": 0,
  "processAlive": true,
  "exitCode": 0,
  "statusKind": "string",
  "statusSince": "string",
  "secondsInStatus": 0,
  "phase": "created",
  "phaseStartedAt": "string",
  "secondsInPhase": 0,
  "lastProgressAt": "string",
  "activeTaskId": "string",
  "activeCommandId": "string",
  "phaseDetail": "string",
  "lastMlxMemory": {
    "generatedAt": "string",
    "active": {
      "inBytes": 0
    },
    "cache": {
      "inBytes": 0
    },
    "peak": {
      "inBytes": 0
    },
    "wiredLimit": {
      "inBytes": 0
    },
    "source": "string"
  },
  "flightRecorder": [
    {
      "at": "string",
      "phase": "created",
      "event": "string",
      "detail": "string",
      "attrs": {},
      "context": {
        "nodeId": "string",
        "runnerId": "string",
        "pid": 0,
        "instanceId": "string",
        "modelId": "string",
        "rank": 0,
        "worldSize": 0,
        "startLayer": 0,
        "endLayer": 0,
        "nLayers": 0
      },
      "taskId": "string",
      "commandId": "string",
      "mlxMemory": {
        "generatedAt": "string",
        "active": {
          "inBytes": 0
        },
        "cache": {
          "inBytes": 0
        },
        "peak": {
          "inBytes": 0
        },
        "wiredLimit": {
          "inBytes": 0
        },
        "source": "string"
      }
    }
  ],
  "pendingTaskIds": [
    "string"
  ],
  "inProgressTasks": [
    {
      "taskId": "string",
      "taskKind": "string",
      "taskStatus": "string",
      "instanceId": "string",
      "commandId": "string",
      "runnerId": "string",
      "modelId": "string"
    }
  ],
  "completedTaskCount": 0,
  "cancelledTaskIds": [
    "string"
  ],
  "lastTaskSentAt": "string",
  "lastEventReceivedAt": "string",
  "lastEventType": "string",
  "milestones": [
    {
      "at": "string",
      "name": "string",
      "detail": "string"
    }
  ]
}
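
A client that receives this payload may want to confirm the required top-level fields before relying on them. The following is a minimal sketch using only the standard library; `REQUIRED_FIELDS` is transcribed from the fields marked required in this schema, while the `load_diagnostics` and `missing_required` function names are hypothetical. It checks presence only, not types or nested objects.

```python
import json

# Top-level fields marked "required" in the RunnerSupervisorDiagnostics schema.
REQUIRED_FIELDS = (
    "runnerId", "instanceId", "nodeId", "modelId",
    "deviceRank", "worldSize", "startLayer", "endLayer", "nLayers",
    "processAlive", "statusKind", "statusSince", "secondsInStatus",
    "phase", "phaseStartedAt", "secondsInPhase", "completedTaskCount",
)


def missing_required(payload: dict) -> list:
    """Return the names of required top-level fields absent from the payload."""
    return [name for name in REQUIRED_FIELDS if name not in payload]


def load_diagnostics(text: str) -> dict:
    """Parse a JSON diagnostics document and reject it if required fields are missing."""
    payload = json.loads(text)
    missing = missing_required(payload)
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    return payload
```

Fields not in the required list, such as pid or lastMlxMemory, should be read with `payload.get(...)` and checked for None before use.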