RunnerSupervisorDiagnostics
Live runner-supervisor state that is not event-sourced.
runnerId
Runner ID.
instanceId
Instance ID.
nodeId
Node ID that owns this runner.
modelId
Model assigned to this runner.
deviceRank
Distributed device rank.
worldSize
Distributed world size.
startLayer
Inclusive first model layer on this shard.
endLayer
Exclusive final model layer on this shard.
nLayers
Total number of model layers.
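Because the first layer is inclusive and the final layer is exclusive, the shard covers a half-open range. A minimal sketch of how a client might work with that range, assuming the payload has already been parsed into a dict; the helper names `shard_layer_count` and `owns_layer` are hypothetical and not part of the API:

```python
def shard_layer_count(diag: dict) -> int:
    """Number of layers held by this shard: [startLayer, endLayer)."""
    start, end = diag["startLayer"], diag["endLayer"]
    if end < start:
        raise ValueError("endLayer must not be smaller than startLayer")
    return end - start


def owns_layer(diag: dict, layer: int) -> bool:
    """True when `layer` falls inside the half-open range of this shard."""
    return diag["startLayer"] <= layer < diag["endLayer"]


# Hypothetical payload fragment for a second shard of a 32-layer model.
example = {"startLayer": 16, "endLayer": 32, "nLayers": 32}
print(shard_layer_count(example))  # layers on this shard
print(owns_layer(example, 32))     # endLayer itself is excluded
```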
pid object
Runner subprocess PID, when started.
- integer
- null
processAlive
Whether the runner subprocess is alive.
exitCode object
Runner subprocess exit code, when exited.
- integer
- null
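Since `pid` and `exitCode` can each be null, a client usually has to combine them with `processAlive` to tell the subprocess states apart. The following is a sketch under assumptions: the state labels are hypothetical, and treating exit code 0 as a clean exit follows the usual process convention rather than anything stated in this schema.

```python
def process_state(diag: dict) -> str:
    """Classify the runner subprocess from pid, processAlive and exitCode."""
    if diag.get("pid") is None:
        # No PID yet: the subprocess has not been started.
        return "not_started"
    if diag.get("processAlive"):
        return "running"
    exit_code = diag.get("exitCode")
    if exit_code is None:
        # Not alive, but no exit code has been recorded.
        return "dead_unknown_exit"
    # Assumption: 0 means a clean exit, anything else is a failure.
    return "exited_ok" if exit_code == 0 else "exited_error"
```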
statusKind
Current runner status variant.
statusSince
UTC timestamp for the current status.
secondsInStatus
Wall-clock seconds spent in the current runner status.
phase
Last runner phase reported.
Possible values: [created, idle, connect_group, load_model, warmup, task_submission, task_agreement, prompt_build, vision_preprocess, kv_cache_lookup, prefill_barrier, prefill_pipeline, prefill_stream, decode_barrier, decode_wait_first_token, decode_stream, parser, cancel_requested, cancel_observed, completion, error, shutdown_cleanup]
phaseStartedAt
UTC timestamp when the current phase started.
secondsInPhase
Wall-clock seconds spent in the current phase.
lastProgressAt object
UTC timestamp for the last flight-recorder update.
- string
- null
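The phase timing fields and `lastProgressAt` lend themselves to a simple stall check. This is only a sketch: it assumes timestamps are ISO 8601 strings, that `created` and `idle` are phases where no progress is expected, and that a 60 second threshold is reasonable. None of these are stated by the schema, and `looks_stalled` is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import Optional

# Assumption: phases in which a lack of progress is normal.
QUIET_PHASES = {"created", "idle"}


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp, treating a trailing 'Z' or no offset as UTC."""
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return dt if dt.tzinfo is not None else dt.replace(tzinfo=timezone.utc)


def looks_stalled(diag: dict, threshold_s: float = 60.0,
                  now: Optional[datetime] = None) -> bool:
    """Flag a live runner whose flight recorder has been silent for too long."""
    if not diag.get("processAlive") or diag.get("phase") in QUIET_PHASES:
        return False
    last = diag.get("lastProgressAt")
    if last is None:
        # No flight-recorder update yet: fall back to time spent in the phase.
        return float(diag.get("secondsInPhase", 0.0)) > threshold_s
    now = now or datetime.now(timezone.utc)
    return (now - parse_utc(last)).total_seconds() > threshold_s
```

Passing `now` explicitly keeps the check deterministic in tests; in normal use it defaults to the current UTC time.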
activeTaskId object
Task ID associated with the current phase, when known.
- string
- null
activeCommandId object
Command ID associated with the current phase, when known.
- string
- null
phaseDetail object
Compact human-readable detail for the current phase.
- string
- null
lastMlxMemory object
Most recent MLX memory snapshot reported by the runner.
- MlxMemorySnapshot
- null
generatedAt
UTC timestamp when the snapshot was taken.
active object
Currently active MLX memory, when the runtime exposes it.
- Memory
- null
cache object
MLX cache memory, when the runtime exposes it.
- Memory
- null
peak object
Peak MLX memory since the last reset, when available.
- Memory
- null
wiredLimit object
Configured MLX wired memory limit when known. Current MLX releases do not expose a getter on all platforms, so this may be null.
- Memory
- null
source
Runtime module that supplied the measurement, such as mlx.core.
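Every memory field in the snapshot may be null, and the snapshot itself may be null, so display code needs to tolerate missing values. A minimal sketch, assuming each `Memory` object carries an `inBytes` field as in the example payload; the helper names are hypothetical:

```python
from typing import Optional

MEMORY_FIELDS = ("active", "cache", "peak", "wiredLimit")


def memory_gib(memory: Optional[dict]) -> Optional[float]:
    """Convert a Memory object to GiB, passing null through unchanged."""
    if memory is None:
        return None
    return memory["inBytes"] / 2**30


def summarize_mlx_memory(snapshot: Optional[dict]) -> dict:
    """Map each memory field of a snapshot to GiB, or to None when absent."""
    if snapshot is None:
        return {}
    return {name: memory_gib(snapshot.get(name)) for name in MEMORY_FIELDS}
```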
flightRecorder object[]
Last 128 local-only runner diagnostic events.
at
UTC timestamp when the runner emitted the update.
phase
Runner phase at this entry.
Possible values: [created, idle, connect_group, load_model, warmup, task_submission, task_agreement, prompt_build, vision_preprocess, kv_cache_lookup, prefill_barrier, prefill_pipeline, prefill_stream, decode_barrier, decode_wait_first_token, decode_stream, parser, cancel_requested, cancel_observed, completion, error, shutdown_cleanup]
event
Short event name within the phase.
detail object
Compact human-readable detail for diagnostics.
- string
- null
attrs object
Structured low-cardinality diagnostic attributes.
property name* object
- string
- integer
- number
- boolean
- string[]
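The value types listed for `attrs` can be checked on the client side before the attributes are indexed or displayed. This is an illustrative sketch, not the server's validation logic; `is_valid_attr_value` is a hypothetical name.

```python
def is_valid_attr_value(value: object) -> bool:
    """Accept string, integer, number, boolean, or a list of strings."""
    # bool is a subclass of int in Python, so it is covered explicitly for clarity.
    if isinstance(value, (str, bool, int, float)):
        return True
    if isinstance(value, list):
        return all(isinstance(item, str) for item in value)
    return False


def invalid_attrs(attrs: dict) -> list:
    """Return the attribute names whose values fall outside the listed types."""
    return sorted(name for name, value in attrs.items()
                  if not is_valid_attr_value(value))
```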
context object required
Stable runner identity fields for this entry.
nodeId
Node ID that owns this runner.
runnerId
Runner ID.
pid object
Runner subprocess PID.
- integer
- null
instanceId
Instance ID.
modelId
Model assigned to this runner.
rank
Distributed rank for this runner.
worldSize
Distributed world size.
startLayer
Inclusive first layer on this shard.
endLayer
Exclusive final layer on this shard.
nLayers
Total model layers.
taskId object
Task ID associated with the entry, when known.
- string
- null
commandId object
Command ID associated with the entry, when known.
- string
- null
mlxMemory object
MLX memory snapshot captured with this entry, when present.
- MlxMemorySnapshot
- null
generatedAt
UTC timestamp when the snapshot was taken.
active object
Currently active MLX memory, when the runtime exposes it.
- Memory
- null
cache object
MLX cache memory, when the runtime exposes it.
- Memory
- null
peak object
Peak MLX memory since the last reset, when available.
- Memory
- null
wiredLimit object
Configured MLX wired memory limit when known. Current MLX releases do not expose a getter on all platforms, so this may be null.
- Memory
- null
source
Runtime module that supplied the measurement, such as mlx.core.
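The flight recorder keeps only the last 128 entries, which is the behaviour of a fixed-capacity ring buffer. The sketch below shows one way such a buffer could be kept with `collections.deque`; it illustrates the retention rule only and is not taken from the supervisor's implementation. The class name and methods are hypothetical.

```python
from collections import deque

FLIGHT_RECORDER_CAPACITY = 128  # matches the documented retention limit


class FlightRecorder:
    """Bounded buffer that silently drops the oldest entry when full."""

    def __init__(self, capacity: int = FLIGHT_RECORDER_CAPACITY) -> None:
        self._entries = deque(maxlen=capacity)

    def record(self, entry: dict) -> None:
        # Appending to a full deque evicts the entry at the opposite end.
        self._entries.append(entry)

    def snapshot(self) -> list:
        """Return the retained entries, oldest first."""
        return list(self._entries)
```

A consequence for clients is that a long-running phase can push earlier phases out of the buffer, so the oldest retained entry is not necessarily the start of the runner's life.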
pendingTaskIds
Tasks sent to the supervisor but not acknowledged by the runner.
inProgressTasks object[]
Tasks currently known as in progress by the supervisor.
taskId
Skulk task ID.
taskKind
Concrete task model name.
taskStatus
Current event-sourced task status.
instanceId
Instance associated with the task.
commandId object
External command ID for user-facing inference tasks.
- string
- null
runnerId object
Runner assigned to the task, if known.
- string
- null
modelId object
Model associated with the task, if known.
- string
- null
completedTaskCount
Number of tasks completed by this supervisor.
cancelledTaskIds
Task IDs cancelled through this supervisor.
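The task bookkeeping fields can be condensed into a small summary for dashboards or logs. This is a sketch under assumptions: whether a cancelled task can still appear in `inProgressTasks` is not stated here, so the overlap is reported rather than treated as an error, and `task_summary` is a hypothetical helper.

```python
def task_summary(diag: dict) -> dict:
    """Condense the supervisor's task bookkeeping into counts and overlaps."""
    in_progress_ids = [task["taskId"] for task in diag.get("inProgressTasks", [])]
    cancelled = set(diag.get("cancelledTaskIds", []))
    return {
        "pending": len(diag.get("pendingTaskIds", [])),
        "inProgress": len(in_progress_ids),
        "completed": diag.get("completedTaskCount", 0),
        # Tasks that were cancelled but are still listed as in progress.
        "cancelledButInProgress": sorted(cancelled & set(in_progress_ids)),
    }
```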
lastTaskSentAt object
UTC timestamp for the last task submitted to the runner.
- string
- null
lastEventReceivedAt object
UTC timestamp for the last event received from the runner.
- string
- null
lastEventType object
Class name of the last event received from the runner.
- string
- null
milestones object[]
Recent lifecycle milestones retained by the supervisor.
at
UTC timestamp when the milestone was recorded.
name
Short milestone name.
detail object
Optional compact detail for the milestone.
- string
- null
{
  "runnerId": "string",
  "instanceId": "string",
  "nodeId": "string",
  "modelId": "string",
  "deviceRank": 0,
  "worldSize": 0,
  "startLayer": 0,
  "endLayer": 0,
  "nLayers": 0,
  "pid": 0,
  "processAlive": true,
  "exitCode": 0,
  "statusKind": "string",
  "statusSince": "string",
  "secondsInStatus": 0,
  "phase": "created",
  "phaseStartedAt": "string",
  "secondsInPhase": 0,
  "lastProgressAt": "string",
  "activeTaskId": "string",
  "activeCommandId": "string",
  "phaseDetail": "string",
  "lastMlxMemory": {
    "generatedAt": "string",
    "active": {
      "inBytes": 0
    },
    "cache": {
      "inBytes": 0
    },
    "peak": {
      "inBytes": 0
    },
    "wiredLimit": {
      "inBytes": 0
    },
    "source": "string"
  },
  "flightRecorder": [
    {
      "at": "string",
      "phase": "created",
      "event": "string",
      "detail": "string",
      "attrs": {},
      "context": {
        "nodeId": "string",
        "runnerId": "string",
        "pid": 0,
        "instanceId": "string",
        "modelId": "string",
        "rank": 0,
        "worldSize": 0,
        "startLayer": 0,
        "endLayer": 0,
        "nLayers": 0
      },
      "taskId": "string",
      "commandId": "string",
      "mlxMemory": {
        "generatedAt": "string",
        "active": {
          "inBytes": 0
        },
        "cache": {
          "inBytes": 0
        },
        "peak": {
          "inBytes": 0
        },
        "wiredLimit": {
          "inBytes": 0
        },
        "source": "string"
      }
    }
  ],
  "pendingTaskIds": [
    "string"
  ],
  "inProgressTasks": [
    {
      "taskId": "string",
      "taskKind": "string",
      "taskStatus": "string",
      "instanceId": "string",
      "commandId": "string",
      "runnerId": "string",
      "modelId": "string"
    }
  ],
  "completedTaskCount": 0,
  "cancelledTaskIds": [
    "string"
  ],
  "lastTaskSentAt": "string",
  "lastEventReceivedAt": "string",
  "lastEventType": "string",
  "milestones": [
    {
      "at": "string",
      "name": "string",
      "detail": "string"
    }
  ]
}