cjm-infra-plugin-system
Standardized interface and data structures for infrastructure monitoring plugins in the cjm-plugin-system ecosystem.
Install
pip install cjm_infra_plugin_system
Project Structure
nbs/
├── core.ipynb # Core data structures for infrastructure monitoring
└── plugin_interface.ipynb # Domain-specific plugin interface for system monitoring
Total: 2 notebooks
Module Dependencies
graph LR
core[core<br/>core]
plugin_interface[plugin_interface<br/>plugin_interface]
plugin_interface --> core
1 cross-module dependency detected
CLI Reference
No CLI commands found in this project.
Module Overview
Detailed documentation for each module in the project:
core (core.ipynb)
Core data structures for infrastructure monitoring
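As a rough illustration of how the two snapshot types documented in this module might be built and serialized, the sketch below uses local stand-in dataclasses that mirror a subset of the documented fields, so it runs without the package installed. The `default_factory=dict` default for `details` and the `asdict()`-based `to_dict()` are assumptions, not the library's confirmed implementation; with the package installed you would import `SystemStats` and `ProcessStats` from `cjm_infra_plugin_system.core` instead.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class SystemStats:
    """Stand-in mirroring a subset of the documented fields (not the real class)."""
    cpu_percent: float = 0.0
    memory_used_mb: float = 0.0
    memory_total_mb: float = 0.0
    gpu_type: str = "None"
    gpu_used_memory_mb: float = 0.0
    gpu_total_memory_mb: float = 0.0
    details: Dict[str, Any] = field(default_factory=dict)  # assumed default

    def to_dict(self) -> Dict[str, Any]:
        # Assumption: the real method behaves like dataclasses.asdict().
        return asdict(self)


@dataclass
class ProcessStats:
    """Stand-in mirroring the documented per-process fields."""
    pid: int = 0
    gpu_index: int = -1  # -1 means not GPU-bound or unknown
    gpu_memory_mb: float = 0.0
    command: str = ""

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)


# Illustrative values only; a real monitor would read these from the system.
stats = SystemStats(
    cpu_percent=12.5,
    memory_used_mb=8192.0,
    memory_total_mb=32768.0,
    gpu_type="NVIDIA",
    gpu_used_memory_mb=2048.0,
    gpu_total_memory_mb=24576.0,
    details={"temperatures_c": [54]},
)
procs = [ProcessStats(pid=4242, gpu_index=0, gpu_memory_mb=2048.0, command="python train.py")]

# Both types reduce to plain dicts, so they can be sent as one JSON payload.
payload = json.dumps({"system": stats.to_dict(), "processes": [p.to_dict() for p in procs]})
decoded = json.loads(payload)
```

Keeping `to_dict()` on both types means a consumer can ship a system snapshot and its process list in a single JSON document without knowing which monitor produced them.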
Import
from cjm_infra_plugin_system.core import (
    SystemStats,
    ProcessStats
)
Classes
@dataclass
class SystemStats:
    "Standardized snapshot of system resources."
    cpu_percent: float = 0.0 # Overall CPU utilization percentage
    memory_used_mb: float = 0.0 # Currently used system RAM in MB
    memory_total_mb: float = 0.0 # Total system RAM in MB
    memory_available_mb: float = 0.0 # Available system RAM in MB
    gpu_type: str = 'None' # GPU vendor: 'NVIDIA', 'AMD', 'Intel', 'None'
    gpu_free_memory_mb: float = 0.0 # Free GPU memory in MB (sum of all visible devices)
    gpu_total_memory_mb: float = 0.0 # Total GPU memory in MB
    gpu_used_memory_mb: float = 0.0 # Used GPU memory in MB
    gpu_load_percent: float = 0.0 # GPU compute utilization percentage
    details: Dict[str, Any] = field(...) # Per-device stats, temperatures, etc.

    def to_dict(self) -> Dict[str, Any]: # Dictionary representation for JSON serialization
        "Convert to dictionary for JSON serialization."

@dataclass
class ProcessStats:
    """
    Per-process resource usage snapshot reported by `MonitorPlugin.list_processes`.

    CR-3 introduced this as the typed replacement for `SystemStats.details['processes']`.
    Monitor plugins that can enumerate per-process GPU usage (e.g. NVIDIA via `nvitop`)
    return a list of these; monitors without per-process visibility return `[]` from
    the default `MonitorPlugin.list_processes()` implementation.
    """
    pid: int = 0 # OS process ID
    gpu_index: int = -1 # GPU index (0-based); -1 if not GPU-bound or unknown
    gpu_memory_mb: float = 0.0 # GPU memory usage attributable to this process, in MB
    command: str = '' # Process command line (or short name)

    def to_dict(self) -> Dict[str, Any]: # Dictionary representation for JSON serialization
        "Convert to dictionary for JSON serialization."

plugin_interface (plugin_interface.ipynb)
Domain-specific plugin interface for system monitoring
Import
from cjm_infra_plugin_system.plugin_interface import (
    MonitorPlugin
)
Classes
class MonitorPlugin(PluginInterface):
    """
    Abstract base class for hardware monitoring plugins.

    CR-3 shifted MonitorPlugin from dispatcher-style `execute(command=...)` to a
    typed surface: subclasses override `get_system_status()` returning `SystemStats`
    and optionally `list_processes()` returning `List[ProcessStats]`. The legacy
    `execute(command=...)` dispatcher is kept as a backward-compat shim so monitors
    that predate CR-3 keep working until the SG-47 migration cascade.

    Subclasses MUST override at least one of `execute()` or `get_system_status()`;
    the `__init_subclass__` guard enforces this at class-definition time to prevent
    the recursion trap where both defaults call each other.
    """
    def execute(
        self,
        command: str = "get_system_status", # REMOVE-AFTER-OVERHAUL: rename to `action=` via SG-47 + SG-42 cascade
        **kwargs: Any,
    ) -> Any:
        """Backward-compat dispatcher (REMOVE-AFTER-OVERHAUL).

        Bridges pre-CR-3 callers (substrate's `_get_global_stats` + job-monitor's
        `services/monitor.py`) to typed methods. New MonitorPlugin subclasses
        override `get_system_status()` directly and inherit this dispatcher; old
        subclasses override this dispatcher with their own dict-returning logic
        and rely on the default `get_system_status()` to wrap the result.

        After the SG-47 cascade migrates consumers off the dispatcher and the SG-48
        sweep runs, this default body drops; `execute()` either becomes abstract again
        (with `command=` renamed to `action=` per SG-42) or is removed from
        MonitorPlugin entirely if all monitors override typed methods.
        """
    def get_system_status(self) -> SystemStats: # Current system telemetry
        """Gather current system statistics as a typed `SystemStats` snapshot.

        The default body (REMOVE-AFTER-OVERHAUL) delegates to
        `self.execute("get_system_status")` and wraps the returned dict so that
        monitor plugins predating CR-3 keep working. New monitors override this
        method directly; the `__init_subclass__` guard ensures at least one of
        `execute()` or `get_system_status()` is overridden by every concrete
        subclass.

        Unknown fields in the dispatcher's return dict are filtered out (rather
        than raising `TypeError`) so monitors that emit extra debug fields don't
        crash the wrapping.
        """
    def list_processes(self) -> List[ProcessStats]: # Per-process resource usage
        """List per-process resource usage. Default returns `[]`.

        Monitors with per-process GPU visibility (NVIDIA via `nvitop`/`pynvml`)
        override this. CPU-only monitors and AMD pre-ROCm inherit the empty
        default since they cannot enumerate per-GPU-process attribution.

        `list_devices()` is deliberately omitted per audit's Q-CR3-1=(c) YAGNI
        disposition; add it when multi-GPU support surfaces a concrete consumer
        need.
        """
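The interplay between the two defaults described above (each delegating to the other, an `__init_subclass__` guard preventing the recursion trap, and unknown dict keys being filtered) can be sketched roughly as follows. This is a minimal stand-in, not the library's code: `MonitorSketch`, `LegacyMonitor`, `TypedMonitor`, and the trimmed `SystemStats` are hypothetical names for illustration, the error raised for unknown commands is an assumption, and the real class also inherits from `PluginInterface`, which is omitted here.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict


@dataclass
class SystemStats:
    """Trimmed stand-in for the documented SystemStats."""
    cpu_percent: float = 0.0
    memory_used_mb: float = 0.0
    details: Dict[str, Any] = field(default_factory=dict)


class MonitorSketch:
    """Hypothetical stand-in for MonitorPlugin showing the guard pattern."""

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        # Compare against the base implementations to detect overrides.
        overrides_execute = cls.execute is not MonitorSketch.execute
        overrides_status = cls.get_system_status is not MonitorSketch.get_system_status
        if not (overrides_execute or overrides_status):
            raise TypeError(
                f"{cls.__name__} must override execute() or get_system_status()"
            )

    def execute(self, command: str = "get_system_status", **kwargs: Any) -> Any:
        # Default dispatcher: route the legacy command to the typed method.
        if command == "get_system_status":
            return self.get_system_status()
        raise ValueError(f"Unknown command: {command}")  # assumed behaviour

    def get_system_status(self) -> SystemStats:
        # Default typed method: wrap whatever the legacy dispatcher returns.
        raw = self.execute("get_system_status")
        if isinstance(raw, SystemStats):
            return raw
        known = {f.name for f in fields(SystemStats)}
        # Drop unknown keys instead of letting the constructor raise TypeError.
        return SystemStats(**{k: v for k, v in raw.items() if k in known})


class LegacyMonitor(MonitorSketch):
    """Pre-CR-3 style: overrides only the dict-returning dispatcher."""

    def execute(self, command: str = "get_system_status", **kwargs: Any) -> Any:
        return {"cpu_percent": 42.0, "debug_tag": "extra"}


class TypedMonitor(MonitorSketch):
    """New style: overrides only the typed method."""

    def get_system_status(self) -> SystemStats:
        return SystemStats(cpu_percent=7.5)


legacy_stats = LegacyMonitor().get_system_status()           # dict wrapped, extra key dropped
typed_result = TypedMonitor().execute("get_system_status")   # dispatcher reaches typed method

try:
    class BrokenMonitor(MonitorSketch):  # overrides neither method
        pass
    guard_tripped = False
except TypeError:
    guard_tripped = True
```

Note that this simplified guard rejects every subclass that overrides neither method, including intermediate abstract ones; the documented guard is described as applying to concrete subclasses, so the real check is presumably more selective.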