Typed parser + writer for the nested v2.0 manifest layout per the 2026-05-19 substrate audit’s CR-8. Substrate manifests transitioned from a flat top-level JSON object to a four-section nested layout: install (deployment-specific facts populated at install time), code (code-derived facts refreshed by cjm-ctl regenerate-manifest), drift_tracking (a config_schema hash that records the witness shape so live-vs-stored comparisons can detect drift), and overrides (an operator-supplied overlay placeholder).
Current format version
The substrate emits format_version: "2.0" on every freshly-written manifest. The reader accepts both "2.0" (nested layout) and legacy manifests (no format_version key, flat layout). Unrecognized future values raise ValueError so the substrate fails loud rather than silently degrading.
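The version gate described above can be sketched as follows. This is a minimal illustration and not the module's actual code: `detect_layout` and the labels it returns are hypothetical names, and the only behaviour assumed is what the paragraph states (accept "2.0", treat a missing key as legacy, raise ValueError for anything else).

```python
from typing import Any, Dict

CURRENT_FORMAT_VERSION = "2.0"  # value stated in the text above


def detect_layout(data: Dict[str, Any]) -> str:
    """Classify a parsed manifest dict (hypothetical helper, for illustration)."""
    if "format_version" not in data:
        # Legacy manifests carry no format_version key and use the flat layout.
        return "legacy"
    version = data["format_version"]
    if version == CURRENT_FORMAT_VERSION:
        return "nested"
    # Fail loud on unknown versions instead of guessing at the layout.
    raise ValueError(f"Unrecognized manifest format_version: {version!r}")
```

Raising on unknown values means a manifest written by a newer substrate is rejected outright rather than parsed with the wrong layout.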
Section dataclasses
Four dataclasses mirror the JSON layout one-to-one. CodeSection.class_name is the Python-side rename for the reserved-word class JSON key; the dict serializers below handle the rename at the boundary.
Deployment-specific facts populated at install time.
These fields are written by install_all (paths, conda env, env vars) plus _generate_manifest’s post-introspection step (installed_at, installer_version, package_source). regenerate-manifest preserves the install section across regeneration so paths survive code-side refreshes.
Code-derived facts refreshed by cjm-ctl regenerate-manifest.
Everything in this section comes from running the introspection script inside the capability’s conda env: metadata + config_schema + binary platform/hardware hard-facts. Drift detection hashes this section’s config_schema field as its witness shape.
class_name serializes as the JSON key "class" (Python reserved-word workaround).
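A minimal sketch of the boundary rename, assuming a cut-down stand-in for CodeSection. `CodeSectionSketch`, its `module` field, and both helper functions are hypothetical; only the `class_name` to "class" mapping comes from the text above.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class CodeSectionSketch:
    """Illustrative stand-in; the real CodeSection has more fields."""
    class_name: str  # stored under the JSON key "class"
    module: str      # hypothetical extra field, for illustration only


def code_to_dict(code: CodeSectionSketch) -> Dict[str, Any]:
    # Rename on the way out: "class" is a Python keyword, so it cannot be an attribute name.
    return {"class": code.class_name, "module": code.module}


def code_from_dict(data: Dict[str, Any]) -> CodeSectionSketch:
    # Rename on the way in, so the rest of the code only ever sees class_name.
    return CodeSectionSketch(class_name=data["class"], module=data["module"])
```

Keeping the rename inside the two serializer functions means the JSON key never leaks into the Python-side attribute names.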
config_schema_hash is computed at write time (regenerate-manifest / install_all) from a canonical JSON encoding of the code section’s config_schema. The CapabilityManager’s drift check fetches the live /config_schema from the worker, hashes it the same way, and compares the two digests; on a mismatch it sets CapabilityMeta.config_schema_drift = True and logs a warning.
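The comparison step can be sketched roughly as below. The function name, the logger name, and the `hash_fn` parameter are assumptions made for illustration; in practice the hash function would be compute_config_schema_hash, and the caller would store the result on CapabilityMeta.config_schema_drift.

```python
import logging
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger("drift_sketch")  # hypothetical logger name

Schema = Optional[Dict[str, Any]]


def config_schema_drifted(
    stored_hash: str,                   # config_schema_hash read from the manifest
    live_schema: Schema,                # schema fetched from the worker's /config_schema
    hash_fn: Callable[[Schema], str],   # e.g. compute_config_schema_hash
) -> bool:
    """Return True when the live schema no longer matches the stored hash."""
    live_hash = hash_fn(live_schema)
    drifted = live_hash != stored_hash
    if drifted:
        # Drift is reported, not raised: the capability keeps loading.
        logger.warning("config_schema drift: stored=%s live=%s", stored_hash, live_hash)
    return drifted
```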
Top-level v2.0 manifest with four named sections plus format_version.
Loaded from a v2.0 nested JSON file as-is; format_version is always CURRENT_FORMAT_VERSION.
Config-schema hashing
compute_config_schema_hash canonicalizes the schema (sorted keys, no whitespace) before hashing, so the digest is stable across Python versions and dict-insertion orders. It reuses cjm_substrate.utils.hashing.hash_bytes to produce the algo-tagged "sha256:hex" return shape that the rest of the ecosystem already uses (graph capability, future bundle library).
compute_config_schema_hash
def compute_config_schema_hash(
    schema:Optional, # JSON Schema or None
)->str: # "sha256:hexdigest"
Hash a JSON Schema with stable canonicalization.
None is treated as {} — the hash records “no schema declared” rather than refusing. This way a capability that lost its config_schema between install and load still gets a drift warning rather than a crash.
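A minimal sketch of the canonicalize-then-hash idiom, using only the standard library. `hash_bytes_sketch` is a stand-in for cjm_substrate.utils.hashing.hash_bytes, whose exact implementation is assumed here, and the precise json.dumps options used by the real function are likewise an assumption based on the "sorted keys, no whitespace" description.

```python
import hashlib
import json
from typing import Any, Dict, Optional


def hash_bytes_sketch(data: bytes) -> str:
    # Stand-in for hash_bytes; assumed to return an algo-tagged "sha256:<hex>" string.
    return "sha256:" + hashlib.sha256(data).hexdigest()


def compute_config_schema_hash_sketch(schema: Optional[Dict[str, Any]]) -> str:
    # None is hashed as the empty schema rather than rejected.
    payload = {} if schema is None else schema
    # Sorted keys and compact separators remove insertion-order and whitespace variation.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hash_bytes_sketch(canonical.encode("utf-8"))


a = {"type": "object", "properties": {"x": {"type": "integer"}}}
b = {"properties": {"x": {"type": "integer"}}, "type": "object"}
print(compute_config_schema_hash_sketch(a) == compute_config_schema_hash_sketch(b))  # True
```

The two dicts differ only in key order, so they canonicalize to the same JSON text and therefore to the same digest.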
compute_structural_surface_hash
def compute_structural_surface_hash(
    surface:Optional, # derive_structural_surface output or None
)->str: # "sha256:hexdigest"
Hash a structural surface with stable canonicalization.
Same canonical-JSON + hash_bytes shape as compute_config_schema_hash (the CR-8 idiom). None hashes as {}. Note, however, that the drift check is skipped when the stored hash is None, because a pre-surface-era manifest does not count as drift; _generate_manifest only writes a hash when a surface was recorded.
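The skip rule for manifests that predate surface hashing might look like the sketch below. `surface_drifted` is a hypothetical name, and the function only illustrates the stated rule (no stored hash means no comparison), not the actual drift-check code.

```python
from typing import Optional


def surface_drifted(stored_hash: Optional[str], live_hash: str) -> bool:
    """Compare surface hashes, treating a missing stored hash as 'nothing to compare'."""
    if stored_hash is None:
        # Pre-surface-era manifest: no hash was ever recorded, so this is not drift.
        return False
    return stored_hash != live_hash
```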
Read path
load_manifest(path) is the public entry point. It detects the on-disk format from the top-level format_version key:
"2.0" → parse the nested sections directly (_from_v2_dict).
manifest_to_dict(m) is the underlying serializer; exposed separately so callers that need the dict (cjm-ctl validate, tests) can pull it without going through disk.
manifest_to_dict
def manifest_to_dict(
    m:ManifestV2, # Manifest to serialize
)->Dict: # v2.0 nested dict ready for `json.dumps`