Manifest Format (v2.0)

Typed parser + writer for the nested v2.0 manifest layout per the 2026-05-19 substrate audit’s CR-8. Substrate manifests transitioned from a flat top-level JSON object to a four-section nested layout: install (deployment-specific facts populated at install time), code (code-derived facts refreshed by cjm-ctl regenerate-manifest), drift_tracking (a config_schema hash that records the witness shape so live-vs-stored comparisons can detect drift), and overrides (an operator-supplied overlay placeholder).
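As a rough illustration of the nested layout, a parsed v2.0 manifest might look like the dict below. The section names and format_version come from the description above; every value is a made-up placeholder, not output from a real install.

```python
import json

# Hypothetical v2.0 manifest. Section names follow the layout described above;
# all values are illustrative placeholders.
example_manifest = {
    "format_version": "2.0",
    "install": {"python_path": "/path/to/env/bin/python", "conda_env": "example-env"},
    "code": {
        "name": "example",
        "version": "0.0.1",
        "class": "ExampleCapability",
        "config_schema": {"type": "object", "properties": {}},
    },
    "drift_tracking": {"config_schema_hash": None},
    "overrides": {},
}

# The four named sections sit beside format_version at the top level.
sections = sorted(k for k in example_manifest if k != "format_version")
print(json.dumps(sections))
```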

Current format version

The substrate emits format_version: "2.0" on every freshly written manifest. The reader accepts only "2.0" (nested layout). Any other value, including a missing format_version key as found in legacy flat manifests, raises ValueError so the substrate fails loud rather than silently degrading.

Section dataclasses

Four dataclasses mirror the JSON layout one-to-one. CodeSection.class_name is the Python-side rename for the reserved-word class JSON key; the dict serializers below handle the rename at the boundary.
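The boundary rename can be sketched roughly as follows. CodeSketch, code_to_dict and code_from_dict are hypothetical stand-ins written for this example; the real serializers may be structured differently.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class CodeSketch:
    """Hypothetical two-field stand-in for CodeSection."""
    name: str = ""
    class_name: str = ""


def code_to_dict(c: CodeSketch) -> Dict[str, Any]:
    # Python attribute class_name becomes the JSON key "class" on the way out.
    d = asdict(c)
    d["class"] = d.pop("class_name")
    return d


def code_from_dict(d: Dict[str, Any]) -> CodeSketch:
    # JSON key "class" becomes the Python attribute class_name on the way in.
    return CodeSketch(name=d.get("name", ""), class_name=d.get("class", ""))


original = CodeSketch(name="p", class_name="ExampleCapability")
as_json = code_to_dict(original)
restored = code_from_dict(as_json)
```

Keeping the rename in the two serializer functions means the rest of the code only ever deals with class_name.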


InstallSection


def InstallSection(
    python_path:str='', conda_env:str='', db_path:str='', env_vars:Dict=<factory>, installed_at:str='',
    installer_version:str='', package_source:str=''
)->None:

Deployment-specific facts populated at install time.

These fields are written by install_all (paths, conda env, env vars) plus _generate_manifest’s post-introspection step (installed_at, installer_version, package_source). regenerate-manifest preserves the install section across regeneration so paths survive code-side refreshes.
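A minimal sketch of that preservation rule, assuming plain dicts for the sections. regenerate_sketch is a hypothetical helper, not the cjm-ctl implementation, and it leaves out the other sections (including recomputation of drift_tracking).

```python
from typing import Any, Dict


def regenerate_sketch(old: Dict[str, Any], fresh_code: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical: carry install over untouched, replace code with fresh facts."""
    return {"install": dict(old["install"]), "code": dict(fresh_code)}


old_manifest = {
    "install": {"conda_env": "example-env", "db_path": "/path/to/db"},
    "code": {"name": "p", "version": "0.0.1"},
}
refreshed = regenerate_sketch(old_manifest, {"name": "p", "version": "0.0.2"})
```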


CodeSection


def CodeSection(
    name:str='', version:str='', description:str='', module:str='', class_name:str='', resources:Optional=None,
    config_schema:Optional=None, regenerated_at:Optional=None, worker_env:Optional=None,
    structural_surface:Optional=None
)->None:

Code-derived facts refreshed by cjm-ctl regenerate-manifest.

Everything in this section comes from running the introspection script inside the capability’s conda env: metadata + config_schema + binary platform/hardware hard-facts. Drift detection hashes this section’s config_schema field as its witness shape.

class_name serializes as the JSON key "class" (Python reserved-word workaround).


DriftTracking


def DriftTracking(
    config_schema_hash:Optional=None, structural_surface_hash:Optional=None
)->None:

Witness hashes for drift detection.

config_schema_hash is computed at write time (regenerate-manifest / install_all) from a canonical JSON encoding of the code section’s config_schema. The CapabilityManager’s drift check fetches the live /config_schema from the worker, hashes it the same way, and compares the two. On a mismatch it sets CapabilityMeta.config_schema_drift = True and logs a warning.
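The comparison step might look roughly like this. MetaSketch and check_config_schema_drift are hypothetical names for illustration; the sketch assumes both hashes have already been computed and ignores how the live schema is fetched.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger(__name__)


@dataclass
class MetaSketch:
    """Hypothetical stand-in for CapabilityMeta."""
    config_schema_drift: bool = False


def check_config_schema_drift(meta: MetaSketch, stored_hash: str, live_hash: str) -> None:
    # A mismatch is flagged and logged; it does not raise.
    if live_hash != stored_hash:
        meta.config_schema_drift = True
        log.warning("config_schema drift: stored=%s live=%s", stored_hash, live_hash)


same = MetaSketch()
check_config_schema_drift(same, "sha256:aa", "sha256:aa")
drifted = MetaSketch()
check_config_schema_drift(drifted, "sha256:aa", "sha256:bb")
```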


ManifestV2


def ManifestV2(
    install:InstallSection=<factory>, code:CodeSection=<factory>, drift_tracking:DriftTracking=<factory>,
    overrides:Dict=<factory>, format_version:str='2.0'
)->None:

Top-level v2.0 manifest with four named sections plus format_version.

Loaded from a v2.0 nested JSON file as-is; format_version is always CURRENT_FORMAT_VERSION.

Config-schema hashing

compute_config_schema_hash canonicalizes the schema (sorted keys, no whitespace) before hashing so the digest is stable across Python versions and dict-insertion orders. It reuses cjm_substrate.utils.hashing.hash_bytes for the algorithm-tagged "sha256:hex" return shape that the rest of the ecosystem already uses (the graph capability and a future bundle library).


compute_config_schema_hash


def compute_config_schema_hash(
    schema:Optional, # JSON Schema or None
)->str: # "sha256:hexdigest"

Hash a JSON Schema with stable canonicalization.

None is treated as {} — the hash records “no schema declared” rather than refusing. This way a capability that lost its config_schema between install and load still gets a drift warning rather than a crash.
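The canonicalize-then-hash idea can be sketched with the standard library alone. schema_hash_sketch is a hypothetical stand-in: it calls hashlib directly instead of hash_bytes, and the exact JSON encoding options are assumptions based on the "sorted keys, no whitespace" description.

```python
import hashlib
import json
from typing import Any, Dict, Optional


def schema_hash_sketch(schema: Optional[Dict[str, Any]]) -> str:
    # None is hashed as an empty schema rather than rejected.
    payload = {} if schema is None else schema
    # Sorted keys and compact separators give one canonical byte string
    # regardless of dict insertion order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"sha256:{digest}"


a = schema_hash_sketch({"type": "object", "required": ["x"]})
b = schema_hash_sketch({"required": ["x"], "type": "object"})
none_hash = schema_hash_sketch(None)
```

Here a and b are equal because only key order differs, and none_hash equals the hash of {}.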


compute_structural_surface_hash


def compute_structural_surface_hash(
    surface:Optional, # derive_structural_surface output or None
)->str: # "sha256:hexdigest"

Hash a structural surface with stable canonicalization.

Same canonical-JSON + hash_bytes shape as compute_config_schema_hash (the CR-8 idiom). None hashes as {}. Note, however, that the drift check is skipped when the stored hash is None: a manifest written before structural surfaces existed is not treated as drift. _generate_manifest only writes a hash when a surface was recorded.
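The skip rule for a missing stored hash could be expressed along these lines. surface_drifted is a hypothetical helper name, and the sketch assumes the live hash has already been computed.

```python
from typing import Optional


def surface_drifted(stored_hash: Optional[str], live_hash: str) -> bool:
    # No stored hash means the manifest predates structural surfaces,
    # so there is nothing to compare against and no drift is reported.
    if stored_hash is None:
        return False
    return stored_hash != live_hash


skipped = surface_drifted(None, "sha256:bb")
unchanged = surface_drifted("sha256:aa", "sha256:aa")
changed = surface_drifted("sha256:aa", "sha256:bb")
```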

Read path

load_manifest(path) is the public entry point. It detects the on-disk format from the top-level format_version key:

  • "2.0" → parse the nested sections directly (_from_v2_dict).
  • anything else (including missing) → ValueError (fail loud on unrecognized formats).

It returns a ManifestV2. The downstream code path never sees a flat dict again — only typed sections.
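The detection rule above can be sketched as a small guard on the parsed JSON. require_v2 is a hypothetical helper and the error message wording is an assumption; only the "2.0" check and the ValueError come from the description.

```python
from typing import Any, Dict

CURRENT_FORMAT_VERSION = "2.0"


def require_v2(raw: Dict[str, Any]) -> Dict[str, Any]:
    # A missing key yields None here, which is rejected like any other
    # unrecognized value.
    version = raw.get("format_version")
    if version != CURRENT_FORMAT_VERSION:
        raise ValueError(f"unsupported manifest format_version: {version!r}")
    return raw


def is_rejected(raw: Dict[str, Any]) -> bool:
    try:
        require_v2(raw)
    except ValueError:
        return True
    return False


accepted = require_v2({"format_version": "2.0", "install": {}})
```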


load_manifest


def load_manifest(
    path:Union, # Path to manifest JSON file on disk
)->ManifestV2: # Parsed manifest in v2.0 typed shape

Load a manifest file and return a typed ManifestV2.

Format detection by top-level format_version key:

  • "2.0" → nested layout, parse directly.
  • anything else (including missing) → ValueError (fail loud).

Write path

write_manifest(path, m) always emits v2.0 layout.

manifest_to_dict(m) is the underlying serializer; exposed separately so callers that need the dict (cjm-ctl validate, tests) can pull it without going through disk.


manifest_to_dict


def manifest_to_dict(
    m:ManifestV2, # Manifest to serialize
)->Dict: # v2.0 nested dict ready for `json.dumps`

Serialize a ManifestV2 to a v2.0 dict.

Always emits format_version == CURRENT_FORMAT_VERSION.


write_manifest


def write_manifest(
    path:Union, # Output JSON file path
    manifest:ManifestV2, # Manifest to serialize
)->None:

Serialize a ManifestV2 to disk in v2.0 nested layout (indent=2).
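The disk step amounts to dumping the nested dict as indented JSON. The sketch below assumes a plain dict in place of manifest_to_dict output; write_dict_sketch is a hypothetical helper, and the UTF-8 encoding choice is an assumption.

```python
import json
import os
import tempfile
from typing import Any, Dict


def write_dict_sketch(path: str, payload: Dict[str, Any]) -> None:
    # indent=2 matches the layout described above.
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh, indent=2)


payload = {"format_version": "2.0", "install": {}, "code": {},
           "drift_tracking": {}, "overrides": {}}
with tempfile.TemporaryDirectory() as td:
    target = os.path.join(td, "m.json")
    write_dict_sketch(target, payload)
    with open(target, encoding="utf-8") as fh:
        text = fh.read()
read_back = json.loads(text)
```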

Tests

# structural_surface round-trip: field + witness hash survive
# manifest_to_dict -> json -> load_manifest; a pre-surface manifest dict
# (no keys) parses to None/None.
import tempfile, os

surface = {"methods": [{"name": "execute", "signature": "(self, audio, **kwargs) -> Any"}],
           "properties": ["name", "version"], "attributes": []}
m = ManifestV2(code=CodeSection(name="p", structural_surface=surface),
               drift_tracking=DriftTracking(
                   structural_surface_hash=compute_structural_surface_hash(surface)))
with tempfile.TemporaryDirectory() as td:
    f = os.path.join(td, "m.json")
    write_manifest(f, m)
    back = load_manifest(f)
    assert back.code.structural_surface == surface
    assert back.drift_tracking.structural_surface_hash == compute_structural_surface_hash(surface)
    # determinism: key order must not change the hash
    reordered = {"properties": ["name", "version"], "attributes": [],
                 "methods": [{"signature": "(self, audio, **kwargs) -> Any", "name": "execute"}]}
    assert compute_structural_surface_hash(reordered) == compute_structural_surface_hash(surface)

pre_surface = ManifestV2()
with tempfile.TemporaryDirectory() as td:
    f = os.path.join(td, "m.json")
    write_manifest(f, pre_surface)
    back = load_manifest(f)
    assert back.code.structural_surface is None
    assert back.drift_tracking.structural_surface_hash is None
print("structural_surface manifest round-trip OK")
structural_surface manifest round-trip OK