# Content Hashing Utilities


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

## hash_bytes

Computes a cryptographic hash of byte content, returning a
self-describing string in `"algo:hexdigest"` format. This format embeds
the algorithm name, making hashes forward-compatible if the algorithm
changes.

------------------------------------------------------------------------

### hash_bytes

``` python

def hash_bytes(
    content:bytes, # Byte content to hash
    algo:str='sha256', # Hash algorithm name (e.g., "sha256", "sha3_256")
)->str: # Hash string in "algo:hexdigest" format

```

*Compute a hash of byte content.*

``` python
result = hash_bytes(b"hello world")
print(f"hash_bytes result: {result}")

# Check format
algo, digest = result.split(":", 1)
assert algo == "sha256"
assert len(digest) == 64  # SHA-256 produces 64 hex chars

# Deterministic
assert hash_bytes(b"hello world") == hash_bytes(b"hello world")

# Different content produces different hash
assert hash_bytes(b"hello world") != hash_bytes(b"hello World")

print("hash_bytes tests passed")
```

    hash_bytes result: sha256:b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9
    hash_bytes tests passed

``` python
# Custom algorithm
sha512_result = hash_bytes(b"test", algo="sha512")
print(f"SHA-512 result: {sha512_result[:30]}...")
assert sha512_result.startswith("sha512:")
assert len(sha512_result.split(":")[1]) == 128  # SHA-512 produces 128 hex chars

print("Custom algorithm test passed")
```

    SHA-512 result: sha512:ee26b0dd4af7e749aa1a8ee...
    Custom algorithm test passed

## hash_file

Stream-hashes a file without loading it entirely into memory. Uses
chunked reads suitable for large files (audio, video, etc.).

------------------------------------------------------------------------

### hash_file

``` python

def hash_file(
    path:Union, # Path to file to hash
    algo:str='sha256', # Hash algorithm name
    chunk_size:int=8192, # Read chunk size in bytes
)->str: # Hash string in "algo:hexdigest" format

```

*Stream-hash a file without loading it entirely into memory.*

``` python
import tempfile
import os

# Create a temp file with known content
with tempfile.NamedTemporaryFile(delete=False, mode='wb') as tmp:
    tmp.write(b"hello world")
    tmp_path = tmp.name

# Hash the file
file_hash = hash_file(tmp_path)
print(f"hash_file result: {file_hash}")

# Should match hash_bytes of the same content
assert file_hash == hash_bytes(b"hello world")

# Test with Path object
assert hash_file(Path(tmp_path)) == file_hash

# Cleanup
os.unlink(tmp_path)
print("hash_file tests passed")
```

    hash_file result: sha256:b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9
    hash_file tests passed

## verify_hash

Verifies byte content against an expected hash string. Automatically
extracts the algorithm from the `"algo:hexdigest"` format.

------------------------------------------------------------------------

### verify_hash

``` python

def verify_hash(
    content:bytes, # Content to verify
    expected:str, # Expected hash in "algo:hexdigest" format
)->bool: # True if content matches expected hash

```

*Verify content against an expected hash string.*

``` python
original = b"hello world"
h = hash_bytes(original)

# Matching content
assert verify_hash(original, h) == True

# Modified content
assert verify_hash(b"hello World", h) == False

# Works with different algorithms
h_sha512 = hash_bytes(original, algo="sha512")
assert verify_hash(original, h_sha512) == True
assert verify_hash(b"tampered", h_sha512) == False

print("verify_hash tests passed")
```

    verify_hash tests passed

------------------------------------------------------------------------

### hash_dict_canonical

``` python

def hash_dict_canonical(
    data:Optional, # Dict to hash (or None — treated as {})
    algo:str='sha256', # Hash algorithm name
)->str: # Hash string in "algo:hexdigest" format

```

*Hash a dict via canonical JSON encoding.*

Canonicalization:
`json.dumps(data, sort_keys=True, separators=(",", ":"))`. Sorted keys
eliminate dict-insertion-order variance; minimal separators eliminate
whitespace variance. Result is deterministic across Python versions and
machines.

``` python
# Insertion-order independence — sorted-keys canonicalization eliminates variance
d_a = {"model": "base", "device": "cuda"}
d_b = {"device": "cuda", "model": "base"}
assert hash_dict_canonical(d_a) == hash_dict_canonical(d_b)

# Different values produce different hashes
d_c = {"model": "base", "device": "cpu"}
assert hash_dict_canonical(d_a) != hash_dict_canonical(d_c)

# None and {} hash identically (canonical-empty)
assert hash_dict_canonical(None) == hash_dict_canonical({})

# Algorithm-tagged format
result = hash_dict_canonical(d_a)
assert result.startswith("sha256:")
print(f"hash_dict_canonical(d_a) = {result}")

# Nested dicts also canonicalize stably
nested_a = {"outer": {"a": 1, "b": 2}, "trailing": True}
nested_b = {"trailing": True, "outer": {"b": 2, "a": 1}}
assert hash_dict_canonical(nested_a) == hash_dict_canonical(nested_b)

print("hash_dict_canonical tests passed")
```

## hash_dict_canonical

Hashes a dict via deterministic JSON canonicalization (sorted keys, no
whitespace) before delegating to `hash_bytes`. Two callers in the
substrate:

- **CR-8** `compute_config_schema_hash` — hashes a capability’s JSON
  Schema for drift detection (manifest-stored vs live worker).
- **CR-7** `compute_config_hash` — hashes a capability instance’s
  effective config to key `EmpiricalResourceRecord` storage by
  (instance_id, config_hash).

Same canonicalization rules → same stability guarantees across Python
versions, dict insertion orders, and machines. None is treated as `{}`
so a missing schema/config still produces a deterministic hash rather
than raising.
