# Cache Paths


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

## \_sanitize_stem

Replaces filesystem-unsafe characters in an input file’s stem so the
resulting directory name is portable across Linux, macOS, and Windows.
A length cap guards against pathologically long source filenames.
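
A minimal sketch of this kind of sanitization follows. The character set, the `_` replacement, the 64-character cap, and the name `sanitize_stem_sketch` are all assumptions for illustration; the real `_sanitize_stem` may apply different rules.

```python
import re

# Hypothetical rules: characters rejected by at least one of Linux, macOS,
# or Windows, plus control characters and whitespace. The real helper's
# character set and cap are not shown in this document.
_UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f\s]+')


def sanitize_stem_sketch(stem: str, max_length: int = 64) -> str:
    """Collapse runs of unsafe characters to '_' and cap the length."""
    cleaned = _UNSAFE.sub("_", stem)[:max_length].strip("._")
    # Fall back to a placeholder so the result is never an empty name.
    return cleaned or "unnamed"


print(sanitize_stem_sketch('ep 12: "pilot"?'))  # ep_12_pilot
print(len(sanitize_stem_sketch("x" * 500)))     # 64
```

Stripping leading and trailing dots and underscores after truncation avoids names that Windows rejects (trailing dots) or that look hidden on Unix (leading dots).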

## Stat-cache for content hashes

Computing a SHA-256 over a multi-hour podcast WAV (~1–2 GB) takes a few
seconds even with `hash_file`’s streaming reader. Per-cache-lookup
hashing would be untenable for chained-plugin workflows where the same
input might be referenced dozens of times in a single workflow run.

The substrate maintains a small SQLite stat-cache at
`<substrate_data_dir>/input_hash_cache.db` mapping
`(absolute_path, mtime_ns, size)` → `content_hash`. Lookups that hit the
cache return in microseconds; cold lookups compute the hash once and
write it back.
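
The lookup-then-write-back flow can be sketched as below. The table name, schema, and the helpers `stream_sha256` and `cached_content_hash` are hypothetical stand-ins; the substrate's actual schema and its `hash_file` signature are not shown in this document.

```python
import hashlib
import os
import sqlite3
from pathlib import Path

# Hypothetical schema: one row per (absolute_path, mtime_ns, size) key.
SCHEMA = """
CREATE TABLE IF NOT EXISTS input_hashes (
    path TEXT NOT NULL,
    mtime_ns INTEGER NOT NULL,
    size INTEGER NOT NULL,
    content_hash TEXT NOT NULL,
    PRIMARY KEY (path, mtime_ns, size)
)
"""


def stream_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large inputs never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_content_hash(conn: sqlite3.Connection, path: Path) -> str:
    """Return the content hash, consulting the stat-cache before hashing."""
    st = os.stat(path)
    key = (str(Path(path).resolve()), st.st_mtime_ns, st.st_size)
    row = conn.execute(
        "SELECT content_hash FROM input_hashes "
        "WHERE path = ? AND mtime_ns = ? AND size = ?",
        key,
    ).fetchone()
    if row is not None:
        return row[0]  # warm hit: no file read at all
    content_hash = stream_sha256(path)  # cold lookup: hash once
    conn.execute(
        "INSERT OR REPLACE INTO input_hashes VALUES (?, ?, ?, ?)",
        (*key, content_hash),
    )
    conn.commit()
    return content_hash
```

To try it, open a connection with `sqlite3.connect(":memory:")`, run `conn.execute(SCHEMA)`, and call `cached_content_hash` twice on the same file; the second call is answered from the table.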

The cache uses a module-level `threading.Lock` to serialize SQLite
writes (SQLite handles concurrent reads fine, but writes from multiple
threads need coordination at the Python layer to avoid
`database is locked` errors). Reads are still fast because SQLite’s WAL
mode (set on connect) allows concurrent readers + one writer.
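
As an illustration of that pattern, the sketch below enables WAL on connect and funnels every write through one module-level lock. The table `kv` and the helpers `connect`, `write_entry`, and `demo` are hypothetical names, not the substrate's API.

```python
import sqlite3
import tempfile
import threading
from pathlib import Path

_WRITE_LOCK = threading.Lock()  # module-level: shared by every writer thread


def connect(db_path: Path) -> sqlite3.Connection:
    """Open a connection with WAL journaling enabled (hypothetical helper)."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
    return conn


def write_entry(db_path: Path, key: str, value: str) -> None:
    """Serialize writes at the Python layer; each thread uses its own connection."""
    with _WRITE_LOCK:
        conn = connect(db_path)
        try:
            conn.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (key, value))
            conn.commit()
        finally:
            conn.close()


def demo(n_threads: int = 8) -> int:
    """Write one row per thread concurrently and return the final row count."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = Path(tmp) / "demo.db"
        threads = [
            threading.Thread(target=write_entry, args=(db_path, f"k{i}", "v"))
            for i in range(n_threads)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        conn = connect(db_path)
        try:
            return conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]
        finally:
            conn.close()


print(demo())  # 8
```

The sketch opens a connection per write for simplicity; a real implementation might keep per-thread connections open and hold the lock only around the write itself.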

`mtime_ns` is preferred over `mtime` (float seconds): nanosecond
precision distinguishes two writes made in quick succession, which a
1-second mtime resolution would conflate. `size` is a secondary check,
a paranoid defense against coarse filesystem mtime resolution.
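
The example below illustrates both parts of the key using only the standard library. `stat_key` is a hypothetical helper, and the distinction between the first two keys assumes the filesystem stores sub-second timestamps (most modern ones do; some older ones do not).

```python
import os
import tempfile


def stat_key(path: str) -> tuple:
    """Build an (absolute_path, mtime_ns, size) key from a single stat call."""
    st = os.stat(path)
    return (os.path.abspath(path), st.st_mtime_ns, st.st_size)


with tempfile.NamedTemporaryFile(delete=False) as fh:
    fh.write(b"first")
    path = fh.name

t0 = 1_700_000_000_000_000_000  # an arbitrary timestamp in nanoseconds
os.utime(path, ns=(t0, t0))
key_a = stat_key(path)

# Simulate a second write 5 ms later, within the same whole second.
os.utime(path, ns=(t0 + 5_000_000, t0 + 5_000_000))
key_b = stat_key(path)

# Simulate a content change whose mtime was reset to the original value:
# only the size component can tell the two states apart.
with open(path, "ab") as fh:
    fh.write(b"more")
os.utime(path, ns=(t0, t0))
key_c = stat_key(path)

os.unlink(path)

print(key_a[1] // 10**9 == key_b[1] // 10**9)  # whole seconds are identical
print(key_a != key_c)                          # size differs: 5 vs 9 bytes
```

A change that preserves both `mtime_ns` and `size` would still go undetected by a stat-based key, which is the case `skip_input_cache=True` exists to cover.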

## cache_dir_for_config

The main entry point. Returns (and optionally creates) a deterministic
per-(input-content, config) cache directory.

Plugins use this in lieu of hand-rolled
`<plugin_data_dir>/<action>/<stem>` output-path derivation. The user’s
ffmpeg segmentation bug (`segment_audio` with different
`max_segment_duration` values overwriting each other in the same
directory) is the canonical motivating example — fixing it requires the
config to enter the cache key, which this helper makes mandatory by
construction.

------------------------------------------------------------------------

### cache_dir_for_config

``` python

def cache_dir_for_config(
    plugin_data_dir:Union, # The plugin's own data subdirectory (typically <cfg.plugin_data_dir>/<plugin_name>)
    input_path:Union, # The input file the plugin operates on
    action:str, # The plugin action name (e.g., "segment_audio", "convert", "execute")
    config_dict:Dict, # The plugin's effective config for this action
    input_hash_length:int=6, # Truncation length for the input content hash in the directory name
    config_hash_length:int=12, # Truncation length for the config hash in the directory name
    create:bool=True, # Auto-create the directory (parents=True, exist_ok=True)
    hash_input_content:bool=True, # If False, hash str(input_path) instead (e.g., URL inputs)
    skip_input_cache:bool=False, # If True, bypass the stat-cache (always recompute content hash)
)->Path: # The deterministic cache directory path

```

*Return (and optionally create) a per-(input-content, config) cache
directory.*

Path layout:

    <plugin_data_dir>/<action>/<sanitized-stem>/<input_hash[:N]>_<config_hash[:M]>/

The same `(input_content, action, config_dict)` always resolves to the
same path; any change to input content OR config produces a different
path. This means:

1.  Different configs go to different directories — no silent overwrite.
2.  Stale artifacts from one config never mix with another config’s
    outputs: each unique `(input_content, config)` tuple has its own
    directory.
3.  For chained plugin sequences, upstream config changes propagate
    through content changes: if plugin A’s output content depends on A’s
    config and plugin B reads that output, B’s cache key automatically
    reflects A’s config indirectly.
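
The layout and the properties above can be sketched as follows. Hashing canonical JSON (sorted keys) with SHA-256 is an assumption about how the config hash is derived, and `config_hash` and `layout_sketch` are hypothetical names; only the directory shape comes from the documented layout.

```python
import hashlib
import json
from pathlib import Path


def config_hash(config_dict: dict) -> str:
    """Hash a config via canonical JSON so key order does not matter (assumed scheme)."""
    payload = json.dumps(
        config_dict, sort_keys=True, separators=(",", ":"), default=str
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def layout_sketch(plugin_data_dir, action, stem, input_hash, config_dict, n=6, m=12):
    """Assemble <plugin_data_dir>/<action>/<stem>/<input_hash[:n]>_<config_hash[:m]>."""
    leaf = f"{input_hash[:n]}_{config_hash(config_dict)[:m]}"
    return Path(plugin_data_dir) / action / stem / leaf


# Stand-in for the content hash of a real input file.
input_hash = hashlib.sha256(b"fake audio bytes").hexdigest()

a = layout_sketch("data/ffmpeg", "segment_audio", "episode", input_hash,
                  {"max_segment_duration": 30})
b = layout_sketch("data/ffmpeg", "segment_audio", "episode", input_hash,
                  {"max_segment_duration": 60})

print(a.parent == b.parent)  # True: same input and action share a parent
print(a == b)                # False: different configs get different leaves
```

Because the config enters the leaf name, the two `max_segment_duration` runs can no longer overwrite each other, while runs with an identical config resolve to the same directory regardless of key order.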

`hash_input_content=False` switches to hashing the string form of
`input_path` instead of file content — for plugins whose “input” is a
URL, a database row ID, or another non-file identifier. Sequence
chaining via content propagation only works for true file inputs.

`skip_input_cache=True` recomputes the input content hash even if the
stat-cache has a record. Useful for plugins that just wrote the input
file and want to record its canonical hash without stale-cache risk.

Raises `FileNotFoundError` if `input_path` doesn’t exist and
`hash_input_content=True`. Raises `OSError` on directory-create failure
when `create=True`.

## list_cache_entries + prune_cache_for_input

Operator-facing affordances for inspecting and cleaning up the cache.
The `<plugin_data_dir>/<action>/<stem>/` parent contains one directory
per unique config variant the plugin has been run with.
`list_cache_entries` enumerates them; `prune_cache_for_input` deletes
them (optionally preserving a specified set).

------------------------------------------------------------------------

### prune_cache_for_input

``` python

def prune_cache_for_input(
    plugin_data_dir:Union, # The plugin's own data subdirectory
    input_path:Union, # The input file whose cache entries to prune
    action:str, # The plugin action name
    keep:Optional=None, # Paths to preserve through the sweep (as returned by list_cache_entries)
    dry_run:bool=False, # If True, return what WOULD be deleted without touching filesystem
)->List: # Paths that were (or would be) deleted

```

*Delete per-config cache directories for `(input, action)`, optionally
preserving a `keep` set.*

Pairs with `list_cache_entries` for inspect-then-prune workflows: list
candidates, choose which to keep, then call prune with the keep set.
`keep=None` deletes ALL entries.

`dry_run=True` returns the would-delete list without touching the
filesystem — useful for operator confirmation before destructive ops.

Returns the list of deleted (or would-delete) paths.
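
The `keep` and `dry_run` semantics can be illustrated with a self-contained stand-in. `prune_sketch` is hypothetical and takes the parent directory directly, rather than deriving it from `(plugin_data_dir, input_path, action)` as the real function does; the directory names are made up.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Iterable, List, Optional


def prune_sketch(
    parent: Path,
    keep: Optional[Iterable[Path]] = None,
    dry_run: bool = False,
) -> List[Path]:
    """Delete every subdirectory of `parent` not in `keep`; return what was (or would be) removed."""
    keep_set = {Path(p).resolve() for p in (keep or [])}
    if not parent.is_dir():
        return []
    doomed = sorted(
        p for p in parent.iterdir()
        if p.is_dir() and p.resolve() not in keep_set
    )
    if not dry_run:
        for p in doomed:
            shutil.rmtree(p)
    return doomed


with tempfile.TemporaryDirectory() as tmp:
    parent = Path(tmp) / "segment_audio" / "episode"
    for name in ("abc123_000000000001", "abc123_000000000002", "abc123_000000000003"):
        (parent / name).mkdir(parents=True)

    keep = [parent / "abc123_000000000002"]
    preview = prune_sketch(parent, keep=keep, dry_run=True)   # nothing deleted yet
    remaining_after_preview = len(list(parent.iterdir()))
    deleted = prune_sketch(parent, keep=keep)                  # now actually delete
    survivors = [p.name for p in parent.iterdir()]

print(remaining_after_preview)  # 3
print(survivors)                # ['abc123_000000000002']
```

Running the dry run first and the real sweep second with the same `keep` set means the operator confirms exactly the list that is then removed.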

------------------------------------------------------------------------

### list_cache_entries

``` python

def list_cache_entries(
    plugin_data_dir:Union, # The plugin's own data subdirectory
    input_path:Union, # The input file whose cache entries to list
    action:str, # The plugin action name
)->List: # All config-hash directories for this (input, action)

```

*Enumerate all per-config cache directories for a given (input,
action).*

Returns the paths of every `<input_hash>_<config_hash>` directory under
`<plugin_data_dir>/<action>/<sanitized-stem>/`. Each entry corresponds
to a unique `(input_content, config)` tuple — operators can inspect
their contents, diff them, or pass selected ones to
`prune_cache_for_input` to keep them through a sweep.

Returns an empty list if the parent directory doesn’t exist (plugin
never ran this action for this input).
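
A minimal sketch of that enumeration, including the empty-list case, is shown below. `list_entries_sketch` is a hypothetical stand-in that takes an already-sanitized stem instead of an input path, and the sorted ordering is an assumption.

```python
import tempfile
from pathlib import Path
from typing import List


def list_entries_sketch(plugin_data_dir, action: str, stem: str) -> List[Path]:
    """Return the per-config directories under <plugin_data_dir>/<action>/<stem>/."""
    parent = Path(plugin_data_dir) / action / stem
    if not parent.is_dir():
        return []  # the plugin never ran this action for this input
    return sorted(p for p in parent.iterdir() if p.is_dir())


with tempfile.TemporaryDirectory() as tmp:
    before = list_entries_sketch(tmp, "convert", "episode")

    parent = Path(tmp) / "convert" / "episode"
    (parent / "abc123_aaaaaaaaaaaa").mkdir(parents=True)
    (parent / "abc123_bbbbbbbbbbbb").mkdir()
    (parent / "notes.txt").write_text("stray file, not a cache entry")

    after = [p.name for p in list_entries_sketch(tmp, "convert", "episode")]

print(before)  # []
print(after)   # ['abc123_aaaaaaaaaaaa', 'abc123_bbbbbbbbbbbb']
```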

## Tests

The tests exercise the helpers end-to-end against a temporary directory
and file, so they do not depend on any specific plugin’s data dir.
