GPU model release

Robust move-to-CPU + drop-references + gc + CUDA-cache cleanup for releasing models, factored out of the per-capability reimplementations.

release_model is the single source of truth for the move-to-CPU / del / gc / empty_cache / synchronize sequence that every torch GPU capability would otherwise reimplement (Whisper, Voxtral-HF, Qwen3-FA, Demucs, LavaSR). Capabilities call it from _release_<trigger> (CR-4 reconfigure), on_disable (CR-2), and cleanup.

It is best-effort: per-attribute .to('cpu') failures and CUDA-cleanup failures are logged and never raised, so a failure during teardown cannot mask or interrupt the code path that triggered the release. Objects without a .to method (processors, tokenizers) are dropped without the CPU move.


release_model


def release_model(
    obj:Any, # The capability instance holding the model attribute(s)
    model_attr_names:List, # Names of the attributes to release, in release order
    device:str='cuda', # Device the model is on; gates the CUDA-specific cleanup
    logger:Logger, # Logger for best-effort failure reporting
)->None:

Release one or more model objects: move to CPU, drop references, gc, free CUDA cache.

For each name in model_attr_names, if obj has a non-None attribute of that name:

1. When on CUDA, best-effort .to('cpu') (frees GPU tensors; skipped for objects without a .to method, e.g. processors/tokenizers).
2. setattr(obj, name, None) and drop the local reference.

Then a single gc.collect() and, on CUDA, empty_cache() + synchronize().

Best-effort throughout: failures are logged and swallowed. Missing or already-None attributes are skipped, so the call is idempotent.

Tests.

import logging as _logging
_log = _logging.getLogger("cjm_substrate_torch_utils.test")


class _FakeModule:
    def __init__(self): self.moved_to = None
    def to(self, d): self.moved_to = d; return self


class _FakeProcessor:  # no .to() -- like an AutoProcessor / tokenizer
    pass


class _Holder:
    pass


# device="cuda": .to('cpu') attempted on objects that have it; attrs cleared.
_h = _Holder(); _h.model = _FakeModule(); _h.processor = _FakeProcessor()
_m = _h.model
release_model(_h, ["model", "processor"], device="cuda", logger=_log)
assert _h.model is None and _h.processor is None
assert _m.moved_to == "cpu"

# Missing / already-None attrs are skipped; idempotent (no error on re-call).
release_model(_h, ["model", "processor", "does_not_exist"], device="cuda", logger=_log)
assert _h.model is None

# device="cpu": no .to('cpu') attempted.
_h2 = _Holder(); _h2.model = _FakeModule(); _m2 = _h2.model
release_model(_h2, ["model"], device="cpu", logger=_log)
assert _h2.model is None
assert _m2.moved_to is None