import logging as _logging
_log = _logging.getLogger("cjm_substrate_torch_utils.test")
class _FakeModule:
def __init__(self): self.moved_to = None
def to(self, d): self.moved_to = d; return self
class _FakeProcessor: # no .to() -- like an AutoProcessor / tokenizer
pass
class _Holder:
pass
# device="cuda": .to('cpu') attempted on objects that have it; attrs cleared.
_h = _Holder(); _h.model = _FakeModule(); _h.processor = _FakeProcessor()
_m = _h.model
release_model(_h, ["model", "processor"], device="cuda", logger=_log)
assert _h.model is None and _h.processor is None
assert _m.moved_to == "cpu"
# Missing / already-None attrs are skipped; idempotent (no error on re-call).
release_model(_h, ["model", "processor", "does_not_exist"], device="cuda", logger=_log)
assert _h.model is None
# device="cpu": no .to('cpu') attempted.
_h2 = _Holder(); _h2.model = _FakeModule(); _m2 = _h2.model
release_model(_h2, ["model"], device="cpu", logger=_log)
assert _h2.model is None
assert _m2.moved_to is NoneGPU model release
release_model is the single source of truth for the move-to-CPU / del / gc / empty_cache / synchronize sequence that every torch GPU capability reimplements (Whisper, Voxtral-HF, Qwen3-FA, Demucs, LavaSR). Capabilities call it from _release_<trigger> (CR-4 reconfigure), on_disable (CR-2), and cleanup.
It is best-effort: per-attribute .to('cpu') failures and CUDA-cleanup failures are logged and never raised, so teardown can’t mask the original code path. Objects without a .to method (processors, tokenizers) are dropped without the CPU move.
release_model
def release_model(
obj:Any, # The capability instance holding the model attribute(s)
model_attr_names:List, # Names of the attributes to release, in release order
device:str='cuda', # Device the model is on; gates the CUDA-specific cleanup
logger:Logger, # Logger for best-effort failure reporting
)->None:
Release one or more model objects: move to CPU, drop references, gc, free CUDA cache.
For each name in model_attr_names, if obj has a non-None attribute: 1. when on CUDA, best-effort .to('cpu') (frees GPU tensors; skipped for objects without a .to method, e.g. processors/tokenizers), 2. setattr(obj, name, None) and drop the local reference. Then a single gc.collect() and — on CUDA — empty_cache() + synchronize().
Best-effort throughout: failures are logged and swallowed. Missing or already-None attributes are skipped, so the call is idempotent.
Tests.