_fake_oom = RuntimeError("CUDA out of memory. Tried to allocate 2.00 GiB")
# Returns a typed CapabilityResourceError with a populated gpu_vram_mb shortfall.
_err = cuda_oom_to_capability_resource_error(_fake_oom, label="loading test model")
assert isinstance(_err, CapabilityResourceError)
assert _err.resource_shortfall is not None
assert _err.resource_shortfall.resource == "gpu_vram_mb"
assert _err.resource_shortfall.available >= 0.0
# needed == available + default headroom (100.0), regardless of GPU presence.
assert _err.resource_shortfall.needed == _err.resource_shortfall.available + 100.0
assert "loading test model" in str(_err)
# Custom headroom is honored.
_err2 = cuda_oom_to_capability_resource_error(_fake_oom, label="x", headroom_mb=250.0)
assert _err2.resource_shortfall.needed == _err2.resource_shortfall.available + 250.0
# The raise-from idiom works and preserves the cause.
try:
raise cuda_oom_to_capability_resource_error(_fake_oom, label="infer") from _fake_oom
except CapabilityResourceError as _e:
assert _e.__cause__ is _fake_oomCUDA OOM handling
CapabilityResourceError (SG-47 Track B) so CR-7 reactive retry can evict and reload.
A torch GPU capability wraps its risky inference / model-load call in a try/except torch.cuda.OutOfMemoryError and converts the caught error with this helper, then re-raises it preserving the original cause. The substrate’s CR-7 reactive-retry path (always-reload-then-retry on CapabilityResourceError) consumes the typed error and its ResourceShortfall.
This is the single source of truth for the ~8-line OOM-conversion block that Voxtral-HF (3 sites), Qwen3-FA, LavaSR, Whisper, and Demucs each reimplement.
cuda_oom_to_capability_resource_error
def cuda_oom_to_capability_resource_error(
exc:BaseException, # The caught CUDA OOM exception (e.g. torch.cuda.OutOfMemoryError)
label:str, # Context for the message, e.g. "loading model 'X'" or "inference"
headroom_mb:float=100.0, # Best-effort margin added to `available` to estimate `needed`
)->CapabilityResourceError: # Typed error for the substrate's CR-7 reactive-retry path
Convert a CUDA out-of-memory exception into a substrate-typed CapabilityResourceError.
SG-47 Track B: a capability’s GPU inference / model-load site catches torch.cuda.OutOfMemoryError and re-raises the result of this helper so the substrate sees a typed resource error (evict + reload + retry via CR-7) instead of an opaque crash.
needed is a best-effort estimate (available + headroom_mb): the true required VRAM is unknowable from the exception, and CR-7 triggers eviction regardless of magnitude, so an approximation above available is sufficient.
The caller raises the returned error, preserving the original cause:
try:
model = Model.from_pretrained(repo_id, ...)
except torch.cuda.OutOfMemoryError as e:
raise cuda_oom_to_capability_resource_error(e, label=f"loading {repo_id!r}") from e
Tests.