# CUDA OOM handling


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

A torch-based GPU capability wraps each risky inference or model-load
call in `try/except torch.cuda.OutOfMemoryError`, converts the caught
error with this helper, and raises the result with `from` so the original
cause is preserved. The substrate’s CR-7 reactive-retry path (always
reload, then retry, on `CapabilityResourceError`) consumes the typed
error and its `ResourceShortfall`.

This helper is the single source of truth for the ~8-line OOM-conversion
block that Voxtral-HF (3 sites), Qwen3-FA, LavaSR, Whisper, and Demucs
would otherwise each reimplement.

------------------------------------------------------------------------

### cuda_oom_to_capability_resource_error

``` python

def cuda_oom_to_capability_resource_error(
    exc:BaseException, # The caught CUDA OOM exception (e.g. torch.cuda.OutOfMemoryError)
    label:str, # Context for the message, e.g. "loading model 'X'" or "inference"
    headroom_mb:float=100.0, # Best-effort margin added to `available` to estimate `needed`
)->CapabilityResourceError: # Typed error for the substrate's CR-7 reactive-retry path

```

*Convert a CUDA out-of-memory exception into a substrate-typed
`CapabilityResourceError`.*

SG-47 Track B: a capability’s GPU inference / model-load site catches
`torch.cuda.OutOfMemoryError` and re-raises the result of this helper so
the substrate sees a typed resource error (evict + reload + retry via
CR-7) instead of an opaque crash.
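
The CR-7 implementation is not shown here, so the following is only a rough sketch of the general shape of a reload-then-retry consumer. `run_with_reactive_retry`, `evict_and_reload`, and the stand-in `CapabilityResourceError` class are hypothetical names introduced for illustration and are not the substrate's API.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class CapabilityResourceError(Exception):
    """Hypothetical stand-in for the substrate's typed resource error."""


def run_with_reactive_retry(
    call: Callable[[], T],
    evict_and_reload: Callable[[], None],
) -> T:
    """Run `call`; on a typed resource error, free resources and retry once.

    Assumption: a single retry after an unconditional evict + reload.
    Any error raised by the second attempt propagates to the caller.
    """
    try:
        return call()
    except CapabilityResourceError:
        # The typed error is what makes this branch possible: an untyped
        # crash would skip it and propagate straight to the caller.
        evict_and_reload()
        return call()
```

The point of the typed error is visible in the `except` clause: only a `CapabilityResourceError` reaches the recovery branch, so a raw `torch.cuda.OutOfMemoryError` would bypass it.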

`needed` is a best-effort estimate (`available + headroom_mb`): the true
required VRAM is unknowable from the exception, and CR-7 triggers
eviction regardless of magnitude, so an approximation above `available`
is sufficient.
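
As a minimal sketch of that estimate, assuming `available` is read from `torch.cuda.mem_get_info()` when a CUDA device is present and falls back to `0.0` otherwise: the `ResourceShortfall` and `CapabilityResourceError` classes below are simplified stand-ins for the substrate's types, and `free_vram_mb` and `sketch_oom_to_resource_error` are hypothetical names, not the helper's actual body.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourceShortfall:
    """Hypothetical stand-in for the substrate's ResourceShortfall."""
    resource: str
    needed: float
    available: float


class CapabilityResourceError(Exception):
    """Hypothetical stand-in for the substrate's typed resource error."""

    def __init__(self, message: str,
                 resource_shortfall: Optional[ResourceShortfall] = None):
        super().__init__(message)
        self.resource_shortfall = resource_shortfall


def free_vram_mb() -> float:
    """Best-effort free VRAM in MiB; 0.0 if torch or a CUDA device is missing."""
    try:
        import torch  # imported lazily so the sketch also runs without torch
        if torch.cuda.is_available():
            free_bytes, _total_bytes = torch.cuda.mem_get_info()
            return free_bytes / (1024 * 1024)
    except Exception:
        # The probe must never mask the OOM being reported.
        pass
    return 0.0


def sketch_oom_to_resource_error(
    exc: BaseException,
    label: str,
    headroom_mb: float = 100.0,
) -> CapabilityResourceError:
    """Build a typed error whose `needed` is simply `available + headroom_mb`."""
    available = free_vram_mb()
    shortfall = ResourceShortfall(
        resource="gpu_vram_mb",
        needed=available + headroom_mb,  # an estimate, not a measurement
        available=available,
    )
    return CapabilityResourceError(
        f"CUDA out of memory while {label}: {exc}",
        resource_shortfall=shortfall,
    )
```

Because `needed` is derived from a single reading of `available`, the relation `needed == available + headroom_mb` holds exactly whether or not a GPU is present.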

The caller raises the returned error, preserving the original cause:

    try:
        model = Model.from_pretrained(repo_id, ...)
    except torch.cuda.OutOfMemoryError as e:
        raise cuda_oom_to_capability_resource_error(e, label=f"loading {repo_id!r}") from e

The tests below exercise the helper, using a plain `RuntimeError` as a
stand-in for a CUDA OOM:

``` python
_fake_oom = RuntimeError("CUDA out of memory. Tried to allocate 2.00 GiB")

# Returns a typed CapabilityResourceError with a populated gpu_vram_mb shortfall.
_err = cuda_oom_to_capability_resource_error(_fake_oom, label="loading test model")
assert isinstance(_err, CapabilityResourceError)
assert _err.resource_shortfall is not None
assert _err.resource_shortfall.resource == "gpu_vram_mb"
assert _err.resource_shortfall.available >= 0.0
# needed == available + default headroom (100.0), regardless of GPU presence.
assert _err.resource_shortfall.needed == _err.resource_shortfall.available + 100.0
assert "loading test model" in str(_err)

# Custom headroom is honored.
_err2 = cuda_oom_to_capability_resource_error(_fake_oom, label="x", headroom_mb=250.0)
assert _err2.resource_shortfall.needed == _err2.resource_shortfall.available + 250.0

# The raise-from idiom works and preserves the cause.
try:
    raise cuda_oom_to_capability_resource_error(_fake_oom, label="infer") from _fake_oom
except CapabilityResourceError as _e:
    assert _e.__cause__ is _fake_oom
```