📦 Model memory
🎯 Optimizer states
🔁 Activations (est.)
📐 Gradients
💾 Total VRAM
⚡ Speedup vs FP32
📉 Memory savings
| Mode | Model | Optimizer | Activations | Total VRAM | Speedup | Savings |
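import torch
from torch.cuda.amp import autocast, GradScaler

# model, optimizer, loss_fn and dataloader are assumed to be defined elsewhere
scaler = GradScaler()  # scales the loss so small FP16 gradients do not underflow
for data, target in dataloader:
    optimizer.zero_grad()  # clear gradients left over from the previous step
    with autocast(dtype=torch.float16):
        # only the forward pass and the loss run under autocast
        output = model(data)
        loss = loss_fn(output, target)
    scaler.scale(loss).backward()  # backward pass on the scaled loss, outside autocast
    scaler.step(optimizer)  # unscales gradients; skips the step if they contain inf/NaN
    scaler.update()  # adjusts the scale factor for the next iteration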
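The quantities this page reports (model memory, optimizer states, gradients, activations, total VRAM, memory savings) can be approximated with simple per-parameter arithmetic. The sketch below is one possible way to do that, not the formula used here: the names `Scheme`, `FP32`, `MIXED_FP16`, `estimate_gib` and `savings_percent` are hypothetical, the byte counts assume an Adam-style optimizer with two FP32 moment buffers, and the mixed-precision preset assumes FP16 working weights and gradients with an FP32 master copy counted under the optimizer. Halving the activations is also an assumption. PyTorch's `autocast` keeps parameters in FP32, so under the loop above the savings would likely come mainly from activations. Speedup is not modeled because it depends on the hardware.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class Scheme:
    """Bytes per parameter for each component (all values are assumptions)."""
    weight_bytes: int
    grad_bytes: int
    optimizer_bytes: int
    activation_scale: float  # activation size relative to the FP32 baseline


# FP32 baseline: 4-byte weights and gradients, two FP32 Adam moment buffers.
FP32 = Scheme(weight_bytes=4, grad_bytes=4, optimizer_bytes=8, activation_scale=1.0)

# Assumed FP16 scheme: 2-byte weights and gradients; the FP32 master weights
# (4 bytes) are counted with the Adam moments (8 bytes) under the optimizer.
MIXED_FP16 = Scheme(weight_bytes=2, grad_bytes=2, optimizer_bytes=12, activation_scale=0.5)


def estimate_gib(num_params: int, fp32_activations_gib: float, scheme: Scheme) -> dict:
    """Return a rough memory breakdown in GiB for one training replica."""
    est = {
        "model": num_params * scheme.weight_bytes / GIB,
        "optimizer": num_params * scheme.optimizer_bytes / GIB,
        "gradients": num_params * scheme.grad_bytes / GIB,
        # Activations are supplied by the caller as an FP32 estimate and scaled.
        "activations": fp32_activations_gib * scheme.activation_scale,
    }
    est["total"] = sum(est.values())
    return est


def savings_percent(baseline: dict, candidate: dict) -> float:
    """Memory saved by `candidate` relative to `baseline`, in percent."""
    return 100.0 * (1.0 - candidate["total"] / baseline["total"])


# Example: 2**30 parameters and an assumed 8 GiB of FP32 activations.
baseline = estimate_gib(1024 ** 3, 8.0, FP32)
mixed = estimate_gib(1024 ** 3, 8.0, MIXED_FP16)
print(f"FP32 total:  {baseline['total']:.1f} GiB")  # 24.0 GiB
print(f"Mixed total: {mixed['total']:.1f} GiB")  # 20.0 GiB
print(f"Savings:     {savings_percent(baseline, mixed):.1f}%")  # 16.7%
```

Keeping the byte counts in a small `Scheme` record makes it easy to try other assumptions, for example an autocast-style preset with 4-byte weights and gradients and only the activations reduced.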