PyTorch Memory Calculator

Estimate GPU VRAM usage for training: model weights, gradients, optimizer state, and activations
Model memory
Gradient memory
Optimizer state
Activation memory
Total VRAM
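
The first three components scale directly with parameter count, so they can be approximated with simple arithmetic. The sketch below is one way to do that, not the calculator's actual code: the function name `static_memory_bytes`, the dtype and optimizer tables, and the 7-billion-parameter example are all assumptions made for illustration. It also assumes gradients and optimizer buffers use the same dtype as the parameters.

```python
# Bytes per element for common training dtypes.
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "bf16": 2}

# Number of per-parameter state buffers each optimizer keeps
# (assumption for this sketch: Adam-style optimizers keep two moment
# buffers, SGD with momentum keeps one, plain SGD keeps none).
STATE_BUFFERS = {"adam": 2, "adamw": 2, "sgd_momentum": 1, "sgd": 0}


def static_memory_bytes(num_params, dtype="fp32", optimizer="adamw"):
    """Estimate model, gradient, and optimizer-state memory in bytes."""
    size = BYTES_PER_ELEMENT[dtype]
    model = num_params * size
    # Assumption: one gradient value per parameter, same dtype as the weights.
    gradient = num_params * size
    # Assumption: optimizer buffers share the parameter dtype.
    optimizer_state = num_params * size * STATE_BUFFERS[optimizer]
    return {"model": model, "gradient": gradient, "optimizer": optimizer_state}


def gib(num_bytes):
    """Convert bytes to GiB for display."""
    return num_bytes / 1024**3


# Hypothetical example: a 7-billion-parameter model trained in fp32 with AdamW.
estimate = static_memory_bytes(7_000_000_000)
for name, num_bytes in estimate.items():
    print(f"{name:>10}: {gib(num_bytes):6.1f} GiB")
```

Activation memory is left out here because it depends on batch size and sequence length rather than on parameter count alone.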

GPU compatibility

* Estimates assume a typical transformer architecture. Activation memory depends on the hidden size, the number of layers, and the number of attention heads. This calculator uses a simplified heuristic with fixed values (hidden=4096, layers=32, heads=32).
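
The page does not state the heuristic's formula, so the following is only a sketch of how activation memory might be estimated from those three values. It uses a commonly cited per-layer approximation for transformers with 16-bit activations and no activation recomputation, roughly s·b·h·(34 + 5·a·s/h) bytes, where s is sequence length, b is batch size, h is hidden size, and a is the number of heads. The function name `activation_bytes`, the batch size, and the sequence length are assumptions chosen for illustration.

```python
def activation_bytes(batch_size, seq_len, hidden=4096, layers=32, heads=32):
    """Rough activation memory in bytes for a transformer stack.

    Assumes 16-bit activations, no recomputation, and standard attention
    that materializes the full seq_len x seq_len score matrix.
    """
    # s*b*h*(34 + 5*a*s/h), expanded so the arithmetic stays in integers.
    per_layer = (
        34 * seq_len * batch_size * hidden
        + 5 * heads * seq_len * seq_len * batch_size
    )
    return layers * per_layer


# Hypothetical inputs: batch size 1, sequence length 2048.
total = activation_bytes(batch_size=1, seq_len=2048)
print(f"Activation memory: {total / 1024**3:.1f} GiB")
```

The second term grows with the square of the sequence length, which is why activation memory tends to dominate at long context lengths. Techniques such as activation checkpointing or fused attention kernels would reduce this figure, and they are not modeled here.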

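This tool provides approximate VRAM estimates for PyTorch training. Actual usage varies with batch size, sequence length, precision, and framework overhead such as the CUDA context and allocator fragmentation, so use the result for initial sizing only.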