GPU Memory Calculator for Training

Estimate GPU VRAM required to train your neural network. Calculate memory for parameters, gradients, optimizer states, and activations. Find the right batch size for your GPU.

Built by Michael Lip

Frequently Asked Questions

How much GPU memory does training require?

Training memory ≈ model_params * bytes_per_param * multiplier + activation_memory. With float32 and Adam: multiplier is 4 (1x params + 1x grads + 2x Adam states). With mixed precision: roughly 60-70% of float32. Activations scale linearly with batch size.
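The formula above can be sketched as a small helper. The function name and the 350M-parameter example model are illustrative, not part of the tool:

```python
def training_memory_gb(num_params, bytes_per_param=4, multiplier=4, activation_bytes=0):
    """Estimate training VRAM in GiB.

    multiplier=4 matches float32 + Adam:
    1x weights + 1x gradients + 2x Adam moment buffers.
    """
    total_bytes = num_params * bytes_per_param * multiplier + activation_bytes
    return total_bytes / 1024**3

# Hypothetical 350M-parameter model, float32, Adam, before activations:
print(f"{training_memory_gb(350_000_000):.2f} GB")  # ~5.22 GB
```

Activation memory is the batch-size-dependent term, so it is passed in separately rather than folded into the multiplier.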

How do I estimate activation memory?

Activation memory stores intermediate outputs for backpropagation. For CNNs: sum of (batch * channels * H * W * 4 bytes) across all layers. For transformers: dominated by attention maps (batch * num_heads * seq_len^2 * 4 bytes per layer). This tool provides estimates based on your architecture.
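Both rules of thumb above can be written as one-liners. These are rough estimates, assuming float32 activations (4 bytes) and ignoring MLP intermediates and framework overhead; the function names and example shapes are illustrative:

```python
def cnn_activation_gb(batch, layer_shapes, bytes_per_val=4):
    """Sum batch * C * H * W * bytes over all layer output shapes (C, H, W)."""
    total = sum(batch * c * h * w * bytes_per_val for c, h, w in layer_shapes)
    return total / 1024**3

def attention_activation_gb(batch, num_heads, seq_len, num_layers, bytes_per_val=4):
    """Attention-map memory: batch * heads * seq_len^2 * bytes, per layer."""
    per_layer = batch * num_heads * seq_len**2 * bytes_per_val
    return per_layer * num_layers / 1024**3

# Example: a GPT-2-small-like config (batch 8, 12 heads, 1024 tokens, 12 layers)
print(f"{attention_activation_gb(8, 12, 1024, 12):.1f} GB")  # 4.5 GB
```

Note the seq_len^2 term: doubling the context length quadruples attention-map memory, which is why long-context training is dominated by activations rather than parameters.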

What GPU do I need for my model?

Quick guide: <10M params → any GPU (4GB+). 10M-100M params → 8-16GB (RTX 3070/4080). 100M-1B params → 24-48GB (RTX 3090/A6000). 1B+ params → multiple GPUs or 80GB (A100/H100). Mixed precision roughly halves these requirements.
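As a sketch, the quick guide reduces to a threshold lookup. The tier strings and boundaries simply restate the guide above for float32 training; halve them mentally for mixed precision:

```python
def vram_tier(num_params):
    """Map a parameter count to the rough VRAM tier from the quick guide."""
    if num_params < 10_000_000:
        return "4GB+ (any modern GPU)"
    if num_params < 100_000_000:
        return "8-16GB (e.g. RTX 3070/4080)"
    if num_params < 1_000_000_000:
        return "24-48GB (e.g. RTX 3090/A6000)"
    return "80GB+ or multi-GPU (e.g. A100/H100)"

print(vram_tier(50_000_000))  # 8-16GB (e.g. RTX 3070/4080)
```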

How does mixed precision affect memory?

Mixed precision (torch.cuda.amp) stores weights and activations in float16 (2 bytes) but keeps a float32 master copy of the weights for the optimizer update. Because of that master copy, parameter and optimizer-state memory barely shrinks; the real saving is in activations, which are stored in float16 at roughly half the float32 footprint. For activation-heavy workloads this puts total training memory around 60-70% of full float32.
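The byte accounting can be made explicit. This sketch assumes Adam with two float32 moment buffers and, for AMP, fp16 weights and gradients plus a float32 master copy; some setups keep fp32 gradients, which adds 2 bytes per parameter:

```python
def bytes_per_param(precision="fp32"):
    """Per-parameter bytes for Adam training under two precision regimes."""
    if precision == "fp32":
        return 4 + 4 + 8          # fp32 weights + fp32 grads + two Adam moments
    if precision == "amp":
        return 2 + 4 + 2 + 8      # fp16 weights + fp32 master + fp16 grads + moments
    raise ValueError(precision)

def total_gb(num_params, activation_bytes_fp32, precision="fp32"):
    """Total training memory; AMP roughly halves the activation term only."""
    act = activation_bytes_fp32 if precision == "fp32" else activation_bytes_fp32 / 2
    return (num_params * bytes_per_param(precision) + act) / 1024**3

# Hypothetical 100M-param model with 8 GiB of fp32 activations:
print(f"{total_gb(100_000_000, 8 * 1024**3, 'fp32'):.2f} GB")  # ~9.49 GB
print(f"{total_gb(100_000_000, 8 * 1024**3, 'amp'):.2f} GB")   # ~5.49 GB
```

Under these assumptions the per-parameter bytes come out nearly identical (16 vs 16), so the overall ratio (here about 58%) is driven almost entirely by the halved activations.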

About This Tool

This tool is part of HeyTensor, a free suite of PyTorch and deep learning utilities. All calculations run entirely in your browser — no data is sent to any server. The source code is open on GitHub.

Contact

HeyTensor is built and maintained by Michael Lip. For questions or feedback, email [email protected].

📊 Based on real data from our PyTorch Error Database — 52 errors analyzed from Stack Overflow