Question 1

How is total VRAM calculated?

Accepted Answer

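Total VRAM for training is roughly the sum of four parts: model parameters (4 bytes each in FP32), gradients (same size as the parameters), optimizer states, and activation memory. Optimizer states depend on the optimizer: plain SGD keeps none, SGD with momentum keeps one buffer (1x params), and Adam/AdamW keep two (first and second moments, 2x params). Activation memory grows with batch size, sequence length, hidden dimension, and number of layers. On top of that, expect some overhead for the CUDA context, temporary buffers, and allocator fragmentation.

Mixed precision (FP16/BF16) mainly saves activation memory, roughly halving it. It usually does not shrink the parameter side, because an FP32 master copy of the weights is kept for the optimizer update. With Adam, the 16-bit weights (2 bytes) and gradients (2 bytes) plus the FP32 master weights (4 bytes) and Adam states (8 bytes) still total about 16 bytes per parameter, the same as in pure FP32.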
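The breakdown above can be turned into a rough estimator. The Python sketch below assumes the optimizer-state multipliers just described, FP32 optimizer states, and, for mixed precision, a 16-bit working copy of the weights plus an FP32 master copy. The function name `estimate_training_vram` and the `act_values_per_token_unit` factor are hypothetical: the real number of activation values stored per token, per hidden unit, per layer depends on the architecture and on whether activation checkpointing is used, so treat the default as a placeholder to calibrate against a measured run.

```python
# Rough training-memory estimator. A sketch, not a profiler: it ignores the
# CUDA context, temporary buffers, and allocator fragmentation.

# Optimizer state kept per parameter, as a multiple of the parameter count.
OPTIMIZER_STATE_MULTIPLIER = {
    "sgd": 0,           # plain SGD stores no extra state
    "sgd_momentum": 1,  # one momentum buffer
    "adam": 2,          # first and second moments
    "adamw": 2,
}


def estimate_training_vram(
    num_params: int,
    optimizer: str = "adamw",
    mixed_precision: bool = False,
    batch_size: int = 1,
    seq_len: int = 1,
    hidden_dim: int = 1,
    num_layers: int = 1,
    act_values_per_token_unit: float = 16.0,  # ASSUMPTION: hypothetical, model-dependent
) -> dict:
    """Return an estimated byte count for each memory component and the total."""
    mult = OPTIMIZER_STATE_MULTIPLIER[optimizer]
    if mixed_precision:
        # Assumed scheme: 16-bit working weights (2 B) plus FP32 master copy (4 B);
        # gradients and stored activations are 16-bit.
        param_bytes = num_params * (2 + 4)
        grad_bytes = num_params * 2
        act_elem_bytes = 2
    else:
        # Pure FP32: weights, gradients, and activations at 4 bytes each.
        param_bytes = num_params * 4
        grad_bytes = num_params * 4
        act_elem_bytes = 4
    # Optimizer states are kept in FP32 in both cases.
    opt_bytes = num_params * 4 * mult
    # Activations scale with batch size, sequence length, hidden size, and depth.
    act_bytes = int(
        batch_size * seq_len * hidden_dim * num_layers
        * act_values_per_token_unit * act_elem_bytes
    )
    parts = {
        "params": param_bytes,
        "grads": grad_bytes,
        "optimizer": opt_bytes,
        "activations": act_bytes,
    }
    parts["total"] = sum(parts.values())
    return parts
```

For example, `estimate_training_vram(7_000_000_000, "adamw")["total"] / 1024**3` gives the static footprint of a 7B-parameter model trained with AdamW, roughly 104 GiB at about 16 bytes per parameter, before realistic activation settings are plugged in. Passing `mixed_precision=True` leaves that static part unchanged and halves only the activation term.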

Question 2

What speedup can I expect?

Accepted Answer

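Relative to FP32 training, FP16/BF16 mixed precision typically yields a 1.5–3x speedup on GPUs with tensor cores (FP16 tensor cores from Volta onward, BF16 from Ampere onward). TF32, available from Ampere onward, typically gives around 1.2–1.5x. Actual speedup depends on the hardware, the batch size, and the model architecture: large, matmul-heavy models benefit most, while small batches or workloads bottlenecked by memory-bound ops (normalization, elementwise ops) or by data loading see less gain.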
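One way to see why the end-to-end gain is smaller than the raw tensor-core speedup is Amdahl's law. The answer does not mention it, but it applies here: only the share of step time spent in tensor-core-eligible ops (mostly matmuls and convolutions) gets faster. The sketch below is illustrative; `end_to_end_speedup` is a hypothetical helper, and both inputs are values you would measure with a profiler on your own model rather than published figures.

```python
def end_to_end_speedup(accelerated_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: overall speedup when only part of the step time gets faster.

    accelerated_fraction: share of the FP32 step time spent in ops that run on
        tensor cores under mixed precision (measure this; hypothetical input).
    kernel_speedup: how much faster those ops alone become.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    # The unaccelerated part keeps its original time; the rest shrinks by kernel_speedup.
    new_time = (1.0 - accelerated_fraction) + accelerated_fraction / kernel_speedup
    return 1.0 / new_time
```

With a hypothetical 80% of step time in eligible ops and an 8x faster kernel, the step speeds up only about 3.3x; drop the eligible share to 50% and it falls to about 1.8x. This is why two models on the same GPU can see very different gains.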