Batch Size Optimizer

Recommended Max Batch Size

32

per GPU / per accumulation step

Total Batch Size (with Grad Accumulation)

256

across accumulation steps

Memory Utilization

78%

of available VRAM

Memory Breakdown

Model Weights 28.0 GB

Activations (batch) 12.0 GB

Gradients 28.0 GB

Optimizer States 14.0 GB

Total Per-Batch 18.8 GB

Recommendation. Use a batch size of 32 per GPU with 8 gradient accumulation steps, for a total batch size of 256 (32 × 8). This keeps VRAM usage at about 78%, leaving headroom for intermediate computations.
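The arithmetic behind the recommendation can be sketched in a few lines of Python. This is only an illustration, not the optimizer's actual implementation: the function names `effective_batch_size` and `accumulation_steps_for` and the `num_gpus` parameter are hypothetical, and a single GPU is assumed because the reported total (256) equals 32 × 8.

```python
def effective_batch_size(micro_batch: int, accumulation_steps: int, num_gpus: int = 1) -> int:
    """Total samples contributing to one optimizer update.

    micro_batch is the per-GPU batch size for a single forward/backward pass.
    num_gpus is an assumption for illustration; the source only reports the
    single-GPU case (32 x 8 = 256).
    """
    return micro_batch * accumulation_steps * num_gpus


def accumulation_steps_for(target_total: int, micro_batch: int, num_gpus: int = 1) -> int:
    """Smallest number of accumulation steps that reaches target_total samples."""
    samples_per_step = micro_batch * num_gpus
    # Ceiling division, so the target is met or slightly exceeded.
    return -(-target_total // samples_per_step)


if __name__ == "__main__":
    # Values taken from the recommendation above.
    print(effective_batch_size(32, 8))
    print(accumulation_steps_for(256, 32))
```

If the per-GPU batch size has to be lowered to fit in memory, calling `accumulation_steps_for` with the same target gives the step count needed to keep the total batch size unchanged.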