What Are the Layer Shapes in VGG-16?

Q: What are the layer shapes in VGG-16?

VGG-16 shape trace, written as (channels, height, width): Input (3, 224, 224) → Conv3x3(64) → (64, 224, 224) → Conv3x3(64) → (64, 224, 224) → Pool → (64, 112, 112) → Conv3x3(128) → (128, 112, 112) → Conv3x3(128) → (128, 112, 112) → Pool → (128, 56, 56) → Conv3x3(256)×3 → (256, 56, 56) → Pool → (256, 28, 28) → Conv3x3(512)×3 → (512, 28, 28) → Pool → (512, 14, 14) → Conv3x3(512)×3 → (512, 14, 14) → Pool → (512, 7, 7) → Flatten(25088) → FC(4096) → FC(4096) → FC(1000). Total: about 138M parameters.

VGG-16 spatial size after each layer, starting from the input: 224 → 224 → 224 → 112 → 112 → 112 → 56 → 56 → 56 → 56 → 28 → 28 → 28 → 28 → 14 → 14 → 14 → 14 → 7 → Flatten(25088) → 4096 → 4096 → 1000

Complete Shape Trace

Input:                     (batch,   3, 224, 224)

# Block 1 — 2 conv layers
Conv3x3(64)  + ReLU        (batch,  64, 224, 224)
Conv3x3(64)  + ReLU        (batch,  64, 224, 224)
MaxPool2d(2, 2)             (batch,  64, 112, 112)

# Block 2 — 2 conv layers
Conv3x3(128) + ReLU        (batch, 128, 112, 112)
Conv3x3(128) + ReLU        (batch, 128, 112, 112)
MaxPool2d(2, 2)             (batch, 128,  56,  56)

# Block 3 — 3 conv layers
Conv3x3(256) + ReLU        (batch, 256,  56,  56)
Conv3x3(256) + ReLU        (batch, 256,  56,  56)
Conv3x3(256) + ReLU        (batch, 256,  56,  56)
MaxPool2d(2, 2)             (batch, 256,  28,  28)

# Block 4 — 3 conv layers
Conv3x3(512) + ReLU        (batch, 512,  28,  28)
Conv3x3(512) + ReLU        (batch, 512,  28,  28)
Conv3x3(512) + ReLU        (batch, 512,  28,  28)
MaxPool2d(2, 2)             (batch, 512,  14,  14)

# Block 5 — 3 conv layers
Conv3x3(512) + ReLU        (batch, 512,  14,  14)
Conv3x3(512) + ReLU        (batch, 512,  14,  14)
Conv3x3(512) + ReLU        (batch, 512,  14,  14)
MaxPool2d(2, 2)             (batch, 512,   7,   7)

# Classifier
Flatten                     (batch, 25088)           # 512 * 7 * 7
Linear(25088, 4096) + ReLU  (batch, 4096)
Linear(4096, 4096)  + ReLU  (batch, 4096)
Linear(4096, 1000)          (batch, 1000)
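
The trace above can be reproduced with the standard output-size formula, floor((n + 2p - k) / s) + 1, applied layer by layer. The following is a minimal sketch in plain Python, not a real model definition: the names VGG16_CFG, out_size, and trace_shapes are illustrative, and the configuration list (integers for conv output channels, "M" for a 2×2 max-pool) is an assumed encoding of the blocks listed above.

```python
# Assumed encoding of the 5 blocks above: integers are conv output channels,
# "M" marks a MaxPool2d(2, 2). Names here are illustrative, not a library API.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def out_size(size, kernel, stride, padding):
    """Output-size formula for conv/pool layers: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(channels=3, size=224):
    """Return a list of (layer name, (channels, height, width)) tuples."""
    trace = [("Input", (channels, size, size))]
    for item in VGG16_CFG:
        if item == "M":
            # 2x2 max-pool with stride 2 halves the spatial size.
            size = out_size(size, kernel=2, stride=2, padding=0)
            name = "MaxPool2d(2, 2)"
        else:
            # 3x3 conv with stride 1 and padding 1 keeps the spatial size.
            size = out_size(size, kernel=3, stride=1, padding=1)
            channels = item
            name = f"Conv3x3({item}) + ReLU"
        trace.append((name, (channels, size, size)))
    return trace


if __name__ == "__main__":
    for name, shape in trace_shapes():
        print(f"{name:<24}{shape}")
```

Multiplying the three entries of the final shape gives the Flatten width that the first Linear layer expects.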

Key Observations

All conv layers use 3×3 kernels with stride 1 and padding=1, which preserves spatial size
Spatial reduction comes only from MaxPool2d(2, 2): 224 → 112 → 56 → 28 → 14 → 7
Channels double from block to block until they cap at 512: 64 → 128 → 256 → 512 → 512
The Flatten produces 512 × 7 × 7 = 25,088 features
The FC layers dominate the parameter count: the first FC alone has 25,088 × 4,096 ≈ 102.8M weights
Total: ~138M parameters, of which roughly 123.6M sit in the three FC layers
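
The parameter figures above can be checked by hand. The following is a rough sketch in plain Python, assuming every conv and FC layer carries a bias term; VGG16_CFG, FC_SIZES, and count_params are illustrative names, not part of any library.

```python
# Assumed encoding of the conv blocks: integers are conv output channels,
# "M" marks a max-pool (which has no parameters).
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]

# (in_features, out_features) for the three classifier layers.
FC_SIZES = [(512 * 7 * 7, 4096), (4096, 4096), (4096, 1000)]


def count_params(in_channels=3):
    """Return (conv_params, fc_params), counting weights and biases."""
    conv_total = 0
    for item in VGG16_CFG:
        if item == "M":
            continue
        # 3x3 kernel per (input channel, output channel) pair, plus one bias
        # per output channel.
        conv_total += 3 * 3 * in_channels * item + item
        in_channels = item
    fc_total = sum(n_in * n_out + n_out for n_in, n_out in FC_SIZES)
    return conv_total, fc_total


if __name__ == "__main__":
    conv_params, fc_params = count_params()
    print(f"conv:  {conv_params:,}")
    print(f"fc:    {fc_params:,}")
    print(f"total: {conv_params + fc_params:,}")
```

Under these assumptions the 13 conv layers contribute about 14.7M parameters and the three FC layers about 123.6M, which matches the ~138M total.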
