What Does Conv2d Output with 28×28 Input, Kernel 3, Stride 2?
Conv2d with 28×28 input, kernel_size=3, stride=2, padding=1 outputs 14×14. The formula gives: floor((28 - 3 + 2×1) / 2) + 1 = 14.
Formula Breakdown
The Conv2d output size formula is:
output_size = floor((input_size - kernel_size + 2 * padding) / stride) + 1
Plugging in the values for 28×28 input:
output = floor((28 - 3 + 2*1) / 2) + 1
output = floor((28 - 3 + 2) / 2) + 1
output = floor(27 / 2) + 1
output = floor(13.5) + 1
output = 14
So the spatial dimensions go from 28×28 to 14×14.
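The formula can be wrapped in a small helper to check other configurations. This is a minimal sketch in plain Python; the function name conv_output_size is an illustrative choice, not part of PyTorch, and it assumes dilation=1 and a single spatial dimension.

```python
def conv_output_size(input_size: int, kernel_size: int,
                     stride: int = 1, padding: int = 0) -> int:
    """Output size along one spatial dimension (hypothetical helper, dilation=1)."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    # Integer floor division implements the floor() in the formula.
    return (input_size - kernel_size + 2 * padding) // stride + 1


# The configuration discussed above: 28x28 input, kernel 3, stride 2, padding 1.
print(conv_output_size(28, kernel_size=3, stride=2, padding=1))  # 14

# The same layer without padding loses one more pixel per side.
print(conv_output_size(28, kernel_size=3, stride=2, padding=0))  # 13
```

The second call shows why the padding value matters: with padding=0 the same kernel and stride give 13×13 rather than 14×14.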
PyTorch Code Example
import torch
import torch.nn as nn
# Define the Conv2d layer
conv = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=2, padding=1)
# Create input tensor: (batch, channels, height, width)
x = torch.randn(1, 64, 28, 28)
output = conv(x)
print(output.shape) # torch.Size([1, 128, 14, 14])
# Verify with formula
expected = (28 + 2 * 1 - 3) // 2 + 1
print(f"Expected output size: {expected}x{expected}") # 14x14
Architecture Context
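This is a strided convolution that halves the spatial dimensions. Architectures such as ResNet use stride-2 convolutions to downsample between stages (ResNet still keeps one max-pooling layer in its stem), and ConvNeXt also downsamples with stride-2 convolutions, although it uses 2×2 kernels in separate downsampling layers rather than a 3×3 kernel with padding.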
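To see how repeated downsampling behaves, the sketch below applies the same kernel_size=3, stride=2, padding=1 configuration three times in a row. The helper name and the choice of three stages are assumptions for illustration and do not correspond to any specific architecture.

```python
def conv_output_size(input_size: int, kernel_size: int,
                     stride: int = 1, padding: int = 0) -> int:
    """Output size along one spatial dimension (hypothetical helper, dilation=1)."""
    return (input_size - kernel_size + 2 * padding) // stride + 1


# Track the feature-map size through three stacked stride-2 convolutions.
size = 28
sizes = [size]
for _ in range(3):  # three stages is an arbitrary choice for this sketch
    size = conv_output_size(size, kernel_size=3, stride=2, padding=1)
    sizes.append(size)

print(sizes)  # [28, 14, 7, 4]
```

Even sizes are halved exactly, while an odd size such as 7 rounds up to 4 because of the padding.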
Parameter Count
The Conv2d(64, 128, kernel_size=3) layer above has the following parameter count, which does not depend on stride or padding:
parameters = in_channels * out_channels * kernel_size^2 + out_channels (bias)
parameters = 64 * 128 * 3 * 3 + 128
parameters = 73,856
This layer has 73,856 trainable parameters (73,728 weights + 128 bias terms).
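The same count can be reproduced with a short function. This is a sketch assuming a square kernel and groups=1; the name conv2d_param_count is hypothetical and not a PyTorch API.

```python
def conv2d_param_count(in_channels: int, out_channels: int,
                       kernel_size: int, bias: bool = True) -> int:
    """Trainable parameters of a standard 2D convolution (square kernel, groups=1)."""
    weights = in_channels * out_channels * kernel_size * kernel_size
    biases = out_channels if bias else 0
    return weights + biases


print(conv2d_param_count(64, 128, 3))              # 73856
print(conv2d_param_count(64, 128, 3, bias=False))  # 73728
```

In PyTorch, the equivalent check on a real layer would be sum(p.numel() for p in conv.parameters()).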
Practical Tips
- Memory usage: The output feature map for a single image is 128 × 14 × 14 = 25,088 float values, which is 100,352 bytes (about 0.10 MB) in float32.
- Batch dimension: Multiply memory by batch size. A batch of 32 uses 3,211,264 bytes (about 3.2 MB, or 3.06 MiB) for this layer's output alone.
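- Same padding rule: For any odd kernel size, setting padding = (kernel_size - 1) / 2 with stride=1 (and dilation=1) preserves spatial dimensions. For even kernel sizes this value is not an integer, so symmetric padding cannot preserve the size exactly.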
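The arithmetic in these tips can be checked with a few lines of plain Python. This is a rough sketch: the helper names are illustrative, the memory figure counts only the output tensor (not gradients or other buffers), and the padding check is limited to odd kernel sizes with dilation=1.

```python
def activation_bytes(batch: int, channels: int, height: int, width: int,
                     bytes_per_element: int = 4) -> int:
    """Size of one dense output tensor in bytes (float32 by default)."""
    return batch * channels * height * width * bytes_per_element


def conv_output_size(input_size: int, kernel_size: int,
                     stride: int = 1, padding: int = 0) -> int:
    """Output size along one spatial dimension (hypothetical helper, dilation=1)."""
    return (input_size - kernel_size + 2 * padding) // stride + 1


single = activation_bytes(1, 128, 14, 14)
batch32 = activation_bytes(32, 128, 14, 14)
print(single)   # 100352
print(batch32)  # 3211264
print(f"{batch32 / 1e6:.2f} MB, {batch32 / 2**20:.2f} MiB")

# Same padding rule: odd kernels with padding = (k - 1) // 2 and stride=1
# keep a 28-pixel dimension at 28.
for k in (1, 3, 5, 7):
    assert conv_output_size(28, k, stride=1, padding=(k - 1) // 2) == 28
```

The difference between MB (10^6 bytes) and MiB (2^20 bytes) is worth keeping in mind when comparing these numbers with what a profiler reports.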