What Does Conv2d Output with 64×64 Input, Kernel 5?
Conv2d with 64×64 input, kernel_size=5, stride=1, padding=2 outputs 64×64. This is a “same” convolution — the output has the same spatial dimensions as the input. The formula gives: floor((64 + 2×2 - 5) / 1) + 1 = 64.
Formula Breakdown
The Conv2d output size formula (assuming the default dilation=1) is:
output_size = floor((input_size - kernel_size + 2 * padding) / stride) + 1
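The formula is easy to wrap in a small helper so it can be reused for other layer settings. This is a minimal sketch: `conv2d_output_size` is a hypothetical name, not a PyTorch API. The `dilation` term follows the general shape formula in the `nn.Conv2d` documentation and reduces to the expression above when `dilation=1`.

```python
def conv2d_output_size(input_size: int, kernel_size: int, stride: int = 1,
                       padding: int = 0, dilation: int = 1) -> int:
    """Output size along one spatial axis of a 2D convolution (hypothetical helper)."""
    # A dilated kernel covers dilation * (kernel_size - 1) + 1 input pixels.
    effective_kernel = dilation * (kernel_size - 1) + 1
    span = input_size + 2 * padding - effective_kernel
    if span < 0:
        raise ValueError("kernel (after dilation) is larger than the padded input")
    # Integer floor division matches floor() for the non-negative span.
    return span // stride + 1

print(conv2d_output_size(64, kernel_size=5, stride=1, padding=2))  # 64
print(conv2d_output_size(64, kernel_size=5, stride=2, padding=2))  # 32
```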
Plugging in the values for 64×64 input:
output = floor((64 - 5 + 2*2) / 1) + 1
output = floor((64 - 5 + 4) / 1) + 1
output = floor(63 / 1) + 1
output = floor(63) + 1
output = 64
So the spatial dimensions stay at 64×64: padding adds 2 × 2 = 4 pixels per axis, which exactly offsets the 5 - 1 = 4 pixels a 5×5 kernel removes.
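Because padding 2 on each side cancels the shrinkage of a 5×5 kernel, the result does not depend on the 64×64 input at all. The sketch below checks this over a few arbitrary input sizes; `output_size` is a hypothetical helper that re-implements the formula above with kernel_size=5, stride=1, padding=2 as defaults.

```python
def output_size(h: int, kernel_size: int = 5, stride: int = 1, padding: int = 2) -> int:
    # floor((h - kernel_size + 2 * padding) / stride) + 1
    return (h - kernel_size + 2 * padding) // stride + 1

# With k=5, p=2, s=1 this is h - 5 + 4 + 1 = h, so every size is preserved.
for h in (7, 28, 64, 224):
    print(h, "->", output_size(h))

# Without padding, the same kernel trims kernel_size - 1 = 4 pixels per axis.
print(64, "->", output_size(64, padding=0))  # 64 -> 60
```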
PyTorch Code Example
import torch
import torch.nn as nn
# Define the Conv2d layer
conv = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=5, stride=1, padding=2)
# Create input tensor: (batch, channels, height, width)
x = torch.randn(1, 64, 64, 64)
output = conv(x)
print(output.shape) # torch.Size([1, 128, 64, 64])
# Verify with formula
expected = (64 + 2 * 2 - 5) // 1 + 1
print(f"Expected output size: {expected}x{expected}") # 64x64
Architecture Context
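A 5×5 kernel with padding=2 preserves spatial dimensions (a same convolution). This pattern appears in the 5×5 branch of the original Inception (GoogLeNet) modules, where every parallel branch must produce the same height and width so the branch outputs can be concatenated along the channel dimension.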
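A small shape-only sketch shows why matching spatial sizes matters for an Inception-style block. The branch list and channel counts here are hypothetical and illustrative, not GoogLeNet's actual configuration, and the pooling branch is omitted; only the kernel/padding pairs follow the same-padding rule.

```python
# Hypothetical Inception-style branches: name -> (kernel_size, padding, out_channels).
branches = {
    "1x1": (1, 0, 64),
    "3x3": (3, 1, 128),
    "5x5": (5, 2, 32),
}

def out_size(h: int, k: int, p: int, stride: int = 1) -> int:
    return (h - k + 2 * p) // stride + 1

h = 64
spatial = {name: out_size(h, k, p) for name, (k, p, _) in branches.items()}
print(spatial)  # {'1x1': 64, '3x3': 64, '5x5': 64}

# Channel-wise concatenation only works if every branch agrees on H and W.
assert len(set(spatial.values())) == 1, "branches must agree to concatenate"
total_channels = sum(c for _, _, c in branches.values())
print(total_channels)  # 224
```

In PyTorch, the concatenation step itself would be `torch.cat(outputs, dim=1)` over the branch outputs in (batch, channels, height, width) layout.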
Parameter Count
A Conv2d(64, 128, 5) layer (with the default bias=True) has:
parameters = in_channels * out_channels * kernel_size^2 + out_channels (bias)
parameters = 64 * 128 * 5 * 5 + 128
parameters = 204,928
This layer has 204,928 trainable parameters (204,800 weights + 128 bias terms).
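The same count can be computed for other layer shapes with a short function. This is a sketch assuming a standard (groups=1) convolution with a square kernel; `conv2d_param_count` is a hypothetical helper, not part of PyTorch.

```python
def conv2d_param_count(in_channels: int, out_channels: int,
                       kernel_size: int, bias: bool = True) -> int:
    """Trainable parameters of a dense (groups=1) Conv2d with a square kernel."""
    # One kernel_size x kernel_size filter per (output channel, input channel) pair.
    weights = out_channels * in_channels * kernel_size * kernel_size
    # One bias scalar per output channel when bias=True.
    biases = out_channels if bias else 0
    return weights + biases

print(conv2d_param_count(64, 128, 5))              # 204928
print(conv2d_param_count(64, 128, 5, bias=False))  # 204800
```

In PyTorch the result can be cross-checked against `sum(p.numel() for p in conv.parameters())` for the layer defined earlier. Stride and padding do not enter the count, since they change how the kernel is applied rather than its size.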
Practical Tips
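- Memory usage: The output feature map for a single image is 128 × 64 × 64 = 524,288 float values, or 2,097,152 bytes (2 MiB) in float32 at 4 bytes per value.
- Batch dimension: Multiply memory by batch size. A batch of 32 uses 64 MiB for this layer's output alone.
- Same padding rule: For any odd kernel_size, setting padding = (kernel_size - 1) // 2 with stride=1 (and dilation=1) preserves spatial dimensions. Even kernels cannot be padded symmetrically to achieve this; PyTorch's padding='same' option (stride=1 only) handles them by padding one side more than the other.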
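The tips above reduce to two small calculations. This sketch assumes float32 activations (4 bytes each) and ignores allocator overhead and any tensors kept for backpropagation; `feature_map_bytes` and `same_padding` are hypothetical helper names.

```python
def feature_map_bytes(batch: int, channels: int, height: int, width: int,
                      bytes_per_element: int = 4) -> int:
    """Raw size of one activation tensor (float32 by default)."""
    return batch * channels * height * width * bytes_per_element

def same_padding(kernel_size: int) -> int:
    """Symmetric padding that preserves size at stride=1; odd kernels only."""
    if kernel_size % 2 == 0:
        raise ValueError("symmetric 'same' padding needs an odd kernel_size")
    return (kernel_size - 1) // 2

MIB = 1024 ** 2
print(feature_map_bytes(1, 128, 64, 64) / MIB)   # 2.0
print(feature_map_bytes(32, 128, 64, 64) / MIB)  # 64.0
print([same_padding(k) for k in (1, 3, 5, 7)])   # [0, 1, 2, 3]
```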