What Does Conv2d Output with 512×512 Input, Kernel 7, Stride 2?
Conv2d with 512×512 input, kernel_size=7, stride=2, padding=3 outputs 256×256. The formula gives: floor((512 + 2×3 - 7) / 2) + 1 = 256.
Formula Breakdown
The Conv2d output size formula is:
output_size = floor((input_size - kernel_size + 2 * padding) / stride) + 1
Plugging in the values for 512×512 input:
output = floor((512 - 7 + 2*3) / 2) + 1
output = floor((512 - 7 + 6) / 2) + 1
output = floor(511 / 2) + 1
output = floor(255.5) + 1
output = 255 + 1
output = 256
So the spatial dimensions go from 512×512 to 256×256.
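The arithmetic above can be wrapped in a small helper. This is a minimal sketch in plain Python; the function name `conv_output_size` is hypothetical (it is not part of PyTorch), and it assumes dilation=1, as the formula above does.

```python
def conv_output_size(input_size: int, kernel_size: int,
                     stride: int = 1, padding: int = 0) -> int:
    """Output length along one spatial axis, assuming dilation=1."""
    # Integer floor division plays the role of floor() in the formula.
    return (input_size + 2 * padding - kernel_size) // stride + 1


# The configuration discussed above: 512 input, kernel 7, stride 2, padding 3.
print(conv_output_size(512, kernel_size=7, stride=2, padding=3))  # 256

# Same kernel and stride without padding, for comparison.
print(conv_output_size(512, kernel_size=7, stride=2, padding=0))  # 253
```

The second call shows why the padding value matters: with padding=0 the output is 253×253 rather than an exact halving.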
PyTorch Code Example
import torch
import torch.nn as nn
# Define the Conv2d layer
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=7, stride=2, padding=3)
# Create input tensor: (batch, channels, height, width)
x = torch.randn(1, 3, 512, 512)
output = conv(x)
print(output.shape) # torch.Size([1, 64, 256, 256])
# Verify with formula
expected = (512 + 2 * 3 - 7) // 2 + 1
print(f"Expected output size: {expected}x{expected}") # 256x256
Architecture Context
This is a 7×7 convolution with stride 2, which halves each spatial dimension. It matches the classic ResNet conv1 configuration, used to downsample large input images at the start of the network.
Parameter Count
A Conv2d(3, 64, 7) layer has:
parameters = in_channels * out_channels * kernel_size^2 + out_channels (bias)
parameters = 3 * 64 * 7 * 7 + 64
parameters = 9,472
This layer has 9,472 trainable parameters (9,408 weights + 64 bias terms).
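The same count can be reproduced in a few lines. This is a sketch under stated assumptions: the helper name `conv2d_param_count` is hypothetical, and it assumes a square kernel and groups=1.

```python
def conv2d_param_count(in_channels: int, out_channels: int,
                       kernel_size: int, bias: bool = True) -> int:
    """Trainable parameters of a 2D convolution (square kernel, groups=1)."""
    # One kernel_size x kernel_size filter per (input channel, output channel) pair.
    weights = in_channels * out_channels * kernel_size * kernel_size
    # One bias term per output channel, if bias is enabled.
    biases = out_channels if bias else 0
    return weights + biases


print(conv2d_param_count(3, 64, 7))              # 9472
print(conv2d_param_count(3, 64, 7, bias=False))  # 9408
```

In PyTorch, the result can be cross-checked against the layer itself with `sum(p.numel() for p in conv.parameters())`.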
Practical Tips
- Memory usage: The output feature map for a single image is 64 × 256 × 256 = 4,194,304 float values (16 MiB in float32, at 4 bytes per value).
- Batch dimension: Multiply memory by batch size. A batch of 32 uses 512 MiB for this layer's output alone.
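- Same padding rule: For any odd kernel size, setting padding = (kernel_size - 1) / 2 with stride=1 (and dilation=1) preserves spatial dimensions. For even kernel sizes this value is not an integer, so symmetric padding cannot preserve the size exactly.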
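The memory figures and the padding rule above can be checked with a short script. This is a rough sketch: the helper names `feature_map_mib` and `same_padding` are hypothetical, sizes are reported in MiB (2^20 bytes), and only the output activations are counted, not gradients or other buffers.

```python
def feature_map_mib(channels: int, height: int, width: int,
                    batch_size: int = 1, bytes_per_element: int = 4) -> float:
    """Size of an activation tensor in MiB (float32 by default)."""
    return batch_size * channels * height * width * bytes_per_element / 2**20


def same_padding(kernel_size: int) -> int:
    """Padding that preserves spatial size at stride=1, dilation=1."""
    if kernel_size % 2 == 0:
        # (kernel_size - 1) / 2 is not an integer for even kernels.
        raise ValueError("symmetric same-padding needs an odd kernel size")
    return (kernel_size - 1) // 2


print(feature_map_mib(64, 256, 256))                 # 16.0
print(feature_map_mib(64, 256, 256, batch_size=32))  # 512.0

# With stride=1 the output size formula reduces to input + 2*padding - kernel + 1.
for k in (1, 3, 5, 7):
    p = same_padding(k)
    assert 512 + 2 * p - k + 1 == 512
print(same_padding(7))  # 3
```

Note that padding=3 for a 7×7 kernel is the same value used in the stride-2 layer above, which is why that layer halves the input exactly.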