What Does Conv2d Output with 224×224 Input and Kernel 3?
Conv2d with 224×224 input, kernel_size=3, stride=1, padding=1 outputs 224×224. The formula is: output = (224 - 3 + 2×1) / 1 + 1 = 224. This preserves spatial dimensions.
Formula Breakdown
The Conv2d output size formula (per spatial dimension) is:
output_size = floor((input_size - kernel_size + 2 * padding) / stride) + 1
The floor only matters when the stride does not divide evenly; with stride=1 the division is exact.
Plugging in the values:
output = (224 - 3 + 2*1) / 1 + 1
output = (224 - 3 + 2) / 1 + 1
output = 223 / 1 + 1
output = 224
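The arithmetic above can be wrapped in a small helper so you can check other configurations quickly. This is an illustrative sketch (the function name `conv2d_output_size` is my own, not a PyTorch API); it uses floor division to match the general formula.

```python
def conv2d_output_size(input_size, kernel_size, stride=1, padding=0):
    # Floor division handles strides that do not divide evenly
    return (input_size - kernel_size + 2 * padding) // stride + 1

print(conv2d_output_size(224, 3, stride=1, padding=1))  # 224 (the case above)
print(conv2d_output_size(224, 3, stride=2, padding=1))  # 112 (strided conv halves the size)
print(conv2d_output_size(224, 7, stride=2, padding=3))  # 112 (ResNet-style stem conv)
```

The same formula applies independently to height and width, so a non-square input just means running it twice.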
PyTorch Code
import torch
import torch.nn as nn

# 3×3 "same" convolution: kernel_size=3 with padding=1 preserves height and width
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=1, padding=1)
x = torch.randn(1, 3, 224, 224)  # (batch, channels, height, width)
output = conv(x)
print(output.shape)  # torch.Size([1, 64, 224, 224])
Why This Matters
A 3×3 kernel with padding=1 is the standard "same" convolution that preserves spatial dimensions. It is the most common convolution configuration in modern architectures such as VGG, ResNet, and DenseNet: you can stack many convolutional layers without shrinking the feature maps, reducing spatial size only through explicit pooling or strided convolutions.
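To see this in a stack, here is a minimal shape walkthrough of a VGG-style stage, several 3×3 "same" convolutions followed by a 2×2 max pool, which is where the halving actually happens. This is pure arithmetic rather than PyTorch, and the helper name `out_size` is my own; note that max pooling obeys the same sliding-window formula.

```python
def out_size(n, kernel, stride=1, padding=0):
    # Same sliding-window size formula; it covers pooling layers too
    return (n - kernel + 2 * padding) // stride + 1

size = 224
for layer in range(3):                 # three stacked 3x3 "same" convs
    size = out_size(size, 3, stride=1, padding=1)
    print(f"after conv {layer + 1}: {size}")  # stays 224
size = out_size(size, 2, stride=2)     # 2x2 max pool, stride 2
print(f"after pool: {size}")           # 112
```

Swapping the pool for a stride-2 convolution (e.g. kernel 3, stride 2, padding 1) gives the same 224 → 112 reduction, which is the choice ResNet makes.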