What Does Conv2d Output with 224×224 Input, Kernel 7, Stride 2?
Conv2d with 224×224 input, kernel_size=7, stride=2, padding=3 outputs 112×112. This is ResNet's first conv layer. The stride=2 halves the spatial dimensions.
Formula Breakdown
output_size = floor((input - kernel + 2*padding) / stride) + 1
Plugging in the values:
output = floor((224 - 7 + 2*3) / 2) + 1
output = floor((224 - 7 + 6) / 2) + 1
output = floor(223 / 2) + 1
output = 111 + 1
output = 112
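The arithmetic above can be wrapped in a small helper for checking other layer configurations. This is a sketch; the function name `conv2d_out` is my own, not a library API:

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    # output_size = floor((input - kernel + 2*padding) / stride) + 1
    return (size - kernel + 2 * padding) // stride + 1

print(conv2d_out(224, 7, stride=2, padding=3))  # 112
```

The same formula applies per-dimension for non-square inputs: call it once for height and once for width.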
PyTorch Code
import torch
import torch.nn as nn
# This is exactly ResNet's conv1 layer
conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
x = torch.randn(1, 3, 224, 224)
output = conv(x)
print(output.shape) # torch.Size([1, 64, 112, 112])
Why ResNet Uses This
ResNet uses a large 7×7 kernel with stride=2 as its first layer to quickly reduce spatial dimensions from 224×224 to 112×112 while capturing features over a large receptive field. This is followed by a MaxPool2d(kernel_size=3, stride=2, padding=1), which further reduces the feature map to 56×56 (note the padding=1 — without it the output would be 55×55). The aggressive early downsampling keeps the computational cost manageable for the deeper residual blocks that follow.
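The two downsampling steps can be verified together. Below is a sketch of the conv-plus-maxpool stem, assuming torchvision's standard ResNet configuration (the BatchNorm and ReLU between the two layers are omitted here since they don't change the spatial shape):

```python
import torch
import torch.nn as nn

# Sketch of ResNet's stem: conv1 (224 -> 112) then maxpool (112 -> 56).
# Note the maxpool's padding=1, which is required to land on 56×56.
stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
)
x = torch.randn(1, 3, 224, 224)
print(stem(x).shape)  # torch.Size([1, 64, 56, 56])
```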