What Does Conv2d Output with 224×224 Input, Kernel 3, Stride 2?

A Conv2d layer with a 224×224 input, kernel_size=3, stride=2, and padding=1 produces a 112×112 output. The formula gives: floor((224 - 3 + 2×1) / 2) + 1 = 112.

Formula Breakdown

The Conv2d output size formula (for the default dilation=1) is:

output_size = floor((input_size - kernel_size + 2 * padding) / stride) + 1

Plugging in the values for 224×224 input:

output = floor((224 - 3 + 2*1) / 2) + 1
output = floor((224 - 3 + 2) / 2) + 1
output = floor(223 / 2) + 1
output = floor(111.5) + 1
output = 111 + 1
output = 112

So the spatial dimensions go from 224×224 to 112×112.
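
The arithmetic above can be wrapped in a small helper for checking other configurations. This is a minimal sketch, not part of PyTorch: the function name conv2d_output_size is hypothetical, and it assumes dilation=1 and the same settings along height and width.

```python
def conv2d_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output size along one spatial dimension (assumes dilation=1)."""
    # Integer floor division plays the role of floor() in the formula.
    return (input_size - kernel_size + 2 * padding) // stride + 1


# The configuration worked through above.
print(conv2d_output_size(224, kernel_size=3, stride=2, padding=1))  # 112
```

With stride=1 and padding=1 the same helper returns 224, which is why a 3×3 kernel with padding 1 preserves the spatial size when it is not strided.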

PyTorch Code Example

import torch
import torch.nn as nn

# Define the Conv2d layer
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=2, padding=1)

# Create input tensor: (batch, channels, height, width)
x = torch.randn(1, 3, 224, 224)
output = conv(x)
print(output.shape)  # torch.Size([1, 64, 112, 112])

# Verify with formula
expected = (224 + 2 * 1 - 3) // 2 + 1
print(f"Expected output size: {expected}x{expected}")  # 112x112

Architecture Context

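This is a strided convolution that halves the spatial dimensions. Architectures such as ResNet and ConvNeXt rely on strided convolutions for most of their downsampling rather than on repeated max-pooling, although the exact kernel sizes differ between models and ResNet still keeps one max-pool layer in its stem.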
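
To illustrate the halving effect, the sketch below applies the output size formula repeatedly, as if several kernel 3, stride 2, padding 1 layers were stacked. The helper names are hypothetical, and the chain is an illustration rather than the layout of any specific architecture.

```python
def strided_conv_size(size, kernel_size=3, stride=2, padding=1):
    """Output size of one strided convolution (assumes dilation=1)."""
    return (size - kernel_size + 2 * padding) // stride + 1


def downsample_chain(size, num_layers):
    """Spatial sizes after each of num_layers stacked strided convolutions."""
    sizes = [size]
    for _ in range(num_layers):
        size = strided_conv_size(size)
        sizes.append(size)
    return sizes


print(downsample_chain(224, 5))  # [224, 112, 56, 28, 14, 7]
```

Each layer halves the resolution, so five such layers take a 224×224 input down to 7×7.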

Parameter Count

A Conv2d(3, 64, 3) layer has:

parameters = in_channels * out_channels * kernel_size^2 + out_channels (bias)
parameters = 3 * 64 * 3 * 3 + 64
parameters = 1,792

This layer has 1,792 trainable parameters (1,728 weights + 64 bias terms).
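
The same count can be computed with a few lines of Python. This is a sketch under stated assumptions: the function name conv2d_param_count is hypothetical, and it assumes a square kernel and groups=1. Stride and padding do not affect the parameter count.

```python
def conv2d_param_count(in_channels, out_channels, kernel_size, bias=True):
    """Trainable parameters of a Conv2d layer (square kernel, groups=1)."""
    weights = in_channels * out_channels * kernel_size * kernel_size
    biases = out_channels if bias else 0
    return weights + biases


print(conv2d_param_count(3, 64, 3))              # 1792
print(conv2d_param_count(3, 64, 3, bias=False))  # 1728
```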
