What Does Conv2d Output with 224×224 Input, Kernel 5?
Conv2d with 224×224 input, kernel_size=5, stride=1, padding=2 outputs 224×224. This is a “same” convolution — the output has the same spatial dimensions as the input. The formula gives: floor((224 + 2×2 - 5) / 1) + 1 = 224.
Formula Breakdown
The Conv2d output size formula is:
output_size = floor((input_size - kernel_size + 2 * padding) / stride) + 1
Plugging in the values for 224×224 input:
output = floor((224 - 5 + 2*2) / 1) + 1
output = floor((224 - 5 + 4) / 1) + 1
output = floor(223 / 1) + 1
output = floor(223) + 1
output = 224
So the spatial dimensions are preserved: the input is 224×224 and the output is 224×224.
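The walkthrough above can be wrapped in a small helper for reuse (a minimal sketch; the function name `conv2d_out_size` is my own, not a PyTorch API):

```python
import math

def conv2d_out_size(input_size: int, kernel_size: int,
                    stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a Conv2d along one dimension,
    per output_size = floor((n - k + 2p) / s) + 1."""
    return math.floor((input_size - kernel_size + 2 * padding) / stride) + 1

print(conv2d_out_size(224, kernel_size=5, stride=1, padding=2))  # 224
```

The same helper works for strided convolutions, e.g. `conv2d_out_size(224, 3, stride=2, padding=1)` gives 112.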
PyTorch Code Example
import torch
import torch.nn as nn
# Define the Conv2d layer
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=5, stride=1, padding=2)
# Create input tensor: (batch, channels, height, width)
x = torch.randn(1, 3, 224, 224)
output = conv(x)
print(output.shape) # torch.Size([1, 64, 224, 224])
# Verify with formula
expected = (224 + 2 * 2 - 5) // 1 + 1
print(f"Expected output size: {expected}x{expected}") # 224x224
Architecture Context
A 5×5 kernel with padding=2 preserves spatial dimensions (a "same" convolution). 5×5 convolutions of this kind appear, for example, in the parallel branches of Inception/GoogLeNet modules.
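As a quick sanity check using the formula, each common odd kernel size paired with padding = (kernel_size - 1) // 2 at stride=1 leaves a 224×224 input unchanged (illustrative sketch):

```python
import math

def out_size(n, k, s, p):
    # Standard Conv2d output-size formula
    return math.floor((n - k + 2 * p) / s) + 1

for k in (1, 3, 5, 7):
    p = (k - 1) // 2  # "same" padding for odd kernels at stride=1
    print(f"kernel={k}, padding={p} -> {out_size(224, k, 1, p)}")  # all 224
```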
Parameter Count
A Conv2d(3, 64, 5) layer has:
parameters = in_channels * out_channels * kernel_size^2 + out_channels (bias)
parameters = 3 * 64 * 5 * 5 + 64
parameters = 4,800 + 64
parameters = 4,864
This layer has 4,864 trainable parameters (4800 weights + 64 bias terms).
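The count can be computed programmatically; this sketch mirrors the formula above in plain Python (the function name is my own; in PyTorch the same number comes from `sum(p.numel() for p in conv.parameters())`):

```python
def conv2d_param_count(in_ch: int, out_ch: int, k: int, bias: bool = True) -> int:
    """Parameter count of a Conv2d: the weight tensor has shape
    (out_ch, in_ch, k, k); an optional bias adds out_ch terms."""
    weights = in_ch * out_ch * k * k
    return weights + (out_ch if bias else 0)

print(conv2d_param_count(3, 64, 5))  # 4864
```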
Practical Tips
- Memory usage: The output feature map for a single image is 64 × 224 × 224 = 3,211,264 float values (12.25 MB in float32).
- Batch dimension: Multiply memory by batch size. A batch of 32 uses 392.0 MB for this layer's output alone.
- Same padding rule: For any odd kernel_size, setting padding = (kernel_size - 1) / 2 with stride=1 preserves spatial dimensions. Even kernel sizes cannot be padded symmetrically to achieve this.
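The memory figures in the tips above follow from element counts times 4 bytes per float32 value; a small sketch (the helper name is my own, and MB here means 1024² bytes, matching the article's numbers):

```python
def feature_map_mb(batch: int, channels: int, height: int, width: int,
                   bytes_per_elem: int = 4) -> float:
    """Memory of one activation tensor in MB (float32 by default)."""
    return batch * channels * height * width * bytes_per_elem / (1024 ** 2)

print(feature_map_mb(1, 64, 224, 224))   # 12.25
print(feature_map_mb(32, 64, 224, 224))  # 392.0
```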