What Does Conv2d Output with 128×128 Input, Kernel 5, Stride 2?

Conv2d with 128×128 input, kernel_size=5, stride=2, padding=2 outputs 64×64. The formula gives: floor((128 + 2×2 - 5) / 2) + 1 = 64.

Formula Breakdown

The Conv2d output size formula is:

output_size = floor((input_size - kernel_size + 2 * padding) / stride) + 1
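The formula translates directly into a small Python helper. This is a minimal sketch: the function name `conv2d_output_size` is hypothetical, and, like the formula above, it assumes dilation=1 (the PyTorch default).

```python
def conv2d_output_size(input_size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Output size along one spatial dimension, assuming dilation=1."""
    # Python's // is floor division, matching floor() in the formula.
    return (input_size - kernel_size + 2 * padding) // stride + 1

# 128x128 input, kernel 5, stride 2, padding 2 (height and width are computed the same way)
h = conv2d_output_size(128, kernel_size=5, stride=2, padding=2)
w = conv2d_output_size(128, kernel_size=5, stride=2, padding=2)
print(f"{h}x{w}")  # 64x64
```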

Plugging in the values for 128×128 input:

output = floor((128 - 5 + 2*2) / 2) + 1
output = floor((128 - 5 + 4) / 2) + 1
output = floor(127 / 2) + 1
output = floor(63.5) + 1
output = 63 + 1
output = 64

So the spatial dimensions go from 128×128 to 64×64.
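Why exactly half? With kernel_size=5 and padding=2, the numerator simplifies to input_size - 1, so the output is floor((H - 1) / 2) + 1. That equals H/2 for even H and (H + 1)/2 for odd H, i.e. ceil(H/2). The pure-Python check below is a sketch of that claim; the helper name is hypothetical. It also shows that dropping the padding shrinks 128 to 62 rather than 64.

```python
import math

def conv2d_output_size(input_size, kernel_size, stride=1, padding=0):
    # Conv2d output-size formula, assuming dilation=1
    return (input_size - kernel_size + 2 * padding) // stride + 1

# kernel_size=5, stride=2, padding=2 maps every H >= 1 to ceil(H / 2)
for h in range(1, 257):
    assert conv2d_output_size(h, 5, 2, 2) == math.ceil(h / 2)

# Without padding, the 128x128 input becomes 62x62 instead of 64x64
print(conv2d_output_size(128, 5, 2, 0))  # 62
```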

PyTorch Code Example

import torch
import torch.nn as nn

# Define the Conv2d layer
conv = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=5, stride=2, padding=2)

# Create input tensor: (batch, channels, height, width)
x = torch.randn(1, 64, 128, 128)
output = conv(x)
print(output.shape)  # torch.Size([1, 128, 64, 64])

# Verify with formula
expected = (128 + 2 * 2 - 5) // 2 + 1
print(f"Expected output size: {expected}x{expected}")  # 64x64

Architecture Context

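A strided 5×5 convolution like this one halves the spatial resolution, so it acts as a learned downsampling step. Layers of this kind appear in some GAN discriminators and in the downsampling stages of some GAN generators, as well as in certain Inception variants for spatial reduction.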
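Because each such layer halves the spatial size of an even-sized input, stacking them produces the 128 → 64 → 32 → 16 → 8 progression common in downsampling paths. The sketch below tracks shapes only; the four-layer depth and the channel widths are illustrative assumptions, not taken from any specific architecture.

```python
def conv2d_output_size(input_size, kernel_size, stride=1, padding=0):
    # Conv2d output-size formula, assuming dilation=1
    return (input_size - kernel_size + 2 * padding) // stride + 1

# Hypothetical downsampling stack: every layer is 5x5, stride 2, padding 2
channels = [64, 128, 256, 512, 1024]  # illustrative widths only
size = 128
shapes = [(channels[0], size, size)]  # (channels, height, width) of the input
for out_ch in channels[1:]:
    size = conv2d_output_size(size, kernel_size=5, stride=2, padding=2)
    shapes.append((out_ch, size, size))

for c, h, w in shapes:
    print(f"{c} x {h} x {w}")
```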

Parameter Count

A Conv2d(64, 128, 5) layer has the following count (stride and padding do not affect it):

parameters = in_channels * out_channels * kernel_size^2 + out_channels (bias)
parameters = 64 * 128 * 5 * 5 + 128
parameters = 204,928

This layer has 204,928 trainable parameters (204,800 weights + 128 bias terms).
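The same arithmetic as a helper, a minimal sketch assuming a square kernel and groups=1 (the PyTorch default); the function name is hypothetical. In PyTorch, the figure can also be read off the layer in the code example with sum(p.numel() for p in conv.parameters()).

```python
def conv2d_param_count(in_channels, out_channels, kernel_size, bias=True):
    """Trainable parameters of a square-kernel Conv2d, assuming groups=1."""
    weights = in_channels * out_channels * kernel_size ** 2
    biases = out_channels if bias else 0  # one bias term per output channel
    return weights + biases

print(f"{conv2d_param_count(64, 128, 5):,}")              # 204,928
print(f"{conv2d_param_count(64, 128, 5, bias=False):,}")  # 204,800
```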
