What Is Stride in Conv2d?
Stride controls how far the kernel moves between positions. With stride=2 the kernel steps two pixels at a time, roughly halving the output size; stride=1 (the default) preserves the spatial size when paired with appropriate padding.
How Stride Affects Output Size
output_size = floor((input - kernel + 2*padding) / stride) + 1
# 224x224 input, kernel=3, padding=1
stride=1: floor((224 - 3 + 2) / 1) + 1 = 224 (same size)
stride=2: floor((224 - 3 + 2) / 2) + 1 = 112 (half size)
stride=4: floor((224 - 3 + 2) / 4) + 1 = 56 (quarter size)
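The formula can be wrapped in a small helper to check these numbers (the function name `conv_output_size` is just for illustration):

```python
import math

def conv_output_size(input_size, kernel, padding=0, stride=1):
    # floor((input - kernel + 2*padding) / stride) + 1
    return math.floor((input_size - kernel + 2 * padding) / stride) + 1

print(conv_output_size(224, kernel=3, padding=1, stride=1))  # 224
print(conv_output_size(224, kernel=3, padding=1, stride=2))  # 112
print(conv_output_size(224, kernel=3, padding=1, stride=4))  # 56
```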
PyTorch Examples
import torch
import torch.nn as nn
x = torch.randn(1, 3, 224, 224)
# stride=1: preserves spatial size
conv_s1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)
print(conv_s1(x).shape) # [1, 64, 224, 224]
# stride=2: halves spatial size
conv_s2 = nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1)
print(conv_s2(x).shape) # [1, 64, 112, 112]
# stride=2 with kernel=7: ResNet conv1
conv_resnet = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
print(conv_resnet(x).shape) # [1, 64, 112, 112]
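Stride-2 layers compose multiplicatively: stacking three of them downsamples by 2×2×2 = 8. A minimal sketch (the channel counts here are arbitrary, not from any particular architecture):

```python
import torch
import torch.nn as nn

# Each stride-2 conv halves the spatial size, so three in a row
# give a total downsampling factor of 8.
backbone = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),    # 224 -> 112
    nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),   # 112 -> 56
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),  # 56 -> 28
)
x = torch.randn(1, 3, 224, 224)
print(backbone(x).shape)  # [1, 128, 28, 28]
```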
Stride vs Pooling for Downsampling
- MaxPool2d(2, 2) — takes the max value in each 2×2 window, no learnable parameters
- Conv2d(stride=2) — learnable downsampling, the network learns how to downsample
Modern architectures lean on strided convolutions for most downsampling (ResNet uses them between stages, alongside a single max pool after conv1; EfficientNet uses strided depthwise convolutions), because the network can learn what information to preserve during downsampling.
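The trade-off is easy to see directly: both operations can produce the same output shape, but only the convolution carries learnable weights. A quick comparison:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)

pool = nn.MaxPool2d(kernel_size=2, stride=2)                   # fixed max, no weights
conv = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1)   # learnable downsampling

print(pool(x).shape)  # [1, 64, 28, 28]
print(conv(x).shape)  # [1, 64, 28, 28]

print(sum(p.numel() for p in pool.parameters()))  # 0
print(sum(p.numel() for p in conv.parameters()))  # 64*64*3*3 + 64 = 36928
```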
Asymmetric Strides
# Different stride for height and width
conv = nn.Conv2d(3, 64, kernel_size=3, stride=(2, 1), padding=1)
x = torch.randn(1, 3, 224, 224)
print(conv(x).shape) # [1, 64, 112, 224] — height halved, width same
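Each spatial axis follows the output-size formula independently with its own stride, which is where the 112×224 shape comes from. A quick check (the helper name `out_size` is illustrative):

```python
import math

def out_size(n, k, p, s):
    # floor((input - kernel + 2*padding) / stride) + 1, applied per axis
    return math.floor((n - k + 2 * p) / s) + 1

# kernel=3, padding=1, stride=(2, 1) on a 224x224 input
print(out_size(224, 3, 1, 2))  # height: 112
print(out_size(224, 3, 1, 1))  # width: 224
```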