What Does Conv2d Output with 7×7 Input, Kernel 7?

Conv2d with 7×7 input, kernel_size=7, stride=1, padding=0 outputs 1×1. The formula is: output = floor((input + 2*padding - kernel) / stride) + 1 = floor((7 + 2*0 - 7) / 1) + 1 = 1.

Formula Breakdown

The Conv2d output size formula is:

output_size = floor((input_size - kernel_size + 2 * padding) / stride) + 1

Plugging in the values for 7×7 input:

output = floor((7 - 7 + 2*0) / 1) + 1
output = floor((7 - 7 + 0) / 1) + 1
output = floor(0 / 1) + 1
output = floor(0) + 1
output = 1

So the spatial dimensions go from 7×7 to 1×1.
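The same arithmetic can be wrapped in a small helper so other stride and padding combinations are easy to check. This is a minimal sketch: `conv2d_output_size` is a hypothetical name, and it assumes dilation=1, matching the formula above.

```python
def conv2d_output_size(input_size: int, kernel_size: int,
                       stride: int = 1, padding: int = 0) -> int:
    """Output size along one spatial dimension (hypothetical helper, dilation=1 assumed)."""
    padded = input_size + 2 * padding
    if padded < kernel_size:
        raise ValueError("kernel is larger than the padded input")
    # floor((input - kernel + 2*padding) / stride) + 1; the numerator is
    # non-negative here, so integer division // matches floor()
    return (padded - kernel_size) // stride + 1


print(conv2d_output_size(7, 7))                         # 1
print(conv2d_output_size(7, 7, padding=3))              # 7
print(conv2d_output_size(224, 7, stride=2, padding=3))  # 112
```

The second call shows that padding=3 keeps a 7×7 map at 7×7, and the third is a 224×224 input with a stride-2, 7×7 kernel.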

PyTorch Code Example

import torch
import torch.nn as nn

# Define the Conv2d layer
conv = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=7, stride=1, padding=0)

# Create input tensor: (batch, channels, height, width)
x = torch.randn(1, 64, 7, 7)
output = conv(x)
print(output.shape)  # torch.Size([1, 128, 1, 1])

# Verify with formula
expected = (7 + 2 * 0 - 7) // 1 + 1
print(f"Expected output size: {expected}x{expected}")  # 1x1

Architecture Context

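This collapses a 7×7 feature map to 1×1, but it is not equivalent to global average pooling. Global average pooling has no parameters and simply averages each channel, so 64 input channels stay 64 output values. A 7×7 convolution with no padding instead computes, for every output channel, a learned weighted sum over all 7 × 7 × 64 input values, which makes it equivalent to a fully connected layer applied to the flattened feature map. ResNet uses global average pooling, not a 7×7 convolution, to reach 1×1 before its fully connected layer.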
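The difference can be checked numerically. The sketch below copies the convolution's weights into an `nn.Linear` layer and confirms both produce the same outputs, then shows that `nn.AdaptiveAvgPool2d(1)` is just a per-channel mean. The variable names (`fc`, `gap`) and the batch size of 2 are illustrative choices, not part of the original example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(2, 64, 7, 7)  # (batch, channels, height, width)

conv = nn.Conv2d(64, 128, kernel_size=7, stride=1, padding=0)

# A Linear layer reusing the conv weights, flattened from (128, 64, 7, 7)
# to (128, 64*7*7); the flatten order (channel, row, col) matches x.flatten(1).
fc = nn.Linear(64 * 7 * 7, 128)
with torch.no_grad():
    fc.weight.copy_(conv.weight.reshape(128, -1))
    fc.bias.copy_(conv.bias)

conv_out = conv(x).flatten(1)  # (2, 128)
fc_out = fc(x.flatten(1))      # (2, 128)
print(torch.allclose(conv_out, fc_out, atol=1e-4))  # True

# Global average pooling has no parameters and keeps all 64 channels.
gap = nn.AdaptiveAvgPool2d(1)
gap_out = gap(x)
print(gap_out.shape)  # torch.Size([2, 64, 1, 1])
print(torch.allclose(gap_out.flatten(1), x.mean(dim=(2, 3))))  # True
```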

Parameter Count

A Conv2d(64, 128, 7) layer has:

parameters = in_channels * out_channels * kernel_size^2 + out_channels (bias)
parameters = 64 * 128 * 7 * 7 + 128
parameters = 401,536

This layer has 401,536 trainable parameters (401,408 weights + 128 bias terms).
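PyTorch can confirm the count directly from the layer's tensors. A short sketch using only `numel()` and `parameters()`; the `no_bias` layer is an extra illustration of what changes when `bias=False`.

```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=7)

weights = conv.weight.numel()  # 128 * 64 * 7 * 7
biases = conv.bias.numel()     # one bias per output channel
total = sum(p.numel() for p in conv.parameters() if p.requires_grad)

print(weights, biases, total)  # 401408 128 401536

# With bias=False only the weight tensor remains.
no_bias = nn.Conv2d(64, 128, kernel_size=7, bias=False)
print(sum(p.numel() for p in no_bias.parameters()))  # 401408
```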

Practical Tips

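Memory usage: The output feature map for a single image is 128 × 1 × 1 = 128 float values, which is 512 bytes in float32 (4 bytes per value).
Batch dimension: Multiply memory by batch size. A batch of 32 needs 32 × 512 = 16,384 bytes (16 KiB) for this layer's output alone.
Same padding rule: For an odd kernel size, setting padding = (kernel_size - 1) / 2 with stride=1 preserves spatial dimensions; for kernel_size=7 that is padding=3. Even kernel sizes cannot be handled this way, because the total padding needed (kernel_size - 1) is odd and cannot be split equally between both sides.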
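Both tips can be checked in a few lines. This sketch measures the output size with `numel()` and `element_size()` (4 bytes for float32), and builds a hypothetical `same` layer with padding=(k - 1) // 2 to confirm a 7×7 map stays 7×7.

```python
import torch
import torch.nn as nn

# Output activation memory for this layer in float32 (4 bytes per element).
out = torch.empty(1, 128, 1, 1, dtype=torch.float32)
bytes_per_image = out.numel() * out.element_size()
print(bytes_per_image)       # 512
print(bytes_per_image * 32)  # 16384 bytes (16 KiB) for a batch of 32

# "Same" padding with stride=1 for an odd kernel: padding = (k - 1) // 2.
k = 7
same = nn.Conv2d(64, 128, kernel_size=k, stride=1, padding=(k - 1) // 2)
y = same(torch.randn(1, 64, 7, 7))
print(y.shape)  # torch.Size([1, 128, 7, 7])
```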
