Receptive Field Calculator for Stacked CNN Layers

Add your Conv2d and MaxPool2d layers to compute the effective receptive field, feature jump, and start offset at every stage of a PyTorch convolutional network.


How the receptive field is computed

The receptive field (RF) of a unit in a convolutional feature map is the size of the region in the original input image that influences that unit's value. Stacking convolution and pooling layers compounds the RF: each layer's contribution is scaled by the product of all earlier strides, which is why a 50-layer ResNet can "see" hundreds of pixels even though most of its kernels are 3×3 or smaller. This tool walks the network layer by layer and tracks three quantities introduced in Dang-Ha's RF analysis: the receptive field size r; the jump j, the distance in input pixels between two adjacent features in the current map, equal to the cumulative product of strides; and the start s, the input coordinate of the center of the first feature's receptive field.
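
To make the definition concrete, the sketch below brute-forces the receptive field in one dimension by walking backwards from a single output feature and widening the range of input positions it can read. It assumes a hypothetical (kernel, stride, padding, dilation) tuple per layer and square kernels with symmetric settings, so one axis is enough. The result is the extent of the region, including any gaps that stride or dilation leave inside it.

```python
from typing import List, Tuple

# Hypothetical 1-D layer spec: (kernel, stride, padding, dilation).
# Square kernels with symmetric settings behave the same on both axes.
Layer = Tuple[int, int, int, int]


def input_span(layers: List[Layer], out_index: int = 0) -> Tuple[int, int]:
    """Inclusive range [lo, hi] of input positions that feature `out_index`
    of the last layer can read. Negative or too-large positions fall in the
    zero padding; they are kept so the span matches the theoretical RF."""
    lo = hi = out_index
    for k, stride, pad, dil in reversed(layers):
        # Output position o reads input positions o*stride - pad + t*dil,
        # t = 0..k-1, so the range [lo, hi] maps back to this wider range.
        lo = lo * stride - pad
        hi = hi * stride - pad + (k - 1) * dil
    return lo, hi


# 3x3 conv (padding 1), 2x2 stride-2 max pool, 3x3 conv (padding 1)
net = [(3, 1, 1, 1), (2, 2, 0, 1), (3, 1, 1, 1)]
lo, hi = input_span(net)
size = hi - lo + 1            # receptive field in pixels
center = (lo + hi + 1) / 2    # pixel i covers [i, i + 1), so its center is i + 0.5
print(size, center)           # 8 1.0
```

With pixel i covering [i, i + 1), the computed center uses the same coordinates as the start s, so the 8 px and 1.0 can be compared directly with what the layer-by-layer recurrence reports for this stack.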

For each layer with kernel k, stride S, padding P and dilation d, the effective kernel becomes k_eff = d·(k−1)+1, and the recurrence applied to the previous layer's values is:

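$$
\begin{aligned}
j_{\text{out}} &= j_{\text{in}} \times S \\
r_{\text{out}} &= r_{\text{in}} + (k_{\text{eff}} - 1)\, j_{\text{in}} \\
s_{\text{out}} &= s_{\text{in}} + \left(\frac{k_{\text{eff}} - 1}{2} - P\right) j_{\text{in}}
\end{aligned}
$$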
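
The recurrence translates directly into a loop. This is a minimal sketch, assuming a hypothetical `Layer` dataclass (not a PyTorch class) that holds the per-axis kernel, stride, padding, and dilation; square Conv2d and MaxPool2d layers with symmetric settings give the same numbers on both axes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Layer:
    """Hypothetical per-axis layer description (not a PyTorch class)."""
    kernel: int
    stride: int = 1
    padding: int = 0
    dilation: int = 1


def receptive_fields(layers: List[Layer]) -> List[Tuple[int, int, float]]:
    """Apply the r/j/s recurrence and return (r, j, s) after every layer."""
    r, j, s = 1, 1, 0.5  # values at the input image
    history = []
    for layer in layers:
        k_eff = layer.dilation * (layer.kernel - 1) + 1
        # r and s depend on the incoming jump, so j is updated last.
        r = r + (k_eff - 1) * j
        s = s + ((k_eff - 1) / 2 - layer.padding) * j
        j = j * layer.stride
        history.append((r, j, s))
    return history


# First two blocks of a VGG-16-style backbone: two 3x3 convs (padding 1)
# followed by a 2x2 stride-2 max pool, repeated twice.
vgg_blocks = [Layer(3, padding=1), Layer(3, padding=1), Layer(2, stride=2)] * 2
for i, (r, j, s) in enumerate(receptive_fields(vgg_blocks), start=1):
    print(f"layer {i}: r={r} j={j} s={s}")
```

The last line printed is `layer 6: r=16 j=4 s=2.0`: after the second pool each feature covers a 16-pixel window and adjacent features are 4 input pixels apart. Updating j only after r and s is the design choice that matters here, since both must use the incoming jump.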

Initialization is j = 1, r = 1, s = 0.5 at the input. The key point that calculators reporting only output sizes miss is that RF growth is driven by the incoming jump j_in, not by the layer's own stride. A 3×3 conv placed after two stride-2 pools adds (3−1)×4 = 8 pixels of RF per layer, while the same conv applied directly to the input adds only (3−1)×1 = 2. The start offset s tells you whether the feature grid is aligned with the input or shifted. Under this convention input pixel centers sit at 0.5, 1.5, 2.5, and so on, so a negative s (the first feature is centered outside the image, typically from too much padding) or an s that falls between pixel centers (a half-pixel shift, as produced by a 2×2 pool) signals a misalignment that can quietly misregister your detections.
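
Both claims can be checked with one recurrence step at a time. The helper `step` below is hypothetical, written only for this illustration; it applies a single layer to an (r, j, s) state using the formulas above.

```python
from typing import Tuple

State = Tuple[int, int, float]  # (r, j, s)
INPUT: State = (1, 1, 0.5)      # values at the image


def step(state: State, kernel: int, stride: int = 1,
         padding: int = 0, dilation: int = 1) -> State:
    """Apply the r/j/s recurrence for one layer (hypothetical helper)."""
    r, j, s = state
    k_eff = dilation * (kernel - 1) + 1
    return (r + (k_eff - 1) * j,                # growth scales with incoming jump
            j * stride,
            s + ((k_eff - 1) / 2 - padding) * j)


# RF growth from a 3x3 conv (padding 1) depends on where it sits.
at_input = step(INPUT, 3, padding=1)
pooled = step(step(INPUT, 2, stride=2), 2, stride=2)   # jump is now 4
after_pools = step(pooled, 3, padding=1)
print(at_input[0] - INPUT[0], after_pools[0] - pooled[0])  # 2 8

# Start offsets: over-padding and even-sized pooling both shift the grid.
print(step(INPUT, 3, padding=2)[2])    # -0.5: centered outside the image
print(step(INPUT, 2, stride=2)[2])     # 1.0: between pixel centers 0.5 and 1.5
```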

Dilation widens the effective kernel without adding parameters, so a stack of stride-1 dilated convolutions grows the RF rapidly (exponentially in depth when the dilation doubles at each layer) while keeping the jump fixed, a common trick in dense-prediction backbones. Use the per-layer columns to find exactly which layer first covers your object scale, then prune or dilate accordingly.
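
As a sketch of both points, the code below compares five plain 3×3 convs with five 3×3 convs whose dilation doubles per layer, then finds the first layer whose RF reaches a given object size. The tuple layout (kernel, stride, padding, dilation), the helper names `rf_stats` and `first_covering`, and the 32 px object size are illustrative assumptions; padding equal to dilation is chosen so the feature map keeps its size.

```python
from typing import List, Optional, Tuple


def rf_stats(layers: List[Tuple[int, int, int, int]]) -> List[Tuple[int, int, float]]:
    """Run the r/j/s recurrence over (kernel, stride, padding, dilation)
    tuples and return one (r, j, s) tuple per layer (hypothetical helper)."""
    r, j, s = 1, 1, 0.5
    stats = []
    for k, stride, pad, dil in layers:
        k_eff = dil * (k - 1) + 1
        # Tuple assignment uses the incoming jump for both r and s.
        r, s = r + (k_eff - 1) * j, s + ((k_eff - 1) / 2 - pad) * j
        j *= stride
        stats.append((r, j, s))
    return stats


def first_covering(stats, object_px: int) -> Optional[int]:
    """1-based index of the first layer whose RF spans object_px, else None."""
    for i, (r, _, _) in enumerate(stats, start=1):
        if r >= object_px:
            return i
    return None


# Five stride-1 3x3 convs; padding = dilation keeps the map size unchanged.
plain = [(3, 1, 1, 1)] * 5
dilated = [(3, 1, d, d) for d in (1, 2, 4, 8, 16)]

plain_stats = rf_stats(plain)      # RF 3, 5, 7, 9, 11
dilated_stats = rf_stats(dilated)  # RF 3, 7, 15, 31, 63
print(first_covering(plain_stats, 32), first_covering(dilated_stats, 32))  # None 5
```

Both stacks have the same number of weights, yet the dilated one reaches 63 px while the jump stays 1 and s stays 0.5, so the output grid remains aligned with the input.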
