Tensor Shape Cheat Sheet: Every Layer's Input/Output Formula

April 2025 · 7 min read · By Michael Lip

This is a quick-reference guide for the output shape of every common PyTorch layer. Each entry includes the formula, the input/output format, and a worked example with concrete numbers. Bookmark this and stop re-deriving formulas every time you build a model.

You can verify any of these formulas instantly using HeyTensor's calculator.

Conv2d

Input: (N, C_in, H, W)
Output: (N, C_out, H_out, W_out)

H_out = floor((H + 2*padding - dilation*(kernel_size-1) - 1) / stride) + 1
W_out uses the same formula with W (and the per-axis padding/kernel/stride/dilation values, if they differ).

Example: Input (1, 3, 224, 224), Conv2d(3, 64, kernel_size=7, stride=2, padding=3):

H_out = floor((224 + 2*3 - 1*(7-1) - 1) / 2) + 1
     = floor((224 + 6 - 6 - 1) / 2) + 1
     = floor(223 / 2) + 1 = 111 + 1 = 112
Output: (1, 64, 112, 112)

Conv1d

Input: (N, C_in, L)
Output: (N, C_out, L_out)

L_out = floor((L + 2*padding - dilation*(kernel_size-1) - 1) / stride) + 1

Example: Input (32, 1, 1000), Conv1d(1, 16, kernel_size=5, stride=1, padding=2):

L_out = floor((1000 + 4 - 4 - 1) / 1) + 1 = 1000
Output: (32, 16, 1000)
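Conv1d and Conv2d share the same per-axis computation, so it can be checked with one tiny pure-Python helper (no PyTorch needed; the function name `conv_out` is my own, not a library API). A minimal sketch, verified against both worked examples above:

```python
def conv_out(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial dim for Conv1d/Conv2d/Conv3d."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# Conv2d(3, 64, kernel_size=7, stride=2, padding=3) on a 224x224 input:
print(conv_out(224, kernel_size=7, stride=2, padding=3))   # 112
# Conv1d(1, 16, kernel_size=5, stride=1, padding=2) on a length-1000 input:
print(conv_out(1000, kernel_size=5, stride=1, padding=2))  # 1000
```

Integer floor division (`//`) does the `floor(...)` for you, since every term is a non-negative integer here.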

Linear

Input: (*, H_in) — any number of leading dimensions
Output: (*, H_out) — only the last dimension changes

Output last dim = out_features

Example: Input (32, 512), Linear(512, 10):

Output: (32, 10)

For 3D input (32, 50, 512), Linear(512, 256):

Output: (32, 50, 256)  # applied to last dim independently
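The "only the last dimension changes" rule can be sketched in plain Python (helper name is my own, for illustration):

```python
def linear_out_shape(in_shape, out_features):
    """nn.Linear keeps all leading dims and replaces the last with out_features."""
    return tuple(in_shape[:-1]) + (out_features,)

print(linear_out_shape((32, 512), 10))       # (32, 10)
print(linear_out_shape((32, 50, 512), 256))  # (32, 50, 256)
```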

LSTM

Input: (N, L, H_in) with batch_first=True
Output: (N, L, D*H_out) where D=2 if bidirectional else 1

Output features = hidden_size * num_directions

Example: Input (16, 100, 300), LSTM(300, 128, bidirectional=True, batch_first=True):

Output: (16, 100, 256)  # 128 * 2 = 256

GRU

Same formula as LSTM.

Example: Input (16, 50, 128), GRU(128, 64, batch_first=True):

Output: (16, 50, 64)
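Since LSTM and GRU follow the same rule, one helper covers both (a sketch with a made-up function name, assuming batch_first=True as in the examples):

```python
def rnn_out_shape(batch, seq_len, hidden_size, bidirectional=False):
    """Output shape of LSTM/GRU with batch_first=True."""
    num_directions = 2 if bidirectional else 1
    return (batch, seq_len, hidden_size * num_directions)

print(rnn_out_shape(16, 100, 128, bidirectional=True))  # (16, 100, 256)
print(rnn_out_shape(16, 50, 64))                        # (16, 50, 64)
```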

MultiheadAttention

Input: (N, L, E) with batch_first=True
Output: (N, L, E) — shape does not change

Constraint: embed_dim must be divisible by num_heads.

head_dim = embed_dim / num_heads

Example: Input (8, 50, 512), MultiheadAttention(embed_dim=512, num_heads=8):

head_dim = 512 / 8 = 64
Output: (8, 50, 512)  # same shape
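The divisibility constraint is worth checking before you ever construct the layer; here is a minimal sketch (helper name is mine) that mirrors what MultiheadAttention validates internally:

```python
def check_mha(embed_dim, num_heads):
    """Validate the MultiheadAttention constraint and return head_dim."""
    if embed_dim % num_heads != 0:
        raise ValueError(f"embed_dim {embed_dim} not divisible by num_heads {num_heads}")
    return embed_dim // num_heads  # per-head feature dimension

print(check_mha(512, 8))  # 64
```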

BatchNorm2d / BatchNorm1d

BatchNorm does not change the tensor shape: output shape equals input shape. BatchNorm2d expects input (N, C, H, W); BatchNorm1d expects (N, C) or (N, C, L).

Parameter: num_features must equal the channel dimension C of the input.

# After Conv2d(3, 64, 3):
nn.BatchNorm2d(64)  # num_features = 64 = out_channels

MaxPool2d / AvgPool2d

Input: (N, C, H, W)
Output: (N, C, H_out, W_out) — channels unchanged

H_out = floor((H + 2*padding - kernel_size) / stride) + 1

Note: stride defaults to kernel_size (not 1), and this formula assumes dilation=1, the default.

Example: Input (1, 64, 112, 112), MaxPool2d(kernel_size=2, stride=2):

H_out = floor((112 + 0 - 2) / 2) + 1 = 56
Output: (1, 64, 56, 56)
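The pooling formula, sketched in the same style as the conv helper (function name is mine; this assumes dilation=1 and mirrors PyTorch's stride-defaults-to-kernel_size behavior):

```python
def pool_out(size, kernel_size, stride=None, padding=0):
    """Output length along one spatial dim for MaxPool/AvgPool (dilation=1)."""
    if stride is None:  # PyTorch default: stride = kernel_size
        stride = kernel_size
    return (size + 2 * padding - kernel_size) // stride + 1

print(pool_out(112, kernel_size=2, stride=2))  # 56
print(pool_out(112, kernel_size=2))            # 56 (stride defaulted to 2)
```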

Flatten

Input: (N, C, H, W) with default start_dim=1
Output: (N, C*H*W)

flattened_features = product of all dims from start_dim to end_dim

Example: Input (32, 64, 7, 7), nn.Flatten():

Output: (32, 3136)  # 64 * 7 * 7 = 3136
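The "product of dims from start_dim to end_dim" rule in code (a sketch; the helper name is my own, defaults match nn.Flatten):

```python
import math

def flatten_shape(shape, start_dim=1, end_dim=-1):
    """Shape after nn.Flatten(start_dim, end_dim)."""
    if end_dim < 0:
        end_dim += len(shape)
    merged = math.prod(shape[start_dim:end_dim + 1])
    return tuple(shape[:start_dim]) + (merged,) + tuple(shape[end_dim + 1:])

print(flatten_shape((32, 64, 7, 7)))  # (32, 3136)
```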

Dropout

Does not change shape. Output shape equals input shape exactly.

Transpose / Permute

transpose(dim0, dim1): Swaps two dimensions.
permute(*dims): Reorders all dimensions in one call; every dimension index must appear exactly once.

Example: Input (32, 100, 512).transpose(1, 2):

Output: (32, 512, 100)

Reshape / View

Total number of elements must remain the same. Use -1 for at most one dimension to have it auto-calculated. Note: view requires a contiguous tensor; reshape works either way, copying data if necessary.

Example: Input (32, 3, 8, 8).view(32, -1):

Total elements per sample: 3 * 8 * 8 = 192
Output: (32, 192)
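How the -1 gets resolved can be sketched in a few lines (helper name is mine; this mirrors the divide-total-by-known-dims logic, not PyTorch's actual implementation):

```python
import math

def infer_reshape(shape, target):
    """Resolve a single -1 in a view/reshape target shape."""
    total = math.prod(shape)
    known = math.prod(d for d in target if d != -1)
    if total % known != 0:
        raise ValueError("shapes are incompatible")
    return tuple(total // known if d == -1 else d for d in target)

print(infer_reshape((32, 3, 8, 8), (32, -1)))  # (32, 192)
```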

Concatenate (torch.cat)

Increases size along the specified dimension. All other dimensions must match.

Example: torch.cat([a, b], dim=1) where a is (32, 64, 8, 8) and b is (32, 128, 8, 8):

Output: (32, 192, 8, 8)  # 64 + 128 = 192 along dim 1
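The cat rule, sum along one dim with all others matching, as a small sketch (function name is mine):

```python
def cat_shape(shapes, dim=0):
    """Output shape of torch.cat: sizes add along dim, all other dims must match."""
    base = list(shapes[0])
    for s in shapes[1:]:
        assert all(a == b for i, (a, b) in enumerate(zip(base, s)) if i != dim), \
            "non-concatenation dimensions must match"
        base[dim] += s[dim]
    return tuple(base)

print(cat_shape([(32, 64, 8, 8), (32, 128, 8, 8)], dim=1))  # (32, 192, 8, 8)
```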

Quick Reference Table

Layer         Input Shape    Output Shape    Key Param
----------------------------------------------------------------
Conv2d        (N,C,H,W)      (N,Co,Ho,Wo)    kernel, stride, pad
Conv1d        (N,C,L)        (N,Co,Lo)       kernel, stride, pad
Linear        (*,Hin)        (*,Hout)        out_features
LSTM          (N,L,Hin)      (N,L,D*Hh)      hidden_size, bidir
GRU           (N,L,Hin)      (N,L,D*Hh)      hidden_size, bidir
MHA           (N,L,E)        (N,L,E)         num_heads
BatchNorm     same           same            num_features
MaxPool2d     (N,C,H,W)      (N,C,Ho,Wo)     kernel, stride
Flatten       (N,C,H,W)      (N,C*H*W)       start_dim
Dropout       any            same            p (rate)
Transpose     any            swapped dims    dim0, dim1
Reshape       any            target shape    -1 for auto

For interactive verification of any of these formulas, use the HeyTensor calculator. You can chain multiple layers together and see the shape propagation at every step.

Part of the ML toolkit collection.
