Tensor Shape Cheat Sheet: Every Layer's Input/Output Formula
This is a quick-reference guide for the output shape of every common PyTorch layer. Each entry includes the formula, the input/output format, and a worked example with concrete numbers. Bookmark this and stop re-deriving formulas every time you build a model.
You can verify any of these formulas instantly using HeyTensor's calculator.
Conv2d
Input: (N, C_in, H, W)
Output: (N, C_out, H_out, W_out)
H_out = floor((H + 2*padding - dilation*(kernel_size-1) - 1) / stride) + 1
Example: Input (1, 3, 224, 224), Conv2d(3, 64, kernel_size=7, stride=2, padding=3):
H_out = floor((224 + 2*3 - 1*(7-1) - 1) / 2) + 1
= floor((224 + 6 - 6 - 1) / 2) + 1
= floor(223 / 2) + 1 = 111 + 1 = 112
Output: (1, 64, 112, 112)
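The worked example above can be sanity-checked by running the layer on a dummy tensor:

```python
import torch
import torch.nn as nn

# ResNet-style stem: 7x7 conv, stride 2, padding 3
conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
x = torch.randn(1, 3, 224, 224)
out = conv(x)
print(out.shape)  # torch.Size([1, 64, 112, 112])
```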
Conv1d
Input: (N, C_in, L)
Output: (N, C_out, L_out)
L_out = floor((L + 2*padding - dilation*(kernel_size-1) - 1) / stride) + 1
Example: Input (32, 1, 1000), Conv1d(1, 16, kernel_size=5, stride=1, padding=2):
L_out = floor((1000 + 4 - 4 - 1) / 1) + 1 = 1000
Output: (32, 16, 1000)
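The same check in code; note that kernel_size=5 with padding=2 at stride 1 is "same" padding, so the length is preserved:

```python
import torch
import torch.nn as nn

# padding = (kernel_size - 1) / 2 keeps L unchanged at stride 1
conv = nn.Conv1d(1, 16, kernel_size=5, stride=1, padding=2)
x = torch.randn(32, 1, 1000)
out = conv(x)
print(out.shape)  # torch.Size([32, 16, 1000])
```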
Linear
Input: (*, H_in) — any number of leading dimensions
Output: (*, H_out) — only the last dimension changes
Output last dim = out_features
Example: Input (32, 512), Linear(512, 10):
Output: (32, 10)
For 3D input (32, 50, 512), Linear(512, 256):
Output: (32, 50, 256) # applied to last dim independently
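Both cases can be verified with a quick dummy-tensor check:

```python
import torch
import torch.nn as nn

fc = nn.Linear(512, 10)
print(fc(torch.randn(32, 512)).shape)        # torch.Size([32, 10])

# With a 3D input, the same weights are applied at every position along dim 1.
proj = nn.Linear(512, 256)
print(proj(torch.randn(32, 50, 512)).shape)  # torch.Size([32, 50, 256])
```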
LSTM
Input: (N, L, H_in) with batch_first=True
Output: (N, L, D*H_out) where D=2 if bidirectional else 1
Output features = hidden_size * num_directions
Example: Input (16, 100, 300), LSTM(300, 128, bidirectional=True, batch_first=True):
Output: (16, 100, 256) # 128 * 2 = 256
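In code, keeping in mind that an LSTM returns the full output sequence plus a `(h, c)` state tuple:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(300, 128, bidirectional=True, batch_first=True)
x = torch.randn(16, 100, 300)
out, (h, c) = lstm(x)
print(out.shape)  # torch.Size([16, 100, 256]) -- 128 * 2 directions
print(h.shape)    # torch.Size([2, 16, 128])  -- (num_layers * D, N, hidden_size)
```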
GRU
Same formula as LSTM.
Example: Input (16, 50, 128), GRU(128, 64, batch_first=True):
Output: (16, 50, 64)
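The only interface difference from LSTM is that GRU returns a single hidden state rather than an `(h, c)` tuple:

```python
import torch
import torch.nn as nn

gru = nn.GRU(128, 64, batch_first=True)
out, h = gru(torch.randn(16, 50, 128))
print(out.shape)  # torch.Size([16, 50, 64])
print(h.shape)    # torch.Size([1, 16, 64])
```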
MultiheadAttention
Input: (N, L, E) with batch_first=True
Output: (N, L, E) — shape does not change
Constraint: embed_dim must be divisible by num_heads.
head_dim = embed_dim / num_heads
Example: Input (8, 50, 512), MultiheadAttention(embed_dim=512, num_heads=8):
head_dim = 512 / 8 = 64
Output: (8, 50, 512) # same shape
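The same example in code. `MultiheadAttention.forward` takes separate query, key, and value tensors; for self-attention all three are the same tensor:

```python
import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)
x = torch.randn(8, 50, 512)
out, weights = mha(x, x, x)   # self-attention: q = k = v
print(out.shape)      # torch.Size([8, 50, 512]) -- unchanged
print(weights.shape)  # torch.Size([8, 50, 50])  -- averaged over heads by default
```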
BatchNorm2d / BatchNorm1d
Input and output shapes are identical: BatchNorm does not change the tensor shape.
Parameter: num_features must equal the channel dimension of the input.
# After Conv2d(3, 64, 3):
nn.BatchNorm2d(64) # num_features = 64 = out_channels
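A quick check that the shape passes through unchanged:

```python
import torch
import torch.nn as nn

x = torch.randn(8, 64, 56, 56)   # e.g. the output of a Conv2d with out_channels=64
bn = nn.BatchNorm2d(64)          # num_features must equal dim 1 of the input
print(bn(x).shape)  # torch.Size([8, 64, 56, 56]) -- unchanged
```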
MaxPool2d / AvgPool2d
Input: (N, C, H, W)
Output: (N, C, H_out, W_out) — channels unchanged
H_out = floor((H + 2*padding - kernel_size) / stride) + 1
(assumes the defaults dilation=1 and ceil_mode=False; note that stride defaults to kernel_size for pooling layers, not to 1)
Example: Input (1, 64, 112, 112), MaxPool2d(kernel_size=2, stride=2):
H_out = floor((112 + 0 - 2) / 2) + 1 = 56
Output: (1, 64, 56, 56)
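Verified on a dummy tensor:

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)
out = pool(torch.randn(1, 64, 112, 112))
print(out.shape)  # torch.Size([1, 64, 56, 56]) -- spatial dims halved, channels kept
```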
Flatten
Input: (N, C, H, W) with default start_dim=1
Output: (N, C*H*W)
flattened_features = product of all dims from start_dim to end_dim
Example: Input (32, 64, 7, 7), nn.Flatten():
Output: (32, 3136) # 64 * 7 * 7 = 3136
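In code:

```python
import torch
import torch.nn as nn

flatten = nn.Flatten()  # start_dim=1: keep the batch dim, merge everything after it
out = flatten(torch.randn(32, 64, 7, 7))
print(out.shape)  # torch.Size([32, 3136]) -- 64 * 7 * 7
```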
Dropout
Does not change shape: output shape equals input shape exactly. During training, Dropout zeroes random elements and rescales the rest by 1/(1-p); in eval mode it is the identity.
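A one-line check that the shape survives:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.randn(4, 10)
out = drop(x)
print(out.shape)  # torch.Size([4, 10]) -- identical in train and eval mode
```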
Transpose / Permute
transpose(dim0, dim1): Swaps two dimensions.
Example: Input (32, 100, 512).transpose(1, 2):
Output: (32, 512, 100)
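In code, with `permute` included for reordering more than two dimensions at once:

```python
import torch

x = torch.randn(32, 100, 512)
print(x.transpose(1, 2).shape)   # torch.Size([32, 512, 100]) -- swap dims 1 and 2
print(x.permute(2, 0, 1).shape)  # torch.Size([512, 32, 100]) -- arbitrary reordering
```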
Reshape / View
The total number of elements must remain the same. Use -1 for one dimension to have it inferred. view requires contiguous memory and returns a view of the same storage; reshape copies the data when necessary.
Example: Input (32, 3, 8, 8).view(32, -1):
Total elements per sample: 3 * 8 * 8 = 192
Output: (32, 192)
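In code:

```python
import torch

x = torch.randn(32, 3, 8, 8)
print(x.view(32, -1).shape)      # torch.Size([32, 192]) -- -1 infers 3 * 8 * 8
print(x.reshape(-1, 192).shape)  # torch.Size([32, 192]) -- same result via reshape
```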
Concatenate (torch.cat)
Increases size along the specified dimension. All other dimensions must match.
Example: torch.cat([a, b], dim=1) where a is (32, 64, 8, 8) and b is (32, 128, 8, 8):
Output: (32, 192, 8, 8) # 64 + 128 = 192 along dim 1
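The same channel-wise concatenation in code:

```python
import torch

a = torch.randn(32, 64, 8, 8)
b = torch.randn(32, 128, 8, 8)
out = torch.cat([a, b], dim=1)  # all dims except dim 1 must match
print(out.shape)  # torch.Size([32, 192, 8, 8])
```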
Quick Reference Table
Layer Input Shape Output Shape Key Param
----------------------------------------------------------------------
Conv2d (N,C,H,W) (N,Co,Ho,Wo) kernel, stride, pad
Conv1d (N,C,L) (N,Co,Lo) kernel, stride, pad
Linear (*,Hin) (*,Hout) out_features
LSTM (N,L,Hin) (N,L,D*Hh) hidden_size, bidir
GRU (N,L,Hin) (N,L,D*Hh) hidden_size, bidir
MHA (N,L,E) (N,L,E) num_heads
BatchNorm same same num_features
MaxPool2d (N,C,H,W) (N,C,Ho,Wo) kernel, stride
Flatten (N,C,H,W) (N,C*H*W) start_dim
Dropout any same p (rate)
Transpose any swapped dims dim0, dim1
Reshape any target shape -1 for auto
For interactive verification of any of these formulas, use the HeyTensor calculator. You can chain multiple layers together and see the shape propagation at every step.
Part of the ML toolkit tools collection.