Every PyTorch Shape Error Explained (With Fixes)
Shape errors are the most common category of PyTorch RuntimeErrors. They happen when tensor dimensions do not match what a layer or operation expects. The error messages contain the information you need to fix them, but the wording can be cryptic if you do not know what to look for.
This guide covers the 10 most common shape-related errors in PyTorch 2.x. Each entry includes the exact error text, what causes it, and how to fix it with real code.
1. mat1 and mat2 shapes cannot be multiplied
Error text:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x512 and 256x10)
Cause: A nn.Linear layer's in_features does not match the actual input size. In this example, the input has 512 features per sample but the Linear layer expects 256.
Fix: Change in_features to match the actual input size. To find the correct value, print the tensor shape before the Linear layer.
# Before:
self.fc = nn.Linear(256, 10)
# After:
self.fc = nn.Linear(512, 10)
# Or debug with:
print(x.shape) # torch.Size([64, 512])
self.fc = nn.Linear(x.shape[1], 10)
This is the single most common PyTorch shape error. It almost always involves a nn.Linear layer receiving flattened CNN features. Use HeyTensor's Chain Mode to trace the exact shape through your layers and find the correct in_features value.
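If hand-computing in_features feels error-prone, PyTorch 1.8+ also offers nn.LazyLinear, which infers in_features from the first batch it sees. A minimal sketch (the conv/linear sizes here are illustrative, not from the example above):

```python
import torch
import torch.nn as nn

# nn.LazyLinear defers in_features until the first forward pass,
# so the flattened CNN feature size never has to be computed by hand.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3),
    nn.Flatten(),
    nn.LazyLinear(10),  # in_features resolved on first call
)

x = torch.randn(64, 1, 28, 28)
out = model(x)
print(out.shape)  # torch.Size([64, 10])
```

Note that a lazy module is uninitialized until it has seen data, so run one dummy batch through the model before inspecting parameters or passing them to an optimizer.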
2. Expected input batch_size (X) to match target batch_size (Y)
Error text:
RuntimeError: Expected input batch_size (32) to match target batch_size (16)
Cause: The model output and the labels have different first-dimension sizes. This happens when a reshape operation accidentally merges or splits the batch dimension.
Fix: Check for .view() or .reshape() calls that might change the batch size. Ensure your data loader returns matching batches.
# Wrong — might change batch size:
x = x.view(-1, num_classes)
# Right — preserve batch size:
x = x.view(x.size(0), -1)
3. size mismatch, m1: [AxB], m2: [CxD]
Error text:
RuntimeError: size mismatch, m1: [32 x 9216], m2: [1024 x 128]
Cause: Same root cause as error 1, but from an older PyTorch version or from torch.mm() directly. The columns of the first matrix (9216) must equal the rows of the second (1024).
Fix: Set in_features=9216 in your Linear layer, or adjust the previous layer to output 1024 features.
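A minimal reproduction of this mismatch and its fix, using the shapes from the error message above:

```python
import torch

m1 = torch.randn(32, 9216)   # activations: 32 samples, 9216 features each
m2 = torch.randn(1024, 128)  # weight with the wrong row count

try:
    torch.mm(m1, m2)         # 9216 columns vs 1024 rows -> RuntimeError
except RuntimeError as e:
    print(e)

m2_fixed = torch.randn(9216, 128)  # rows now match m1's columns
out = torch.mm(m1, m2_fixed)
print(out.shape)  # torch.Size([32, 128])
```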
4. Expected 4D input (got 2D input)
Error text:
RuntimeError: Expected 4D (got 2D) input to Conv2d
(In PyTorch 2.x the wording is "Expected 3D (unbatched) or 4D (batched) input to conv2d, but got input of size: [32, 784]"; older versions report "Expected 4-dimensional input".)
Cause: nn.Conv2d expects input shaped as (batch, channels, height, width). A 2D tensor like (batch, features) cannot be processed by a Conv layer.
Fix: Reshape or unsqueeze to add the missing dimensions.
# If you have grayscale images stored as (batch, 784):
x = x.view(batch_size, 1, 28, 28)
# Or for single images without batch dim:
x = x.unsqueeze(0) # adds batch dim
5. shape '[X]' is invalid for input of size Y
Error text:
RuntimeError: shape '[32, 3, 32, 32]' is invalid for input of size 49152
Cause: The product of target dimensions (32 * 3 * 32 * 32 = 98304) does not equal the total number of elements (49152). The math does not work out.
Fix: Use -1 in one dimension to let PyTorch calculate it automatically, or compute the correct shape manually.
# Let PyTorch figure out one dimension:
x = x.view(batch_size, 3, -1) # auto-calculates last dim
# Or compute: 49152 / 32 = 1536 elements per sample
x = x.view(32, 1536)
6. Given groups=1, weight of size [64, 3, 3, 3]
Error text:
RuntimeError: Given groups=1, weight of size [64, 3, 3, 3], expected input[1, 1, 28, 28] to have 3 channels, but got 1 channels instead
Cause: The Conv2d layer was defined with in_channels=3 (from the weight shape) but received a single-channel input.
Fix: Match in_channels to your actual data. Grayscale images have 1 channel, RGB images have 3.
# For grayscale:
self.conv1 = nn.Conv2d(1, 64, kernel_size=3)
# For RGB:
self.conv1 = nn.Conv2d(3, 64, kernel_size=3)
7. The size of tensor a (X) must match the size of tensor b (Y)
Error text:
RuntimeError: The size of tensor a (128) must match the size of tensor b (64) at non-singleton dimension 1
Cause: An element-wise operation (add, multiply, etc.) between two tensors with incompatible shapes at the specified dimension. Broadcasting rules require matching sizes or size 1.
Fix: Reshape one tensor so dimensions align, or check that the tensors come from matching operations.
# If adding a skip connection:
# residual shape: [batch, 64, 32, 32]
# x shape: [batch, 128, 16, 16] # mismatch!
# Fix with 1x1 conv to match channels and spatial dims:
self.downsample = nn.Sequential(
nn.Conv2d(64, 128, 1, stride=2),
nn.BatchNorm2d(128)
)
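The broadcasting rule itself is easy to verify in isolation: comparing shapes right to left, each pair of sizes must either match or one of them must be 1. A small sketch with hypothetical shapes:

```python
import torch

a = torch.randn(32, 128, 16, 16)
b = torch.randn(32, 64, 16, 16)

try:
    a + b  # dim 1: 128 vs 64, neither is 1 -> RuntimeError
except RuntimeError as e:
    print(e)

# A size of 1 broadcasts: (32, 1, 16, 16) stretches along dim 1
c = torch.randn(32, 1, 16, 16)
out = a + c
print(out.shape)  # torch.Size([32, 128, 16, 16])
```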
8. Dimension out of range
Error text:
IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)
Cause: Calling an operation on a dimension that does not exist. A 2D tensor has dimensions 0 and 1 (or -1 and -2). Requesting dimension 2 fails.
Fix: Print tensor.dim() and check which dimensions are valid.
# Wrong — 2D tensor has no dim=2:
x = torch.softmax(x, dim=2)
# Right:
x = torch.softmax(x, dim=1) # or dim=-1
9. Expected hidden size (2, 32, 256), got (2, 64, 256)
Error text:
RuntimeError: Expected hidden[0] size (2, 32, 256), got [2, 64, 256]
Cause: When passing hidden states between LSTM/GRU forward calls, the batch size of the hidden state does not match the new input batch size. This commonly happens at the end of an epoch when the last batch is smaller.
Fix: Either pad the last batch or re-initialize hidden states at each batch.
# Re-initialize hidden state each batch:
def init_hidden(self, batch_size):
return (torch.zeros(2, batch_size, 256),
torch.zeros(2, batch_size, 256))
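In a training loop, the method above would be called with each batch's actual size, so a short final batch gets a matching hidden state. A sketch with a hypothetical 2-layer LSTM (hidden size 256, matching the snippet):

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=100, hidden_size=256,
                            num_layers=2, batch_first=True)

    def init_hidden(self, batch_size):
        return (torch.zeros(2, batch_size, 256),
                torch.zeros(2, batch_size, 256))

model = Net()
# Second batch is smaller, as at the end of an epoch:
for batch in [torch.randn(32, 10, 100), torch.randn(17, 10, 100)]:
    hidden = model.init_hidden(batch.size(0))  # sized per batch
    out, hidden = model.lstm(batch, hidden)
    print(out.shape)
```

Sizing the hidden state from batch.size(0) on every iteration is what prevents the mismatch; carrying hidden over from a 32-sample batch into a 17-sample batch reproduces the error.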
10. multi-target not supported in CrossEntropyLoss
Error text:
RuntimeError: multi-target not supported at /pytorch/aten/src/THNN/generic/ClassNLLCriterion.c
Cause: nn.CrossEntropyLoss expects targets as class indices of shape (batch,), not one-hot encoded tensors of shape (batch, num_classes).
Fix: Convert one-hot targets to class indices, or use nn.BCEWithLogitsLoss for multi-label classification.
(Since PyTorch 1.10, nn.CrossEntropyLoss also accepts floating-point class-probability targets of shape (batch, num_classes), but integer class indices remain the standard case.)
# Convert one-hot to indices:
targets = targets.argmax(dim=1)
# Or use BCEWithLogitsLoss for multi-label:
criterion = nn.BCEWithLogitsLoss()
Debugging Strategy
When you hit a shape error you cannot immediately parse, follow this process:
- Add print(x.shape) before and after each layer in your forward method.
- Use HeyTensor's Chain Mode to model your architecture and see shapes at every step.
- Paste the exact error into HeyTensor's Error Debugger to extract dimensions and get a targeted fix.
- Check for accidental dimension changes from .view(), .reshape(), or .permute() calls.
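Instead of scattering print statements through forward, forward hooks can log every layer's output shape automatically. A sketch (the model here is illustrative):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 16, 3),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 26 * 26, 10),
)

shapes = []

def log_shape(module, inputs, output):
    # Record the layer type and its output shape on every forward pass
    shapes.append((module.__class__.__name__, tuple(output.shape)))

hooks = [m.register_forward_hook(log_shape) for m in model]

model(torch.randn(8, 1, 28, 28))
for name, shape in shapes:
    print(f"{name:10s} -> {shape}")

for h in hooks:
    h.remove()  # detach hooks once done debugging
```

The same trick works on any nn.Module, so you can attach hooks to a model you did not write and trace where a shape goes wrong.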
Most shape errors come from three places: the transition from convolutional to linear layers (forgetting to flatten or using wrong in_features), mismatched skip connections in ResNets, and incorrect hidden state handling in RNNs. Build your architectures incrementally and verify shapes as you go.