How to Fix "Sizes of tensors must match except in dimension" in PyTorch
The error "Sizes of tensors must match except in dimension" is raised when torch.cat() receives tensors whose shapes differ in a dimension other than the concatenation dimension. The scenarios below cover the most common causes, mismatched feature maps, variable-length sequences, and residual connections, with a fix for each.
What Causes This Error
When concatenating tensors with torch.cat(), every non-concatenation dimension must match exactly: torch.cat([tensor_a, tensor_b], dim=0) requires both tensors to have the same size in dimensions 1, 2, and so on. torch.stack() is stricter: because it creates a new dimension, every dimension of every input must be identical.
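As a minimal illustration (shapes chosen arbitrarily), only the concatenation dimension may differ between inputs:

```python
import torch

a = torch.randn(2, 3)
b = torch.randn(4, 3)              # differs only in dim 0
ok = torch.cat([a, b], dim=0)      # works: shape [6, 3]

c = torch.randn(4, 5)              # dim 1 differs: 3 vs 5
try:
    torch.cat([a, c], dim=0)
except RuntimeError as e:
    print(e)                       # the size-mismatch message names the offending dimension
```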
Scenario 1: Concatenating Feature Maps of Different Sizes
Skip connections or multi-scale features may produce tensors of different spatial sizes.
The Error
import torch
features_high = torch.randn(1, 64, 32, 32)  # 32x32 spatial size
features_low = torch.randn(1, 64, 16, 16)   # 16x16 spatial size
combined = torch.cat([features_high, features_low], dim=1)
# RuntimeError: Sizes of tensors must match except in dimension 1.
# Expected size 32 but got size 16 for tensor number 1 in the list
The Fix
import torch
import torch.nn.functional as F
features_high = torch.randn(1, 64, 32, 32)
features_low = torch.randn(1, 64, 16, 16)
# Option 1: Upsample the smaller tensor
features_low_up = F.interpolate(features_low, size=(32, 32), mode='bilinear', align_corners=False)
combined = torch.cat([features_high, features_low_up], dim=1) # Works: [1, 128, 32, 32]
# Option 2: Downsample the larger tensor
features_high_down = F.adaptive_avg_pool2d(features_high, (16, 16))
combined = torch.cat([features_high_down, features_low], dim=1) # Works: [1, 128, 16, 16]
In U-Net and FPN architectures, feature maps at different scales must be resized before concatenation. Use F.interpolate for upsampling or adaptive pooling for downsampling.
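As a sketch of how this looks inside a decoder, here is a hypothetical U-Net-style block (the DecoderBlock name and channel counts are illustrative, not from a specific library) that resizes to the skip connection's spatial size before concatenating:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderBlock(nn.Module):
    """Illustrative U-Net-style decoder step: upsample, then concatenate the skip."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + skip_ch, out_ch, 3, padding=1)

    def forward(self, x, skip):
        # Match the skip connection's spatial size before torch.cat
        x = F.interpolate(x, size=skip.shape[-2:], mode='bilinear', align_corners=False)
        return self.conv(torch.cat([x, skip], dim=1))

block = DecoderBlock(in_ch=128, skip_ch=64, out_ch=64)
out = block(torch.randn(1, 128, 16, 16), torch.randn(1, 64, 32, 32))
print(out.shape)  # torch.Size([1, 64, 32, 32])
```

Resizing to skip.shape[-2:] (rather than a hard-coded size) keeps the block correct even when input resolutions vary.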
Scenario 2: Batching Sequences of Different Lengths
NLP tasks often have variable-length sequences that cannot be directly concatenated.
The Error
import torch
seq1 = torch.randn(5, 768)  # 5 tokens
seq2 = torch.randn(8, 768)  # 8 tokens
batch = torch.stack([seq1, seq2])
# RuntimeError: stack expects each tensor to be equal size,
# but got [5, 768] at entry 0 and [8, 768] at entry 1
The Fix
# Option 1: Pad to maximum length
import torch
from torch.nn.utils.rnn import pad_sequence
seq1 = torch.randn(5, 768)
seq2 = torch.randn(8, 768)
batch = pad_sequence([seq1, seq2], batch_first=True)  # [2, 8, 768], shorter sequence zero-padded
# Option 2: Truncate to minimum length
min_len = min(seq1.size(0), seq2.size(0))
batch = torch.stack([seq1[:min_len], seq2[:min_len]]) # [2, 5, 768]
pad_sequence pads shorter tensors with zeros to match the longest. Use attention masks to ignore padded positions during training.
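One common pattern, sketched here with illustrative shapes, is to record the original lengths and build a boolean attention mask alongside pad_sequence:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

seqs = [torch.randn(5, 768), torch.randn(8, 768)]
lengths = torch.tensor([s.size(0) for s in seqs])     # tensor([5, 8])
batch = pad_sequence(seqs, batch_first=True)          # [2, 8, 768]

# True for real tokens, False for padding
mask = torch.arange(batch.size(1))[None, :] < lengths[:, None]  # [2, 8]
print(mask.sum(dim=1))  # tensor([5, 8])
```

The mask can then be passed to attention layers (or used to zero out padded positions in a loss) so the model never attends to padding.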
Scenario 3: Residual Connection Shape Mismatch
Skip/residual connections add a block's input to its output, so the two tensors must have identical shapes; a channel mismatch triggers the element-wise variant of this error.
The Error
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(64, 128, 3, padding=1)  # Changes channels!
    def forward(self, x):  # x: [B, 64, H, W]
        return x + self.conv(x)  # Error! [B, 64, H, W] + [B, 128, H, W]
# RuntimeError: The size of tensor a (64) must match the size of
# tensor b (128) at non-singleton dimension 1
The Fix
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(64, 128, 3, padding=1)
        self.shortcut = nn.Conv2d(64, 128, 1)  # 1x1 conv to match channels
    def forward(self, x):
        return self.shortcut(x) + self.conv(x)  # Both [B, 128, H, W]. Works!
ResNet uses 1x1 convolutions (projection shortcuts) to match channel dimensions when the residual path changes the number of channels.
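A common generalization, sketched below (the ResidualBlock name and structure are illustrative, not taken from torchvision), is to create the projection only when the channel count or stride actually changes, and fall back to an identity shortcut otherwise:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Sketch: use a 1x1 projection only when channels or stride change."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1)
        if in_ch != out_ch or stride != 1:
            self.shortcut = nn.Conv2d(in_ch, out_ch, 1, stride=stride)
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        return self.shortcut(x) + self.conv(x)

x = torch.randn(1, 64, 32, 32)
same = ResidualBlock(64, 64)             # identity shortcut
proj = ResidualBlock(64, 128, stride=2)  # projection shortcut
print(same(x).shape)  # [1, 64, 32, 32]
print(proj(x).shape)  # [1, 128, 16, 16]
```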
Quick Debugging Checklist
- Print tensor.shape for every input right before the torch.cat or torch.stack call
- Check the dim argument: cat tolerates a mismatch only in dim, stack tolerates none
- Trace spatial sizes through the network: stride, padding, and pooling all change H and W
- Compare tensor.dtype and tensor.device as well; mismatches there raise separate errors
# Inspect tensor properties right before the failing call
print(f"shape: {tensor.shape}, dtype: {tensor.dtype}, device: {tensor.device}")
# The traceback for a size-mismatch error already points at the failing line,
# so start by comparing the shapes it reports against what you expected
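These checks can be wrapped in a small helper. checked_cat below is a hypothetical utility, not part of PyTorch, that reports exactly which dimension disagrees before calling torch.cat:

```python
import torch

def checked_cat(tensors, dim=0):
    """Illustrative helper: name the mismatched dimension before torch.cat."""
    ref = tensors[0].shape
    for i, t in enumerate(tensors[1:], start=1):
        for d, (a, b) in enumerate(zip(ref, t.shape)):
            if d != dim and a != b:
                raise ValueError(
                    f"tensor {i} has size {b} in dim {d}, expected {a}; "
                    f"shapes: {[tuple(x.shape) for x in tensors]}"
                )
    return torch.cat(tensors, dim=dim)

out = checked_cat([torch.randn(1, 64, 32, 32), torch.randn(1, 32, 32, 32)], dim=1)
print(out.shape)  # torch.Size([1, 96, 32, 32])
```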