How to Fix "Sizes of tensors must match except in dimension" in PyTorch

The error "Sizes of tensors must match except in dimension X" is raised when you try to combine tensors whose shapes are incompatible, most commonly with torch.cat(). Below are the three most common scenarios that trigger it and how to fix each one.

What Causes This Error

torch.cat() requires every tensor in the list to have identical sizes in all dimensions except the concatenation dimension. For example, torch.cat([tensor_a, tensor_b], dim=0) requires both tensors to have the same shape in dimensions 1, 2, and so on. torch.stack() is stricter: every tensor must have exactly the same shape in every dimension.
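A minimal reproduction of the rule: with dim=0, the tensors may differ in dimension 0 but must agree everywhere else.

```python
import torch

a = torch.randn(2, 3)
b = torch.randn(4, 3)
ok = torch.cat([a, b], dim=0)   # works: dim 1 matches (3 == 3) -> shape [6, 3]

c = torch.randn(4, 5)
try:
    torch.cat([a, c], dim=0)    # fails: dim 1 differs (3 vs 5)
except RuntimeError as e:
    print(e)
```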

Scenario 1: Concatenating Feature Maps of Different Sizes

Skip connections or multi-scale features may produce tensors of different spatial sizes.

The Error

import torch

features_high = torch.randn(1, 64, 32, 32)  # 32x32 spatial size
features_low = torch.randn(1, 64, 16, 16)   # 16x16 spatial size
combined = torch.cat([features_high, features_low], dim=1)
# RuntimeError: Sizes of tensors must match except in dimension 1.
# Expected size 32 but got size 16 for tensor number 1 in the list

The Fix

import torch
import torch.nn.functional as F

features_high = torch.randn(1, 64, 32, 32)
features_low = torch.randn(1, 64, 16, 16)

# Option 1: Upsample the smaller tensor
features_low_up = F.interpolate(features_low, size=(32, 32), mode='bilinear', align_corners=False)
combined = torch.cat([features_high, features_low_up], dim=1)  # Works: [1, 128, 32, 32]

# Option 2: Downsample the larger tensor
features_high_down = F.adaptive_avg_pool2d(features_high, (16, 16))
combined = torch.cat([features_high_down, features_low], dim=1)  # Works: [1, 128, 16, 16]

In U-Net and FPN architectures, feature maps at different scales must be resized before concatenation. Use F.interpolate for upsampling or adaptive pooling for downsampling.
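The resize-then-concatenate pattern can be wrapped in a small helper so the decoder never sees mismatched spatial sizes. This is a minimal sketch; the function name `fuse_skip` is hypothetical, not a library API.

```python
import torch
import torch.nn.functional as F

def fuse_skip(decoder_feat, skip_feat):
    """Upsample decoder features to the skip connection's spatial size,
    then concatenate along the channel dimension. Hypothetical helper."""
    if decoder_feat.shape[-2:] != skip_feat.shape[-2:]:
        decoder_feat = F.interpolate(decoder_feat, size=skip_feat.shape[-2:],
                                     mode='bilinear', align_corners=False)
    return torch.cat([decoder_feat, skip_feat], dim=1)

fused = fuse_skip(torch.randn(1, 128, 16, 16), torch.randn(1, 64, 32, 32))
print(fused.shape)  # channels add up: 128 + 64 = 192
```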

Scenario 2: Batching Sequences of Different Lengths

NLP tasks often have variable-length sequences that cannot be directly concatenated.

The Error

import torch

seq1 = torch.randn(5, 768)   # 5 tokens
seq2 = torch.randn(8, 768)   # 8 tokens
batch = torch.stack([seq1, seq2])
# RuntimeError: stack expects each tensor to be equal size,
# but got [5, 768] at entry 0 and [8, 768] at entry 1

The Fix

# Option 1: Pad to maximum length
import torch
from torch.nn.utils.rnn import pad_sequence

seq1 = torch.randn(5, 768)
seq2 = torch.randn(8, 768)
batch = pad_sequence([seq1, seq2], batch_first=True)  # [2, 8, 768], padded with zeros

# Option 2: Truncate to minimum length
min_len = min(seq1.size(0), seq2.size(0))
batch = torch.stack([seq1[:min_len], seq2[:min_len]])  # [2, 5, 768]

pad_sequence pads shorter tensors with zeros to match the longest. Use attention masks to ignore padded positions during training.
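A boolean mask distinguishing real tokens from padding can be built from the original lengths; a minimal sketch of the masks-from-lengths step:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

seqs = [torch.randn(5, 768), torch.randn(8, 768)]
lengths = torch.tensor([s.size(0) for s in seqs])   # tensor([5, 8])
batch = pad_sequence(seqs, batch_first=True)        # [2, 8, 768]

# True for real tokens, False for padded positions: [2, 8]
mask = torch.arange(batch.size(1))[None, :] < lengths[:, None]
```

The mask can then be passed to attention layers (e.g. as `key_padding_mask`) so padded positions are ignored.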

Scenario 3: Residual Connection Shape Mismatch

Skip/residual connections require the input and output to have identical shapes.

The Error

import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(64, 128, 3, padding=1)  # Changes channels!

    def forward(self, x):  # x: [B, 64, H, W]
        return x + self.conv(x)  # Error! [B, 64, H, W] + [B, 128, H, W]
# RuntimeError: The size of tensor a (64) must match the size of
# tensor b (128) at non-singleton dimension 1

The Fix

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(64, 128, 3, padding=1)
        self.shortcut = nn.Conv2d(64, 128, 1)  # 1x1 conv to match channels

    def forward(self, x):
        return self.shortcut(x) + self.conv(x)  # Both [B, 128, H, W]. Works!

ResNet uses 1x1 convolutions (projection shortcuts) to match channel dimensions when the residual path changes the number of channels.
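The projection shortcut only needs to exist when the channel count actually changes. A minimal sketch (the class name `ResBlock` is hypothetical) that falls back to an identity shortcut otherwise:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Sketch: use a 1x1 projection shortcut only when channels change."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.shortcut = (nn.Identity() if in_ch == out_ch
                         else nn.Conv2d(in_ch, out_ch, 1))

    def forward(self, x):
        return self.shortcut(x) + self.conv(x)

out = ResBlock(64, 128)(torch.randn(1, 64, 8, 8))
print(out.shape)  # both paths produce [1, 128, 8, 8]
```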

Quick Debugging Checklist

# Print shape, dtype, and device of the tensors you are combining
print(f"shape: {tensor.shape}, dtype: {tensor.dtype}, device: {tensor.device}")

# For cat/stack over a list, inspect every element to find the odd one out
for i, t in enumerate(tensor_list):
    print(f"tensor {i}: {tuple(t.shape)}")
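The checklist above can be packaged as a throwaway wrapper that logs every operand before concatenating, so the offending tensor is visible the moment the error fires. The helper name `debug_cat` is hypothetical:

```python
import torch

def debug_cat(tensors, dim=0):
    """Print each operand's shape before concatenating. Hypothetical debug helper."""
    for i, t in enumerate(tensors):
        print(f"tensor {i}: shape={tuple(t.shape)}, dtype={t.dtype}, device={t.device}")
    return torch.cat(tensors, dim=dim)

out = debug_cat([torch.randn(2, 3), torch.randn(4, 3)], dim=0)
```

Swap `debug_cat` in for `torch.cat` at the failing call site, then remove it once the mismatched shape is found.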
