Fix "mat1 and mat2 shapes cannot be multiplied" in PyTorch

By Michael Lip · May 16, 2026 · 12 min read

TL;DR: This error means your nn.Linear layer's in_features does not match the last dimension of the input tensor. The fix: add print(x.shape) before the Linear layer, read the last number, then set nn.Linear(that_number, out_features). If you are transitioning from Conv2d to Linear, you almost certainly need to flatten first and compute in_features = out_channels * H_out * W_out.

RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x2048 and 512x10)

If you have trained a neural network in PyTorch, you have probably seen this error. In our analysis of 52 real-world PyTorch errors, it was the #1 reported issue across Stack Overflow, GitHub Issues, and Reddit. The error fires at the exact moment PyTorch tries to multiply two matrices whose inner dimensions do not align, and it halts your entire training pipeline.

This guide covers every scenario that produces this error, gives you code you can copy-paste to fix each one, and includes an interactive shape checker so you can test matrix dimensions before you even write the Python.

What the error actually means
How to read the error message
7 real-world causes (with code fixes)
Step-by-step debugging process
Interactive shape checker tool
Prevention tips and best practices
FAQ

What the Error Actually Means

Matrix multiplication has one hard rule: if you multiply matrix A of shape (m, n) by matrix B of shape (p, q), then n must equal p. The inner dimensions must match. The result will be shape (m, q).

In PyTorch, nn.Linear(in_features, out_features) stores a weight matrix of shape (out_features, in_features). During the forward pass, it computes output = input @ weight.T + bias. For this multiplication to work, the last dimension of your input tensor must equal in_features.

When it does not match, PyTorch raises:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (AxB and CxD)

Here, mat1 is your input tensor (shape A x B), and mat2 is the weight matrix (shape C x D, which is in_features x out_features after transpose). The error fires because B != C.
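To make the rule concrete, here is a minimal sketch that reproduces the example error with a deliberately mismatched layer and then builds a layer that matches. The sizes (64, 2048, 512, 10) come from the example message above; the variable names are illustrative only.

```python
import torch
import torch.nn as nn

layer = nn.Linear(512, 10)           # weight is stored as (out_features, in_features)
print(tuple(layer.weight.shape))     # (10, 512)

x = torch.randn(64, 2048)            # last dimension is 2048, but the layer expects 512

try:
    layer(x)
    message = ""
except RuntimeError as err:
    message = str(err)

print(message)                       # the mat1/mat2 error for this input

# A layer whose in_features equals the input's last dimension works.
fixed = nn.Linear(x.shape[-1], 10)
out = fixed(x)
print(tuple(out.shape))              # (64, 10)
```

Reading `x.shape[-1]` instead of typing the number by hand is the habit the rest of this guide builds on.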

This same error can appear with torch.mm(), torch.matmul(), torch.bmm(), or the @ operator. The root cause is always the same: inner dimensions do not align.

How to Read the Error Message

The error message contains every piece of information you need to fix it. Let us decode a real example:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x2048 and 512x10)

Component	Value	Meaning
`mat1`	64 x 2048	Your input tensor. 64 is the batch size. 2048 is the actual feature count.
`mat2`	512 x 10	The Linear layer's weight matrix (transposed). 512 is in_features (what it expects). 10 is out_features.
The mismatch	2048 != 512	Your input has 2048 features but the Linear layer was defined with `in_features=512`.
The fix	Change `nn.Linear(512, 10)` to `nn.Linear(2048, 10)`

The second number in mat1 (B) must equal the first number in mat2 (C). That is the entire rule. Every fix in this guide ultimately reduces to making those two numbers match.
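The decoding in the table can be automated. The sketch below is plain Python with a hypothetical helper name, `diagnose`, that pulls the four numbers out of the message. Its suggestion assumes the input is right and the layer is wrong, which is not always the case: sometimes the real fix is a missing flatten or pooling step, as later sections show.

```python
import re

def diagnose(message):
    """Parse a 'mat1 and mat2 shapes cannot be multiplied (AxB and CxD)' message."""
    match = re.search(r"\((\d+)x(\d+) and (\d+)x(\d+)\)", message)
    if match is None:
        raise ValueError("no '(AxB and CxD)' shape pair found in message")
    a, b, c, d = (int(group) for group in match.groups())
    return {
        "rows": a,               # usually the batch size (times any other leading dims)
        "actual_features": b,    # what the input really has
        "expected_features": c,  # in_features the layer was built with
        "out_features": d,
        "suggestion": f"nn.Linear({b}, {d})",
    }

info = diagnose(
    "RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x2048 and 512x10)"
)
print(info["suggestion"])   # nn.Linear(2048, 10)
```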

7 Real-World Causes (With Code Fixes)

Forgot to Flatten After Conv2d

This is the #1 cause. After convolutional layers, your tensor is 4D: (batch, channels, height, width). nn.Linear only operates on the last dimension, so if you pass the 4D tensor directly, it treats the width (16 in the example below) as the feature count and folds every other dimension into the rows of mat1. The Linear layer was built for the flattened size, so the inner dimensions cannot match.

import torch
import torch.nn as nn
import torch.nn.functional as F

# BUG: passing 4D tensor directly to Linear
class BrokenCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2)
        self.fc = nn.Linear(64 * 16 * 16, 10)  # expects flattened

    def forward(self, x):          # x: (batch, 3, 32, 32)
        x = self.pool(F.relu(self.conv1(x)))  # (batch, 64, 16, 16)
        x = self.fc(x)             # ERROR! 4D tensor hits Linear
        return x

Fix: Add x = x.flatten(1) or x = torch.flatten(x, 1) before the Linear layer. You can also use nn.Flatten() as a module in nn.Sequential.

# FIXED: flatten before Linear
def forward(self, x):              # x: (batch, 3, 32, 32)
    x = self.pool(F.relu(self.conv1(x)))  # (batch, 64, 16, 16)
    x = x.flatten(1)               # (batch, 64*16*16) = (batch, 16384)
    x = self.fc(x)                 # works!
    return x

Wrong in_features Calculation

Even if you remembered to flatten, you may have computed in_features wrong. This is especially common when you have multiple Conv2d layers with different strides, paddings, and pooling operations. Each one changes the spatial dimensions, and the final flattened size is hard to calculate by hand.

# BUG: wrong in_features calculation
class MiscalcCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1),    # 32x32 -> 32x32
            nn.ReLU(),
            nn.MaxPool2d(2),                    # 32x32 -> 16x16
            nn.Conv2d(32, 64, 3, padding=1),   # 16x16 -> 16x16
            nn.ReLU(),
            nn.MaxPool2d(2),                    # 16x16 -> 8x8
            nn.Conv2d(64, 128, 3, padding=1),  # 8x8 -> 8x8
            nn.ReLU(),
            nn.MaxPool2d(2),                    # 8x8 -> 4x4
        )
        # BUG: wrote 128*8*8 but it's actually 128*4*4
        self.classifier = nn.Linear(128 * 8 * 8, 10)

    def forward(self, x):
        x = self.features(x)
        x = x.flatten(1)                       # (batch, 128*4*4) = (batch, 2048)
        x = self.classifier(x)                 # ERROR: expects 8192, got 2048
        return x

Fix: Run a dummy forward pass to find the correct size. This method is foolproof regardless of how many layers you have.

# FIXED: use a dummy pass to find the correct in_features
import torch
import torch.nn as nn

model_features = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
)

# Pass a dummy tensor through to discover the actual output size
dummy = torch.zeros(1, 3, 32, 32)
with torch.no_grad():
    out = model_features(dummy)
flat_size = out.flatten(1).shape[1]
print(f"Correct in_features: {flat_size}")  # 2048

# Now use the correct value
classifier = nn.Linear(flat_size, 10)  # nn.Linear(2048, 10)
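If you would rather check the arithmetic by hand, the spatial size after a Conv2d or MaxPool2d layer follows one formula per axis: floor((size + 2*padding - dilation*(kernel - 1) - 1) / stride) + 1. The sketch below applies it to the three conv/pool stages above. The helper name `out_size` is an assumption for illustration, and the formula covers the default floor mode only (not ceil_mode=True).

```python
def out_size(size, kernel, stride=1, padding=0, dilation=1):
    """Spatial output size of Conv2d / MaxPool2d (floor mode) along one axis."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

size = 32
for _ in range(3):
    size = out_size(size, kernel=3, padding=1)   # Conv2d(..., 3, padding=1) keeps the size
    size = out_size(size, kernel=2, stride=2)    # MaxPool2d(2) halves it

flat_size = 128 * size * size                    # channels of the last conv * H * W
print(size, flat_size)                           # 4 2048
```

The dummy forward pass remains the safer option for real models, since it cannot drift out of sync with the layer definitions.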

Changed Input Image Size Without Updating the Linear Layer

Your model was designed for 224x224 images but you are feeding it 256x256 or 128x128 images. The Conv2d and pooling layers produce a different spatial output, which changes the flattened size. The Linear layer still expects the original value.

# Sized for 224x224 inputs, but the adaptive pool keeps the head valid at any resolution
class ImageClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3),  # 224->112
            nn.ReLU(),
            nn.MaxPool2d(3, stride=2, padding=1),       # 112->56
            nn.Conv2d(64, 128, 3, padding=1),           # 56->56
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 1)),               # 56->1x1
        )
        self.fc = nn.Linear(128, 10)  # 128*1*1 = 128

# With x.flatten(1) after self.features in forward(), this works with
# 224x224, 256x256, or any other size, because AdaptiveAvgPool2d((1, 1))
# always outputs a 1x1 spatial map.

Fix: Use nn.AdaptiveAvgPool2d((1, 1)) before your Linear layer. This forces the spatial dimensions to 1x1 regardless of input size, so in_features = num_channels always. This is how ResNet, EfficientNet, and most modern architectures handle it.

# The pattern every modern CNN uses:
self.pool = nn.AdaptiveAvgPool2d((1, 1))  # any spatial -> 1x1
self.fc = nn.Linear(num_channels, num_classes)

def forward(self, x):
    x = self.features(x)      # (batch, C, H, W) -- H,W can be anything
    x = self.pool(x)           # (batch, C, 1, 1) -- always!
    x = x.flatten(1)           # (batch, C)
    x = self.fc(x)             # works for any input resolution
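The claim that the head becomes resolution-independent is easy to check. This is a small sketch with a made-up two-layer feature extractor (the layer sizes and the test resolutions are arbitrary choices, not taken from any specific model).

```python
import torch
import torch.nn as nn

features = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
)
pool = nn.AdaptiveAvgPool2d((1, 1))
fc = nn.Linear(16, 10)                 # in_features = number of channels

shapes = {}
with torch.no_grad():
    for side in (64, 96, 128):
        x = torch.randn(2, 3, side, side)
        pooled = pool(features(x)).flatten(1)    # (2, 16) for every resolution
        shapes[side] = (tuple(pooled.shape), tuple(fc(pooled).shape))

for side, (pooled_shape, out_shape) in shapes.items():
    print(side, pooled_shape, out_shape)
```

Every resolution reaches the Linear layer with the same 16 features, so the same `fc` works for all of them.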

Transfer Learning: Wrong Classifier Head

When fine-tuning a pretrained model, you need to replace the classifier head with your own. But if you use the wrong in_features, you get this error. Each backbone produces a different feature size.

# BUG: wrong in_features for the backbone
import torchvision.models as models

model = models.resnet50(weights='IMAGENET1K_V1')
# ResNet-50's fc layer is nn.Linear(2048, 1000)
# But you wrote 512 instead of 2048:
model.fc = nn.Linear(512, 5)  # no error yet; forward() fails because the backbone outputs 2048, not 512

Fix: Inspect the existing classifier to get the correct in_features, then replace it.

# FIXED: read the original in_features
model = models.resnet50(weights='IMAGENET1K_V1')
print(model.fc)  # Linear(in_features=2048, out_features=1000, bias=True)

# Use the correct value
num_ftrs = model.fc.in_features  # 2048
model.fc = nn.Linear(num_ftrs, 5)  # correct!

# Common backbone output sizes:
# ResNet-18/34:  512
# ResNet-50/101: 2048
# VGG-16:       4096 (in_features of classifier[6]; classifier[0] takes 25088)
# EfficientNet:  1280 (B0) to 2560 (B7)
# ViT-Base:     768
# ViT-Large:    1024

Batch Dimension Confusion in Manual Matrix Ops

When you use torch.mm() or the @ operator directly instead of nn.Linear, it is easy to mix up which tensor should come first, or to forget that the batch dimension adds an extra axis.

# BUG: wrong order of operands
W = torch.randn(10, 512)  # weight matrix (out, in)
x = torch.randn(64, 256)  # input (batch, features)

# You want: output = x @ W.T
# But you wrote:
output = torch.mm(W, x)  # ERROR: (10x512) @ (64x256) -- 512 != 64

Fix: For manual matrix multiplication, the input goes first: x @ W.T. Make sure the second dimension of the left matrix matches the first dimension of the right matrix.

# FIXED: correct operand order
W = torch.randn(10, 512)  # weight matrix (out_features, in_features)
x = torch.randn(64, 512)  # input (batch, in_features) -- note: 512, not 256

# Option A: x @ W.T
output = x @ W.T  # (64, 512) @ (512, 10) -> (64, 10) OK

# Option B: torch.mm
output = torch.mm(x, W.T)  # same thing

# Option C: for batched 3D tensors, use torch.bmm or torch.matmul
x_3d = torch.randn(8, 64, 512)   # (batch, seq, features)
W_3d = torch.randn(8, 512, 10)   # (batch, features, out)
output = torch.bmm(x_3d, W_3d)   # (8, 64, 10)

Wrong Reshape or View Before Linear

Using .view() or .reshape() with wrong arguments can silently rearrange your tensor into the wrong shape. PyTorch will not warn you until the matrix multiplication fails.

# BUG: reshaping to wrong dimensions
class BadReshape(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 64, 3, padding=1)  # keeps 32x32
        self.fc = nn.Linear(64, 10)

    def forward(self, x):            # (batch, 3, 32, 32)
        x = F.relu(self.conv(x))     # (batch, 64, 32, 32)
        # BUG: the Linear layer below assumes this yields (batch, 64), but
        # view(batch, -1) keeps every element, so the result is (batch, 64*32*32)
        x = x.view(x.size(0), -1)   # (batch, 64*32*32) = (batch, 65536)
        x = self.fc(x)              # ERROR: expects 64, got 65536
        return x

Fix: If you mean to flatten, the Linear layer must match the full flattened size. If you want to keep in_features=64, add a global average pooling layer to reduce spatial dimensions to 1x1 first.

# FIXED Option A: match the flatten size
self.fc = nn.Linear(64 * 32 * 32, 10)  # 65536

# FIXED Option B: pool down first, then use small Linear
def forward(self, x):
    x = F.relu(self.conv(x))         # (batch, 64, 32, 32)
    x = F.adaptive_avg_pool2d(x, 1)  # (batch, 64, 1, 1)
    x = x.flatten(1)                 # (batch, 64)
    x = self.fc(x)                   # nn.Linear(64, 10) -- works!
    return x

Multi-Head Attention or Transformer Dimension Mismatch

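In Transformer architectures, nn.MultiheadAttention requires embed_dim to be divisible by num_heads. In addition, the projection matrices inside the attention module multiply against the last dimension of the input, so the input embedding dimension must equal the declared embed_dim. Depending on the PyTorch version, this mismatch may be reported as an embedding-dimension assertion rather than the mat1/mat2 message, but the cause and the fix are the same.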

# BUG: input embedding dim doesn't match declared embed_dim
class BrokenTransformer(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(10000, 256)    # produces dim 256
        self.attn = nn.MultiheadAttention(
            embed_dim=512,   # BUG: says 512 but input is 256
            num_heads=8
        )
        self.fc = nn.Linear(512, 10)

    def forward(self, x):
        x = self.embed(x)          # (seq, batch, 256)
        x, _ = self.attn(x, x, x)  # ERROR: input has 256 features, attention expects embed_dim=512
        return self.fc(x)

Fix: Make embed_dim match your actual embedding dimension. Or add a projection layer between them.

# FIXED Option A: match embed_dim to embedding output
self.embed = nn.Embedding(10000, 512)  # change to 512
self.attn = nn.MultiheadAttention(embed_dim=512, num_heads=8)

# FIXED Option B: add a projection layer
self.embed = nn.Embedding(10000, 256)
self.proj = nn.Linear(256, 512)  # project 256 -> 512
self.attn = nn.MultiheadAttention(embed_dim=512, num_heads=8)

def forward(self, x):
    x = self.embed(x)       # (seq, batch, 256)
    x = self.proj(x)        # (seq, batch, 512)
    x, _ = self.attn(x, x, x)  # works!
    return self.fc(x)
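Both constraints mentioned above can be checked before any module is constructed. The following is a plain-Python sketch. `check_attention_dims` is a hypothetical helper, not part of PyTorch, and it only compares the three numbers you pass in.

```python
def check_attention_dims(input_dim, embed_dim, num_heads):
    """Return a list of human-readable problems; an empty list means the sizes agree."""
    problems = []
    if embed_dim % num_heads != 0:
        problems.append(
            f"embed_dim={embed_dim} is not divisible by num_heads={num_heads}"
        )
    if input_dim != embed_dim:
        problems.append(
            f"input has {input_dim} features but embed_dim={embed_dim}"
        )
    return problems

print(check_attention_dims(256, 512, 8))   # one problem: 256 vs 512
print(check_attention_dims(512, 512, 8))   # []
print(check_attention_dims(512, 512, 7))   # one problem: 512 % 7 != 0
```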

Step-by-Step Debugging Process

When you encounter this error, follow these five steps in order. They apply to any architecture and any recent PyTorch version.

1 Read the Error Dimensions

Copy the two shapes from the error message. Write them down:

# From the error:
# mat1: (64, 2048)  -- your input
# mat2: (512, 10)   -- the weight matrix
# Problem: 2048 != 512

The first number in mat1 (64) is your batch size — ignore it. The second number (2048) is the actual feature count your data has. The first number in mat2 (512) is what the Linear layer was constructed with as in_features.

2 Add Shape Print Statements

Insert print(x.shape) after every layer in your forward() method. This creates a complete shape trace:

def forward(self, x):
    print(f"Input:       {x.shape}")
    x = self.conv1(x)
    print(f"After conv1: {x.shape}")
    x = self.pool(x)
    print(f"After pool:  {x.shape}")
    x = self.conv2(x)
    print(f"After conv2: {x.shape}")
    x = self.pool(x)
    print(f"After pool2: {x.shape}")
    x = x.flatten(1)
    print(f"After flat:  {x.shape}")  # <-- this is your actual in_features
    x = self.fc(x)  # error would fire here
    return x

Run one batch. The print before the Linear layer tells you the exact value in_features should be.
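If editing forward() is inconvenient, forward hooks can collect the same trace from outside the model. This is a minimal sketch: `trace_shapes` is a hypothetical helper name, while `register_forward_hook` is the standard nn.Module method. The demo model is deliberately broken (its Linear layer is built for 8*8*8 features) so that the trace ends with the error.

```python
import torch
import torch.nn as nn

def trace_shapes(model, example):
    """Run one forward pass; return [(module name, output shape), ...] for leaf modules."""
    records, handles = [], []

    def make_hook(name):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor):
                records.append((name, tuple(output.shape)))
        return hook

    for name, module in model.named_modules():
        if len(list(module.children())) == 0:      # leaf modules only
            handles.append(module.register_forward_hook(make_hook(name)))
    try:
        with torch.no_grad():
            model(example)
    except RuntimeError as err:                    # keep the partial trace on failure
        records.append(("ERROR", str(err)))
    finally:
        for handle in handles:                     # always detach the hooks again
            handle.remove()
    return records

broken = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(1),
    nn.Linear(8 * 8 * 8, 10),                      # wrong: the flatten yields 8*16*16
)
trace = trace_shapes(broken, torch.randn(2, 3, 32, 32))
for name, shape in trace:
    print(f"{name:>5}: {shape}")
```

Hooks fire only after a module finishes, so the failing layer never records an output. The last shape before the ERROR entry is the input it received, and its final number is the in_features you need.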

3 Run a Dummy Forward Pass

If you do not want to modify your forward method, create a dummy tensor matching your input shape and pass it through everything before the Linear layer:

# For image models:
dummy = torch.zeros(1, 3, 224, 224)  # match your actual input size
with torch.no_grad():
    features = model.features(dummy)  # or whatever your conv layers are
    flat = features.flatten(1)
    print(f"in_features should be: {flat.shape[1]}")

# For sequence models:
dummy = torch.zeros(1, 128, dtype=torch.long)  # (batch, seq_len)
with torch.no_grad():
    emb = model.embedding(dummy)
    print(f"Embedding dim: {emb.shape[-1]}")

4 Fix the Linear Layer

Update in_features to match the value you discovered. There are three approaches:

# Approach A: hardcode the correct value
self.fc = nn.Linear(2048, 10)

# Approach B: compute it dynamically in __init__
dummy = torch.zeros(1, 3, 224, 224)
with torch.no_grad():
    n = self.features(dummy).flatten(1).shape[1]
self.fc = nn.Linear(n, 10)

# Approach C: use LazyLinear (PyTorch >= 1.8)
self.fc = nn.LazyLinear(10)  # in_features inferred on first call

5 Verify With a Full Forward Pass

After fixing, always run a complete forward pass with dummy data to confirm there are no remaining shape issues:

# Verify the entire model end-to-end
model = YourModel()
model.eval()
with torch.no_grad():
    dummy_input = torch.randn(2, 3, 224, 224)  # batch=2, typical image
    output = model(dummy_input)
    print(f"Output shape: {output.shape}")  # should be (2, num_classes)
    print("No errors -- model architecture is correct!")

Interactive Matrix Shape Checker

Use this tool to verify whether two matrices can be multiplied before you write the code. Enter the shapes as comma-separated dimensions (e.g., 64,512 for a 64x512 matrix).
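The same check is easy to reproduce offline. Here is a minimal sketch in plain Python that accepts the same comma-separated input format. The function names `parse_shape` and `check_matmul` are assumptions made for this example.

```python
def parse_shape(text):
    """Turn '64,512' into (64, 512)."""
    dims = tuple(int(part) for part in text.split(","))
    if len(dims) != 2 or any(d <= 0 for d in dims):
        raise ValueError(f"expected two positive integers, got {text!r}")
    return dims

def check_matmul(mat1, mat2):
    """Return (ok, detail) for mat1 @ mat2, with shapes given as 'rows,cols' strings."""
    (m, n), (p, q) = parse_shape(mat1), parse_shape(mat2)
    if n == p:
        return True, f"OK: ({m}x{n}) @ ({p}x{q}) -> ({m}x{q})"
    return False, f"Mismatch: inner dimensions {n} != {p}"

print(check_matmul("64,512", "512,10"))    # (True, 'OK: (64x512) @ (512x10) -> (64x10)')
print(check_matmul("64,2048", "512,10"))   # (False, 'Mismatch: inner dimensions 2048 != 512')
```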


Prevention Tips and Best Practices

The best way to handle this error is to never encounter it in the first place. Here are the practices that eliminate it from your workflow permanently.

Use nn.LazyLinear for Prototyping

When you are experimenting with architectures, use nn.LazyLinear(out_features) instead of nn.Linear(in_features, out_features). PyTorch will determine in_features automatically on the first forward pass. Once your architecture is finalized, replace it with a regular nn.Linear for production.

# During prototyping -- no more manual calculations
self.classifier = nn.Sequential(
    nn.LazyLinear(256),
    nn.ReLU(),
    nn.LazyLinear(10),
)

# After architecture is finalized, materialize:
model(dummy_input)  # trigger lazy initialization
print(model.classifier[0])  # now shows actual in_features
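As a self-contained illustration of that materialization step, the sketch below feeds a dummy feature map through a lazy head and reads back the inferred size. The (1, 128, 4, 4) feature map is an assumed example matching the 32x32 CNN used earlier in this guide.

```python
import torch
import torch.nn as nn

head = nn.Sequential(nn.Flatten(1), nn.LazyLinear(10))

with torch.no_grad():
    head(torch.zeros(1, 128, 4, 4))     # first call infers in_features and allocates weights

lazy = head[1]
print(lazy.in_features)                 # 2048
print(tuple(lazy.weight.shape))         # (10, 2048)

# Once the number is known, write it into the source explicitly for production.
explicit = nn.Linear(lazy.in_features, 10)
```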

Always Use AdaptiveAvgPool2d Before FC Layers

This is the single most effective prevention technique. It decouples the Linear layer's in_features from the input image resolution. ResNet, EfficientNet, MobileNet, and most modern architectures use this pattern.

# This pattern works for ANY input resolution:
self.features = nn.Sequential(...)       # your conv layers
self.pool = nn.AdaptiveAvgPool2d((1, 1)) # always 1x1 output
self.fc = nn.Linear(last_channels, num_classes)

def forward(self, x):
    x = self.features(x)   # (B, C, H, W) -- H,W can be anything
    x = self.pool(x)        # (B, C, 1, 1) -- always
    x = x.flatten(1)        # (B, C)
    return self.fc(x)       # in_features no longer depends on input resolution

Write a Shape Assertion

Add a one-line assertion at the start of your forward method to catch shape issues early with a clear error message:

def forward(self, x):
    assert x.shape[-1] == self.fc.in_features, \
        f"Expected input features {self.fc.in_features}, got {x.shape[-1]}"
    return self.fc(x)

Compute in_features Dynamically in __init__

Instead of calculating the flatten size by hand, compute it with a dummy pass during __init__:

class RobustCNN(nn.Module):
    def __init__(self, in_channels=3, input_size=32, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Automatically compute the correct in_features
        with torch.no_grad():
            dummy = torch.zeros(1, in_channels, input_size, input_size)
            flat_size = self.features(dummy).flatten(1).shape[1]
        self.classifier = nn.Linear(flat_size, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = x.flatten(1)
        return self.classifier(x)  # always correct

Prevention Checklist

Always flatten 4D tensors before passing to nn.Linear
Use nn.AdaptiveAvgPool2d((1, 1)) to decouple from input resolution
Compute in_features via dummy forward pass, never by hand
Use nn.LazyLinear during prototyping
When fine-tuning, read model.fc.in_features before replacing
Add print(x.shape) debug lines during development
Run a full dummy forward pass after any architecture change
Test with at least two different batch sizes to catch batch-dim bugs

The nn.LazyLinear vs nn.Linear Decision

Feature	nn.Linear	nn.LazyLinear
Requires in_features	Yes, at construction	No, inferred at first forward
TorchScript compatible	Yes	No (until materialized)
ONNX export	Yes	No (until materialized)
Debugging clarity	Better (explicit)	Worse (hidden shape)
Best for	Production, export	Prototyping, experiments

Understanding Matrix Multiplication Shapes in PyTorch

To truly prevent this error, it helps to understand what PyTorch is doing under the hood at each layer. Every nn.Linear(in_features, out_features) layer stores a weight matrix W of shape (out_features, in_features) and an optional bias vector b of shape (out_features,). The forward computation is:

output = input @ W.T + b
# input:  (*, in_features)    -- * means any number of leading dims
# W.T:    (in_features, out_features)
# output: (*, out_features)

The asterisk * means nn.Linear can handle inputs with more than 2 dimensions. If you pass a 3D tensor of shape (batch, seq_len, embed_dim), it multiplies along the last dimension and preserves the others. The output is (batch, seq_len, out_features). This is how Transformer feed-forward layers work.

The critical rule: the last dimension of the input must equal in_features. If it does not, you get the mat1/mat2 error. Everything in this guide reduces to enforcing that single rule.
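A short sketch of the 3D case, comparing nn.Linear against the formula written out by hand. The sizes (4, 7, 32, 10) are arbitrary example values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(32, 10)
x = torch.randn(4, 7, 32)                   # (batch, seq_len, embed_dim)

out = layer(x)                              # only the last dimension is transformed
manual = x @ layer.weight.T + layer.bias    # the same computation, spelled out

print(tuple(out.shape))                     # (4, 7, 10)
matches = torch.allclose(out, manual, atol=1e-5)
print(matches)
```

The leading (4, 7) dimensions pass through untouched. Only the trailing 32 has to agree with in_features.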

Shape Propagation Through Common Layers

Here is a reference table showing how each layer transforms tensor shapes. Use this to trace shapes through your architecture manually when debugging:

Layer	Input Shape	Output Shape	Key Parameter
`nn.Linear(I, O)`	(*, I)	(*, O)	Last dim must = I
`nn.Conv2d(Ci, Co, k)`	(B, Ci, H, W)	(B, Co, H', W')	H' depends on k, stride, pad
`nn.MaxPool2d(k)`	(B, C, H, W)	(B, C, floor(H/k), floor(W/k))	Spatial dims shrink (stride defaults to k)
`nn.AdaptiveAvgPool2d(s)`	(B, C, H, W)	(B, C, s, s)	Output size fixed
`nn.Flatten(1)`	(B, C, H, W)	(B, C*H*W)	Collapses dims 1+
`nn.Embedding(V, D)`	(B, S) int	(B, S, D)	Adds embed dim
`nn.LSTM(I, H)`	(S, B, I)	(S, B, H)	Hidden size H
`nn.BatchNorm2d(C)`	(B, C, H, W)	(B, C, H, W)	Shape unchanged
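The rows of the table can be spot-checked with small random tensors. The sketch below uses arbitrary sizes chosen for speed. The Conv2d row is run without padding, so its spatial size shrinks from 8 to 6.

```python
import torch
import torch.nn as nn

B, C, H, W = 2, 4, 8, 8
img = torch.randn(B, C, H, W)
tokens = torch.randint(0, 100, (B, 5))      # (B, S) integer ids
seq = torch.randn(5, B, 16)                 # (S, B, I) for the LSTM

with torch.no_grad():
    lstm_out, _ = nn.LSTM(16, 32)(seq)      # second return value is (h_n, c_n)
    observed = {
        "Linear": tuple(nn.Linear(W, 3)(img).shape),          # acts on the last dim
        "Conv2d": tuple(nn.Conv2d(C, 6, 3)(img).shape),
        "MaxPool2d": tuple(nn.MaxPool2d(2)(img).shape),
        "AdaptiveAvgPool2d": tuple(nn.AdaptiveAvgPool2d(1)(img).shape),
        "Flatten": tuple(nn.Flatten(1)(img).shape),
        "Embedding": tuple(nn.Embedding(100, 16)(tokens).shape),
        "LSTM": tuple(lstm_out.shape),
        "BatchNorm2d": tuple(nn.BatchNorm2d(C)(img).shape),
    }

for name, shape in observed.items():
    print(f"{name:<18} {shape}")
```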

When This Error Appears in torch.compile

If you are using torch.compile() (PyTorch 2.0+), the same error can surface during the compilation phase rather than at runtime. The traceback will include references to torch._dynamo and may look unfamiliar, but the fix is identical: make the inner dimensions match. To debug, temporarily skip the torch.compile(model) call, run the model eagerly, and add print statements.

Batched vs Non-Batched Multiplication

For nn.Linear, the error message shows the 2D view of the tensors even if your input is 3D or higher, because the leading dimensions are folded into the rows of mat1. Explicit batched operations have their own rules:

# torch.bmm: both inputs must be 3D
A = torch.randn(8, 64, 32)   # (batch, n, m)
B = torch.randn(8, 32, 16)   # (batch, m, p)
C = torch.bmm(A, B)          # (8, 64, 16) -- works, inner dim 32==32

# torch.matmul: broadcasts batch dimensions automatically
A = torch.randn(64, 32)      # (n, m)
B = torch.randn(8, 32, 16)   # (batch, m, p)
C = torch.matmul(A, B)       # (8, 64, 16) -- works, A is broadcast across the batch

# torch.bmm does not broadcast, so the same 2D A fails:
# C = torch.bmm(A, B)        # ERROR: bmm expects 3D tensors

# Fix for bmm: give A an explicit batch dimension
A = A.unsqueeze(0).expand(8, -1, -1)   # (8, 64, 32)
C = torch.bmm(A, B)          # (8, 64, 16) -- works

Frequently Asked Questions

What causes RuntimeError: mat1 and mat2 shapes cannot be multiplied?

The inner dimensions of a matrix multiplication do not match. In PyTorch, this almost always occurs at an nn.Linear layer where in_features does not equal the actual number of features in the input tensor. The error message format is "(AxB and CxD)" where B (your actual features) must equal C (the Linear layer's in_features).

How do I find the correct in_features for nn.Linear after Conv2d?

Run a dummy forward pass: x = torch.zeros(1, C, H, W); x = conv_layers(x); print(x.flatten(1).shape[1]). This prints the exact number you need for in_features. Alternatively, use nn.LazyLinear(out_features) and PyTorch will infer it automatically on the first forward pass.

What is the difference between mat1 and mat2 in the error message?

mat1 is your input tensor (the data flowing through the network). mat2 is the weight matrix of the nn.Linear layer. For mat1 of shape (A, B), A is the batch size and B is the feature count. For mat2 of shape (C, D), C is in_features and D is out_features. The error fires when B != C.

Can nn.LazyLinear prevent this error?

Yes. nn.LazyLinear(out_features) defers the in_features calculation until the first forward pass. PyTorch automatically sets in_features to match whatever input it receives. This eliminates manual dimension calculations but adds a small overhead on the first call and makes the model non-scriptable until materialized.

Why do I get this error when using a pretrained model?

Pretrained models have a fixed classifier head. If you replace it with nn.Linear(wrong_size, num_classes), you get this error. Check the model's original classifier: print(model.fc) or print(model.classifier). The in_features of your replacement must match the backbone's output dimension (2048 for ResNet-50, 512 for ResNet-18, 768 for ViT-Base).

How does nn.Flatten affect the shape before nn.Linear?

nn.Flatten(start_dim=1) collapses all dimensions after the batch dimension into one. A tensor of shape (batch, C, H, W) becomes (batch, C*H*W). This C*H*W value is what in_features must equal. Forgetting to flatten, or flattening from the wrong dimension, are two of the most common causes of this error.

Does this error happen with torch.matmul or torch.mm too?

Yes. torch.mm(A, B) requires A.shape[1] == B.shape[0]. torch.matmul follows the same rule for 2D inputs. The error message is identical. For batched matrix multiplication with torch.bmm, both tensors must be 3D and the inner dimensions must match: (batch, n, m) @ (batch, m, p).

Is this error the same in PyTorch 1.x and 2.x?

The wording is the same in recent 1.x releases and in 2.x. Some older 1.x releases reported the same problem as "size mismatch, m1: [...], m2: [...]" instead. PyTorch 2.0+ with torch.compile may also report the error at compile time with a slightly different traceback. The fix is always the same: make the inner dimensions match.
