How to Fix "weight is on CPU but input is on CUDA" in PyTorch

The error "weight is on CPU but input is on CUDA" (or its mirror image) is PyTorch telling you that a model's weights and the tensors fed into it live on different devices. Below are the three most common ways this happens and how to fix each one.

What Causes This Error

PyTorch requires all tensors in an operation to be on the same device. If you move your input data to GPU with .cuda() but forget to move the model, the model weights remain on CPU while the input is on CUDA. This device mismatch causes the error.

Scenario 1: Forgot to Move Model to GPU

Moving data to CUDA without moving the model first.

The Error

model = MyModel()  # Model on CPU by default
x = torch.randn(1, 3, 224, 224).cuda()  # Input on GPU
output = model(x)
# RuntimeError: Input type (torch.cuda.FloatTensor) and weight type
# (torch.FloatTensor) should be the same

The Fix

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MyModel().to(device)  # Move model to GPU
x = torch.randn(1, 3, 224, 224).to(device)  # Move input to same device
output = model(x)  # Works!

# Best practice: always use a device variable
# This makes your code work on both CPU and GPU machines

Always define a device variable and use .to(device) for both model and data. This pattern works seamlessly on machines with or without GPUs.
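When in doubt, you can check which device a model actually lives on before calling it. nn.Module has no .device attribute, so the usual trick is to inspect one of its parameters. A minimal sketch, using a toy nn.Linear as a stand-in for MyModel:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2).to(device)   # toy model standing in for MyModel
x = torch.randn(8, 4).to(device)

# nn.Module has no .device attribute; inspect a parameter instead
param_device = next(model.parameters()).device
assert param_device == x.device, f"mismatch: {param_device} vs {x.device}"

output = model(x)  # same device on both sides, so this succeeds
```

Putting an assertion like this before the forward pass turns a confusing mid-model RuntimeError into an immediate, readable failure.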

Scenario 2: Loading a Checkpoint on Wrong Device

Loading a GPU-trained model on CPU or vice versa without map_location.

The Error

# Model was saved on GPU
# torch.save(model.state_dict(), "model.pth")

# Loading without map_location: tensors deserialize to the device
# they were saved on (CUDA here)
model = MyModel()
model.load_state_dict(torch.load("model.pth"))  # Weights land on GPU
x = torch.randn(1, 3, 224, 224)  # CPU input
output = model(x)
# RuntimeError: weight is on CUDA but input is on CPU
# (On a CPU-only machine, torch.load itself fails instead:
# "Attempting to deserialize object on a CUDA device but
#  torch.cuda.is_available() is False")

The Fix

# Always specify map_location when loading
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MyModel()
model.load_state_dict(torch.load("model.pth", map_location=device))
model.to(device)

x = torch.randn(1, 3, 224, 224).to(device)
output = model(x)  # Works!

# Or force CPU loading:
# model.load_state_dict(torch.load("model.pth", map_location="cpu"))

torch.load with map_location ensures weights are loaded to the correct device regardless of where they were saved. Always use map_location=device for portable code.
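You can verify the round trip without touching the GPU at all. The sketch below saves a state_dict to an in-memory buffer (a stand-in for "model.pth") and reloads it with map_location="cpu", which remaps every tensor in the checkpoint regardless of where it was written:

```python
import io
import torch
import torch.nn as nn

model = nn.Linear(4, 2)   # toy model standing in for MyModel
buffer = io.BytesIO()     # in-memory file instead of "model.pth"
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# map_location remaps every tensor in the checkpoint to the given
# device, so this works even for GPU-written checkpoints
state = torch.load(buffer, map_location="cpu")
fresh = nn.Linear(4, 2)
fresh.load_state_dict(state)

assert all(p.device.type == "cpu" for p in fresh.parameters())
```

The same pattern applies unchanged with a real file path and map_location=device.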

Scenario 3: Mixed Device in DataLoader Loop

Forgetting to move batch data to GPU inside the training loop.

The Error

model = model.cuda()
for batch_x, batch_y in dataloader:
    # batch_x and batch_y are on CPU (DataLoader default)
    output = model(batch_x)  # Error! CPU input, CUDA model
    loss = criterion(output, batch_y)

The Fix

device = torch.device("cuda")
model = model.to(device)

for batch_x, batch_y in dataloader:
    batch_x = batch_x.to(device)  # Move to GPU
    batch_y = batch_y.to(device)  # Move labels too!
    output = model(batch_x)  # Works!
    loss = criterion(output, batch_y)

# Pro tip: use non_blocking=True for async transfers
# batch_x = batch_x.to(device, non_blocking=True)

DataLoader returns CPU tensors by default. Move each batch to the model device at the start of each iteration. Use non_blocking=True with pinned memory for faster transfers.
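Here is a minimal sketch of that combination, using a synthetic TensorDataset so it runs anywhere. pin_memory is enabled only when CUDA is available, since page-locked host memory is what makes non_blocking transfers actually asynchronous:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

dataset = TensorDataset(torch.randn(64, 4), torch.randint(0, 2, (64,)))
# pin_memory=True puts batches in page-locked RAM, which lets
# non_blocking=True overlap the host-to-GPU copy with compute
loader = DataLoader(dataset, batch_size=16,
                    pin_memory=torch.cuda.is_available())

for batch_x, batch_y in loader:
    batch_x = batch_x.to(device, non_blocking=True)
    batch_y = batch_y.to(device, non_blocking=True)
    # both batches now match the model's device
```

On a CPU-only machine the same code still runs; the transfers just become no-ops.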

Quick Debugging Checklist

# For errors raised during backward, anomaly detection points back
# to the forward operation responsible (device errors in forward
# already show the offending line in the traceback)
torch.autograd.set_detect_anomaly(True)

# Check tensor properties
print(f"dtype: {tensor.dtype}, device: {tensor.device}, shape: {tensor.shape}")
print(f"requires_grad: {tensor.requires_grad}")
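When only part of a model is on the wrong device (e.g. a layer created after .to(device) was called), it helps to list every parameter's device at once. A small helper sketch, using a toy nn.Sequential as the model:

```python
import torch
import torch.nn as nn

def report_devices(model: nn.Module) -> dict:
    """Map each parameter name to its device, to spot stragglers."""
    return {name: str(p.device) for name, p in model.named_parameters()}

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
print(report_devices(model))  # every entry should show the same device
```

If the printed dict contains a mix of "cpu" and "cuda:0" entries, the parameter names tell you exactly which submodule was left behind.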
