How to Fix "weight is on CPU but input is on CUDA" in PyTorch
The error weight is on CPU but input is on CUDA means the model's weights and the input tensor live on different devices. PyTorch often reports the same problem as "Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same". The scenarios below cover the most common causes and how to fix each one.
What Causes This Error
PyTorch requires all tensors in an operation to be on the same device. If you move your input data to GPU with .cuda() but forget to move the model, the model weights remain on CPU while the input is on CUDA. This device mismatch causes the error.
Scenario 1: Forgot to Move Model to GPU
Moving data to CUDA without moving the model first.
The Error
model = MyModel() # Model on CPU by default
x = torch.randn(1, 3, 224, 224).cuda() # Input on GPU
output = model(x)
# RuntimeError: Input type (torch.cuda.FloatTensor) and weight type
# (torch.FloatTensor) should be the same
The Fix
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MyModel().to(device) # Move model to GPU
x = torch.randn(1, 3, 224, 224).to(device) # Move input to same device
output = model(x) # Works!
# Best practice: always use a device variable
# This makes your code work on both CPU and GPU machines
Always define a device variable and use .to(device) for both model and data. This pattern works seamlessly on machines with or without GPUs.
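The device-variable pattern can be verified directly: a module has no .device attribute of its own, but any of its parameters reveals where it lives. A minimal sketch, using nn.Linear as a stand-in for MyModel:

```python
import torch
import torch.nn as nn

# Pick the device once and reuse it for the model and every tensor.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# nn.Linear stands in for MyModel here.
model = nn.Linear(4, 2).to(device)
x = torch.randn(1, 4, device=device)

# A module has no .device attribute; check one of its parameters instead.
model_device = next(model.parameters()).device
assert model_device == x.device  # both moved with the same device variable
output = model(x)  # works: weights and input share a device
```

Checking next(model.parameters()).device against input.device before the forward pass is a quick sanity check when the error is intermittent.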
Scenario 2: Loading a Checkpoint on Wrong Device
Loading a GPU-trained model on CPU or vice versa without map_location.
The Error
# Model was saved as a whole object on a GPU machine
# torch.save(model, "model.pth")
model = torch.load("model.pth") # Deserializes parameters onto CUDA
x = torch.randn(1, 3, 224, 224) # CPU input
output = model(x)
# RuntimeError: Input type (torch.FloatTensor) and weight type
# (torch.cuda.FloatTensor) should be the same
# On a CPU-only machine, torch.load itself fails first:
# RuntimeError: Attempting to deserialize object on a CUDA device
The Fix
# Always specify map_location when loading
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MyModel()
model.load_state_dict(torch.load("model.pth", map_location=device))
model.to(device)
x = torch.randn(1, 3, 224, 224).to(device)
output = model(x) # Works!
# Or force CPU loading:
# model.load_state_dict(torch.load("model.pth", map_location="cpu"))
torch.load with map_location ensures weights are loaded to the correct device regardless of where they were saved. Always use map_location=device for portable code.
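The round trip can be sketched end to end. This is a minimal example, assuming nn.Linear as a stand-in model and a temporary file in place of "model.pth":

```python
import os
import tempfile
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# nn.Linear stands in for MyModel; the temp path stands in for "model.pth".
model = nn.Linear(4, 2).to(device)
path = os.path.join(tempfile.mkdtemp(), "model.pth")

# state_dict tensors keep their device, so a GPU-trained model writes
# a CUDA checkpoint unless you move it to CPU before saving.
torch.save(model.state_dict(), path)

# map_location rewrites the storage device during deserialization, so the
# same loading code works no matter where the checkpoint was written.
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(path, map_location=device))
restored.to(device)
```

Saving the state_dict rather than the whole model is also what makes the checkpoint robust to refactors, since only tensors and their names are serialized.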
Scenario 3: Mixed Device in DataLoader Loop
Forgetting to move batch data to GPU inside the training loop.
The Error
model = model.cuda()
for batch_x, batch_y in dataloader:
    # batch_x and batch_y are on CPU (DataLoader default)
    output = model(batch_x) # Error! CPU input, CUDA model
    loss = criterion(output, batch_y)
The Fix
device = torch.device("cuda")
model = model.to(device)
for batch_x, batch_y in dataloader:
    batch_x = batch_x.to(device) # Move to GPU
    batch_y = batch_y.to(device) # Move labels too!
    output = model(batch_x) # Works!
    loss = criterion(output, batch_y)
# Pro tip: use non_blocking=True for async transfers
# batch_x = batch_x.to(device, non_blocking=True)
DataLoader returns CPU tensors by default. Move each batch to the model device at the start of each iteration. Use non_blocking=True with pinned memory for faster transfers.
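Putting pin_memory and non_blocking together, a minimal sketch of the loop (with a toy TensorDataset standing in for the real data):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy dataset standing in for the real one.
dataset = TensorDataset(torch.randn(8, 4), torch.randint(0, 2, (8,)))

# pin_memory=True stages each batch in page-locked host memory; that is
# what lets non_blocking=True overlap the host-to-device copy with compute.
loader = DataLoader(dataset, batch_size=4,
                    pin_memory=torch.cuda.is_available())

for batch_x, batch_y in loader:
    batch_x = batch_x.to(device, non_blocking=True)
    batch_y = batch_y.to(device, non_blocking=True)
    # ... forward pass, loss, backward as usual ...
```

Pinning only pays off on CUDA machines, which is why the sketch gates pin_memory on torch.cuda.is_available(); on CPU it is a no-op with extra memory cost.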
Quick Debugging Checklist
- Print tensor .dtype and .device before operations
- Check for in-place operations: +=, *=, .add_(), .mul_()
- Verify shapes with print(tensor.shape) at each step
- Use torch.autograd.set_detect_anomaly(True) to pinpoint the exact operation
# Enable anomaly detection to find the exact line
torch.autograd.set_detect_anomaly(True)
# Check tensor properties
print(f"dtype: {tensor.dtype}, device: {tensor.device}, shape: {tensor.shape}")
print(f"requires_grad: {tensor.requires_grad}")
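For models split across submodules, checking one parameter may miss a straggler. A small helper (report_devices is a hypothetical name, not a PyTorch API) can audit every parameter and buffer at once:

```python
import torch
import torch.nn as nn

def report_devices(model: nn.Module) -> set:
    """Return the set of devices holding the model's parameters and buffers."""
    devices = {p.device for p in model.parameters()}
    devices |= {b.device for b in model.buffers()}
    return devices

# BatchNorm1d contributes buffers (running_mean/running_var), so the audit
# covers more than just learnable weights.
model = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8))
print(report_devices(model))  # a healthy model reports exactly one device
```

If the returned set has more than one element, some submodule was created after the .to(device) call or was moved independently; moving the whole model again with model.to(device) resolves it.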