# How to Fix "Gradient Is None" in PyTorch
Make sure `requires_grad=True` is set on your tensor and that you have called `.backward()` before accessing `.grad`. Tensors that were detached, or computed inside a `torch.no_grad()` block, do not receive gradients.
## Cause 1: requires_grad Not Set

```python
import torch

# BUG: tensors don't track gradients by default
x = torch.tensor([1.0, 2.0, 3.0])
y = x * 2
# y.sum().backward()  # RuntimeError: element 0 of tensors does not require grad
print(x.grad)  # None
```

```python
# FIX: set requires_grad=True
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * 2
y.sum().backward()
print(x.grad)  # tensor([2., 2., 2.]) ✓
```
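If the tensor already exists, you can also flip the flag in place with `requires_grad_()` instead of recreating it; a minimal sketch:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
x.requires_grad_(True)  # in-place: start tracking operations on x

y = x * 2
y.sum().backward()
print(x.grad)  # tensor([2., 2., 2.])
```

This is handy when the tensor comes from code you don't control (e.g. a data loader) and you want gradients with respect to it.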
## Cause 2: Forgot to Call .backward()

```python
x = torch.randn(3, requires_grad=True)
y = x.sum()
# print(x.grad)  # None: backward() not called yet
y.backward()
print(x.grad)  # tensor([1., 1., 1.]) ✓
```
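Note that repeated `.backward()` calls accumulate into `.grad` rather than overwrite it, which is why training loops call `optimizer.zero_grad()` (or `x.grad.zero_()`) between steps; a quick demonstration:

```python
import torch

x = torch.randn(3, requires_grad=True)
x.sum().backward()
first = x.grad.clone()   # tensor([1., 1., 1.])
x.sum().backward()
second = x.grad.clone()  # tensor([2., 2., 2.]): the two passes accumulated

x.grad.zero_()  # reset the accumulator before the next backward pass
```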
## Cause 3: Tensor Was Detached

```python
x = torch.randn(3, requires_grad=True)
y = x.detach() * 2  # detach() breaks the computation graph
# y.sum().backward()  # RuntimeError: y does not require grad
```

```python
# FIX: don't detach if you need gradients
y = x * 2
y.sum().backward()
print(x.grad)  # tensor([2., 2., 2.]) ✓
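`detach()` is still the right tool when you deliberately want to block gradients through one branch, treating a value as a constant; a small sketch:

```python
import torch

x = torch.randn(3, requires_grad=True)
scale = x.mean().detach()  # treat the mean as a constant: no grad flows through it
y = x * scale              # gradients flow through x only
y.sum().backward()
print(x.grad)              # every element equals scale
```

Here `d(sum(x * c))/dx_i = c`, so `x.grad` is filled with the (detached) mean.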
## Cause 4: Inside torch.no_grad()

```python
x = torch.randn(3, requires_grad=True)
with torch.no_grad():
    y = x * 2  # no computation graph is built
# y.sum().backward()  # RuntimeError: element 0 of tensors does not require grad
```

```python
# FIX: don't wrap the training forward pass in torch.no_grad()
y = x * 2
y.sum().backward()  # Works ✓
```
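`torch.no_grad()` belongs in evaluation and inference code, where skipping graph construction saves memory, not in the training forward pass; a sketch of the usual split:

```python
import torch

model = torch.nn.Linear(3, 1)

# Training: gradients needed, so no no_grad() here
out = model(torch.randn(2, 3))
out.sum().backward()
print(model.weight.grad is not None)  # True

# Inference: no graph needed
with torch.no_grad():
    pred = model(torch.randn(2, 3))
print(pred.requires_grad)  # False
```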
## Cause 5: Non-Leaf Tensor

```python
x = torch.randn(3, requires_grad=True)
y = x * 2  # y is non-leaf (the result of a computation)
y.sum().backward()
print(y.grad)  # None, with a UserWarning: only leaf tensors populate .grad
```

```python
# FIX: call retain_grad() on the non-leaf tensor before backward()
x = torch.randn(3, requires_grad=True)
y = x * 2
y.retain_grad()  # tell autograd to keep this intermediate gradient
y.sum().backward()
print(y.grad)  # tensor([1., 1., 1.]) ✓
```
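When diagnosing which of these cases you've hit, it helps to print a tensor's autograd attributes. A small sketch (the `describe` helper name is purely illustrative):

```python
import torch

def describe(name, t):
    # Summarize the autograd state of tensor t
    print(f"{name}: requires_grad={t.requires_grad}, "
          f"is_leaf={t.is_leaf}, grad_fn={t.grad_fn}")

x = torch.randn(3, requires_grad=True)
y = x * 2
describe("x", x)  # x is a leaf with grad_fn=None
describe("y", y)  # y is non-leaf; its grad_fn is a MulBackward0 node
```

Rule of thumb: `.grad` is populated after `backward()` only for tensors with `requires_grad=True` and `is_leaf=True` (or non-leaves that called `retain_grad()`).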