How to Fix "one of the variables needed for gradient computation has been modified" in PyTorch
PyTorch raises "RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation" when an in-place operation overwrites a tensor that autograd saved for the backward pass. Below are the common causes of this error and how to fix each one.
What Causes This Error
PyTorch autograd records operations to build a computation graph for backpropagation. When you modify a tensor in-place (e.g., x += 1, x.add_(1), x[:] = value), the saved reference becomes invalid because the underlying data changed. This causes incorrect gradients or this error.
Scenario 1: In-place Addition in Forward Pass
Using += instead of + modifies tensors in place.
The Error
class Model(nn.Module):
    def forward(self, x):
        out = self.layer1(x)
        out += self.residual(x)  # In-place! Modifies out
        out = self.layer2(out)
        return out

# RuntimeError: one of the variables needed for gradient computation
# has been modified by an inplace operation
The Fix
class Model(nn.Module):
    def forward(self, x):
        out = self.layer1(x)
        out = out + self.residual(x)  # Out-of-place: creates new tensor
        out = self.layer2(out)
        return out

# Or use torch.add explicitly:
# out = torch.add(out, self.residual(x))
Replace out += ... with out = out + ... so that a new tensor is created instead of the existing one being modified in place. This preserves the original values that autograd saved for gradient computation.
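Here is the pattern as a self-contained sketch (the layer sizes are illustrative, not from the original model). The failing variant applies += to a ReLU output, which ReLU saved for its backward pass; the fixed variant allocates a new tensor:

```python
import torch
import torch.nn as nn

layer1, residual = nn.Linear(4, 4), nn.Linear(4, 4)
x = torch.randn(2, 4)

# Failing variant: relu saves its output for backward, and += edits it.
out = torch.relu(layer1(x))
out += residual(x)
try:
    out.sum().backward()
except RuntimeError as e:
    print("in-place version failed:", e)

# Fixed variant: + allocates a new tensor, leaving the saved one intact.
out = torch.relu(layer1(x))
out = out + residual(x)
out.sum().backward()  # gradients reach both layers
```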
Scenario 2: In-place Activation Functions
Using inplace=True on activations that feed into operations needing gradients.
The Error
model = nn.Sequential(
    nn.Linear(256, 128),
    nn.ReLU(inplace=True),  # In-place modification
    nn.Linear(128, 10),
)

# May cause "modified by an inplace operation" in some graph configurations
The Fix
model = nn.Sequential(
    nn.Linear(256, 128),
    nn.ReLU(inplace=False),  # Safe: creates new tensor
    nn.Linear(128, 10),
)

# inplace=True saves memory but risks gradient errors.
# Only use inplace=True when you're certain the tensor isn't
# needed by other branches of the computation graph.
While inplace=True saves memory, it can break gradient computation in networks with skip connections or shared parameters. Default to inplace=False unless profiling shows a clear memory benefit.
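A small sketch of how this goes wrong with two branches (the shapes are illustrative). One branch saves the tensor h for its backward pass; an inplace ReLU on the other branch then rewrites h:

```python
import torch
import torch.nn as nn

linear = nn.Linear(4, 4)
x = torch.randn(2, 4)

h = linear(x)
y = h * h                      # mul saves h for its backward pass
z = nn.ReLU(inplace=True)(h)   # rewrites h in place -> y's backward breaks
try:
    (y.sum() + z.sum()).backward()
except RuntimeError as e:
    print("inplace=True failed:", e)

h = linear(x)
y = h * h
z = nn.ReLU(inplace=False)(h)  # new tensor; saved h stays valid
(y.sum() + z.sum()).backward() # succeeds
```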
Scenario 3: Modifying Weight Tensors During Forward
Directly modifying parameters or buffers during forward pass.
The Error
class Model(nn.Module):
    def forward(self, x):
        self.weight.data.zero_()  # In-place modification of parameter!
        self.weight.data.add_(compute_weight(x))
        return F.linear(x, self.weight)

# RuntimeError: one of the variables needed for gradient computation
# has been modified by an inplace operation
The Fix
class Model(nn.Module):
    def forward(self, x):
        # Create a new weight tensor instead of modifying in-place
        w = compute_weight(x)  # Compute fresh weights
        return F.linear(x, w)

# If you need conditional weights, clone first:
# w = self.weight.clone()
# w = w + delta  # out-of-place modification on the clone
Never modify .data of parameters during the forward pass; writes through .data bypass autograd entirely. Use functional operations that create new tensors, or call .clone() first so modifications don't corrupt the computation graph.
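The clone-then-modify pattern can be sketched as a full module. This is an illustrative example, not the original code: the weight shape and the delta computation (0.1 * x.mean()) are placeholder assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaledLinear(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(4, 4))

    def forward(self, x):
        w = self.weight.clone()   # differentiable copy; graph still tracks it
        w = w + 0.1 * x.mean()    # out-of-place tweak on the clone (placeholder)
        return F.linear(x, w)

m = ScaledLinear()
out = m(torch.randn(2, 4))
out.sum().backward()  # gradients flow back through the clone to self.weight
```

Because .clone() is differentiable, gradients still reach the underlying parameter while the stored weight itself is never touched in place.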
Quick Debugging Checklist
- Print tensor .dtype and .device before operations
- Check for in-place operations: +=, *=, .add_(), .mul_()
- Verify shapes with print(tensor.shape) at each step
- Use torch.autograd.set_detect_anomaly(True) to pinpoint the exact operation
# Enable anomaly detection to find the exact line
torch.autograd.set_detect_anomaly(True)
# Check tensor properties
print(f"dtype: {tensor.dtype}, device: {tensor.device}, shape: {tensor.shape}")
print(f"requires_grad: {tensor.requires_grad}")