How Many Parameters Does Conv2d(64, 128, 3) Have?
Conv2d(64, 128, 3) has 73,856 trainable parameters. This includes 73,728 weights and 128 bias terms.
Formula Breakdown
For a Conv2d layer with a square kernel and the default groups=1, the parameter count is:
parameters = in_channels * out_channels * kernel_size^2 + out_channels (bias)
parameters = 64 * 128 * 3 * 3 + 128
parameters = 64 * 128 * 9 + 128
parameters = 73,728 + 128
parameters = 73,856
Each of the 128 output filters is a 3D kernel of shape (64, 3, 3). That gives 128 × 64 × 3 × 3 = 73,728 weights, plus 128 bias terms. Total: 73,856 trainable parameters.
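The arithmetic above can be wrapped in a small helper. This is a minimal sketch in plain Python, not part of PyTorch: the function name `conv2d_param_count` is hypothetical, and the `groups` handling assumes the usual convention that each filter sees `in_channels / groups` input channels.

```python
def conv2d_param_count(in_channels, out_channels, kernel_size, bias=True, groups=1):
    """Count trainable parameters of a 2D convolution (hypothetical helper)."""
    # Accept an int (square kernel) or a (height, width) pair.
    if isinstance(kernel_size, int):
        kh, kw = kernel_size, kernel_size
    else:
        kh, kw = kernel_size
    if in_channels % groups or out_channels % groups:
        raise ValueError("in_channels and out_channels must be divisible by groups")
    # Each output filter covers in_channels / groups input channels.
    weights = out_channels * (in_channels // groups) * kh * kw
    biases = out_channels if bias else 0
    return weights + biases


print(conv2d_param_count(64, 128, 3))              # 73856
print(conv2d_param_count(64, 128, 3, bias=False))  # 73728
```

With `groups=1` the helper reduces to the formula given above; other values of `groups` are included only to show where that assumption enters.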
Memory Usage
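In float32 (4 bytes per value), the layer's 73,856 parameters occupy 295,424 bytes, or about 0.28 MiB. Training with the Adam optimizer adds two moment estimates per parameter, so parameters plus optimizer state take roughly 3 times that, about 0.85 MiB. Gradients add one more copy, bringing the total to about 1.13 MiB. These figures exclude activations, which depend on batch size and input resolution.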
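The memory figures can be reproduced with a few lines of arithmetic. This is a rough sketch under stated assumptions: 4 bytes per float32 value, sizes reported in MiB (1024² bytes), Adam keeping two extra buffers per parameter, and one gradient value per parameter. Framework overhead and activations are not counted.

```python
PARAMS = 64 * 128 * 3 * 3 + 128   # 73,856 weights + biases
BYTES_PER_FLOAT32 = 4
MIB = 1024 ** 2


def mib(num_values):
    """Size in MiB of num_values float32 numbers."""
    return num_values * BYTES_PER_FLOAT32 / MIB


params_only = mib(PARAMS)            # the parameters themselves
params_plus_adam = mib(PARAMS * 3)   # parameters + Adam's two moment buffers
with_gradients = mib(PARAMS * 4)     # ... plus one gradient per parameter

print(f"{params_only:.2f} MiB")       # 0.28 MiB
print(f"{params_plus_adam:.2f} MiB")  # 0.85 MiB
print(f"{with_gradients:.2f} MiB")    # 1.13 MiB
```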
Architecture Context
A 3×3 convolution from 64 to 128 channels appears in widely used architectures such as VGG (at the start of its second convolutional block) and ResNet-18/34 (at the start of its second stage). Understanding parameter counts helps you estimate model size and memory requirements and gives a rough sense of a layer's capacity. In general, layers with more parameters are more prone to overfitting on small datasets and cost more compute to train.
Convolutional layers are parameter-efficient compared to fully connected layers because weights are shared across spatial positions. A Conv2d(64, 128, 3) processes any input spatial size with the same 73,856 parameters.
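To make the comparison concrete, the sketch below contrasts the convolution with a fully connected layer that maps the same flattened input to the same flattened output. The input sizes (32 and 64 pixels per side) are hypothetical, and the output size assumes stride 1 with no padding.

```python
def conv_params(in_ch, out_ch, k):
    """Parameters of a k x k convolution with bias."""
    return in_ch * out_ch * k * k + out_ch


def dense_params(in_features, out_features):
    """Parameters of a fully connected layer with bias."""
    return in_features * out_features + out_features


results = {}
for size in (32, 64):            # hypothetical input height/width
    out_size = size - 3 + 1      # 3x3 kernel, stride 1, no padding
    conv = conv_params(64, 128, 3)
    dense = dense_params(64 * size * size, 128 * out_size * out_size)
    results[size] = (conv, dense)
    print(f"{size}x{size} input: conv={conv:,} dense={dense:,}")
```

The convolution's count stays at 73,856 for both input sizes, while the dense equivalent already runs into the billions at 32×32 and grows with the fourth power of the side length.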
PyTorch Code to Verify
import torch.nn as nn
layer = nn.Conv2d(64, 128, kernel_size=3)
# Count parameters
total = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"Total parameters: {total:,}") # 73,856
# Break it down
print(f"Weight shape: {tuple(layer.weight.shape)}") # (128, 64, 3, 3)
print(f"Weight params: {layer.weight.numel():,}") # 73,728
print(f"Bias shape: {tuple(layer.bias.shape)}") # (128,)
print(f"Bias params: {layer.bias.numel()}") # 128
# Without bias (common in batch-normalized networks)
layer_no_bias = nn.Conv2d(64, 128, kernel_size=3, bias=False)
no_bias_total = sum(p.numel() for p in layer_no_bias.parameters())
print(f"Without bias: {no_bias_total:,}") # 73,728
Comparison: With vs. Without Bias
| Configuration | Parameters |
|---|---|
| Conv2d(64, 128, 3) (with bias) | 73,856 |
| Conv2d(64, 128, 3, bias=False) | 73,728 |
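When BatchNorm directly follows a convolutional layer, the convolution's bias is redundant: BatchNorm subtracts the per-channel mean, which cancels any constant offset, and then applies its own learnable shift. Setting bias=False removes out_channels parameters, 128 for this layer.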
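The cancellation can be checked numerically. The sketch below uses plain Python rather than PyTorch and normalizes a single channel without the learnable scale and shift. The activation values and the bias are made up for illustration.

```python
import math


def batch_norm(values, eps=1e-5):
    """Normalize one channel to zero mean and unit variance (no affine step)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


conv_out = [0.5, -1.2, 3.0, 0.7]   # made-up pre-bias activations for one channel
bias = 2.5                         # made-up convolution bias

without_bias = batch_norm(conv_out)
with_bias = batch_norm([v + bias for v in conv_out])

# Adding a constant shifts the mean by the same constant, so it cancels out.
same = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(without_bias, with_bias))
print(same)  # True
```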