How Many Parameters Does Linear(2048, 1000) Have?
Linear(2048, 1000) has 2,049,000 trainable parameters: 2,048,000 weights plus 1,000 bias terms.
Formula Breakdown
For a Linear layer, the parameter count is:
```
parameters = in_features * out_features + out_features (bias)
           = 2048 * 1000 + 1000
           = 2,048,000 + 1,000
           = 2,049,000
```
The weight matrix W has shape (1000, 2048) = 2,048,000 values. The bias vector b has 1,000 values. Together: 2,049,000 trainable parameters.
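As a quick sanity check, the same arithmetic can be wrapped in a small helper; `linear_params` is a hypothetical name used here purely for illustration:

```python
def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Parameter count for a fully connected layer: weights plus optional bias."""
    return in_features * out_features + (out_features if bias else 0)

print(linear_params(2048, 1000))              # 2049000
print(linear_params(2048, 1000, bias=False))  # 2048000
```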
Memory Usage
In float32 (4 bytes per value), the full parameter set occupies 2,049,000 × 4 bytes ≈ 7.82 MiB. During training with the Adam optimizer, multiply by roughly 3 (weights plus first- and second-moment estimates) ≈ 23.45 MiB, before counting gradients and activations.
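A minimal sketch of that arithmetic, assuming float32 storage and counting only the parameters plus Adam's two moment buffers:

```python
params = 2048 * 1000 + 1000        # 2,049,000 trainable parameters
bytes_fp32 = params * 4            # 4 bytes per float32 value

print(f"Parameters: {bytes_fp32 / 2**20:.2f} MiB")       # ~7.82 MiB
print(f"With Adam:  {3 * bytes_fp32 / 2**20:.2f} MiB")   # ~23.45 MiB (weights + m + v)
```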
Architecture Context
This layer configuration appears as the final classification layer of ResNet-50/101/152 (2,048 pooled features mapped to 1,000 ImageNet classes). Understanding parameter counts helps you estimate model size, memory requirements, and the risk of overfitting. Layers with more parameters need more training data and compute to train effectively.
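You can confirm this with torchvision (assuming it is installed; the model is created with random weights, so nothing is downloaded):

```python
from torchvision import models

resnet = models.resnet50()   # random init; no pretrained weights needed
print(resnet.fc)             # Linear(in_features=2048, out_features=1000, bias=True)
print(sum(p.numel() for p in resnet.fc.parameters()))  # 2049000
```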
Linear layers are often the most parameter-heavy part of a network. For example, VGG-16 has ~124M parameters in its three fully connected layers versus only ~14M in all its convolutional layers. Modern architectures minimize linear layers by using global average pooling.
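The split is easy to verify with torchvision's VGG-16 (again assuming torchvision is available; `classifier` holds the fully connected layers and `features` the convolutional stack):

```python
from torchvision import models

vgg = models.vgg16()
fc_params = sum(p.numel() for p in vgg.classifier.parameters())
conv_params = sum(p.numel() for p in vgg.features.parameters())
print(f"FC: {fc_params / 1e6:.1f}M, Conv: {conv_params / 1e6:.1f}M")  # ~123.6M vs ~14.7M
```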
PyTorch Code to Verify
```python
import torch.nn as nn

layer = nn.Linear(2048, 1000)

# Count all trainable parameters
total = sum(p.numel() for p in layer.parameters())
print(f"Total parameters: {total:,}")              # 2,049,000

# Break it down
print(f"Weight shape: {layer.weight.shape}")       # torch.Size([1000, 2048])
print(f"Weight params: {layer.weight.numel():,}")  # 2,048,000
print(f"Bias shape: {layer.bias.shape}")           # torch.Size([1000])
print(f"Bias params: {layer.bias.numel():,}")      # 1,000

# Without bias
layer_no_bias = nn.Linear(2048, 1000, bias=False)
print(f"Without bias: {sum(p.numel() for p in layer_no_bias.parameters()):,}")  # 2,048,000
```
Comparison: With vs. Without Bias
| Configuration | Parameters |
|---|---|
| Linear(2048, 1000) (with bias) | 2,049,000 |
| Linear(2048, 1000, bias=False) | 2,048,000 |
When a linear or convolutional layer is immediately followed by BatchNorm, the bias is redundant: BatchNorm subtracts the batch mean and then applies its own learned shift (beta), which absorbs any constant offset. Setting bias=False saves 1,000 parameters in this layer.
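A minimal sketch of the pattern (assuming a 2D input of shape (batch, 2048), which is what BatchNorm1d expects here):

```python
import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Linear(2048, 1000, bias=False),  # bias omitted; BatchNorm's beta absorbs it
    nn.BatchNorm1d(1000),               # adds 1,000 gamma + 1,000 beta parameters
)
x = torch.randn(32, 2048)               # batch of 32
print(block(x).shape)                   # torch.Size([32, 1000])
```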