What Is the Output Shape of a Bidirectional LSTM?
With batch_first=True, a bidirectional LSTM with hidden_size=256 produces output of shape (batch, seq_len, 512). The last dimension doubles because the forward and backward outputs are concatenated along it: 2 × 256 = 512. (With the default batch_first=False, the shape is (seq_len, batch, 512) instead.)
All Output Shapes
An LSTM returns an output tensor and a tuple of final states: output, (h_n, c_n). For a single-layer (num_layers=1) bidirectional LSTM with batch_first=True:
# LSTM(input_size=128, hidden_size=256, bidirectional=True, batch_first=True)
output: (batch, seq_len, 2 * hidden_size) = (batch, seq_len, 512)
h_n: (2 * num_layers, batch, hidden_size) = (2, batch, 256)
c_n: (2 * num_layers, batch, hidden_size) = (2, batch, 256)
PyTorch Code
import torch
import torch.nn as nn
lstm = nn.LSTM(input_size=128, hidden_size=256,
               bidirectional=True, batch_first=True)
x = torch.randn(32, 50, 128) # (batch, seq_len, input_size)
output, (h_n, c_n) = lstm(x)
print(output.shape) # torch.Size([32, 50, 512])
print(h_n.shape) # torch.Size([2, 32, 256])
print(c_n.shape) # torch.Size([2, 32, 256])
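To see how output and h_n relate, note that the first 256 channels of output hold the forward direction and the last 256 hold the backward direction. The forward direction finishes at the last timestep, while the backward direction finishes at the first timestep. A quick sketch to verify this:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=128, hidden_size=256,
               bidirectional=True, batch_first=True)
x = torch.randn(32, 50, 128)
output, (h_n, c_n) = lstm(x)

# h_n[0] is the forward direction's final state: last timestep, first 256 dims.
assert torch.allclose(output[:, -1, :256], h_n[0], atol=1e-6)
# h_n[1] is the backward direction's final state: FIRST timestep, last 256 dims.
assert torch.allclose(output[:, 0, 256:], h_n[1], atol=1e-6)
```

This also shows why slicing output at a single timestep never gives you both directions' final states at once.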
Connecting to a Linear Layer
When feeding a bidirectional LSTM into a Linear layer, remember that the input size must be 2 × hidden_size:
# Use the last timestep output for classification
fc = nn.Linear(512, 10) # NOT 256!
last_output = output[:, -1, :] # (batch, 512)
logits = fc(last_output) # (batch, 10)
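One caveat with using output[:, -1, :]: at the last timestep the backward direction has only seen a single token, so its half of the vector carries little context. A common alternative (sketched below, not the only option) is to concatenate each direction's final state from h_n instead:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=128, hidden_size=256,
               bidirectional=True, batch_first=True)
fc = nn.Linear(512, 10)  # still 2 * hidden_size
x = torch.randn(32, 50, 128)
output, (h_n, c_n) = lstm(x)

# h_n[-2] is the forward final state, h_n[-1] the backward final state
# (this indexing also picks the top layer when num_layers > 1).
final = torch.cat([h_n[-2], h_n[-1]], dim=1)  # (batch, 512)
logits = fc(final)  # (batch, 10)
```

Both directions have then processed the full sequence before classification.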