What Is the Output Shape of a Bidirectional LSTM?
With batch_first=True, a bidirectional LSTM with hidden_size=256 produces output of shape (batch, seq_len, 512). The last dimension doubles because the forward and backward outputs are concatenated along it: 2 × 256 = 512. (With the default batch_first=False, the shape is (seq_len, batch, 512) instead.)
All Output Shapes
An LSTM returns an output tensor and a tuple of final states: output, (h_n, c_n). For a single-layer (num_layers=1) bidirectional LSTM with batch_first=True:
# LSTM(input_size=128, hidden_size=256, bidirectional=True, batch_first=True)
output: (batch, seq_len, 2 * hidden_size) = (batch, seq_len, 512)
h_n: (2 * num_layers, batch, hidden_size) = (2, batch, 256)
c_n: (2 * num_layers, batch, hidden_size) = (2, batch, 256)
PyTorch Code
import torch
import torch.nn as nn
lstm = nn.LSTM(input_size=128, hidden_size=256,
               bidirectional=True, batch_first=True)
x = torch.randn(32, 50, 128) # (batch, seq_len, input_size)
output, (h_n, c_n) = lstm(x)
print(output.shape) # torch.Size([32, 50, 512])
print(h_n.shape) # torch.Size([2, 32, 256])
print(c_n.shape) # torch.Size([2, 32, 256])
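To see how output and h_n relate, note that the first 256 channels of output hold the forward direction and the last 256 hold the backward direction. The forward direction finishes at the last timestep, while the backward direction finishes at the first timestep. A quick sketch to verify this:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=128, hidden_size=256,
               bidirectional=True, batch_first=True)
x = torch.randn(32, 50, 128)
output, (h_n, c_n) = lstm(x)

# h_n[0] is the forward direction's final state: last timestep, first 256 dims.
assert torch.allclose(output[:, -1, :256], h_n[0], atol=1e-6)
# h_n[1] is the backward direction's final state: FIRST timestep, last 256 dims.
assert torch.allclose(output[:, 0, 256:], h_n[1], atol=1e-6)
```

This also shows why slicing output at a single timestep never gives you both directions' final states at once.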
Connecting to a Linear Layer
When feeding a bidirectional LSTM into a Linear layer, remember that the input size must be 2 × hidden_size:
# Use the last timestep output for classification
fc = nn.Linear(512, 10) # NOT 256!
last_output = output[:, -1, :] # (batch, 512)
logits = fc(last_output) # (batch, 10)
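One caveat with using output[:, -1, :]: at the last timestep the backward direction has only seen a single token, so its half of the vector carries little context. A common alternative (sketched below, not the only option) is to concatenate each direction's final state from h_n instead:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=128, hidden_size=256,
               bidirectional=True, batch_first=True)
fc = nn.Linear(512, 10)  # still 2 * hidden_size
x = torch.randn(32, 50, 128)
output, (h_n, c_n) = lstm(x)

# h_n[-2] is the forward final state, h_n[-1] the backward final state
# (this indexing also picks the top layer when num_layers > 1).
final = torch.cat([h_n[-2], h_n[-1]], dim=1)  # (batch, 512)
logits = fc(final)  # (batch, 10)
```

Both directions have then processed the full sequence before classification.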