MultiheadAttention Shape Calculator
Calculate the output shape of PyTorch MultiheadAttention. Enter embed_dim, num_heads, and sequence length to verify your transformer layer configuration.
Built by Michael Lip
Frequently Asked Questions
What is the output shape of MultiheadAttention?
For input [seq_len, batch, embed_dim] (PyTorch default) or [batch, seq_len, embed_dim] (with batch_first=True), the attention output has the same shape as the input: the embed_dim dimension is preserved. The attention weights have shape [batch, seq_len, seq_len] by default, because PyTorch averages them over heads; pass average_attn_weights=False to get the per-head shape [batch, num_heads, seq_len, seq_len].
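A quick shape check (the values embed_dim=512, num_heads=8, batch=4, seq_len=10 are illustrative):

```python
import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)
x = torch.randn(4, 10, 512)  # [batch, seq_len, embed_dim]

# Self-attention: query, key, and value are all x.
out, weights = mha(x, x, x, average_attn_weights=False)

print(out.shape)      # torch.Size([4, 10, 512])   -- embed_dim preserved
print(weights.shape)  # torch.Size([4, 8, 10, 10]) -- [batch, num_heads, seq_len, seq_len]
```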
Why must embed_dim be divisible by num_heads?
Each head operates on embed_dim / num_heads dimensions (the head dimension). If embed_dim=512 and num_heads=8, each head processes 64 dimensions. If this division isn't even, PyTorch raises an error when the layer is constructed.
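A small demonstration of the divisibility rule (512 / 8 = 64 works; 512 / 7 does not):

```python
import torch.nn as nn

# 512 is divisible by 8, so construction succeeds (head_dim = 64).
ok = nn.MultiheadAttention(embed_dim=512, num_heads=8)

# 512 is not divisible by 7, so construction fails.
try:
    nn.MultiheadAttention(embed_dim=512, num_heads=7)
except AssertionError as e:
    print(e)  # message explains embed_dim must be divisible by num_heads
```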
How many parameters does MultiheadAttention have?
With the default packed in_proj, it has 3 * embed_dim * embed_dim weights for the Q, K, V projections, embed_dim * embed_dim for the output projection, plus 4 * embed_dim bias terms. For embed_dim=512, that's 1,050,624 parameters (about 1.05M), independent of num_heads.
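You can verify the count directly by summing the layer's parameters (embed_dim=512, num_heads=8 here are illustrative; the count does not depend on num_heads):

```python
import torch.nn as nn

embed_dim = 512
mha = nn.MultiheadAttention(embed_dim=embed_dim, num_heads=8)

total = sum(p.numel() for p in mha.parameters())

# 3*E^2 (packed Q/K/V weight) + 3*E (its bias)
# + E^2 (output projection weight) + E (its bias)
expected = 3 * embed_dim**2 + 3 * embed_dim + embed_dim**2 + embed_dim

print(total, expected)  # 1050624 1050624
```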
About This Tool
This tool is part of HeyTensor, a free suite of PyTorch and deep learning utilities. All calculations run entirely in your browser — no data is sent to any server. The source code is open on GitHub.
Contact
HeyTensor is built and maintained by Michael Lip. For questions or feedback, email [email protected].