MultiHead Attention Shape Calculator
Calculate the output shape of PyTorch MultiheadAttention. Enter embed_dim, num_heads, and sequence length to verify your transformer layer configuration.
Built by Michael Lip
Frequently Asked Questions
What is the output shape of MultiheadAttention?
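For self-attention with input [seq_len, batch, embed_dim] (PyTorch default) or [batch, seq_len, embed_dim] (with batch_first=True), the attention output has the same shape as the input, so the embed_dim dimension is preserved. The attention weights are averaged across heads by default and have shape [batch, seq_len, seq_len]; with average_attn_weights=False they keep the per-head dimension and have shape [batch, num_heads, seq_len, seq_len].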
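The shape rules can be sketched in plain Python, without importing PyTorch. The helper name mha_shapes is hypothetical and not part of any library. It assumes batched self-attention (query, key, and value share one sequence length) and PyTorch's documented behaviour of averaging the weights across heads unless average_attn_weights=False.

```python
def mha_shapes(seq_len, batch, embed_dim, num_heads,
               batch_first=False, average_attn_weights=True):
    """Return (output_shape, attn_weights_shape) for batched self-attention.

    Hypothetical helper: it mirrors the documented shapes of
    torch.nn.MultiheadAttention but does not call PyTorch.
    """
    if embed_dim % num_heads != 0:
        raise ValueError("embed_dim must be divisible by num_heads")

    # The output keeps the layout of the input; embed_dim is preserved.
    if batch_first:
        output_shape = (batch, seq_len, embed_dim)
    else:
        output_shape = (seq_len, batch, embed_dim)

    # The weights always put batch first, regardless of batch_first.
    if average_attn_weights:
        weights_shape = (batch, seq_len, seq_len)
    else:
        weights_shape = (batch, num_heads, seq_len, seq_len)

    return output_shape, weights_shape


print(mha_shapes(10, 32, 512, 8))
print(mha_shapes(10, 32, 512, 8, batch_first=True, average_attn_weights=False))
```

The first call uses the defaults, so the weights come back without a head dimension. The second call requests per-head weights and a batch-first output.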
Why must embed_dim be divisible by num_heads?
Each head operates on embed_dim / num_heads dimensions. If embed_dim=512 and num_heads=8, each head processes 64 dimensions. If embed_dim is not an exact multiple of num_heads, PyTorch raises an assertion error when the layer is constructed.
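A minimal sketch of that check in plain Python follows. The function names head_dim and valid_head_counts are hypothetical. The sketch raises ValueError for illustration, which is a choice made here and not the exception type PyTorch uses.

```python
def head_dim(embed_dim, num_heads):
    """Per-head dimension, or ValueError if the split is not exact."""
    if num_heads <= 0:
        raise ValueError("num_heads must be positive")
    quotient, remainder = divmod(embed_dim, num_heads)
    if remainder != 0:
        raise ValueError(
            f"embed_dim={embed_dim} is not divisible by num_heads={num_heads}"
        )
    return quotient


def valid_head_counts(embed_dim):
    """All head counts that divide embed_dim exactly."""
    return [h for h in range(1, embed_dim + 1) if embed_dim % h == 0]


print(head_dim(512, 8))
print(valid_head_counts(12))
```

Listing the valid head counts is a quick way to pick a replacement value when a chosen num_heads is rejected.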
How many parameters does MultiheadAttention have?
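With the packed in_proj weight (the default, used when keys and values have the same dimension as embed_dim), it has 3 * embed_dim * embed_dim weights for the Q, K, V projections, embed_dim * embed_dim weights for the output projection, and 4 * embed_dim bias terms. For embed_dim=512, that's 1,048,576 weights plus 2,048 biases, or about 1.05M parameters.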
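The arithmetic can be checked with a short plain-Python sketch. The function name mha_param_count is hypothetical. It assumes the default configuration: keys and values share embed_dim, and no extra key/value bias vectors are added.

```python
def mha_param_count(embed_dim, bias=True):
    """Parameter count for the default packed-projection layout.

    Assumes kdim == vdim == embed_dim and add_bias_kv=False.
    """
    in_proj_weight = 3 * embed_dim * embed_dim   # stacked Q, K, V projections
    out_proj_weight = embed_dim * embed_dim      # output projection
    in_proj_bias = 3 * embed_dim if bias else 0
    out_proj_bias = embed_dim if bias else 0
    return in_proj_weight + out_proj_weight + in_proj_bias + out_proj_bias


print(mha_param_count(512))              # weights plus biases
print(mha_param_count(512, bias=False))  # weights only
```

Note that num_heads does not appear in the formula: splitting embed_dim across more heads changes the per-head width but not the total parameter count.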
Is this tool free?
Yes. All HeyTensor tools are free, run in your browser, and require no signup.
Does this work offline?
Once loaded, the tool runs entirely in your browser. No internet needed after the initial page load.
About This Tool
This tool is part of HeyTensor, a free suite of PyTorch and deep learning utilities. All calculations run entirely in your browser — no data is sent to any server. The source code is open on GitHub.
Contact
HeyTensor is built and maintained by Michael Lip. For questions or feedback, email [email protected].