MultiHead Attention Shape Calculator

Calculate the output shape of PyTorch MultiheadAttention. Enter embed_dim, num_heads, and sequence length to verify your transformer layer configuration.

Built by Michael Lip

Frequently Asked Questions

What is the output shape of MultiheadAttention?

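For a query of shape [seq_len, batch, embed_dim] (the PyTorch default) or [batch, seq_len, embed_dim] (with batch_first=True), the attention output has the same shape as the query: the layout is kept and the last dimension is embed_dim. The attention weights are returned when need_weights=True (the default) and are always batch-first, whatever batch_first is set to. Their shape is [batch, target_len, source_len] by default, because average_attn_weights=True averages over the heads. With average_attn_weights=False it is [batch, num_heads, target_len, source_len]. For self-attention, target_len and source_len both equal seq_len.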
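The shape rules above can be written as a small function. This is a sketch of what a calculator like this one might compute. mha_shapes is a hypothetical helper, not part of PyTorch. It assumes the documented nn.MultiheadAttention conventions and treats the call as self-attention unless a separate source length is passed.

```python
def mha_shapes(seq_len, batch, embed_dim, num_heads, batch_first=False,
               average_attn_weights=True, src_len=None):
    """Hypothetical helper: (output_shape, weights_shape) for nn.MultiheadAttention."""
    if embed_dim % num_heads != 0:
        raise ValueError("embed_dim must be divisible by num_heads")
    # Self-attention by default: keys/values have the same length as the query.
    src_len = seq_len if src_len is None else src_len

    # The output keeps the query's layout; its last dimension is embed_dim.
    if batch_first:
        out = (batch, seq_len, embed_dim)
    else:
        out = (seq_len, batch, embed_dim)

    # The attention weights are batch-first regardless of batch_first.
    if average_attn_weights:
        weights = (batch, seq_len, src_len)             # averaged over heads
    else:
        weights = (batch, num_heads, seq_len, src_len)  # one map per head
    return out, weights

print(mha_shapes(seq_len=10, batch=32, embed_dim=512, num_heads=8))
# ((10, 32, 512), (32, 10, 10))
print(mha_shapes(10, 32, 512, 8, batch_first=True, average_attn_weights=False))
# ((32, 10, 512), (32, 8, 10, 10))
```

To check a result against the real module, pass random tensors of these shapes through torch.nn.MultiheadAttention(embed_dim, num_heads, batch_first=...) and compare the .shape of both returned tensors.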

Why must embed_dim be divisible by num_heads?

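Each head operates on embed_dim / num_heads dimensions, often called head_dim. With embed_dim=512 and num_heads=8, each head processes 64 dimensions. If embed_dim is not evenly divisible by num_heads, PyTorch raises an error ("embed_dim must be divisible by num_heads") when the layer is constructed.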
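A minimal sketch of the divisibility check, plus a list of valid head counts for a given embed_dim. head_dim here is a hypothetical helper. It raises ValueError, whereas PyTorch itself fails with an assertion in the module's constructor.

```python
def head_dim(embed_dim: int, num_heads: int) -> int:
    """Hypothetical helper: per-head dimension, enforcing the divisibility rule."""
    q, r = divmod(embed_dim, num_heads)
    if r != 0:
        raise ValueError(
            f"embed_dim={embed_dim} is not divisible by num_heads={num_heads}"
        )
    return q

print(head_dim(512, 8))  # 64

try:
    head_dim(512, 6)  # 512 / 6 leaves a remainder of 2
except ValueError as e:
    print(e)  # embed_dim=512 is not divisible by num_heads=6

# Head counts up to 16 that work with embed_dim=512.
valid_heads = [h for h in range(1, 17) if 512 % h == 0]
print(valid_heads)  # [1, 2, 4, 8, 16]
```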

How many parameters does MultiheadAttention have?

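By default (when kdim and vdim equal embed_dim), the Q, K, and V projections are packed into a single in_proj_weight with 3 * embed_dim * embed_dim parameters. The output projection adds another embed_dim * embed_dim. With bias=True (the default), the biases add 3 * embed_dim (in_proj_bias) plus embed_dim (the output projection bias). The total is 4 * embed_dim^2 + 4 * embed_dim. For embed_dim=512, that is 1,050,624 parameters, about 1.05M. The count does not depend on num_heads, because the heads split embed_dim instead of adding new weights.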
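The parameter formula above as a short function. mha_param_count is a hypothetical helper. It assumes the defaults kdim == vdim == embed_dim and add_bias_kv=False and does not cover other configurations.

```python
def mha_param_count(embed_dim: int, bias: bool = True) -> int:
    """Hypothetical helper: parameter count for nn.MultiheadAttention,
    assuming kdim == vdim == embed_dim and add_bias_kv=False (the defaults)."""
    in_proj_weight = 3 * embed_dim * embed_dim  # packed Q, K, V projections
    out_proj_weight = embed_dim * embed_dim     # output projection
    total = in_proj_weight + out_proj_weight
    if bias:
        total += 3 * embed_dim  # in_proj_bias
        total += embed_dim      # output projection bias
    return total

print(mha_param_count(512))              # 1050624
print(mha_param_count(512, bias=False))  # 1048576
```

With PyTorch installed, sum(p.numel() for p in torch.nn.MultiheadAttention(512, 8).parameters()) should give the same total. Changing 8 to another valid head count leaves it unchanged.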

Is this tool free?

Yes. All HeyTensor tools are free, run in your browser, and require no signup.

Does this work offline?

Once loaded, the tool runs entirely in your browser. No internet needed after the initial page load.

About This Tool

This tool is part of HeyTensor, a free suite of PyTorch and deep learning utilities. All calculations run entirely in your browser — no data is sent to any server. The source code is open on GitHub.

Contact

HeyTensor is built and maintained by Michael Lip. For questions or feedback, email [email protected].
