Loss Functions Guide

Compare PyTorch loss functions: CrossEntropyLoss, MSELoss, BCELoss, and more. Formulas, when to use, code examples, and common pitfalls for each loss function.

Built by Michael Lip

Frequently Asked Questions

Which loss function should I use?

- Multi-class classification: CrossEntropyLoss (no Softmax needed — it's built in).
- Binary classification: BCEWithLogitsLoss.
- Regression: MSELoss (L2) or L1Loss (more robust to outliers).
- Regression with outliers: HuberLoss.
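A minimal sketch of each pairing (all tensor values below are made up for illustration):

```python
import torch
import torch.nn as nn

# Multi-class classification: CrossEntropyLoss takes raw logits (N, C)
# and class indices (N,)
logits = torch.tensor([[2.0, -1.0, 0.5], [0.1, 1.5, -0.3]])
class_targets = torch.tensor([0, 1])
ce = nn.CrossEntropyLoss()(logits, class_targets)

# Binary classification: BCEWithLogitsLoss takes one raw logit per sample
bin_logits = torch.tensor([0.8, -1.2])
bin_targets = torch.tensor([1.0, 0.0])
bce = nn.BCEWithLogitsLoss()(bin_logits, bin_targets)

# Regression: MSELoss squares errors, so one outlier dominates;
# HuberLoss grows only linearly beyond its delta (default 1.0)
preds = torch.tensor([2.5, 0.0, 3.1])
targets = torch.tensor([3.0, -0.5, 10.0])  # last target is an outlier
mse = nn.MSELoss()(preds, targets)
huber = nn.HuberLoss()(preds, targets)
```

With the outlier above, the Huber loss comes out far smaller than the MSE, which is exactly why it is the more robust choice.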

Why does CrossEntropyLoss include Softmax?

PyTorch's CrossEntropyLoss combines LogSoftmax + NLLLoss for numerical stability. Do NOT apply Softmax before CrossEntropyLoss — you'll get wrong gradients. Your model's final layer should output raw logits.
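A quick sketch demonstrating the equivalence, and the double-Softmax mistake (random tensors, for illustration only):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)           # raw model outputs — no Softmax applied
targets = torch.randint(0, 3, (4,))

# CrossEntropyLoss applied directly to logits
ce = nn.CrossEntropyLoss()(logits, targets)

# What it does internally: LogSoftmax followed by NLLLoss
manual = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

# Common mistake: applying Softmax first, so CrossEntropyLoss
# effectively applies Softmax twice and produces a different loss
wrong = nn.CrossEntropyLoss()(F.softmax(logits, dim=1), targets)
```

`ce` and `manual` match; `wrong` does not, and its gradients are correspondingly distorted.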

What is the difference between BCE and CrossEntropy?

BCELoss is for binary classification (one output per sample, 0 or 1). CrossEntropyLoss is for multi-class classification (one-of-N classes). For multi-label classification (multiple labels per sample), use BCEWithLogitsLoss.
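For the multi-label case, targets are a multi-hot float tensor with the same shape as the logits, and each label is scored as an independent sigmoid/BCE problem. A small sketch with made-up values:

```python
import torch
import torch.nn as nn

# Two samples, three possible labels; a sample may have several active labels
logits = torch.tensor([[1.2, -0.7, 2.0],
                       [-0.3, 0.8, -1.5]])
targets = torch.tensor([[1.0, 0.0, 1.0],    # multi-hot floats, not indices
                        [0.0, 1.0, 0.0]])

loss = nn.BCEWithLogitsLoss()(logits, targets)

# Equivalent by hand: sigmoid per label, then binary cross-entropy
probs = torch.sigmoid(logits)
manual = -(targets * probs.log()
           + (1 - targets) * (1 - probs).log()).mean()
```

BCEWithLogitsLoss is preferred over Sigmoid + BCELoss because it fuses the two for numerical stability, mirroring the CrossEntropyLoss design above.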

About This Tool

This tool is part of HeyTensor, a free suite of PyTorch and deep learning utilities. All calculations run entirely in your browser — no data is sent to any server. The source code is open on GitHub.

Contact

HeyTensor is built and maintained by Michael Lip. For questions or feedback, email [email protected].

📊 Based on real data from our Most Common PyTorch Errors research — 20 errors ranked by frequency