Mixup

mixup: Beyond Empirical Risk Minimization

arXiv link
Another study on calibration

Mixup adds virtual training examples, built by linearly interpolating pairs of training inputs and their labels:

$$\tilde{x} = \lambda x_{i} + (1 - \lambda) x_{j}$$
$$\tilde{y} = \lambda y_{i} + (1 - \lambda) y_{j}$$

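Here $x_{i}$, $x_{j}$ are raw input vectors and $y_{i}$, $y_{j}$ are their one-hot label encodings, with $(x_{i}, y_{i})$ and $(x_{j}, y_{j})$ drawn at random from the training data. The mixing weight is sampled as $\lambda \sim \mathrm{Beta}(\alpha, \alpha)$, so $\lambda \in [0, 1]$. $\alpha \in [0.1, 0.4]$ gave the best results for classification.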
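A minimal NumPy sketch of the mixing step, assuming a batch of inputs `x` and one-hot labels `y`. Pairing each example with a randomly shuffled copy of the batch and drawing a single $\lambda$ per batch are implementation choices assumed here, not something these notes specify; the function name `mixup_batch` is hypothetical.

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    """Hypothetical helper: mix a batch with a shuffled copy of itself.

    x: array of shape (batch, ...) holding raw inputs.
    y: array of shape (batch, num_classes) holding one-hot labels.
    alpha: Beta(alpha, alpha) concentration; the notes suggest 0.1 to 0.4.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)          # lambda ~ Beta(alpha, alpha), in [0, 1]
    perm = rng.permutation(len(x))        # pair each example i with a random partner j
    x_mix = lam * x + (1.0 - lam) * x[perm]   # mix the inputs
    y_mix = lam * y + (1.0 - lam) * y[perm]   # mix the labels with the same lambda
    return x_mix, y_mix, lam

# Usage: 4 examples with 3 features, 2 classes.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
y = np.eye(2)[[0, 1, 1, 0]]               # one-hot labels
x_mix, y_mix, lam = mixup_batch(x, y, alpha=0.2, rng=rng)
print(lam, y_mix.sum(axis=1))             # each mixed label row still sums to 1 (up to rounding)
```

With small $\alpha$ the Beta distribution puts most of its mass near 0 and 1, so most mixed examples stay close to one of the two originals.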

The important point is that the labels are mixed too, so this is not simple data augmentation. It improves results across the board, from classification accuracy to the calibration of confidence scores.
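Why mixing the labels matters can be checked numerically, assuming a standard softmax cross-entropy loss (the notes do not name the loss). Cross-entropy is linear in the target distribution, so training on the mixed label $\tilde{y}$ is the same as mixing the two per-label losses with weight $\lambda$. The helpers `log_softmax` and `soft_cross_entropy` below are illustrative, not from any library.

```python
import numpy as np

def log_softmax(logits):
    # Subtract the row max for numerical stability before exponentiating.
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def soft_cross_entropy(logits, target):
    """Cross-entropy against a soft (possibly mixed) target distribution."""
    return -(target * log_softmax(logits)).sum(axis=-1)

logits = np.array([[2.0, 0.5, -1.0]])   # model output for one mixed input
y_i = np.array([[1.0, 0.0, 0.0]])        # one-hot label of example i
y_j = np.array([[0.0, 0.0, 1.0]])        # one-hot label of example j
lam = 0.7

# Loss on the mixed label vs. the lambda-weighted mix of the two losses.
mixed_target = soft_cross_entropy(logits, lam * y_i + (1 - lam) * y_j)
mixed_losses = (lam * soft_cross_entropy(logits, y_i)
                + (1 - lam) * soft_cross_entropy(logits, y_j))
print(np.allclose(mixed_target, mixed_losses))  # True
```

Because the target is no longer one-hot, the model is rewarded for predicting in-between probabilities on in-between inputs rather than always being fully confident, which is one intuition for the calibration gains.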

Yanda's Random Notes
