mixup: Beyond Empirical Risk Minimization
Add in virtual training examples
, are raw input vectors, , are one-hot label encodings. . gave the best result for classification.
The important stuff is here label need to be mixed too, so not simple data augmentation. It makes everything better, from classification result to confidence score.