This provides a way to do self-paced multi-task learning; or, put differently: for a model with multiple tasks, how should we weight the loss terms? If the weights are picked right, a model with multi-task targets can perform better than one focused on a single task.

This method uses “task uncertainty”, which the paper describes as:

> Task-dependent or homoscedastic uncertainty is aleatoric uncertainty which is not dependent on the input data. It is not a model output, rather it is a quantity which stays constant for all input data and varies between different tasks. It can therefore be described as task-dependent uncertainty.

So basically, even if the input is the same, the output would still vary; that inherent noise, which stays constant across inputs but differs between tasks, is this uncertainty.

The formula

Very, very simple. Recall that the regression loss, MSE (mean squared error), can be interpreted as the MLE of a Gaussian. Let $f^W(x)$ be the output of a neural network with weights $W$ on input $x$, and model the likelihood as $p(y \mid f^W(x)) = \mathcal{N}(f^W(x), \sigma^2)$. Then the negative log-likelihood is

$$-\log p(y \mid f^W(x)) \propto \frac{1}{2\sigma^2} \lVert y - f^W(x) \rVert^2 + \log \sigma.$$
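As a quick sanity check (my own illustration, not from the paper), PyTorch's built-in `GaussianNLLLoss` with a fixed variance is exactly a scaled-and-shifted MSE:

```python
import torch
import torch.nn as nn

# With a fixed variance, Gaussian NLL = MSE / (2*sigma^2) + 0.5*log(sigma^2)
# (up to eps clamping inside GaussianNLLLoss).
pred   = torch.randn(8)
target = torch.randn(8)
var    = torch.full_like(pred, 2.0)  # fixed sigma^2 = 2 for every point

nll = nn.GaussianNLLLoss(reduction="mean")(pred, target, var)
mse = nn.MSELoss(reduction="mean")(pred, target)

print(nll)
print(mse / (2 * 2.0) + 0.5 * torch.log(torch.tensor(2.0)))  # same value
```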

So if we have two regression targets, the joint loss is

$$\mathcal{L}(W, \sigma_1, \sigma_2) = \frac{1}{2\sigma_1^2} \mathcal{L}_1(W) + \frac{1}{2\sigma_2^2} \mathcal{L}_2(W) + \log \sigma_1 \sigma_2,$$

where $\mathcal{L}_i(W) = \lVert y_i - f^W(x) \rVert^2$ is the per-task MSE.

We learn $\sigma_1$ and $\sigma_2$ together with our usual weights $W$. Note that they cannot grow too big, because the last term regularizes them (and they cannot collapse to zero either, or the first terms blow up).
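Here's a minimal PyTorch sketch of this loss (my own sketch, not the authors' code). Following common practice, it learns $s_i = \log \sigma_i^2$ instead of $\sigma_i$ directly, which is numerically more stable:

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Combine per-task losses weighted by learned homoscedastic uncertainty.

    Learns s_i = log(sigma_i^2) per task; the combined loss is
    sum_i 0.5 * (exp(-s_i) * L_i + s_i),
    which matches 1/(2*sigma_i^2) * L_i + log(sigma_i).
    """
    def __init__(self, num_tasks: int):
        super().__init__()
        # Start at s_i = 0, i.e. sigma_i^2 = 1 (all tasks weighted equally).
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        total = 0.0
        for s, loss in zip(self.log_vars, task_losses):
            total = total + 0.5 * (torch.exp(-s) * loss + s)
        return total

# Usage sketch: pass both the model's and the combiner's parameters to the
# optimizer, so the sigmas are learned jointly with W.
combiner = UncertaintyWeightedLoss(num_tasks=2)
loss1 = torch.tensor(0.8)  # stand-in for task-1 MSE
loss2 = torch.tensor(2.5)  # stand-in for task-2 MSE
print(combiner([loss1, loss2]))
```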

If there's a classification task, we can represent it with a temperature-scaled softmax, which is a Boltzmann distribution:

$$p(y \mid f^W(x), \sigma) = \mathrm{Softmax}\!\left(\frac{1}{\sigma^2} f^W(x)\right)$$
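To see what the temperature does (an illustrative snippet of mine, not from the paper): a larger $\sigma^2$ flattens the distribution, encoding higher task uncertainty:

```python
import torch
import torch.nn.functional as F

# Dividing the logits by sigma^2 acts as a temperature: larger sigma^2
# flattens the softmax, i.e. the task is treated as more uncertain.
logits = torch.tensor([2.0, 1.0, 0.1])
for sigma2 in (0.5, 1.0, 4.0):
    probs = F.softmax(logits / sigma2, dim=-1)
    print(f"sigma^2={sigma2}: {probs}")
```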

And you'll get another formula. See the OG paper (Kendall, Gal, and Cipolla, “Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics”, CVPR 2018) for the full derivation.
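For the record, if I'm reading the paper right, with one regression task ($\mathcal{L}_1$, MSE) and one classification task ($\mathcal{L}_2$, cross-entropy) the combined objective comes out as

$$\mathcal{L}(W, \sigma_1, \sigma_2) \approx \frac{1}{2\sigma_1^2} \mathcal{L}_1(W) + \frac{1}{\sigma_2^2} \mathcal{L}_2(W) + \log \sigma_1 + \log \sigma_2,$$

using an approximation that becomes exact as $\sigma_2 \to 1$.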