A Little Talk on Label Noise

Recently, we have been doing research on personal context recognition using sensor data collected from users' mobile devices. We surveyed other context recognition solutions and found that many of them apply supervised machine learning. These solutions implicitly assume that annotators are experts who provide perfectly labeled training data, an assumption that is rarely met in real-world scenarios. Mislabeled data can damage an application's performance: for example, it may decrease classification accuracy and increase the amount of training data required. We therefore looked into how other works deal with label noise, and here we would like to share an overview of the research on this problem.

What is Label Noise?

First of all, what exactly is label noise? Different works define it differently. In some works, such as the papers by Victoria Hodge, Varun Chandola, and Jiangwen Sun, it is treated as a kind of outlier or anomaly, while in Dragan Gamberger's work and in Noise Filtering in a Medical Domain, label noise means those instances which disproportionately increase the model complexity. Frénay's work gives a comprehensive survey of the different types of label noise. In their work, label noise refers to observed labels that are incorrect. But where does label noise come from? Their work summarizes the following sources: 1) Insufficient information, such as a limited description language or poor-quality data. For example, if only some of the disease symptoms are given to the annotator, they may assign an incorrect disease label. 2) Mistakes made by experts. Experts are not always correct; they make mistakes too. 3) Subjective classification. In some cases the labeling itself is subjective: different doctors may diagnose the same patient differently. 4) Encoding or communication problems. Real-world databases are estimated to contain around 5% encoding errors.

Note that label noise here is different from outliers and from feature (or attribute) noise, and it does not include dirty labels introduced intentionally and maliciously. Their work also provides a taxonomy of label noise and describes three statistical models.

NCAR, short for the Noisy Completely at Random model: the occurrence of an error is independent of the other random variables, including the true class itself.
NAR, short for the Noisy at Random model: the error is still independent of the instance's features, but it may depend on the true class, which allows modeling asymmetric label noise. For example, in some cases certain classes are more prone to being mislabeled than others.
NNAR, short for the Noisy Not at Random model: the error depends on both the features and the true label. In other words, errors happen more frequently in certain classes and also near the classification boundary or in low-density regions.
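
To make the difference between the three models more concrete, here is a minimal simulation sketch in plain Python. It is only an illustration under our own assumptions, not code from Frénay's work: the function names, the one-dimensional feature, and the "closeness to a boundary" rule used for NNAR are all hypothetical choices.

```python
import random

def flip(label, classes, rng):
    """Return a uniformly chosen label that differs from `label`."""
    return rng.choice([c for c in classes if c != label])

def ncar(labels, classes, rate, rng):
    """NCAR: every label is flipped with the same probability `rate`."""
    return [flip(y, classes, rng) if rng.random() < rate else y
            for y in labels]

def nar(labels, classes, rate_per_class, rng):
    """NAR: the flip probability depends only on the true class."""
    return [flip(y, classes, rng) if rng.random() < rate_per_class[y] else y
            for y in labels]

def nnar(features, labels, classes, rate_per_class, boundary, width, rng):
    """NNAR: the flip probability depends on the true class AND the feature.

    Assumption for this sketch: features are single numbers, and the error
    rate grows linearly as a point gets closer to `boundary` (it is zero
    for points further away than `width`).
    """
    noisy = []
    for x, y in zip(features, labels):
        closeness = max(0.0, 1.0 - abs(x - boundary) / width)
        rate = rate_per_class[y] * closeness
        noisy.append(flip(y, classes, rng) if rng.random() < rate else y)
    return noisy
```

For example, nar(labels, [0, 1], {0: 0.05, 1: 0.3}, random.Random(0)) would corrupt class 1 about six times as often as class 0, which is the asymmetric case described above.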

What about Non-expert Label Noise?

All the types of label noise mentioned above usually affect only a small fraction of the data, and labels provided by domain experts are generally of high quality. However, involving domain experts in annotation is time-consuming and expensive, and it does not scale when a huge number of labels is needed. Many applications have therefore started to use crowdsourcing to obtain labels. You have probably contributed hundreds or thousands of labels without realizing it, if you have ever played an online game which asks you to recognize whether that fluffy, cute thing in the picture is a cat or a dog. Has an application ever asked you to type in the fuzzy-looking number shown in a box? Congratulations, you probably just contributed to computer vision research again! See, you are contributing annotations even though you are not a computer vision expert.

However, the quality of these non-expert annotations is hard to control. According to research in the social sciences, people are not always reliable and may give incorrect answers when asked to fill in questionnaires. This may be due to response biases, e.g., memory bias or unwillingness to report, or to cognitive biases, e.g., carelessness. Think about filling in a questionnaire yourself: you might give an incorrect answer when it asks where you were 30 minutes ago. You do not mean to lie; you just confuse the locations. And you are probably serious and careful for the first 15 questions, but as the questionnaire gets longer, you lose patience and start to choose randomly. This non-expert label noise is a common issue, especially in pervasive and ubiquitous computing and in lifelong learning.

How Can We Solve It?

What are the potential consequences of label noise? Label noise in the training data can decrease classification performance and increase the complexity of the learned model. It can also change the learning requirements, so that many more samples are needed for training. So how can we deal with it? Frénay's work provides a survey of the methods that deal with label noise and classifies them into three categories.

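Label Noise-Robust Models. In practice, some learning algorithms are naturally more robust to label noise than others, e.g. bagging. Note that boosting methods such as AdaBoost are generally reported to be sensitive to label noise, because they put increasing weight on hard, possibly mislabeled, samples.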
Data Cleansing Methods. A simple way to deal with label noise is to detect mislabeled samples and remove them before training. Relevant methods include outlier detection, anomaly detection, voting filtering and so on.
Label Noise-Tolerant Learning Algorithms. These methods use prior information to take label noise into account during learning. Some of them use Bayesian priors such as Beta and Dirichlet priors. Others build on logistic regression, Hidden Markov Models, graphical models or other probabilistic models.
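
As an illustration of the data cleansing idea, here is a minimal sketch of a majority-vote filter in plain Python. It is a toy version under our own assumptions, not a reference implementation from the survey: samples are (feature, label) pairs with a single numeric feature, the ensemble is a handful of k-nearest-neighbour classifiers with different k, and the names knn_predict and voting_filter are hypothetical.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Predict the label of x by majority vote among its k nearest
    training points (features are single numbers in this sketch)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]

def voting_filter(data, ks=(1, 3, 5), n_folds=5):
    """Return the indices of samples that a majority of the classifiers
    misclassify when trained on the other folds."""
    suspects = []
    for i, (x, y) in enumerate(data):
        fold = i % n_folds
        # Train only on samples from the other folds, so that a sample
        # never votes for its own (possibly wrong) label.
        train = [p for j, p in enumerate(data) if j % n_folds != fold]
        wrong = sum(knn_predict(train, x, k) != y for k in ks)
        if wrong > len(ks) / 2:
            suspects.append(i)
    return suspects
```

The flagged indices can then be removed from the training set. A stricter "consensus" variant would flag a sample only when all classifiers disagree with its label, which removes fewer clean samples at the cost of keeping more noisy ones.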

However, traditional machine learning methods may not be able to solve the problems caused by non-expert label noise. Ramesh Maruthi Nallapati's work provides a solution called CorrActive Learning. It is similar to Active Learning: the main idea is to first learn a classifier from the noisy training data, and then iteratively present only the potentially mislabeled examples to the user while also learning from the user's corrections. Their experimental results show that the corrActive learner significantly improves the performance of a supervised classifier learned on noisy data, and that it learns much faster than a learner that chooses examples by random sampling. This work is similar to ours: both try to involve users directly in handling the mislabeling problem.
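
The loop described above can be sketched roughly as follows. This is not the algorithm from the CorrActive Learning paper, only a toy illustration of the "train, show suspicious examples, learn from corrections" idea under our own assumptions: a nearest-centroid classifier on a single numeric feature, at least two classes in the data, and a hypothetical callback ask_user(index, feature) that returns the label the user confirms.

```python
def centroids(data):
    """Fit a nearest-centroid model: the mean feature value per class."""
    sums, counts = {}, {}
    for x, y in data:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def suspicion(model, x, y):
    """Positive when x lies closer to another class centroid than to the
    centroid of its current label y (assumes at least two classes)."""
    own = abs(model[y] - x)
    other = min(abs(c - x) for label, c in model.items() if label != y)
    return own - other

def corrective_loop(data, ask_user, rounds, batch):
    """Repeatedly retrain on the current labels and ask the user about the
    `batch` most suspicious samples that have not been checked yet."""
    data = list(data)
    checked = set()
    for _ in range(rounds):
        model = centroids(data)  # retrain on the (partly corrected) labels
        candidates = [i for i in range(len(data)) if i not in checked]
        candidates.sort(key=lambda i: suspicion(model, *data[i]), reverse=True)
        for i in candidates[:batch]:
            if suspicion(model, *data[i]) <= 0:
                break  # the remaining samples agree with the model
            x, _ = data[i]
            data[i] = (x, ask_user(i, x))  # learn from the correction
            checked.add(i)
    return data
```

The point of ranking by suspicion is that the user only sees examples the current model disagrees with, instead of a random sample of the data.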

Label noise is a prevalent problem, and it is not easy to cover all its details in a short post. We have only tried to give a quick look at the label noise problem and the existing solutions, and we hope it is helpful for your work. Thanks for reading.