What Is Self-supervised Learning? An Understandable Definition
Self-supervised learning is a comparatively new class of methods in the field of artificial intelligence. The term describes a procedure in which labels are generated automatically from the data itself so that supervised learning tasks can be carried out. Its advantages include, among other things, a large gain in efficiency and an improvement in quality.
Definition of self-supervised learning
Self-supervised learning describes a procedure in which the labels required for supervised machine learning are generated without human intervention.
In concrete terms, this means that there is a phase (“pretext”) before the actual learning (training) in which the input (e.g., images) is automatically annotated. These labels can then be used for training.
The idea behind this is that the training material can be generated from the existing data instead of being laboriously created by humans. Automatic labeling is, of course, very attractive, especially for data science applications with a very large amount of data.
Self-supervised learning is mainly used with neural networks and deep learning, as these models can adapt very flexibly to such automatically generated input.
A simple example comes from the field of natural language processing (NLP). A sentence is used as input, and one word is removed from it (the pretext task). A neural network then tries to predict this word, which serves as the output label.
Because we removed the word from an intact text beforehand, we naturally know the correct answer and can train accordingly.
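The following minimal sketch (plain Python, no ML framework) shows how such (masked sentence, removed word) training pairs could be generated automatically. The function name `make_masked_pairs`, the mask token, and the example sentences are purely illustrative assumptions, not part of any specific library.

```python
import random

def make_masked_pairs(sentences, mask_token="[MASK]", seed=0):
    """Create (masked sentence, removed word) training pairs.

    Each pair turns the original, intact text into its own label:
    the removed word is known, so no human annotation is needed.
    """
    rng = random.Random(seed)
    pairs = []
    for sentence in sentences:
        words = sentence.split()
        if len(words) < 2:
            continue  # nothing meaningful to mask
        idx = rng.randrange(len(words))
        label = words[idx]                       # the "correct answer"
        masked = words[:idx] + [mask_token] + words[idx + 1:]
        pairs.append((" ".join(masked), label))
    return pairs

# Illustrative input; in practice this would be a large unlabeled corpus.
corpus = [
    "self supervised learning generates labels from the data itself",
    "the model predicts the removed word during training",
]
for x, y in make_masked_pairs(corpus):
    print(f"input: {x!r}  ->  label: {y!r}")
```

Any supervised model could then be trained to predict the label from the masked input, exactly as described above.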
What are the advantages of this approach?
The most important advantage of self-supervised learning is the elimination of human labeling. Creating machine learning labels for large amounts of data or difficult data sets can be very time-consuming.
Automated generation of annotations directly from the available data helps to save effort, time, and money.
Furthermore, human-created labels often suffer from quality problems: if label quality decreases, the quality of the resulting algorithms decreases as well. Self-supervised learning therefore reduces not only the effort but also, implicitly, this risk to quality.
Another advantage of self-supervised learning is a high degree of flexibility when the labels change: if new labels can be generated automatically at any time, the training process can be adapted far more easily.
What is the difference between self-supervised learning and supervised learning?
Labels must be provided so that supervised learning can be used. In the case of regression (e.g., the forecast of sales), these are numerical values; in the case of classification (e.g., sorting images into “reject” and “no reject”), they are categories.
These labels are usually created by humans and attached to the raw data. This process, known as “machine learning labeling,” is very laborious and sometimes error-prone. Self-supervised learning shortens it by generating the labels directly from the data.
What is the difference between self-supervised learning and unsupervised learning?
Unsupervised learning does not require labels either; it uses a purely algorithmic approach to recognize patterns in data, and in this respect it overlaps with self-supervised learning.
Like self-supervised learning, it does not depend on human annotation; however, the supervised task types (regression, classification, etc.) cannot be addressed with classic unsupervised methods.
Consistency Loss – better training through artificially deteriorated input
A common approach to improving machine learning models in image recognition is adding noise. Here, input images are deliberately “deteriorated,” for example by setting pixels to incorrect color values or by applying exposure filters.
The idea is simple: because of the higher input variance with the same output label, the model generalizes better. This may lead to lower training accuracy, but it often leads to better results in the real world.
The same principle is applied in self-supervised learning, where it is referred to as a “consistency loss” (NLP) or as “noise contrastive estimation” (vision). Typical perturbations include deleting letters, swapping word positions, changing the color balance, deleting individual pixels, rotating the image, and many more.
This addition of noise with the same (self-generated) label increases the generalization of the models and thus the performance in real use.
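As a rough illustration of this idea, the sketch below assumes images are NumPy arrays and produces several “deteriorated” variants of one image while keeping its self-generated label; the function `corrupt_image` and the chosen perturbations (dropped pixels, 90-degree rotations) are illustrative assumptions, not a reference implementation.

```python
import numpy as np

def corrupt_image(image, drop_fraction=0.1, rotate_quarters=0, rng=None):
    """Return a 'deteriorated' copy of an image: random pixels are
    zeroed out and the image is optionally rotated in 90-degree steps.
    The label belonging to the clean image stays unchanged, so the
    model sees a larger input variance for the same target.
    """
    rng = rng or np.random.default_rng(0)
    noisy = image.copy()
    mask = rng.random(noisy.shape[:2]) < drop_fraction
    noisy[mask] = 0                              # delete individual pixels
    noisy = np.rot90(noisy, k=rotate_quarters)   # rotate the image
    return noisy

# Illustrative usage: one clean image and its self-generated label
# produce several noisy training variants with the same label.
clean = np.random.default_rng(1).integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
label = "self_generated_label"
variants = [(corrupt_image(clean, rotate_quarters=k), label) for k in range(4)]
print(len(variants), "augmented (input, label) pairs")
```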
Examples of self-supervised learning
Self-supervised learning in the area of NLP (Natural Language Processing):
- Prediction of the next word: In the pretext task, a word is removed and then used as the label for training.
- Positioning of words: Words are removed from the text and serve as input for predicting their original position.
- Word or text distortion: Randomizing individual words or the order of a text (see the sketch after this list).
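A possible sketch of the last point, again in plain Python with illustrative names: sentences either keep their word order (label 1) or are shuffled (label 0), so the distortion itself produces the labels.

```python
import random

def make_distortion_pairs(sentences, seed=0):
    """Pretext task sketch for 'word or text distortion': half of the
    sentences keep their original word order (label 1), the other half
    get a randomly shuffled order (label 0). The labels fall out of the
    construction itself, so no human annotation is required.
    """
    rng = random.Random(seed)
    pairs = []
    for sentence in sentences:
        words = sentence.split()
        if rng.random() < 0.5:
            pairs.append((" ".join(words), 1))       # intact order
        else:
            shuffled = words[:]
            rng.shuffle(shuffled)
            pairs.append((" ".join(shuffled), 0))    # distorted order
    return pairs

sample = ["the order of a text can be randomized as a pretext task",
          "a classifier then learns to tell intact from distorted input"]
print(make_distortion_pairs(sample))
```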
Self-supervised learning in the field of vision/image processing:
- Colorization: The images are decolorized, and the training then predicts the original colors.
- Image patch positioning: Removing part of an image and predicting its position.
- Inpainting / filling holes: Filling gaps in images that were first created deliberately (e.g., by setting pixel values to black); see the sketch after this list.
- Correcting corrupt images: Creating distortions, blurring, or misalignments within an image and then repairing them.
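The colorization and inpainting tasks from this list can be sketched in a few lines. The snippet below assumes NumPy images; the grayscale weights, hole coordinates, and function names are illustrative assumptions rather than a reference implementation.

```python
import numpy as np

def make_colorization_pair(image):
    """Colorization pretext sketch: the color image itself is the label,
    and a decolorized (grayscale) copy becomes the input. No annotation
    is needed because the target already exists in the data.
    """
    # Simple luminance-style grayscale conversion (illustrative weights).
    gray = 0.299 * image[..., 0] + 0.587 * image[..., 1] + 0.114 * image[..., 2]
    return gray.astype(image.dtype), image            # (input, label)

def make_inpainting_pair(image, hole=(8, 8, 16, 16)):
    """Inpainting pretext sketch: a rectangular hole is blacked out in
    the input, and the untouched original image serves as the label."""
    y, x, h, w = hole
    holed = image.copy()
    holed[y:y + h, x:x + w] = 0                       # create the gap ourselves
    return holed, image                               # (input, label)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
gray_in, color_target = make_colorization_pair(img)
holed_in, full_target = make_inpainting_pair(img)
print(gray_in.shape, color_target.shape, holed_in.shape)
```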
Self-supervised learning in the field of video:
- Correct Order: Put video frames (back) into the correct order; see the sketch after this list.
- Fill gaps: Predict individual frames within a video.
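A minimal sketch of the frame-order task, assuming the video is a NumPy array of frames; the clip length, the helper `make_frame_order_pair`, and the random example video are illustrative assumptions.

```python
import numpy as np

def make_frame_order_pair(frames, clip_len=4, rng=None):
    """Video pretext sketch: a short clip of consecutive frames is
    shuffled, and the permutation that restores the original order is
    the label. The correct order is known from the source video itself.
    """
    rng = rng or np.random.default_rng(0)
    start = rng.integers(0, len(frames) - clip_len + 1)
    clip = frames[start:start + clip_len]
    perm = rng.permutation(clip_len)
    shuffled_clip = clip[perm]                 # input: frames out of order
    restore_order = np.argsort(perm)           # label: indices that reorder them
    return shuffled_clip, restore_order

# Illustrative "video": 16 frames of 32x32 RGB noise.
video = np.random.default_rng(1).integers(0, 256, size=(16, 32, 32, 3), dtype=np.uint8)
clip, order_label = make_frame_order_pair(video)
print(clip.shape, order_label)
```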
Conclusion on the subject of self-supervised learning
Self-supervised learning describes the process of algorithmically generating machine learning labels from existing data in order to train a model.
The goal is the automated annotation of training material in order to reduce effort and avoid loss of quality. The areas of application are primarily language processing (NLP), image processing, and video processing.