Representation Learning#
Representation Learning involves learning an effective way to represent data, typically by mapping it into a feature space where meaningful patterns and structures are preserved. This approach aims to automatically discover useful features from raw data, enhancing the performance of various machine learning tasks. Techniques like contrastive loss and triplet loss are often used in this context to ensure that similar data points are close together and dissimilar ones are far apart in the learned representation space.
So, representation learning involves:
Encoding static observations (e.g., images) into a latent space.
Evolving weights that align with the ground truth in the observation space.
Representation learning can be divided into:
Supervised representation learning: representations are learned on task A using annotated data and then used to solve task B.
Unsupervised representation learning: representations are learned on a task in an unsupervised way (from label-free data).
What are Downstream Tasks#
Downstream tasks refer to specific applications or problems that benefit from the features or representations learned by a model. In the context of machine learning, once a model learns to represent data (e.g., identifying patterns or features in images, text, etc.), these representations can be used to improve performance on related tasks, known as downstream tasks.
For example:
In Natural Language Processing (NLP), if a model learns to understand the structure and meaning of sentences, it can then be applied to downstream tasks like sentiment analysis, translation, or summarization.
In Computer Vision (CV), if a model learns to recognize basic features in images (like edges, shapes, etc.), these features can be used for downstream tasks like object detection, image classification, or facial recognition.
Latent Space#
In machine learning, the goal is typically to predict a target value $y$ directly from the input $x$ by learning a function $f: \mathcal{X} \rightarrow \mathcal{Y}$.
In representation learning, the approach shifts from directly learning a function $f$ to first learning a mapping into a latent space and then predicting from that space.
Goal: The aim is to develop representations $z = h(x)$ that capture the essential structure of the data and are useful across many tasks.
Advantage: Learning the representation $h$ once means that each individual task only needs a simple predictor on top of $z$.
Learning $h$#
We start by learning a mapping $h: \mathcal{X} \rightarrow \mathcal{Z}$, where $\mathcal{X}$ is the original input space and $\mathcal{Z}$ is the learned representation space. The goal is to transform the input data $x \in \mathcal{X}$ into a new representation $z = h(x)$ that captures the essential features needed for various tasks.
Learning Simple Predictors#
Once we have the representation $z = h(x)$, we learn a function $g$ for a specific task. For instance, if the task is classification, $g$ could be a linear model or a simple neural network that maps $z$ to the output $y$.
Mathematical Explanation#
Assume we have multiple tasks $T_1, T_2, \dots, T_n$, each requiring us to predict different outputs $y_1, y_2, \dots, y_n$. Instead of learning separate functions $f_i: \mathcal{X} \rightarrow \mathcal{Y}_i$ for each task directly from the original input space $\mathcal{X}$, we first learn a common representation $z = h(x)$, regardless of the task. Then, for each task $T_i$, we learn a simpler function $g_i$ on top of the shared representation:

$$y_i = g_i(h(x)) \quad \text{or} \quad f_i = g_i \circ h$$
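As a minimal sketch of a shared representation with task-specific heads (assuming a PyTorch-style setup; the layer sizes and the two example tasks are illustrative, not from the original text):

```python
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """h: X -> Z, a shared representation reused by every task."""
    def __init__(self, in_dim=784, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)  # z = h(x)

h = SharedEncoder()
g1 = nn.Linear(64, 10)  # task T1: e.g. 10-class classification
g2 = nn.Linear(64, 1)   # task T2: e.g. scalar regression

x = torch.randn(32, 784)       # a batch of inputs
z = h(x)                       # shared representation
y1_hat, y2_hat = g1(z), g2(z)  # y_i = g_i(h(x))
```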
Latent Representations in Deep Learning#
In deep learning, a model $f$ is typically a composition of layers:

$$f = f_L \circ f_{L-1} \circ \dots \circ f_1$$

Where:
$f_1, \dots, f_L$ are individual layers or transformations, and $f$ is the overall model that maps the input $x$ to the output $y$.
Learning the Representation#
When the model $f = f_L \circ \dots \circ f_1$ is trained end to end, the early layers act as the representation mapping and the last layer acts as the task-specific predictor:

$$h = f_{L-1} \circ \dots \circ f_1, \qquad g = f_L$$

Here, $z = h(x)$ is the latent representation that the network learns implicitly while being trained on the task.
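A minimal sketch of this split (assuming a PyTorch-style model with illustrative layer sizes), separating a trained network into an encoder $h$ and a task head $g$:

```python
import torch
import torch.nn as nn

# f = f_L ∘ ... ∘ f_1 : a small deep model (illustrative sizes).
f = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),   # f_1
    nn.Linear(256, 64), nn.ReLU(),    # f_2  -> end of the encoder h
    nn.Linear(64, 10),                # f_L  -> task-specific head g
)

# Everything but the last layer acts as h, the last layer acts as g.
h = f[:-1]
g = f[-1]

x = torch.randn(8, 784)
z = h(x)        # latent representation z = h(x)
y_hat = g(z)    # prediction y = g(z) = f(x)
```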
Cost function for Representation Learning#
In representation learning, the cost function, or loss function, is crucial for guiding the learning process. The choice of cost function depends on the specific goals of the learning task.
Triplet Loss#
Objective: To ensure that the distance between an anchor and a positive sample (from the same class) is smaller than the distance between the anchor and a negative sample (from a different class) by a margin.
Cost Function:

$$L = \max\left(0,\; D_{\text{anchor, positive}} - D_{\text{anchor, negative}} + \alpha\right)$$

where $D_{\text{anchor, positive}}$ and $D_{\text{anchor, negative}}$ are distances in the embedding space and $\alpha$ is a margin.
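A minimal sketch of this loss (assuming PyTorch; the margin value and embedding sizes are illustrative):

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, D(a, p) - D(a, n) + margin), averaged over the batch."""
    d_ap = F.pairwise_distance(anchor, positive)   # D_{anchor, positive}
    d_an = F.pairwise_distance(anchor, negative)   # D_{anchor, negative}
    return torch.clamp(d_ap - d_an + margin, min=0).mean()

# Example with random embeddings (batch of 16, embedding dim 64).
a, p, n = (torch.randn(16, 64) for _ in range(3))
loss = triplet_loss(a, p, n)
# PyTorch also ships an equivalent built-in: torch.nn.TripletMarginLoss(margin=1.0)
```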
NCA (Neighborhood Component Analysis) Loss#
Objective: To learn a distance metric that improves nearest neighbor classification accuracy.
Cost Function:

$$L = -\sum_{i} \log \frac{\sum_{j \neq i,\; y_j = y_i} \exp\!\left(-\lVert z_i - z_j \rVert^2 / \tau\right)}{\sum_{k \neq i} \exp\!\left(-\lVert z_i - z_k \rVert^2 / \tau\right)}$$

where $z_i$ are the learned embeddings, $\tau$ is a temperature parameter, and the sum is over all training instances.
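A minimal sketch of an NCA-style objective with a temperature (assuming PyTorch; the batch construction and $\tau$ value are illustrative):

```python
import torch

def nca_loss(z, labels, tau=0.1):
    """Softmax over pairwise distances; pulls same-class neighbors together."""
    dist = torch.cdist(z, z).pow(2)                      # squared distances, (N, N)
    logits = -dist / tau
    logits.fill_diagonal_(float("-inf"))                 # exclude self-pairs
    log_p = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    same = labels.unsqueeze(0).eq(labels.unsqueeze(1))   # same-class mask, (N, N)
    same.fill_diagonal_(False)
    # -log of the probability mass assigned to same-class neighbors.
    pos = torch.logsumexp(log_p.masked_fill(~same, float("-inf")), dim=1)
    valid = same.any(dim=1)  # keep samples with at least one same-class neighbor
    return -pos[valid].mean()

z = torch.randn(32, 64)              # embeddings
labels = torch.randint(0, 5, (32,))  # class labels
loss = nca_loss(z, labels)
```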
Combining simple concepts to derive complex structures#
Deep neural networks are fundamentally representation learning models, particularly in supervised settings. These networks typically consist of two main components:
Encoder: transforms the input data into a lower-dimensional representation space.
Linear classifier: separates the classes in that representation space.
These learned representations are dense, compact, and transferable to similar data modalities: features ranging from low-level edges and corners up to the high-level concepts used to label an image can transfer.
Example:
Train a Convolutional Neural Network (CNN), then transfer the learned knowledge to another task, often one with insufficient labels to train a deep network from scratch. This approach, known as transfer learning, has been successfully applied across various domains and is widely used in commercial applications.
Example for Understanding the Term “From Scratch”
Building a Model from Scratch: Instead of relying on libraries like TensorFlow or PyTorch, you would write your own code to define the model architecture, initialize parameters, and implement training algorithms.
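As a minimal transfer-learning sketch (assuming PyTorch and torchvision ≥ 0.13; the number of classes, weights name, and hyperparameters are illustrative rather than taken from the text):

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a CNN pretrained on ImageNet and reuse its encoder.
model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the pretrained encoder so only the new head is trained.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with a task-specific linear classifier.
num_classes = 5  # illustrative target task
model.fc = nn.Linear(model.fc.in_features, num_classes)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch.
x = torch.randn(8, 3, 224, 224)
y = torch.randint(0, num_classes, (8,))
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```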
Insufficient labels at the pixel level#
For tasks such as object classification, detection, and segmentation, the number of labeled examples diminishes substantially as we move from whole-image labels to pixel-level annotations.
In healthcare applications like breast cancer metastasis detection, annotating large datasets is both expensive and time-consuming, often requiring hours of work from specialized expert pathologists with extensive training.
Solution to the Problem#
All of these examples highlight the importance of learning generalizable representations from non-annotated data. Many research areas, including semi-supervised learning and self-supervised learning, try to learn representations that can be transferred to new tasks using only a few annotated examples, or none at all.
Deep Unsupervised Representation Learning
Key Focus:
Deep unsupervised representation learning is centered on extracting useful features from unlabeled data. The main goal is to improve downstream tasks while reducing dependence on human annotations.
Recent Developments:
The importance of unsupervised learning has grown due to advances in NLP, specifically with models like BERT and GPT (Miniproject). These models showcase the power of label-free training.
Prototypical Contrastive Learning of Unsupervised Representations
Self-supervised learning involves devising a predictive task (pretext task) from the data itself.
Pretext task: a task designed to help the model learn useful features from unlabeled data; it doesn't require manual annotations. For example, a common pretext task in image processing might involve predicting the rotation angle of an image. Although the model doesn't have explicit labels for what each rotation represents, learning to predict this rotation angle helps the model understand important features of the image, such as shapes and textures (a sketch of this rotation task follows the examples below). These learned features can then be applied to more traditional tasks like image classification or object detection.
Other examples:
Text processing (BERT): predicting masked tokens in unlabeled text.
Image inpainting: self-supervised pretext tasks that remove some parts of the data and challenge the network to predict the missing part.
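A minimal sketch of the rotation pretext task mentioned above (assuming PyTorch; the encoder architecture and image sizes are illustrative):

```python
import torch
import torch.nn as nn

def make_rotation_batch(images):
    """Rotate each image by 0/90/180/270 degrees; the rotation index is the pseudo-label."""
    ks = torch.randint(0, 4, (images.size(0),))  # pseudo-labels 0..3
    rotated = torch.stack(
        [torch.rot90(img, int(k), dims=[1, 2]) for img, k in zip(images, ks)]
    )
    return rotated, ks

# Small illustrative encoder + 4-way rotation classifier.
encoder = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
rotation_head = nn.Linear(16, 4)

images = torch.randn(8, 3, 32, 32)  # unlabeled images
x, pseudo_labels = make_rotation_batch(images)
loss = nn.CrossEntropyLoss()(rotation_head(encoder(x)), pseudo_labels)
loss.backward()
# After pretraining, `encoder` can be reused for downstream tasks (classification, detection, ...).
```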
Labeling data can indeed be expensive#
Complexity of Labeling Task
Simple Labels: Tasks like labeling images with straightforward categories (e.g., “cat” or “dog”) may be relatively inexpensive if you can use crowd-sourcing platforms.
Complex Labels: Tasks requiring detailed annotation or expert knowledge (e.g., medical imaging, legal documents) can be much more costly due to the need for specialized skills and time.
Volume of Data
Large Datasets: For large datasets, the cost can add up quickly. For example, labeling thousands or millions of images or text samples can require significant resources.
Quality of Labels
Consistency and Accuracy: Ensuring high-quality, accurate labels often requires multiple rounds of verification and validation, which adds to the cost.
Training and Quality Control: Training annotators and implementing quality control measures can further increase expenses.
Domain Expertise
Specialized Knowledge: Data that requires domain expertise (e.g., medical images, legal documents) demands annotators with specialized training, which can be more expensive than general labeling tasks.
Time Constraints
Speed of Labeling: Fast turnaround times for large-scale labeling projects can lead to higher costs, as additional resources may be needed to meet deadlines.
Tools and Infrastructure
Annotation Tools: Developing or purchasing sophisticated annotation tools and infrastructure can also add to the overall cost.