Stochastic Deep Networks with Linear Competing Units for Transfer Learning
Date Issued
January 2024
Author(s)
Advisor
Abstract
Deep Learning (DL) has become the preferred approach to various challenging machine learning (MLe) tasks, such as computer vision, natural language processing, and speech recognition. Deep Neural Networks (DNNs) have achieved superior performance in those tasks compared to traditional MLe methods. However, they entail a huge number of weights, causing them to make over-confident predictions that may reduce their generalization capacity in hard problems, e.g., a Meta-Learning (ML) scenario. To mitigate this issue, researchers have applied Bayesian modeling to DNNs, employing Bayesian Neural Networks (BNNs) that provide more robust and tractable estimates of uncertainty in a model's predictions. That way, we can build safer MLe systems for safety-critical applications, such as healthcare, video recognition, and autonomous vehicle control.
DL models trained to solve a single task suffer from a common drawback: they cannot combine data from diverse tasks in order to learn new tasks in a future training round. Such models often require extensive training and data collection for each task individually, which can be time-consuming and data-intensive. This limitation has motivated research in ML, a field that aims to address these shortcomings by developing methods that allow existing models to learn efficiently from new tasks and adapt quickly to them by leveraging knowledge gained from previous tasks. ML methods therefore aim to make models generalize well to unseen tasks from just a small number of examples; this is the so-called problem of few-shot learning.
In this thesis, we study how some existing DL methods for ML tackle this problem, and propose a novel ML method that improves generalization capacity, predictive performance, and computational efficiency. Specifically, our proposed approach relies on the concepts of stochastic and sparse learned representations. In that way, we aim to define a sparse and stochastic network paradigm for ML, with network design principles that differ from those of currently used ML models; we use stochastic deep networks with linear competing units in the context of model-agnostic ML. As we empirically show, our approach produces state-of-the-art predictive accuracy on few-shot image classification and regression experiments, as well as reduced predictive error in an active learning setting; these improvements come with an immensely reduced computational cost.
These encouraging results further motivate us to examine the case where not all tasks are available beforehand, but instead arrive sequentially. In such a case, a DNN should learn to adapt to this continuous stream of data, effectively handling a major problem that affects DNNs in such settings, namely catastrophic forgetting. Continual Learning (CL) methods are designed to mitigate this issue. Specifically, such a method trains the DNN to accumulate new knowledge after a few training iterations on a new data distribution, while avoiding drastically forgetting previously learned information from older tasks. Recently, researchers have developed various approaches to counteract this problem.
To address this challenge, this thesis proposes a radically different perspective on catastrophic forgetting in CL tasks, and especially in a well-known variant of CL called class-incremental learning (CIL). Our approach is founded upon the framework of stochastic local competition, which is implemented in a task-wise manner. We show that it produces state-of-the-art predictive accuracy on few-shot image classification experiments, and imposes a considerably lower computational overhead than the current state of the art.
File(s)
Name
Thesis_Kalais.pdf
Size
8.18 MB
Format
Adobe PDF
Checksum (MD5)
7b682cde1e1dbc0c7d4d397b22225697

