Bayesian inference techniques for Deep Learning

Partaourides, Charalampos

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.14279/10742

DC Field	Value	Language
dc.contributor.advisor	Chatzis, Sotirios P.	-
dc.contributor.author	Partaourides, Charalampos	-
dc.date.accessioned	2018-03-07T09:37:58Z	-
dc.date.available	2018-03-07T09:37:58Z	-
dc.date.issued	2018-01	-
dc.identifier.uri	https://hdl.handle.net/20.500.14279/10742	-
dc.description	The interdisciplinary field of deep learning investigates the training of complex hierarchical mathematical models using multidimensional datasets, and it has yielded large performance gains in artificial intelligence problems such as object recognition, natural language understanding, speech recognition and robotics. Incorporating Bayesian inference into training can yield even greater gains, owing to its inherent ability to handle the epistemic uncertainty that pervades deep models as a result of the choices made in model structure and parameters. To this end, the application of Bayesian inference to the three main building blocks of deep models was investigated: the synaptic weights, the hidden/latent units and the nonlinear functions (feature functions), using novel mathematical models. The results showed that approximate Bayesian inference, together with richer and more expressive distributions, yields statistically significant performance improvements on various datasets. This is achieved because they enable the models to better capture the elusive hidden dependencies of the multidimensional training data.	en_US
dc.description.abstract	Deep learning has achieved state-of-the-art performance in a variety of challenging machine learning tasks, pushing the frontier of Artificial Intelligence to new heights. Tasks such as object recognition, speech perception, language understanding and robotics are improving year by year. This is mainly due to recent breakthroughs in Bayesian inference, the increased volume of datasets and increased computational power. Together, these make it feasible to train these challenging hierarchically structured models, which contain millions of parameters. Deep learning is an umbrella term covering numerous deep architectures that are able to capture even the most complex dynamics of the environment. Typically, they are trained under the maximum likelihood estimation paradigm. Unfortunately, in many real-world tasks the high dimensionality of the observations makes even the largest datasets sparse. As such, the training algorithm must compensate for the uncertainty introduced by data sparsity, overcome the model's tendency to overfit and, as a result, generalize well. The statistical method of Bayesian inference provides a mathematically coherent way of dealing with data sparsity and overfitting. It essentially uses Bayes' theorem to accumulate evidence-based knowledge. This is achieved by postulating probability distributions over the parameters instead of deriving point estimates of them. Under the Bayesian view, we impose a prior distribution that encapsulates our initial belief about the model's dynamics, and we correct that belief as we are presented with more data; this amounts to inferring the posterior distribution. The choice of distribution heavily controls the expressiveness of the model. In this thesis, we present innovative approaches to training deep networks that account for sparsity, skewness and heavy tails in the form of the parameter distributions.
Specifically, among our contributions, we impose a sparsity-inducing distribution over the network synaptic weights to improve generalization. In a different vein, we consider imposing a skew normal distribution over the latent variables to increase the deep network's capacity. In parallel, we examine the efficacy of inferring the feature functions by devising a novel random sampling rationale combined with an optimizable sample weighting scheme. The models derived from these approaches are trained by means of an approximate Bayesian inference scheme to allow for scalability to large datasets. We exhibit the advantages of these methods over existing approaches by conducting an extensive experimental evaluation using benchmark	en_US
dc.format	pdf	en_US
dc.language.iso	en	en_US
dc.publisher	Department of Electrical Engineering, Computer Engineering and Informatics, Faculty of Engineering and Technology, Cyprus University of Technology	en_US
dc.rights	Publication or reproduction, electronic or otherwise, is prohibited without the written consent of the creator and copyright holder.	en_US
dc.subject	Deep learning	en_US
dc.subject	Machine learning	en_US
dc.subject	Bayesian inference	en_US
dc.subject	Variational Bayes	en_US
dc.subject	Regularization	en_US
dc.title	Bayesian inference techniques for Deep Learning	en_US
dc.type	PhD Thesis	en_US
dc.affiliation	Cyprus University of Technology	en_US
dc.relation.dept	Department of Electrical Engineering, Computer Engineering and Informatics	en_US
dc.description.status	Completed	en_US
cut.common.academicyear	2018-2019	en_US
dc.relation.faculty	Faculty of Engineering and Technology	en_US
item.fulltext	With Fulltext	-
item.cerifentitytype	Publications	-
item.grantfulltext	open	-
item.openairecristype	http://purl.org/coar/resource_type/c_db06	-
item.openairetype	doctoralThesis	-
item.languageiso639-1	en	-
crisitem.author.dept	Department of Electrical Engineering, Computer Engineering and Informatics	-
crisitem.author.dept	Department of Electrical Engineering, Computer Engineering and Informatics	-
crisitem.author.faculty	Faculty of Engineering and Technology	-
crisitem.author.faculty	Faculty of Engineering and Technology	-
crisitem.author.orcid	0000-0002-8555-260X	-
crisitem.author.orcid	0000-0002-4956-4013	-
crisitem.author.parentorg	Faculty of Engineering and Technology	-
crisitem.author.parentorg	Faculty of Engineering and Technology	-
Appears in collections:	PhD Theses
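
The prior-to-posterior update described in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis, which concerns approximate inference for deep networks. The conjugate Beta-Bernoulli model, the function names `beta_bernoulli_update` and `beta_mean`, and the toy data are all assumptions chosen only to show how a prior belief is corrected as data arrive.

```python
from fractions import Fraction


def beta_bernoulli_update(alpha, beta, observations):
    """Return the Beta posterior parameters after observing 0/1 outcomes.

    Assumed toy model (not from the thesis):
      prior:      theta ~ Beta(alpha, beta)
      likelihood: each x ~ Bernoulli(theta)
    By conjugacy, the posterior is Beta(alpha + #ones, beta + #zeros).
    """
    ones = sum(observations)
    zeros = len(observations) - ones
    return alpha + ones, beta + zeros


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, as an exact fraction."""
    return Fraction(alpha, alpha + beta)


# Uniform prior Beta(1, 1): no initial preference for any value of theta.
prior = (1, 1)

# Hypothetical observations: three successes and one failure.
data = [1, 1, 0, 1]

posterior = beta_bernoulli_update(*prior, data)
print("posterior parameters:", posterior)
print("posterior mean:", beta_mean(*posterior))
```

In contrast with a maximum likelihood point estimate (here 3/4), the posterior keeps a full distribution over the parameter, and its mean is pulled toward the prior when data are scarce.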

Files in this item:

File	Description	Size	Format
Abstract.pdf	Abstract	116.1 kB	Adobe PDF
Παρταουρίδης Χαράλαμπος.pdf	Full text	1.72 MB	Adobe PDF

Page view(s): 561 (last week: 0, last month: 12), checked on 17 May 2024

Download(s): 346, checked on 17 May 2024

All items on this site are protected by copyright.
