AutoCache: Employing machine learning to automate caching in distributed file systems

Herodotou, Herodotos

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.14279/19036

Title:	AutoCache: Employing machine learning to automate caching in distributed file systems
Authors:	Herodotou, Herodotos
Major Field of Science:	Natural Sciences
Field Category:	Computer and Information Sciences
Keywords:	Automated caching;Distributed file systems
Issue Date:	1-Jul-2019
Source:	IEEE 35th International Conference on Data Engineering Workshops, 2019, 8-12 April, Macao, China
Conference:	IEEE International Conference on Data Engineering Workshops
Abstract:	The use of computational platforms such as Hadoop and Spark is growing rapidly as a successful paradigm for processing large-scale data residing in distributed file systems like HDFS. Increasing memory sizes have recently led to the introduction of caching and in-memory file systems. However, these systems lack any automated caching mechanisms for storing data in memory. This paper presents AutoCache, a caching framework that automates the decisions for when and which files to store in, or remove from, the cache for increasing system performance. The decisions are based on machine learning models that track and predict file access patterns from evolving data processing workloads. Our evaluation using real-world workload traces from a Facebook production cluster compares our approach with several other policies and showcases significant benefits in terms of both workload performance and cluster efficiency.
URI:	https://hdl.handle.net/20.500.14279/19036
ISBN:	978-1-7281-0890-2
Rights:	© IEEE Attribution-NonCommercial-NoDerivatives 4.0 International
Type:	Conference Papers
Affiliation:	Cyprus University of Technology
Publication Type:	Peer Reviewed
Appears in Collections:	Conference papers or poster or presentation