Automating distributed tiered storage management in cluster computing

Herodotou, Herodotos; Kakoulli, Elena

doi:10.14778/3357377.3357381

Journal

Proceedings of the VLDB Endowment

Date Issued

2020

Author(s)

Herodotou, Herodotos

Kakoulli, Elena

DOI

10.14778/3357377.3357381

Abstract

Data-intensive platforms such as Hadoop and Spark are routinely used to process massive amounts of data residing on distributed file systems like HDFS. Increasing memory sizes and new hardware technologies (e.g., NVRAM, SSDs) have recently led to the introduction of storage tiering in such settings. However, users are now burdened with the additional complexity of managing the multiple storage tiers and the data residing on them while trying to optimize their workloads. In this paper, we develop a general framework for automatically moving data across the available storage tiers in distributed file systems. Moreover, we employ machine learning for tracking and predicting file access patterns, which we use to decide when and which data to move up or down the storage tiers for increasing system performance. Our approach uses incremental learning to dynamically refine the models with new file accesses, allowing them to naturally adjust and adapt to workload changes over time. Our extensive evaluation using realistic workloads derived from Facebook and CMU traces compares our approach with several other policies and showcases significant benefits in terms of both workload performance and cluster efficiency.

Subjects

Hadoop

Distributed File Syst...

MapReduce

File(s)

Name

3357377.3357381.pdf

Size

974.4 KB

Format

Adobe PDF

Checksum (MD5)

a6eefb24d7e41d8e9c8a7a040d383da1
