Exploiting Metadata Semantics in Data Lakes Using Blueprints
Date Issued
January 1, 2023
Author(s)
DOI
10.1007/978-3-031-36597-3_11
Abstract
Smart processing of Big Data has recently emerged as a field that poses many challenges related to how multiple heterogeneous data sources that produce massive amounts of structured, semi-structured and unstructured data may be handled. One solution to this problem is to manage this fusion of disparate data sources through Data Lakes. Data Lakes, though, suffer from the lack of a disciplined approach to collecting, storing and retrieving data to support predictive and prescriptive analytics. This chapter tackles this challenge by introducing a novel standardization framework for managing data in Data Lakes that mainly combines the 5Vs Big Data characteristics with blueprint ontologies. It organizes a Data Lake using a ponds architecture and describes a metadata semantic enrichment mechanism that enables fast storage and efficient retrieval. The mechanism supports Visual Querying and offers increased security via Blockchain and Non-Fungible Tokens. The proposed approach is compared against other known metadata systems using a set of functional properties, with very encouraging results.

