Data Lake Semantic Enrichment via Traditional Systems and LLMs

Papageorgiou, Panagiotis

Date Issued

May 2025

Author(s)

Papageorgiou, Panagiotis

Advisor

Andreou, Andreas S.

Abstract

In today's era, Big Data is often referred to as the "new oil". Businesses rely heavily on Data Lakes to
store massive amounts of heterogeneous data; however, without proper metadata mechanisms in place,
these repositories turn into data swamps. This thesis explores alternative systems for traditional semantic
enrichment of data sources by adapting a pre-existing semantic blueprint model to Apache Hive and comparing
it against a well-established system, Apache Jena, in terms of insertion time, query performance and storage
efficiency. Experimental results show that Hive offers significant scalability and storage-efficiency
benefits over Jena; however, Jena is more suitable for small to medium-size Data Lakes that require
dynamic schema evolution, involve complex relationships between data sources and perform infrequent
metadata-retrieval queries. Additionally, this thesis explores the feasibility of LLM-driven approaches to
semantic enrichment by proposing two novel pipelines and evaluating four different configurations. The
results demonstrate that LLMs can serve as an alternative solution but often rely on high-quality metadata
to achieve maximum accuracy. Expert-curated metadata produced the highest accuracy and low
response times, while LLM-generated metadata offered a promising, semi-automated alternative with
important trade-offs. Finally, FAISS-based retrieval excelled at reducing both operational costs and
response times.

Subjects

Large Language Models...

FAISS

Data Lakes

Semantic Enrichment

File(s)

Name: Papageorgiou-BSC-2025-abstract.pdf
Size: 139.81 KB
Format: Adobe PDF
Checksum (MD5): bc2a4fff77934b2000badfb6c7395dde
