Please use this identifier to cite or link to this item:
https://hdl.handle.net/20.500.14279/35008

| Title: | Data Lake Semantic Enrichment via Traditional Systems and LLMs |
| Authors: | Papageorgiou, Panagiotis |
| Keywords: | Large Language Models; FAISS; Data Lakes; Semantic Enrichment |
| Advisor: | Andreou, Andreas S. |
| Issue Date: | May-2025 |
| Department: | Department of Electrical Engineering, Computer Engineering and Informatics |
| Faculty: | Faculty of Engineering and Technology |
| Abstract: | In today's era, Big Data is often referred to as the "new oil". Businesses rely heavily on Data Lakes to store massive amounts of heterogeneous data; however, without proper metadata mechanisms in place, these repositories turn into data swamps. This thesis explores alternative systems for traditional semantic enrichment of data sources by adapting a pre-existing semantic blueprint model in Apache Hive and comparing it, in terms of insertion time, query performance and storage efficiency, against a well-established system, Apache Jena. Experimental results show that Hive offers significant scalability and storage efficiency benefits over Jena; however, Jena is more suitable for small to medium-size Data Lakes that require dynamic schema evolution, involve complex relationships between data sources and serve infrequent queries for metadata retrieval. Additionally, this thesis explores the feasibility of LLM-driven approaches for semantic enrichment by proposing two novel pipelines and evaluating four different configurations. The results demonstrate that LLMs can be used as an alternative solution but often rely on high-quality metadata to achieve maximum accuracy. Expert-curated metadata produced the highest accuracy and low response times, while LLM-generated metadata offered a promising, semi-automated alternative with important trade-offs. Finally, FAISS-based retrieval excelled in reducing operational costs as well as response times. |
| URI: | https://hdl.handle.net/20.500.14279/35008 |
| Rights: | Attribution-NonCommercial-NoDerivatives 4.0 International |
| Type: | Bachelors Thesis |
| Affiliation: | Cyprus University of Technology |
| Appears in Collections: | Πτυχιακές Εργασίες/ Bachelor's Degree Theses |
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| Papageorgiou-BSC-2025-abstract.pdf | abstract | 139.81 kB | Adobe PDF | View/Open |
This item is licensed under a Creative Commons License

