"Implementing Query Operations for Knowledge Graph OLAP in Apache Spark"
, Master's thesis at the Institute of Business Informatics - Data & Knowledge Engineering, supervisor: Assoz.-Prof. Mag. Dr. Christoph Schütz, advised by Bashar Ahmad, MSc, 2-2023
Knowledge Graph OLAP (KG-OLAP) combines the concept of knowledge graphs (KG) with the multidimensional view on data employed in online analytical processing (OLAP). KG-OLAP cubes contain knowledge in the form of context-dependent RDF triples, where contexts are defined through hierarchically structured dimensions, yielding contextualized knowledge graphs. The model enables contextual and graph operations on the data for various kinds of analyses. A SPARQL-based implementation has proven not to be applicable to large volumes of data, which underlines the need for a scalable implementation. This thesis therefore aims to provide an implementation that scales to large amounts of data within the KG-OLAP setting and can perform the required graph operations on contextualized knowledge in the form of RDF data. To this end, a prototypical implementation using the distributed processing framework Apache Spark is proposed that executes KG-OLAP graph operations on RDF quadruples. More specifically, the graph processing framework GraphX, built on top of Spark, is used, and RDF quadruples are mapped to the GraphX graph representation. The Java implementation allows for the construction of the initial graph from the RDF source data as well as for performing the following KG-OLAP graph operations on the base graph: individual-generating abstraction, triple-generating abstraction, value-generating abstraction, reification, and pivot. The functionality and applicability of the Spark-based prototype are demonstrated in experiments on a provided large benchmark dataset containing air traffic management data.
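To illustrate what mapping RDF quadruples to a GraphX-style graph representation can look like, the following minimal sketch uses plain Java without Spark. It is not the thesis's implementation: the class `QuadGraphSketch`, the records `Quad`, `Edge`, and `PropertyGraph`, and the sequential id assignment are all assumptions made for illustration. The sketch only mirrors the general shape GraphX expects, namely vertices keyed by a numeric id and edges that carry a source id, a target id, and an attribute. Here the edge attribute holds the predicate together with the context of the quadruple.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QuadGraphSketch {

    /** An RDF quadruple: a triple plus the context (named graph) it belongs to. */
    public record Quad(String subject, String predicate, String object, String context) {}

    /** Edge in the style of a GraphX edge: numeric endpoints plus an attribute (predicate and context). */
    public record Edge(long srcId, long dstId, String predicate, String context) {}

    /** Vertex table (id to RDF term) and edge list. */
    public record PropertyGraph(Map<Long, String> vertices, List<Edge> edges) {}

    /** Returns the id already assigned to a term, or assigns the next sequential id. */
    private static long idOf(Map<String, Long> ids, String term) {
        Long id = ids.get(term);
        if (id == null) {
            id = (long) ids.size();
            ids.put(term, id);
        }
        return id;
    }

    /**
     * Builds the graph: every distinct subject or object becomes one vertex,
     * every quadruple becomes one edge labelled with its predicate and context.
     * Assumption of this sketch: literal objects are treated as vertices as well.
     */
    public static PropertyGraph build(List<Quad> quads) {
        Map<String, Long> ids = new LinkedHashMap<>();
        List<Edge> edges = new ArrayList<>();
        for (Quad q : quads) {
            long src = idOf(ids, q.subject());
            long dst = idOf(ids, q.object());
            edges.add(new Edge(src, dst, q.predicate(), q.context()));
        }
        // Invert the term-to-id map to obtain the vertex table.
        Map<Long, String> vertices = new LinkedHashMap<>();
        ids.forEach((term, id) -> vertices.put(id, term));
        return new PropertyGraph(vertices, edges);
    }

    public static void main(String[] args) {
        // Hypothetical example data; the identifiers are invented for illustration.
        List<Quad> quads = List.of(
            new Quad("ex:flight1", "ex:departsFrom", "ex:airportA", "ex:ctx1"),
            new Quad("ex:flight1", "ex:arrivesAt", "ex:airportB", "ex:ctx1"));
        PropertyGraph graph = build(quads);
        System.out.println(graph.vertices());
        System.out.println(graph.edges());
    }
}
```

Keeping the context on the edge rather than on the vertex means the same individual can take part in statements from several contexts without being duplicated. In a distributed setting, sequential ids would have to be replaced by a scheme that does not need a shared counter, for example hashing the term.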