Andreas Langegger,
"A Flexible Architecture for Virtual Information Integration based on Semantic Web Concepts", 2009
Original title:
A Flexible Architecture for Virtual Information Integration based on Semantic Web Concepts
Title language:
English
Original abstract:
This dissertation presents a novel approach to virtual information integration based on Semantic Web technologies. Unlike traditional approaches based on the relational model, the system integrates distributed, heterogeneous data sources by means of ontologies. This enables concept-based integration, in which data is described by its meaning rather than by a functional data model. The system follows the mediator-wrapper architecture: wrappers translate source data into the Resource Description Framework (RDF), the core data model of the Semantic Web. To accurately represent all information integrated from different kinds of information systems (relational databases, XML files and databases, spreadsheets, CSV files, web services, etc.), the global metamodel must be highly expressive, and RDF provides this expressiveness.
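To illustrate the mediator-wrapper idea, the following Python sketch shows a toy wrapper that exposes CSV rows as RDF-style triples and a mediator that forwards triple patterns to whichever wrappers are currently registered. It is a minimal sketch under stated assumptions, not the dissertation's implementation. The class names `CsvWrapper` and `Mediator`, the helper `matches`, the `http://example.org/` namespace and the tuple-based triple representation are all hypothetical. A real system would use an RDF library and SPARQL instead.

```python
import csv
import io
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical representation: an RDF triple as a (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]

# Hypothetical namespace used only for this illustration.
EX = "http://example.org/"


def matches(pattern: Tuple[Optional[str], ...], triple: Triple) -> bool:
    """A pattern position set to None acts as a variable and matches anything."""
    return all(p is None or p == t for p, t in zip(pattern, triple))


class CsvWrapper:
    """Toy wrapper: translates CSV rows into triples on demand (nothing is materialized)."""

    def __init__(self, text: str, key_column: str) -> None:
        self.text = text
        self.key_column = key_column

    def triples(self) -> Iterator[Triple]:
        for row in csv.DictReader(io.StringIO(self.text)):
            subject = f"<{EX}item/{row[self.key_column]}>"
            for column, value in row.items():
                if column != self.key_column:
                    # Every non-key cell becomes one triple with a plain literal object.
                    yield (subject, f"<{EX}prop/{column}>", f'"{value}"')

    def match(self, s=None, p=None, o=None) -> List[Triple]:
        return [t for t in self.triples() if matches((s, p, o), t)]


class Mediator:
    """Forwards each triple pattern to all registered wrappers at query time."""

    def __init__(self) -> None:
        self.wrappers: Dict[str, CsvWrapper] = {}

    def add_source(self, name: str, wrapper: CsvWrapper) -> None:
        self.wrappers[name] = wrapper

    def remove_source(self, name: str) -> None:
        self.wrappers.pop(name, None)

    def match(self, s=None, p=None, o=None) -> List[Triple]:
        results: List[Triple] = []
        for wrapper in self.wrappers.values():
            results.extend(wrapper.match(s, p, o))
        return results


mediator = Mediator()
mediator.add_source("people", CsvWrapper("id,name,city\n1,Alice,Linz\n2,Bob,Vienna\n", "id"))
print(mediator.match(p=f"<{EX}prop/city>"))
```

In this sketch, adding or removing a source only changes the mediator's registry. No global schema is edited, which loosely mirrors the flexibility the abstract describes.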
The proposed approach is very flexible: no explicit global schema needs to be maintained, and data sources can easily be added and removed. A major contribution is the federation approach based on RDF graph statistics, which are generated by a sub-component called RDFStats. Based on histograms, query pattern cardinalities can be estimated offline, which enables scalable query federation and optimization at the mediator. Two further contributions are the optimization of D2R-Server, an RDF wrapper for relational database systems, and XLWrap, which is currently the only spreadsheet-to-RDF wrapper able to map arbitrary spreadsheet layouts (including multidimensional cross tables) to arbitrary RDF target graphs.
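The idea of estimating pattern cardinalities from precomputed histograms and using the estimates for source selection can be sketched as follows. This is an illustration built on assumptions, not RDFStats itself. The names `Histogram`, `SourceStats` and `select_sources`, the source names, the use of equi-width histograms over numeric object values, and the smallest-estimate-first ordering are hypothetical choices made only for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Histogram:
    """Equi-width histogram over the numeric object values of one predicate (hypothetical)."""
    low: float
    high: float
    counts: List[int]

    @classmethod
    def build(cls, values: List[float], buckets: int) -> "Histogram":
        low, high = min(values), max(values)
        width = (high - low) / buckets or 1.0  # avoid zero width if all values are equal
        counts = [0] * buckets
        for v in values:
            # The maximum value would fall one past the end, so clamp it into the last bucket.
            counts[min(int((v - low) / width), buckets - 1)] += 1
        return cls(low, high, counts)

    def estimate_range(self, lo: float, hi: float) -> float:
        """Estimate how many values lie in [lo, hi], assuming uniformity inside each bucket."""
        width = (self.high - self.low) / len(self.counts) or 1.0
        total = 0.0
        for i, count in enumerate(self.counts):
            b_lo = self.low + i * width
            overlap = max(0.0, min(hi, b_lo + width) - max(lo, b_lo))
            total += count * overlap / width
        return total


@dataclass
class SourceStats:
    """Statistics computed offline for one source: triple counts and histograms per predicate."""
    triple_counts: Dict[str, int]
    histograms: Dict[str, Histogram] = field(default_factory=dict)

    def estimate(self, predicate: str, value_range: Optional[Tuple[float, float]] = None) -> float:
        count = self.triple_counts.get(predicate, 0)
        if count == 0:
            return 0.0
        if value_range is None or predicate not in self.histograms:
            return float(count)  # pattern (?s predicate ?o) without a range restriction
        return self.histograms[predicate].estimate_range(*value_range)


def select_sources(stats: Dict[str, SourceStats], predicate: str,
                   value_range: Optional[Tuple[float, float]] = None) -> List[Tuple[str, float]]:
    """Keep only sources with a non-zero estimate, most selective (smallest) first."""
    estimates = [(name, s.estimate(predicate, value_range)) for name, s in stats.items()]
    return sorted((e for e in estimates if e[1] > 0), key=lambda e: e[1])


stats = {
    "relational_db": SourceStats({"ex:age": 10},
                                 {"ex:age": Histogram.build([float(v) for v in range(10)], buckets=2)}),
    "spreadsheet": SourceStats({"ex:name": 3}),
}
print(select_sources(stats, "ex:age", (0.0, 4.5)))
```

Because the statistics are computed ahead of time, the mediator in this sketch can skip sources that are predicted to contribute nothing without contacting them.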
Combined with recent research on new graphical user interfaces for linked data on the Semantic Web, the proposed approach is well suited to large-scale collaborative knowledge sharing in both research and industry.