Lisa Ehrlinger, Bernhard Werth, Wolfram Wöß,
"Automating Data Quality Monitoring with Reference Data Profiles",
in Alfredo Cuzzocrea, Oleg Gusikhin, Slimane Hammoudi, Christoph Quix: Data Management Technologies and Applications, Series: Communications in Computer and Information Science, Vol. 1860, Springer Nature Switzerland, Cham, Switzerland, pp. 24-44, 2023, ISBN: 978-3-031-37890-4
Original title:
Automating Data Quality Monitoring with Reference Data Profiles
Language of the title:
English
Original book title:
Data Management Technologies and Applications
Original abstract:
Data quality is of central importance for the qualitative evaluation of decisions taken by AI-based applications. In practice, data from several heterogeneous data sources is integrated, but complete, global domain knowledge is often not available. In such heterogeneous scenarios, it is particularly difficult to monitor data quality (e.g., completeness, accuracy, timeliness) over time. In this paper, we formally introduce a new data-centric method for automated data quality monitoring, which is based on reference data profiles. A reference data profile is a set of data profiling statistics that is learned automatically to model the target quality of the data. In contrast to most existing data quality approaches that require domain experts to define rules, our method can be fully automated from initialization to continuous monitoring. This data-centric method has been implemented in our data quality tool DQ-MeeRKat and evaluated with six real-world telematic device data streams.
Language of the abstract:
English
Publisher:
Springer Nature Switzerland
Publisher address:
Cham, Switzerland
Series:
Communications in Computer and Information Science