Michal Bechny, Florian Sobieczky, Jürgen Zeindl, Lisa Ehrlinger,
"Missing Data Patterns: From Theory to an Application in the Steel Industry"
in: SSDBM 2021: 33rd International Conference on Scientific and Statistical Database Management, Association for Computing Machinery, New York, NY, USA, pp. 214–219, 2021, ISBN: 9781450384131
Missing data (MD) is a prevalent problem and can negatively affect the trustworthiness of data analysis. In industrial use cases, faulty sensors or errors during data integration are common causes of systematically missing values. The majority of MD research deals with imputation, i.e., the replacement of missing values with "best guesses". Most imputation methods require missing values to occur independently, which is rarely the case in industry. Thus, it is necessary to identify missing data patterns (i.e., systematically missing values) prior to imputation (1) to understand the cause of the missingness, (2) to gain deeper insight into the data, and (3) to choose the proper imputation technique. However, the literature describes a wide variety of MD patterns without a common formalization. In this paper, we introduce the first formal definition of MD patterns. Building on this theory, we developed a systematic approach to automatically detecting MD patterns in industrial data. The approach has been developed in cooperation with voestalpine Stahl GmbH, where we applied it to real-world data from the steel industry and demonstrated its efficacy with a simulation study.
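To illustrate what "systematically missing values" can look like in practice, the following is a minimal sketch that tabulates which combinations of columns are missing together and how often one column is missing given that another is. It is not the formal definition or the detection approach from the paper, which the abstract does not spell out; the function names (`missingness_patterns`, `co_missing_rate`), the column names, and the sample rows are all hypothetical.

```python
from collections import Counter


def missingness_patterns(rows, columns):
    """Count how often each combination of missing columns occurs.

    Each row is a dict; a value of None (or an absent key) counts as missing.
    Returns a Counter keyed by the tuple of missing column names.
    """
    counts = Counter()
    for row in rows:
        missing = tuple(c for c in columns if row.get(c) is None)
        counts[missing] += 1
    return counts


def co_missing_rate(rows, a, b):
    """Fraction of rows missing column `a` that are also missing column `b`."""
    missing_a = [r for r in rows if r.get(a) is None]
    if not missing_a:
        return 0.0
    return sum(r.get(b) is None for r in missing_a) / len(missing_a)


# Hypothetical sensor readings (illustrative values only).
rows = [
    {"temp": 1510.0, "pressure": 2.1, "speed": 0.9},
    {"temp": None, "pressure": None, "speed": 1.0},
    {"temp": None, "pressure": None, "speed": 1.1},
    {"temp": 1498.0, "pressure": 2.0, "speed": None},
]
columns = ["temp", "pressure", "speed"]

patterns = missingness_patterns(rows, columns)
for pattern, count in patterns.most_common():
    print(pattern or "(complete)", count)
```

In this toy data, `temp` and `pressure` are always missing together, which hints at a shared cause (for example, one faulty sensor unit) rather than independent missingness. A tabulation like this is only a first diagnostic; it does not by itself decide which imputation technique is appropriate.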