Title: Recognizing Input Space and Target Concept Drifts with Scarcely Labelled and Unlabelled Instances
Author(s): Edwin Lughofer, Eva Weigl, Wolfgang Heidl, Christian Eitzinger, Thomas Radauer
Abstract: Drift detection is an important issue in classification-based stream mining: it allows operators to be informed of unintended changes in a system and makes classifier updates more flexible when changes are intended. Current detection approaches usually assume that fully supervised, labelled streams are available for monitoring (changes in) the classifier's performance. This is unrealistic in many on-line real-world applications, because the true class labels must be known, which usually demands burdensome feedback effort from the operators working with the systems. We propose two ways to improve the economy and applicability of current drift detection techniques: 1) a semi-supervised approach that employs single-pass active learning filters to select the most interesting samples for supervising classifier performance, and 2) a fully unsupervised approach based on the degree of overlap of the classifier's output certainty distributions, applicable to any unlabelled classification stream. Both variants include specific handling of imbalanced class distributions in the streams, so that downtrends in classifier behavior on under-represented classes are also detected. The statistical monitoring of classifier behavior relies on a modified version of the Page-Hinkley test, in which a fading factor and an automatic thresholding concept (based on the Hoeffding bound) are introduced to make it more flexible in detecting successive drift occurrences in a stream. The approaches are compared with the fully supervised variant on two real-world on-line applications, including a systematic analysis of the capabilities of our methods. The semi-supervised approach detects three real-occurring as well as artificially built-in drifts in these streams with a delay similar to that of the supervised variant (about 5-6 minutes), using only 10-20% actively selected samples.
Journal: Information Sciences
Publisher: Elsevier
ISSN: 1872-6291
Page Reference: page 127-151, 25 page(s)
Publishing: 2016
Volume: 355-356
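
Illustrative note: the abstract names a Page-Hinkley test extended with a fading factor and a Hoeffding-bound-based threshold, but gives no formulas. The following is only a minimal sketch of how such a test might look, not the authors' method. The class name `FadingPageHinkley`, the helper `first_drift`, all parameter names and defaults, and in particular the way the threshold is derived from the Hoeffding bound (bound times an effective window length of 1 / (1 - fading)) are assumptions made for illustration.

```python
import math


def hoeffding_bound(value_range, confidence, n):
    """Hoeffding bound on the deviation of a mean of n values spanning
    `value_range`; it is exceeded with probability at most `confidence`."""
    return value_range * math.sqrt(math.log(1.0 / confidence) / (2.0 * n))


class FadingPageHinkley:
    """Page-Hinkley test for an increase in a monitored value (e.g. error rate).

    Hypothetical sketch: the fading factor down-weights old deviations, and the
    alarm threshold is set automatically from the Hoeffding bound. The exact
    construction in the paper may differ.
    """

    def __init__(self, fading=0.99, tolerance=0.005, confidence=0.001,
                 value_range=1.0):
        self.fading = fading
        self.tolerance = tolerance
        # Assumption: treat 1 / (1 - fading) as the effective window length.
        n_eff = 1.0 / (1.0 - fading)
        self.threshold = n_eff * hoeffding_bound(value_range, confidence, n_eff)
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0     # running mean of the monitored value
        self.cum = 0.0      # faded cumulative deviation from the mean
        self.cum_min = 0.0  # smallest cumulative deviation seen so far

    def update(self, x):
        """Feed one observation; return True if a drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum = self.fading * self.cum + (x - self.mean - self.tolerance)
        self.cum_min = min(self.cum_min, self.cum)
        if self.cum - self.cum_min > self.threshold:
            self.reset()  # restart so that successive drifts can be detected
            return True
        return False


def first_drift(stream):
    """Return the index of the first signalled drift, or None."""
    detector = FadingPageHinkley()
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None
```

As a usage example, `first_drift([0.0] * 200 + [1.0] * 200)` feeds a stream of per-sample error indicators whose error rate jumps at index 200; the detector should raise an alarm shortly after the jump, while a constant stream raises none. Resetting after each alarm is one simple way to allow repeated detections in the same stream.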
