"On-line Active Learning: A New Paradigm to Improve Practical Useability of Data Stream Modeling Methods"
, in Information Sciences, Vol. on-line and in press, Elsevier, 2017
On-line Active Learning: A New Paradigm to Improve Practical Useability of Data Stream Modeling Methods
Sprache des Titels:
The central purpose of this survey is to provide readers an insight into the recent advances and challenges in on-line active learning. Active learning has attracted the data mining and machine learning community since around 20 years. This is because it served for important purposes to increase practical applicability of machine learning techniques, such as (i) to reduce annotation and measurement costs for operators and measurement equipments, (ii) to reduce manual labelling effort for experts and (iii) to reduce computation time for model training. Almost all of the current techniques focus on the classical pool-based approach. For the on-line, stream mining case, the challenge is that the sample selection strategy has to operate in a fast, ideally single-pass manner. Some first approaches have been proposed during the last decade (starting from around 2005) with the usage of machine learning (ML) oriented incremental classifiers, which are able to update their parameters based on selected samples, but not their structures. Since 2012, on-line active learning concepts have been proposed in connection with the paradigm of evolving models, which are able to expand their knowledge into feature space regions so far unexplored. This opened the possibility to address a particular type of uncertainty, namely that one which stems from a significant novelty content in streams, as, e.g., caused by drifts, new operation modes, changing system behaviors or non-stationary environments. We will provide an overview about the concepts and techniques for sample selection and active learning within these two principal major research lines (incremental ML models versus evolving systems), a comparison of their essential characteristics and properties (raising some advantages and disadvantages), and a study on possible evaluation techniques for them.