Edwin Lughofer,
"Hybrid Active Learning (HAL) for Reducing Annotation Effort of Operators at Classification Systems"
, in Pattern Recognition, Vol. 45, No. 2, pp. 884-896, 2012, ISSN: 1873-5142
Original title:
Hybrid Active Learning (HAL) for Reducing Annotation Effort of Operators at Classification Systems
Language of title:
English
Original abstract:
Active learning is understood as any form of learning in which the learning algorithm has some control over the
input samples due to a specific sample selection process based on which it builds up the model. In this paper, we propose a novel active learning strategy for
data-driven classifiers, which is based on an unsupervised criterion during the off-line training phase, followed by a supervised certainty-based criterion
during incremental on-line training. In this sense, we call the new strategy hybrid active learning. Sample selection in the first phase is conducted from
scratch (i.e. no initial labels/learners are needed) based on purely unsupervised criteria obtained from clusters: samples lying near cluster centers and near the
borders of clusters are expected to represent the most informative ones regarding the distribution
characteristics of the classes. In the second phase, the task is to update already trained classifiers during on-line mode with the most important samples in order
to dynamically guide the classifier to more predictive power. Both strategies are essential for reducing the annotation and supervision effort of operators in off-line
and on-line classification systems, as operators only have to label a carefully selected subset of the off-line training data or give feedback only on specific occasions
during the on-line phase. The new active learning strategy is evaluated on real-world data sets from the UCI repository and on data collected at on-line quality control systems.
The results show that an active learning based selection of training samples (1) does not weaken the classification accuracies compared to using all samples
in the training process and (2) can outperform classifiers which are built on randomly selected data samples.
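
The two selection phases described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names (`select_offline`, `needs_label`), the distance-ratio border score, the margin threshold, and the assumption that cluster centers are already available are all hypothetical choices made for illustration. The off-line function picks the samples closest to each cluster center plus the samples most nearly equidistant from two centers; the on-line function asks for operator feedback only when the classifier's output is uncertain.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_offline(samples, centers, n_center=1, n_border=1):
    """Unsupervised selection (assumed criteria, at least two centers needed).

    Returns the sorted indices of samples to hand to the operator:
    the n_center samples nearest to each cluster center, plus the
    n_border samples that lie most ambiguously between two clusters.
    """
    by_cluster = {k: [] for k in range(len(centers))}
    border_scores = []
    for i, s in enumerate(samples):
        ranked = sorted((dist(s, c), k) for k, c in enumerate(centers))
        (d1, k1), (d2, _) = ranked[0], ranked[1]
        by_cluster[k1].append((d1, i))
        # A ratio close to 1 means the sample is nearly equidistant
        # from its two nearest centers, i.e. near a cluster border.
        border_scores.append((d1 / d2 if d2 > 0 else 0.0, i))

    chosen = set()
    for members in by_cluster.values():
        chosen.update(i for _, i in sorted(members)[:n_center])
    border_scores.sort(reverse=True)
    chosen.update(i for _, i in border_scores[:n_border])
    return sorted(chosen)


def needs_label(class_probs, threshold=0.2):
    """Certainty-based on-line criterion (assumed margin rule).

    Request a label when the gap between the two highest class
    confidences is below the threshold.
    """
    top = sorted(class_probs, reverse=True)
    return (top[0] - top[1]) < threshold


samples = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.9, 0.0), (2.0, 0.0)]
centers = [(0.0, 0.0), (2.0, 0.0)]
to_label = select_offline(samples, centers)
```

In this toy example `to_label` contains the two samples sitting on the centers and the one sample midway between them. During on-line operation, `needs_label` would be called on each new prediction, so that the operator is consulted only for low-certainty samples.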