Edwin Lughofer, Mahardhika Pratama,
"On-line Active Learning in Data Stream Regression using Uncertainty Sampling based on Evolving Generalized Fuzzy Models"
, in IEEE Transactions on Fuzzy Systems, Vol. 26, No. 1, IEEE Press, pp. 292-309, 2018, ISSN: 1941-0034
In this paper, we propose three criteria for efficient sample selection in data stream regression
problems within an on-line active learning context. Selection becomes important whenever the target values, which guide the update of the regressors as well as the implicit model structures, are costly or time-consuming to measure, and also when very fast model updates are
required to cope with the real-time demands of stream mining. Reducing the number of selected samples as much as possible while keeping the predictive accuracy of the models at a high level is thus a central challenge. Ideally, this should be achieved in an unsupervised and single-pass manner. Our selection criteria rely on
three aspects: 1.) the extrapolation degree combined with the model's non-linearity degree, which is measured in terms of a new specific homogeneity criterion among adjacent local approximators; 2.) the uncertainty in the model outputs, which can be measured in terms of confidence intervals using so-called
adaptive local error bars: we integrate a weighted localization of an incremental noise level estimator
and propose formulas for on-line merging of local error bars; 3.) the uncertainty in the model parameters,
which is estimated by the so-called A-optimality criterion, relying on the Fisher information matrix.
The selection criteria are developed in combination with evolving generalized Takagi-Sugeno (TS) fuzzy
models (containing rules in arbitrarily rotated position), as previous publications have shown
that these outperform conventional evolving TS models (containing axis-parallel rules). The results
based on three high-dimensional real-world streaming problems show that a model update based on
only 10%-20% of selected samples can still achieve accumulated model errors over time similar to those
obtained when performing a full model update on all samples. This is achieved with negligible sensitivity to the size of the active learning latency buffer (ALLB).