Unsupervised Drift Detection on High-Speed Data Streams

Abstract

Changes in the distribution of streaming data (i.e., concept drifts) constitute a central issue in online data mining. The main reason is that these changes render stream learning models outdated, reducing their predictive performance over time. A common approach adopted by real-time adaptive systems to deal with concept drifts is to employ detectors that indicate the best time for updates. However, an unrealistic assumption of most detectors is that labels become available immediately after the data arrives. In this paper, we introduce an unsupervised and model-independent concept drift detector suitable for high-speed and high-dimensional data streams in realistic scenarios where labels are scarce. We propose a straightforward two-dimensional representation of the data aimed at faster processing for detection. We develop a simple adaptive drift detector on this visual representation that is efficient for fast streams with thousands of features and is as accurate as existing costly methods that perform various statistical tests. Our method achieves better performance, measured by execution time and accuracy, in classification problems for different types of drifts, including abrupt, oscillating, and incremental. Experimental evaluation demonstrates the versatility of the method in several domains, including astronomy, entomology, public health, political science, and medical science.

Publication
In 2020 IEEE International Conference on Big Data (IEEE BigData), 2020