AWarp: Warping Distance for Sparse Time Series


Abdullah Mueen, Nikan Chavoshi, Noor Abu-El-Rub, Hossein Hamooni, Amanda Minnich


Dynamic Time Warping (DTW) distance has been effectively used in mining time series data in a multitude of domains. However, in its original formulation DTW is extremely inefficient in comparing long sparse time series, containing mostly zeros and some unevenly spaced non-zero observations. Original DTW distance does not take advantage of this sparsity, leading to redundant calculations and a prohibitively large computational cost for long time series.

We derive a new time warping similarity measure (AWarp) for sparse time series that works on the run-length encoded representation of sparse time series. The complexity of AWarp is quadratic on the number of observations as opposed to the range of time of the time series. Therefore, AWarp can be several orders of magnitude faster than DTW on sparse time series. AWarp is exact for binary-valued time series and a close approximation of the original DTW distance for any-valued series. We discuss useful variants of AWarp: bounded (both upper and lower), constrained, and multidimensional. We show applications of AWarp to three data mining tasks including clustering, classification, and outlier detection, which are otherwise not feasible using classic DTW, while producing equivalent results. Potential areas of application include bot detection, human activity classification, search trend analysis, seismic analysis, and unusual review pattern mining.


Pdf version of the paper is available here.

Source Code:

The source code can be downloaded here.

The main part of the project is implemented in C++. In order to run the code, you should use "mex" function in Matlab like the following (after unzipping

mex 'AWarp.cpp' mex 'constrainedAWarp.cpp'

Given two run length encoded time series X and Y:

To calculate global AWarp distance by using UBCost algorithm:


To calculate global AWarp distance by using LBCost algorithm:


To calculate costrained AWarp distance:


where win is the size of the window in number of points.


X = [1 2 3 -1 1]; %run length encoded format
Y = [1 -2 4 1]; %run length encoded format
ans = 2.8284
ans = 2.4495
ans = 2.8284


A sample of the house dataset (used in the paper) can be downloaded here. The first number in each row is the label, and the remaining of the row is the time series of the washing machine's power consumption (in a day) in run length encoded format.

A sample of the Twitter dataset can be downloaded here. Each row is the activity of a Twitter user (millisecond resolution) in run length encoded format.