Guangdong Basic and Applied Basic Research Foundation (2021A1515110761); Fundamental Research Funds for the Central Universities (N2104002, N2016009)
Time series segmentation is an important research direction in data mining. Segmentation techniques based on the matrix profile (MP) have attracted increasing attention from researchers and have produced promising results. However, this technique and the algorithms derived from it still have shortcomings. First, when the MP-based fast low-cost online semantic segmentation (FLOSS) algorithm segments a time series of given activity states, each subsequence is connected to its nearest neighbor by an arc, and an arc can span a non-target activity state, so that similar subsequences are matched across states. Second, the existing segmentation point extraction algorithm uses a window of fixed length when extracting segmentation points, which can yield points that deviate substantially from the true boundaries and thus reduces accuracy. To address these problems, this study proposes a time series segmentation algorithm that limits arc crossing, namely limit arc curve cross-FLOSS (LAC-FLOSS). The algorithm assigns weights to arcs to form weighted arcs and sets a matching distance threshold to eliminate cross-state subsequence mismatches. In addition, an improved segmentation point extraction algorithm, namely improved extract regimes (IER), is proposed. Based on the shape of the corrected arc crossings (CAC) sequence, IER extracts extreme values from its troughs, thereby avoiding the problem of direct window-based extraction placing segmentation points at non-inflection points. Comparative experiments on the public datasets datasets_seg and MobiAct verify the feasibility and effectiveness of both solutions.
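To make the two ideas concrete, the following is a minimal, illustrative sketch and not the authors' LAC-FLOSS or IER code. It assumes the MP index and nearest-neighbor distances are already available from any matrix profile implementation (here they are written by hand). Arc counting and the parabolic idealized arc curve follow the usual FLUSS/FLOSS formulation. The hard cut-off `max_dist` is a hypothetical stand-in for the paper's weighted arcs and matching distance threshold, whose exact weight definition the abstract does not give. `extract_troughs` and its `level` parameter are one hypothetical reading of "extracting extremes from troughs". All function and parameter names are illustrative.

```python
import math


def corrected_arc_curve(mp_index, mp_dist=None, max_dist=math.inf, edge=1):
    """Corrected arc curve (CAC) from a matrix profile index.

    Subsequence i draws an arc to its nearest neighbour mp_index[i]; the arc
    covers positions lo..hi-1. Crossing counts are divided by the idealised
    arc curve 2*i*(n-i)/n expected for random arcs and clipped at 1.
    Hypothetical LAC-style filter: arcs whose nearest-neighbour distance
    exceeds max_dist are not counted (stand-in for weighted arcs).
    """
    n = len(mp_index)
    delta = [0] * (n + 1)                     # difference array of arc coverage
    for i, j in enumerate(mp_index):
        if mp_dist is not None and mp_dist[i] > max_dist:
            continue                          # treat as a cross-state mismatch
        lo, hi = min(i, j), max(i, j)
        delta[lo] += 1
        delta[hi] -= 1
    cac, crossings = [], 0
    for i in range(n):
        crossings += delta[i]                 # arcs passing over position i
        iac = 2.0 * i * (n - i) / n           # idealised arc curve (parabola)
        cac.append(1.0 if iac == 0 else min(crossings / iac, 1.0))
    for i in list(range(edge)) + list(range(n - edge, n)):
        cac[i] = 1.0                          # suppress unreliable edge values
    return cac


def extract_window(cac, n_regimes, excl):
    """Window-based extraction: repeatedly take the global minimum and
    blank a window of +/- excl positions around it."""
    c = list(cac)
    locs = []
    for _ in range(n_regimes - 1):
        idx = min(range(len(c)), key=lambda k: c[k])
        if math.isinf(c[idx]):
            break                             # nothing left to pick
        locs.append(idx)
        for k in range(max(0, idx - excl), min(len(c), idx + excl)):
            c[k] = math.inf
    return sorted(locs)


def extract_troughs(cac, n_regimes, level=1.0):
    """Hypothetical trough-based extraction: a trough is a maximal run of
    values below `level`; take the minimum of each trough and return the
    n_regimes - 1 deepest ones."""
    troughs, start = [], None
    for i, v in enumerate(list(cac) + [math.inf]):  # sentinel closes a final trough
        if v < level and start is None:
            start = i
        elif v >= level and start is not None:
            troughs.append(min(range(start, i), key=lambda k: cac[k]))
            start = None
    deepest = sorted(troughs, key=lambda k: cac[k])[: n_regimes - 1]
    return sorted(deepest)


# Two regimes (positions 0-3 and 4-7); subsequence 2 wrongly matches 5.
idx = [2, 3, 5, 1, 6, 7, 4, 5]
dist = [0.1, 0.1, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1]
raw = corrected_arc_curve(idx, dist)                # cross-state arc kept
lac = corrected_arc_curve(idx, dist, max_dist=0.5)  # cross-state arc dropped
print(raw[3], lac[3])                               # boundary trough raised vs. 0.0

# A wide trough: the fixed window returns a second point on its shoulder.
cac = [1, 0.3, 0.2, 0.1, 0.2, 0.3, 0.35, 1, 1, 0.5, 1]
print(extract_window(cac, 3, excl=2))   # [3, 5]
print(extract_troughs(cac, 3))          # [3, 9]
```

In this toy example the single mismatched arc fills in the CAC trough at the true boundary (position 3), and dropping it restores the trough to zero. The second example shows why a fixed exclusion window can misfire: when a trough is wider than the window, the next global minimum lies on the same trough's shoulder. Taking one extreme per trough avoids that, at the cost of relying on a trough definition (here the assumed `level` threshold).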