Journal of Software, 2005, 16(7): 1252-1261

Processing Algorithms for Predictive Aggregate Queries over Data Streams
LI Jian-Zhong, GUO Long-Jiang, ZHANG Dong-Dong, WANG Wei-Ping
(School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001)
Received: May 17, 2004    Revised: February 03, 2005
Abstract: Forecasting the future trend of real-time data streams is important in many practical applications. For example, by issuing predictive aggregate queries over the sensed data streams of an environment-monitoring sensor network, observers can forecast the average temperature and humidity of the covered area over a future period and determine whether abnormal events will occur. Most existing work on query processing over data streams focuses on queries over current, newly arriving data, and to the best of our knowledge little work to date addresses predictive queries over data streams. Using multivariable linear regression, this paper first proposes a model for predicting aggregate values over data streams and then, based on that model, a method for processing predictive aggregate queries. When the number of failed predictions exceeds a predefined threshold, a strategy that automatically adjusts the prediction model is applied to reduce the prediction error. A mathematical model that characterizes the effects of the sliding-window update cycle and the data stream rate on prediction accuracy is also presented. Theoretical analysis and experimental results show that the proposed algorithms achieve high performance and return predictive query results that satisfy the user's accuracy requirements. In the experiments, data streams are constructed from the TPC-H benchmark data and from sea-surface air temperature data measured by TAO (tropical atmosphere ocean).
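The abstract does not specify the regression variables or the exact adjustment rule, so the following is only a minimal sketch of the general idea under stated assumptions: the aggregate is a sliding-window average, the next average is regressed on the previous `lags` averages (an autoregressive form of multivariable linear regression), and the model is refit on recent history once the failure count exceeds a threshold. All names here (`PredictiveAverage`, `fit`, `predict`, `tolerance`, `max_failures`, `train`) are hypothetical and do not come from the paper.

```python
from collections import deque


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(history, lags):
    """Least-squares fit of y_t = b0 + b1*y_{t-1} + ... + b_lags*y_{t-lags}."""
    rows = [[1.0] + history[t - lags:t][::-1] for t in range(lags, len(history))]
    ys = history[lags:]
    k = lags + 1
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, history):
    """Forecast the next aggregate from the most recent len(coef) - 1 aggregates."""
    lags = len(coef) - 1
    recent = history[-lags:][::-1]
    return coef[0] + sum(c * v for c, v in zip(coef[1:], recent))


class PredictiveAverage:
    """Hypothetical predictor of the next sliding-window average (assumed design)."""

    def __init__(self, window, lags=2, tolerance=0.5, max_failures=3, train=20):
        self.window = deque(maxlen=window)  # sliding window of raw stream values
        self.history = []                   # one window average per update cycle
        self.lags, self.tolerance = lags, tolerance
        self.max_failures, self.train = max_failures, train
        self.coef = None                    # regression coefficients, once fitted
        self.pending = None                 # forecast awaiting its actual value
        self.failures = 0
        self.refits = 0

    def update(self, batch):
        """Consume one update cycle of values; return the forecast for the next cycle."""
        self.window.extend(batch)
        actual = sum(self.window) / len(self.window)
        if self.pending is not None and abs(self.pending - actual) > self.tolerance:
            self.failures += 1              # the previous forecast missed
        self.history = (self.history + [actual])[-self.train:]
        enough = len(self.history) >= 2 * self.lags + 1
        if enough and (self.coef is None or self.failures > self.max_failures):
            try:
                self.coef = fit(self.history, self.lags)
                self.failures, self.refits = 0, self.refits + 1
            except ValueError:
                pass  # degenerate history (e.g. constant values); keep the old model
        self.pending = predict(self.coef, self.history) if self.coef is not None else None
        return self.pending
```

In this sketch, `update` would be called once per sliding-window update cycle with the values that arrived in that cycle. Refitting only on the last `train` aggregates is one possible way to let the model follow a change in the stream; the paper's own adjustment strategy may differ.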
Foundation items: Supported by the National Natural Science Foundation of China under Grant No.60473075; the Key Project of the Natural Science Foundation of Heilongjiang Province under Grant No.ZJG03-05
Reference text:

LI Jian-Zhong, GUO Long-Jiang, ZHANG Dong-Dong, WANG Wei-Ping. Processing Algorithms for Predictive Aggregate Queries over Data Streams. Journal of Software, 2005, 16(7): 1252-1261