Journal of Software:2010.21(zk):194-204

Dimensional Density and Clustering in Scatterplots
TANG Lei,LI Xue-Qing,LIU Yang
(School of Computer Science and Technology, Shandong University, Ji’nan 250101, China)
Received:July 20, 2010    Revised:November 03, 2010
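> Chinese abstract: The scatterplot matrix has become a popular and widely used method for exploring large-scale datasets because it is simple and effective. However, the technique has some drawbacks: when handling large-scale data, overlapping data points can clutter the view. Moreover, it is difficult for the technique to express information other than two-dimensional distributions. To address these problems, the current scatterplot technique is improved and extended: a) overview+detail is used to present global and local information at the same time; b) a clustering algorithm groups the data in the scatterplot to avoid view clutter; c) bar axes replace line axes to express the data distribution density of each dimension, conveying more information; d) histograms serve as another way to express per-dimension density; e) several interaction techniques were developed to adjust the view. Finally, a set of experiments was designed to demonstrate the correctness and effectiveness of the method. The method is suitable for presenting and analyzing large-scale multi-dimensional datasets in fields such as industry and finance.
Chinese keywords: scatterplot  visualization  density  histogram  clustering  dimension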
Abstract:The scatterplot matrix remains one of the most popular and widely used approaches for exploring multi-dimensional datasets, owing to its simplicity and clarity. However, the technique has several shortcomings. When displaying large, complex datasets, it produces visual clutter because data points overlap one another. In addition, it is difficult to convey information beyond the distribution between two dimensions. This paper improves and extends current scatterplot techniques to address these shortcomings: a) it uses overview + detail to show the whole scatterplot matrix while emphasizing a single cell; b) it uses a clustering algorithm to divide the points in a scatterplot into several groups to avoid clutter; c) it replaces line axes with bar axes that show the data density along each dimension, conveying more information; d) it offers histograms as an alternative way to express the same per-dimension density; e) it adopts several interaction techniques for adjusting the visualization. Finally, several scenarios are presented to demonstrate that the approach is feasible and effective. The approach is helpful for visualizing and analyzing large, complex datasets in areas such as finance and industry.
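As an illustration of items b) to d), the following is a minimal Python sketch of the two computations the abstract names: grouping the points of one scatterplot, and measuring the data density along a single dimension. It is not the authors' implementation. The abstract does not say which clustering algorithm is used, so plain k-means is assumed here purely as a stand-in, and the function names `dimension_density` and `kmeans`, the bin count, and the iteration count are all hypothetical choices.

```python
import numpy as np


def dimension_density(values, bins=10):
    """Fraction of points falling in each equal-width bin along one dimension.

    These fractions are the kind of quantity a bar axis or a histogram
    could encode; the bin count is an illustrative assumption.
    """
    counts, edges = np.histogram(values, bins=bins)
    return counts / counts.sum(), edges


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means, assumed here as a stand-in clustering algorithm."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random (fancy indexing copies).
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Distance from every point to every center, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # leave an empty cluster's center where it is
                centers[j] = members.mean(axis=0)
    return labels, centers
```

In use, `kmeans` would be applied to the two columns shown in one cell of the scatterplot matrix, with the resulting labels driving the color of each group, while `dimension_density` would be applied to each column separately to obtain the values drawn along that axis.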
Foundation item:Supported by the Science and Technology R&D Program of Shandong Province of China under Grant No.2010G0020114 (山东省科技攻关计划)
TANG Lei,LI Xue-Qing,LIU Yang.Dimensional Density and Clustering in Scatterplots.Journal of Software,2010,21(zk):194-204