Journal of Software, 2016, 27(11): 2870-2883

Traffic Classification Framework Based on Ensemble Clustering
LU Gang, YU Xiang-Zhan, ZHANG Hong-Li, GUO Rong-Hua
(China Luoyang Electronic Equipment Test Center, Luoyang 471003, Henan, China; School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China)
Received: March 16, 2015; Revised: April 7, 2015
Abstract: Traffic classification is the basis of, and key to, optimizing network quality of service. Machine learning algorithms classify traffic using flow statistics, which makes them significant for identifying encrypted and proprietary-protocol traffic. However, the discriminator bias problem and the class imbalance problem are two main challenges in machine-learning-based traffic classification. Discriminator bias means that some flow statistics improve the accuracy for some applications while reducing it for others. Class imbalance means that a machine-learning-based traffic classifier identifies minority applications, those with few training samples, with low accuracy. To address these two issues, this paper proposes a traffic classification framework based on ensemble clustering (TCFEC). TCFEC consists of several base classifiers, each trained by clustering in a different feature subspace, and an optimal decision component; together they improve classification accuracy. Specifically, compared with traffic classifiers based on traditional machine learning algorithms, TCFEC improves average flow accuracy by up to 5% and average byte accuracy by up to 6%.
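The abstract describes TCFEC only at the architectural level, so the following is a minimal Python sketch of that shape under loudly stated assumptions, not the authors' method. Each hypothetical base classifier (`SubspaceClusterClassifier`) runs plain k-means on a hand-picked feature subspace and labels every cluster with the majority application of its training flows. The "optimal decision component" is replaced by a hypothetical rule (`EnsembleDecision`) that, for each flow, follows the base classifier whose predicted label had the highest precision on validation data; the paper's actual clustering algorithm, subspace selection, and decision rule are not given here. Flow accuracy and byte accuracy are computed as the share of correctly classified flows and the share of bytes carried by correctly classified flows; the abstract does not say how its averages are taken, so `flow_and_byte_accuracy` reports overall values.

```python
from collections import Counter, defaultdict


def _dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with deterministic seeding (evenly spaced points)."""
    k = min(k, len(points))
    step = len(points) // k
    centroids = [list(points[i * step]) for i in range(k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        assign = [min(range(k), key=lambda c: _dist2(p, centroids[c])) for p in points]
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


class SubspaceClusterClassifier:
    """Hypothetical base classifier: k-means on one feature subspace,
    each cluster labelled with the majority application of its training flows."""

    def __init__(self, features, k):
        self.features = features  # indices of the features this base classifier sees
        self.k = k

    def _project(self, x):
        return [x[i] for i in self.features]

    def fit(self, X, y):
        self.centroids, assign = kmeans([self._project(x) for x in X], self.k)
        votes = defaultdict(Counter)
        for cluster, label in zip(assign, y):
            votes[cluster][label] += 1
        # Clusters that ended up empty get no label and are skipped at prediction time.
        self.labels = {c: v.most_common(1)[0][0] for c, v in votes.items()}
        return self

    def predict(self, x):
        p = self._project(x)
        nearest = min(self.labels, key=lambda c: _dist2(p, self.centroids[c]))
        return self.labels[nearest]


class EnsembleDecision:
    """Hypothetical stand-in for the 'optimal decision component': for each flow,
    follow the base classifier whose predicted label had the highest
    precision on validation data."""

    def __init__(self, bases):
        self.bases = bases

    def fit(self, X_val, y_val):
        self.precision = []
        for base in self.bases:
            hits, totals = Counter(), Counter()
            for x, y in zip(X_val, y_val):
                pred = base.predict(x)
                totals[pred] += 1
                hits[pred] += int(pred == y)
            self.precision.append({lab: hits[lab] / totals[lab] for lab in totals})
        return self

    def predict(self, x):
        preds = [base.predict(x) for base in self.bases]
        # Unseen (base, label) pairs count as precision 0; ties go to the earlier base.
        best = max(range(len(self.bases)), key=lambda i: self.precision[i].get(preds[i], 0.0))
        return preds[best]


def flow_and_byte_accuracy(preds, labels, sizes):
    """Flow accuracy: share of flows classified correctly.
    Byte accuracy: share of all bytes carried by correctly classified flows."""
    correct = [p == t for p, t in zip(preds, labels)]
    flow_acc = sum(correct) / len(correct)
    byte_acc = sum(s for s, ok in zip(sizes, correct) if ok) / sum(sizes)
    return flow_acc, byte_acc


if __name__ == "__main__":
    # Toy flows with two made-up features; feature 0 alone cannot separate A from C.
    X = [(0, 0), (0.1, 0.2), (10, 0), (10.2, 0.1), (0, 10), (0.1, 10.2)]
    y = ["A", "A", "B", "B", "C", "C"]
    sizes = [500, 700, 40, 60, 3000, 2500]  # bytes per flow (made up)
    bases = [SubspaceClusterClassifier([0, 1], 3).fit(X, y),
             SubspaceClusterClassifier([0], 3).fit(X, y)]
    ensemble = EnsembleDecision(bases).fit(X, y)  # toy: validates on training data
    preds = [ensemble.predict(x) for x in X]
    print(preds, flow_and_byte_accuracy(preds, y, sizes))
```

In the toy data, the base classifier restricted to feature 0 mislabels every application-C flow as A, so its validation precision for label A drops to 0.5. The two-feature base classifier separates all three applications, and the hypothetical decision rule follows it wherever the narrower classifier is unreliable, which loosely mirrors the abstract's point that one feature can help some applications while hurting others.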
Foundation items: National Natural Science Foundation of China (61303061, 61402485); Open Fund of the State Key Laboratory of High Performance Computing (HPCL) (201513-01)
Citation:

LU Gang, YU Xiang-Zhan, ZHANG Hong-Li, GUO Rong-Hua. Traffic Classification Framework Based on Ensemble Clustering. Journal of Software, 2016, 27(11): 2870-2883