Journal of Software, 2012, 23(6): 1500-1516

Machine Learning Algorithms for Classifying the Imbalanced Protocol Flows: Evaluation and Comparison
ZHANG Hong-Li, LU Gang
(Computer Network and Information Security Technology Research Center, School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China)
Received: June 24, 2010; Revised: June 20, 2011
Abstract: When protocol flows are imbalanced, changes in the distribution of flow samples have a considerable impact on the accuracy and stability of traffic classifiers based on machine learning. Selecting a machine learning algorithm suited to online traffic classification under imbalanced protocol flows is therefore especially important. First, a single-factor experimental design is used to verify that three classification algorithms, namely the C4.5 decision tree, Naïve Bayes with kernel density estimation (NBK), and the support vector machine (SVM), can classify traffic using statistics computed over only the first four packets of a TCP connection. Next, the performance of these three classifiers is compared: C4.5 has the shortest testing time, and SVM is the most stable. Then, the Bagging algorithm is applied to traffic classification. The experimental results show that Bagging is about as stable as SVM, while its testing time and modeling time are close to those of C4.5. Bagging is therefore better suited to online traffic classification.
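The abstract names two techniques without showing them: summarizing a flow by statistics over its first four TCP packets, and Bagging. The following Python sketch illustrates both under stated assumptions. The feature set (mean, spread and maximum packet size, mean inter-arrival gap) is hypothetical, since the abstract does not list the paper's features; the base learner is a one-level decision stump standing in for C4.5, not C4.5 itself; and the flows are synthetic toy data, not the paper's traces.

```python
import random
import statistics
from collections import Counter


def first_n_packet_features(sizes, times, n=4):
    """Summarize the first n packets of a flow as a fixed-length feature vector.

    Hypothetical feature set: the abstract only says that statistics over the
    first four packets of a TCP connection are used, not which statistics.
    """
    sizes, times = sizes[:n], times[:n]
    gaps = [b - a for a, b in zip(times, times[1:])] or [0.0]
    return [
        statistics.mean(sizes),    # mean packet size
        statistics.pstdev(sizes),  # packet-size spread
        max(sizes),                # largest packet
        statistics.mean(gaps),     # mean inter-arrival time
    ]


def train_stump(X, y):
    """Base learner: a one-level decision stump, a simple stand-in for C4.5."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no usable split: fall back to the majority label
        majority = Counter(y).most_common(1)[0][0]
        return lambda row: majority
    _, f, t, l_lab, r_lab = best
    return lambda row: l_lab if row[f] <= t else r_lab


def bagging_fit(X, y, n_estimators=11, seed=0):
    """Bagging: train each base learner on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        models.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, row):
    """Aggregate the ensemble by unweighted majority vote."""
    return Counter(m(row) for m in models).most_common(1)[0][0]


# Synthetic, deliberately imbalanced toy flows (30 vs 6); not data from the paper.
flows = [([60, 60, 200 + i, 400 + i], [0.0, 0.01, 0.02, 0.03], "http") for i in range(30)]
flows += [([60, 60, 1400, 1460], [0.0, 0.001, 0.002, 0.003], "bulk") for _ in range(6)]
X = [first_n_packet_features(sizes, times) for sizes, times, _ in flows]
y = [label for _, _, label in flows]

models = bagging_fit(X, y, n_estimators=11)
web_guess = bagging_predict(
    models, first_n_packet_features([60, 60, 150, 350], [0.0, 0.01, 0.02, 0.03]))
bulk_guess = bagging_predict(
    models, first_n_packet_features([60, 60, 1400, 1460], [0.0, 0.001, 0.002, 0.003]))
```

Bagging trains each base learner on its own bootstrap resample and combines them by majority vote, which tends to reduce the variance of an unstable base learner such as a decision tree. That mechanism fits the abstract's finding that Bagging approaches SVM's stability while keeping tree-like training and testing cost; the class labels, feature choice and ensemble size of 11 here are illustrative only.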
Foundation items: National Natural Science Foundation of China (60903166); National Basic Research Program of China (973 Program) (2007CB311101, 2011CB302605); National High Technology Research and Development Program of China (863 Program) (2010AA012504, 2011AA010705)
Citation:

ZHANG Hong-Li, LU Gang. Machine Learning Algorithms for Classifying the Imbalanced Protocol Flows: Evaluation and Comparison. Journal of Software, 2012, 23(6): 1500-1516