Journal of Software:2009.20(10):2692-2704

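(Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; Graduate University, Chinese Academy of Sciences, Beijing 100049, China)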
Internet Traffic Classification Using C4.5 Decision Tree
XU Peng,LIN Sen
Received:October 23, 2007    Revised:August 07, 2008
Abstract: In recent years, Internet traffic classification using machine learning has become a new research direction in network measurement. Because they are simple to implement and efficient at classification, Naïve Bayes and its improved variants have been widely used in this area. However, these methods depend too heavily on the distribution of samples in the sample space, which makes them potentially unstable. To address this problem, this paper introduces a method based on the C4.5 decision tree. The method builds a classification model using the information entropy of the training data set and classifies unknown flows by a simple lookup in the resulting decision tree. Theoretical analysis and experimental results both show that the C4.5 decision tree method has a clear advantage in classification stability for Internet traffic classification.
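As an illustration of the entropy-based model building mentioned in the abstract, the following is a minimal Python sketch of how a C4.5-style split on one continuous flow feature could be scored. It is not the authors' implementation: the feature (`mean_packet_size`), the class labels, the toy values, and all function names are hypothetical, and candidate thresholds are taken as midpoints between adjacent values for simplicity.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # degenerate split carries no information
    total = len(labels)
    weights = [len(left) / total, len(right) / total]
    # Expected entropy remaining after the split.
    remainder = weights[0] * entropy(left) + weights[1] * entropy(right)
    info_gain = entropy(labels) - remainder
    # Split information penalises very unbalanced partitions.
    split_info = -sum(w * math.log2(w) for w in weights)
    return info_gain / split_info


def best_threshold(values, labels):
    """Midpoint between adjacent distinct values with the highest gain ratio."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))


# Hypothetical toy training flows: mean packet size in bytes and a class label.
mean_packet_size = [80, 95, 110, 900, 1200, 1400]
app_class = ["WWW", "WWW", "WWW", "BULK", "BULK", "BULK"]

threshold = best_threshold(mean_packet_size, app_class)


def classify(size):
    """One-level tree lookup: compare the feature against the learned threshold."""
    return "WWW" if size <= threshold else "BULK"
```

Once the threshold is chosen, classifying an unseen flow is a single comparison, which is the "simple lookup" the abstract refers to. A full tree would repeat the same scoring recursively on each partition and over all available features.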
Foundation items: Supported by the National Basic Research Program of China (973 Program) under Grant No.2007CB307100
Reference text:

XU Peng, LIN Sen. Internet Traffic Classification Using C4.5 Decision Tree. Journal of Software, 2009, 20(10):2692-2704