| P.O.Box 8718, Beijing 100080, China | Journal of Software, Sept. 2004,15(9):1328-1335 |
| E-mail: jos@iscas.ac.cn | ISSN 1000-9825, CODEN RUXUEW, CN 11-2560/TP |
| http://www.jos.org.cn | Copyright © 2004 by The Editorial Department of Journal of Software |
Chinese Web Index Page Recommendation Based on Multi-Instance Learning
Li Ming, Xue Xiao-Bing, Zhou Zhi-Hua
(State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, Jiangsu, China)
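About the authors: Li Ming (1980-), male, born in Changsha, Hunan, is a master's student whose research interests are machine learning and data mining; Xue Xiao-Bing (1982-), male, is a master's student whose research interests are machine learning and data mining; Zhou Zhi-Hua (1973-), male, Ph.D., is a professor and doctoral supervisor whose research interests are machine learning, data mining, pattern recognition, information retrieval, and neural computing.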
Corresponding author: Zhou Zhi-Hua, Phn: +86-25-83686268, E-mail: zhouzh@nju.edu.cn, http://cs.nju.edu.cn/people/zhouzh/
Received 2004-02-12; Accepted 2004-05-09
Abstract
Multi-instance learning offers a new approach to mining Chinese web pages. This paper presents a particular web mining task, Chinese web index page recommendation, and addresses it by transforming it into a multi-instance learning problem. Experiments on a real-world dataset show that the proposed method is an effective solution to the Chinese web index page recommendation problem.
Li M, Xue XB, Zhou ZH. Chinese Web index page recommendation based on multi-instance learning. Journal of Software, 2004,15(9):1328~1335.
http://www.jos.org.cn/1000-9825/15/1328.htm
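Abstract (translated from Chinese)
Multi-instance learning offers a new line of approach for Chinese Web mining. This paper proposes a particular Web mining task, Chinese Web index page recommendation, and solves it by transforming it into a multi-instance learning problem. Experimental results on a real-world dataset show that the method solves this problem effectively.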
Foundation item: Supported by the National Natural Science Foundation of China under Grant No. 60105004; the National Outstanding Youth Foundation of China under Grant No. 60325207; the National Grand Fundamental Research 973 Program of China under Grant No. 2002CB312002