| P.O.Box 8718, Beijing 100080, China | Journal of Software, February 2008,19(2):257-266 |
| E-mail: jos@iscas.ac.cn | ISSN 1000-9825, CODEN RUXUEW, CN 11-2560/TP |
| http://www.jos.org.cn | Copyright © 2008 by Journal of Software |
Study on Environmental Changes Processing in Deep Web Integration Based on Knowledge
XU He-Xiang, WANG Xin-Yin, WANG Shu-Yun, HU Yun-Fa
Abstract
Based on a study of the dependencies among components in deep Web integration (execution partial order and knowledge dependency), this paper proposes a knowledge-based method for processing changes in the integration environment. The method comprises an environmental change processing model, a self-adaptive software architecture, and a processing algorithm. It can serve as a reference for further research on large-scale deep Web integration and for moving such integration toward practical application. Experimental results show that the method not only handles changes in the deep Web environment but also substantially improves the performance of the integrated system.
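As an illustration of the execution partial order mentioned in the abstract, the sketch below works out which components would need reprocessing after one of them is affected by an environmental change. It is a minimal sketch under stated assumptions, not the paper's algorithm: the component names, the `DEPENDS_ON` table, and the function `affected_components` are hypothetical, and knowledge dependency is not modelled separately.

```python
from collections import defaultdict, deque

# Hypothetical components and dependencies, chosen only for illustration.
# Each key lists the components that must run before it.
DEPENDS_ON = {
    "interface_extraction": [],
    "schema_matching": ["interface_extraction"],
    "query_translation": ["schema_matching"],
    "result_extraction": ["query_translation"],
}


def affected_components(depends_on, changed):
    """Return `changed` and all components downstream of it, in execution order."""
    # Invert the dependency edges: component -> components that depend on it.
    dependents = defaultdict(list)
    for comp, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(comp)

    # Breadth-first search collects everything reachable from the change.
    reachable = {changed}
    queue = deque([changed])
    while queue:
        cur = queue.popleft()
        for nxt in dependents[cur]:
            if nxt not in reachable:
                reachable.add(nxt)
                queue.append(nxt)

    # Kahn's topological sort, restricted to the affected components,
    # yields an order that respects the partial order.
    indegree = {
        comp: sum(1 for dep in depends_on.get(comp, []) if dep in reachable)
        for comp in reachable
    }
    ready = deque(sorted(comp for comp, n in indegree.items() if n == 0))
    order = []
    while ready:
        cur = ready.popleft()
        order.append(cur)
        for nxt in sorted(dependents[cur]):
            if nxt in reachable:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    ready.append(nxt)
    return order


if __name__ == "__main__":
    print(affected_components(DEPENDS_ON, "schema_matching"))
```

In this toy setup, a change that hits `schema_matching` leaves `interface_extraction` untouched and schedules only the downstream components for reprocessing, which is one way a dependency model can limit the cost of handling a change.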
Xu HX, Wang XY, Wang SY, Hu YF. Study on environmental changes processing in deep Web integration based on knowledge.
Journal of Software, 2008,19(2):257-266.
DOI: 10.3724/SP.J.1001.2008.00257
http://www.jos.org.cn/1000-9825/19/00257.htm
Supported by the National Natural Science Foundation of China under Grant No. 60473070