Journal of Software, August 2008, 19(8): 2018-2031
ISSN 1000-9825, CODEN RUXUEW, CN 11-2560/TP
Copyright © 2008 by Journal of Software
P.O. Box 8718, Beijing 100080, China
E-mail: jos@iscas.ac.cn
http://www.jos.org.cn

Research on Dataspace

Li Yukun, Meng Xiaofeng, Zhang Xiangyu

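(School of Information, Renmin University of China, Beijing 100872, China)
About the authors: Li Yukun (1969-), male, native of Jizhou, Hebei Province, Ph.D. candidate and senior engineer; his research interests are dataspaces and personal information management. Meng Xiaofeng (1964-), male, Ph.D., professor, doctoral supervisor, and CCF senior member; his research interests are Web data integration, XML databases, and mobile data management. Zhang Xiangyu (1986-), male, master's student; his research interest is dataspaces.
Corresponding author: Meng Xiaofeng, E-mail: xfmeng@ruc.edu.cn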
Received 2008-01-20; Accepted 2008-05-19

Abstract
This paper introduces the concept and characteristics of dataspaces and presents a framework for a dataspace integration and management system. Based on this framework, it surveys research on dataspace data models, integration, query, update, storage, indexing, evolution, and system implementations. Challenges and future directions for dataspace research are also analyzed.

Li YK, Meng XF, Zhang XY. Research on dataspace. Journal of Software, 2008,19(8):2018-2031. 
DOI: 10.3724/SP.J.1001.2008.02018
http://www.jos.org.cn/1000-9825/19/2018.htm


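Abstract (Chinese version)
This paper describes the concept of the dataspace and its characteristics, and proposes a framework for a dataspace integration and management system. On this basis, it summarizes and analyzes research on dataspaces from the perspectives of data models, data integration, data query, data update, storage and indexing, data evolution, and system implementation. It also discusses the challenges facing dataspace research and directions for future work.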

Funding: Supported by the National High-Tech Research and Development Plan of China (863 Program) under Grant No.2007AA01Z155; the National Basic Research Program of China (973 Program) under Grant No.2003CB317000; and the Program for New Century Excellent Talents in University of China under Grant No.NCET-04-0051.
