Journal of Software, 2001, 12(2): 167-172

(Department of Computer Science and Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi 710049)
A Practical Algorithm for Converting Unstructured Hypertext to Structured Database
ZHENG Qing-hua, YOU Yuan-xia, YUAN Wen-bin
Received: April 13, 2000    Revised: April 13, 2000
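> Chinese abstract (translated): Hypertext is a kind of unstructured document. Although it does not support cross-page queries or full-text retrieval, it is an important way of organizing and storing information on the Internet. An algorithm for converting hypertext into a structured database is proposed. The requirements of structured hypertext conversion are analyzed, and graph theory is used to analyze and describe the conversion model and its implementation algorithm. The algorithm has been applied and verified in practice in the Lu Xun digital library system.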
Abstract: Hypertext is a kind of unstructured document. It does not support search based on content or topic, yet it is one of the most important ways of storing and organizing information on the Internet. To enable effective management and search of hypertext documents, a new and practical method named HtoDB is presented for converting unstructured hypertext into a structured database. The paper analyzes the requirements and functions of this conversion, and formulates the conversion model and algorithm using graph theory. The model and algorithm have been verified in the “LU XUN digital library system” project.
