Journal of Software:2016.27(5):1114-1126

Interactive Topic Modeling Based on Hierarchical Dirichlet Process
(State Key Laboratory of CAD & CG(Zhejiang University), Hangzhou 310058, China)
Received: July 24, 2015    Revised: November 09, 2015
Abstract: With the rapid development of information technology, large amounts of text data are being produced, collected, and stored. Topic modeling is one of the most important tools in text analysis and is widely used to analyze large text collections. However, topic models usually offer no intuitive and effective way for users to apply their domain knowledge to correct the modeling results. To address this problem, this paper proposes an interactive visual analysis system that helps users refine generated topic models. First, the hierarchical Dirichlet process is modified to support word constraints. Then, the generated topic model is displayed in a matrix view that visually reveals the underlying relationships between words and topics, and a semantic-preserving word cloud layout helps users find word constraints effectively. Users can iteratively refine the topic model by adding word constraints. Finally, the usability of the system is evaluated through case studies and a user study.
Foundation items: National Natural Science Foundation of China (61472354); National High-Tech R&D Program of China (863) (2012AA12A404)
Reference text:


YAN Yu-Yu, TAO Yu-Bo, LIN Hai. Interactive Topic Modeling Based on Hierarchical Dirichlet Process. Journal of Software, 2016, 27(5): 1114-1126