Journal of Software, 2017, 28(2): 246-261

Microblog Community Discovery Algorithm Based on Dynamic Topic Model with Multidimensional Data Fusion
LIU Bing-Yu, WANG Cui-Rong, WANG Cong, WANG Jun-Wei, WANG Xing-Wei, HUANG Min
(School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China; School of Software, Northeastern University, Shenyang 110819, China)
Received:December 26, 2015    Revised:March 17, 2016
Abstract: As the number of microblog users continues to grow, microblog networks have become a platform on which users exchange information. Because posts are limited in length, traditional community detection algorithms cannot effectively handle problems such as the sparsity of microblog networks. To address this, the DC-DTM (discovery community by dynamic topic model) algorithm is proposed in this paper. DC-DTM first maps the microblog network to a directed weighted network in which the direction of an edge reflects the follow relationship between nodes, and the semantic similarity between nodes, computed with the proposed DTM (dynamic topic model), serves as the edge weight. DTM is a microblog topic model that not only mines the topic distribution of posts but also computes each user's influence within a given topic. Second, the proposed low-complexity label propagation algorithm WLPA (weighted label propagation) is used to discover communities in the microblog network. In its initialization phase, the algorithm takes highly influential user nodes as the initial nodes, and labels are then propagated in descending order of node influence. This avoids the backflow phenomenon of traditional label propagation algorithms and improves the stability of label propagation. Experiments on real data show that the DTM model mines microblog topics well and that the DC-DTM algorithm effectively discovers communities in microblog networks.
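The label propagation step described in the abstract can be illustrated with a minimal sketch. The abstract does not give WLPA's exact update rule, so everything below is an assumption made for illustration: the function name `wlpa_sketch`, the rule that a node only accepts labels from accounts it follows that were visited earlier (that is, more influential ones), and the tie-breaking by name. Edge weights stand in for the DTM semantic similarity and influence scores stand in for the DTM influence values; neither is computed here.

```python
from collections import defaultdict


def wlpa_sketch(edges, influence):
    """Illustrative influence-ordered weighted label propagation.

    edges: iterable of (follower, followee, weight) triples; the weight is
        assumed to be a precomputed similarity between the two users.
    influence: dict mapping every node to a numeric influence score.
    Returns a dict mapping each node to a community label.
    """
    # Visit nodes from most to least influential; ties broken by node name.
    order = sorted(influence, key=lambda n: (-influence[n], n))
    rank = {node: i for i, node in enumerate(order)}

    # For every follower, collect the accounts it follows with edge weights.
    followees = defaultdict(list)
    for follower, followee, weight in edges:
        followees[follower].append((followee, weight))

    labels = {}
    for node in order:
        # Sum edge weights per label, counting only followees that were
        # visited earlier, so labels never flow back toward them.
        votes = defaultdict(float)
        for other, weight in followees[node]:
            if rank[other] < rank[node]:
                votes[labels[other]] += weight
        if votes:
            # Adopt the heaviest label; ties broken by label name.
            labels[node] = max(sorted(votes), key=votes.get)
        else:
            # No more influential followee: the node seeds its own community.
            labels[node] = node
    return labels


# Hypothetical toy network: "a" and "d" are the most influential users.
edges = [("b", "a", 0.9), ("c", "a", 0.2), ("c", "d", 0.7), ("e", "d", 0.8)]
influence = {"a": 5, "d": 4, "b": 3, "c": 2, "e": 1}
communities = wlpa_sketch(edges, influence)
```

Because each node depends only on nodes visited before it, this simplified variant finishes in a single deterministic pass, which is one way to read the stability claim in the abstract; the published algorithm may iterate or weight votes differently.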
Foundation items: National Science Foundation for Distinguished Young Scholars of China (61225012, 71325002); National Natural Science Foundation of China (61572123, 61300195); Specialized Research Fund of the Doctoral Program of Higher Education (20120042130003); Liaoning BaiQianWan Talents Program (2013921068); Natural Science Foundation of Hebei Province (F2014501078); Technology Planning Project of Hebei Province (15210146)
Reference text:

LIU Bing-Yu, WANG Cui-Rong, WANG Cong, WANG Jun-Wei, WANG Xing-Wei, HUANG Min. Microblog Community Discovery Algorithm Based on Dynamic Topic Model with Multidimensional Data Fusion. Journal of Software, 2017, 28(2): 246-261