Journal of Software, 2018, 29(2): 396-416

Merging Event Logs for Process Mining with a Hybrid Artificial Immune Algorithm
XU Yang, YUAN Feng, LIN Qi, TANG De-You, LI Dong
(School of Software Engineering, South China University of Technology, Guangzhou 510006, China; Institute of Software Application Technology, Guangzhou & Chinese Academy of Sciences, Guangzhou 511458, China)
Received:October 10, 2016    Revised:December 12, 2016
Abstract: Process mining is an active research topic at the intersection of process management and data mining. In real business environments, the data recorded for a single process execution, which may be supported by several computer systems, is often scattered across different event logs. These logs must be merged into a single event log file before current process mining techniques and tools, which assume a single log, can be applied. The task remains challenging because cases in the two logs may be related many-to-many and because the information needed for merging may be missing. This paper gives a formal definition of event log merging, shows that it is a search and optimization problem, and presents a merging approach based on a hybrid artificial immune algorithm that supports many-to-many relationships between the cases of the two logs. The approach builds on the clonal selection principle: starting from an initial population produced by a heuristic, the matching undergoes iterations of clonal selection, hypermutation, and receptor editing to reach the "best" merging solution. The affinity function evaluates individuals on two case-level factors, the occurrence frequency of process execution paths ("quantity" matching) and the temporal relations between process instances ("time" matching). An immune memory and a simulated annealing mechanism preserve the diversity of each new generation and reduce the risk of premature convergence to local optima. Experimental results show that the hybrid algorithm can merge event logs with many-to-many case relationships, and that the heuristic initial population speeds up the immune evolution compared with a randomly generated one. The paper also discusses how to partition the data for distributed merging of large-scale event logs, with the aim of improving merging performance.
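The loop described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the toy logs `LOG_A` and `LOG_B`, the form of `affinity`, the single-link `mutate` step, and all parameter values are assumptions made for illustration. The initial population is random here, whereas the paper uses a heuristic, and receptor editing is omitted.

```python
import math
import random

# Hypothetical toy logs: case id -> (start time, end time, trace of activities).
LOG_A = {
    "a1": (0, 10, ("order", "ship")),
    "a2": (20, 30, ("order", "ship")),
    "a3": (40, 50, ("order", "cancel")),
}
LOG_B = {
    "b1": (2, 8, ("pay",)),
    "b2": (22, 28, ("pay",)),
    "b3": (42, 44, ("refund",)),
}
# Every possible link between a case in LOG_A and a case in LOG_B.
ALL_PAIRS = [(a, b) for a in LOG_A for b in LOG_B]


def affinity(solution):
    """Score a set of (case_a, case_b) links; higher is better (assumed form)."""
    if not solution:
        return 0.0
    # "Time" factor: +1 if the B case runs inside the A case's interval, else -1.
    time_score = 0
    for a, b in solution:
        (sa, ea, _), (sb, eb, _) = LOG_A[a], LOG_B[b]
        time_score += 1 if sa <= sb and eb <= ea else -1
    # "Quantity" factor: reward links whose merged trace variant recurs.
    variants = [LOG_A[a][2] + LOG_B[b][2] for a, b in solution]
    freq_score = sum(variants.count(v) - 1 for v in variants) / len(variants)
    return time_score + freq_score


def mutate(solution, rng):
    """Hypermutation stand-in: toggle one randomly chosen link."""
    return solution ^ frozenset({rng.choice(ALL_PAIRS)})


def merge(generations=200, pop_size=6, clones=4, seed=0):
    rng = random.Random(seed)
    # Random initial population (assumption; the paper seeds it heuristically).
    population = [frozenset(rng.sample(ALL_PAIRS, 3)) for _ in range(pop_size)]
    memory = max(population, key=affinity)  # immune memory: best antibody so far
    temperature = 1.0
    for _ in range(generations):
        next_population = []
        for antibody in population:
            # Clone the antibody, mutate each clone, keep the fittest clone.
            best_clone = max(
                (mutate(antibody, rng) for _ in range(clones)), key=affinity
            )
            delta = affinity(best_clone) - affinity(antibody)
            # Simulated-annealing acceptance: a worse clone is sometimes kept.
            if delta >= 0 or rng.random() < math.exp(delta / temperature):
                antibody = best_clone
            next_population.append(antibody)
        population = next_population
        memory = max(population + [memory], key=affinity)
        temperature = max(temperature * 0.97, 1e-3)  # cooling schedule
    return memory


if __name__ == "__main__":
    best = merge()
    print(sorted(best), affinity(best))
```

Because a solution is a set of links rather than a one-to-one assignment, the same case may appear in several links, which is how this sketch admits many-to-many matches. The memory cell is never replaced by a worse antibody, so the returned solution is the best one observed during the run.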
CLC number: TP181
Foundation items:National Natural Science Foundation of China (71090403); Science and Technology Planning Projects of Guangdong Province (2014B090901001, 2015B010103002, 2016B090918062, 2016B050502001); Science and Technology Planning Projects of Guangzhou City (201604010127); Special Funds on "985 Project" Disciplinary Construction in School of Software Engineering of South China University of Technology (x2rjD615015Ⅲ)
Reference text:
XU Yang, YUAN Feng, LIN Qi, TANG De-You, LI Dong. Merging Event Logs for Process Mining with a Hybrid Artificial Immune Algorithm. Journal of Software, 2018, 29(2): 396-416