《软件学报》《软件学报》软件学报Journal of Software1000-98251000-9825《软件学报》编辑部10.13328/j.cnki.jos.004972TP311论文自动程序修复方法研究进展Progress on Approaches to Automatic Program Repair玄跻峰*12jxuan@whu.edu.cn任志磊3王子元4谢晓园12江贺3XUANJi-Feng*12jxuan@whu.edu.cnRENZhi-Lei3WANGZi-Yuan4XIEXiao-Yuan12JIANGHe3软件工程国家重点实验室(武汉大学), 湖北 武汉 430072;武汉大学 计算机学院, 湖北 武汉 430072;大连理工大学 软件学院, 辽宁 大连 116621;南京邮电大学 计算机学院, 江苏 南京 210006State Key Laboratory of Software Engineering(Wuhan University), Wuhan 430072, China;Computer School, Wuhan University, Wuhan 430072, China;School of Software, Dalian University of Technology, Dalian 116621, China;School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing 210006, China玄跻峰(1984-),男,黑龙江哈尔滨人,博士,研究员,CCF会员,主要研究领域为软件分析与测试,软件数据分析,基于搜索的软件工程,E-mail:jxuan@whu.edu.cn任志磊(1984-),男,博士,讲师,CCF会员,主要研究领域为演化计算,基于搜索的软件工程;王子元(1982-),男,博士,副教授,CCF会员,主要研究领域为软件测试;谢晓园(1983-),女,博士,教授,CCF会员,主要研究领域为软件测试调试,程序分析,基于搜索的软件工程;江贺(1980-),男,博士,教授,博士生导师, CCF高级会员,主要研究领域为基于搜索的软件工程,软件仓库挖掘.2542016274771784020920151510201520112015
Automatic program repair helps developers reduce the cost of manual bug fixing. Approaches to test-suite based repair aim to generate code patches to pass the test suite as well as maintain the program execution. This paper reviews available literature on test-suite based repair and report the progress in two directions:Approaches to automatic repair and empirical foundations. First, existing approaches to automatic repair are described in three categories:Search based, exhaustion based, and constraint-solving based patch generation. Second, empirical foundations on repair are detailed, including the argumentation in the research field. Related techniques are then briefly introduced as the supplementation of program repair. Finally, opportunities and challenges are presented to summarize this review.
自动修复遗传规划基于搜索的软件工程测试集实证基础automatic repairgenetic programmingsearch based software engineeringtest suiteempirical foundation
为避免歧义,若无特别说明,本文的研究主题(即自动程序修复)专指基于测试集的修复(test-suite based repair)方法[9].在基于测试集的修复中,测试集(test suite)用于描述并限定程序的正确行为(program behavior).任何自动生成的补丁应至少保障测试集正确运行(passing the test suite),以确认未违反程序的需求说明.自动程序修复研究中的绝大多数成果都属于基于测试集的修复领域.
(1) 文献检索的主要途径是谷歌学术搜索引擎(Google scholar)以及ACM(ACM digital library)、IEEE (IEEE xplore digital library)和Springer(Springer link)的论文数据库.检索过程中倾向于软件工程的主流会议期刊(如ICSE,FSE等),并包括系统和程序语言相关会议期刊(如PLDI,OOPSLA等);
(4) 收入本文的文献类型包括期刊的长文和短文、会议的research track,experience track,industry track,new idea and emerging result track和short paper track以及技术报告(technical report).技术报告是该领域文献的重要组成部分,由于该领域发展活跃,部分论文尚未进入发表阶段,常以技术报告的形式呈现.但文献不包含会议的poster track,doctoral symposium,student competition,原因是这些部分的正文较短,不足以了解文章内容;同时也不包括博士论文(Ph.D. dissertation),因为所涉及的博士论文的内容大部分已经以会议或期刊论文的形式发表;
2012年,Qi等人[12, 13]提出通过弱重编译(weak recompilation)的方法优化GenProg的修复效率,能够较早地找到补丁.其后的2013年,他们[14]应用故障记录的测试用例排序(fault-recorded test prioritization)方法进一步提高了修复效率.此外,他们[15]还通过程序修复的结果来评估故障定位方法.
2012年,该类算法的先驱,Nguyen等人提出SemFix[22],一种C程序的基于约束的语义修复算法.该算法将测试用例转换为约束,并应用SMT(satisfiability modulo theories)求解器求解,最终转换为补丁并输出.生成补丁的过程源自基于组件的程序合成算法(component based program synthesis)[37],该算法将备选语句作为组件,提取预期的输入输出,并建立约束求解模型.另外,SemFix采用Tarantula算法[38]定位潜在的bug位置.与基于搜索的方法,如GenProg不同,SemFix不需要为生成的补丁调用测试用例验证修复性;约束机制已将测试用例作为输入,并进而转化为解输出.
2012年,Le Goues等人[42]发表了题为《每个8美元,修复105个bug中的55个》(fixing 55 out of 105 bugs for $8 each)的文章,首次以系统研究的形式展示了早期算法GenProg修复C程序bug的能力.该工作表明,GenProg可以自动修复超过半数的bug并消耗非常小的成本(相对于开发者人工修复的成本).随后的2013年,Fry等人[43]通过人员研究(human study)的形式探索补丁的可维护性(patch maintainability).该方法表明,自动生成的补丁仍具有良好的可读性.同是2013年,Le Goues等人[44]发表了展望文章,描述了自动程序修复面临的挑战.该文很好地总结了GenProg相关算法的优劣和当时程序修复的困难.
在2014年以前,程序修复研究以设计新的方法为主.这期间涌现了许多优秀的修复算法,例如GenProg[5],AE[20],RSRepair[16]等.其中,2013年的方法Par[6],即通过学习人工补丁模式指导补丁生成的方法,获得软件工程顶级会议ICSE的最佳论文奖(distinguished paper award).当时,该方法首次融合了自动算法GenProg和人工补丁模式来生成最终的补丁.
然而,在2014年,Monperrus[9]在ICSE上发表了题为《严重审视“从人工补丁学习自动化代码生成”》(a critical review of automatic patch generation learned from human-written patches)的文章.该文章针对Par[6]方法,批评Par所属论文的算法和实验中的不足,进而引申到如何划定程序修复问题和评价算法的探讨中.在后续的讨论中,Monperrus申明了基于测试集的程序修复研究的内容(即,修复算法核心为:在给定测试集的情况下获得高质量补丁,而低质量的测试集并非算法缺陷)和算法评价标准(即,算法比较应基于同样的数据集和同样的缺陷类型)等.尽管研究者对此文评价不一,但该文开启了基于测试集的程序修复研究自2008年以来持续6年研究后的第1次大规模学术争论.研究者们开始关注非算法部分,并深入探索问题的本质和修复实例的基础.
同是2014年,Martinez等人[47]和Barr等人[48]在互不知情的情况下,同时发表了两篇关于程序修复基础的文章.Martinez等人[47]发表了题为《代码修复的原料存在吗?》(do the fix ingredients already exist?)的文章,以实证调查(empirical inquiry)的形式探索了冗余假设(redundancy assumption)的存在性.所谓的“冗余假设”是指自动生成的补丁一定存在于程序的其他某处,而修复算法是将其他位置的代码重用(reuse)或组合(mix)而形成补丁的.早期算法,如GenProg,AE,RSRepair等,都基于该假设.文献[47]表明:冗余假设确实一定程度地存在着,但并非完全存在.几乎同一时间,Barr等人[48]发表了题为《整形外科手术猜想》(the plastic surgery hypothesis)的文章,探索程序修复的基础.所谓“整形外科手术猜想”与“冗余假设”类似,是指补丁中的代码能够从程序的其他位置(像植皮技术一样)移动到修复bug的位置.该文设计了复杂的实验,深入地讨论了猜想的存在性.
直至2015年,Qi等人[10]发表了题为《貌似可信和正确的补丁的分析》(an analysis of patch plausibility and correctness)的文章,手工检查了早期算法在105个真实的C程序bug的数据集上的修复结果.结果发现:由于实验设置错误,在GenProg文章[42]报道的55个可修复的bug中,有37个为修复后的程序是无法通过全部测试用例的;而在AE文章[20]报道的54个可修复的bug中,有27个为修复后的程序是无法通过全部测试用例的.此外,基于人工验证:在全部105个bug中,GenProg的修复结果只有2个是语义正确的,而AE的结果只有3个是语义正确的;其他的修复均无法满足原有程序的需求说明.文献[20]的发表在领域内引起轩然大波,相关研究者不得不重新检验相关结果的正确性.在2014年的第1次学术争论之后,该文引发了领域内的第2次集体争论,进而导致研究者深度关注对于修复真实bug的可行性的研究.
同年,Smith等人[11]发表了题为《适得其反?》(is the cure worse than the disease?)的文章,研究修复算法的过拟合(overfitting).该文设计了可控实验,通过新开发者引入的bug和补丁,分析影响GenProg和RSRepair算法效果的因素,并指出修复算法对于测试用例的过拟合行为.同年,Le Goues等人[49]介绍了两个C语言的被修复程序集合ManyBugs和IntroClass,共包含1 141个缺陷.基于该数据集,他们设计了量化实验,评估已有方法的修复效果.
Dallmeier等人[54]提出了Pachika,一种检测程序对象非正常行为(object behavior anomaly detection)的修复方法.该方法识别两个对象间的不同,进而增加或者删除方法调用.Carzaniga等人[55, 56]开发了一种用于避免Web应用错误的自动方法.该方法旨在找到可运行的程序变种(program variant),作为自动的临时变通版本(automatic workaround).Pei等人[57]设计了AutoFix,一个基于契约的修复(contract based repair)系统.该方法需要程序契约中的规约信息作为输入,例如函数的前置和后置条件等,可用于Effiel语言的bug修复.Tan和Roychoudhury[58]的近期工作Relifix针对程序回归中的问题进行修复.
Sidiroglou-Douskos等人[59]提出了CodePhage,一种通过从被修复应用中定位缺陷,并从其他应用迁移代码(code transfer)来消除缺陷的算法.该算法依赖于作为输入的应用中能消除缺陷且成熟的代码,可修复指定缺陷,如整数溢出、缓冲区溢出及零除数等.Raychev等人[60]提出了JSNice,一种面向JavaScript语言的自动程序美化工具.该工具通过从海量代码库学习代码特性,为程序预测变量名或生成一定的注释.该工具的特点是自动化,且生成的代码及注释具有一定程度的语义信息.另外一组相关工作是测试用例修复(test case repair)[61-64],旨在自动修复由于代码更新造成的测试用例无法正常运行的问题[65].
故障定位旨在通过收集测试用例运行信息,推断潜在的bug所处的位置.自动修复算法中借助已有的故障定位技术,根据疑似故障的风险给出语句的排列,进而逐个尝试修复.基于程序谱的故障定位(spectrum based fault localization)方法是其中一类常见的方法.故障定位的成果浩如烟海,这里只列出部分近期工作.
测试用例增强(test case augmentation)是近期涌现的一系列工作.尽管测试用例生成技术的研究历史悠久,但受限于测试路径组合爆炸及面向对象程序的复杂状态等问题,测试用例生成技术尚不能满足所有应用的需求.测试用例增强旨在基于给定的测试用例,获得更好的测试效果,如生成可以弥补当前测试用例不足的新测试用例或更好地使用当前测试用例.
Xu等人[74, 75]提出了早期的直接测试用例增强(directed test suite augmentation)技术,该方法在当前测试用例集的基础上生成新的测试用例,进而提高测试用例的覆盖率等指标.Yoo和Harman[76]设计了测试重生成(test data regeneration)算法.该方法对于一个测试集,基于遗传算法生成全新的测试用例,用以弥补已有测试用例的不足.Xuan和Monperrus[77]提出了测试用例提纯(test case purification)方法.该方法能够分离执行失败的测试用例,并切分为多个子测试用例,优化测试用例的执行,进而增强其识别故障的能力.
ReferencesPressmanRSSoftware Engineering:A Practitioner's Approach2010437443XieXChenTYKuoFCXuBA theoretical analysis of the risk evaluation formulas for spectrum-based fault localization201322431WenWZLiBXSunXBLiuCCTechnique of software fault localization based on hierarchical slicing spectrum2013245977992文万志李必信孙小兵刘翠翠一种基于层次切片谱的软件错误定位技术2013245977992XuanJJiangHRenZZouWDeveloper prioritization in bug repositories20122535LeGoues CNguyenTForrestSWeimerWGenProg:A generic method for automatic software repair2012385472KimDNamJSongJKimSAutomatic patch generation learned from human-written patches2013802811HarmanMMansouriAZhangYSearch based software engineering:Trends, techniques and applications201245111WeimerWNguyenTLeGoues CForrestSAutomatically finding patches using genetic programming2009364367MonperrusMA critical review of automatic patch generation learned from human-written patches:Essay on the problem statement and the evaluation of automatic software repair2014234242QiZLongFAchourSRinardMAn analysis of patch plausibility and correctness for generate-and-validate patch generation systems20152436SmithEKBarrELeGoues CBrunYIs the cure worse than the disease? Overfitting in automated program repair2015532543QiYMaoXLeiYMaking automatic repair for large-scale programs more efficient using weak recompilation2012254263QiYMaoXWenYDaiZGuBMore efficient automatic repair of large-scale programs using weak recompilation2012551227852799QiYMaoXLeiYEfficient automated program repair through fault-recorded testing prioritization2013180189QiYMaoXLeiYWangCUsing automated program repair for evaluating the effectiveness of fault localization techniques2013191201QiYMaoXLeiYDaiZWangCThe strength of random search on automated program repair2014254265ZhongHSuZAn empirical study on real bug fixes2015913923HarmanMJonesBSearch based software engineering20114314833839JiangHXuanJRenZApproximate backbone based multilevel algorithm for next release problem201013331340WeimerWFryZPForrestSLeveraging program equivalence for adaptive program repair:Models and first results2013356366DebroyVWongWEUsing mutation to automatically suggest fixes for faulty programs20106574NguyenHDTQiDRoychoudhuryAChandraSSemFix:Program repair via semantic analysis2013772781MechtaevSYiJRoychoudhuryADirectFix:Looking for simple program repairs2015448458DeMarcoFXuanJLeBerre DMonperrusMAutomatic repair of buggy if conditions and missing preconditions with SMT20143039XuanJFMartinezMDeMarcoFClémentMLamelas-MarcoteSDurieuxTLeBerre DMonperrusMNopol:Automatic repair of conditional statement bugs in Java programs2015122MartinezTDurieuxTXuanJFMonperrusMAutomatic repair of real bugs in Java:A large-scale experiment on the Defects4J dataset2015111ArcuriAYaoXA novel co-evolutionary approach to automatic software bug fixing2008162168ForrestSWeimerWNguyenTLeGoues CA genetic programming approach to automated software repair2009947954WeimerWForrestSLeGoues CNguyenTAutomatic program repair with evolutionary computation2010535109116FastELeGoues CForrestSWeimerWDesigning better fitness functions for automated program repair2010965972LeGoues CWeimerWForrestSRepresentations and operators for improving evolutionary software repair2012959966SchulteEForrestSWeimerWAutomated program repair through the evolution of assembly code2010313316MartinezMMonperrusMMining software repair models for reasoning on the search space of automated program fixing2015201176205LongFRinardMProphet:Automatic patch generation via learning from successful patches2015111JiaYHarmanMAn analysis and survey of the development of mutation testing2011375649678LongFRinardMStaged program repair with condition synthesis2015166178JhaSGulwaniSSeshiaSATiwariAOracle guided component-based program synthesis2010215224JonesJAHarroldMJEmpirical evaluation of the tarantula automatic fault-localization technique2005273282ZhangXGuptaNGuptaRLocating faults through automated predicate switching2006272281AbreuRZoeteweijPGemundAGVOn the accuracy of spectrum-based fault localization20078998KeYStoleeKTLeGoues CBrunYRepairing programs with semantic code search2015295306LeGoues CDewey-VogtMForrestSWeimerWA systematic study of automated program repair:Fixing 55 out of 105 bugs for $8 each2012313FryZPLandauBWeimerWA human study of patch maintainability2012177187LeGoues CForrestSWeimerWCurrent challenges in automatic software repair2013213421443TaoYKimJKimSXuCAutomatically generated patches as debugging aids:A human study20146474TaoYHanDKimSWriting acceptable patches:An empirical study of open source project patches2014271280MartinezMWeimerWMonperrusMDo the fix ingredients already exist? An empirical inquiry into the redundancy assumptions of program repair approaches2014492495BarrETBrunYDevanbuPTHarmanMSarroFThe plastic surgery hypothesis2014306317LeGoues CHoltschulteNSmithEKBrunYDevanbuPForrestSWeimerWThe ManyBugs and IntroClass benchmarks for automated repair of C programs2015MartinezMMonperrusMAstor:Evolutionary automatic software repair for Java201516JustRJalaliDErnstMDDefects4J:A database of existing faults to enable controlled testing studies for Java programs2014437440JobstmannBGriesmayerABloemRProgram repair as a game2005226238PerkinsJHKimSLarsenSAmarasingheSBachrachJCarbinMPachecoCSherwoodFSidiroglou-DouskosSSullivanGWongWFZibinYErnstMDRinardMAutomatically patching errors in deployed software200987102DallmeierVZellerAMeyerBGenerating fixes from object behavior anomalies2009550554CarzanigaAGorlaAMattavelliAPerinoNPezzèMAutomatic recovery from runtime failures2013782791CarzanigaAGorlaAMattavelliAPerinoNPezzèMAutomatic workarounds for Web applications2010237246PeiYFuriaCANordioMWeiYMeyerBZellerAAutomated fixing of programs with contracts2014405427449TanSHRoychoudhuryARelifix:Automated repair of software regressions2015471482Sidiroglou-DouskosSLahitnenELongFRinardMAutomatic error elimination by horizontal code transfer across multiple applications20154354RaychevVVechevMKrauseAPredicting program properties from "big code"2015111124DanielBJagannathVDigDMarinovDReassert:Suggesting repairs for broken unit tests2009433444MirzaaghaeiMPastoreFPezzèMSupporting test suite evolution through test case adaptation2012231240MirzaaghaeiMPastoreFPezzèMAutomatic test case evolution2014245386411GaoZChenZZouYMemonASitar:GUI test script repair2016422170186ZhangZYChenZYXuBWYangRResearch progress on test case evolution2013244663674张智轶陈振宇徐宝文杨瑞测试用例演化研究进展2013244663674ZhangZChanWKTseTHFault localization based only on failed runs20124566471LuciaLLoDJiangLThungFBudiAExtended comprehensive study of association measures for fault localization2014262172219MasriWAssiRAPrevalence of coincidental correctness and mitigation of its impact on fault localization20142318XuanJFMonperrusMLearning to combine multiple ranking metrics for fault localization2014191200JiangBZhaiKChanWKTseTHZhangZOn the adoption of MC/DC and control-flow adequacy for a tight integration of program testing and statistical fault localization201355897917DebroyVWongWECombining mutation and fault localization for automated program debugging2014904560XuJZhangZChanWKTseTHLiSA general noise-reduction framework for fault localization of Java programs201355880896XuanJFXieXMonperrusMCrash reproduction via test case mutation:Let existing test cases help2015910913XuZKimYKimMRothermelGCohenMBDirected test suite augmentation:Techniques and tradeoffs2010257266XuZCohenMBRothermelGFactors affecting the use of genetic algorithms in test suite augmentation201013651372YooSHarmanMTest data regeneration:Generating new test data from existing test data2012223171201XuanJFMonperrusMTest case purification for improving fault localization20145263