面向神经机器翻译系统的多粒度蜕变测试

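About the authors:

Zhong Wenkang (钟文康, 1997-), male, bachelor's degree. Main research interests: software engineering and natural language processing.
Ge Jidong (葛季栋, 1978-), male, Ph.D., associate professor, CCF senior member. Main research interests: software engineering, distributed computing and edge computing, business process management, and natural language processing.
Chen Xiang (陈翔, 1980-), male, Ph.D., associate professor, CCF senior member. Main research interests: software defect prediction, software defect localization, regression testing, and combinatorial testing.
Li Chuanyi (李传艺, 1991-), male, Ph.D., assistant researcher, CCF professional member. Main research interests: software engineering, business process management, and natural language processing.
Tang Ze (唐泽, 1994-), male, master's degree. Main research interests: code summarization and API completion.
Luo Bin (骆斌, 1967-), male, Ph.D., professor, doctoral supervisor, CCF distinguished member. Main research interests: software engineering and artificial intelligence.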

Corresponding author:

Li Chuanyi, E-mail: lcy@nju.edu.cn

CLC number:

TP311

Multi-granularity Metamorphic Testing for Neural Machine Translation System

Fund Project:

National Natural Science Foundation of China (61802167, 61972197, 61802095); Natural Science Foundation of Jiangsu Province of China (BK20201250)

    Abstract:

    Machine translation is the task of using computers to convert one natural language into another, and it is one of the most active research topics in artificial intelligence. In recent years, with the development of deep learning, neural machine translation (NMT) models based on the sequence-to-sequence architecture have outperformed statistical machine translation models on many language pairs and have been widely deployed in commercial translation systems. Although the practical results of commercial systems suggest that NMT models have improved substantially, systematically evaluating their translation quality remains challenging. On the one hand, evaluation based on reference translations depends on high-quality references, which are very costly to obtain. On the other hand, NMT models have more pronounced robustness problems than statistical machine translation models, yet the robustness of NMT models has not been studied. To address these challenges, this study proposes MGMT, a multi-granularity testing framework based on metamorphic testing, which evaluates the translation quality and robustness of NMT systems without reference translations. The framework first replaces parts of the source sentence at the sentence, phrase, and word granularities. It then computes the similarity between the translation of the source sentence and the translation of each replaced sentence, based on edit distance and the constituency parse tree. Finally, it uses this similarity to judge whether the translations satisfy the metamorphic relation. Comparative experiments with different metamorphic testing frameworks were conducted on six mainstream commercial NMT systems, using Chinese-English datasets from five domains: education, microblogs, news, spoken language, and subtitles. The results show that, measured by correlation with reference-based methods, MGMT's Pearson and Spearman correlation coefficients are 80% and 20% higher, respectively, than those of comparable methods. This indicates that the proposed reference-free evaluation method has a stronger positive correlation with reference-based evaluation, which supports the conclusion that its evaluation accuracy is significantly better than that of other methods of the same type.
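The replace-translate-compare loop described in the abstract can be illustrated with a minimal sketch. It is not the MGMT implementation: it covers only a word-level replacement and a token-level edit-distance similarity, it omits the constituency-parse-tree comparison entirely, and the names `edit_distance`, `similarity`, `satisfies_relation`, `toy_translate`, and the threshold of 0.5 are all assumptions made for illustration.

```python
from typing import Callable, List


def edit_distance(a: List[str], b: List[str]) -> int:
    """Token-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(t1: str, t2: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical token sequences."""
    a, b = t1.split(), t2.split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def satisfies_relation(translate: Callable[[str], str],
                       source: str,
                       mutated: str,
                       threshold: float = 0.5) -> bool:
    """Assumed metamorphic relation: a small change to the source
    should produce a translation that stays similar to the original."""
    return similarity(translate(source), translate(mutated)) >= threshold


# Hypothetical stand-in for a translation system, used only for this example.
_TOY_OUTPUTS = {
    "我喜欢苹果": "I like apples",
    "我喜欢香蕉": "I like bananas",
    "我喜欢橙子": "Nobody expected oranges to appear here",
}


def toy_translate(sentence: str) -> str:
    return _TOY_OUTPUTS[sentence]


# One replaced word, one changed token in the output: relation holds.
print(satisfies_relation(toy_translate, "我喜欢苹果", "我喜欢香蕉"))
# One replaced word, completely different output: relation is violated.
print(satisfies_relation(toy_translate, "我喜欢苹果", "我喜欢橙子"))
```

In this sketch a violated relation flags a suspicious translation without consulting any reference translation, which is the property the abstract emphasizes. A real system would need a tokenizer, a principled replacement strategy at each granularity, and a calibrated threshold.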

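Cite this article

Zhong Wenkang, Ge Jidong, Chen Xiang, Li Chuanyi, Tang Ze, Luo Bin. Multi-granularity Metamorphic Testing for Neural Machine Translation System. Journal of Software (软件学报), 2021, 32(4): 1051-1066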

History
  • Received: 2020-09-12
  • Last revised: 2020-10-26
  • Published online: 2021-01-22
  • Publication date: 2021-04-06