Journal of Software:2020.31(2):302-320

Survey of Data Annotation
CAI Li, WANG Shu-Ting, LIU Jun-Hui, ZHU Yang-Yong
(School of Software, Yunnan University, Kunming 650091, China; School of Computer Science, Fudan University, Shanghai 200433, China)
Received: June 22, 2019    Revised: September 17, 2019
Abstract: Data annotation is a key step in making most artificial intelligence algorithms work effectively. The more accurate the annotations and the larger the volume of annotated data, the better the algorithm performs. The growth of the data annotation industry has boosted employment in many cities and towns in China, and China is gradually becoming the world's center for data annotation. This paper summarizes the development of data annotation, including its origin, application scenarios, classifications, and tasks; lists the commonly used annotated data sets, open source data annotation tools, and commercial annotation platforms; proposes a data annotation specification covering roles, standards, and processes; and gives an example of data annotation in a sentiment analysis scenario. It then describes the mainstream algorithms for evaluating annotation quality and their characteristics, and compares their advantages and disadvantages. Finally, it discusses research directions and development trends in data annotation from four aspects: tasks, tools, annotation quality, and security.
Keywords: data annotation; artificial intelligence; crowdsourcing; big data
Foundation items: National Natural Science Foundation of China (61663047, U1636207); Project of Yunnan University Serves Yunnan Initiatives (2016ZD05)
Reference text:

CAI Li, WANG Shu-Ting, LIU Jun-Hui, ZHU Yang-Yong. Survey of Data Annotation. Journal of Software, 2020, 31(2): 302-320