Research on Text Representation in Natural Language Processing
Author:
Affiliation:

Clc Number:

TP391

Fund Project:

National Natural Science Foundation of China (61773276; 61836007)

  • Article
  • |
  • Figures
  • |
  • Metrics
  • |
  • Reference
  • |
  • Related
  • |
  • Cited by
  • |
  • Materials
  • |
  • Comments
    Abstract:

    Natural language processing is the core technology of artificial intelligence. Text representation is the basic and necessary work of natural language processing, which affects or even determines the quality and performance of natural language processing systems. This study discusses the basic principle of text representation, the formalization of natural language, the language model, and the connotation and extension of text representation. The technical classification of text representation on a macro level is analyzed. The mainstreams of text representation technologies and methods are analyzed, induced and summarized, including vector space model, topic model, graph-based model, neural network-based model, and representation learning. Event-based, semantic-based, and knowledge-based text representation technologies are also introduced. The development trends and directions of text representation technology are predicted and further discussed. Neural network-based deep learning and representation learning on text will play an important role in natural language processing. The strategy of pre-training and fine-tune optimization will gradually become the mainstream technology. Text representation needs specific analysis according to specific problems. The integration of technology and application is the driving force.

    Reference
    Related
    Cited by
Get Citation

赵京胜,宋梦雪,高祥,朱巧明.自然语言处理中的文本表示研究.软件学报,2022,33(1):102-128

Copy
Share
Article Metrics
  • Abstract:
  • PDF:
  • HTML:
  • Cited by:
History
  • Received:December 01,2020
  • Revised:January 10,2021
  • Adopted:
  • Online: April 21,2021
  • Published: January 06,2022
You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063