Spoken language understanding (SLU) is a core component of task-oriented dialogue systems and aims to extract the semantic frame of a user query. In a dialogue system, the SLU component identifies the user's request and builds a semantic frame that summarizes the user's needs. SLU usually comprises two subtasks: intent detection (ID) and slot filling (SF). ID is a semantic utterance classification problem that analyzes the semantics of an utterance at the sentence level, while SF is a sequence labeling task that analyzes the semantics of an utterance at the word level. Because intents and slots are closely correlated, mainstream work adopts joint models to exploit knowledge shared across the two tasks. However, ID and SF are two distinct yet strongly correlated tasks that capture sentence-level semantic information and word-level information of an utterance, respectively, which means the information of the two tasks is heterogeneous and of different granularities. This study proposes a heterogeneous interaction structure for joint intent detection and slot filling, which combines self-attention and graph attention networks to fully capture the relationship between the sentence-level semantic information and word-level information of the two correlated tasks. Unlike ordinary homogeneous structures, the proposed model is a heterogeneous graph architecture containing different types of nodes and links; such a graph carries more comprehensive information and richer semantics, and better supports interactive representation between nodes of different granularities. In addition, a window mechanism is used to represent word-level embeddings accurately, so as to better accommodate the local continuity of slot labels. The pre-trained model BERT is also incorporated, and the effect of applying it to the proposed model is analyzed.
Experimental results on two public datasets show that the proposed model achieves accuracies of 97.98% and 99.11% on the ID task and F1 scores of 96.10% and 96.11% on the SF task, outperforming current mainstream methods.
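The abstract does not specify how the window mechanism is implemented; as an illustration only, a minimal sketch (assuming each word representation is the concatenation of its own embedding with its k left and right neighbors, zero-padded at sentence boundaries — the function name and parameters are hypothetical, not the authors') could look like:

```python
import numpy as np

def window_embed(embeddings: np.ndarray, k: int = 1) -> np.ndarray:
    """Concatenate each word embedding with its k left/right neighbors.

    embeddings: (seq_len, dim) word-level embeddings.
    Returns: (seq_len, (2*k + 1) * dim) windowed representations,
    with zero padding for out-of-range positions at the boundaries.
    """
    seq_len, _ = embeddings.shape
    padded = np.pad(embeddings, ((k, k), (0, 0)))  # zeros at both ends
    # For offset j, padded[j : j + seq_len] holds neighbor (j - k) of every word,
    # so stacking offsets 0 .. 2k side by side yields [left .. center .. right].
    return np.concatenate(
        [padded[j : j + seq_len] for j in range(2 * k + 1)], axis=1
    )

# Example: 4 words with 3-dim embeddings, one neighbor on each side.
x = np.arange(12, dtype=float).reshape(4, 3)
w = window_embed(x, k=1)
print(w.shape)  # (4, 9): each word now carries a 3-word local context
```

Such a local-context representation reflects the intuition stated in the abstract — slot labels (e.g. B-/I- segments) are locally continuous, so a word's tag depends strongly on its immediate neighbors.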