Extracting Protocol Interactions via LLM Based on Linguistic Expression Pattern Analysis

doi:10.13328/j.cnki.jos.007598

CLC number: TP311
Funding: National Defense Basic Scientific Research Program of China (JCKY2022605C006)

Abstract:

Protocol interaction extraction from text aims to identify and extract protocol-related interaction information from specification documents written in natural language. The extracted interactions can be used to build models that verify a protocol's correctness before it is implemented in code, or to generate test cases directly from the protocol specification. Existing approaches rely mainly on deep learning or large language models (LLMs). Deep learning approaches require large-scale, high-quality annotated datasets, and their applicability is bounded by the training data, which makes them hard to transfer to protocols in other domains. LLM-based approaches generalize better, but existing work uses simple prompt templates and examples and leaves the processing pipeline largely unoptimized, which limits extraction effectiveness. To address these problems, this study proposes an enhanced LLM-based method for extracting protocol interactions, based on linguistic expression pattern analysis. First, real-world protocol descriptions are analyzed to summarize the linguistic expression patterns common in such texts. Then, representative description examples exhibiting these patterns are selected, and extraction rules are distilled from them. Next, the examples and rules are combined into a rule retrospection chain-of-thought method. Finally, multi-path inference and self-verification are used to optimize the task execution process. Experiments on multiple protocol datasets show that the proposed method outperforms the baseline methods in precision, recall, and other measures of protocol interaction extraction, confirming its effectiveness.
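The abstract names multi-path inference and self-verification without describing their mechanics. As a rough illustration only, and not the authors' implementation, the sketch below assumes that an interaction can be represented as a (sender, receiver, message) triple, that several independent extraction passes are combined by voting, and that each surviving triple is then checked against the source text. The names `multi_path_extract`, `fake_extract`, and `fake_verify` are hypothetical; the two stubs stand in for LLM calls so that the example runs without any model.

```python
from collections import Counter
from typing import Callable

# Assumed representation of one protocol interaction: (sender, receiver, message).
Interaction = tuple[str, str, str]


def multi_path_extract(
    text: str,
    extract_once: Callable[[str, int], list[Interaction]],
    verify: Callable[[str, Interaction], bool],
    n_paths: int = 5,
    min_votes: int = 3,
) -> list[Interaction]:
    """Run several extraction passes, keep triples that enough passes agree on,
    then drop any triple that fails the verification check."""
    votes: Counter = Counter()
    for path in range(n_paths):
        # Use a set so that one pass contributes at most one vote per triple.
        votes.update(set(extract_once(text, path)))
    agreed = [item for item, count in votes.items() if count >= min_votes]
    return sorted(item for item in agreed if verify(text, item))


def fake_extract(text: str, path: int) -> list[Interaction]:
    """Stub for one LLM extraction pass (hypothetical, deterministic)."""
    found = [
        ("client", "server", "SYN"),
        ("server", "client", "SYN-ACK"),
        ("server", "client", "RST"),  # hallucinated by every pass
    ]
    if path % 2 == 1:
        # Direction error that only some passes make.
        found.append(("server", "client", "SYN"))
    return found


def fake_verify(text: str, item: Interaction) -> bool:
    """Stub for self-verification: every element must be mentioned in the text."""
    lowered = text.lower()
    return all(part.lower() in lowered for part in item)


TEXT = "The client sends a SYN to the server. The server replies with SYN-ACK."
result = multi_path_extract(TEXT, fake_extract, fake_verify)
print(result)
```

In this toy run the direction error is removed by voting because only two of five passes produce it, while the `RST` triple, which every pass produces, is removed by the verification check because the text never mentions it. A real pipeline would replace both stubs with model calls and would likely need a stronger verification criterion than substring matching.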

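Cite this article

张伯洋, 钱巨, 唐靖然, 卫依. Extracting Protocol Interactions via LLM Based on Linguistic Expression Pattern Analysis. Journal of Software (软件学报), 2026, 37(8): 1-18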

History

Received: 2025-09-08
Last revised: 2025-10-28
Published online: 2025-12-24
Publication date: 2026-08-06
