Extracting Protocol Interactions via LLM Based on Linguistic Expression Pattern Analysis

CLC Number: TP311

Fund Project: National Defense Basic Scientific Research Program (JCKY2022605C006)

    Abstract:

    Protocol interaction extraction from text aims to identify and extract protocol-related interaction information from specification documents written in natural language. The extracted interactions can be used to build models that verify a protocol's correctness before it is implemented, or to construct test cases from protocol specifications. Existing approaches rely mainly on deep learning or large language models (LLMs). Deep learning approaches require large-scale, high-quality annotated datasets, and their applicability is limited by the training data, which makes them hard to transfer to protocols in other domains. LLM-based approaches generalize better, but existing work uses simple prompt templates, does not construct extraction examples carefully, and leaves the processing workflow unoptimized, all of which limit extraction effectiveness. To address these problems, this study proposes an enhanced LLM-based method for extracting protocol interactions, based on linguistic expression pattern analysis. First, real-world protocol description texts are analyzed to summarize the linguistic expression patterns common in such texts. Then, representative protocol description examples that exhibit these patterns are selected, and processing rules for interaction extraction are distilled from them. Next, the examples and rules are combined into a rule retrospection chain-of-thought method. Finally, multi-path inference and self-verification are used to optimize the task processing workflow. Experiments on multiple protocol datasets show that the proposed method outperforms the baseline methods in precision, recall, and other measures of protocol interaction extraction, confirming its effectiveness.
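
The abstract names three ingredients: prompts that combine rules with examples, multi-path inference, and self-verification. The following is a minimal sketch of how such a pipeline could be wired together, not the authors' implementation. Everything in it is an assumption made for illustration: the `Interaction` tuple shape, the prompt wording, the majority-vote aggregation, and the `ask_llm` and `verify` callables, which stand in for model calls that the abstract does not specify.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple

# Hypothetical record shape for one interaction: (sender, receiver, message).
Interaction = Tuple[str, str, str]


def build_prompt(text: str, rules: List[str], examples: List[str]) -> str:
    """Combine extraction rules, worked examples, and the target text.

    Asking the model to cite the rule behind each interaction is one
    possible reading of "rule retrospection"; the wording is illustrative.
    """
    rule_lines = "\n".join(f"R{i}: {r}" for i, r in enumerate(rules, start=1))
    example_block = "\n\n".join(examples)
    return (
        "Rules:\n" + rule_lines + "\n\n"
        "Examples:\n" + example_block + "\n\n"
        "Text:\n" + text + "\n\n"
        "List each interaction as sender -> receiver: message, "
        "and cite the rule (R1, R2, ...) that justifies it."
    )


def extract_interactions(
    text: str,
    rules: List[str],
    examples: List[str],
    ask_llm: Callable[[str], Iterable[Interaction]],  # assumed model wrapper
    verify: Callable[[str, Interaction], bool],       # assumed checking step
    n_paths: int = 5,
) -> List[Interaction]:
    """Sample several reasoning paths, keep majority answers, then verify."""
    prompt = build_prompt(text, rules, examples)

    # Multi-path inference: query the model n_paths times and count how
    # many paths produced each interaction.
    votes: Counter = Counter()
    for _ in range(n_paths):
        # set() ensures a single path cannot vote twice for one item.
        votes.update(set(ask_llm(prompt)))

    # Keep interactions found by a strict majority of the paths.
    majority = n_paths // 2 + 1
    candidates = [item for item, count in votes.items() if count >= majority]

    # Self-verification: drop candidates that fail a second check
    # against the source text.
    return sorted(item for item in candidates if verify(text, item))
```

In practice `ask_llm` would wrap a sampled model call that parses the response into tuples, and `verify` would be a second prompt asking the model to confirm each candidate against the text. Majority voting is only one way to aggregate paths; the paper may use a different scheme.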

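Cite this article

Zhang Boyang (张伯洋), Qian Ju (钱巨), Tang Jingran (唐靖然), Wei Yi (卫依). Extracting Protocol Interactions via LLM Based on Linguistic Expression Pattern Analysis. Journal of Software (软件学报), 2026, 37(8): 1-18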

History
  • Received: 2025-09-08
  • Last revised: 2025-10-28
  • Published online: 2025-12-24
  • Publication date: 2026-08-06