Code to Policy Consistency Detection for Mini Program Based on Semantic Analysis

CLC number: TP311

Fund projects: National Natural Science Foundation of China (62172027, U24B20117); National Key Research and Development Program of China (2020YFB1005601); Zhejiang Provincial Natural Science Foundation of China (LZ23F020013)

Abstract:

Mini programs are required to provide privacy policies that inform users of the types of private data they use and the purposes of that use. A mini program whose code is inconsistent with its privacy policy may deceive users and leak their private data. Among existing consistency detection methods, those that convert both the code and the policy into predefined labels for comparison lose information during the conversion, which leads to missed detections, while methods that rely solely on code analysis struggle with obfuscated mini program code. To address these limitations, a semantic-analysis-based method for code-to-policy consistency detection in mini programs is proposed. Customized taint analysis, tailored to mini program coding paradigms, extracts code behaviors, and a code language processing model represents the code that uses sensitive resources as natural language descriptions. Expert reviewers then compare these descriptions against the resource usage purposes stated in the privacy policy to judge whether the two are consistent. Experimental results show that the taint analysis module covers all three data return methods and four common data flow patterns of mini program APIs, and improves the discovery of sensitive behaviors in mini programs compared with similar methods. Semantic analysis of more than ten thousand mini programs reveals privacy leakage risks in some behaviors of frequently called APIs. Case studies using the MiniChecker tool further identify real-world mini programs whose code is inconsistent with their privacy policies.
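
The review step described in the abstract can be pictured with a small sketch. It is not the authors' MiniChecker implementation: the `TaintFlow` record, the `build_review_sheet` helper, the status labels, and the sample flows are all hypothetical, and the `wx.*` API names are used only as illustrative sources and sinks. The sketch pairs each natural-language description of a sensitive data flow with the purpose declared for that resource, so that a reviewer sees both side by side.

```python
from dataclasses import dataclass


@dataclass
class TaintFlow:
    """One sensitive data flow reported by a taint analysis (hypothetical shape)."""
    source_api: str   # API where the sensitive data originates
    resource: str     # privacy resource that the API exposes
    sink: str         # API where the data ends up
    description: str  # natural-language summary of the code behavior


def build_review_sheet(flows, declared_purposes):
    """Pair each observed flow with the purpose the policy declares for its resource."""
    sheet = []
    for flow in flows:
        purpose = declared_purposes.get(flow.resource)
        sheet.append({
            "resource": flow.resource,
            "behavior": flow.description,
            "declared_purpose": purpose,
            # A resource missing from the policy is flagged directly; a declared
            # one still needs a human to compare behavior against purpose.
            "status": "undeclared" if purpose is None else "needs_review",
        })
    return sheet


# Illustrative inputs only; these flows and purposes are made up for the sketch.
flows = [
    TaintFlow("wx.getLocation", "location", "wx.request",
              "Reads the device location and uploads it to a remote server."),
    TaintFlow("wx.chooseAddress", "address", "wx.setStorage",
              "Reads the shipping address and stores it locally."),
]
declared = {"location": "Show nearby stores."}

sheet = build_review_sheet(flows, declared)
for row in sheet:
    print(row["status"], "|", row["resource"], "|", row["behavior"])
```

Keeping the full description and the full declared purpose in each row, rather than collapsing both into predefined labels, mirrors the motivation stated in the abstract: the final judgment is left to a reviewer who can read both texts.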

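Cite this article

刘力沛, 毛剑, 林其箫, 吕雨松, 李嘉维, 刘建伟. Code to Policy Consistency Detection for Mini Program Based on Semantic Analysis. Journal of Software (软件学报), 2025, 36(11): 5102-5117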

History
  • Received: 2024-06-06
  • Last revised: 2024-08-29
  • Published online: 2025-07-17
  • Publication date: 2025-11-06