Quantifying Credibility Risk of Open-source AI Resources Based on Heterogeneous Dependency Networks

CLC number: TP311

Fund Project: Young Scientists Fund of the National Natural Science Foundation of China (62202480)

    Abstract:

    As open-source models and datasets grow rapidly in scale, resources in the Hugging Face ecosystem have formed a complex heterogeneous dependency network dominated by model-dataset relationships. Missing metadata and highly concentrated dependencies make chained risks more likely to accumulate and propagate. To characterize this risk landscape, this study builds an AI resource dependency network from a Hugging Face snapshot and analyzes its structure and evolution along two dimensions: global topology and temporal evolution. It further proposes a “credibility risk” indicator that combines attribute (metadata) completeness with community feedback to produce a continuous risk score and ranking for model and dataset nodes. The results show that the dependency network has a pronounced “spike plus long tail” structure: dependencies are highly concentrated on a few hub nodes, while a large share of resources are isolated or semi-isolated in the key data-flow relationships. Meanwhile, the number of models has grown explosively while datasets and contributors have expanded more slowly, so the ecosystem's path dependence on a limited set of core data sources keeps strengthening, creating a structural systemic risk. At the node level, the indicator is robust to parameter perturbations, clearly separates high-risk from low-risk nodes across multiple sources of risk, and outperforms baseline methods. Risk-coupling analysis and blind expert evaluation further confirm that high-risk datasets and high-risk models cluster and propagate risk within local structures. The study provides a reproducible quantitative basis for risk screening and governance in open-source AI ecosystems.
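
The abstract does not state the formula behind the “credibility risk” indicator or how dependency concentration is measured. As a rough illustration only, the sketch below assumes a toy model-to-dataset edge list, a hypothetical set of required metadata fields (`license`, `description`, `tags`), a log-scaled `likes` count as the community-feedback signal, and equal weights. None of these choices come from the paper.

```python
import math
from collections import Counter

# Hypothetical required metadata fields (an assumption, not the paper's list).
REQUIRED_FIELDS = ("license", "description", "tags")


def completeness(meta):
    """Fraction of required metadata fields that are present and non-empty."""
    filled = sum(1 for field in REQUIRED_FIELDS if meta.get(field))
    return filled / len(REQUIRED_FIELDS)


def feedback(likes, max_likes):
    """Log-scaled community feedback in [0, 1]; 0 when nobody has any likes."""
    if max_likes <= 0:
        return 0.0
    return math.log1p(likes) / math.log1p(max_likes)


def risk_score(meta, likes, max_likes, w_meta=0.5, w_feedback=0.5):
    """Toy continuous risk in [0, 1]: higher means less documented, less vetted."""
    return (w_meta * (1 - completeness(meta))
            + w_feedback * (1 - feedback(likes, max_likes)))


def hub_share(edges, k=1):
    """Share of model->dataset edges that point at the k most-used datasets."""
    counts = Counter(dataset for _, dataset in edges)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(count for _, count in counts.most_common(k)) / total


# Toy data: four models depending on two datasets.
edges = [("m1", "d1"), ("m2", "d1"), ("m3", "d1"), ("m4", "d2")]
nodes = {
    "d1": ({"license": "mit", "description": "text corpus", "tags": ["nlp"]}, 100),
    "d2": ({"license": "", "description": "", "tags": []}, 0),
}

max_likes = max(likes for _, likes in nodes.values())
scores = {name: risk_score(meta, likes, max_likes)
          for name, (meta, likes) in nodes.items()}
ranking = sorted(scores, key=scores.get, reverse=True)  # riskiest first
concentration = hub_share(edges, k=1)
```

In this toy setup the undocumented, unrated dataset `d2` ranks above `d1`, and three of the four dependency edges point at a single dataset, which is the kind of hub concentration the abstract describes. A real analysis would need the paper's actual indicator definition and weights.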

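Cite this article

姚思梦, 张洋, 赵佳林, 李俊辰, 沈阳, 王涛, 张迅晖. Quantifying Credibility Risk of Open-source AI Resources Based on Heterogeneous Dependency Networks. Journal of Software (软件学报),,():1-25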

History
  • Received: 2025-09-17
  • Last revised: 2025-11-09
  • Published online: 2026-04-22