TB-Match: Elastic Time-constrained Transport Order Assignment Method Integrating Conditional Diffusion Model and Proximal Policy Optimization

CLC number: TP301

Fund project: National Natural Science Foundation of China (62461146205); Hainan Province Key Research and Development Project (ZDYF2025GXJS179)

Abstract:

In bilateral matching problems under dynamic environments, the way time constraints and multi-objective optimization are handled is one of the key factors affecting matching efficiency; transport order assignment on online freight platforms is a typical instance of such problems. Existing methods model time constraints rigidly and offer only limited mechanisms for trading off conflicting objectives, which makes it difficult to accurately characterize how decision agents behave near constraint boundaries. To address these issues, this study proposes TB-Match, a time-constraint-aware transport order assignment framework based on a conditional diffusion model and hierarchical reinforcement learning. The framework improves system performance through four collaborative modules: elastic constraint quantification, preference representation learning, dynamic objective trade-off optimization, and policy generation. The core contributions are as follows: (1) a constraint elasticity representation mechanism based on conditional diffusion probabilistic models, which converts deterministic time boundaries into continuous probability distributions through progressive noise diffusion and reverse denoising, and thereby models the acceptance probability of decision agents in the critical region near a constraint; (2) a hierarchical decision architecture that integrates dynamic objective trade-off with proximal policy optimization, in which the high-level network adaptively adjusts objective weights according to feedback signals and the low-level network maximizes long-term cumulative reward under trust-region constraints. Experiments on two large-scale real-world logistics datasets show that TB-Match achieves a 17.66% relative improvement in matching rate over the best existing method and also shows significant advantages on other metrics such as satisfaction, demonstrating the effectiveness and applicability of the method in complex constraint environments.
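
The abstract names two standard building blocks without giving their equations. As a rough illustration only, the sketch below assumes the textbook DDPM forward-noising step with a linear noise schedule and the textbook PPO clipped surrogate objective. It is not TB-Match's actual model: the function names, the schedule parameters, the normalized `deadline` value, and the clipping range are all assumptions made for this example.

```python
import math
import random


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear noise schedule (assumed)."""
    alpha_bar, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def forward_diffuse(x0, t, alpha_bar, rng):
    """Draw x_t from N(sqrt(abar_t) * x0, 1 - abar_t) in closed form."""
    noise = rng.gauss(0.0, 1.0)
    return math.sqrt(alpha_bar[t]) * x0 + math.sqrt(1.0 - alpha_bar[t]) * noise


def ppo_clipped_objective(ratios, advantages, clip_eps=0.2):
    """Batch mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(r * a, clipped * a)
    return total / len(ratios)


if __name__ == "__main__":
    rng = random.Random(0)
    abar = linear_alpha_bar(1000)
    deadline = 1.0  # hypothetical normalized hard deadline, not from the paper
    # A deterministic boundary becomes a progressively noisier sample.
    print(forward_diffuse(deadline, 10, abar, rng))
    print(forward_diffuse(deadline, 999, abar, rng))
    # Probability ratios outside [1 - eps, 1 + eps] stop increasing the objective.
    print(ppo_clipped_objective([1.5, 0.5], [1.0, -1.0]))
```

In this sketch, early diffusion steps keep a sample close to the original deadline while late steps are dominated by noise, which is the sense in which a fixed boundary is turned into a distribution. The clipping in `ppo_clipped_objective` is how PPO approximates a trust-region constraint on policy updates.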

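Cite this article

廖家俊, 董宜滔, 毛嘉莉. TB-Match: Elastic Time-constrained Transport Order Assignment Method Integrating Conditional Diffusion Model and Proximal Policy Optimization. Journal of Software (软件学报), 2026, 37(5): 2024-2042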

History
  • Received: 2025-07-18
  • Last revised: 2025-08-20
  • Published online: 2026-01-28
  • Publication date: 2026-05-06