基于AdaGrad的自适应NAG方法及其最优个体收敛性

doi:10.13328/j.cnki.jos.006464

微信小程序

微信服务号

微信订阅号

首页 > 过刊浏览>2022年第33卷第4期 >1231-1243. DOI:10.13328/j.cnki.jos.006464

PDF HTML阅读 XML下载导出引用引用提醒

基于AdaGrad的自适应NAG方法及其最优个体收敛性
DOI:
                        10.13328/j.cnki.jos.006464
                    
CSTR:
                        
                    
作者:
                        
                        
                    
作者单位:
作者简介:陇盛(1998－),男,硕士生,主要研究领域为机器学习,模式识别;
张泽东(1994－),男,硕士生,主要研究领域为机器学习,模式识别;
陶蔚(1991－),男,博士,助理研究员,主要研究领域为机器学习;
陶卿(1965－),男,博士,教授,博士生导师,CCF高级会员,主要研究领域为机器学习,模式识别,应用数学.
通讯作者:陶卿,E-mail:qing.tao@ia.ac.cn
中图分类号:
基金项目:国家自然科学基金(62076252,62106281)

Adaptive NAG Methods Based on AdaGrad and Its Optimal Individual Convergence

Author:

Affiliation:

Fund Project:

摘要

图/表

访问统计

参考文献

相似文献

引证文献

资源附件

文章评论

摘要:

与梯度下降法相比,自适应梯度下降方法(AdaGrad)利用过往平方梯度的算数平均保存了历史数据的几何信息,在处理稀疏数据时获得了更紧的收敛界.另一方面,Nesterov加速梯度方法(Nesterov’s accelerated gradient,NAG)在梯度下降法的基础上添加了动量运算,在求解光滑凸优化问题时具有数量级加速收敛的性能,在处理非光滑凸问题时也获得了最优的个体收敛速率.最近,已经出现了自适应策略与NAG相结合的研究,但现有代表性的自适应NAG方法AcceleGrad由于采取的自适应方式与AdaGrad不同,步长未能在不同维度上体现差异性,仅得到了加权平均方式的收敛速率,个体收敛速率的理论分析尚存在缺失.提出了一种自适应NAG方法,继承了AdaGrad的步长设置方式,证明了所提算法在解决约束非光滑凸优化问题时具有最优的个体收敛速率.在L1范数约束下,通过求解典型的hinge损失函数分类和L1损失函数回归优化问题.实验验证了理论分析的正确性,也表明了所提算法的性能优于AcceleGrad.

Abstract:

Compared with the gradient descent, the adaptive gradient descent (AdaGrad) keeps the geometric information of historical data by using the arithmetic average of squared past gradients, and obtains tighter convergence bounds when copingwith sparse data. On the other hand, by adding the momentum term to gradient descent, Nesterov’s accelerated gradient (NAG) not only obtains order of magnitude accelerated convergence for solving smooth convex optimization problems, but also achieves the optimal individual convergence rate for non-smooth convex problems. Recently, there have been studies on the combination of adaptive strategy and NAG. However, as a typical adaptive NAG algorithm, AcceleGrad fails to reflect the distinctions between dimensions due to using different adaptive variant from AdaGrad, and it only obtains the weighted averaging convergence rate. So far, there still lacks the theoretical analysis of individual convergence rate. In this study, an adaptive NAG method, which inherits AdaGrad’s step size setting, is proposed. It is proved that the proposed algorithm attains the optimal individual convergence rate when solving the constrained non-smooth convex optimization problems. The experiments are conducted on the typical optimization problem of hinge loss function for classification and L1 loss function for regression with L1 norm constraint, and experimental results verify the correctness of the theoretical analysis and superior performance of the proposed algorithm over AcceleGrad.

参考文献

相似文献

引证文献

引用本文

陇盛,陶蔚,张泽东,陶卿.基于AdaGrad的自适应NAG方法及其最优个体收敛性.软件学报,2022,33(4):1231-1243

复制

文章指标

点击次数:
下载次数:
HTML阅读次数:
引用次数:

历史

收稿日期:2021-03-09
最后修改日期:2021-07-16
录用日期:
在线发布日期: 2021-10-26
出版日期: 2022-04-06

微信小程序

微信服务号

微信订阅号

引用本文

相关视频

分享

文章指标

历史

文章二维码