Journal of Software (软件学报), ISSN 1000-9825. Editorial Office of Journal of Software, Beijing, China. Article ID: rjxb-30-3-845. doi: 10.13328/j.cnki.jos.005689. CLC: TP18.
Special Issue on Techniques for Intelligent Data Management and Analysis (智能数据管理与分析技术专刊)
Review on Financial Trading System Based on Reinforcement Learning (基于强化学习的金融交易系统研究与发展)
Liang Tian-Xin (梁天新)
In recent years, reinforcement learning has made great progress in electronic games, chess, and decision-making control, and it has also driven the rapid development of financial trading systems. Financial trading has become a hot topic in reinforcement learning research, with broad application demand and academic significance in the stock, foreign exchange, and futures markets. Following the progress of the reinforcement learning models commonly used in finance, this paper surveys research on trading systems, adaptive algorithms, and trading strategies. Finally, the difficulties and challenges that reinforcement learning faces in financial trading systems are discussed, and future development trends are outlined.
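As a toy illustration of the recurrent reinforcement learning ("direct RL") trading approach surveyed here and credited to Moody et al. in the references, the sketch below trains a one-parameter-set recurrent trader whose position is F_t = tanh(w·r_t + u·F_{t-1} + b) to maximize cumulative profit net of transaction costs. All names, parameter values, and the finite-difference training loop are illustrative assumptions, not the cited authors' implementation (the literature uses an analytic recurrent gradient and risk-adjusted objectives such as the differential Sharpe ratio).

```python
import math
import random

def rrl_positions(returns, w, u, b):
    """Recurrent trader: position F_t = tanh(w*r_t + u*F_{t-1} + b) in (-1, 1)."""
    F, out = 0.0, []
    for r in returns:
        F = math.tanh(w * r + u * F + b)
        out.append(F)
    return out

def profit(returns, positions, cost=0.001):
    """Cumulative profit: previous position earns the current return,
    minus a proportional cost on each change of position."""
    total, prev = 0.0, 0.0
    for r, F in zip(returns, positions):
        total += prev * r - cost * abs(F - prev)
        prev = F
    return total

def train(returns, steps=200, lr=0.5, eps=1e-4):
    """Gradient ascent on total profit via finite differences — a toy
    stand-in for the analytic recurrent gradient used in the RRL papers."""
    params = [0.1, 0.1, 0.0]  # w, u, b
    for _ in range(steps):
        base = profit(returns, rrl_positions(returns, *params))
        grads = []
        for i in range(len(params)):
            bumped = params[:]
            bumped[i] += eps
            grads.append((profit(returns, rrl_positions(returns, *bumped)) - base) / eps)
        params = [p + lr * g for p, g in zip(params, grads)]
    return params

random.seed(0)
# Synthetic upward-trending return series: the trained trader should go long.
rets = [0.01 + random.gauss(0, 0.005) for _ in range(200)]
w, u, b = train(rets)
final = profit(rets, rrl_positions(rets, w, u, b))
```

On this trending series the learned bias pushes the position toward +1, so the final net profit is positive; on a mean-reverting or costlier market the same objective would instead dampen trading, which is the behavior the adaptive-trading references exploit.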
Keywords: reinforcement learning; deep learning; financial trading system; adaptive algorithm; trading strategy
Funding: National Natural Science Foundation of China (71531012)
References
Fama EF. Random walks in stock market prices. Financial Analysts Journal, 1965, 21(5):55-59.[doi: 10.2469/faj.v21.n5.55]
Farmer JD. Market force, ecology and evolution. Industrial and Corporate Change, 2002, 11(5):895-953.[doi: 10.1093/icc/11.5.895]
Lo AW. The adaptive markets hypothesis: Market efficiency from an evolutionary perspective. The Journal of Portfolio Management, 2004.[doi: 10.3905/jpm.2004.442611]
Moody J, Saffell M. Reinforcement learning for trading. In: Proc. of the Conf. on Advances in Neural Information Processing Systems 11. MIT Press, 1999. 917-923.
Moody J, Wu L, Liao Y, Saffell M. Performance functions and reinforcement learning for trading systems and portfolios. Journal of Forecasting, 1998, 17(5-6):441-470.[doi: 10.1002/(sici)1099-131x(1998090)17:5/6<441::aid-for707>3.3.co;2-r]
Hinton GE, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science, 2006, 313(5786):504-507.[doi: 10.1126/science.1127647]
[doi: 10.1145/2464576.2480773]
Zhang J, Maringer D. Using a genetic algorithm to improve recurrent reinforcement learning for equity trading. Computational Economics, 2016, 47(4):551-567.[doi: 10.1007/s10614-015-9490-y]
Werbos PJ. Advanced forecasting methods for global crisis warning and models of intelligence. General Systems Yearbook, 1977, 22(6):25-38.
[doi: 10.1109/cdc.1995.478953]
Lewis FL, Vrabie D. Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits and Systems Magazine, 2009, 9(3):32-50.[doi: 10.1109/MCAS.2009.933854]
[doi: 10.1109/tnnls.2013.2281663]
Zhao H, Wang B, Liao J, Wang H, Tan G. Adaptive dynamic programming for control: algorithms and stability. Communications & Control Engineering, 2013, 54(45):6019-6022.
[doi: 10.1109/mwscas.2003.1562233]
Jangmin O, Lee J, Lee JW, Zhang BT. Adaptive stock trading with dynamic asset allocation using reinforcement learning. Information Sciences, 2006, 176(15):2121-2147.[doi: 10.1016/j.ins.2005.10.009]
Dempster MAH, Leemans V. An automated FX trading system using adaptive reinforcement learning. Expert Systems with Applications, 2006, 30(3):543-552.[doi: 10.1016/j.eswa.2005.10.012]
[doi: 10.1007/978-3-540-74827-4_78]
Tan Z, Quek C, Cheng PYK. Stock trading with cycles: A financial application of ANFIS and reinforcement learning. Expert Systems with Applications, 2011, 38(5):4741-4755.[doi: 10.1016/j.eswa.2010.09.001]
Almahdi S, Yang SY. An adaptive portfolio trading system: A risk-return portfolio optimization using recurrent reinforcement learning with expected maximum drawdown. Expert Systems with Applications, 2017, 87:267-279.[doi: 10.1016/j.eswa.2017.06.023]
Hamilton JD. A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica, 1989, 57(2):357-384.[doi: 10.2307/1912559]
Hamilton JD, Susmel R. Autoregressive conditional heteroskedasticity and changes in regime. Journal of Econometrics, 1994, 64(1-2):307-333.[doi: 10.1016/0304-4076(94)90067-1]
Gray SF. Modeling the conditional distribution of interest rates as a regime-switching process. Journal of Financial Economics, 1996, 42(1):27-62.[doi: 10.1016/0304-405x(96)00875-6]
Maringer D, Ramtohul T. Regime-switching recurrent reinforcement learning for investment decision making. Computational Management Science, 2012, 9(1):89-107.[doi: 10.1007/s10287-011-0131-1]
Wierstra D, Förster A, Peters J, Schmidhuber J. Recurrent policy gradients. Logic Journal of the IGPL, 2010, 18:620-634.[doi: 10.1093/jigpal/jzp049]
Baird L, Moore A. Gradient descent for general reinforcement learning. In: Proc. of the Conf. on Advances in Neural Information Processing Systems 11. MIT Press, 1999. 968-974.
Watkins CJCH. Learning from delayed rewards [Ph.D. Thesis]. Cambridge: University of Cambridge, 1989.
Jaakkola T, Jordan MI, Singh SP. On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 1993, 6(6):1185-1201.[doi: 10.21236/ada276517]
Tsitsiklis JN. Asynchronous stochastic approximation and Q-learning. Machine Learning, 1994, 16(3):185-202.[doi: 10.1007/bf00993306]
Moore AW, Atkeson CG. Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning, 1993, 13(1): 103-130.[doi: 10.1007/bf00993104]
Mahadevan S, Maggioni M. Proto-value functions: A laplacian framework for learning representation and control in markov decision processes. Journal of Machine Learning Research, 2007, 8:2169-2231.[doi: 10.1145/1102351.1102421]
Sutton RS. Policy gradient methods for reinforcement learning with function approximation. In: Proc. of the Conf. on Advances in Neural Information Processing Systems 12. MIT Press, 1999. 1057-1063.
Q-learning framework for optimizing stock trading systems. In: Proc. of the Int'l Conf. on Database and Expert Systems Applications. Springer-Verlag, 2002. 153-162.[doi: 10.1007/3-540-46146-9_16]
[doi: 10.1109/tsmca.2007.904825]
[doi: 10.1109/ijcnn.2006.246728]
Bertoluzzo F, Corazza M. Reinforcement learning for automatic financial trading: Introduction and some applications. Working Papers, 2012.[doi: 10.2139/ssrn.2192034]
Bertoluzzo F, Corazza M. Testing different reinforcement learning configurations for financial trading: Introduction and applications. Procedia Economics & Finance, 2012, 3:68-77.[doi: 10.1016/s2212-5671(12)00122-0]
Corazza M, Bertoluzzo F. Q-learning-based financial trading systems with applications. Social Science Electronic Publishing, 2014.[doi: 10.2139/ssrn.2507826]
Eilers D, Dunis CL, von Mettenheim HJ, Breitner MH. Intelligent trading of seasonal effects: A decision support algorithm based on reinforcement learning. Decision Support Systems, 2014, 64:100-108.[doi: 10.1016/j.dss.2014.04.011]
Bekiros SD. Heterogeneous trading strategies with adaptive fuzzy actor-critic reinforcement learning: A behavioral approach. Journal of Economic Dynamics & Control, 2010, 34(6):1153-1170.[doi: 10.1016/j.jedc.2010.01.015]
Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D. Playing atari with deep reinforcement learning. Computer Science, 2013.
Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J. Human-level control through deep reinforcement learning. Nature, 2015, 518(7540):529-533.[doi: 10.1038/nature14236]
Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y. Continuous control with deep reinforcement learning. Computer Science, 2015, 8(6):A187.
Mnih V, Badia AP, Mirza M, Graves A, Lillicrap TP, Harley T. Asynchronous methods for deep reinforcement learning. In: Proc. of the Int'l Conf. on Machine Learning. 2016.
[doi: 10.1145/3065386]
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S. ImageNet large scale visual recognition challenge. Int'l Journal of Computer Vision, 2015, 115(3):211-252.[doi: 10.1007/s11263-015-0816-y]
Silver D, Lever G, Heess N, Degris T, Wierstra D, Riedmiller M. Deterministic policy gradient algorithms. In: Proc. of the Int'l Conf. on Machine Learning. 2014. 387-395.
https://arxiv.org/abs/1706.10059
Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A. Mastering the game of Go without human knowledge. Nature, 2017, 550(7676):354-359.[doi: 10.1038/nature24270]