Article Abstract
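丁维, 王渊, 丁达理, 磊, 周欢, 谭目来, 吕丞辉. Maneuvering decision of UCAV in close air combat based on LSTM-PPO algorithm[J]. Journal of Air Force Engineering University (Natural Science Edition), 2022, 23(3): 19-25.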
Maneuvering Decision of UCAV in Close Air Combat Based on LSTM-PPO Algorithm
  
Keywords: unmanned combat aerial vehicles; air combat maneuver decision; deep reinforcement learning; proximal policy optimization; long short-term memory network
Funding: Natural Science Foundation of Shaanxi Province (2020JQ-481)
Author affiliation
丁维, 王渊, 丁达理, 磊, 周欢, 谭目来, 吕丞辉: Aeronautics Engineering College, Air Force Engineering University, Xi'an 710038, China
Abstract:
      With the growing military use of unmanned combat aerial vehicles (UCAVs), unmanned combat is expected to become the main mode of operation on the future air battlefield. In close-range air combat, the environment is complex and the engagement situation changes rapidly. Game-theoretic methods cannot meet real-time requirements because of the heavy iterative computation they involve, while data-driven methods suffer from long training times and low execution efficiency. To address these problems, this paper proposes a UCAV close-range maneuver decision method based on deep reinforcement learning. First, a flight drive module is built on a three-degree-of-freedom (3-DOF) UCAV model to provide the state transition update mechanism. Then, Ornstein-Uhlenbeck (OU) random noise is added to the proximal policy optimization (PPO) algorithm to improve the UCAV's exploration of the unknown state space, and a long short-term memory (LSTM) network is incorporated to strengthen learning from sequential sample data, improving both the training efficiency and the performance of the algorithm. Finally, three groups of close-range air combat simulation experiments, with performance compared against the baseline PPO algorithm, verify the effectiveness and superiority of the proposed method.
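The abstract does not give the equations of the flight drive module. As an illustration only, the sketch below assumes the point-mass 3-DOF model commonly used in air combat studies, with tangential overload n_x, normal overload n_z and roll angle μ as the controls, advanced by one forward-Euler step. The names `State` and `step`, the axis conventions and the integration scheme are hypothetical choices, not the authors' implementation.

```python
import math
from dataclasses import dataclass

G = 9.81  # gravitational acceleration (m/s^2)


@dataclass
class State:
    """Hypothetical UCAV state for an assumed point-mass 3-DOF model."""
    x: float      # north position (m)
    y: float      # east position (m)
    z: float      # altitude (m)
    v: float      # airspeed (m/s)
    gamma: float  # flight-path angle (rad)
    psi: float    # heading angle (rad)


def step(s: State, nx: float, nz: float, mu: float, dt: float) -> State:
    """Advance the state by one forward-Euler step.

    nx: tangential overload, nz: normal overload, mu: roll angle (rad).
    """
    cg = math.cos(s.gamma)
    x_dot = s.v * cg * math.cos(s.psi)
    y_dot = s.v * cg * math.sin(s.psi)
    z_dot = s.v * math.sin(s.gamma)
    v_dot = G * (nx - math.sin(s.gamma))
    gamma_dot = G / s.v * (nz * math.cos(mu) - cg)
    psi_dot = G * nz * math.sin(mu) / (s.v * cg)
    return State(
        x=s.x + x_dot * dt,
        y=s.y + y_dot * dt,
        z=s.z + z_dot * dt,
        v=s.v + v_dot * dt,
        gamma=s.gamma + gamma_dot * dt,
        psi=s.psi + psi_dot * dt,
    )


# Usage: steady level flight (nx = sin(gamma) = 0, nz = 1, mu = 0) keeps speed and altitude.
s0 = State(x=0.0, y=0.0, z=3000.0, v=200.0, gamma=0.0, psi=0.0)
s1 = step(s0, nx=0.0, nz=1.0, mu=0.0, dt=0.1)
print(s1.x, s1.z, s1.v)  # 20.0 3000.0 200.0
```

Calling `step` repeatedly with the agent's chosen (n_x, n_z, μ) gives a simple state transition for an environment loop. Forward Euler is used here only for brevity; a smaller `dt` or a Runge-Kutta integrator would reduce integration error.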
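The abstract states that OU noise is added to PPO to improve exploration, but it does not give the noise parameters or where the noise enters the policy. A minimal sketch, assuming the usual Euler-Maruyama discretization x_{k+1} = x_k + θ(μ − x_k)Δt + σ√Δt·ε_k with ε_k ~ N(0, 1), and assuming the noise is added to the policy's action before clipping to the control bounds. The defaults θ = 0.15, σ = 0.2, Δt = 0.01 and the helper names `OUNoise` and `explore` are hypothetical.

```python
import math
import random


class OUNoise:
    """Ornstein-Uhlenbeck process (Euler-Maruyama discretization).

    Parameter defaults are illustrative assumptions, not values from the paper.
    """

    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, seed=None):
        self.dim = dim
        self.mu = mu
        self.theta = theta
        self.sigma = sigma
        self.dt = dt
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Start each episode at the long-run mean.
        self.state = [self.mu] * self.dim

    def sample(self):
        # Mean-reverting drift plus Gaussian diffusion scaled by sqrt(dt).
        sq_dt = math.sqrt(self.dt)
        self.state = [
            x + self.theta * (self.mu - x) * self.dt
            + self.sigma * sq_dt * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)


def explore(action, noise, low, high):
    """Hypothetical helper: perturb a policy action with OU noise, then clip to bounds."""
    return [
        min(max(a + n, lo), hi)
        for a, n, lo, hi in zip(action, noise.sample(), low, high)
    ]


# Usage: perturb a 3-dimensional normalized control command, each component bounded in [-1, 1].
noise = OUNoise(dim=3, seed=0)
print(explore([0.5, 0.0, -0.5], noise, [-1.0] * 3, [1.0] * 3))
```

Because successive OU samples are correlated in time, the perturbed commands drift smoothly instead of jittering independently at every step, which suits continuous flight controls. If perturbed actions are used in PPO updates, their log-probabilities must refer to the action actually executed; the abstract does not say how the authors handle this.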
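The method builds on PPO, but the abstract does not say which PPO variant is used. For reference, here is a minimal sketch of the widely used clipped surrogate objective, L^CLIP = E[min(r_t·A_t, clip(r_t, 1 − ε, 1 + ε)·A_t)] with r_t = exp(log π_new(a_t|s_t) − log π_old(a_t|s_t)). The clip range ε = 0.2 and the function name `ppo_clip_objective` are assumptions. The function works on plain per-sample lists; in the paper's setup the log-probabilities would come from the LSTM-based policy network, which is not reproduced here.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Mean PPO clipped surrogate objective (to be maximised) over a batch."""
    if not advantages:
        raise ValueError("empty batch")
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Pessimistic bound: take the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Usage: a ratio of 1.5 with a positive advantage is capped at 1 + eps = 1.2.
print(ppo_clip_objective([math.log(1.5)], [0.0], [1.0]))  # 1.2
```

Taking the minimum means the clip only removes the incentive to push the ratio further in the direction that increases the objective. When the advantage is negative and the ratio has grown, the unclipped (more pessimistic) term is kept.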