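Article Abstract
Guo Wanchun, Xie Wujie, Yin Hui, Dong Wenhan. Research on UAV Anti-Pursing Maneuvering Decision Based on Improved Twin Delayed Deep Deterministic Policy Gradient Method [J]. Journal of Air Force Engineering University (Natural Science Edition), 2021, 22(4): 15-21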
Research on UAV Anti-Pursing Maneuvering Decision Based on Improved Twin Delayed Deep Deterministic Policy Gradient Method
  
DOI:
Keywords: deep reinforcement learning  close air combat  UAV  twin delayed deep deterministic policy gradient method
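Funding:
Authors and affiliations
Guo Wanchun, Xie Wujie, Yin Hui, Dong Wenhan  1. Aeronautics Engineering College, Air Force Engineering University, Xi'an 710038  2. Teaching and Research Support Center, Air Force Engineering University, Xi'an 710051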
Abstract:
      To address autonomous anti-pursuit maneuvering in close-range air combat, a Markov decision process (MDP) model of UAV anti-pursuit is established, and on this basis an autonomous maneuvering decision method for unmanned aerial vehicles (UAVs) based on deep reinforcement learning is proposed. The new method improves the twin delayed deep deterministic policy gradient (TD3) algorithm by reconstructing the experience replay buffer, and generates the optimal policy network by fitting the policy function and the state-action value function. Simulation experiments show that, under random initial positions and attitudes, the intelligent UAV trained by this method wins more than 93% of engagements against a UAV that uses the pure pursuit method. Compared with the conventional TD3 and deep deterministic policy gradient (DDPG) algorithms, the method converges faster and is more stable.
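
For readers unfamiliar with TD3, the following is a minimal sketch of the critic target used by the standard algorithm, which the paper takes as its starting point. It is not the authors' implementation: the abstract does not describe how the experience replay buffer is reconstructed, so that part is not shown. The function name `td3_target`, the hyperparameter values, and the toy stand-in networks are all assumptions made for illustration.

```python
import numpy as np

def td3_target(reward, done, next_state, actor_target, q1_target, q2_target,
               rng, gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0):
    """Critic target of standard TD3 for a batch of transitions (illustrative)."""
    next_action = actor_target(next_state)
    # Target policy smoothing: perturb the target action with clipped Gaussian noise.
    noise = np.clip(rng.normal(0.0, noise_std, size=next_action.shape),
                    -noise_clip, noise_clip)
    next_action = np.clip(next_action + noise, -max_action, max_action)
    # Clipped double-Q learning: use the smaller of the two target critic estimates.
    q_next = np.minimum(q1_target(next_state, next_action),
                        q2_target(next_state, next_action))
    # Bootstrapped target; terminal transitions (done = 1) keep only the reward.
    return reward + gamma * (1.0 - done) * q_next

# Toy stand-ins for the target networks (hypothetical, for demonstration only).
toy_actor = lambda s: np.zeros((len(s), 1))
toy_q1 = lambda s, a: np.full(len(s), 2.0)
toy_q2 = lambda s, a: np.full(len(s), 1.0)

target = td3_target(reward=np.array([1.0, 1.0]), done=np.array([0.0, 1.0]),
                    next_state=np.zeros((2, 3)), actor_target=toy_actor,
                    q1_target=toy_q1, q2_target=toy_q2,
                    rng=np.random.default_rng(0))
```

With these stand-ins the first transition bootstraps from the smaller critic value (1 + 0.99 × 1.0) and the second, being terminal, returns the reward alone. In full TD3 the actor and the target networks are additionally updated less often than the critics, which is the "delayed" part of the name.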