Research on Autonomous Maneuver Decision Method for Unmanned Aerial Combat Based on an Improved PPO Algorithm

Chinese Library Classification (CLC) number: V279.3
Fund project: National Natural Science Foundation of China (62303362)

Abstract:

Deep reinforcement learning opens up new possibilities for autonomous maneuver decision-making by unmanned aerial vehicles. This paper proposes an autonomous air combat maneuver decision-making method for unmanned combat aerial vehicles (UCAVs) that combines a reconstructed situational assessment model with the proximal policy optimization (PPO) algorithm, providing an effective strategy for one-on-one within-visual-range (WVR) air combat. First, to address the low fidelity of existing models, a high-fidelity six-degree-of-freedom (6-DOF) UCAV dynamic model and a WVR air combat attack model are established. Second, to make the situational assessment model describe air combat more adequately, the angle, speed, distance, and altitude situational functions are reconstructed based on a division of air combat states, and a new situational assessment index describing maneuver potential is proposed. Third, the reward function combines rule-based sparse rewards, sub-goal rewards based on transitions between air combat states, and shaping rewards designed from the situational functions, which strengthens the guidance the reward provides to the reinforcement learning algorithm. Finally, an expert system is designed as an opponent, and the method is evaluated on a high-fidelity air combat simulation platform (JSBSim). Simulation results show that, against both fixed-maneuver opponents and the expert system opponent, agents trained with the proposed method achieve faster convergence and higher win rates.
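The abstract states that shaping rewards are built from the situational functions and combined with rule-based sparse rewards and state-transition sub-goal rewards, but this page does not give the formulas. The sketch below shows one way such a three-part reward could be wired together, under loudly stated assumptions: the angle, distance, speed, and altitude terms use common illustrative forms (not the authors' reconstructed functions), the paper's maneuver-potential index is not modeled at all, the shaping term uses the potential-based form F = γΦ(s′) − Φ(s) from Ng, Harada and Russell (1999), which may differ from the authors' design, and every name, weight, and threshold (`Situation`, `WEIGHTS`, `in_advantage`, `step_reward`, the 30° / 3 km sub-goal) is hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Situation:
    """Hypothetical 1v1 engagement summary, seen from the agent's aircraft."""
    ata: float        # antenna train angle to target, rad (0 = own nose on target)
    aa: float         # aspect angle, rad (0 = directly behind the target)
    distance: float   # range to target, m
    own_speed: float  # m/s
    tgt_speed: float  # m/s
    own_alt: float    # m
    tgt_alt: float    # m


# Illustrative constants only; none of these values come from the paper.
OPT_RANGE, RANGE_SCALE = 1000.0, 2000.0    # preferred range and its spread, m
SPEED_SCALE, ALT_SCALE = 100.0, 1000.0     # m/s, m
WEIGHTS = {"angle": 0.4, "distance": 0.3, "speed": 0.15, "altitude": 0.15}
GAMMA = 0.99                               # should match the RL learner's discount
TERMINAL_REWARD = {"win": 10.0, "loss": -10.0, "draw": 0.0}


def angle_adv(s: Situation) -> float:
    # 1.0 in a pure tail chase (ata = aa = 0), 0.0 when both angles are +/- pi.
    return 1.0 - (abs(s.ata) + abs(s.aa)) / (2.0 * math.pi)


def distance_adv(s: Situation) -> float:
    # Gaussian bump that peaks at the preferred engagement range.
    return math.exp(-((s.distance - OPT_RANGE) / RANGE_SCALE) ** 2)


def speed_adv(s: Situation) -> float:
    # 0.5 at equal speed, approaches 1.0 when much faster than the target.
    return 0.5 + 0.5 * math.tanh((s.own_speed - s.tgt_speed) / SPEED_SCALE)


def altitude_adv(s: Situation) -> float:
    # 0.5 at equal altitude, approaches 1.0 when well above the target.
    return 0.5 + 0.5 * math.tanh((s.own_alt - s.tgt_alt) / ALT_SCALE)


def potential(s: Situation) -> float:
    """Weighted situational score in [0, 1]; used as the shaping potential Phi(s)."""
    terms = {
        "angle": angle_adv(s),
        "distance": distance_adv(s),
        "speed": speed_adv(s),
        "altitude": altitude_adv(s),
    }
    return sum(WEIGHTS[name] * value for name, value in terms.items())


def in_advantage(s: Situation) -> bool:
    # Hypothetical sub-goal state: target within 30 deg of own nose and inside 3 km.
    return abs(s.ata) < math.radians(30.0) and s.distance < 3000.0


def step_reward(prev: Situation, curr: Situation, outcome: Optional[str],
                subgoal_done: bool) -> Tuple[float, bool]:
    """Reward for the transition prev -> curr, plus the updated sub-goal flag.

    outcome is "win", "loss" or "draw" on the final step and None otherwise.
    """
    # 1) Dense shaping term, potential-based form: F = gamma * Phi(s') - Phi(s).
    reward = GAMMA * potential(curr) - potential(prev)
    # 2) Sub-goal reward, paid only the first time the advantage state is reached.
    if not subgoal_done and in_advantage(curr):
        reward += 1.0
        subgoal_done = True
    # 3) Rule-based sparse reward at episode end.
    if outcome is not None:
        reward += TERMINAL_REWARD[outcome]
    return reward, subgoal_done


if __name__ == "__main__":
    far = Situation(math.radians(90), math.radians(90), 5000.0, 250.0, 250.0, 5000.0, 5000.0)
    near = Situation(math.radians(10), math.radians(20), 1500.0, 260.0, 240.0, 5200.0, 5000.0)
    r, done = step_reward(far, near, outcome=None, subgoal_done=False)
    print(f"reward = {r:.3f}, sub-goal reached = {done}")
```

The potential-based form was chosen because, under the usual conditions of that result, it speeds up guidance toward advantageous geometry without changing which policies are optimal; the one-time sub-goal flag keeps the agent from farming the bonus by repeatedly leaving and re-entering the advantage zone.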
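The method builds on PPO, but this page does not give the network architecture, hyperparameters, or training loop. For orientation, here is a minimal pure-Python sketch of the two quantities at the core of the standard PPO-Clip update: generalized advantage estimation (GAE) and the clipped surrogate loss. The function names and defaults (`clip_eps=0.2`, `gamma=0.99`, `lam=0.95`) are common choices assumed for illustration, not the authors' settings; a practical agent would compute the same quantities on batched tensors with automatic differentiation.

```python
import math
from typing import List, Sequence


def compute_gae(rewards: Sequence[float], values: Sequence[float],
                dones: Sequence[bool], last_value: float,
                gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Generalized advantage estimates for one rollout.

    dones[t] is True if the episode ended right after step t; last_value is the
    critic's estimate for the state following the final step of the rollout.
    """
    advantages = [0.0] * len(rewards)
    next_value, running = last_value, 0.0
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD residual; no bootstrapping across an episode boundary.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of future residuals, reset at episode ends.
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def ppo_clip_loss(new_logp: Sequence[float], old_logp: Sequence[float],
                  advantages: Sequence[float], clip_eps: float = 0.2) -> float:
    """Negative PPO-Clip surrogate, averaged over samples (to be minimized).

    L = -mean( min(r * A, clip(r, 1 - eps, 1 + eps) * A) ), r = pi_new / pi_old.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

Clipping the probability ratio to [1 − ε, 1 + ε] limits how far one update can push the policy away from the policy that collected the data; taking the `min` with the unclipped term makes the objective a pessimistic bound, so clipping only removes the incentive to move further in a direction that already improves the objective.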

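Cite this article:

ZHANG Xin, DONG Wenhan, YIN Hui, HE Lei, ZHANG Pin, LI Dunwang. Research on Autonomous Maneuver Decision Method for Unmanned Aerial Combat Based on an Improved PPO Algorithm[J]. Journal of Air Force Engineering University, 2024, 25(6): 77-86.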

Published online: 2024-12-06
