Abstract: Deep reinforcement learning enables unmanned aerial vehicles to perform autonomous maneuver decision-making. This paper proposes an unmanned combat aerial vehicle (UCAV) autonomous air combat maneuver decision-making method that combines reconstructed situational assessment models with the proximal policy optimization (PPO) algorithm, providing effective strategy choices for one-versus-one within-visual-range (WVR) air combat. To address the problem of low model fidelity, this paper first establishes a six-degree-of-freedom dynamic model of the UCAV and defines the attack mode of WVR air combat. Then, to improve how adequately the situational assessment model describes air combat, this paper reconstructs the angle, speed, distance, and altitude situational functions based on a division of air combat states, and proposes a new situational function that describes maneuver potential. In terms of reward function design, in addition to rule-based sparse rewards, sub-target rewards are established based on transitions between air combat states, and shaping reward functions are designed from the situational functions to enhance guidance capability. Finally, an expert system is designed as an opponent to evaluate the proposed method on the high-fidelity air combat simulation platform JSBSim. Simulation results show that, against both fixed-maneuver opponents and expert-system opponents, the proposed method effectively improves the convergence speed and winning rate of the algorithm.
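As a rough illustration of the shaping-reward idea the abstract mentions, the sketch below combines a sparse rule-based reward with the discounted change in a situational potential. The specific situational functions (`angle_score`, `distance_score`), their weights, and the distance parameters are hypothetical stand-ins, since the abstract does not give the paper's actual formulas.

```python
import math

def angle_score(aspect_angle_rad):
    # Hypothetical form: higher when the opponent is near the UCAV's nose.
    return math.exp(-aspect_angle_rad / math.pi)

def distance_score(distance_m, ideal_m=800.0, spread_m=500.0):
    # Hypothetical form: peaks near an assumed ideal WVR firing distance.
    return math.exp(-((distance_m - ideal_m) / spread_m) ** 2)

def situation_potential(state):
    # Weighted combination of situational functions (weights are assumed).
    return 0.6 * angle_score(state["aspect"]) + 0.4 * distance_score(state["dist"])

def shaped_reward(prev_state, state, sparse_reward, gamma=0.99):
    # Potential-based shaping: the sparse rule-based reward plus the
    # discounted change in situational potential between successive states.
    return (sparse_reward
            + gamma * situation_potential(state)
            - situation_potential(prev_state))
```

With this structure, a maneuver that improves the agent's angular and range situation yields a positive shaping term even before any sparse win/loss reward arrives, which is the guidance effect the abstract describes.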