Abstract: To address autonomous maneuvering for counter-pursuit in close-range air combat, a Markov decision process model of unmanned aerial vehicle (UAV) counter-pursuit is established, and an autonomous maneuvering decision-making method for UAVs based on deep reinforcement learning is proposed. The method improves the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm by reconstructing the experience replay buffer, and it obtains the optimal policy network by fitting the policy function and the state-action value function. Simulation experiments show that, with random initial positions and attitudes, UAVs trained with this method achieve a win rate above 93% against opponent UAVs that use the pure pursuit method. Compared with the standard TD3 and Deep Deterministic Policy Gradient (DDPG) algorithms, the method converges faster and is more stable.
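For orientation, the two ingredients that distinguish standard TD3 from DDPG can be sketched as follows. This is a minimal illustration of the baseline TD3 target computation only, not the improved algorithm or the replay buffer reconstruction proposed here. The function names `td3_target` and `smoothed_target_action` and all default values (`gamma`, `sigma`, `noise_clip`, `max_action`) are assumptions chosen for the example.

```python
import numpy as np


def smoothed_target_action(action, rng, sigma=0.2, noise_clip=0.5, max_action=1.0):
    """Target policy smoothing: add clipped Gaussian noise to the target action.

    All default values are illustrative assumptions, not settings from the paper.
    """
    action = np.asarray(action, dtype=float)
    noise = np.clip(rng.normal(0.0, sigma, size=action.shape), -noise_clip, noise_clip)
    # Keep the perturbed action inside the valid action range.
    return np.clip(action + noise, -max_action, max_action)


def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: y = r + gamma * (1 - done) * min(Q1', Q2')."""
    reward = np.asarray(reward, dtype=float)
    done = np.asarray(done, dtype=float)
    # Taking the smaller of the two target critics limits value overestimation.
    q_min = np.minimum(np.asarray(q1_next, dtype=float), np.asarray(q2_next, dtype=float))
    return reward + gamma * (1.0 - done) * q_min
```

In a full TD3 training loop, both critics regress toward this shared target, and the policy network is updated less frequently than the critics (the "delayed" part of the name).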