Recognition of air target intent plays a key role in battlefield situational awareness, yet quickly and accurately extracting the relevant information from large volumes of situational data remains an open problem. Most existing models rely on intricate architectures, which prevents them from inferring target intent efficiently within a short time frame. To address these issues, this paper introduces a model based on the Transformer architecture. The model is further optimized with the Reverse method to better suit time-series tasks, and perturbation terms are incorporated into the positional encoding to improve its robustness and generalization. In addition, lightweight modifications are applied to both the attention mechanism and the feedforward network. Comparative experiments, ablation studies, and an analysis of computational complexity demonstrate the effectiveness of the proposed model for air target intent recognition.
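
As an illustration only, the sketch below shows one plausible way to inject a perturbation into a sinusoidal positional encoding during training. The class name `PerturbedPositionalEncoding`, the noise scale `sigma`, and the use of additive Gaussian noise are assumptions made for this example, not details taken from the paper.

```python
# Minimal sketch, assuming a PyTorch implementation: sinusoidal positional
# encoding with small additive Gaussian noise applied during training.
# The noise scale `sigma` and the Gaussian form are illustrative assumptions.
import math
import torch
import torch.nn as nn


class PerturbedPositionalEncoding(nn.Module):
    def __init__(self, d_model: int, max_len: int = 512, sigma: float = 0.01):
        super().__init__()
        self.sigma = sigma
        position = torch.arange(max_len).unsqueeze(1)                      # (max_len, 1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)                       # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)                       # odd dimensions
        self.register_buffer("pe", pe)                                     # (max_len, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); add the (optionally perturbed) encoding.
        pe = self.pe[: x.size(1)]
        if self.training and self.sigma > 0:
            pe = pe + self.sigma * torch.randn_like(pe)                    # perturbation term
        return x + pe


if __name__ == "__main__":
    enc = PerturbedPositionalEncoding(d_model=64)
    features = torch.randn(8, 20, 64)   # 8 sequences of 20 time steps, 64 features each
    print(enc(features).shape)          # torch.Size([8, 20, 64])
```

Adding noise only in training mode, as above, keeps inference deterministic while exposing the model to slightly shifted positional signals, which is one common way such a perturbation could support robustness and generalization.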