Abstract: Text-based person re-identification models often focus on aligning global and local features while neglecting inter-modal and intra-modal correlations. To address this problem, a cross-modal pedestrian re-identification method based on relationship mining is proposed. The method comprises a dual-stream network backbone, a negative similarity mining module, and a relationship encoder module. First, global and local features are aligned through the dual-stream network backbone. Second, the negative similarity mining module enables finer-grained feature discrimination and filters out similar but incorrect matches. Finally, the relationship encoder module learns the latent relationship information in the image and in the text separately, achieving relationship-level feature alignment. Experimental results on the CUHK-PEDES and ICFG-PEDES datasets show that the method achieves higher recognition accuracy.