TY  - JOUR
T1  - Spatial multi-semantic features guided spectral-friendly transformer network for hyperspectral image classification
AU  - Yu, Xiaoyan
AU  - Tai, Mingzhu
AU  - Wang, Yuyang
AU  - Shu, Zhenqiu
AU  - Zhu, Liehuang
N1  - Publisher Copyright:
© 2025 Elsevier Ltd
PY  - 2026/4
Y1  - 2026/4
N2  - Hyperspectral image classification (HSIC) is a foundational topic in remote sensing. However, the high correlations between spectral bands often result in redundant data. Moreover, traditional convolutional neural networks (CNNs) compress the spatial dimensions through pooling layers or strided convolutions when extracting spatial information, which causes a loss of spatial detail. To overcome these challenges, we propose a spatial multi-semantic features guided spectral-friendly Transformer network (SFTN), which effectively extracts the spectral and spatial features of HSIs. Specifically, a multi-semantic spatial attention (MsSA) module applies unidirectional spatial compression along the height and width dimensions, preserving the spatial structure in one direction while aggregating global spatial information in the other and thereby minimizing information loss during compression. It then employs multi-scale depth-shared 1D convolutions to capture multi-semantic spatial information. Furthermore, the spectral-friendly Transformer replaces the traditional multi-head self-attention (MHSA) with spectral correlation self-attention (ECSa), which effectively captures spectral differences and thus reduces the redundancy of spectral information. Extensive experiments on several HSI datasets show that the proposed SFTN outperforms other state-of-the-art methods in HSIC applications. The source code for this work will be released later.
AB  - Hyperspectral image classification (HSIC) is a foundational topic in remote sensing. However, the high correlations between spectral bands often result in redundant data. Moreover, traditional convolutional neural networks (CNNs) compress the spatial dimensions through pooling layers or strided convolutions when extracting spatial information, which causes a loss of spatial detail. To overcome these challenges, we propose a spatial multi-semantic features guided spectral-friendly Transformer network (SFTN), which effectively extracts the spectral and spatial features of HSIs. Specifically, a multi-semantic spatial attention (MsSA) module applies unidirectional spatial compression along the height and width dimensions, preserving the spatial structure in one direction while aggregating global spatial information in the other and thereby minimizing information loss during compression. It then employs multi-scale depth-shared 1D convolutions to capture multi-semantic spatial information. Furthermore, the spectral-friendly Transformer replaces the traditional multi-head self-attention (MHSA) with spectral correlation self-attention (ECSa), which effectively captures spectral differences and thus reduces the redundancy of spectral information. Extensive experiments on several HSI datasets show that the proposed SFTN outperforms other state-of-the-art methods in HSIC applications. The source code for this work will be released later.
KW  - CNNs
KW  - Correlation self-attention
KW  - HSIC
KW  - Multi-semantic attention
KW  - Spectral correlation
KW  - Spectral-friendly transformer
UR  - http://www.scopus.com/pages/publications/105014934489
U2  - 10.1016/j.patcog.2025.112337
DO  - 10.1016/j.patcog.2025.112337
M3  - Article
AN  - SCOPUS:105014934489
SN  - 0031-3203
VL  - 172
JO  - Pattern Recognition
JF  - Pattern Recognition
M1  - 112337
ER  -
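
Editor's note: the sketch below is a rough illustration of how the abstract's MsSA idea could be read, not the authors' code (which the abstract states will be released later). It combines unidirectional average pooling along the height and width axes with multi-scale depthwise 1D convolutions; the class name, pooling choice, kernel sizes, weight-sharing scheme, and sigmoid gating are all assumptions made for illustration.

# Hypothetical sketch (not the authors' SFTN implementation): unidirectional
# spatial compression plus multi-scale depthwise 1D convolutions, loosely
# following the MsSA description in the abstract above.
import torch
import torch.nn as nn


class UnidirectionalSpatialAttention(nn.Module):
    """Pools the feature map along one spatial axis at a time, then applies
    multi-scale depthwise 1D convolutions to the pooled descriptors.
    Kernel sizes and the sigmoid gating are assumptions, not from the paper."""

    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        # One depthwise 1D conv per scale; groups=channels keeps it depthwise.
        self.branches = nn.ModuleList(
            nn.Conv1d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernel_sizes
        )

    def _attend(self, desc):
        # desc: (B, C, L) descriptor obtained by pooling one spatial axis.
        out = sum(branch(desc) for branch in self.branches)
        return torch.sigmoid(out)

    def forward(self, x):
        # x: (B, C, H, W) patch features.
        h_desc = x.mean(dim=3)                     # compress width  -> (B, C, H)
        w_desc = x.mean(dim=2)                     # compress height -> (B, C, W)
        h_att = self._attend(h_desc).unsqueeze(3)  # (B, C, H, 1)
        w_att = self._attend(w_desc).unsqueeze(2)  # (B, C, 1, W)
        return x * h_att * w_att                   # reweight spatial positions


if __name__ == "__main__":
    feats = torch.randn(2, 32, 9, 9)  # e.g. a 9x9 HSI patch with 32 channels
    print(UnidirectionalSpatialAttention(32)(feats).shape)  # torch.Size([2, 32, 9, 9])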