TY  - JOUR
T1  - Spatial multi-semantic features guided spectral-friendly transformer network for hyperspectral image classification
AU  - Yu, Xiaoyan
AU  - Tai, Mingzhu
AU  - Wang, Yuyang
AU  - Shu, Zhenqiu
AU  - Zhu, Liehuang
N1  - Publisher Copyright:
© 2025 Elsevier Ltd
PY  - 2026/4
Y1  - 2026/4
N2  - Hyperspectral image classification (HSIC) is a foundational topic in remote sensing. However, the high correlations between spectral bands often result in redundant data. Moreover, traditional convolutional neural networks (CNNs) compress the spatial dimensions through pooling layers or strided convolutions when extracting spatial information, which causes a loss of spatial detail. To overcome these challenges, we propose a spatial multi-semantic features guided spectral-friendly Transformer network (SFTN), which effectively extracts the spectral and spatial features of HSIs. Specifically, a multi-semantic spatial attention (MsSA) module applies unidirectional spatial compression along the height and width dimensions, preserving the spatial structure in one direction while aggregating global spatial information in the other and thereby minimizing information loss during compression. It then employs multi-scale depth-shared 1D convolutions to capture multi-semantic spatial information. Furthermore, the spectral-friendly Transformer replaces the traditional multi-head self-attention (MHSA) with spectral correlation self-attention (ECSa), which effectively captures spectral differences and thus reduces the redundancy of spectral information. Extensive experiments on several HSI datasets show that the proposed SFTN outperforms other state-of-the-art methods in HSIC applications. The source code for this work will be released later.
AB  - Hyperspectral image classification (HSIC) is a foundational topic in remote sensing. However, the high correlations between spectral bands often result in redundant data. Moreover, traditional convolutional neural networks (CNNs) compress the spatial dimensions through pooling layers or strided convolutions when extracting spatial information, which causes a loss of spatial detail. To overcome these challenges, we propose a spatial multi-semantic features guided spectral-friendly Transformer network (SFTN), which effectively extracts the spectral and spatial features of HSIs. Specifically, a multi-semantic spatial attention (MsSA) module applies unidirectional spatial compression along the height and width dimensions, preserving the spatial structure in one direction while aggregating global spatial information in the other and thereby minimizing information loss during compression. It then employs multi-scale depth-shared 1D convolutions to capture multi-semantic spatial information. Furthermore, the spectral-friendly Transformer replaces the traditional multi-head self-attention (MHSA) with spectral correlation self-attention (ECSa), which effectively captures spectral differences and thus reduces the redundancy of spectral information. Extensive experiments on several HSI datasets show that the proposed SFTN outperforms other state-of-the-art methods in HSIC applications. The source code for this work will be released later.
KW  - CNNs
KW  - Correlation self-attention
KW  - HSIC
KW  - Multi-semantic attention
KW  - Spectral correlation
KW  - Spectral-friendly transformer
UR  - http://www.scopus.com/pages/publications/105014934489
U2  - 10.1016/j.patcog.2025.112337
DO  - 10.1016/j.patcog.2025.112337
M3  - Article
AN  - SCOPUS:105014934489
SN  - 0031-3203
VL  - 172
JO  - Pattern Recognition
JF  - Pattern Recognition
M1  - 112337
ER  -
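
Editor's note: the sketch below is a rough illustration of how the abstract's MsSA idea could be read, not the authors' code (which the abstract states will be released later). It combines unidirectional average pooling along the height and width axes with multi-scale depthwise 1D convolutions; the class name, pooling choice, kernel sizes, weight-sharing scheme, and sigmoid gating are all assumptions made for illustration.

# Hypothetical sketch (not the authors' SFTN implementation): unidirectional
# spatial compression plus multi-scale depthwise 1D convolutions, loosely
# following the MsSA description in the abstract above.
import torch
import torch.nn as nn


class UnidirectionalSpatialAttention(nn.Module):
    """Pools the feature map along one spatial axis at a time, then applies
    multi-scale depthwise 1D convolutions to the pooled descriptors.
    Kernel sizes and the sigmoid gating are assumptions, not from the paper."""

    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        # One depthwise 1D conv per scale; groups=channels keeps it depthwise.
        self.branches = nn.ModuleList(
            nn.Conv1d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernel_sizes
        )

    def _attend(self, desc):
        # desc: (B, C, L) descriptor obtained by pooling one spatial axis.
        out = sum(branch(desc) for branch in self.branches)
        return torch.sigmoid(out)

    def forward(self, x):
        # x: (B, C, H, W) patch features.
        h_desc = x.mean(dim=3)                     # compress width  -> (B, C, H)
        w_desc = x.mean(dim=2)                     # compress height -> (B, C, W)
        h_att = self._attend(h_desc).unsqueeze(3)  # (B, C, H, 1)
        w_att = self._attend(w_desc).unsqueeze(2)  # (B, C, 1, W)
        return x * h_att * w_att                   # reweight spatial positions


if __name__ == "__main__":
    feats = torch.randn(2, 32, 9, 9)  # e.g. a 9x9 HSI patch with 32 channels
    print(UnidirectionalSpatialAttention(32)(feats).shape)  # torch.Size([2, 32, 9, 9])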