Abstract: Automatic medical image segmentation based on deep learning plays a crucial role in clinical diagnosis and treatment. To address the limitations of traditional convolutional neural network (CNN) models, which are constrained by local receptive fields, and the tendency of Transformer and multilayer perceptron (MLP) models to overfit on small medical image datasets, we propose MSAFNet, a multi-scale multi-axis feature fusion model. MSAFNet employs a novel multi-axis mixed residual channel attention block (MX-RCAB) that attends to both local details and global dependencies, enhancing feature representation in the spatial and channel dimensions, and a spatial cross-gating block (SCGB) that filters out redundant information and captures discriminative low-level details, further improving segmentation performance. Experimental results on the Synapse and ACDC datasets demonstrate that MSAFNet achieves average Dice similarity coefficients (DSCs) of 85.59% and 92.37%, respectively, outperforming representative medical image segmentation methods such as nnUNet and TransUNet.
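The MX-RCAB described above combines multi-axis attention with residual channel attention. As a rough, hedged illustration of the channel-attention ingredient only (not the paper's actual block, whose structure is not given in the abstract), a squeeze-and-excitation-style channel gate can be sketched in NumPy; the function name, weight shapes, and reduction ratio below are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    # Numerically plain logistic function used as the channel gate
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """Illustrative squeeze-and-excitation-style channel attention (not MX-RCAB itself).
    x:  feature map of shape (C, H, W)
    w1: bottleneck weights of shape (C // r, C) for reduction ratio r
    w2: expansion weights of shape (C, C // r)
    """
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    s = x.mean(axis=(1, 2))
    # Excitation: two-layer bottleneck MLP followed by a sigmoid gate in (0, 1)
    gate = sigmoid(w2 @ np.maximum(w1 @ s, 0.0))
    # Rescale each channel by its learned importance weight
    return x * gate[:, None, None]

# Toy example: 4 channels, 8x8 spatial map, hypothetical reduction ratio r = 2
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
w1 = rng.standard_normal((2, 4)) * 0.1
w2 = rng.standard_normal((4, 2)) * 0.1
y = channel_attention(x, w1, w2)
```

Because the gate lies strictly in (0, 1), the output has the same shape as the input with each channel attenuated according to its pooled statistics; the paper's block additionally mixes multi-axis (spatial) attention and a residual connection on top of this idea.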