Abstract: Accurate extraction of water bodies from remote sensing images is crucial for applications such as water resource management and disaster monitoring. Traditional semantic segmentation models make limited use of multi-scale features, delineate boundaries poorly in complex scenes, and struggle to distinguish water from similar ground objects. To address these deficiencies, this paper proposes a Multi-Scale Feature Extraction and Interaction Network (MSFEINet), which introduces four modules: a Feature Fusion Module, a Multi-scale Convolution Module, a Scale-Channel Attention Module, and a Depthwise Separable Convolution Feature Extraction Module. By combining multi-scale feature interaction, attention mechanisms, and cross-layer feature fusion, the network improves both the accuracy and the efficiency of water body segmentation. Experimental results show that MSFEINet delineates contour details more accurately and produces more complete segmentations, demonstrating advantages in both accuracy and efficiency.
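As a rough illustration of why depthwise separable convolution, one of the building blocks named above, is commonly associated with efficiency, the sketch below compares the weight counts of a standard convolution and a depthwise separable one. The function names and the channel and kernel sizes are hypothetical choices made for this example; they are not taken from MSFEINet, whose layer configuration is not given here.

```python
# Illustrative sketch only: counts convolution weights (bias terms ignored).
# The layer sizes used below are assumptions, not values from MSFEINet.

def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution: every output channel
    has its own k x k filter over every input channel."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise separable convolution: one k x k filter per
    input channel (depthwise), then a 1 x 1 convolution that mixes
    channels (pointwise)."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    c_in, c_out, k = 64, 128, 3  # hypothetical layer sizes
    std = standard_conv_params(c_in, c_out, k)
    sep = depthwise_separable_params(c_in, c_out, k)
    print(f"standard: {std} weights, depthwise separable: {sep} weights")
```

The saving grows with the number of output channels and the kernel size, since the separable form replaces the product k·k·C_in·C_out with the sum k·k·C_in + C_in·C_out.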