Medical image segmentation is a basic processing step in most medical image analysis, so it must offer high segmentation accuracy and good stability. It is therefore necessary to develop automatic segmentation methods for medical images. However, traditional medical image segmentation methods cannot meet these requirements. Traditional automatic extraction methods need different parameter settings for different types of images and do not generalize across image types. Moreover, they struggle to meet the needs of clinical and brain research in terms of extraction precision, speed, and stability. The method proposed in this paper addresses these problems: it achieves high-precision, fast, and stable medical image segmentation without parameter adjustment.
In recent years, deep learning techniques have progressed rapidly. Among them, the convolutional neural network (CNN) has great potential in image processing because it extracts features automatically, has strong nonlinear expressive power, and requires no manual intervention. In image segmentation, common CNN models include FCN [1], PSP-Net [2], Mask R-CNN [3], and U-Net [4]. Among them, the extremely lightweight U-Net is probably the most widely used model in medical image segmentation. Despite its simple structure, it segments well, and many researchers have chosen it as the baseline when designing models for various medical image segmentation tasks. Cicek [5] extended the U-Net model to 3D image segmentation and applied it to brain tumor segmentation. Zhang [6] replaced each sub-block of U-Net with a staggered block and applied it to retinal segmentation. Zahangir [7] proposed R2U-Net, which replaces the sub-blocks of U-Net with blocks that combine residual and recurrent connections; the improved model was verified on skin disease images and lung images. Oktay [8] proposed Attention U-Net, which places attention gates on the skip connections so that deep semantic features can highlight the useful parts of the low-level features. However, if the features of a small target have already been eliminated from the deep semantic features by repeated convolutions, the attention gate cannot act on the low-level features of that target. In addition, the output of the attention gate is still combined directly with the deep feature map, without considering the semantic gap between the shallow and deep feature maps. Zhou [9] improved the skip connections of U-Net and introduced deep supervision. Huang [10] replaced the skip connections with full-scale skip connections. Jha [11] proposed the Double U-Net structure, in which the first U-Net uses a pre-trained VGG-19 as the encoder and the second U-Net uses ASPP to capture more information; it obtained good results on four different medical image datasets. C. Guo [12] proposed Spatial Attention U-Net and applied it to blood vessel extraction. M. Z. [13] applied U-Net++ to brain tumor segmentation. Jieneng Chen [14] proposed TransUNet, which adopts a hybrid CNN-Transformer architecture and combines it with skip connections to achieve better performance in medical image segmentation. A. Lin [15] proposed DS-TransUNet, which combines a dual Swin Transformer with a U-shaped structure.
Unlike natural images, medical images usually contain only one or two segmentation targets, and the target often occupies only a small proportion of the image. Such data are said to be class-imbalanced. If most of the training images are imbalanced, the model may learn the characteristics of small targets only slowly. If the training data contain both imbalanced and balanced images, the model tends to learn the characteristics of the balanced ones. Networks are deepened to capture the global information of large targets, but the repeated convolution operations cause small-target features to be lost. Since the shallow feature maps contain the boundary information of the target and the global information of small targets, U-Net introduced skip connections that combine the shallow feature maps from the encoding path with the deep feature maps of the same scale from the decoding path. However, simple skip connections cannot make full use of the shallow feature maps, and the deep semantic information becomes scattered after multiple upsampling operations. To solve these problems, we integrated a ResNet-based Squeeze-and-Excitation module (SE-Res module) and an attention module (A module) into U-Net and proposed a new network model called SEA-Net. Our main contributions are as follows:
- We replaced the copy-skip path of the U-Net model with an attention path. Unlike in Attention U-Net [8], this path is the only path that provides deep semantic information for decoding. It combines deep and shallow semantics to adjust the spatial information of the shallow feature map, highlighting the target area and providing more useful spatial semantic information for the decoding process.
- We added an SE-Res path to U-Net, parallel to the above attention path. The SE-Res path adjusts the channel weights of the shallow feature map to remove redundant channel information and to provide more channel semantic information for the decoding process.
- We proposed a hybrid loss function (cross-entropy loss + Tversky loss) to handle class imbalance in the data while ensuring that the model still converges quickly.
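
To make the last contribution concrete, the following is a minimal NumPy sketch of a hybrid loss that sums cross-entropy and Tversky terms for binary segmentation. It is an illustration under stated assumptions, not the implementation used for SEA-Net: the function names, the unweighted sum of the two terms, the smoothing constant `eps`, and the default weights `alpha=0.3` and `beta=0.7` are all assumed here and are not specified in this section.

```python
import numpy as np


def cross_entropy_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy over all pixels.

    pred holds foreground probabilities in [0, 1]; target holds 0/1 labels.
    """
    p = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))


def tversky_loss(pred, target, alpha=0.3, beta=0.7, eps=1e-7):
    """1 - Tversky index, computed from soft TP / FP / FN counts.

    alpha weights false positives and beta weights false negatives
    (assumed defaults); alpha = beta = 0.5 gives the Dice loss.
    """
    tp = np.sum(pred * target)
    fp = np.sum(pred * (1.0 - target))
    fn = np.sum((1.0 - pred) * target)
    index = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return float(1.0 - index)


def hybrid_loss(pred, target, alpha=0.3, beta=0.7):
    """Unweighted sum of the two terms (the relative weighting is an assumption)."""
    return cross_entropy_loss(pred, target) + tversky_loss(pred, target, alpha, beta)


if __name__ == "__main__":
    # A class-imbalanced toy mask: one foreground pixel in an 8x8 image.
    target = np.zeros((8, 8))
    target[3, 4] = 1.0
    missed = np.zeros((8, 8))  # prediction that ignores the small target
    print("perfect prediction:", hybrid_loss(target.copy(), target))
    print("target missed:     ", hybrid_loss(missed, target))
```

In this toy case the cross-entropy term alone stays fairly small when the single foreground pixel is missed, because it is averaged over 64 mostly-background pixels, whereas the Tversky term approaches 1. This is the intuition behind pairing a region-based term with a pixel-wise term on imbalanced data.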