Zi Li (corresponding author), Ying Chen, Zeli Chen, Yanzhou Su, Tai Ma, Tony C. W. Mok, Yan-Jie Zhou, Yunhao Bai, Zhinlin Zheng, Le Lu, Yirui Wang, Jia Ge, Xianghua Ye, Senxiang Yan, Dakai Jin
email: alisonbrielee@gmail.com
Abstract
In the radiation therapy of nasopharyngeal carcinoma (NPC), clinicians typically delineate the gross tumor volume (GTV) on non-contrast planning computed tomography to ensure accurate radiation dose delivery. However, the low contrast between tumors and adjacent normal tissues necessitates that radiation oncologists manually delineate the tumors, often relying on diagnostic MRI for guidance. In this study, we propose a novel approach to directly segment NPC gross tumors on non-contrast planning CT images, circumventing potential registration errors when aligning MRI or MRI-derived tumor masks to planning CT. To address the low contrast between tumors and adjacent normal structures in planning CT, we introduce a 3D Semantic Asymmetry Tumor segmentation (SATs) method. Specifically, we posit that a healthy nasopharyngeal region is characteristically bilaterally symmetric, whereas the emergence of nasopharyngeal carcinoma disrupts this symmetry. We then propose a Siamese contrastive learning segmentation framework that minimizes the voxel-wise distance between original and flipped areas without tumor and encourages a larger distance between original and flipped areas with tumor. Thus, our approach enhances the sensitivity of features to semantic asymmetries. Extensive experiments demonstrate that the proposed SATs achieves the leading NPC GTV segmentation performance in both internal and external testing, e.g., with at least a 2% absolute Dice score improvement and a 12% average distance error reduction compared to other state-of-the-art methods in external testing.
Keywords:
Nasopharyngeal Carcinoma · Gross Tumor Volume · Asymmetry-inspired Segmentation · Deep Learning · Radiation Therapy.
1 Introduction
Nasopharyngeal carcinoma (NPC) ranks among the most prevalent malignancies affecting the nasopharyngeal region, with patient prognosis substantially enhanced through early diagnosis and intervention [9]. A significant proportion of NPC patients can achieve complete remission following radiation therapy (RT) [8]. Notably, this type of cancer exhibits remarkable sensitivity to radiation therapy, wherein a pivotal component of the therapeutic intervention is the accurate delineation of the gross tumor volume (GTV). In clinical practice, magnetic resonance imaging (MRI) has emerged as the predominant imaging modality for this delineation, owing to its superior resolution in visualizing soft tissues. Subsequently, cross-modality registration is conducted between MRI and the non-contrast planning computed tomography (pCT) to transfer the tumor delineations to the pCT scans utilized in treatment planning [32]. However, cross-modality registration is non-trivial due to substantial modality gaps and variations in scanning ranges. Alternatively, radiation oncologists may mentally integrate pCT and MRI to aid in delineating the gross tumor volume. Nevertheless, this manner is time-consuming, often requiring 1-2 hours per case, and fraught with potential for inaccuracies.
Recently, learning-based approaches have yielded promising outcomes in NPC tumor segmentation from MRI scans [21, 14, 16, 26, 20, 19, 27]. Nonetheless, this modality fails to provide the direct measurements of electron density essential for radiotherapy planning, and MRI-derived tumor masks necessitate spatial transformation to pCT via image registration, which may introduce alignment errors. Some approaches [29, 5] tackle automated segmentation of the NPC GTV using both CT and MRI. Notably, registration errors between paired CT and MRI can lead to multi-modal segmentation performance that is inferior to single-modal methods. Additionally, several studies [31, 18, 30, 2] focus on segmenting the NPC gross tumor volume using contrast-enhanced CT scans. However, these approaches yield relatively low performance, e.g., Dice scores falling below 70. This limitation is attributable to the capability of NPC tumors to invade adjacent tissues and the suboptimal soft-tissue contrast of pCT, especially for structures like mucous membranes, muscles, and nerves.
In this work, our goal is to segment the NPC GTV in non-contrast pCT, which avoids registration errors when aligning MRI or MRI-derived tumor masks to pCT. Directly segmenting the NPC GTV in non-contrast pCT is challenging because the boundaries between the NPC tumor and adjacent soft tissues, such as membranes, muscles, and vessels, are extremely unclear in non-contrast pCT. To alleviate this issue, we propose a 3D semantic asymmetry tumor segmentation method based on the observation that a healthy nasopharyngeal region is typically bilaterally symmetric, but the presence of an NPC tumor disrupts this symmetry, as illustrated in Figure 1. Specifically, to exploit the anatomical symmetry cue, we first warp the pCT using automatically segmented head and neck organs so that the pCT is bilaterally symmetric along the central sagittal plane. This helps to reduce the influence of asymmetric anatomies caused by variations in the patient's head location and pose during CT scanning. Then, we develop a Siamese contrastive learning segmentation framework based on a conventional segmentation loss and an additional voxel-level margin loss. The margin loss is applied to deep features extracted from the original and flipped pCT scans; it minimizes the voxel-wise distance between original and flipped areas without tumor and encourages a larger distance between original and flipped areas with tumor, making GTV features more sensitive to semantic asymmetries. To determine the asymmetrical nasopharyngeal areas, we present a tumor mask-based region selection approach.
The main contributions of this work are as follows:
- •
We introduce a 3D semantic asymmetry tumor segmentation method for the NPC GTV in non-contrast pCT, which is the most common and desirable imaging modality in RT planning. To the best of our knowledge, this is the first work to tackle NPC GTV segmentation in non-contrast CT scans and to employ the symmetry cue for GTV segmentation.
- •
We develop a Siamese contrastive learning segmentation framework with an asymmetrical region selection approach, which together facilitate the effective learning of asymmetric tumor features.
- •
We demonstrate that our proposed SATs achieves state-of-the-art performance in NPC GTV segmentation, outperforming existing methods in both our internal testing set and an independent external dataset.
2 Related Work
2.1 Learning-based GTV segmentation in NPC
Recently, learning-based approaches have made great progress in NPC GTV segmentation. The works in [31, 25] apply deep networks to segment tumors in contrast-enhanced CT. Chen et al. target tumor segmentation by integrating cross-modality features of MRI and CT [6]. NPCNet [19] is designed to segment both primary tumors and metastatic lymph nodes in 2D MRI samples. Researchers [35] investigate domain adaptation techniques and utilize limited annotated target data to enhance GTV segmentation performance in MRI.
Current approaches predominantly rely on MRI and/or contrast-enhanced CT for NPC GTV segmentation, as it has long been believed that identifying NPC GTV using non-contrast CT is an extremely challenging, if not impossible, task. To date, effective segmentation methods on non-contrast pCT remain elusive. However, pCT is mostly adopted in RT routine without contrast injection, and transforming MRI-derived tumor masks to pCT inevitably involves considerable alignment errors.
2.2 Symmetry in Medical Image Analysis
The human anatomy usually displays bilateral symmetry, as observed in structures like the brain, breasts, lungs, and pelvis. Leveraging this symmetry has been crucial in various medical image analyses. In the brain, asymmetry in the shape of subcortical structures has been linked to Alzheimer's disease, and researchers [22] have applied shape analysis techniques along with machine learning to quantify these asymmetries. Similarly, a Siamese Faster R-CNN approach [23] has been proposed for breast cancer detection, which performs joint analysis of both breasts to detect masses in mammographic images and utilizes inherent bilateral symmetry to improve detection performance. Researchers [4] have also explored exploiting semantic asymmetry for accurate pelvic fracture detection in X-ray images. These applications underscore the utility of symmetry-based approaches in medical image analysis, offering improved diagnostic accuracy and early detection capabilities by leveraging the symmetrical nature of anatomical structures.
3 Method
We propose a 3D semantic asymmetry tumor segmentation (SATs) method based on the semantic asymmetry property of the gross tumor in the nasopharyngeal area, to enable accurate NPC GTV segmentation even with limited data. Given one CT scan, as shown in Figure 2(a), we utilize a shared encoder-decoder module to process both the original image $X \in \mathbb{R}^{D \times H \times W}$, where $D$, $H$, and $W$ are the CT image spatial dimensions, and its flipped image $X'$, thereby encoding them into a symmetric representation. Subsequently, we introduce a non-linear projection module and a distance metric learning strategy to refine the resulting feature maps $F$ and $F'$. We intend to maximize the dissimilarity between $F$ and $F'$ at corresponding anatomical locations of the abnormalities, while minimizing it at normal locations. The distance metric learning paradigm is illustrated in Figure 2(b).
3.1 Asymmetrical Abnormal Region Selection
The focus of our SATs model is the asymmetric abnormal area. We introduce two methods, supervised label-based selection and unsupervised registration-based selection, to detect asymmetric abnormal areas in scenarios with and without semantic mask annotation. Considering that image asymmetry may originate from pathological or non-pathological sources, such as changes in imaging angles and patient postures, we pre-process the CT scans following [34] to ensure that the scans are symmetric along the central sagittal plane. Specifically, we manually select a patient CT image with bilateral symmetry along the central sagittal plane to serve as an atlas, and then align other patient CT images to the atlas space through affine registration. This step helps alleviate the influence of other asymmetric anatomical structures in the head & neck that may mislead the model.
The semantic segmentation mask of $X$ is denoted as $Y$, where $0$ represents the background and $1$ represents the foreground of tumors. Through the flip operation, we can obtain the flipped semantic mask $Y'$ of $X'$. Subsequently, an asymmetrical mask $A$ is defined to locate asymmetrical regions in the image $X$, as

$$A_v = Y_v \oplus Y'_v, \qquad (1)$$

where $\oplus$ denotes the voxel-wise exclusive-or operation and $v$ indexes voxel locations. Note that $A_v = 1$ and $A_v = 0$ represent the asymmetrical and symmetrical regions in $X$, respectively.
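As a concrete illustration, the label-based selection of Eq. (1) can be sketched in a few lines of NumPy. The left-right axis index and the toy volume below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def asymmetrical_mask(tumor_mask: np.ndarray, lr_axis: int = 2) -> np.ndarray:
    """Locate asymmetrical regions (Eq. 1): XOR of the tumor mask
    with its mirror image about the central sagittal plane.

    tumor_mask : binary volume (0 = background, 1 = tumor).
    lr_axis    : the left-right axis along which the volume is flipped.
    """
    flipped = np.flip(tumor_mask, axis=lr_axis)
    return np.logical_xor(tumor_mask, flipped).astype(np.uint8)

# Toy example: a 1x1x4 volume with a "tumor" voxel on the left half only.
y = np.array([[[0, 1, 0, 0]]], dtype=np.uint8)
a = asymmetrical_mask(y)  # flipped mask is [[[0, 0, 1, 0]]]
# Voxels where original and mirror disagree are marked asymmetrical.
```

Voxels that the tumor renders asymmetric receive $A_v = 1$; everything bilaterally consistent receives $A_v = 0$.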
3.2 Asymmetrical Learning Strategy
Our segmentation loss function comprises two components: a combination of Dice loss and binary cross-entropy loss for the conventional segmentation objective, and a voxel-wise margin loss specifically designed for asymmetric abnormal regions.
3.2.1 Metric-based margin loss.
We employ a voxel-level margin loss: in the asymmetric abnormal region, we aim to minimize the similarity between the features of any point and its corresponding point mirrored across the central sagittal plane. The margin loss between the features $F \in \mathbb{R}^{D' \times H' \times W' \times C}$, where $D'$, $H'$, and $W'$ are spatial dimensions and $C$ is the number of output features, and the flipped features $F'$, after a non-linear projection $g(\cdot)$, is:

$$\mathcal{L}_{m} = \frac{1}{|\Omega|} \sum_{v \in \Omega} \Big( \mathbb{1}[A_v = 0]\, \lVert z_v - z'_v \rVert_2 + \mathbb{1}[A_v = 1]\, \max\big(0,\; m - \lVert z_v - z'_v \rVert_2\big) \Big), \qquad (2)$$

where $z = g(F)$ and $z' = g(F')$ are the projected features, $\Omega$ is the set of voxel locations, $A_v$ is the asymmetrical mask of Eq. (1), $\mathbb{1}[\cdot]$ is the indicator function, and $m$ defines a margin that regulates the degree of dissimilarity in semantic asymmetries.
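A minimal NumPy sketch of this voxel-wise margin loss, assuming an L2 distance on the projected features (the exact distance metric is not specified in this excerpt); voxels are flattened to rows for clarity.

```python
import numpy as np

def margin_loss(z, z_flip, asym, m=1.0):
    """Voxel-wise margin loss (Eq. 2) on projected features.

    z, z_flip : (N, C) projected features of the original and flipped
                scans at N corresponding voxel locations.
    asym      : (N,) binary asymmetry labels from Eq. (1).
    m         : margin regulating the dissimilarity of asymmetric voxels.

    Symmetric voxels (asym == 0) are pulled together; asymmetric voxels
    (asym == 1) are pushed at least `m` apart.
    """
    d = np.linalg.norm(z - z_flip, axis=1)       # voxel-wise L2 distance
    pull = (asym == 0) * d                       # minimize distance
    push = (asym == 1) * np.maximum(0.0, m - d)  # enforce the margin
    return float(np.mean(pull + push))
```

For two voxels with identical features on both sides, labelling one of them asymmetric incurs the full margin penalty for that voxel, since its distance to the mirrored feature is zero.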
3.2.2 Overall loss function.
We approach tumor segmentation as a binary segmentation task, utilizing the Dice loss, the binary cross-entropy loss, and the contrastive margin loss as our objective functions. The overall loss function is formulated as:

$$\mathcal{L} = \mathcal{L}_{Dice} + \mathcal{L}_{BCE} + \lambda\, \mathcal{L}_{m}, \qquad (3)$$

where $\lambda$ is the weight balancing the different losses.
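The overall objective of Eq. (3) can be sketched as follows; the soft Dice formulation and the weight value `lam` are illustrative assumptions, since their exact forms are not specified in this excerpt.

```python
import numpy as np

def dice_loss(p, y, eps=1e-6):
    """Soft Dice loss between predicted probabilities p and binary mask y."""
    inter = np.sum(p * y)
    return 1.0 - (2.0 * inter + eps) / (np.sum(p) + np.sum(y) + eps)

def bce_loss(p, y, eps=1e-7):
    """Binary cross-entropy averaged over voxels."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

def total_loss(p, y, l_margin, lam=1.0):
    """Overall objective (Eq. 3): Dice + BCE + lambda * margin loss."""
    return dice_loss(p, y) + bce_loss(p, y) + lam * l_margin
```

A perfect prediction drives both segmentation terms to (nearly) zero, so the total loss is then dominated by the weighted margin term.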
3.3 Siamese Segmentation Architecture
Our SATs architecture comprises the encoder-decoder module and the projection head. While both components are engaged during the training process, only the encoder-decoder module is required for inference.
3.3.1 Siamese encoder-decoder.
The core element is a shared U-shaped encoder-decoder architecture, as shown in Fig. 2. The encoder employs repeated 3D residual blocks, each comprising two convolutional layers. Each convolutional layer is followed by InstanceNorm normalization and LeakyReLU activation. For downsampling, a strided convolution is utilized to halve the resolution of the input feature maps. The number of filters doubles after each downsampling step to maintain roughly constant time complexity, except for the last layer. In total, the encoder performs four downsampling operations.
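The encoder's resolution and width bookkeeping (halve the resolution, double the filters at each of the four downsampling steps) can be sketched as below. The input patch size and base filter count are placeholders, and the "except for the last layer" exception is omitted for simplicity.

```python
def encoder_plan(in_shape, base_filters, n_down=4):
    """Track spatial size and channel width through the encoder:
    each downsampling halves the resolution and doubles the filters.
    (Simplification: the paper keeps the last layer's width unchanged.)
    """
    shape, filters = list(in_shape), base_filters
    stages = [(tuple(shape), filters)]
    for _ in range(n_down):
        shape = [s // 2 for s in shape]  # strided conv halves resolution
        filters *= 2                     # width doubles per stage
        stages.append((tuple(shape), filters))
    return stages

# Hypothetical 128^3 patch with 32 base filters:
# the bottleneck sees an 8^3 grid with 512 channels.
plan = encoder_plan((128, 128, 128), 32)
```

This makes explicit why four downsamplings keep per-stage compute roughly constant: voxel count drops 8x while channel width only doubles.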
3.3.2 Projection head.
We utilize a non-linear projection to transform the features before calculating the distance in the margin loss, aiming to enhance the quality of the learned features. It consists of three convolution layers followed by a unit-normalization layer; the first two layers use the ReLU activation function. We hypothesize that directly applying metric learning to the segmentation features might lead to information loss and diminish the model's effectiveness. For example, some asymmetries in CT images are non-pathological, stemming from variations in the patient's head positioning and pose, yet they are beneficial for segmentation. The non-linear projection can filter such irrelevant information out of the metric learning process while ensuring it is preserved in the features used for segmentation.
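A toy sketch of the projection head, treating each convolution as a per-voxel (1x1x1) linear map over channels — an assumption, since kernel sizes and channel counts are not given here — with ReLU after the first two layers and unit-normalization at the end.

```python
import numpy as np

def projection_head(feats, weights):
    """Per-voxel non-linear projection: three 1x1x1 "convolutions"
    (matrix multiplies over the channel axis), ReLU after the first
    two layers, then unit-normalization of each voxel's feature vector.

    feats   : (N, C) features, one row per voxel.
    weights : list of three (C_in, C_out) weight matrices.
    """
    z = feats
    for i, w in enumerate(weights):
        z = z @ w
        if i < 2:                       # ReLU on the first two layers only
            z = np.maximum(z, 0.0)
    norm = np.linalg.norm(z, axis=1, keepdims=True)
    return z / np.maximum(norm, 1e-12)  # unit-normalize each voxel
```

With unit-normalized outputs, the L2 distance in the margin loss is bounded, which makes a fixed margin $m$ meaningful across voxels.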
3.3.3 Implementation.
During training, we employed the stochastic gradient descent algorithm [3] with momentum as the optimizer. The model training is divided into two stages. In the first stage, only the Siamese encoder-decoder is trained, with the learning rate decayed via a polynomial schedule. In the second stage, the projection head is trained jointly with the Siamese encoder-decoder, using separate learning rates for the two components, both decayed via a polynomial schedule. Training uses a fixed patch size and batch size. For the voxel-wise contrastive loss, we use a margin hyperparameter $m$.
4 Experiments
4.1 Data Preparation
We collected and curated an in-house dataset from the hospital for segmentation model development, consisting of 163 NPC patients with pCT, contrast-enhanced diagnostic CT, and diagnostic MRIs of T1 and T2 phases. Diagnostic CT and MRI were registered in two steps: initially, a rigid transformation [1] was applied to the MRI images to approximately align them with the CT images, ensuring similar anatomical positioning; subsequently, the cross-modality deformable registration algorithm DEEDs [13] was utilized to achieve precise local alignment. The contrast-enhanced CT and MRIs were used to guide radiation oncologists in generating the ground-truth GTV on pCT.
Additionally, we collected and curated one publicly available dataset, SegRap2023 (https://segrap2023.grand-challenge.org/dataset/), as an external dataset, containing 118 non-contrast pCT and contrast-enhanced CT scans. The original tumor target labels in SegRap2023 are generally larger than the GTV guideline's description; hence, we curated their labels. Annotations of all datasets were examined and edited by two experienced radiation oncologists following the international GTV delineation consensus guideline [17]. For evaluation, 20 of the in-house dataset was randomly selected as the internal testing set, and the curated SegRap2023 was used as the external testing dataset. As illustrated in Figure 3, the asymmetric regions in the external data are larger than those in the in-house data, rendering the task more challenging.
4.2 Comparison Methods and Evaluation Metrics
We conducted a comprehensive comparison of our proposed method with nine cutting-edge approaches, encompassing prominent CNN-based, Transformer-based, and Mamba-based methods. CNN-based methods include TransUNet [7], STU-Net S [15], STU-Net B [15], STU-Net L [15], and MedNeXt [33]. Transformer-based methods include UNETR [11], SwinUNETR [10], and its variant SwinUNETR-v2 [12]. Mamba-based methods include UMambaBot [28].
Table 1: Quantitative results on the internal testing set (In-house train → In-house test), reported as mean ± std.

| Method | DSC (↑) | ASD (↓) | HD95 (↓) | Para. Count (M) |
|---|---|---|---|---|
| UMambaBot | 79.27 ± 7.77 | 1.17 ± 0.77 | 4.66 ± 3.93 | 64.76 |
| UNETR | 75.75 ± 8.92 | 1.32 ± 0.70 | 5.41 ± 4.07 | 93.01 |
| TransUNet | 78.95 ± 8.28 | 1.58 ± 2.53 | 6.42 ± 12.89 | 119.37 |
| SwinUNETR | 80.01 ± 8.04 | 1.19 ± 0.72 | 4.52 ± 2.77 | 62.19 |
| SwinUNETR-V2 | 80.41 ± 7.80 | 1.17 ± 0.68 | 4.17 ± 2.40 | 72.89 |
| MedNeXt | 76.15 ± 9.83 | 1.44 ± 0.88 | 5.09 ± 3.93 | 61.80 |
| STU-Net S | 79.04 ± 7.30 | 1.18 ± 0.74 | 4.95 ± 4.08 | 14.60 |
| STU-Net B | 78.86 ± 7.38 | 1.20 ± 0.73 | 4.91 ± 3.98 | 58.26 |
| STU-Net L | 79.24 ± 7.23 | 1.19 ± 0.72 | 4.64 ± 3.80 | 440.30 |
| SATs (Ours) | 81.22 ± 8.33 | 1.14 ± 0.84 | 4.02 ± 2.74 | 30.70 |
To maintain a fair comparison, we trained all competing models for an equal number of epochs. To avoid model collapse during training, we used different learning rates for the different comparison methods: typically, CNN-based methods use a larger learning rate, while Transformer-based methods use a smaller one. For STU-Net S, STU-Net B, STU-Net L, SwinUNETR, SwinUNETR-v2, MedNeXt, and UMambaBot, stochastic gradient descent [3] with momentum is employed as the optimizer. For TransUNet, stochastic gradient descent with momentum is likewise employed. For UNETR, the AdamW optimizer [24] is employed.
Table 2: Cross-domain quantitative results (mean ± std). The first three metric columns are for In-house train → External test; the last three are for External train → In-house test.

| Method | DSC (↑) | ASD (↓) | HD95 (↓) | DSC (↑) | ASD (↓) | HD95 (↓) |
|---|---|---|---|---|---|---|
| UMambaBot | 63.08 ± 12.02 | 3.37 ± 2.28 | 9.22 ± 7.52 | - | - | - |
| UNETR | 62.56 ± 12.50 | 3.43 ± 2.44 | 9.27 ± 7.46 | 66.84 ± 12.57 | 1.56 ± 0.67 | 6.14 ± 3.25 |
| TransUNet | 62.96 ± 13.49 | 3.46 ± 2.45 | 9.52 ± 8.16 | 69.71 ± 12.66 | 2.09 ± 2.56 | 6.92 ± 11.01 |
| SwinUNETR | 62.90 ± 11.90 | 3.40 ± 2.26 | 9.11 ± 7.41 | 68.07 ± 12.75 | 1.60 ± 0.88 | 6.00 ± 3.72 |
| SwinUNETR-V2 | 63.81 ± 12.11 | 3.29 ± 2.31 | 8.90 ± 7.32 | 69.14 ± 12.44 | 1.65 ± 1.10 | 6.03 ± 3.90 |
| MedNeXt | 64.77 ± 12.05 | 3.21 ± 2.26 | 9.01 ± 7.50 | 68.87 ± 11.50 | 1.55 ± 0.85 | 6.12 ± 4.15 |
| STU-Net S | 63.50 ± 11.96 | 3.32 ± 2.29 | 9.07 ± 7.33 | 67.62 ± 10.29 | 1.63 ± 0.95 | 6.85 ± 4.99 |
| STU-Net B | 63.54 ± 12.05 | 3.32 ± 2.30 | 9.14 ± 7.46 | 68.54 ± 9.74 | 1.59 ± 0.91 | 6.26 ± 4.24 |
| STU-Net L | 63.50 ± 11.91 | 3.34 ± 2.20 | 9.09 ± 7.25 | 67.99 ± 9.71 | 1.59 ± 0.87 | 6.47 ± 4.15 |
| SATs (Ours) | 66.80 ± 12.02 | 2.84 ± 2.16 | 8.51 ± 7.84 | 70.79 ± 10.97 | 1.37 ± 0.59 | 5.29 ± 2.91 |
We evaluate the overlap of the segmentation maps using the Dice similarity coefficient, DSC (↑), and calculate the average surface distance, ASD (↓), across all cases. Additionally, we measure the 95th percentile of the Hausdorff distance, HD95 (↓), to gauge segmentation accuracy relative to the boundaries of the structures. Typically, a superior method should produce a higher DSC score and lower ASD and HD95 values.
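For reference, the DSC reported above can be computed as follows (a standard definition, not code from the paper); ASD and HD95 would additionally require surface-distance computations between the predicted and ground-truth boundaries.

```python
import numpy as np

def dice_score(pred, gt):
    """Dice similarity coefficient (in %) between binary volumes."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 100.0  # both empty: treat as perfect agreement
    return 100.0 * 2.0 * np.logical_and(pred, gt).sum() / denom

# Example: half of the foreground voxels overlap -> DSC of 50%.
dsc = dice_score(np.array([1, 1, 0, 0]), np.array([1, 0, 1, 0]))
```

Note that DSC is a pure overlap measure; two segmentations with identical DSC can differ substantially at the boundary, which is why ASD and HD95 are reported alongside it.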
4.3 Comparing to state-of-the-art methods
4.3.1 In-house Dataset Performance
To validate the GTV segmentation of NPC in non-contrast CT images, we conduct a comparative analysis of our SATs against SOTA segmentation methods on the in-house dataset. Table 1 summarizes the quantitative segmentation performance and the parameter counts. With a relatively small number of parameters, the proposed method demonstrates a significant improvement over existing SOTA approaches. Specifically, compared to the second-best method, the proposed approach achieves relative enhancements in DSC, ASD, and HD95 of 1.0%, 2.6%, and 3.6%, respectively. Fig. 4 presents the segmentation results of the top four performing methods on a sample from the in-house dataset. It can be observed that our SATs method exhibits higher accuracy in boundary segmentation (e.g., around the nasal septum).
4.3.2 Cross-Domain Transfer Evaluation
To validate the generalization performance of our method across different domains, we conducted two experiments: first, we trained on the training set of the in-house dataset and tested on the external dataset; second, we trained on the training set of the external dataset and tested on the test set of the in-house dataset. The quantitative results are presented in Table 2. Note that, during training, we found the UMambaBot model prone to collapse when trained on the external data, and we were unable to resolve its non-convergent loss, so we do not present its results when trained on external data. From Fig. 3, it is evident that the tumor GTV in the external dataset exhibits a greater proportion of asymmetry, making it more challenging to train on and generalize to than the in-house dataset. Our method demonstrates exceptional performance in both experiments, particularly the first one, affirming the effectiveness of learning features associated with asymmetric tumors. The segmentation performance of various SOTA models on the external dataset is illustrated in Fig. 5.
Table 3: Ablation of the projection head and margin loss (In-house train → External test, mean ± std).

| Proj. Head | Marg. Loss | DSC (↑) | ASD (↓) | HD95 (↓) |
|---|---|---|---|---|
| ✗ | ✗ | 63.44 ± 10.54 | 2.97 ± 1.37 | 7.22 ± 3.34 |
| ✗ | ✓ | 61.50 ± 10.02 | 3.20 ± 1.39 | 7.73 ± 3.58 |
| ✓ | ✓ | 66.32 ± 10.48 | 2.60 ± 1.36 | 6.58 ± 3.50 |
4.4 Ablation Studies
4.4.1 Effect of projection head and margin loss on the segmentation model.
Table 3 reports performance metrics for the different model variants on the external data (In-house train → External test). When only the margin loss is appended, the model attains a lower DSC and higher ASD/HD95 than the initial model, suggesting that the margin loss alone is insufficient to improve performance. Moreover, there is a significant performance boost (4.98 DSC, 0.60 ASD, and 1.15 HD95) when both the projection head module and the margin loss are appended to the baseline. We observe that semantic asymmetry learning, which consists of the projection head and the margin loss, effectively improves tumor segmentation accuracy.
4.4.2 Effect of semantic asymmetry learning.
In Fig. 7, we compare our method to the baselines without the projection head module and/or the margin loss. Our method attains the highest Dice score and is consistently better than each baseline on most of the 117 test scans; the figure also provides visualized segmentation results for the baseline and our method.
5 Conclusion
We propose a novel semantic asymmetry learning method designed to leverage the inherent asymmetrical properties of tumors in the nasopharyngeal region, thereby enhancing the accuracy of nasopharyngeal carcinoma (NPC) gross tumor volume (GTV) segmentation. Our approach employs a Siamese segmentation network with a shared encoder-decoder architecture that simultaneously processes original and flipped CT images, followed by a non-linear projection module and a distance metric learning component aimed at maximizing the disparity between abnormal and normal anatomical locations. Our method demonstrates a significant improvement in NPC tumor segmentation by effectively utilizing the semantic symmetry inherent in anatomical structures, achieving superior performance compared to nine state-of-the-art methods, as validated on both an internal test set and an independent external dataset. It can potentially be used in radiotherapy practice to standardize NPC GTV delineation and reduce the workload and variation of radiation oncologists.
References
- [1]Bai, X., Bai, F., Huo, X., Ge, J., Mok, T.C.W., Li, Z., Xu, M., Zhou, J., Lu, L., Jin, D., Ye, X., Lu, J., Yan, K.: Matching in the wild: Learning anatomical embeddings for multi-modality images. CoRR abs/2307.03535 (2023)
- [2]Bai, X., Hu, Y., Gong, G., Yin, Y., Xia, Y.: A deep learning approach to segmentation of nasopharyngeal carcinoma using computed tomography. Biomedical Signal Processing and Control 64, 102246 (2021)
- [3] Bottou, L.: Large-scale machine learning with stochastic gradient descent. In: Proceedings of COMPSTAT'2010: 19th International Conference on Computational Statistics, Paris, France, August 22-27, 2010, Keynote, Invited and Contributed Papers. pp. 177-186. Springer (2010)
- [4]Chen, H., Wang, Y., Zheng, K., Li, W., Chang, C., Harrison, A.P., Xiao, J., Hager, G.D., Lu, L., Liao, C., Miao, S.: Anatomy-aware siamese network: Exploiting semantic asymmetry for accurate pelvic fracture detection in x-ray images. In: ECCV. vol. 12368, pp. 239–255 (2020)
- [5]Chen, H., Qi, Y., Yin, Y., Li, T., Liu, X., Li, X., Gong, G., Wang, L.: Mmfnet: A multi-modality mri fusion network for segmentation of nasopharyngeal carcinoma. Neurocomputing 394, 27–40 (2020)
- [6]Chen, H., Qi, Y., Yin, Y., Li, T., Liu, X., Li, X., Gong, G., Wang, L.: Mmfnet: A multi-modality MRI fusion network for segmentation of nasopharyngeal carcinoma. Neurocomputing 394, 27–40 (2020)
- [7]Chen, J., Mei, J., Li, X., Lu, Y., Yu, Q., Wei, Q., Luo, X., Xie, Y., Adeli, E., Wang, Y., etal.: Transunet: Rethinking the u-net architecture design for medical image segmentation through the lens of transformers. Medical Image Analysis p. 103280 (2024)
- [8]Chen, Y.P., Chan, A.T., Le, Q.T., Blanchard, P., Sun, Y., Ma, J.: Nasopharyngeal carcinoma. The Lancet 394(10192), 64–80 (2019)
- [9]Chua, M.L., Wee, J.T., Hui, E.P., Chan, A.T.: Nasopharyngeal carcinoma. The Lancet 387(10022), 1012–1024 (2016)
- [10]Hatamizadeh, A., Nath, V., Tang, Y., Yang, D., Roth, H.R., Xu, D.: Swin UNETR: swin transformers for semantic segmentation of brain tumors in MRI images. In: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries - 7th International Workshop, BrainLes 2021, Held in Conjunction with MICCAI 2021. vol. 12962, pp. 272–284. Springer (2021)
- [11]Hatamizadeh, A., Tang, Y., Nath, V., Yang, D., Myronenko, A., Landman, B.A., Roth, H.R., Xu, D.: UNETR: transformers for 3d medical image segmentation. In: IEEE/CVF Winter Conference on Applications of Computer Vision, WACV. pp. 1748–1758. IEEE (2022)
- [12]He, Y., Nath, V., Yang, D., Tang, Y., Myronenko, A., Xu, D.: Swinunetr-v2: Stronger swin transformers with stagewise convolutions for 3d medical image segmentation. In: Medical Image Computing and Computer Assisted Intervention - MICCAI. vol. 14223, pp. 416–426. Springer (2023)
- [13]Heinrich, M.P., Jenkinson, M., Brady, S.M., Schnabel, J.A.: Globally optimal deformable registration on a minimum spanning tree using dense displacement sampling. In: Medical Image Computing and Computer-Assisted Intervention. pp. 115–122. Springer (2012)
- [14]Huang, J.b., Zhuo, E., Li, H., Liu, L., Cai, H., Ou, Y.: Achieving accurate segmentation of nasopharyngeal carcinoma in mr images through recurrent attention. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2019: 22nd International Conference, Shenzhen, China, October 13–17, 2019, Proceedings, Part V 22. pp. 494–502. Springer (2019)
- [15]Huang, Z., Wang, H., Deng, Z., Ye, J., Su, Y., Sun, H., He, J., Gu, Y., Gu, L., Zhang, S., Qiao, Y.: Stu-net: Scalable and transferable medical image segmentation models empowered by large-scale supervised pre-training. CoRR abs/2304.06716 (2023)
- [16]Ke, L., Deng, Y., Xia, W., Qiang, M., Chen, X., Liu, K., Jing, B., He, C., Xie, C., Guo, X., etal.: Development of a self-constrained 3d densenet model in automatic detection and segmentation of nasopharyngeal carcinoma using magnetic resonance images. Oral Oncology 110, 104862 (2020)
- [17]Lee, A.W., Ng, W.T., Pan, J.J., Poh, S.S., Ahn, Y.C., AlHussain, H., Corry, J., Grau, C., Grégoire, V., Harrington, K.J., etal.: International guideline for the delineation of the clinical target volumes (ctv) for nasopharyngeal carcinoma. Radiotherapy and Oncology 126(1), 25–36 (2018)
- [18]Li, S., Xiao, J., He, L., Peng, X., Yuan, X.: The tumor target segmentation of nasopharyngeal cancer in ct images based on deep learning methods. Technology in cancer research & treatment 18, 153–160 (2019)
- [19]Li, Y., Dan, T., Li, H., Chen, J., Peng, H., Liu, L., Cai, H.: Npcnet: Jointly segment primary nasopharyngeal carcinoma tumors and metastatic lymph nodes in mr images. IEEE Transactions on Medical Imaging 41(7), 1639–1650 (2022)
- [20]Liao, W., He, J., Luo, X., Wu, M., Shen, Y., Li, C., Xiao, J., Wang, G., Chen, N.: Automatic delineation of gross tumor volume based on magnetic resonance imaging by performing a novel semisupervised learning framework in nasopharyngeal carcinoma. International Journal of Radiation Oncology* Biology* Physics 113(4), 893–902 (2022)
- [21]Lin, L., Dou, Q., Jin, Y.M., Zhou, G.Q., Tang, Y.Q., Chen, W.L., Su, B.A., Liu, F., Tao, C.J., Jiang, N., etal.: Deep learning for automated contouring of primary tumor volumes by mri for nasopharyngeal carcinoma. Radiology 291(3), 677–686 (2019)
- [22]Liu, C.F., Padhy, S., Ramachandran, S., Wang, V.X., Efimov, A., Bernal, A., Shi, L., Vaillant, M., Ratnanather, J.T., Faria, A.V., etal.: Using deep siamese neural networks for detection of brain asymmetries associated with alzheimer’s disease and mild cognitive impairment. Magnetic resonance imaging 64, 190–199 (2019)
- [23]Liu, Y., Zhou, Z., Zhang, S., Luo, L., Zhang, Q., Zhang, F., Li, X., Wang, Y., Yu, Y.: From unilateral to bilateral learning: Detecting mammogram masses with contrasted bilateral network. In: Medical Image Computing and Computer Assisted Intervention - MICCAI. vol. 11769, pp. 477–485. Springer (2019)
- [24]Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net (2019)
- [25]Luo, X., Fu, J., Zhong, Y., Liu, S., Han, B., Astaraki, M., Bendazzoli, S., Toma-Dasu, I., Ye, Y., Chen, Z., etal.: Segrap2023: A benchmark of organs-at-risk and gross tumor volume segmentation for radiotherapy planning of nasopharyngeal carcinoma. arXiv preprint arXiv:2312.09576 (2023)
- [26]Luo, X., Liao, W., Chen, J., Song, T., Chen, Y., Zhang, S., Chen, N., Wang, G., Zhang, S.: Efficient semi-supervised gross target volume of nasopharyngeal carcinoma segmentation via uncertainty rectified pyramid consistency. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part II 24. pp. 318–329. Springer (2021)
- [27]Luo, X., Liao, W., He, Y., Tang, F., Wu, M., Shen, Y., Huang, H., Song, T., Li, K., Zhang, S., etal.: Deep learning-based accurate delineation of primary gross tumor volume of nasopharyngeal carcinoma on heterogeneous magnetic resonance imaging: a large-scale and multi-center study. Radiotherapy and Oncology p. 109480 (2023)
- [28]Ma, J., Li, F., Wang, B.: U-mamba: Enhancing long-range dependency for biomedical image segmentation. CoRR abs/2401.04722 (2024)
- [29]Ma, Z., Zhou, S., Wu, X., Zhang, H., Yan, W., Sun, S., Zhou, J.: Nasopharyngeal carcinoma segmentation based on enhanced convolutional neural networks using multi-modal metric learning. Physics in Medicine & Biology 64(2), 025005 (2019)
- [30]Mei, H., Lei, W., Gu, R., Ye, S., Sun, Z., Zhang, S., Wang, G.: Automatic segmentation of gross target volume of nasopharynx cancer using ensemble of multiscale deep neural networks with spatial attention. Neurocomputing 438, 211–222 (2021)
- [31]Men, K., Chen, X., Zhang, Y., Zhang, T., Dai, J., Yi, J., Li, Y.: Deep deconvolutional neural network for target segmentation of nasopharyngeal cancer in planning computed tomography images. Frontiers in oncology 7, 315 (2017)
- [32]Razek, A.A.K.A., King, A.: Mri and ct of nasopharyngeal carcinoma. American Journal of Roentgenology 198(1), 11–18 (2012)
- [33]Roy, S., Köhler, G., Ulrich, C., Baumgartner, M., Petersen, J., Isensee, F., Jäger, P.F., Maier-Hein, K.H.: Mednext: Transformer-driven scaling of convnets for medical image segmentation. In: Medical Image Computing and Computer Assisted Intervention - MICCAI. vol. 14223, pp. 405–415. Springer (2023)
- [34]Tian, L., Li, Z., Liu, F., Bai, X., Ge, J., Lu, L., Niethammer, M., Ye, X., Yan, K., Jin, D.: Same++: A self-supervised anatomical embeddings enhanced medical image registration framework using stable sampling and regularized transformation. ArXiv abs/2311.14986 (2023)
- [35]Wang, H., Chen, J., Zhang, S., He, Y., Xu, J., Wu, M., He, J., Liao, W., Luo, X.: Dual-reference source-free active domain adaptation for nasopharyngeal carcinoma tumor segmentation across multiple hospitals. CoRR abs/2309.13401 (2023)