Leveraging Semantic Asymmetry for Precise Gross Tumor Volume Segmentation of Nasopharyngeal Carcinoma in Planning CT (2024)

1 DAMO Academy, Alibaba Group   2 East China Normal University, Shanghai, China   3 The First Affiliated Hospital of College of Medicine, Zhejiang University, China   4 Hupan Lab, Hangzhou, China
Email: alisonbrielee@gmail.com

Zi Li1,4 (corresponding author), Ying Chen3, Zeli Chen1,4, Yanzhou Su1,4, Tai Ma1,2, Tony C. W. Mok1,4, Yan-Jie Zhou1,4, Yunhao Bai1,2, Zhinlin Zheng1,4, Le Lu1, Yirui Wang1, Jia Ge3, Xianghua Ye3, Senxiang Yan3, Dakai Jin1

Abstract

In the radiation therapy of nasopharyngeal carcinoma (NPC), clinicians typically delineate the gross tumor volume (GTV) on non-contrast planning computed tomography (CT) to ensure accurate radiation dose delivery. However, the low contrast between tumors and adjacent normal tissues requires radiation oncologists to delineate the tumors manually, often relying on diagnostic MRI for guidance. In this study, we propose a novel approach to directly segment NPC gross tumors on non-contrast planning CT images, circumventing potential registration errors when aligning MRI or MRI-derived tumor masks to planning CT. To address the low contrast between tumors and adjacent normal structures in planning CT, we introduce a 3D Semantic Asymmetry Tumor segmentation (SATs) method. Specifically, we posit that a healthy nasopharyngeal region is characteristically bilaterally symmetric, whereas the emergence of nasopharyngeal carcinoma disrupts this symmetry. We then propose a Siamese contrastive learning segmentation framework that minimizes the voxel-wise distance between original and flipped areas without tumor and encourages a larger distance between original and flipped areas with tumor, thereby enhancing the sensitivity of features to semantic asymmetries. Extensive experiments demonstrate that the proposed SATs achieves leading NPC GTV segmentation performance in both internal and external testing, e.g., with at least a 2% absolute Dice score improvement and a 12% average distance error reduction over other state-of-the-art methods in external testing.

Keywords:

Nasopharyngeal Carcinoma · Gross Tumor Volume · Asymmetry-inspired Segmentation · Deep Learning · Radiation Therapy

1 Introduction

Nasopharyngeal carcinoma (NPC) ranks among the most prevalent malignancies affecting the nasopharyngeal region, and patient prognosis is substantially improved by early diagnosis and intervention [9]. A significant proportion of NPC patients can achieve complete remission following radiation therapy (RT) [8]. Notably, this type of cancer exhibits remarkable sensitivity to radiation therapy, and a pivotal component of this therapeutic intervention is the accurate delineation of the gross tumor volume (GTV). In clinical practice, magnetic resonance imaging (MRI) has emerged as the predominant imaging modality for tumor delineation, owing to its superior resolution in visualizing soft tissues. Cross-modality registration is then conducted between MRI and non-contrast planning computed tomography (pCT) to transfer the tumor delineations to the pCT scans used in treatment planning [32]. However, cross-modality registration is non-trivial due to substantial modality gaps and variations in scanning ranges. Alternatively, clinicians may mentally integrate pCT and MRI to aid in delineating the gross tumor volume. Nevertheless, this manner is time-consuming, often requiring 1-2 hours per case, and fraught with potential inaccuracies.

Recently, learning-based approaches have yielded promising outcomes in NPC tumor segmentation from MRI scans [21, 14, 16, 26, 20, 19, 27]. Nonetheless, MRI fails to provide the direct measurements of electron density essential for radiotherapy planning, and MRI-derived tumor masks must be spatially transformed to pCT via image registration, which may introduce alignment errors. Some approaches [29, 5] tackle automated segmentation of the NPC GTV using both CT and MRI; notably, registration errors between paired CT and MRI can make multi-modal segmentation performance inferior to single-modal methods. Additionally, several studies [31, 18, 30, 2] focus on segmenting the NPC gross tumor volume using contrast-enhanced CT scans. However, these approaches yield relatively low performance, e.g., Dice scores below 70%. This limitation is attributable to the capability of NPC tumors to invade adjacent tissues and the suboptimal contrast of pCT, especially for soft tissues such as mucous membranes, muscles, and nerves.

In this work, our goal is to segment the NPC GTV in non-contrast pCT, which avoids registration errors when aligning MRI or MRI-derived tumor masks to pCT. Directly segmenting the NPC GTV in non-contrast pCT is challenging because the boundaries between the NPC tumor and adjacent soft tissues, such as membranes, muscles, and vessels, are extremely unclear. To alleviate this issue, we propose a 3D semantic asymmetry tumor segmentation method based on the observation that a healthy nasopharyngeal region is typically bilaterally symmetric, while the presence of an NPC tumor disrupts this symmetry, as illustrated in Figure 1. Specifically, to exploit the anatomical symmetry cue, we first warp the pCT using automatically segmented head and neck organs so that the pCT is bilaterally symmetric along the central sagittal plane. This reduces the influence of asymmetric anatomies caused by variations in the patient's head location and pose during CT scanning. Then, we develop a Siamese contrastive learning segmentation framework based on a conventional segmentation loss and an additional voxel-level margin loss. The margin loss is applied to deep features extracted from the original and flipped pCT scans; it minimizes the voxel-wise distance between original and flipped areas without tumor and encourages a larger distance between original and flipped areas with tumor, making GTV features more sensitive to semantic asymmetries. To determine the asymmetrical nasopharyngeal areas, we present a tumor mask-based region selection approach.

[Figure 1]

The main contributions of this work are as follows: (1) we directly segment the NPC GTV on non-contrast planning CT, avoiding the registration errors introduced when aligning MRI or MRI-derived tumor masks to pCT; (2) we propose a Siamese contrastive learning segmentation framework with a voxel-wise margin loss that makes features sensitive to semantic asymmetries between the original and flipped scans; (3) extensive experiments on internal and external test sets show that SATs outperforms nine state-of-the-art segmentation methods.

2 Related Work

2.1 Learning-based GTV segmentation in NPC

Recently, learning-based approaches have made great progress in NPC GTV segmentation. The work in [31, 25] applies deep networks to segment tumors in contrast-enhanced CT. Chen et al. target tumor segmentation by integrating cross-modality features of MRI and CT [6]. NPCNet [19] is designed to segment both primary tumors and metastatic lymph nodes in 2D MRI samples. Researchers [35] investigate domain adaptation techniques and utilize limited annotated target data to enhance GTV segmentation performance in MRI.

Current approaches predominantly rely on MRI and/or contrast-enhanced CT for NPC GTV segmentation, as it has long been believed that identifying the NPC GTV on non-contrast CT is an extremely challenging, if not impossible, task. To date, effective segmentation methods for non-contrast pCT remain elusive. However, in routine RT, pCT is mostly acquired without contrast injection, and transferring MRI-derived tumor masks to pCT inevitably introduces considerable alignment errors.

2.2 Symmetry in Medical Image Analysis

The human anatomy usually displays bilateral symmetry, as observed in structures like the brain, breasts, lungs, and pelvis. Leveraging this symmetry has been crucial in various medical image analyses. For the brain, asymmetry in the shape of subcortical structures has been linked to Alzheimer's disease, and researchers [22] have applied shape analysis along with machine learning to quantify these asymmetries. Similarly, a Siamese Faster R-CNN approach [23] has been proposed for breast cancer detection, which performs joint analysis of both breasts to detect masses in mammographic images and exploits the inherent bilateral symmetry to improve detection performance. Researchers [4] have also exploited semantic asymmetry for accurate pelvic fracture detection in X-ray images. These applications underscore the utility of symmetry-based approaches in medical image analysis, offering improved diagnostic accuracy and early detection capabilities by leveraging the symmetrical nature of anatomical structures.

3 Method

We propose a 3D semantic asymmetry tumor segmentation (SATs) method based on the semantic asymmetry property of the gross tumor in the nasopharyngeal area, enabling accurate NPC GTV segmentation even with limited data. Given one CT scan, as shown in Figure 2(a), we utilize a shared encoder-decoder module to process both the original image $I \in \mathbb{R}^{D \times H \times W}$, where $D$, $H$, $W$ are the CT image spatial dimensions, and its flipped image $I'$, thereby encoding them into a symmetric representation. Subsequently, we introduce a non-linear projection module and a distance metric learning strategy to refine the resulting feature maps. We intend to maximize the dissimilarity between the original and flipped features at corresponding anatomical locations on abnormal regions, while minimizing it on normal regions. The distance metric learning paradigm is illustrated in Figure 2(b).

3.1 Asymmetrical Abnormal Region Selection

The focus of our SATs model is the asymmetric abnormal area. We introduce two methods, supervised label-based selection and unsupervised registration-based selection, to detect asymmetric abnormal areas in scenarios with and without semantic mask annotation. Considering that image asymmetry may originate from pathological or non-pathological sources, such as changes in imaging angle and patient posture, we pre-process the CT scans following [34] to ensure that the scans are symmetric along the central sagittal plane. Specifically, we manually select a patient CT image with bilateral symmetry along the central sagittal plane to serve as an atlas, and then align the other patient CT images to the atlas space through affine registration. This step helps alleviate the influence of other asymmetric anatomical structures in the head & neck that may mislead the model.

[Figure 2]

The semantic segmentation mask of $I$ is denoted as $s \in \{0,1\}^{D \times H \times W}$, where $0$ represents the background and $1$ represents the tumor foreground. Through the flip operation, we obtain the flipped semantic mask $s'$ of $I'$. Subsequently, an asymmetrical mask $m$ is defined to locate asymmetrical regions in the image $I$, as

$m = s - s \cap s'$   (1)

where $m \in \{0,1\}^{D \times H \times W}$. Note that $1$ and $0$ represent the asymmetrical and symmetrical regions in $I$, respectively.
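As a minimal NumPy sketch (function and variable names are illustrative, not from the paper), Eq. (1) amounts to mirroring the binary tumor mask across the central sagittal plane and keeping only the tumor voxels whose mirrored counterpart is not tumor:

```python
import numpy as np

def asymmetry_mask(s: np.ndarray, flip_axis: int = -1) -> np.ndarray:
    """Asymmetrical mask of Eq. (1): m = s - s ∩ s'.

    s         : binary tumor mask of shape (D, H, W); 1 = tumor.
    flip_axis : left-right axis, assuming the scan was aligned to be
                symmetric about the central sagittal plane.
    """
    sb = s.astype(bool)
    s_flip = np.flip(sb, axis=flip_axis)   # s', the mirrored mask
    # For binary masks, s - s ∩ s' is equivalent to s ∧ ¬s'.
    return (sb & ~s_flip).astype(np.uint8)
```

A perfectly symmetric mask yields an all-zero $m$, so the margin loss below then treats every voxel as a "symmetric" pair.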

3.2 Asymmetrical Learning Strategy

Our segmentation loss function comprises two components: a combination of Dice loss and cross-entropy loss for the conventional segmentation objective, and a voxel-wise margin loss specifically designed for asymmetric abnormal regions.

3.2.1 Metric-based margin loss.

We employ a voxel-level margin loss: in the asymmetric abnormal region, we aim to minimize the similarity between the features of each voxel and its mirrored counterpart across the central sagittal plane. The margin loss between the features $E \in \mathbb{R}^{D \times H \times W \times C}$, where $D$, $H$, $W$ are spatial dimensions and $C$ is the number of output channels, and the flipped features $E'$ after a non-linear projection is:

$l_{margin} = \sum_{i,j,k}^{D,H,W} \big[ \mathbf{1}_{(m(i,j,k)=1)} \max(t - \|E(i,j,k) - E'(i,j,k)\|^{2},\, 0) + \mathbf{1}_{(m(i,j,k)\neq 1)} \|E(i,j,k) - E'(i,j,k)\|^{2} \big]$   (2)

where $\mathbf{1}$ is the indicator function, and $t$ defines a margin that regulates the degree of dissimilarity in semantic asymmetries.
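A voxel-wise sketch of Eq. (2), using NumPy arrays as stand-ins for the projected feature maps (names are illustrative): symmetric voxels are pulled together by the squared distance, asymmetric (tumor) voxels are pushed apart up to the margin $t$.

```python
import numpy as np

def margin_loss(E: np.ndarray, E_flip: np.ndarray, m: np.ndarray, t: float = 20.0) -> float:
    """Voxel-wise contrastive margin loss of Eq. (2).

    E, E_flip : projected features of the original / flipped scan, (D, H, W, C).
    m         : asymmetry mask from Eq. (1), shape (D, H, W); 1 = asymmetric.
    t         : margin controlling how dissimilar asymmetric voxels must be.
    """
    d2 = np.sum((E - E_flip) ** 2, axis=-1)       # squared distance per voxel
    pull = (m == 0) * d2                          # symmetric voxels: minimize distance
    push = (m == 1) * np.maximum(t - d2, 0.0)     # asymmetric voxels: enforce margin t
    return float(np.sum(pull + push))
```

In training, this term is weighted by $\beta$ and added to the Dice and cross-entropy losses, as in Eq. (3).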

3.2.2 Overall loss function.

We approach tumor segmentation as a binary segmentation task, utilizing the Dice loss, the binary cross-entropy loss, and the contrastive margin loss as our objective functions. The overall loss function is formulated as:

$l = l_{dice} + l_{ce} + \beta\, l_{margin}$   (3)

where $\beta$ is the weight balancing the different loss terms.

3.3 Siamese Segmentation Architecture

Our SATs architecture comprises the encoder-decoder module and the projection head. While both components are engaged during the training process, only the encoder-decoder module is required for inference.

3.3.1 Siamese encoder-decoder.

The core element is a shared U-shaped encoder-decoder architecture, as shown in Fig. 2. The encoder employs repeated 3D residual blocks, each comprising two convolutional layers with $3 \times 3 \times 3$ kernels. Each convolutional layer is followed by InstanceNorm normalization and LeakyReLU activation. For downsampling, a convolution with a stride of 2 halves the resolution of the input feature maps. The initial number of filters is 32 and doubles after each downsampling step, except for the last layer, to maintain roughly constant time complexity. In total, the encoder performs four downsampling operations.

3.3.2 Projection head.

We utilize a non-linear projection $g$ to transform the features before computing the distance in the margin loss, which aims to enhance the quality of the learned features. It consists of three $1 \times 1 \times 1$ convolution layers with 16 channels, followed by a unit-normalization layer. The first two layers in the projection head use the ReLU activation function. We hypothesize that directly applying metric learning to the segmentation features might cause information loss and diminish the model's effectiveness. For example, some asymmetries in CT images are non-pathological, stemming from variations in the patient's head position and pose, yet they are beneficial for segmentation. A non-linear projection can filter such irrelevant information out of the metric learning process while preserving it in the features used for segmentation.
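Since a $1 \times 1 \times 1$ convolution acts independently on each voxel, the projection head can be sketched as per-voxel matrix multiplications over the channel axis, with ReLU after the first two layers and unit-normalization at the end (a NumPy sketch with illustrative weight matrices, not the paper's trained parameters):

```python
import numpy as np

def projection_head(E, W1, W2, W3, eps=1e-8):
    """Sketch of the 3-layer 1x1x1-conv projection head.

    E  : features of shape (D, H, W, C_in).
    Wi : per-layer weight matrices; a 1x1x1 convolution is a
         per-voxel linear map, i.e. a matmul over the channel axis.
    """
    h = np.maximum(E @ W1, 0.0)   # 1x1x1 conv + ReLU
    h = np.maximum(h @ W2, 0.0)   # 1x1x1 conv + ReLU
    h = h @ W3                    # final 1x1x1 conv, no activation
    norm = np.linalg.norm(h, axis=-1, keepdims=True)
    return h / (norm + eps)       # unit-normalization over channels
```

The unit-normalization keeps every projected feature vector on the unit sphere, so the margin $t$ in Eq. (2) operates on a bounded distance range.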

3.3.3 Implementation.

During training, we employed the stochastic gradient descent algorithm [3] with a momentum of 0.99 as the optimizer. The model training is divided into two stages. In the first stage, only the Siamese encoder-decoder is trained for 800 epochs with a learning rate of 1e-2, decayed via a polynomial schedule. In the second stage, the projection head is trained for 200 epochs, with a learning rate of 1e-2 for the projection head and 1e-5 for the Siamese encoder-decoder, both decayed via a polynomial schedule. The patch size is $56 \times 192 \times 192$ and the batch size is 2. For the voxel-wise contrastive loss, we use a margin hyperparameter $t = 20$.
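The two-stage polynomial schedule above can be sketched as follows; note the exponent 0.9 is the common nnU-Net-style default and is an assumption here, as the paper does not state it:

```python
def poly_lr(base_lr: float, epoch: int, max_epochs: int, power: float = 0.9) -> float:
    """Polynomial decay: lr = base_lr * (1 - epoch / max_epochs) ** power."""
    return base_lr * (1 - epoch / max_epochs) ** power

# Stage 1: Siamese encoder-decoder only, 800 epochs starting at 1e-2.
stage1 = [poly_lr(1e-2, e, 800) for e in range(800)]
# Stage 2: projection head at 1e-2, encoder-decoder fine-tuned at 1e-5, 200 epochs.
stage2_head = [poly_lr(1e-2, e, 200) for e in range(200)]
stage2_backbone = [poly_lr(1e-5, e, 200) for e in range(200)]
```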

4 Experiments

[Figure 3]

4.1 Data Preparation

We collected and curated an in-house dataset from the hospital for segmentation model development, consisting of 163 NPC patients with pCT, contrast-enhanced diagnostic CT, and diagnostic MRIs of the T1 & T2 phases. Diagnostic CT and MRI were registered in two steps: first, a rigid transformation [1] was applied to the MRI images to approximately align them with the CT images, ensuring similar anatomical positioning; subsequently, the cross-modality deformable registration algorithm DEEDs [13] was utilized to achieve precise local alignment. The contrast-enhanced CT and MRIs were used to guide radiation oncologists in generating the ground-truth GTV on pCT.

Additionally, we collected and curated one publicly available dataset, SegRap2023 (https://segrap2023.grand-challenge.org/dataset/), as an external dataset, containing 118 non-contrast pCT and contrast-enhanced CT scans. The original tumor target labels in SegRap2023 are generally larger than the GTV guideline's description; hence, we curated their labels. Annotations of all datasets were examined and edited by two experienced radiation oncologists following the international GTV delineation consensus guideline [17]. For evaluation, 20% of the in-house dataset was randomly selected as the internal testing set, and the curated SegRap2023 was used as the external testing set. As illustrated in Figure 3, the asymmetric regions in the external data are larger than those of the in-house data, rendering the task more challenging.

4.2 Comparison Methods and Evaluation Metrics

We conducted a comprehensive comparison of our proposed method with nine cutting-edge approaches, encompassing prominent CNN-based, Transformer-based, and Mamba-based methods, to evaluate its performance. CNN-based methods include TransUNet [7], STU-Net S [15], STU-Net B [15], STU-Net L [15], and MedNeXt [33]. Transformer-based methods include UNETR [11], SwinUNETR [10], and its variant SwinUNETR-v2 [12]. Mamba-based methods include UMambaBot [28].

[Figure 4]

Table 1. In-house_train ⇒ In-house_test.

Method        | DSC ↑         | ASD ↓        | HD95 ↓        | Param. Count (M)
UMambaBot     | 79.27 ± 7.77  | 1.17 ± 0.77  | 4.66 ± 3.93   | 64.76
UNETR         | 75.75 ± 8.92  | 1.32 ± 0.70  | 5.41 ± 4.07   | 93.01
TransUNet     | 78.95 ± 8.28  | 1.58 ± 2.53  | 6.42 ± 12.89  | 119.37
SwinUNETR     | 80.01 ± 8.04  | 1.19 ± 0.72  | 4.52 ± 2.77   | 62.19
SwinUNETR-V2  | 80.41 ± 7.80  | 1.17 ± 0.68  | 4.17 ± 2.40   | 72.89
MedNeXt       | 76.15 ± 9.83  | 1.44 ± 0.88  | 5.09 ± 3.93   | 61.80
STU-Net S     | 79.04 ± 7.30  | 1.18 ± 0.74  | 4.95 ± 4.08   | 14.60
STU-Net B     | 78.86 ± 7.38  | 1.20 ± 0.73  | 4.91 ± 3.98   | 58.26
STU-Net L     | 79.24 ± 7.23  | 1.19 ± 0.72  | 4.64 ± 3.80   | 440.30
SATs (Ours)   | 81.22 ± 8.33  | 1.14 ± 0.84  | 4.02 ± 2.74   | 30.70

To maintain a fair comparison, we trained all competing models for an equal number of epochs (1000). To avoid model collapse during training, we used different learning rates for the different comparison methods: CNN-based methods typically use a larger learning rate, while Transformer-based methods use a smaller one. For STU-Net S, STU-Net B, STU-Net L, SwinUNETR, SwinUNETR-v2, MedNeXt, and UMambaBot, the learning rate is set to 1e-2, and stochastic gradient descent [3] with a momentum of 0.99 is employed as the optimizer. For TransUNet, the learning rate is set to 1e-3 with the same SGD optimizer. For UNETR, the learning rate is set to 1e-5, and the AdamW optimizer [24] is employed with betas of (0.9, 0.999).

Table 2. Cross-domain results: In-house_train ⇒ External_test (left) and External_train ⇒ In-house_test (right).

Method        | DSC ↑          | ASD ↓        | HD95 ↓       | DSC ↑          | ASD ↓        | HD95 ↓
UMambaBot     | 63.08 ± 12.02  | 3.37 ± 2.28  | 9.22 ± 7.52  | –              | –            | –
UNETR         | 62.56 ± 12.50  | 3.43 ± 2.44  | 9.27 ± 7.46  | 66.84 ± 12.57  | 1.56 ± 0.67  | 6.14 ± 3.25
TransUNet     | 62.96 ± 13.49  | 3.46 ± 2.45  | 9.52 ± 8.16  | 69.71 ± 12.66  | 2.09 ± 2.56  | 6.92 ± 11.01
SwinUNETR     | 62.90 ± 11.90  | 3.40 ± 2.26  | 9.11 ± 7.41  | 68.07 ± 12.75  | 1.60 ± 0.88  | 6.00 ± 3.72
SwinUNETR-V2  | 63.81 ± 12.11  | 3.29 ± 2.31  | 8.90 ± 7.32  | 69.14 ± 12.44  | 1.65 ± 1.10  | 6.03 ± 3.90
MedNeXt       | 64.77 ± 12.05  | 3.21 ± 2.26  | 9.01 ± 7.50  | 68.87 ± 11.50  | 1.55 ± 0.85  | 6.12 ± 4.15
STU-Net S     | 63.50 ± 11.96  | 3.32 ± 2.29  | 9.07 ± 7.33  | 67.62 ± 10.29  | 1.63 ± 0.95  | 6.85 ± 4.99
STU-Net B     | 63.54 ± 12.05  | 3.32 ± 2.30  | 9.14 ± 7.46  | 68.54 ± 9.74   | 1.59 ± 0.91  | 6.26 ± 4.24
STU-Net L     | 63.50 ± 11.91  | 3.34 ± 2.20  | 9.09 ± 7.25  | 67.99 ± 9.71   | 1.59 ± 0.87  | 6.47 ± 4.15
SATs (Ours)   | 66.80 ± 12.02  | 2.84 ± 2.16  | 8.51 ± 7.84  | 70.79 ± 10.97  | 1.37 ± 0.59  | 5.29 ± 2.91

[Figure 5]

We evaluate the overlap of the segmentation maps using the Dice similarity coefficient, DSC (%), and calculate the average surface distance, ASD (mm), across all cases. Additionally, we measure the 95th percentile of the Hausdorff distance, HD95 (mm), to gauge segmentation accuracy at the boundaries of the structures. A superior method should produce a higher DSC score and lower ASD and HD95 values.
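For reference, the DSC reported here can be computed from binary masks as follows (a NumPy sketch; the surface-based ASD and HD95 additionally require extracting boundary voxels and computing distance transforms, omitted for brevity):

```python
import numpy as np

def dice_score(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> float:
    """Dice similarity coefficient in percent: 100 * 2|P ∩ G| / (|P| + |G|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return 100.0 * 2.0 * inter / (pred.sum() + gt.sum() + eps)
```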

4.3 Comparing to state-of-the-art methods

4.3.1 In-house Dataset Performance

To validate GTV segmentation of NPC in non-contrast CT images, we conduct a comparative analysis of our SATs against SOTA segmentation methods on the in-house dataset. Table 1 summarizes the quantitative segmentation performance and parameter counts. With a relatively small number of parameters, the proposed method demonstrates a significant improvement over existing SOTA approaches. Specifically, compared to the second-best method, the proposed approach achieves relative improvements in DSC, ASD, and HD95 of 1.0%, 2.6%, and 3.6%, respectively. Fig. 4 presents the segmentation results of the top four performing methods on a sample from the in-house dataset. Our SATs method exhibits higher accuracy in boundary segmentation (e.g., the nasal septum).

4.3.2 Cross-Domain Transfer Evaluation

To validate the generalization performance of our method across domains, we conducted two experiments: first, training on the in-house training set and testing on the external test set; second, training on the external training set and testing on the in-house test set. The quantitative results are presented in Table 2. Note that during training we found the UMambaBot model prone to collapse when trained on the external data; as we were unable to resolve its non-convergent loss, we do not present results for this model trained on external data. From Fig. 3, it is evident that the tumor GTV in the external dataset exhibits a greater proportion of asymmetry, making it more challenging to train on and generalize from than the in-house dataset. Our method demonstrates strong performance in both experiments, particularly the first, affirming the effectiveness of learning features associated with asymmetric tumors. The segmentation performance of various SOTA models on the external dataset is illustrated in Fig. 5.

Table 3. Ablation on In-house_train ⇒ External_test.

Proj. Head | Marg. Loss | DSC (%)        | ASD (mm)     | HD95 (mm)
✗          | ✗          | 63.44 ± 10.54  | 2.97 ± 1.37  | 7.22 ± 3.34
✗          | ✓          | 61.50 ± 10.02  | 3.20 ± 1.39  | 7.73 ± 3.58
✓          | ✓          | 66.32 ± 10.48  | 2.60 ± 1.36  | 6.58 ± 3.50

[Figure 6]
[Figure 7]

4.4 Ablation Studies

4.4.1 Effect of projection head and margin loss on the segmentation model.

Table 3 reports performance on the external data (In-house_train ⇒ External_test) for different model variants. When the margin loss alone is appended, the model achieves a lower DSC and higher ASD/HD95 than the initial model, suggesting that the margin loss by itself is insufficient to improve performance. In contrast, there is a significant performance boost (+4.98% DSC, −0.60 mm ASD, and −1.15 mm HD95) when both the projection head module and the margin loss are appended. This shows that semantic asymmetry learning, consisting of the projection head and margin loss, effectively improves tumor segmentation accuracy.

4.4.2 Effect of semantic asymmetry learning.

In Figs. 6 and 7, we compare our method to baselines without the projection head module and/or the margin loss. Fig. 6 shows that our method attains the highest Dice score and is consistently better than each baseline on most of the 117 test scans. Fig. 7 further provides qualitative segmentation results for the baseline and our method.

5 Conclusion

We propose a novel semantic asymmetry learning method designed to leverage the inherent asymmetrical properties of tumors in the nasopharyngeal region, thereby enhancing the accuracy of nasopharyngeal carcinoma (NPC) gross tumor volume (GTV) segmentation. Our approach employs a Siamese segmentation network with a shared encoder-decoder architecture, which simultaneously processes original and flipped CT images. This is followed by a non-linear projection module and a distance metric learning component aimed at maximizing the feature disparity at abnormal anatomical locations while minimizing it at normal ones. Our method demonstrates a significant improvement in NPC tumor segmentation by effectively exploiting the symmetry inherent in anatomical structures, achieving superior performance compared to nine state-of-the-art methods on both an internal test set and an independent external dataset. It can potentially be used in radiotherapy practice to standardize NPC GTV delineation and reduce the workload and variability of radiation oncologists.

References

  • [1]Bai, X., Bai, F., Huo, X., Ge, J., Mok, T.C.W., Li, Z., Xu, M., Zhou, J., Lu, L., Jin, D., Ye, X., Lu, J., Yan, K.: Matching in the wild: Learning anatomical embeddings for multi-modality images. CoRR abs/2307.03535 (2023)
  • [2]Bai, X., Hu, Y., Gong, G., Yin, Y., Xia, Y.: A deep learning approach to segmentation of nasopharyngeal carcinoma using computed tomography. Biomedical Signal Processing and Control 64, 102246 (2021)
  • [3] Bottou, L.: Large-scale machine learning with stochastic gradient descent. In: Proceedings of COMPSTAT'2010: 19th International Conference on Computational Statistics, Paris, France, August 22-27, 2010, Keynote, Invited and Contributed Papers, pp. 177–186. Springer (2010)
  • [4]Chen, H., Wang, Y., Zheng, K., Li, W., Chang, C., Harrison, A.P., Xiao, J., Hager, G.D., Lu, L., Liao, C., Miao, S.: Anatomy-aware siamese network: Exploiting semantic asymmetry for accurate pelvic fracture detection in x-ray images. In: ECCV. vol. 12368, pp. 239–255 (2020)
  • [5]Chen, H., Qi, Y., Yin, Y., Li, T., Liu, X., Li, X., Gong, G., Wang, L.: Mmfnet: A multi-modality mri fusion network for segmentation of nasopharyngeal carcinoma. Neurocomputing 394, 27–40 (2020)
  • [6]Chen, H., Qi, Y., Yin, Y., Li, T., Liu, X., Li, X., Gong, G., Wang, L.: Mmfnet: A multi-modality MRI fusion network for segmentation of nasopharyngeal carcinoma. Neurocomputing 394, 27–40 (2020)
  • [7] Chen, J., Mei, J., Li, X., Lu, Y., Yu, Q., Wei, Q., Luo, X., Xie, Y., Adeli, E., Wang, Y., et al.: TransUNet: Rethinking the U-Net architecture design for medical image segmentation through the lens of transformers. Medical Image Analysis, 103280 (2024)
  • [8]Chen, Y.P., Chan, A.T., Le, Q.T., Blanchard, P., Sun, Y., Ma, J.: Nasopharyngeal carcinoma. The Lancet 394(10192), 64–80 (2019)
  • [9]Chua, M.L., Wee, J.T., Hui, E.P., Chan, A.T.: Nasopharyngeal carcinoma. The Lancet 387(10022), 1012–1024 (2016)
  • [10]Hatamizadeh, A., Nath, V., Tang, Y., Yang, D., Roth, H.R., Xu, D.: Swin UNETR: swin transformers for semantic segmentation of brain tumors in MRI images. In: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries - 7th International Workshop, BrainLes 2021, Held in Conjunction with MICCAI 2021. vol. 12962, pp. 272–284. Springer (2021)
  • [11]Hatamizadeh, A., Tang, Y., Nath, V., Yang, D., Myronenko, A., Landman, B.A., Roth, H.R., Xu, D.: UNETR: transformers for 3d medical image segmentation. In: IEEE/CVF Winter Conference on Applications of Computer Vision, WACV. pp. 1748–1758. IEEE (2022)
  • [12]He, Y., Nath, V., Yang, D., Tang, Y., Myronenko, A., Xu, D.: Swinunetr-v2: Stronger swin transformers with stagewise convolutions for 3d medical image segmentation. In: Medical Image Computing and Computer Assisted Intervention - MICCAI. vol. 14223, pp. 416–426. Springer (2023)
  • [13]Heinrich, M.P., Jenkinson, M., Brady, S.M., Schnabel, J.A.: Globally optimal deformable registration on a minimum spanning tree using dense displacement sampling. In: Medical Image Computing and Computer-Assisted Intervention. pp. 115–122. Springer (2012)
  • [14]Huang, J.b., Zhuo, E., Li, H., Liu, L., Cai, H., Ou, Y.: Achieving accurate segmentation of nasopharyngeal carcinoma in mr images through recurrent attention. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2019: 22nd International Conference, Shenzhen, China, October 13–17, 2019, Proceedings, Part V 22. pp. 494–502. Springer (2019)
  • [15]Huang, Z., Wang, H., Deng, Z., Ye, J., Su, Y., Sun, H., He, J., Gu, Y., Gu, L., Zhang, S., Qiao, Y.: Stu-net: Scalable and transferable medical image segmentation models empowered by large-scale supervised pre-training. CoRR abs/2304.06716 (2023)
  • [16]Ke, L., Deng, Y., Xia, W., Qiang, M., Chen, X., Liu, K., Jing, B., He, C., Xie, C., Guo, X., etal.: Development of a self-constrained 3d densenet model in automatic detection and segmentation of nasopharyngeal carcinoma using magnetic resonance images. Oral Oncology 110, 104862 (2020)
  • [17]Lee, A.W., Ng, W.T., Pan, J.J., Poh, S.S., Ahn, Y.C., AlHussain, H., Corry, J., Grau, C., Grégoire, V., Harrington, K.J., etal.: International guideline for the delineation of the clinical target volumes (ctv) for nasopharyngeal carcinoma. Radiotherapy and Oncology 126(1), 25–36 (2018)
  • [18]Li, S., Xiao, J., He, L., Peng, X., Yuan, X.: The tumor target segmentation of nasopharyngeal cancer in ct images based on deep learning methods. Technology in cancer research & treatment 18, 153–160 (2019)
  • [19]Li, Y., Dan, T., Li, H., Chen, J., Peng, H., Liu, L., Cai, H.: Npcnet: Jointly segment primary nasopharyngeal carcinoma tumors and metastatic lymph nodes in mr images. IEEE Transactions on Medical Imaging 41(7), 1639–1650 (2022)
  • [20]Liao, W., He, J., Luo, X., Wu, M., Shen, Y., Li, C., Xiao, J., Wang, G., Chen, N.: Automatic delineation of gross tumor volume based on magnetic resonance imaging by performing a novel semisupervised learning framework in nasopharyngeal carcinoma. International Journal of Radiation Oncology* Biology* Physics 113(4), 893–902 (2022)
  • [21]Lin, L., Dou, Q., Jin, Y.M., Zhou, G.Q., Tang, Y.Q., Chen, W.L., Su, B.A., Liu, F., Tao, C.J., Jiang, N., etal.: Deep learning for automated contouring of primary tumor volumes by mri for nasopharyngeal carcinoma. Radiology 291(3), 677–686 (2019)
  • [22]Liu, C.F., Padhy, S., Ramachandran, S., Wang, V.X., Efimov, A., Bernal, A., Shi, L., Vaillant, M., Ratnanather, J.T., Faria, A.V., etal.: Using deep siamese neural networks for detection of brain asymmetries associated with alzheimer’s disease and mild cognitive impairment. Magnetic resonance imaging 64, 190–199 (2019)
  • [23]Liu, Y., Zhou, Z., Zhang, S., Luo, L., Zhang, Q., Zhang, F., Li, X., Wang, Y., Yu, Y.: From unilateral to bilateral learning: Detecting mammogram masses with contrasted bilateral network. In: Medical Image Computing and Computer Assisted Intervention - MICCAI. vol. 11769, pp. 477–485. Springer (2019)
  • [24]Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net (2019)
  • [25]Luo, X., Fu, J., Zhong, Y., Liu, S., Han, B., Astaraki, M., Bendazzoli, S., Toma-Dasu, I., Ye, Y., Chen, Z., etal.: Segrap2023: A benchmark of organs-at-risk and gross tumor volume segmentation for radiotherapy planning of nasopharyngeal carcinoma. arXiv preprint arXiv:2312.09576 (2023)
  • [26]Luo, X., Liao, W., Chen, J., Song, T., Chen, Y., Zhang, S., Chen, N., Wang, G., Zhang, S.: Efficient semi-supervised gross target volume of nasopharyngeal carcinoma segmentation via uncertainty rectified pyramid consistency. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part II 24. pp. 318–329. Springer (2021)
  • [27]Luo, X., Liao, W., He, Y., Tang, F., Wu, M., Shen, Y., Huang, H., Song, T., Li, K., Zhang, S., etal.: Deep learning-based accurate delineation of primary gross tumor volume of nasopharyngeal carcinoma on heterogeneous magnetic resonance imaging: a large-scale and multi-center study. Radiotherapy and Oncology p. 109480 (2023)
  • [28]Ma, J., Li, F., Wang, B.: U-mamba: Enhancing long-range dependency for biomedical image segmentation. CoRR abs/2401.04722 (2024)
  • [29]Ma, Z., Zhou, S., Wu, X., Zhang, H., Yan, W., Sun, S., Zhou, J.: Nasopharyngeal carcinoma segmentation based on enhanced convolutional neural networks using multi-modal metric learning. Physics in Medicine & Biology 64(2), 025005 (2019)
  • [30]Mei, H., Lei, W., Gu, R., Ye, S., Sun, Z., Zhang, S., Wang, G.: Automatic segmentation of gross target volume of nasopharynx cancer using ensemble of multiscale deep neural networks with spatial attention. Neurocomputing 438, 211–222 (2021)
  • [31]Men, K., Chen, X., Zhang, Y., Zhang, T., Dai, J., Yi, J., Li, Y.: Deep deconvolutional neural network for target segmentation of nasopharyngeal cancer in planning computed tomography images. Frontiers in oncology 7, 315 (2017)
  • [32]Razek, A.A.K.A., King, A.: Mri and ct of nasopharyngeal carcinoma. American Journal of Roentgenology 198(1), 11–18 (2012)
  • [33]Roy, S., Köhler, G., Ulrich, C., Baumgartner, M., Petersen, J., Isensee, F., Jäger, P.F., Maier-Hein, K.H.: Mednext: Transformer-driven scaling of convnets for medical image segmentation. In: Medical Image Computing and Computer Assisted Intervention - MICCAI. vol. 14223, pp. 405–415. Springer (2023)
  • [34]Tian, L., Li, Z., Liu, F., Bai, X., Ge, J., Lu, L., Niethammer, M., Ye, X., Yan, K., Jin, D.: Same++: A self-supervised anatomical embeddings enhanced medical image registration framework using stable sampling and regularized transformation. ArXiv abs/2311.14986 (2023)
  • [35]Wang, H., Chen, J., Zhang, S., He, Y., Xu, J., Wu, M., He, J., Liao, W., Luo, X.: Dual-reference source-free active domain adaptation for nasopharyngeal carcinoma tumor segmentation across multiple hospitals. CoRR abs/2309.13401 (2023)