❝Even when a foreground class is present in the image and a registered target label only contains background voxels, the network can achieve a zero-loss value by overfitting.
❝As a consequence, upweighting the over-fitted samples does no harm in terms of loss reduction, which leads to the upweighting of maximally noisy (empty) samples.
❝We found that the data parameter values have a strong correlation with the count of ground-truth voxels present in their samples. Applying a fixed compensation weighting to the data parameters can improve the correlation of the learned parameters and our target scores:
[equation image: fixed compensation weighting applied to the data parameters]
This effectively applies a correction to the data parameters (DP), since the DP were found to carry a systematic bias.
where s_c denotes the count of ground-truth voxels.
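The compensation can be sketched as follows. The exact weighting function is the one in the equation above; the log-based form, the function name, and `eps` below are purely illustrative assumptions:

```python
import torch

def compensate_data_params(data_params, gt_voxel_counts, eps=1.0):
    # Hypothetical fixed compensation: scale each data parameter by the
    # inverse log of its sample's ground-truth voxel count s_c, so that
    # large-foreground samples are not favoured merely for their size.
    weights = torch.log(gt_voxel_counts.float() + eps)
    return data_params / weights

dp = torch.tensor([0.5, 0.5, 0.5])          # learned data parameters
s_c = torch.tensor([10, 1_000, 100_000])    # ground-truth voxel counts
compensated = compensate_data_params(dp, s_c)
```

Under this sketch, samples with more ground-truth voxels receive a smaller compensated value, counteracting the size bias described above.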
Out-of-line backpropagation process for improved stability
The data parameters and model parameters are inter-dependent, which causes problems in early epochs when predictions are still inaccurate.
This is resolved with a two-step approach:
first train the main model,
then the data parameters (out-of-line).
This keeps training stable while still estimating label noise.
【What does out-of-line mean?】
❝When using the out-of-line, two-step approach, data parameter optimization becomes a hypothesis of "what would help the model optimize right now?" without intervening.
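The two-step scheme can be sketched as below: step 1 updates only the main model; step 2 re-runs the loss and updates only the per-sample data parameters, so neither intervenes in the other's update. All names and the softmax weighting form are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(4, 2)
data_params = torch.nn.Parameter(torch.zeros(8))   # one parameter per sample
opt_model = torch.optim.AdamW(model.parameters(), lr=5e-4)
opt_dp = torch.optim.Adam([data_params], lr=0.1)

def weighted_loss(x, y, idx, freeze_dp):
    per_sample = F.cross_entropy(model(x), y, reduction="none")
    dp = data_params[idx].detach() if freeze_dp else data_params[idx]
    weights = torch.softmax(dp, dim=0)             # data-parameter weighting
    return (weights * per_sample).sum()

x, y, idx = torch.randn(8, 4), torch.randint(0, 2, (8,)), torch.arange(8)

# Step 1: train the main model with the data parameters frozen.
opt_model.zero_grad()
weighted_loss(x, y, idx, freeze_dp=True).backward()
opt_model.step()

# Step 2 (out-of-line): ask "what would help the model right now?" and
# update only the data parameters; the model itself is left untouched.
opt_dp.zero_grad()
weighted_loss(x, y, idx, freeze_dp=False).backward()
opt_dp.step()
```

Because step 2 never applies a model update, the data parameters act as a pure observer of the current training state, which is what makes the scheme stable early on.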
We omitted the provided cochlea labels and train on binary masks of background/tumour, i.e. we ignore the other provided labels and only perform a binary segmentation task.
As the tumour is contained in either the right or the left hemisphere, we flipped the right-sided samples to provide pre-oriented training data and omitted the data without tumour structures.
For the 2D experiments we sliced the last data dimension.
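The flipping and slicing steps above can be sketched as follows; the function name, axis choices, and laterality flag are illustrative assumptions:

```python
import torch

def preprocess(volume: torch.Tensor, tumour_on_right: bool) -> list:
    # volume: (D, H, W). Mirror right-sided cases along an in-plane axis
    # so all tumours share one orientation, then slice the last data
    # dimension to obtain 2D training slices.
    if tumour_on_right:
        volume = torch.flip(volume, dims=[1])
    return [volume[..., i] for i in range(volume.shape[-1])]

vol = torch.zeros(4, 4, 6)
slices = preprocess(vol, tumour_on_right=True)  # six (4, 4) slices
```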
Model and training settings
【2D segmentation】
For 2D segmentation, we employ an LR-ASPP MobileNetV3-Large model.
AdamW optimizer, learning rate 0.0005, batch size 32, cosine annealing schedule with restarts every 500 batch steps and a period multiplication factor of 2.
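The stated schedule maps directly onto PyTorch's `CosineAnnealingWarmRestarts` (`T_0=500` steps, `T_mult=2`); a minimal sketch, with the dummy model being an assumption:

```python
import torch

model = torch.nn.Linear(4, 2)
opt = torch.optim.AdamW(model.parameters(), lr=5e-4)
# Restart after 500 batch steps, doubling the period at each restart.
sched = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    opt, T_0=500, T_mult=2)

lrs = []
for _ in range(1500):
    opt.step()      # the actual batch update would happen here
    sched.step()    # advance the schedule once per batch step
    lrs.append(opt.param_groups[0]["lr"])
# lr decays over the first 500 steps, restarts at the base value,
# then decays over the next 1000 steps.
```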
For the data parameters, we use the SparseAdam optimizer implementation.
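SparseAdam only updates the embedding rows seen in the current batch, which makes it a natural fit for per-sample data parameters. A sketch, where the exp-weighting, the sizes, and the loss values are assumptions:

```python
import torch

n_samples = 1000
# One scalar data parameter per training sample, stored as a sparse
# embedding so SparseAdam touches only the rows of the current batch.
data_params = torch.nn.Embedding(n_samples, 1, sparse=True)
torch.nn.init.zeros_(data_params.weight)      # DP initialized to 0
opt_dp = torch.optim.SparseAdam(data_params.parameters(), lr=0.1)

idx = torch.tensor([3, 7, 42])                # sample indices in the batch
per_sample_loss = torch.tensor([0.2, 1.5, 0.7])
weights = data_params(idx).squeeze(1).exp()   # hypothetical weighting
loss = (weights * per_sample_loss).mean()

opt_dp.zero_grad()
loss.backward()
opt_dp.step()                                 # only rows 3, 7, 42 change
```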
【3D segmentation】
For 3D experiments we use a custom 3D-MobileNet backbone with an adapted 3D-LR-ASPP head.
Learning rate 0.01, batch size 8, exponentially decayed scheduling with factor 0.99.
During training, we applied no weight clipping and no l2 weight-decay regularization on the data parameters.
The data parameters (DP) were initialized with a value of 0.
For all experiments, we used spatial affine and b-spline augmentations as well as random-noise augmentation on the image intensities.
Prior to augmenting, we upscaled the input images and labels to 256×256 px for 2D training and 192×192×192 vox for 3D training.
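The 2D resizing step can be sketched with `F.interpolate`; nearest-neighbour interpolation keeps label values discrete (the input shapes below are illustrative):

```python
import torch
import torch.nn.functional as F

img = torch.rand(1, 1, 128, 130)                     # (N, C, H, W)
lab = torch.randint(0, 2, (1, 1, 128, 130)).float()
img_up = F.interpolate(img, size=(256, 256), mode="bilinear",
                       align_corners=False)
# Nearest interpolation avoids inventing fractional label values.
lab_up = F.interpolate(lab, size=(256, 256), mode="nearest")
```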
The data was split into one third validation and two thirds training.
We use global class weights.
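Global class weights for the binary background/tumour task can be sketched as inverse-frequency weights computed once over the training set; the voxel counts below are made-up placeholders, and the actual weighting scheme used is not specified here:

```python
import torch

# Hypothetical global voxel counts over the training set: (bg, tumour).
voxels_per_class = torch.tensor([9_000_000.0, 50_000.0])
class_weights = voxels_per_class.sum() / (2.0 * voxels_per_class)
criterion = torch.nn.CrossEntropyLoss(weight=class_weights)

logits = torch.randn(2, 2, 8, 8)              # (batch, classes, H, W)
target = torch.randint(0, 2, (2, 8, 8))
loss = criterion(logits, target)
```

The rare tumour class gets the larger weight, so empty-looking predictions are penalized more than the raw voxel frequencies would suggest.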
Experiment I
2D model training with artificially disturbed ground-truth
Experiment II
2D model training with quality-mixed registered single-atlas labels
We use 30 T1-weighted images as fixed targets and the T2-weighted images and labels as moving pairs.
Registration was performed with the ConvexAdam method.
We selected two registration quality levels to show their influence on training:
best-quality registration: the single best registration, with an average of around 80% oracle-Dice across all atlas registrations