
WBIR | DeepSTAPLE: Learning Multimodal Registration Quality under UDA



introduction

❝To overcome these issues transferring existing annotations from a labeled source to the target domain is desirable.

Achieving this label transfer requires two steps:

  1. Multiple sample annotations are transferred to the target images via image registration.
  2. Label fusion is then applied to build a label consensus.

The terms label fusion and label consensus here need some further unpacking; a toy example follows below.
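To make "label fusion" concrete before diving into the paper, here is a minimal toy example assuming majority voting, the classic fusion scheme. This is only an illustration of the concept, not the method this paper proposes:

```python
import numpy as np

# Toy label fusion: several registered atlas labels for the same target
# voxels are fused by majority vote into one consensus label map.
atlas_labels = np.array([
    [0, 1, 1],   # labels proposed by atlas 1 for three voxels
    [0, 1, 0],   # atlas 2
    [1, 1, 0],   # atlas 3
])
consensus = (atlas_labels.mean(axis=0) > 0.5).astype(int)  # majority vote
print(consensus)  # [0 1 0]
```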

method

Data Parameters

[figure: screenshot of the referenced data-parameters paper]

The paper shown above formulates the data-parameter / curriculum-learning approach as a modification that alters the logits input of the loss function.

Through a learnable logit weighting, this modification can serve different scenarios, e.g. noisy training samples, or classes being weighted differently during training. The Data Parameters (DP) here focus on per-sample parameters. The STAPLE strategy itself comes from the following paper:

[figure: screenshot of the STAPLE paper]

We will come back to that paper later; presumably it defines the strategy used to compute the STAPLE consensus.

First, a sigmoid is applied to the computed DP:

[equation: sigmoid applied to the data parameters]

The loss is then the standard segmentation cross-entropy, with the DP acting as a per-sample weight:

[equation: DP-weighted cross-entropy loss]
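As a minimal sketch of how such a per-sample weighting could look in PyTorch (my reconstruction from the description above, not the authors' code; `dp` and `sample_idx` are assumed names):

```python
import torch
import torch.nn.functional as F

# One learnable scalar data parameter per training sample.
num_samples = 400                                   # e.g. 40 fixed x 10 moving pairs
dp = torch.zeros(num_samples, requires_grad=True)   # initialised to 0, as stated later

def dp_weighted_ce(logits, target, sample_idx):
    # per-voxel CE, reduced to one scalar per sample of the batch
    ce = F.cross_entropy(logits, target, reduction="none")  # (B, H, W)
    ce = ce.flatten(1).mean(dim=1)                          # (B,)
    w = torch.sigmoid(dp[sample_idx])                       # per-sample weight in (0, 1)
    return (w * ce).mean()
```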

Risk Regularisation

[Why is this needed?]

❝Even when a foreground class is present in the image and a registered target label only contains background voxels, the network can achieve a zero-loss value by overfitting.

❝As a consequence, upweighting the over-fitted samples will be of no harm in terms of loss reduction which leads to the upweighting of maximal noisy (empty) samples.

Therefore a risk regularisation term is added to encourage the network to take risks:

[equation: risk regularisation term]

$\#\{f_{\theta}(X_b)=c\}$ and $\#\{f_{\theta}(X_b)=\bar{c}\}$ denote the positive and negative predicted voxel counts, respectively.

A sample can reduce its loss by predicting more target voxels, i.e. by presenting itself as a clean sample. The formulation stays balanced: if the prediction is wrong, predicting more positive voxels also increases the cross-entropy part.

I do not fully understand this balancing argument yet; the sketch below is my attempt to make it concrete.
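A hedged sketch, assuming the regulariser rewards predicted foreground in proportion to the sample weight; `lam` is a made-up strength, and the exact formula is the one in the equation above:

```python
import torch
import torch.nn.functional as F

# Plausible reading of the risk term, not the paper's verbatim formula:
# weighted CE minus a reward for "risking" more foreground predictions.
# A wrong foreground prediction still raises the CE term, keeping balance.
def risk_regularised_loss(logits, target, dp, sample_idx, lam=0.1):
    ce = F.cross_entropy(logits, target, reduction="none").flatten(1).mean(1)
    w = torch.sigmoid(dp[sample_idx])
    pred_fg = (logits.argmax(1) == 1).flatten(1).float().mean(1)  # foreground fraction
    return (w * ce).mean() - lam * (w * pred_fg).mean()
```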

Fixed weighting scheme

❝We found that the parameters have a strong correlation with the ground-truth voxels present in their values. Applying a fixed compensation weighting to the data parameters can improve the correlation of the learned parameters and our target scores:

[equation: fixed compensation weighting of the data parameters]

This is effectively a correction of the DP, since the learned parameters were found to carry a bias.

$\#\{y_b=c\}$ denotes the count of ground-truth voxels.
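A hypothetical sketch of such a compensation, assuming the correction rescales each parameter by a power of its sample's ground-truth voxel count; the exponent `kappa` is a placeholder of mine, not a value from the paper:

```python
import torch

# Hypothetical fixed compensation: rescale each learned data parameter by a
# power of its sample's ground-truth foreground voxel count #{y_b = c}.
def compensated_dp(dp, targets, kappa=0.5):
    # targets: (N, H, W) integer ground-truth label maps, one per sample
    gt_voxels = (targets == 1).flatten(1).sum(1).float()  # #{y_b = c} per sample
    return torch.sigmoid(dp) * gt_voxels.pow(kappa)
```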

Out-of-line backpropagation process for improved stability

The data parameters and the model parameters are inter-dependent, which causes problems in early epochs while predictions are still inaccurate.

This is solved with a two-step approach:

  1. First train the main model.
  2. Then train the data parameters (out-of-line).

This keeps training stable while still estimating the label noise.

[What does out-of-line mean?]

❝When using the out-of-line, two-step approach data parameter optimization becomes a hypothesis of "what would help the model optimizing right now?" without intervening.
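A minimal sketch of how I understand the out-of-line, two-step update (my reconstruction, not the authors' code): the model step sees detached DP weights, then the DP step re-evaluates the loss with the model frozen, so neither update intervenes in the other.

```python
import torch
import torch.nn.functional as F

def train_step(model, model_opt, dp, dp_opt, x, y, sample_idx):
    # step 1: update the main model, treating the DP as constants
    ce = F.cross_entropy(model(x), y, reduction="none").flatten(1).mean(1)
    loss = (torch.sigmoid(dp[sample_idx]).detach() * ce).mean()
    model_opt.zero_grad(); loss.backward(); model_opt.step()
    # step 2: update the data parameters, treating the model as constant
    with torch.no_grad():
        ce = F.cross_entropy(model(x), y, reduction="none").flatten(1).mean(1)
    dp_loss = (torch.sigmoid(dp[sample_idx]) * ce).mean()
    dp_opt.zero_grad(); dp_loss.backward(); dp_opt.step()
```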

consensus generation via weighted voting

[equation: weighted-voting consensus]

I will probably need to read the code to understand this part properly; below is my best guess as a sketch.
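The sketch assumes each registered atlas label is weighted by the sigmoid of its data parameter, and the consensus is the argmax of the weighted one-hot sum; this is my reconstruction, not verified against the repository:

```python
import torch

def weighted_vote(labels, dp, num_classes=2):
    # labels: (A, H, W) -- A registered atlas label maps for one fixed image
    # dp:     (A,)      -- learned data parameter per atlas registration
    onehot = torch.nn.functional.one_hot(labels, num_classes).float()  # (A, H, W, C)
    w = torch.sigmoid(dp).view(-1, 1, 1, 1)
    return (w * onehot).sum(0).argmax(-1)   # (H, W) consensus label map
```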

Experiments

Back to fill in the rest of the notes...

Dataset

The experiments use a multimodal segmentation task that is part of the CrossMoDA challenge. (I have worked on this challenge before; it is a multimodal domain-transfer challenge.)

The data comprises:

  • contrast-enhanced T1-weighted brain tumour MRI scans (384/448 x 384/448 x 80 vox @ 0.5mm x 0.5mm x 1.0-1.5mm)
  • high-resolution T2-weighted images (512 x 512 x 120 vox @ 0.4 x 0.4 x 1.0-1.5mm)

The TCIA dataset is additionally used to provide the omitted labels of the CrossMoDA challenge, which serve as oracle labels.

Preprocessing:

  • Prior to training, isotropic resampling to 0.5mm x 0.5mm x 0.5mm was performed, as well as cropping the data to 128 x 128 x 128 vox around the tumour.
  • This raises a question: how is the crop centred on the tumour? My guess is that it is driven by the labels, which effectively constrains the included data (see the sketch after this list).
  • The provided cochlea labels were omitted, training on binary background/tumour masks only, i.e. the other labels are ignored and this becomes a binary task.
  • As the tumour sits in either the right or the left hemisphere, right-sided samples were flipped to provide pre-oriented training data, and data without tumour structures was omitted.
  • For the 2D experiments, the last data dimension was sliced.
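A hypothetical illustration of the label-based crop I am guessing at: take a 128^3 box centred on the tumour label's centre of mass.

```python
import numpy as np

# My guess at how the crop could work, not the paper's preprocessing code.
def crop_around_label(image, label, size=128):
    centre = np.round(np.argwhere(label > 0).mean(axis=0)).astype(int)
    lo = np.clip(centre - size // 2, 0, np.array(label.shape) - size)
    sl = tuple(slice(l, l + size) for l in lo)   # size^3 box inside the volume
    return image[sl], label[sl]
```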

Model and training settings

[2D segmentation]

  • For 2D segmentation, an LR-ASPP MobileNetV3-Large model is employed.
  • AdamW optimizer, learning rate 0.0005, batch size 32, cosine annealing schedule with restarts every 500 batch steps and a multiplication factor of 2.
  • For the data parameters, the SparseAdam optimizer implementation is used.

[3D segmentation]

  • For 3D experiments, a custom 3D-MobileNet backbone with an adapted 3D-LR-ASPP head is used.
  • Learning rate 0.01, batch size 8, exponential learning-rate decay with factor 0.99.

  • During training, no weight clipping was used, but an l2 weight decay was applied to the data parameters.
  • The data parameters DP were initialised with a value of 0.
  • For all experiments, spatial affine and b-spline augmentations plus random noise augmentation on the image intensities were used.
  • Prior to augmentation, the input images and labels were upscaled to 256 x 256 px in 2D and 192 x 192 x 192 vox in 3D training.
  • The data was split into one third validation and two thirds training.
  • Global class weights of $1/n_{bins}^{0.35}$ were used; a sketch of this and the 2D scheduler follows below.
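A small sketch of two of these settings in PyTorch; only the hyperparameter values come from the post, while the model and the class counts are stand-ins:

```python
import torch

model = torch.nn.Conv2d(1, 2, 3)   # stand-in for the LR-ASPP MobileNetV3-Large model
opt = torch.optim.AdamW(model.parameters(), lr=5e-4)
# restart after 500 batch steps, period multiplied by 2 at each restart
sched = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(opt, T_0=500, T_mult=2)

counts = torch.tensor([9.5e8, 5.0e6])   # hypothetical background/tumour voxel counts
class_weights = 1.0 / counts.pow(0.35)  # global class weights 1 / n_bins^0.35
```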

Experiment I

  • 2D model training with artificially disturbed ground-truth

Experiment II

  • 2D model training with quality-mixed registered single-atlas labels.
  • 30 T1-weighted images are used as fixed targets, with the T2-weighted images and labels as the moving pairs.
  • Registration is performed with the ConvexAdam method.
  • Two registration qualities are selected to show their effect on training:
    • best-quality registration: the single best registration, with an average of around 80% oracle-Dice across all atlas registrations
    • combined-quality:

Experiment III

  • 3D registration uses iterative deeds and ConvexAdam.
  • An interesting realisation here: suppose there are 40 fixed images and each fixed image is registered with 10 moving images; that ultimately yields 400 fixed-image/label pairs (see the snippet below).
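The pairing arithmetic as a quick snippet:

```python
# 40 fixed images, each matched with 10 registered moving labels -> 400 pairs.
fixed_ids = range(40)
moving_per_fixed = 10
pairs = [(f, m) for f in fixed_ids for m in range(moving_per_fixed)]
assert len(pairs) == 400
```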

[figure: in-line vs. out-of-line data-parameter training (Dice and Spearman correlation)]

The figure above shows the difference between in-line and out-of-line training of the data parameters; in-line performs much worse. Besides Dice, it also reports a Spearman correlation. Can these two quantities really be correlated like that?
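For reference, this is the kind of rank correlation presumably reported here: correlating the learned per-sample data parameters with per-sample oracle Dice scores (all numbers below are made up):

```python
from scipy.stats import spearmanr

dp_values = [0.2, 1.3, -0.5, 0.9]       # learned data parameters (illustrative)
oracle_dice = [0.55, 0.82, 0.31, 0.74]  # oracle Dice per registered sample
rho, p = spearmanr(dp_values, oracle_dice)
print(f"Spearman rho={rho:.2f}, p={p:.3f}")
```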

Experiment IV

  • Consensus generation and subsequent network training.
  • Using the two registration methods, two consensi were obtained:
    • 10 deeds registrations @ 40 fixed images
    • 10 ConvexAdam registrations @ 40 fixed images
  • Consensi were built by applying the STAPLE algorithm as a baseline, opposed to the proposed weighted-sum method on data parameters.
  • Several nnU-Net models were then trained on these consensi for segmentation.

[table: final segmentation performance]

This is the final performance comparison; the STAPLE- and DP-based methods land at roughly 63-67 Dice.

At this point the roles of STAPLE and DP become clear to me: among the 400 generated fixed-image/label pairs, not every registration is good, so each pair needs a score, i.e. a quality judgement.
