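A few days ago I came across a clinical research paper published in 2017 in Breast Cancer Res Treat, "Clinical and molecular relevance of mutant-allele tumor heterogeneity in breast cancer". It uses the mutant-allele tumor heterogeneity (MATH) algorithm to assess tumor heterogeneity and examines how MATH correlates with several clinical indicators and with omics data. The approach is simple and the results are fairly ordinary, with no major breakthrough, but the MATH algorithm itself is still worth a look!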
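The MATH algorithm goes back to a 2013 paper in Oral Oncol, "MATH, a novel measure of intratumor genetic heterogeneity, is high in poor-outcome classes of head and neck squamous cell carcinoma". The same author later published a paper in Cancer on head and neck squamous cell carcinoma, "High intratumor genetic heterogeneity is related to worse outcome in patients with head and neck squamous cell carcinoma", which again supported the usefulness of MATH: among other findings, patients with high MATH had lower overall survival.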
Together with an overseas blog post, "MATH and Tumors", this gave me a rough understanding of how MATH works. Overall it is fairly simple.
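First, what is tumor heterogeneity? It can be divided into inter-tumor and intra-tumor heterogeneity, but unless stated otherwise, "tumor heterogeneity" here means intra-tumor heterogeneity (ITH). As cancer cells keep growing, the daughter cells produced by division come to differ from cells of the same generation or from their parent cells, so they diverge substantially in many respects, which ultimately leads to differences in tumor growth, invasion, prognosis and other measures. For a summary of recent research on tumor heterogeneity, take a quick look at 【盘点】浅谈肿瘤异质性 ("[Roundup] A brief discussion of tumor heterogeneity").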
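Given this, the author of the 2013 paper wanted to use the MATH metric to see whether patients with high tumor heterogeneity have a worse prognosis. The overall approach in the two papers above is to first compute a MATH value for each patient, then divide the patients into three groups (low, intermediate, high) by MATH value, and then assess how these groups relate to clinical indicators and to omics data such as mutations.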
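The grouping step could be sketched roughly as follows. This is only an illustration: the passage above does not say how the papers chose their cut points, so splitting at tertiles is my assumption, and the function name `group_by_math` and the patient IDs are made up.

```python
from statistics import quantiles


def group_by_math(math_by_patient):
    """Label each patient low / intermediate / high by MATH tertile.

    math_by_patient is a hypothetical dict: patient ID -> MATH value.
    Tertile cut points are an assumption, not the papers' stated rule.
    """
    # quantiles(..., n=3) returns the two cut points between three groups.
    low_cut, high_cut = quantiles(math_by_patient.values(), n=3)

    groups = {}
    for patient, value in math_by_patient.items():
        if value <= low_cut:
            groups[patient] = "low"
        elif value <= high_cut:
            groups[patient] = "intermediate"
        else:
            groups[patient] = "high"
    return groups


if __name__ == "__main__":
    # Made-up MATH values, for illustration only.
    example = {"p1": 10, "p2": 20, "p3": 30, "p4": 40, "p5": 50, "p6": 60}
    print(group_by_math(example))
```

With real data the cut points would more likely come from the cohort or from the published thresholds, so treat the tertile rule as a placeholder.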
So we need to know how the MATH value is calculated. First, the original text from the Cancer paper:
The MATH value for each tumor was based on the distribution of mutant-allele fractions among tumor-specific mutated loci, calculated as the percentage ratio of the width (median absolute deviation, MAD, scaled by a constant factor so that the expected MAD of a sample from a normal distribution equals the standard deviation) to the center (median) of its distribution: MATH = 100 * MAD/median
Next, the description in the 2017 paper mentioned above:
the steps to determine the MATH value can be summarized as follows: (1) calculating the mutant-allele fraction (MAF) for each locus as the ratio of mutant reads to total reads; (2) obtaining the absolute differences of each MAF from the median MAF value, multiplying the median of these absolute differences by a factor of 1.4826, thus the median absolute deviation (MAD) was generated; (3) calculating MATH as the percentage ratio of the MAD to the median of the MAFs among the tumor’s mutated genomic loci, presented as MATH = 100 * MAD/median.
And the earlier 2013 paper:
Each tumor’s MATH value was calculated from the median absolute deviation (MAD) and the median of its mutant-allele fractions at tumor-specific mutated loci: MATH = 100 * MAD/median. Calculation of MAD followed the default in R, with values scaled by a constant factor (1.4826) so that the expected MAD of a sample from a normal distribution equals the standard deviation.
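To make the three quoted descriptions concrete, here is a minimal Python sketch of the calculation as I read it: MAF per locus, scaled MAD around the median MAF, then MATH = 100 * MAD/median. The function name `math_score`, its arguments and the read counts are my own inventions for illustration. This is not the authors' code or the maftools implementation.

```python
from statistics import median

# Scaling constant quoted in the papers (the default used by R's mad()).
MAD_SCALE = 1.4826


def math_score(mutant_reads, total_reads):
    """Return MATH = 100 * MAD / median for one tumor.

    mutant_reads and total_reads are hypothetical parallel lists of read
    counts, one entry per tumor-specific mutated locus.
    """
    if len(mutant_reads) != len(total_reads) or not mutant_reads:
        raise ValueError("need equal-length, non-empty read-count lists")

    # Step 1: mutant-allele fraction (MAF) for each locus.
    mafs = [m / t for m, t in zip(mutant_reads, total_reads)]

    # Step 2: scaled median absolute deviation from the median MAF.
    center = median(mafs)
    mad = MAD_SCALE * median(abs(maf - center) for maf in mafs)

    # Step 3: MAD expressed as a percentage of the median MAF.
    return 100.0 * mad / center


if __name__ == "__main__":
    # Made-up read counts for five loci, for illustration only.
    print(round(math_score([10, 20, 30, 40, 50], [100] * 5), 2))
```

A tumor whose loci all share the same MAF gets a MATH of 0 under this sketch, and the value grows as the MAFs spread out relative to their median.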
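As for what MATH means: the author argues that MATH effectively captures the spread of the MAF distribution across tumor-specific mutated loci, that is, how far the MAFs deviate from the sample's overall MAF distribution. It is a bit like a standard deviation, or more precisely, since the MAD is divided by the median, a robust analogue of the coefficient of variation. Naturally, the larger the MATH value, the higher the tumor heterogeneity!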
The maftools R package on Bioconductor makes it easy to compute MATH values. I don't tell just anyone about this!
https://bioconductor.org/packages/devel/bioc/vignettes/maftools/inst/doc/maftools.html
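A 2016 paper in NC (Nature Communications), "The somatic mutation profiles of 2,433 breast cancers refine their genomic and transcriptomic landscapes", is a cohort mutation study in breast cancer that used the MATH algorithm to explore tumor heterogeneity. The MATH distributions clearly differ between ER-positive and ER-negative breast cancer patients. The authors also used MATH values to group patients for KM (Kaplan-Meier) survival analysis, which showed that MATH is significantly associated with survival among ER-positive patients but not among ER-negative patients!!!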