
For GWAS with multi-environment phenotype data, should you use BLUE values or BLUP values?

邓飞
Published 2020-03-19 15:18:28

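A friend asked: BLUP values can be positive or negative, so can I use the absolute value of the BLUP when running GWAS?

My answer: absolutely not.

Why are BLUP values both positive and negative?

Because BLUP estimates the effects of a random factor, and the model assumes that a random factor has a mean of 0. The BLUP values should therefore sum to 0, or very close to 0, so some of them must be positive and some negative. Taking the absolute value would discard the sign, which is exactly what tells you whether a genotype is above or below the population mean.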
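The point can be checked numerically. Below is a minimal sketch, assuming a balanced one-way model (every genotype has the same number of records) and variance components that are already known. The function name `balanced_blups`, the simulated data, and the variance values are all hypothetical and chosen only for illustration. In practice, mixed-model software estimates the variance components from the data.

```python
import random


def balanced_blups(data, var_g, var_e):
    """BLUPs of genotype effects for a balanced one-way random model.

    data: dict mapping genotype -> list of records (same length for all).
    var_g, var_e: genotypic and residual variance, assumed known here.
    """
    n = len(next(iter(data.values())))            # records per genotype
    means = {g: sum(v) / len(v) for g, v in data.items()}
    grand = sum(means.values()) / len(means)      # estimate of the overall mean
    shrink = var_g / (var_g + var_e / n)          # shrinkage factor in (0, 1)
    return {g: shrink * (m - grand) for g, m in means.items()}


# Simulated example: 6 genotypes, 4 records each (made-up numbers).
random.seed(1)
true_effect = {f"G{i}": random.gauss(0, 2) for i in range(1, 7)}
data = {g: [50 + u + random.gauss(0, 1) for _ in range(4)]
        for g, u in true_effect.items()}

blup = balanced_blups(data, var_g=4.0, var_e=1.0)
print("sum of BLUPs:", round(sum(blup.values()), 10))

by_blup = sorted(blup, key=blup.get)                 # lowest to highest BLUP
by_abs = sorted(blup, key=lambda g: abs(blup[g]))    # order after taking abs()
print("order by BLUP  :", by_blup)
print("order by |BLUP|:", by_abs)
```

The BLUPs are deviations from the overall mean, so they sum to zero and necessarily include negative values. Sorting by absolute value places the worst genotype next to the best ones, which is why the absolute value cannot serve as a phenotype.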

Reference:

He S, Schulthess A W, Mirdita V, et al. Genomic selection in a commercial winter wheat population. Theoretical and Applied Genetics, 2016, 129(3): 641-651.

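The question: in plants, with multi-year multi-location data or single-year multi-location data, each genotype has several phenotypic records. GWAS and GS, however, need one phenotypic value per genotype. Which value should that be?

There are currently three candidate answers: 1. the mean; 2. the BLUE value (best linear unbiased estimate, genotype as a fixed effect); 3. the BLUP value (best linear unbiased prediction, genotype as a random effect).

Forum discussion: https://www.researchgate.net/post/Does_any_one_have_an_idea_of_which_one_BLUE_or_BLUP_to_use_for_a_GWAS_analysis_of_a_trait_in_wheat_eg_resistance_to_rust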

Does anyone have an idea of which one, BLUE or BLUP, to use for a GWAS analysis of a trait in wheat (e.g. resistance to rust)? I have 3 data sets of resistance evaluation from two locations, generated in a field experiment with an alpha-lattice design (2 replications of 300 materials and, per replication, 10 incomplete blocks containing 30 accessions). Two of the data sets are from the same location but different years, and the third one is single-year data from another location. So I am thinking of calculating the BLUE / BLUP for each location, and a total one for the combined data, to be used in the GWAS.

Answer 1:

I strongly recommend using BLUE, as you are doing a two-stage analysis. BLUE will give you an 'adjusted mean' for each genotype according to the design effects, and maybe other covariates in your model. And this is what you want: a more precise mean. This translates into using your model effects (for example, replicate as fixed and incomplete blocks and plots as random), but with your genotype (or clone) effect FIXED. If you use BLUP then you are shrinking your genetic effects. This means that your genetic effects are moved towards the mean according to their information; yes, they will be your best predictions of those random effects, but they will be adjusted by their sample size and the variance associated with their data. This is what a random effect does, but the issue is that you eliminate part of the genetic signal if it is random, and then you will often end up with more noise than you want for your GWAS. Hence, you fit it as fixed, and then once you do your GWAS it will be a random effect, but it will not be shrunk twice. Good luck.

Answer 2:

Hi Sisay, I would recommend treating all design effects as random (BLUP is a property, not an effect) and the population structure and marker effects as fixed (BLUE is a property, not an effect). There is no reason to perform a separate analysis for each trial. A one-stage analysis is always desirable (all model parameters are learned from the same likelihood). If you take the approach you mentioned, you risk getting several false positive associations. If you are not confident performing a one-stage analysis, I would suggest analyzing each trial individually, treating all design effects as random and the genotypic effect as fixed. In the GWAS analysis, model marker effects as fixed, and control for population structure either by using the relationship matrix (including a genotypic random effect in the model), or by modeling population structure as a fixed effect, or both. You can identify subpopulations using an unsupervised or supervised clustering approach (PCA, STRUCTURE, whatever). I hope this is helpful.

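Overall conclusions:

1. If a one-stage analysis is possible, that is, analyzing the single-year multi-location or multi-year multi-location data jointly rather than first computing adjusted values and then analyzing those (which is two-stage), the question of which value to use as the phenotype does not arise: use the raw phenotypic records directly. This is the best option.
2. If you must do a two-stage analysis, that is, compute adjusted values first and then run GS or GWAS, use the BLUE values of the varieties rather than the BLUP values. In a mixed linear model, random effects are shrunk towards the mean. Although the result is the best prediction, the variance of the adjusted values is reduced, so significant loci are harder to detect in GWAS and the noise relative to the signal increases. Moreover, the variety is itself fitted as a random effect in GWAS or GS, so using BLUP values amounts to shrinking twice.
3. Therefore, a good approach is, in the first stage, to fit location, year, and block as random effects and variety as a fixed effect, and compute the BLUE values.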
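The shrinkage argument can be made concrete with a minimal sketch. It assumes the simplest possible first-stage model (an overall mean plus a genotype effect, with no location, year, or block terms) and variance components that are already known. The function name `blue_and_blup`, the four genotypes, and all numbers are hypothetical and only illustrate the mechanism. In this simple model the BLUE of a genotype is its plain mean, while the BLUP is the deviation from the overall mean multiplied by `var_g / (var_g + var_e / n_i)`, where `n_i` is the number of records for that genotype.

```python
def blue_and_blup(data, var_g, var_e):
    """Return (BLUE, BLUP) per genotype for a one-way model; unbalanced data allowed.

    BLUE: genotype fixed  -> the plain genotype mean in this simple model.
    BLUP: genotype random -> deviation from the overall mean, shrunk by
          var_g / (var_g + var_e / n_i), with n_i the number of records.
    var_g and var_e are assumed known (hypothetical values below).
    """
    mean = {g: sum(v) / len(v) for g, v in data.items()}
    n = {g: len(v) for g, v in data.items()}
    # Weight of each genotype mean = inverse of its variance under the random model.
    w = {g: 1.0 / (var_g + var_e / n[g]) for g in data}
    # Generalized least squares estimate of the overall mean.
    mu = sum(w[g] * mean[g] for g in data) / sum(w.values())
    # var_g * w[g] equals the shrinkage factor var_g / (var_g + var_e / n_i).
    blup = {g: var_g * w[g] * (mean[g] - mu) for g in data}
    return mean, blup


# Made-up yields; genotype B was observed only once.
data = {
    "A": [62.0, 64.0, 63.0, 61.0, 65.0, 63.0],
    "B": [66.0],
    "C": [55.0, 57.0, 56.0, 56.0],
    "D": [58.0, 60.0],
}
blue, blup = blue_and_blup(data, var_g=4.0, var_e=16.0)

for g in data:
    print(f"{g}: n={len(data[g])}  BLUE={blue[g]:.2f}  BLUP={blup[g]:+.2f}")

blue_range = max(blue.values()) - min(blue.values())
blup_range = max(blup.values()) - min(blup.values())
print(f"range of BLUE: {blue_range:.2f}, range of BLUP: {blup_range:.2f}")
```

In this example the BLUPs span a much narrower range than the BLUEs, and genotype B, which has a single record, is shrunk so strongly that it falls behind genotype A even though its mean is higher. Feeding such values into a GWAS or GS model that again treats genotype as random applies a second round of shrinkage, which is the double shrinkage described above.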

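Reference 1:

This is a paper on genomic selection (GS) in winter wheat.

Here, plot-level heritability was calculated, with the residual term divided by the per-genotype harmonic mean. In addition, variety was fitted as a fixed effect in the model and BLUE values were calculated.

Here, BLUE values were used rather than the BLUP values common in animal breeding.

Heritability was estimated for 13 environments, and BLUE values were calculated for each environment.

In wheat breeding, selection focuses on genotypic values rather than breeding values, so BLUE values are more suitable than BLUP values. (I find this statement a little hard to follow.)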

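Reference 2: https://www.researchgate.net/publication/268118579_Genomic_Selection_for_End-Use_Quality_Traits_in_CIMMYT_Spring_Wheat

Reference 2 uses LSmeans as the BLUE values.

So what are the differences among LSmeans, BLUE values, and BLUP values? See my earlier articles: "Should the phenotype in GWAS be the BLUE value or the BLUP value?" and "BLUE values vs. BLUP values in mixed linear models".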

Originally published: 2020-03-14

Shared from the WeChat official account 育种数据分析之放飞自我.
