之前写过一篇博客, 介绍领导安利我哔哩哔哩的故事, 介绍到我将我从YouTube上收集的关于混合线性模型, 关于GWAS, 关于GS, 关于农业数据分析相关的视频, 上传到了哔哩哔哩上面. 今天我们看一下介绍多年多点遗传力及BLUP值计算的视频内容. 阅读原文可以查看视频, 这里我用文字和代码进行重演.
预览数据 数据包括品种(Line), 重复(Rep), 年份(Year), 地点(Loc), 收获日期(Harvest), 产量(Yield), Brix, PH, TA这三个也是观测值, 主要是品种性状.
需要分析的观测值: Yield, Brix, pH, TA 需要考虑的因子: Line, Rep, Year, Loc
> head(dat)
Line Rep Year Loc Harvest Yield Brix pH TA
1 SCT_0001 1 2009 OH 9/7/13 NA 5.70 4.42 0.39
2 SCT_0001 1 2009 CA 9/23/10 38789.80 4.75 4.47 0.34
3 SCT_0001 1 2010 CA <NA> NA 4.34 4.44 0.29
4 SCT_0001 2 2009 OH 9/14/13 NA 5.40 4.48 0.38
5 SCT_0001 2 2009 CA 9/14/10 NA 4.84 4.67 0.25
6 SCT_0001 2 2010 OH 9/30/14 93832.21 5.40 4.58 0.31
查看Brix的直方图
hist(dat$Brix,col="gold")
查看Brix在不同地点的箱线图:
boxplot(dat$Brix~dat$Loc, xlab="Location", ylab="Degrees Brix", main="Degrees Brix by Location", col="pink")
这里建模之前, 需要对数据进行转化, 将需要考虑的因素变为因子(Factor), 将需要分析的性状变为数值(number)
> str(dat)
'data.frame': 986 obs. of 9 variables:
$ Line : Factor w/ 143 levels "SCT_0001","SCT_0002",..: 1 1 1 1 1 1 1 2 2 2 ...
$ Rep : int 1 1 1 2 2 2 2 1 1 1 ...
$ Year : int 2009 2009 2010 2009 2009 2010 2010 2009 2009 2010 ...
$ Loc : Factor w/ 2 levels "CA","OH": 2 1 1 2 1 2 1 2 1 1 ...
$ Harvest: Factor w/ 26 levels "10/10/10","10/2/10",..: 25 19 NA 11 10 24 NA 25 9 NA ...
$ Yield : num NA 38790 NA NA NA ...
$ Brix : num 5.7 4.75 4.34 5.4 4.84 5.4 5.16 5.5 5.65 4.54 ...
$ pH : num 4.42 4.47 4.44 4.48 4.67 4.58 4.47 4.44 4.47 4.52 ...
$ TA : num 0.39 0.34 0.29 0.38 0.25 0.31 0.29 0.44 0.33 0.23 ...
> for(i in 1:5) dat[,i] = as.factor(dat[,i])
> str(dat)
'data.frame': 986 obs. of 9 variables:
$ Line : Factor w/ 143 levels "SCT_0001","SCT_0002",..: 1 1 1 1 1 1 1 2 2 2 ...
$ Rep : Factor w/ 2 levels "1","2": 1 1 1 2 2 2 2 1 1 1 ...
$ Year : Factor w/ 2 levels "2009","2010": 1 1 2 1 1 2 2 1 1 2 ...
$ Loc : Factor w/ 2 levels "CA","OH": 2 1 1 2 1 2 1 2 1 1 ...
$ Harvest: Factor w/ 26 levels "10/10/10","10/2/10",..: 25 19 NA 11 10 24 NA 25 9 NA ...
$ Yield : num NA 38790 NA NA NA ...
$ Brix : num 5.7 4.75 4.34 5.4 4.84 5.4 5.16 5.5 5.65 4.54 ...
$ pH : num 4.42 4.47 4.44 4.48 4.67 4.58 4.47 4.44 4.47 4.52 ...
$ TA : num 0.39 0.34 0.29 0.38 0.25 0.31 0.29 0.44 0.33 0.23 ...
可以看到, 转化之后, 前五列变为了Factor, 后四列为num
这里使用混合线性模型, 使用R包lme4
观测值: Brix 固定因子: 无 随机因子: Line + Loc + Year + Loc.Year.Rep + Line:Loc + Line:Year + Line:Year:Loc
library(lme4)
library(lmerTest)
mod1 = lmer(Brix ~ (1|Line) + (1|Loc) + (1|Year) +
(1|Line:Loc) + (1|Line:Year) , data=dat)
summary(mod1)
结果:
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: Brix ~ (1 | Line) + (1 | Loc) + (1 | Year) + (1 | Line:Loc) + (1 | Line:Year)
Data: dat
REML criterion at convergence: 1742.1
Scaled residuals:
Min 1Q Median 3Q Max
-3.0258 -0.6133 -0.0533 0.5293 5.0653
Random effects:
Groups Name Variance Std.Dev.
Line:Year (Intercept) 0.010209 0.10104
Line:Loc (Intercept) 0.009483 0.09738
Line (Intercept) 0.191933 0.43810
Year (Intercept) 0.031648 0.17790
Loc (Intercept) 0.188486 0.43415
Residual 0.257820 0.50776
Number of obs: 967, groups: Line:Year, 284; Line:Loc, 281; Line, 143; Year, 2; Loc, 2
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 5.2817 0.3343 1.3583 15.8 0.0167 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
这里 Vg 是Line的方差组分: 0.191933 VGL 是品种与地点的交互方差组分: 0.009483 VGY 为品种与年份的交互方差组分: 0.010209 Ve 为残差方差组分: 0.257820
这里, 品种与地点以及年份互作, 没有考虑, 因此不做计算.
地点为2, 年份为2, 重复为2
$$ h^2 = \frac{ 0.191933}{ 0.191933 + 0.009483/2 + 0.010209/2 +0.257820/8 } $$
> vg = 0.191933
> vgl = 0.009483
> ve = 0.257820
> # 遗传力
> vg = 0.191933
> vgl = 0.009483
> vgy = 0.010209
> ve = 0.257820
> h2 = vg/(vg + vgl/2 + vgy/2 + ve/8)
> h2
[1] 0.8202037
> brixblup = ranef(mod1)$Line
> head(brixblup)
(Intercept)
SCT_0001 -0.13628589
SCT_0002 -0.14944411
SCT_0003 -0.05347823
SCT_0004 -0.03878384
SCT_0005 1.13568896
SCT_0006 0.49028454
> write.csv(brixblup,"brixblup.csv")
可以看出, BLUP值和平均值趋势基本一致, 但是有个别品种, BLUP值和平均值变化较大.
mm = as.data.frame(tapply(dat$Brix, dat$Line, na.rm=T, mean))
names(mm) = "mean"
plot(brixblup$blup,mm$mean)
这篇无疑是开山之作, 但是也有一些不足:
这里没有考虑: 品种:年份:地点, 残差的分母应该是222 = 8 , 而不是4.
# asrmel的解决方案
library(asreml)
library(learnasreml)
head(dat)
mod2 = asreml(Brix ~ 1 , random = ~Loc + Year + Line+Line:Year + Line:Loc, data=dat)
summary(mod2)$varcomp
pin(mod2, h2 ~ V3/(V3+V4/2+V5/2 + V6/8))
> mod2 = asreml(Brix ~ 1 , random = ~Loc + Year + Line+Line:Year + Line:Loc, data=dat)
ASReml: Tue May 14 21:03:02 2019
LogLik S2 DF wall cpu
-27.8710 0.3114 966 21:03:02 0.0
-8.1437 0.2968 966 21:03:02 0.0
8.3477 0.2802 966 21:03:02 0.0
16.0150 0.2642 966 21:03:02 0.0
16.6323 0.2585 966 21:03:02 0.0
16.6563 0.2579 966 21:03:02 0.0
16.6571 0.2578 966 21:03:02 0.0
16.6571 0.2578 966 21:03:02 0.0
Finished on: Tue May 14 21:03:02 2019
LogLikelihood Converged
> summary(mod2)$varcomp
gamma component std.error z.ratio constraint
Loc!Loc.var 0.73097982 0.188461143 0.26741513 0.7047512 Positive
Year!Year.var 0.12272634 0.031641291 0.04564871 0.6931475 Positive
Line!Line.var 0.74444586 0.191932955 0.02948641 6.5092010 Positive
Line:Year!Line.var 0.03960966 0.010212159 0.01139269 0.8963779 Positive
Line:Loc!Line.var 0.03677090 0.009480269 0.01125209 0.8425343 Positive
R!variance 1.00000000 0.257819899 0.01557455 16.5539268 Positive
> pin(mod2, h2 ~ V3/(V3+V4/2+V5/2 + V6/8))
Estimate SE
h2 0.820203 0.0394735
更简洁, 更友好, 更方便.
公众号回复: blup
下载文档(PPT), 数据, 代码: