Graph pangenome captures missing heritability and empowers tomato breeding
https://www.nature.com/articles/s41586-022-04808-9
西红柿Nature.pdf
论文的代码链接
https://github.com/YaoZhou89/TGG/tree/main/5.Genetic_analysis
论文中做了eQTL的相关分析,转录组数据处理部分的方法写到
Genes with TPM values of >0.5 were defined as expressed. Only genes expressed in at least 100 accessions were retained for the downstream analyses. Raw expression levels were normalized with quantile–quantile normalization. To remove potential batch effects and confounding factors affecting gene expression, the probabilistic estimation of expression residuals method was applied with the top four factors as covariates.
论文里提供了 quantile–quantile normalization 的代码
https://github.com/YaoZhou89/TGG/blob/main/5.Genetic_analysis/scripts/prepare_gene_expression.R
quantile_normalisation <- function(df){
df_rank <- apply(df,2,rank,ties.method="min")
df_sorted <- data.frame(apply(df, 2, sort))
df_mean <- apply(df_sorted, 1, mean)
index_to_mean <- function(my_index, my_mean){
return(my_mean[my_index])
}
df_final <- apply(df_rank, 2, index_to_mean, my_mean=df_mean)
rownames(df_final) <- rownames(df)
return(df_final)
}
这个代码是什么意思暂时还没看明白,还得查查
然后是peer这个R包
https://github.com/PMBio/peer 这个R包的主页,但是按照这个主页的安装方法试了一下没有成功,主要是编译的时候需要复制文件到usr目录下,我没有权限,不知道怎么更改编译的目录。主页上提供的R包下载链接好像失效了,
找到了一个简书链接 https://www.jianshu.com/p/3b613cafafe8
提供了一个编译好的R包的下载链接
下载下来我试了一下,只能在linux系统下使用,R语言版本需要用3.几,直接用conda安装一个3.6版本的R,使用是没有问题的
加载的时候直接指定R包的路径
代码
library(peer)
expr = read.csv('examples/data/expression.csv', header=FALSE)
model = PEER()
PEER_setPhenoMean(model,as.matrix(expr))
PEER_setNk(model,20)
PEER_getNk(model)
PEER_update(model)
factors = PEER_getX(model)
weights = PEER_getW(model)
precision = PEER_getAlpha(model)
residuals = PEER_getResiduals(model)
PEER_setAdd_mean(model, TRUE)
write.table(residuals,"./peer_residuals.tsv",quote=F,row.names=F,sep="\t",col.names=F)
write.table(factors,"./peer_covariates.tsv",quote=F,row.names=F,sep="\t",col.names=F)
PEER_update(model) 这一行的代码会运行比较长的时间
整体是啥意思暂时还没看明白,代码能够跑通
还有一篇水稻的泛基因组论文也写到了这个方法
论文
A super pan-genomic landscape of rice
水稻200rice super-pangenome CR.pdf
方法部分的内容
Genes with a mean FPKM value larger than 0.1 were used in the downstream analysis and 23,736 genes met this condition. To obtain a normal distribution of expression values for each gene, FPKM values of each gene were further normalized using the quantile-quantile normalization (qqnorm) function in R (version 3.1.2). The top 20 hidden and confounding factors in the expression data, the normal quantile transformed expression values were inferred using the probabilistic estimation of expression residuals (PEER) method
image.png
欢迎大家关注我的公众号