A chickpea genetic variation map based on the sequencing of 3,366 genomes
s41586-021-04066-1.pdf
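This is the chickpea genome resequencing paper, which also covers a pan-genome analysis. A lot of people in my WeChat Moments have been sharing it recently, so I tracked down the original to read. Figure 1a is a plot that nearly every pan-genome paper includes, and since the paper provides the source data behind the figure, let's try to reproduce it from that data.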
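library(readxl)
# Source data for Figure 1a, from the paper's supplementary files
df <- read_excel("41586_2021_4066_MOESM13_ESM.xlsx")
head(df)
# How many rows belong to each repeat
table(df$Repeat)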
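One question here: why are there repeats at all? If, say, 10 individuals are sequenced, shouldn't the final data contain exactly 10 rows? I still need to read the paper more carefully.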
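One plausible explanation, common in pan-genome saturation analyses, is that the individuals are added in a random order and the whole procedure is repeated several times, so each sample size gets one value per repeat. Whether the paper did exactly this still has to be checked against its methods. The sketch below illustrates the idea on a made-up presence/absence matrix; `presence`, `one_repeat` and all the sizes are hypothetical and have nothing to do with the real data.

```r
set.seed(1)

# Hypothetical toy data: 200 genes x 10 individuals, TRUE = gene present
n_genes <- 200
n_ind <- 10
presence <- matrix(runif(n_genes * n_ind) < 0.8,
                   nrow = n_genes, ncol = n_ind)

# For one random ordering, count core and pan genes as individuals are added
one_repeat <- function(rep_id) {
  ord <- sample(n_ind)
  do.call(rbind, lapply(seq_len(n_ind), function(k) {
    sub <- presence[, ord[seq_len(k)], drop = FALSE]
    n_present <- rowSums(sub)
    data.frame(
      individuals = k,
      Repeat = rep_id,
      core = sum(n_present == k),  # present in all k individuals
      pan = sum(n_present > 0)     # present in at least one individual
    )
  }))
}

# Five repeats, each with its own random ordering
sim <- do.call(rbind, lapply(1:5, one_repeat))
table(sim$individuals)
```

Every sample size then appears once per repeat, and the spread across repeats is what a confidence ribbon summarizes.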
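The figure in the paper uses a broken y-axis, which is not easy to do in ggplot2 itself. Y叔 has released the R package ggbreak for exactly this purpose, but this post does not try ggbreak for now. Instead, the axis break is imitated by stitching two plots together, and the finer details are polished in other software after the plot is exported.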
library(ggplot2)
# mean_cl_boot needs the Hmisc package to be installed
ggplot() +
  stat_summary(data = df,
               aes(x = `Number of individuals`,
                   y = `Dispensable-genome`),
               geom = "ribbon",
               fun.data = "mean_cl_boot",
               fun.args = list(conf.int = 0.99))
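To make clear what the ribbon shows: `mean_cl_boot` is ggplot2's wrapper around `Hmisc::smean.cl.boot`, which returns the mean together with bootstrap confidence limits for it. The following base-R sketch approximates that computation for the values at a single x position. `boot_mean_ci` is a hypothetical helper written for illustration, not the Hmisc implementation, and the numbers in `x` are made up.

```r
set.seed(1)

# Percentile bootstrap interval for the mean of x (illustrative helper)
boot_mean_ci <- function(x, conf.int = 0.99, B = 1000) {
  # Resample x with replacement B times and take the mean each time
  boot_means <- replicate(B, mean(sample(x, replace = TRUE)))
  alpha <- (1 - conf.int) / 2
  limits <- quantile(boot_means, c(alpha, 1 - alpha), names = FALSE)
  data.frame(y = mean(x), ymin = limits[1], ymax = limits[2])
}

# Hypothetical values standing in for the repeats at one sample size
x <- c(1510, 1498, 1523, 1505, 1517, 1490, 1512, 1501)
ci <- boot_mean_ci(x)
ci
```

`stat_summary()` applies a function of this kind to each x value, and `geom = "ribbon"` draws the band between `ymin` and `ymax`.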
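Next, set the fill color and build a small data set that stands in for the legend, placed in the lower right corner.
Because the original data set is too large, I only use part of it for plotting.
# Keep every 10th row (rows 10, 20, ..., 22580)
df1 <- df[(1:2258) * 10, ]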
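The subsetting index above keeps every tenth row rather than the first block of rows, because in R the `:` operator binds more tightly than `*`. A small check with made-up numbers:

```r
# ":" is evaluated before "*", so this is (1:5) * 10
idx <- 1:5 * 10
idx

# The two spellings give the same index
identical(idx, (1:5) * 10)

# With explicit parentheses around the product the meaning changes:
# this is the sequence 1, 2, ..., 50
length(1:(5 * 10))
```

Writing the parentheses explicitly makes the intent easier to see.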
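library(ggnewscale)
# Legend data. The definition of df.legend is not shown in the post, so the
# positions and labels below are assumptions; adjust them to your data.
df.legend <- data.frame(
  x = 0.75 * max(df1$`Number of individuals`),
  y = max(df1$`Dispensable-genome`) * c(0.30, 0.20, 0.10),
  label = c("Pan-genome", "Dispensable-genome", "Core-genome")
)
# Lower panel: dispensable-genome ribbon plus the hand-made legend
p1 <- ggplot() +
  stat_summary(data = df1,
               aes(x = `Number of individuals`,
                   y = `Dispensable-genome`),
               geom = "ribbon",
               fill = "#20a1ac",
               fun.data = "mean_cl_boot",
               fun.args = list(conf.int = 0.99)) +
  new_scale_fill() +
  geom_text(data = df.legend,
            aes(x = x, y = y, label = label),
            hjust = 0) +
  # Colored squares drawn just to the left of each label
  geom_point(data = df.legend,
             aes(x = x - 100, y = y, color = label),
             shape = 15,
             size = 4,
             show.legend = FALSE) +
  # Assumed mapping of the three colors to the three labels
  scale_color_manual(values = c("Pan-genome" = "#f0dc19",
                                "Dispensable-genome" = "#20a1ac",
                                "Core-genome" = "#cd3322")) +
  theme_minimal() +
  theme(panel.grid = element_blank(),
        axis.line = element_line())
p1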
# Upper panel: core-genome and pan-genome ribbons; the x-axis text and title
# are hidden because this panel sits on top of the dispensable-genome panel
p2 <- ggplot() +
  stat_summary(data = df1,
               aes(x = `Number of individuals`,
                   y = `Core-genome`),
               geom = "ribbon",
               fill = "#cd3322",  # assumed: the red from the legend palette
               fun.data = "mean_cl_boot",
               fun.args = list(conf.int = 0.99)) +
  stat_summary(data = df1,
               aes(x = `Number of individuals`,
                   y = `Pan-genome`),
               geom = "ribbon",
               fill = "#f0dc19",
               fun.data = "mean_cl_boot",
               fun.args = list(conf.int = 0.99)) +
  theme_minimal() +
  theme(panel.grid = element_blank(),
        axis.line.y = element_line(),
        axis.text.x = element_blank(),
        axis.title.x = element_blank(),
        axis.ticks.y = element_line())
p2
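Stitching the plots together
library(aplot)
# p1 is the dispensable-genome panel (bottom), p2 the core/pan-genome panel (top)
pdf(file = "p2.pdf",
    width = 6,
    height = 6,
    family = "serif")
print(insert_top(p1, p2))
dev.off()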
After exporting the figure, I edited the details by hand.