GENESPACE: syntenic pan-genome annotations for eukaryotes
https://www.biorxiv.org/content/10.1101/2022.03.09.483468v1
The paper is a preprint and has not been formally published yet.
https://github.com/jtlovell/GENESPACE
https://htmlpreview.github.io/?https://github.com/jtlovell/GENESPACE/blob/master/doc/genespaceOverview.html
GENESPACE does not work on Windows yet; it can only be used on macOS or Linux. I will try it on Linux.
conda install -c bioconda orthofinder
https://github.com/wyp1125/MCScanX
git clone https://github.com/wyp1125/MCScanX.git
cd MCScanX
make
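The build reported three errors here, but three executables were still produced. I tried them and they run; I do not know yet whether the errors will cause problems later.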
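Because the build reported errors, a quick sanity check is to confirm that the expected binaries exist and are executable. The sketch below assumes the three executables are named MCScanX, MCScanX_h and duplicate_gene_classifier (adjust the names if your build differs); the function name check_mcscanx_bins and the MCSCANX_DIR variable are my own, not part of MCScanX.

```shell
# Report whether each expected MCScanX binary exists and is executable.
# Assumption: the build produces these three executables; edit the list if not.
check_mcscanx_bins() {
  dir="$1"
  rc=0
  for bin in MCScanX MCScanX_h duplicate_gene_classifier; do
    # -x alone is also true for directories, so exclude them explicitly
    if [ -x "$dir/$bin" ] && [ ! -d "$dir/$bin" ]; then
      echo "ok: $bin"
    else
      echo "missing: $bin"
      rc=1
    fi
  done
  return "$rc"
}

# MCSCANX_DIR is a hypothetical variable; it defaults to the cloned directory.
check_mcscanx_bins "${MCSCANX_DIR:-./MCScanX}" || echo "some binaries are missing"
```

This only checks that the files are present and executable. It does not tell you whether the failed targets matter for GENESPACE.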
conda install r-data.table r-dbscan r-R.utils r-devtools
conda install bioconductor-Biostrings bioconductor-rtracklayer
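The init_genespace() call further down refers to orthofinder and diamond by bare command name, so both need to be on the PATH of the shell that starts R. A minimal sketch for checking this before starting R; the helper name require_on_path is my own.

```shell
# Print, for each tool name given, whether it can be found on the PATH.
# Returns non-zero if at least one tool is missing.
require_on_path() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "not found: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# The two names used later as path2orthofinder and path2diamond.
require_on_path orthofinder diamond || echo "activate the conda environment or install the missing tools"
```

Run this in the same conda environment you will start R from, since a tool installed in a different environment will not be found.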
# Start R (here via radian)
devtools::install_github("jtlovell/GENESPACE", upgrade = FALSE)
library(GENESPACE)
runwd<-file.path("./testGenespace/")
make_exampleDataDir(writeDir = runwd) ## this step downloads the example data
gids<-c("human","chimp","rhesus")
gpar <- init_genespace(
  genomeIDs = gids, speciesIDs = gids, versionIDs = gids,
  ploidy = rep(1, 3), wd = runwd,
  gffString = "gff", pepString = "pep",
  path2orthofinder = "orthofinder",
  path2mcscanx = "/home/myan/scratch/apps/mingyan/Biotools/MCScanX",
  path2diamond = "diamond", diamondMode = "fast",
  orthofinderMethod = "fast",
  rawGenomeDir = file.path(runwd, "rawGenomes"))
parse_annotations(
  gsParam = gpar,
  gffEntryType = "gene", gffIdColumn = "locus", gffStripText = "locus=",
  headerEntryIndex = 1, headerSep = " ", headerStripText = "locus=")
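# I did not fully understand what the parse_annotations() call above does. Judging from the argument names, it seems to tell GENESPACE how to pull gene IDs out of the GFF attribute column and the peptide FASTA headers.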
gpar<-run_orthofinder(gsParam = gpar)
## Running this line produced a warning message
Warning message:
In system2(gsParam$paths$orthofinderCall, com, stdout = TRUE, stderr = TRUE) :
running command ''orthofinder' -b ./testGenespace//orthofinder -t 4 -a 1 -X -og 2>&1' had status 120 and error message 'Interrupted system call'
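## I do not know whether this affects later steps. One possible cause is the extra trailing slash in runwd<-file.path("./testGenespace/"). I ran it again and the problem did not recur.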
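On the trailing-slash suspicion: POSIX systems treat consecutive slashes in the middle of a path as a single slash, so the ./testGenespace//orthofinder spelling in the warning probably points to the same directory as ./testGenespace/orthofinder. The sketch below checks this on a throwaway directory; the directory layout is a stand-in I made up for illustration, not the real GENESPACE output.

```shell
# Hypothetical demo directory that stands in for ./testGenespace
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/testGenespace/orthofinder"

with_slash="$demo_dir/testGenespace/"   # trailing slash, as in the runwd above
double="${with_slash}/orthofinder"      # gives .../testGenespace//orthofinder
single="${with_slash%/}/orthofinder"    # strip the trailing slash first

# Resolve both spellings to their physical path
resolved_double="$(cd "$double" && pwd -P)"
resolved_single="$(cd "$single" && pwd -P)"

if [ "$resolved_double" = "$resolved_single" ]; then
  echo "same directory"
else
  echo "different directories"
fi

rm -rf "$demo_dir"
```

If both spellings resolve to the same place on your system, the warning more likely came from something else, such as the run being interrupted, which would fit with the rerun succeeding.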
gpar<-synteny(gsParam = gpar)
## Plot the results
pdf(file="abc.pdf",width = 10,height = 8)
plot_riparianHits(gpar)
dev.off()
pdf(file="abc.pdf",width = 9.6,height = 4)
plot_riparianHits(
  gpar, refGenome = "chimp",
  invertTheseChrs = data.frame(genome = "rhesus", chr = 2),
  genomeIDs = c("chimp", "human", "rhesus"),
  labelTheseGenomes = c("chimp", "rhesus"),
  gapProp = .001, refChrCols = c("#BC4F43", "#F67243"),
  blackBg = FALSE, returnSourceData = TRUE, verbose = FALSE)
dev.off()
regs <- data.frame(
  genome = c("human", "human", "chimp", "rhesus"),
  chr = c(3, 3, 4, 5),
  start = c(0, 50e6, 0, 60e6),
  end = c(10e6, 70e6, 50e6, 90e6),
  cols = c("pink", "gold", "cyan", "dodgerblue"))
pdf(file = "abc2.pdf",width = 9.6,height = 4)
plot_riparianHits(gpar, onlyTheseRegions = regs,blackBg = FALSE)
dev.off()
pg <- pangenome(gpar)
This writes an output file, results/human_pangenomeDB.txt.gz.
I opened the file and looked at the first part of it, but for now I have not worked out how to read the result.
The help documentation says:
This is the source data that can be manipulated programatically to extract your regions of interest. Future GENESPACE releases will have auxilary functions that let the user access the pan-genome by rules (e.g. contains these genes, in these regions etc.). For now, we’ll leave this work to scripting by the user.
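As a first step toward that scripting, it helps to list the column names and a few rows without unpacking the file. A minimal sketch, assuming the file is tab-delimited with a header line; the helper name peek_gz_table, the PG_FILE variable and the testGenespace/ prefix on the path are my own assumptions.

```shell
# Print the header fields (one per line) and the first few data rows of a
# gzip-compressed, tab-delimited text file.
# Usage: peek_gz_table FILE [N_ROWS]   (N_ROWS defaults to 5)
peek_gz_table() {
  file="$1"
  rows="${2:-5}"
  echo "columns:"
  gzip -dc "$file" | head -n 1 | tr '\t' '\n'
  echo "first $rows rows:"
  gzip -dc "$file" | tail -n +2 | head -n "$rows"
}

# Assumed location of the pan-genome file relative to where the run was started.
pg_file="${PG_FILE:-testGenespace/results/human_pangenomeDB.txt.gz}"
if [ -f "$pg_file" ]; then
  peek_gz_table "$pg_file" 5
fi
```

Once the column names are known, the same file can be filtered by column with awk or loaded into R for further work.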
The next step is to work out how to prepare my own data.