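Pipeline input | Test data: SRR14800265.fastq.gz, from SRX11133330 (Nanopore sequencing of SARS-CoV-2: V-22; 1 OXFORD_NANOPORE (MinION) run: 277,605 spots, 139.3M bases, 133.2Mb downloads). Convert the run to a fastq.gz file with NCBI's official sra-toolkit: fastq-dump SRR14800265 --gzip produces SRR14800265.fastq.gz. Reference files (default path /opt/ref): the artic-ncov2019 primer schemes and reference sequence. The workflow file (one-click import into sliverworkspace), the report file and the conda environment files are available for download and import. |
---|---|
Runtime environment | Docker image based on Ubuntu 21.04, with Conda and Mamba (Tsinghua mirror by default) and ssh. The commands for pulling the image are given below. |
Analysis software | artic=1.2.1, artic-network::rampart=1.2.0, snakemake-minimal=5.8.1, pangolin=4.1.3 |
Output | Consensus SARS-CoV-2 sequence: SRR14800265.consensus.fasta. Lineage assignment produced by Pangolin from the consensus sequence: lineage_report.csv. Variants called from the primer-trimmed BAM and then filtered: SRR14800265.pass.vcf.gz |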
Note: Linux is the recommended host operating system for Docker. On Windows and macOS some Docker features (networking in particular) may not work properly.
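
Before running anything else, it can help to confirm that the download is complete. The run metadata above lists 277,605 spots, so the number of FASTQ records in SRR14800265.fastq.gz should match that figure. The helper below is only a sketch: the name count_fastq_reads is hypothetical, and it assumes plain four-line FASTQ records.

```bash
# Count the records in a gzipped FASTQ file.
# Assumption: every record is exactly four lines (no wrapped sequences).
count_fastq_reads() {
    gzip -dc "$1" | awk 'END { print int(NR / 4) }'
}

# Usage (after fastq-dump SRR14800265 --gzip):
#   count_fastq_reads SRR14800265.fastq.gz
```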
# Pull the Docker image
docker pull doujiangbaozi/sliverworkspace:latest
# List the local Docker images
docker images
version: "3"
services:
  SarsCov2:
    image: doujiangbaozi/sliverworkspace:latest
    container_name: SarsCov2
    volumes:
      - /media/sliver/Data/data:/opt/data:rw                            # input data (under the artic subdirectory)
      - /media/sliver/Manufacture/SC2/envs:/root/mambaforge-pypy3/envs:rw  # conda environments directory
      - /media/sliver/Manufacture/SC2/config:/opt/config:rw             # config directory (conda environment files)
      - /media/sliver/Manufacture/SC2/ref:/opt/ref:rw                   # reference directory
      - /media/sliver/Manufacture/SC2/result:/opt/result:rw             # intermediate files and output results
    ports:
      - "9024:9024"                                                     # ssh port, change as needed
    environment:
      - TZ=Asia/Shanghai                                                # time zone
      - PS=20191124                                                     # overrides the default ssh password
      - PT=9024                                                         # overrides the default ssh port
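Starting the base environment
# Run in the directory that contains docker-compose.yml
docker-compose up -d
# Or point to the file explicitly (-f is a global option and must come before the subcommand)
docker-compose -f /path/to/docker-compose.yaml up -d
# Check that the container is running (run in the directory that contains docker-compose.yaml)
docker-compose ps
# Or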
docker ps
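
To script this check rather than read the listing by eye, the container name can be matched against the output of docker ps. A minimal sketch, assuming the container_name SarsCov2 from the compose file; the helper name container_listed is hypothetical, and it reads names on stdin so it can be fed from docker ps --format '{{.Names}}'.

```bash
# Succeeds if the exact container name $1 appears on stdin (one name per line).
container_listed() {
    grep -Fxq -- "$1"
}

# Usage:
#   docker ps --format '{{.Names}}' | container_listed SarsCov2 && echo "SarsCov2 is up"
```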
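Using the Docker container is much like logging in to a remote server
# Log in to the container over ssh; it can be deployed locally or remotely
ssh root@192.168.6.6 -p9024
# A prompt like the following indicates a successful login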
(base) root@SliverWorkstation:~#
# Sample ID
export sn=SRR14800265
# Input data directory
export data=/opt/data
# Output and intermediate file directory
export result=/opt/result
# Directory that holds the conda environments
export envs=/root/mambaforge-pypy3/envs
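# artic primer scheme version: V1, V2, V3, V4 or V4.1 (must match the directory name under primer_schemes/nCoV-2019)
export artic_primer_version=V4.1
# Number of threads to use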
export threads=8
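
The scheme directories in the artic-ncov2019 repository are named V1, V2, V3, V4 and V4.1, and every path later in this workflow is built from artic_primer_version, so a value without the leading V would point at a directory that does not exist. The sketch below normalises the value; the function name scheme_version_dir is hypothetical, and the list of accepted versions is taken from the comment above.

```bash
# Print the scheme directory name for a primer version such as "4.1" or "V4.1".
# Returns non-zero for versions that are not in the list used in this article.
scheme_version_dir() {
    case "$1" in
        V*) v="$1" ;;
        *)  v="V$1" ;;
    esac
    case "$v" in
        V1|V2|V3|V4|V4.1) printf '%s\n' "$v" ;;
        *) echo "unsupported primer version: $1" >&2; return 1 ;;
    esac
}

# Usage:
#   export artic_primer_version=$(scheme_version_dir 4.1)
```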
# On the first run, download artic-ncov2019
if [ ! -d "/opt/ref/artic-ncov2019" ]; then
apt-get install -y git
git clone https://github.com/artic-network/artic-ncov2019.git "/opt/ref/artic-ncov2019"
fi
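# Check whether the conda environment exists; on the first run, create it and install the software
if [ ! -d "${envs}/artic-ncov2019" ]; then
mamba env create -f /opt/ref/artic-ncov2019/environment.yml
# Install muscle into the new environment, not into the currently active (base) one
mamba install -y -n artic-ncov2019 muscle=3.8
fi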
source activate artic-ncov2019
# Create the output directories first so that the copy has somewhere to go
mkdir -p ${result}/${sn}/clean
cp -f ${data}/artic/${sn}.fastq.gz ${result}/${sn}/
artic guppyplex --min-length 400 --max-length 700 \
--directory ${result}/${sn}/ \
--output ${result}/${sn}/clean/${sn}.clean.fastq
conda deactivate
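
The guppyplex call above keeps only reads between 400 and 700 bases long, which removes fragments and chimeric reads that do not correspond to a single amplicon. To make the effect concrete, here is a rough awk approximation of the length filter. It is not artic's implementation: the function name filter_fastq_by_length is hypothetical, the bounds are treated as inclusive by assumption, and records are assumed to be four lines each.

```bash
# Keep FASTQ records whose sequence length is between $1 and $2 (inclusive).
# Reads uncompressed FASTQ on stdin and writes the kept records to stdout.
filter_fastq_by_length() {
    awk -v min="$1" -v max="$2" '
        NR % 4 == 1 { header = $0 }
        NR % 4 == 2 { seq = $0 }
        NR % 4 == 3 { plus = $0 }
        NR % 4 == 0 {
            n = length(seq)
            if (n >= min && n <= max) print header "\n" seq "\n" plus "\n" $0
        }'
}

# Usage:
#   gzip -dc SRR14800265.fastq.gz | filter_fastq_by_length 400 700 > filtered.fastq
```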
source activate artic-ncov2019
if [ ! -f /opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/nCoV-2019.reference.fasta ]; then
cp -f /opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/SARS-CoV-2.reference.fasta \
/opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/nCoV-2019.reference.fasta
fi
cd ${result}/${sn}
mkdir -p ${result}/${sn}/aligned
minimap2 -a -x map-ont -t ${threads} \
/opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/nCoV-2019.reference.fasta \
${result}/${sn}/clean/${sn}.clean.fastq | samtools view -bS -F 4 - | samtools sort -o ${result}/${sn}/aligned/${sn}.sorted.bam -
samtools index ${result}/${sn}/aligned/${sn}.sorted.bam
conda deactivate
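
Each step reads the file written by the previous one, so a step that fails quietly tends to show up later as a confusing error. A small guard between steps can catch this early. The sketch below uses a hypothetical helper name, require_file, and simply checks that a file exists and is not empty.

```bash
# Fail with a message if the file $1 is missing or has zero size.
require_file() {
    if [ ! -s "$1" ]; then
        echo "missing or empty: $1" >&2
        return 1
    fi
}

# Usage:
#   require_file "${result}/${sn}/aligned/${sn}.sorted.bam" || exit 1
```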
source activate artic-ncov2019
cd ${result}/${sn}
if [ ! -f /opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/nCoV-2019.scheme.bed ]; then
cp -r /opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/SARS-CoV-2.scheme.bed \
/opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/nCoV-2019.scheme.bed
fi
align_trim --normalise 200 /opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/nCoV-2019.scheme.bed \
--start --remove-incorrect-pairs --report ${result}/${sn}/aligned/${sn}.alignreport.txt < ${result}/${sn}/aligned/${sn}.sorted.bam \
2> ${result}/${sn}/aligned/${sn}.alignreport.er | \
samtools sort -T ${result}/${sn}/aligned/temp - -o ${result}/${sn}/aligned/${sn}.trimmed.rg.sorted.bam
align_trim --normalise 200 /opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/nCoV-2019.scheme.bed \
--remove-incorrect-pairs --report ${result}/${sn}/aligned/${sn}.alignreport.txt < ${result}/${sn}/aligned/${sn}.sorted.bam \
2> ${result}/${sn}/aligned/${sn}.alignreport.er| \
samtools sort -T ${result}/${sn}/aligned/temp - -o ${result}/${sn}/aligned/${sn}.primertrimmed.rg.sorted.bam
samtools index ${result}/${sn}/aligned/${sn}.trimmed.rg.sorted.bam
samtools index ${result}/${sn}/aligned/${sn}.primertrimmed.rg.sorted.bam
samtools coverage ${result}/${sn}/aligned/${sn}.primertrimmed.rg.sorted.bam -o ${result}/${sn}/aligned/${sn}.samcov.tsv
conda deactivate
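
samtools coverage writes a tab-separated table whose header line starts with #rname; the sixth column is coverage (percentage of reference bases covered) and the seventh is meandepth. A short sketch for pulling those two values out of the .samcov.tsv file follows; the function name summarise_samcov is hypothetical.

```bash
# Print reference name, percent coverage and mean depth from samtools coverage output.
summarise_samcov() {
    awk -F '\t' '
        /^#/ { next }    # skip the header line
        { printf "%s\tcoverage=%s%%\tmeandepth=%s\n", $1, $6, $7 }' "$1"
}

# Usage:
#   summarise_samcov "${result}/${sn}/aligned/${sn}.samcov.tsv"
```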
source activate artic-ncov2019
if [ ! -f /opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/nCoV-2019.reference.fasta ]; then
cp -f /opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/SARS-CoV-2.reference.fasta \
/opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/nCoV-2019.reference.fasta
fi
mkdir -p ${result}/${sn}/vcf
if [ -f ${result}/${sn}/vcf/${sn}.nCoV-2019_1.hdf ];then
rm -f ${result}/${sn}/vcf/${sn}.nCoV-2019_1.hdf
fi
if [ -f ${result}/${sn}/vcf/${sn}.nCoV-2019_2.hdf ];then
rm -f ${result}/${sn}/vcf/${sn}.nCoV-2019_2.hdf
fi
medaka consensus --model r941_min_high_g351 \
--threads ${threads} --chunk_len 800 --chunk_ovlp 400 \
--RG 1 ${result}/${sn}/aligned/${sn}.trimmed.rg.sorted.bam \
${result}/${sn}/vcf/${sn}.nCoV-2019_1.hdf
medaka variant /opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/nCoV-2019.reference.fasta \
${result}/${sn}/vcf/${sn}.nCoV-2019_1.hdf ${result}/${sn}/vcf/${sn}.nCoV-2019_1.vcf
medaka consensus --model r941_min_high_g351 \
--threads ${threads} --chunk_len 800 --chunk_ovlp 400 \
--RG 2 ${result}/${sn}/aligned/${sn}.trimmed.rg.sorted.bam \
${result}/${sn}/vcf/${sn}.nCoV-2019_2.hdf
medaka variant /opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/nCoV-2019.reference.fasta \
${result}/${sn}/vcf/${sn}.nCoV-2019_2.hdf ${result}/${sn}/vcf/${sn}.nCoV-2019_2.vcf
artic_vcf_merge ${result}/${sn}/vcf/${sn} /opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/nCoV-2019.scheme.bed \
2> ${result}/${sn}/vcf/${sn}.primersitereport.txt \
nCoV-2019_1:${result}/${sn}/vcf/${sn}.nCoV-2019_1.vcf nCoV-2019_2:${result}/${sn}/vcf/${sn}.nCoV-2019_2.vcf
bgzip -f ${result}/${sn}/vcf/${sn}.merged.vcf
tabix -f -p vcf ${result}/${sn}/vcf/${sn}.merged.vcf.gz
conda deactivate
source activate artic-ncov2019
longshot -P 0 -F -A --no_haps \
--bam ${result}/${sn}/aligned/${sn}.primertrimmed.rg.sorted.bam \
--ref /opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/nCoV-2019.reference.fasta \
--out ${result}/${sn}/vcf/${sn}.longshoted.vcf \
--potential_variants ${result}/${sn}/vcf/${sn}.merged.vcf.gz
conda deactivate
source activate artic-ncov2019
artic_vcf_filter --medaka ${result}/${sn}/vcf/${sn}.longshoted.vcf \
${result}/${sn}/vcf/${sn}.pass.vcf \
${result}/${sn}/vcf/${sn}.fail.vcf
bgzip -f ${result}/${sn}/vcf/${sn}.pass.vcf
tabix -p vcf ${result}/${sn}/vcf/${sn}.pass.vcf.gz
conda deactivate
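
After filtering, a quick way to see how many variants passed and how many were rejected is to count the non-header lines in each VCF. This is a minimal sketch with a hypothetical function name, count_vcf_records; it relies on bgzip output being readable by gzip -dc.

```bash
# Count variant records (non-header, non-empty lines) in a plain or compressed VCF.
count_vcf_records() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | awk '!/^#/ && NF { n++ } END { print n + 0 }'
}

# Usage:
#   count_vcf_records "${result}/${sn}/vcf/${sn}.pass.vcf.gz"
#   count_vcf_records "${result}/${sn}/vcf/${sn}.fail.vcf"
```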
source activate artic-ncov2019
artic_make_depth_mask --store-rg-depths \
/opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/nCoV-2019.reference.fasta \
${result}/${sn}/aligned/${sn}.trimmed.rg.sorted.bam \
${result}/${sn}/${sn}.coverage_mask.txt
artic_mask /opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/nCoV-2019.reference.fasta \
${result}/${sn}/${sn}.coverage_mask.txt \
${result}/${sn}/vcf/${sn}.fail.vcf \
${result}/${sn}/${sn}.preconsensus.fasta
conda deactivate
source activate artic-ncov2019
bcftools consensus \
-f ${result}/${sn}/${sn}.preconsensus.fasta ${result}/${sn}/vcf/${sn}.pass.vcf.gz \
-m ${result}/${sn}/${sn}.coverage_mask.txt \
-o ${result}/${sn}/${sn}.consensus.fasta
artic_fasta_header ${result}/${sn}/${sn}.consensus.fasta "${sn}"
conda deactivate
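
Positions listed in the coverage mask end up as N in the consensus, so the fraction of N bases is a simple indicator of how complete the assembled genome is. The sketch below computes it with awk; the function name consensus_n_fraction is hypothetical, and no particular threshold is implied.

```bash
# Print the fraction of N bases (four decimal places) across all sequences in a FASTA file.
consensus_n_fraction() {
    awk '
        /^>/ { next }    # skip header lines
        {
            seq = toupper($0)
            total += length(seq)
            n += gsub(/N/, "", seq)    # gsub returns the number of N characters replaced
        }
        END {
            if (total == 0) print "0.0000"
            else printf "%.4f\n", n / total
        }' "$1"
}

# Usage:
#   consensus_n_fraction "${result}/${sn}/${sn}.consensus.fasta"
```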
# Check whether the conda environment exists; on the first run, create it and install the software
if [ ! -d "${envs}/pangolin" ]; then
mamba env create -f /opt/config/pangolin.yaml
fi
source activate pangolin
pangolin ${result}/${sn}/${sn}.consensus.fasta --outdir ${result}/${sn}
conda deactivate
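
Pangolin writes lineage_report.csv into the output directory; its first column is taxon and the assigned lineage is in the column named lineage. A sketch for printing just those two fields is shown below. The function name lineage_from_report is hypothetical, and the column is located by its header name rather than by position.

```bash
# Print "taxon<TAB>lineage" for each row of a pangolin lineage_report.csv.
lineage_from_report() {
    awk -F ',' '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == "lineage") col = i
            next
        }
        col { print $1 "\t" $col }' "$1"
}

# Usage:
#   lineage_from_report "${result}/${sn}/lineage_report.csv"
```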
source activate artic-ncov2019
cat ${result}/${sn}/${sn}.consensus.fasta \
/opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/nCoV-2019.reference.fasta \
> ${result}/${sn}/${sn}.muscle.in.fasta
muscle -in ${result}/${sn}/${sn}.muscle.in.fasta -out ${result}/${sn}/${sn}.muscle.out.fasta
conda deactivate
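
The muscle output is an aligned FASTA with two sequences, the consensus and the reference. One simple way to summarise it is to count the alignment columns where the two differ, ignoring columns where either sequence has an N. The sketch below does this with awk; the function name count_alignment_differences is hypothetical, and gaps are counted as differences by assumption.

```bash
# Count differing columns between the two sequences of an aligned FASTA file.
# Columns where either sequence is N are skipped; gap characters count as differences.
count_alignment_differences() {
    awk '
        /^>/ { idx++; next }
        { seq[idx] = seq[idx] toupper($0) }    # join wrapped sequence lines
        END {
            if (idx != 2 || length(seq[1]) != length(seq[2])) {
                print "expected two aligned sequences of equal length" > "/dev/stderr"
                exit 1
            }
            for (i = 1; i <= length(seq[1]); i++) {
                a = substr(seq[1], i, 1)
                b = substr(seq[2], i, 1)
                if (a != b && a != "N" && b != "N") diff++
            }
            print diff + 0
        }' "$1"
}

# Usage:
#   count_alignment_differences "${result}/${sn}/${sn}.muscle.out.fasta"
```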
https://github.com/artic-network/artic-ncov2019
# Analysis procedure recommended by the official project
# Read filtering
artic guppyplex --min-length 400 --max-length 700 \
--directory /opt/result/SRR14800265/ \
--output /opt/result/SRR14800265/clean/SRR14800265.clean.fastq
# Generate the consensus sequence and the variant calls
artic minion --normalise 200 --threads 32 \
--medaka --medaka-model r941_min_high_g351 \
--scheme-directory /opt/ref/artic-ncov2019/primer_schemes \
--read-file /opt/result/SRR14800265/clean/SRR14800265.clean.fastq nCoV-2019/V4.1 SRR14800265
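Original-content statement: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.
In case of infringement, please contact cloudcommunity@tencent.com for removal.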