前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >01.GATK人种系变异最佳实践SnakeMake流程:WorkFlow简介

01.GATK人种系变异最佳实践SnakeMake流程:WorkFlow简介

原创
作者头像
生信探索
发布2023-05-26 10:30:00
3860
发布2023-05-26 10:30:00
举报
文章被收录于专栏:生信探索生信探索

学习的第一个GATK找变异流程,人的种系变异的短序列变异,包括SNP和INDEL。写了一个SnakeMake分析流程,从fastq文件到最后的vep注释后的VCF文件,关于VCF的介绍可以参考上一篇推文基因序列变异信息VCF (Variant Call Format)

流程代码在https://jihulab.com/BioQuest/smkhgshttps://github.com/BioQuestX/smkhgs

README

GATK best practices workflow Pipeline summary

SnakeMake workflow for Human Germline short variants (SNP+INDEL)

Reference

  1. Reference genome related files and GTAK budnle files (GATK)
  2. VEP Variarition annotation files (VEP)

Prepare

  1. Adapter trimming (Fastp)
  2. Aligner (BWA mem2)
  3. Mark duplicates (samblaster)
  4. Generates recalibration table for Base Quality Score Recalibration (BaseRecalibrator)
  5. Apply base quality score recalibration (ApplyBQSR)

Quality control report

  1. Fastp report (MultiQC)
  2. Alignment report (MultiQC)

Call

  1. Call germline SNPs and indels via local re-assembly of haplotypes (HaplotypeCaller)
  2. Import VCFs to GenomicsDB (GenomicsDBImport)
  3. Perform joint genotyping on one or more samples pre-called with HaplotypeCaller (GenotypeGVCFs)

Filter

  1. Select a SNP or INDEL of variants from a VCF file (SelectVariants)
  2. Build a recalibration model to score variant quality for filtering purposes (VariantRecalibrator)
  3. Apply a score cutoff to filter variants based on a recalibration table (ApplyVQSR)
  4. Merge all the VCF files (Picard)

Annotation

Annotate variant calls with VEP (VEP)

SnakeMake Report

Outputs

代码语言:text
复制
.
├── config
│   ├── captured_regions.bed
│   ├── config.yaml
│   └── samples.tsv
├── dag.svg
├── logs
│   ├── annotate
│   ├── call
│   ├── filter
│   ├── prepare
│   ├── qc
│   ├── ref
│   └── trim
├── raw
│   ├── SRR24443168.fastq.gz
│   └── SRR24443169.fastq.gz
├── README.md
├── report
│   ├── fastp_multiqc_data
│   ├── fastp_multiqc.html
│   ├── prepare_multiqc_data
│   ├── prepare_multiqc.html
│   └── vep_report.html
├── results
│   ├── called
│   ├── filtered
│   ├── prepared
│   ├── trimmed
│   └── vep_annotated.vcf.gz
├── workflow
│   ├── envs
│   ├── report
│   ├── rules
│   ├── schemas
│   ├── scripts
│   └── Snakefile

Directed Acyclic Graph

Reference

代码语言:Python
复制
GATK best practices workflow: https://gatk.broadinstitute.org/hc/en-us/sections/360007226651-Best-Practices-Workflows
GATK: https://software.broadinstitute.org/gatk/
VEP: https://www.ensembl.org/info/docs/tools/vep/index.html
fastp: https://github.com/OpenGene/fastp
BWA mem2: http://bio-bwa.sourceforge.net/
samblaster: https://github.com/GregoryFaust/samblaster
BaseRecalibrator: https://gatk.broadinstitute.org/hc/en-us/articles/13832708374939-BaseRecalibrator
ApplyBQSR: https://github.com/GregoryFaust/samblaster
HaplotypeCaller: https://gatk.broadinstitute.org/hc/en-us/articles/13832687299739-HaplotypeCaller
GenomicsDBImport: https://gatk.broadinstitute.org/hc/en-us/articles/13832686645787-GenomicsDBImport
GenotypeGVCFs: https://gatk.broadinstitute.org/hc/en-us/articles/13832766863259-GenotypeGVCFs
SelectVariants: https://gatk.broadinstitute.org/hc/en-us/articles/13832694334235-SelectVariants
VariantRecalibrator: https://gatk.broadinstitute.org/hc/en-us/articles/13832694334235-VariantRecalibrator
ApplyVQSR: https://gatk.broadinstitute.org/hc/en-us/articles/13832694334235-ApplyVQSR
Picard: https://broadinstitute.github.io/picard
MultiQC: https://multiqc.info

原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。

如有侵权,请联系 cloudcommunity@tencent.com 删除。

原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。

如有侵权,请联系 cloudcommunity@tencent.com 删除。

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • README
    • Reference
      • Prepare
        • Quality control report
          • Call
            • Filter
              • Annotation
              • SnakeMake Report
              • Outputs
              • Directed Acyclic Graph
              • Reference
              领券
              问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档