Whole-genome resequencing of Cucurbita pepo morphotypes to discover genomic variants associated with morphology and horticulturally valuable traits
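Horticulture Research, August 2019. University of Thessaloniki, Greece. At this stage I'm mainly focusing on the methods for analysing genome resequencing data and on how to interpret the results.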
Figures are taken from the original paper.
Eight cultivars. To integrate putative intra-cultivar variation, young leaves were collected from five plants of each cultivar and pooled. Intra-cultivar variation was assessed with SSR primers.
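SNP counts were calculated for each cultivar individually and then for each subspecies separately ("number of SNPs were calculated for the eight accessions, as well as for each subspecies separately."). (I still don't fully follow the series of linkage-disequilibrium calculations.)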
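To make the two kinds of count concrete, here is a minimal shell/awk sketch of what "SNPs per accession" and "SNPs per subspecies" could mean on a multi-sample VCF such as the one built below. It assumes that a sample "has" a SNP when its genotype contains a non-reference allele, and that a subspecies count is the union over its cultivars; the paper may define both differently. The function names and the toy VCF are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: SNP counts per sample and per group of samples from a multi-sample VCF on stdin.
# Assumptions: GT is the first FORMAT field (as in GATK output), and a SNP is a record
# whose REF is one base and whose ALT alleles are all single bases.

# Print "<sample><TAB><count>": the number of SNP records in which that sample
# carries at least one non-reference allele (0/1, 1/1, 1/2, ...).
snps_per_sample() {
  awk -F'\t' '
    /^##/ { next }                                    # skip meta-information lines
    /^#CHROM/ { for (i = 10; i <= NF; i++) name[i] = $i; last = NF; next }
    length($4) == 1 && $5 ~ /^[ACGT](,[ACGT])*$/ {     # keep SNP records only
      for (i = 10; i <= last; i++) {
        split($i, f, ":")                             # f[1] is the GT field
        if (f[1] ~ /[1-9]/) n[i]++                    # some allele index other than 0
      }
    }
    END { for (i = 10; i <= last; i++) printf "%s\t%d\n", name[i], n[i] + 0 }
  '
}

# Print the number of SNP records in which at least one of the listed samples
# (for example the cultivars of one subspecies) carries a non-reference allele.
snps_in_group() {
  awk -F'\t' -v group="$*" '
    BEGIN { split(group, g, " "); for (k in g) want[g[k]] = 1 }
    /^##/ { next }
    /^#CHROM/ { for (i = 10; i <= NF; i++) if ($i in want) col[i] = 1; next }
    length($4) == 1 && $5 ~ /^[ACGT](,[ACGT])*$/ {
      for (i in col) { split($i, f, ":"); if (f[1] ~ /[1-9]/) { n++; break } }
    }
    END { print n + 0 }
  '
}

# Hypothetical toy VCF (three samples, four records) for trying the functions out.
toy_vcf() {
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tAcorn\tScallop\tZuchini\n'
  printf 'Cp4.1LG01\t100\t.\tA\tG\t50\t.\t.\tGT:AD\t0/1:5,5\t0/0:9,0\t0/0:8,0\n'
  printf 'Cp4.1LG01\t200\t.\tC\tT\t50\t.\t.\tGT:AD\t1/1:0,9\t./.:0,0\t1/1:0,8\n'
  printf 'Cp4.1LG01\t300\t.\tAT\tA\t50\t.\t.\tGT:AD\t0/1:4,4\t0/1:3,3\t0/1:2,2\n'
  printf 'Cp4.1LG01\t400\t.\tG\tC\t50\t.\t.\tGT:AD\t0/0:7,0\t0/0:6,0\t0/1:3,3\n'
}

toy_vcf | snps_per_sample               # Acorn 2, Scallop 0, Zuchini 2
toy_vcf | snps_in_group Acorn Scallop   # 2
```

On the real data this would be, for example, snps_per_sample < Nangua8cultivars.vcf, with snps_in_group called once per subspecies using whichever cultivars the paper assigns to it.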
Download links for the data used below can be found in the paper.
samtools faidx Cpepp_genome_v4.1.fa
samtools faidx Cpepp_genome_v4.1.fa Cp4.1LG01 > OneChrom_Cpepp_genome.fa
samtools faidx OneChrom_Cpepp_genome.fa
bwa index OneChrom_Cpepp_genome.fa
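samtools faidx writes a .fai index next to the FASTA whose first two columns are the sequence name and its length (the remaining three are the byte offset, bases per line and bytes per line). As a quick sanity check that the extract really contains just Cp4.1LG01, the same two columns can be recomputed with awk. A small sketch; fasta_lengths is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Print "<name><TAB><length>" for every sequence in a FASTA read on stdin:
# the same first two columns that samtools faidx writes to the .fai index.
fasta_lengths() {
  awk '
    /^>/ {
      if (name != "") printf "%s\t%d\n", name, len
      name = substr($1, 2)          # sequence name = header text up to the first whitespace
      len = 0
      next
    }
    { gsub(/[ \t\r]/, ""); len += length($0) }   # count bases, ignoring stray whitespace or CR
    END { if (name != "") printf "%s\t%d\n", name, len }
  '
}

# Hypothetical two-sequence FASTA to try it on.
printf '>Cp4.1LG01 toy\nACGTACGT\nACG\n>seq2\nGG\n' | fasta_lengths   # Cp4.1LG01 11, seq2 2
```

For example, diff <(fasta_lengths < OneChrom_Cpepp_genome.fa) <(cut -f1,2 OneChrom_Cpepp_genome.fa.fai) should print nothing if the two agree.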
for i in Acorn Cocozelle Crookneck Marrow Pumpkin Scallop Yellow_Zuchini Zuchini
do
bwa mem -t 4 -R '@RG\tID:foo\tPL:illumina\tSM:'${i} Reference/OneChrom_Cpepp_genome.fa ${i}/${i}_1.fastq ${i}/${i}_2.fastq | samtools v
done
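One open question: the read-group line passed to bwa mem with -R sets the sequencing platform (PL), but the original data were sequenced on a BGISEQ-500. Is it acceptable to set PL to illumina here?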
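Whatever PL value is chosen, GATK takes the sample name from the SM field of the read group, so each sample needs its own SM. A sketch that builds the -R string per sample, keeping the platform in one variable (PLATFORM, a hypothetical name defaulting to the illumina used above) and, as my own choice rather than the post's, reusing the sample name as the read-group ID instead of the shared ID:foo:

```shell
#!/usr/bin/env bash
# Hypothetical variable for the PL field; defaults to the value used in the post.
PLATFORM=${PLATFORM:-illumina}

# Print the read-group string for "bwa mem -R" for one sample.
# The \t are literal backslash-t sequences; bwa mem turns them into tabs
# when it writes the @RG header line.
read_group() {
  local sample=$1
  printf '@RG\\tID:%s\\tPL:%s\\tSM:%s' "$sample" "$PLATFORM" "$sample"
}

# Sketch of the alignment loop using it (the pipe continues as in the post):
# for i in Acorn Cocozelle Crookneck Marrow Pumpkin Scallop Yellow_Zuchini Zuchini; do
#   bwa mem -t 4 -R "$(read_group "$i")" Reference/OneChrom_Cpepp_genome.fa \
#     "$i/${i}_1.fastq" "$i/${i}_2.fastq" | ...
# done

read_group Acorn; echo   # @RG\tID:Acorn\tPL:illumina\tSM:Acorn (with PLATFORM unset)
```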
gatk CreateSequenceDictionary -R OneChrom_Cpepp_genome.fa -O OneChrom_Cpepp_genome.dict
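Before aligning and calling, it can help to confirm that all companion files sit next to the FASTA: the .fai from samtools faidx, the .dict from gatk CreateSequenceDictionary (named like the -O above, i.e. with the .fa extension replaced) and the five files bwa index writes (.amb, .ann, .bwt, .pac, .sa). A minimal sketch; check_reference is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Check that a reference FASTA has the companion files used in this workflow:
#   <ref>.fai                      from samtools faidx
#   <ref without extension>.dict   from gatk CreateSequenceDictionary
#   <ref>.amb .ann .bwt .pac .sa   from bwa index
# Prints each missing file to stderr and returns non-zero if any is absent.
check_reference() {
  local ref=$1 f missing=0
  local dict="${ref%.*}.dict"      # strip the last extension, e.g. .fa
  for f in "$ref" "$ref.fai" "$dict" "$ref".{amb,ann,bwt,pac,sa}; do
    [ -e "$f" ] || { echo "missing: $f" >&2; missing=1; }
  done
  return "$missing"
}
```

Usage: check_reference Reference/OneChrom_Cpepp_genome.fa && echo "reference ready".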
for i in Acorn Cocozelle Crookneck Marrow Pumpkin Scallop Yellow_Zuchini Zuchini
do
samtools sort -@ 4 -m 4G -O bam -o Output/${i}.sorted.bam Output/${i}.bam
echo 'sorted done'
gatk MarkDuplicates -I Output/${i}.sorted.bam -O Output/${i}.sorted.markdup.bam -M Output/${i}.sorted.markdup_metrics.txt
echo 'MarkDup done'
samtools index Output/${i}.sorted.markdup.bam
echo 'index done'
gatk HaplotypeCaller -R Reference/OneChrom_Cpepp_genome.fa --emit-ref-confidence GVCF -I Output/${i}.sorted.markdup.bam -O GVCFoutput/$
echo 'Haplo done'
done
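The eight sample names are typed out separately for each loop, so a typo in one list silently skips that sample in the later steps. A sketch that declares the list once as a bash array and checks that the sort/MarkDuplicates/index loop left a non-empty deduplicated BAM and its index for every sample. SAMPLES, expected_outputs and check_outputs are hypothetical names; the .bam.bai suffix is what samtools index writes by default. Only the BAM-level outputs are checked here.

```shell
#!/usr/bin/env bash
# Declare the eight accessions once and reuse the array in every loop,
# e.g. for i in "${SAMPLES[@]}"; do ...; done
SAMPLES=(Acorn Cocozelle Crookneck Marrow Pumpkin Scallop Yellow_Zuchini Zuchini)

# Print the files the sort/MarkDuplicates/index loop should leave behind for each sample
# (samtools index writes <bam>.bai by default).
expected_outputs() {
  local s
  for s in "${SAMPLES[@]}"; do
    printf '%s\n' "Output/${s}.sorted.markdup.bam" "Output/${s}.sorted.markdup.bam.bai"
  done
}

# Report every expected file that is missing or empty; return non-zero if there is any.
check_outputs() {
  local f missing=0
  while IFS= read -r f; do
    [ -s "$f" ] || { echo "missing or empty: $f" >&2; missing=1; }
  done < <(expected_outputs)
  return "$missing"
}
```

Both loops would then start with for i in "${SAMPLES[@]}", and running check_outputs afterwards lists any sample whose files did not appear.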
gatk CombineGVCFs -R ../Reference/OneChrom_Cpepp_genome.fa $(for i in *.vcf; do echo "--variant $i"; done) -O Combined.g.vc
gatk GenotypeGVCFs -R ../Reference/OneChrom_Cpepp_genome.fa -V Combined.g.vcf -O Nangua8cultivars.vcf
bcftools view -H Nangua8cultivars.vcf | wc -l
The result is 224755 variant records.
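bcftools view -H drops the header, so 224755 counts every variant record on Cp4.1LG01, SNPs and indels together, since GenotypeGVCFs emits both. bcftools can select by type (for example bcftools view -H -v snps Nangua8cultivars.vcf | wc -l). The awk sketch below splits the records by a simplified rule of my own, so its numbers may differ slightly from bcftools at multi-allelic sites; count_variant_types is a hypothetical name:

```shell
#!/usr/bin/env bash
# Count the records of a VCF read on stdin as snp, indel or other.
# Simplified rules (an assumption, not the exact bcftools logic):
#   other = some ALT allele is not plain bases (symbolic, "*", "."), or a same-length multi-base change
#   indel = otherwise, at least one ALT allele differs in length from REF
#   snp   = otherwise, REF and every ALT allele are single bases
count_variant_types() {
  awk -F'\t' '
    /^#/ { next }                                               # skip header lines
    {
      n = split($5, alt, ",")
      type = "snp"
      for (k = 1; k <= n; k++) {
        if (alt[k] !~ /^[ACGTN]+$/) { type = "other"; break }   # symbolic or "*" allele
        if (length(alt[k]) != length($4)) type = "indel"
      }
      if (type == "snp" && length($4) != 1) type = "other"      # same-length MNP
      count[type]++
    }
    END { printf "snp\t%d\nindel\t%d\nother\t%d\n", count["snp"], count["indel"], count["other"] }
  '
}
```

Usage: count_variant_types < Nangua8cultivars.vcf prints one line per class; the three counts add up to the bcftools view -H | wc -l total.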
These steps produce the VCF file used to compute the population-genomics statistics.
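To reduce the computational load, only the first 400,000 lines of the raw FASTQ data (i.e. 100,000 reads, since each FASTQ record spans four lines) and the first chromosome of the reference genome (Cp4.1LG01) were used.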
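The post does not show how the subset was taken; head is one straightforward option. The line count must be a multiple of 4 so that no record is cut in half, and taking the same number of lines from the _1 and _2 files keeps mates paired as long as both files list reads in the same order. A sketch with that guard; subset_fastq and the raw/ input directory are hypothetical:

```shell
#!/usr/bin/env bash
# Keep the first N lines of a FASTQ file (default 400000, i.e. 100000 reads).
# Refuses N values that would cut a four-line record in half.
subset_fastq() {
  local in=$1 out=$2 n=${3:-400000}
  if (( n % 4 != 0 )); then
    echo "line count $n is not a multiple of 4" >&2
    return 1
  fi
  head -n "$n" "$in" > "$out"
}

# Hypothetical usage: subset both mates of every sample with the same N.
# for i in Acorn Cocozelle Crookneck Marrow Pumpkin Scallop Yellow_Zuchini Zuchini; do
#   subset_fastq "raw/${i}_1.fastq" "${i}/${i}_1.fastq"
#   subset_fastq "raw/${i}_2.fastq" "${i}/${i}_2.fastq"
# done
```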
No quality control or filtering was applied to the raw data.
That's all for today!