
This may be the simplest SCI paper I have ever seen

生信技能树
Published 2018-07-27 09:48:24
Collected in the column: 生信技能树

What to do with multi-batch WES data

Batches are often unavoidable. For example, the paper Biomed Res Int. 2014, doi: 10.1155/2014/319534, notes:

In large WES studies, some samples are occasionally sequenced twice or even more times due to a variety of reasons, for example, insufficient coverage in the first experiment, sample duplication, and the rest. It is challenging how to best utilize these duplicated exomes for SNP discovery and genotype calling, especially with batch effects taken into consideration.

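The authors happened to have exactly this kind of data: exome data for 92 subjects (51 cases and 41 controls) from the Shanghai Breast Cancer Study (SBCS). The library preparation strategy was QIAmp DNA kit + Illumina TruSeq. The resulting FASTQ data went through the standard GATK pipeline, yielding 184 BAM files, that is, two per subject.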

Three strategies can be compared:

  • M strategy (merging duplicates into one)
  • group H consisting of the higher sequencing depth for each subject
  • group L consisting of the lower depth for each subject

SNP calling was also done with GATK. The subsequent SNP filtering criteria were:

  • (1) ≥ 3 SNPs detected within 10 bp distance;
  • (2) > 10% alignments mapped ambiguously;
  • (3) SNPs having a quality score < 50;
  • (4) variant confidence/quality by depth < 1.5;
  • (5) strand bias score calculated by GATK > −1
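To make the five criteria concrete, here is a minimal Python sketch that treats each one as an exclusion rule. That reading is an assumption (the list reads like "drop a SNP if any of these holds"), as are the `Snp` field names and the choice to count "within 10 bp" as a position span of at most 10. The authors presumably did this step inside GATK, not with code like this.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Snp:
    # Hypothetical record layout, used only for this illustration.
    chrom: str
    pos: int
    qual: float            # SNP quality score
    qd: float              # variant confidence / quality by depth
    sb: float              # GATK strand bias score
    ambiguous_frac: float  # fraction of alignments mapped ambiguously


def clustered_indices(snps: List[Snp], window: int = 10, min_snps: int = 3) -> Set[int]:
    """Indices of SNPs that sit in a run of >= min_snps SNPs spanning <= window bp."""
    flagged: Set[int] = set()
    by_chrom = {}
    for idx, snp in enumerate(snps):
        by_chrom.setdefault(snp.chrom, []).append(idx)
    for idxs in by_chrom.values():
        idxs.sort(key=lambda i: snps[i].pos)
        # Any cluster of >= min_snps SNPs contains consecutive groups of
        # exactly min_snps SNPs, so checking those groups is enough.
        for start in range(len(idxs) - min_snps + 1):
            group = idxs[start:start + min_snps]
            if snps[group[-1]].pos - snps[group[0]].pos <= window:
                flagged.update(group)
    return flagged


def apply_filters(snps: List[Snp]) -> List[Snp]:
    """Keep only the SNPs that trigger none of the five criteria."""
    clustered = clustered_indices(snps)
    kept = []
    for i, snp in enumerate(snps):
        fails = (
            i in clustered                 # (1) >= 3 SNPs within 10 bp
            or snp.ambiguous_frac > 0.10   # (2) > 10% ambiguous alignments
            or snp.qual < 50               # (3) quality score < 50
            or snp.qd < 1.5                # (4) quality by depth < 1.5
            or snp.sb > -1                 # (5) strand bias score > -1
        )
        if not fails:
            kept.append(snp)
    return kept
```

Calling `apply_filters` on a list of `Snp` records returns the survivors in their original order.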

The final SNP counts were 46,860, 44,806, and 43,664 for the M, H, and L groups, respectively.
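To put those three counts in perspective, a few lines of Python show how many extra SNPs the merged data yields compared with each single run. Only the three counts come from the paper; the helper `relative_gain` is just illustrative.

```python
# SNP counts reported for the three strategies.
counts = {"M": 46860, "H": 44806, "L": 43664}


def relative_gain(merged: int, single: int) -> float:
    """Percentage of additional SNPs in the merged set relative to a single run."""
    return (merged - single) / single * 100


for group in ("H", "L"):
    extra = counts["M"] - counts[group]
    gain = relative_gain(counts["M"], counts[group])
    print(f"M vs {group}: +{extra} SNPs ({gain:.2f}%)")
```

The gain is a few percent in both cases, so merging does help, but not dramatically.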

The comparison of the called SNPs is rather simple:

  • heterozygous-homozygous ratio (Hete/Homo)
  • transition-transversion ratio (Ti/Tv)
  • overlapping rate with the 1000 Genomes Project
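All three metrics are simple to compute once the SNPs are called. Below is a minimal sketch, assuming biallelic SNPs given as hypothetical `(ref, alt, genotype)` tuples with VCF-style genotype strings, and sites given as `(chrom, pos)` tuples. It is not the authors' code, only an illustration of what the numbers mean.

```python
# A<->G and C<->T substitutions are transitions; every other base change is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(snps):
    """Transition/transversion ratio for biallelic SNPs given as (ref, alt, gt)."""
    ti = sum(1 for ref, alt, _ in snps if (ref, alt) in TRANSITIONS)
    tv = len(snps) - ti
    return ti / tv if tv else float("nan")


def het_hom_ratio(snps):
    """Heterozygous / homozygous-alternate ratio based on the genotype string."""
    het = sum(1 for _, _, gt in snps if gt == "0/1")
    hom = sum(1 for _, _, gt in snps if gt == "1/1")
    return het / hom if hom else float("nan")


def overlap_rate(called_sites, known_sites):
    """Fraction of called sites that also appear in a reference set of known sites."""
    if not called_sites:
        return float("nan")
    return len(called_sites & known_sites) / len(called_sites)


example = [("A", "G", "0/1"), ("C", "T", "1/1"), ("A", "C", "0/1"), ("G", "T", "0/1")]
print(ti_tv_ratio(example), het_hom_ratio(example))
```

In practice the known-site set would come from a 1000 Genomes Project VCF; here it is just a Python set.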

Sequencing data quality metrics

These include (for the H and L groups, respectively):

  • an average of 64.0 and 57.2 million reads per exome
  • with 43.4 and 36.0 mean depths across the target regions
  • 98.23% and 98.65% of the reads were aligned to the human reference genome
  • 49.70% and 49.11% were mapped to the target regions
  • Approximately 86.16% and 86.14% of the reads in the H and L groups had mapping quality ≥ 20

The authors did not upload the raw sequencing data. They only provided some summarized sequencing and analysis results:

Table S1: Data production by 92 duplicated WES subjects.

Table S2: Number of variants observed across the on-target and off-target regions.


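First, the sequencing details:

As you can see, the amount of sequencing data is actually decent in both the L and H groups!

Next, the details of the SNPs found:

As you can see, merging the L and H data from the same sample does indeed find more SNPs. But isn't this conclusion easy to reason out? Why is an analysis like this needed to prove it?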

Originally published 2018-06-20 on the 生信技能树 WeChat official account.
