CeleScope Tutorial || Analyzing FocuSCOPE™ Single-Cell Lung Cancer Targeted Gene Mutation Data

生信技能树jimmy

Published 2022-06-13 09:31:32

Part of the column: 单细胞天地

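🍳 Congratulations! If you are reading this document, you already have single-cell lung cancer targeted sequencing data in hand and are ready to start analyzing it. Before you run CeleScope on Singleron single-cell data, this tutorial walks through the basic workflow of the software. After a quick read, you should be able to take FocuSCOPE™ single-cell lung cancer targeted gene mutation data on your own server from raw sequencing output to single-cell expression information for the targeted lung cancer genes.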

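I. Introduction

Single-cell technology has advanced rapidly: single-cell transcriptomics, single-cell immune receptor sequencing, and spatial transcriptomics have emerged in turn, each giving a different view of expression at the cellular level. Many cancers carry characteristic gene mutations, for example in EGFR, BRAF, ALK, and NRAS. Targeted drugs directed at these mutations are widely favored for their mild side effects and strong efficacy, so targeted sequencing of these genes at single-cell resolution is valuable for clinical research.

Singleron's FocuSCOPE™ targeted high-throughput single-cell sequencing technology specifically captures regions of current research interest, such as mutated genes, fusion genes, and viral and host RNA.

The FocuSCOPE™ single-cell lung cancer targeted gene mutation detection kit covers the full experimental workflow: single-cell capture, cell lysis, molecular barcode labeling, cellular mRNA capture, and construction of an enrichment library for the target mutation sites in lung cancer target genes. The lung cancer capture panel covers hotspot mutation regions of EGFR, BRAF, KRAS, NRAS, HRAS, and PIK3CA.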

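II. Principle

The kit uses uniquely designed barcoded magnetic beads whose oligonucleotide sequences carry specific probes targeting the 3' end of the regions of interest, which allows information from those regions to be captured efficiently. Multiplex PCR with specific primers then enriches the targets further, improving the sequencing analysis of the target regions.

Advantages of FocuSCOPE™

Single-cell lung cancer targeted gene mutation detection focuses on problems that conventional sequencing has not solved:

No restriction on gene position: the specific bead design effectively captures information from the target genes.
More complete information: whole-transcriptome data and targeted gene data are obtained from the same cell.
Flexible target selection: known target genes can be detected, and custom targets are also supported.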

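III. Application Scenarios

Studying, from multiple angles, tumor heterogeneity across lung cancer subtypes and between patients;
Characterizing the lung cancer microenvironment;
Monitoring mutation dynamics as lung cancer progresses and recurs;
Tracing lung cancer evolutionary paths under different driver mutations;
Analyzing changes in hotspot mutations and cell proportions before and after treatment, and between single-drug and combination therapy;
Exploring potential links between treatment regimens and mutations;
Supporting clinical treatment strategies;
Supporting molecular subtyping for diagnosis;
Investigating the mechanisms of disease onset and progression.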

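IV. Product Advantages

Efficient capture of target regions: specific probe design improves capture efficiency for the targeted genes.
Intratumoral heterogeneity in lung cancer: the single-cell transcriptome and the target genes are captured together, revealing heterogeneity within lung cancer tissue.
High coverage of drug-relevant mutations: covers more than 60% of the mutation regions associated with sensitivity or resistance to lung cancer targeted drugs.
Target gene expression abundance: unbiased measurement of target gene expression within cells.
Single-cell resolution: lung cancer mutation and co-mutation profiles are studied at the single-cell level to investigate driver mutations and resistance mechanisms.
Biomarker discovery: supports the search for markers of drug response and prognosis under different mutation patterns.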

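V. The celescope FocuSCOPE Analysis Workflow

The basic workflow for analyzing the single-cell targeted mutation data generated by FocuSCOPE.

The FocuSCOPE pipeline (analysis of FocuSCOPE™ single-cell lung cancer targeted gene mutation detection data) runs 10 main analysis steps, from sample through analysis_snp; the help output below additionally lists mkref. Help for any one of them is available via celescope snp {subcommand} --help: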

$ celescope snp --help
usage: celescope snp [-h]
                     {mkref,sample,barcode,cutadapt,consensus,star,featureCounts,target_metrics,variant_calling,filter_snp,analysis_snp}
                     ...

Single-cell snp

positional arguments:
  {mkref,sample,barcode,cutadapt,consensus,star,featureCounts,target_metrics,variant_calling,filter_snp,analysis_snp}

optional arguments:
  -h, --help            show this help message and exit

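Download the test data and scripts

To make the software easy to test, we host test data on GitHub (note that these data are for testing only, and some of them are artificially generated).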

mkdir test_dir
cd test_dir
git clone https://github.com/singleron-RD/celescope_test_data.git
git clone https://github.com/singleron-RD/celescope_test_script.git

Alternatively, download them from Gitee:

mkdir test_dir
cd test_dir
git clone https://gitee.com/singleron-rd/celescope_test_data.git
git clone https://gitee.com/singleron-rd/celescope_test_script.git

All of the demo test data were already downloaded in the celescope rna tutorial, so here we only look at how the snp test data and scripts are laid out.

$ tree 
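|-- annovar # the annotation database for mutation data generally has to be built yourself
|   |-- annotate_variation.pl # main program; downloads databases and performs the three types of annotation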
|   |-- coding_change.pl 
|   |-- convert2annovar.pl 
|   |-- humandb_HG38 
|   |-- retrieve_seq_from_fasta.pl 
|   |-- table_annovar.pl 
|   `-- variants_reduction.pl 
|-- annovar.config
|-- annovar.tar
|-- run_shell.sh
`-- snp.mapfile

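VI. Hands-On: the FocuSCOPE™ Single-Cell Lung Cancer Targeted Gene Mutation Analysis Workflow

In a single-cell lung cancer targeted gene mutation study, the experiment builds both a transcriptome library and a library enriched for lung cancer target gene sequences, so the data analysis also has two parts:

(1) Single-cell transcriptome analysis
(2) Lung cancer target gene sequence enrichment analysis

This article covers only the target gene enrichment analysis; analyzing single-cell transcriptome data with celescope was covered in an earlier tutorial. Before starting, activate the environment in which celescope is installed with conda activate celescope.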

(base) singleron 13:20:48 /../snp
$ conda activate celescope 
(celescope) singleron 13:21:18 /../snp

Next come the three important files we will use: annovar.config, snp.mapfile, and run_shell.sh.

$ tree -L 1
.
|-- run_shell.sh
|-- annovar.config
`-- snp.mapfile

1. Use multi_snp to build the shell script for the celescope snp analysis

Configure the mapfile

--mapfile is a multi_snp parameter that takes a tab-delimited text file. Each line of the mapfile describes one set of paired-end fastq files.

snp  /../celescope_test_data/snp/fastqs  snp_test1  /../celescope_test_data/snp/snp_match_dir

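where

Column 1, snp_fastq_ID: the file-name prefix of the snp fastq files
Column 2, snp_datapath: the directory containing the snp fastq files
Column 3, snp_sample_name: the sample name used for the QC report
Column 4, snp_match_dir: the output directory of the matched single-cell transcriptome analysis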
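
Because the mapfile must be tab-delimited, columns separated by spaces are an easy mistake to make. The following is a minimal sketch, not part of CeleScope, of a check that every non-empty line has exactly four tab-separated fields. The function name check_mapfile is made up for this illustration.

```shell
# check_mapfile FILE
# Hypothetical helper (not a CeleScope command): succeed only if every
# non-empty line of FILE has exactly four tab-separated fields
# (fastq prefix, fastq directory, sample name, match_dir).
check_mapfile() {
    awk -F'\t' '
        NF == 0 { next }    # skip empty lines
        NF != 4 {
            printf "line %d: expected 4 tab-separated fields, found %d\n", NR, NF
            bad = 1
        }
        END { exit bad }    # exit status 1 if any line was malformed
    ' "$1"
}
```

Calling check_mapfile snp.mapfile prints one message per malformed line and returns a non-zero status if any were found.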

The ANNOVAR mutation annotation database is also essential, so the annovar.config file has to be set up before the analysis.

$ cat annovar.config 
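[ANNOVAR]
# directory containing the ANNOVAR scripts
dir = /../celescope_test/snp/annovar
# human annotation database
db = /../celescope_test/snp/annovar/humandb_HG38
# genome build
buildver = hg38
# annotation databases: refGene (FASTA sequences of all annotated RefSeq transcripts) and cosmic70 (cancer mutations from the COSMIC database)
protocol = refGene,cosmic70
# one operation per protocol entry: g = gene-based annotation, f = filter-based annotation
operation = g,f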

That completes the annovar.config file.
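
As a quick sanity check before running the pipeline, a helper along these lines can confirm that the file has an [ANNOVAR] section and the five keys shown above. It is only a sketch: check_annovar_config is a hypothetical name, and the function does not verify that the paths exist or that ANNOVAR itself is installed.

```shell
# check_annovar_config FILE
# Hypothetical helper (not a CeleScope command): report a missing
# [ANNOVAR] section header or any missing key, and return non-zero
# if something is absent.
check_annovar_config() {
    missing=0
    if ! grep -q '^\[ANNOVAR\]' "$1"; then
        echo "missing section: [ANNOVAR]"
        missing=1
    fi
    for key in dir db buildver protocol operation; do
        # A key counts as present if a line starts with "key =".
        if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$1"; then
            echo "missing key: ${key}"
            missing=1
        fi
    done
    return "$missing"
}
```

Running check_annovar_config annovar.config prints nothing and returns 0 when the section and all five keys are present.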

The other file is the shell script run_shell.sh:

$ cat run_shell.sh 
# --genomeDir: reference genome directory created by mkref
# --annovar_config: path to the annovar.config file
# --not_consensus: when given, no consensus directory is generated during the analysis
multi_snp \
 --mapfile ./snp.mapfile \
 --genomeDir /../ref_lib/Homo_Sapiens/Homo_GRCh38_GenomeDir \
 --annovar_config ./annovar.config \
 --mod shell \
 --not_consensus \
 --panel lung_1

2. Generate the shell script

(1) Run the run_shell.sh script you have just edited:

$ sh run_shell.sh

(2) When it finishes, a directory named shell is created automatically.

$ tree -L 1
.
|-- run_shell.sh
|-- shell
`-- snp.mapfile

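The shell folder contains the run script snp_test1.sh and, once the script has been run from there, a snp_test1 directory that stores its output. Each command line in snp_test1.sh corresponds to one analysis step (and to one section of the QC report).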

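3. Submit the shell script

Change into the shell directory and run the script by typing sh snp_test1.sh at the command line; the program then runs in the current terminal session. In that case the terminal must stay open and connected, otherwise the run is interrupted. To avoid this, use nohup to send the script to the background: nohup sh snp_test1.sh & runs it in the background and writes a log file named nohup.out.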

$ tree -L 1
.
|-- snp_test1
|-- snp_test1.sh
`-- nohup.out
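
The background submission described above can be wrapped in a small function, sketched here for a POSIX shell. submit_bg and BG_PID are illustrative names, not CeleScope features. Unlike the plain nohup command, this variant writes the log to a file you choose instead of nohup.out.

```shell
# submit_bg SCRIPT LOG
# Hypothetical helper: run SCRIPT in the background under nohup, send its
# stdout and stderr to LOG, and remember the process ID in BG_PID.
submit_bg() {
    nohup sh "$1" > "$2" 2>&1 &
    BG_PID=$!
}
```

For example, submit_bg snp_test1.sh snp_test1.log starts the pipeline; tail -f snp_test1.log then follows its progress, and kill -0 "$BG_PID" succeeds for as long as the job is still running.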

If you are curious about what each step does, you can run the steps one at a time. snp_test1.sh contains:

(celescope) singleron /../snp/shell
$ cat snp_test1.sh 
celescope snp sample --outdir .//snp_test1/00.sample --sample snp_test1 --assay snp --thread 4 --chemistry auto  --fq1 /mnt/sdd/singleron_training_class/resources/celescope_test/celescope_test_data/snp/fastqs/snp1_1.fq.gz 
celescope snp barcode --outdir .//snp_test1/01.barcode --sample snp_test1 --assay snp --thread 4 --chemistry auto --lowNum 2  --fq1 /mnt/sdd/singleron_training_class/resources/celescope_test/celescope_test_data/snp/fastqs/snp1_1.fq.gz --fq2 /mnt/sdd/singleron_training_class/resources/celescope_test/celescope_test_data/snp/fastqs/snp1_2.fq.gz 
celescope snp cutadapt --outdir .//snp_test1/02.cutadapt --sample snp_test1 --assay snp --thread 4 --minimum_length 20 --nextseq_trim 20 --overlap 10 --insert 150  --fq .//snp_test1/01.barcode/snp_test1_2.fq 
celescope snp consensus --outdir .//snp_test1/03.consensus --sample snp_test1 --assay snp --thread 4 --threshold 0.5 --not_consensus --min_consensus_read 1  --fq .//snp_test1/02.cutadapt/snp_test1_clean_2.fq 
celescope snp star --outdir .//snp_test1/04.star --sample snp_test1 --assay snp --thread 4 --genomeDir /mnt/sdd/singleron_training_class/resources/ref_lib/Homo_Sapiens/Homo_GRCh38_GenomeDir --outFilterMultimapNmax 1 --starMem 30  --fq .//snp_test1/02.cutadapt/snp_test1_clean_2.fq 
celescope snp featureCounts --outdir .//snp_test1/05.featureCounts --sample snp_test1 --assay snp --thread 4 --gtf_type exon --genomeDir /mnt/sdd/singleron_training_class/resources/ref_lib/Homo_Sapiens/Homo_GRCh38_GenomeDir  --input .//snp_test1/04.star/snp_test1_Aligned.sortedByCoord.out.bam 
celescope snp target_metrics --outdir .//snp_test1/06.target_metrics --sample snp_test1 --assay snp --thread 4 --panel lung_1  --bam .//snp_test1/05.featureCounts/snp_test1_Aligned.sortedByCoord.out.bam.featureCounts.bam --match_dir /mnt/sdd/singleron_training_class/resources/celescope_test/celescope_test_data/snp/snp_match_dir --add_RG 
celescope snp variant_calling --outdir .//snp_test1/07.variant_calling --sample snp_test1 --assay snp --thread 4 --genomeDir /mnt/sdd/singleron_training_class/resources/ref_lib/Homo_Sapiens/Homo_GRCh38_GenomeDir --panel lung_1  --bam .//snp_test1/06.target_metrics/snp_test1_filtered_sorted.bam --match_dir /mnt/sdd/singleron_training_class/resources/celescope_test/celescope_test_data/snp/snp_match_dir 
celescope snp filter_snp --outdir .//snp_test1/08.filter_snp --sample snp_test1 --assay snp --thread 4 --threshold_method auto  --vcf .//snp_test1/07.variant_calling/snp_test1_norm.vcf 
celescope snp analysis_snp --outdir .//snp_test1/09.analysis_snp --sample snp_test1 --assay snp --thread 4 --annovar_config /mnt/sdd/singleron_training_class/resources/celescope_test/snp/annovar.config  --match_dir /mnt/sdd/singleron_training_class/resources/celescope_test/celescope_test_data/snp/snp_match_dir --vcf .//snp_test1/08.filter_snp/snp_test1_filtered.vcf
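
Since each line of the generated script is one self-contained command, the steps can be run one at a time. The sketch below does this with a hypothetical helper, run_steps, which executes a script line by line and stops at the first step that fails. It assumes, as in the listing above, that every command fits on a single line.

```shell
# run_steps SCRIPT
# Hypothetical helper: execute SCRIPT one line at a time, announce each
# step, and stop at the first command that exits with a non-zero status.
run_steps() {
    n=0
    while IFS= read -r step || [ -n "$step" ]; do
        [ -z "$step" ] && continue           # skip empty lines
        n=$((n + 1))
        echo "[step $n] ${step%% --*}"       # command name without its options
        # </dev/null keeps the step from consuming the rest of SCRIPT.
        if ! sh -c "$step" </dev/null; then
            echo "step $n failed" >&2
            return 1
        fi
    done < "$1"
}
```

With run_steps snp_test1.sh, a failure in, say, the star step leaves the outputs of the earlier steps in place, so the log shows exactly where the run stopped.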

4. Result directory

$ tree
.
|-- 00.sample
|   `-- stat.txt
|-- 01.barcode
|   |-- snp_test1_2.fq
|   `-- stat.txt
|-- 02.cutadapt
|   |-- cutadapt.log
|   |-- snp_test1_clean_2.fq
|   `-- stat.txt
|-- 04.star
|   |-- snp_test1_Aligned.out.bam
|   |-- snp_test1_Aligned.sortedByCoord.out.bam
|   |-- snp_test1_Aligned.sortedByCoord.out.bam.bai
|   |-- snp_test1_Log.final.out
|   |-- snp_test1_Log.out
|   |-- snp_test1_Log.progress.out
|   |-- snp_test1_SJ.out.tab
|   |-- snp_test1_region.log
|   `-- stat.txt
|-- 05.featureCounts
|   |-- snp_test1
|   |-- snp_test1.summary
|   |-- snp_test1_Aligned.sortedByCoord.out.bam.featureCounts.bam
|   |-- snp_test1_name_sorted.bam
|   `-- stat.txt
|-- 06.target_metrics
|   |-- snp_test1_filtered.bam
|   |-- snp_test1_filtered_sorted.bam
|   |-- snp_test1_filtered_sorted.bam.bai
|   `-- stat.txt
|-- 07.variant_calling
|   |-- snp_test1_norm.vcf
|   |-- snp_test1_raw.bcf
|   |-- snp_test1_raw.vcf
|   |-- snp_test1_splitN.bai
|   |-- snp_test1_splitN.bam
|   `-- stat.txt
|-- 08.filter_snp
|   |-- snp_test1_filtered.vcf
|   `-- stat.txt
|-- 09.analysis_snp
|   |-- annovar
|   |   |-- snp_test1.hg38_cosmic70_dropped
|   |   |-- snp_test1.hg38_cosmic70_filtered
|   |   |-- snp_test1.hg38_multianno.txt
|   |   |-- snp_test1.input
|   |   |-- snp_test1.log
|   |   |-- snp_test1.refGene.exonic_variant_function
|   |   |-- snp_test1.refGene.log
|   |   `-- snp_test1.refGene.variant_function
|   |-- snp_test1_gt.csv
|   |-- snp_test1_variant_ncell.csv
|   |-- snp_test1_variant_table.csv
|   `-- stat.txt
`-- snp_test1_report.html

When the run finishes, we have an HTML report of the single-cell lung cancer targeted gene mutation analysis.
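
Before opening the report, it can be worth confirming that the run produced what the result tree above shows. The two helpers below are a minimal sketch with made-up names (missing_stats, count_variants): the first lists numbered step directories that have no stat.txt, and the second counts the non-header records in a VCF file such as 08.filter_snp/snp_test1_filtered.vcf.

```shell
# missing_stats DIR
# Hypothetical helper: print every step directory (NN.name) under DIR
# that does not contain a stat.txt file.
missing_stats() {
    for d in "$1"/[0-9][0-9].*/; do
        [ -d "$d" ] || continue              # the pattern matched nothing
        [ -f "${d}stat.txt" ] || echo "${d%/}"
    done
}

# count_variants VCF
# Hypothetical helper: count the records in VCF, ignoring header lines
# (those starting with '#') and empty lines.
count_variants() {
    awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}
```

For the test run, missing_stats snp_test1 should print nothing, and count_variants snp_test1/08.filter_snp/snp_test1_filtered.vcf gives the number of variants that passed filtering.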