前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >scATAC-seq| 可视化DARs

scATAC-seq| 可视化DARs

作者头像
生信技能树
发布2023-02-28 12:15:06
6020
发布2023-02-28 12:15:06
举报
文章被收录于专栏:生信技能树生信技能树

在单细胞转录组冒尖的时候,单细胞其它组学却一直不温不火,比如单细胞ATAC以及单细胞免疫组库。所以虽然很早之前我们就推荐了github代码:https://github.com/GreenleafLab/10x-scATAC-2019

  • 01_Filter_Cells_v2.R
  • 02_Get_Peak_Set_hg19_v2.R
  • 03_Run_chromVAR_v2.R
  • 04_Run_Cicero_v2.R
  • 05_Cluster_Unique_Peaks_v2.R
  • 06_Analyze_UMAP_Trajectory.R
  • 07_ChromVAR_For_GWAS_w_CoAccessbility_v2.R
  • 08_Run_scCNV_v2.R

但是一直没有深入解读它,在2021的时候《单细胞天地》的一个小伙伴写了个开头:

但是没有能坚持下来,其实文章给的配套github代码非常齐全了,就是需要花时间钻研和解读。

恰好2022我们又看到了一个带有全套github代码以及配套数据的文章,见:10x的单细胞ATAC上游流程之cellranger-atac,在等待的期间,交流群的学徒就先拿 atac_v1_pbmc_10k 这样的示例数据开始练习了。所以转发学徒的系列教程给大家。

现在继续:

上回求出差异的peaks之后,还能挖掘更多的数据信息:

例如用热图来展示结果,并且求peak区域附近的基因,看其分布位置

↓↓↓

To investigate the differences in chromatin accessibility among all cell types, we first identified a total of 212,326 peaks in our scATAC-seq data by using MACS223 and found that ~10.6% (22,682 unique peaks) of them exhibited significant differences among cell types; these sites were defined as differentially accessible chromatin regions (DARs) (adjusted P < 0.05 and log2(fold change (FC)) > 0.25)(Fig. 2a)

The majority of DARs were located in the promoter and intronic regions of the genome, and the distribution of DARs was relatively conserved across cell types (Fig. 2b and Supplementary Fig. S2b).

文章用MACS223求出scATAC-seq中的peak,随后用过设置阈值求出有显著差异的染色质开放区域(differentially accessible chromatin regions (DARs)即有显著差异的peak)及其基因的分布位置

文章:Single-cell multiomics analysis reveals regulatory programs in clear cell renal cell carcinoma

流程:

  • 寻找差异的peaks:FindAllMarkers or FindMarkers
  • peaks附近的基因:ClosestFeature
  • 设置阈值:求出DAR
  • 求平均表达量:AverageExpression
  • 可视化:热图,饼图,直方图,条形图等

1. 显著差异的peaks

代码语言:javascript
复制
# 确保已经分好细胞群,不再是序号
table(Idents(pbmc))
# CD14+ Monocytes                 NK dim             CD4 Memory              CD4 Naive 
# 3122                             360                    820                    698 
# Double negative T cell           CD8 effector              CD8 Naive        CD16+ Monocytes 
# 391                                563                    317                    217 
# pre-B cell      B cell progenitor         Dendritic cell                    pDC 
# 280                    170                     68                            49 
# NK bright 
# 5 
levels(pbmc)
# [1] "CD14+ Monocytes"        "NK dim"                 "CD4 Memory"            
# [4] "CD4 Naive"              "Double negative T cell" "CD8 effector"          
# [7] "CD8 Naive"              "CD16+ Monocytes"        "pre-B cell"            
# [10] "B cell progenitor"      "Dendritic cell"         "pDC"                   
# [13] "NK bright"

############################# identify differentially accessible chromatin regions between celltypes
DefaultAssay(pbmc) <- "peaks"
# 寻找显著差异的peaks
# cellType.DARs <- FindAllMarkers(scATAC.data, 
#                                 test.use = 'LR',
#                                 logfc.threshold=0, 
#                                 min.pct = 0.05, # often necessary to lower the min.pct threshold
#                                 latent.vars = "peak_region_fragments")

# 运行比较慢,这里只比较两个细胞群的差异peaks
cellType.DARs <- Seurat::FindMarkers(
  object = pbmc,
  ident.1 = 'CD4 Naive',
  ident.2 = 'CD8 Naive',
  logfc.threshold=0,  # 默认是0.25
  only.pos = TRUE,
  test.use = 'LR', # 推荐用logistic regression
  min.pct = 0.05, # 该基因表达数目占该类细胞总数的比例
  latent.vars = 'nCount_peaks'
)
head(cellType.DARs)
#                          p_val         avg_log2FC pct.1 pct.2    p_val_adj
# chr1-234544345-234545516 8.454563e-16  0.5295198 0.191 0.022 7.402900e-11
# chr16-79632101-79636717  1.431972e-13  0.2235968 0.268 0.073 1.253849e-08
# chr6-167362837-167366950 1.648728e-13  0.2943871 0.285 0.085 1.443643e-08
# chr12-6895592-6896474    3.370162e-13  0.4385442 0.123 0.006 2.950948e-08
# chr12-6898094-6899126    8.623449e-13  0.2885348 0.106 0.003 7.550778e-08
# chr9-90340551-90341713   1.854913e-12  0.2595704 0.142 0.016 1.624180e-07
dim(cellType.DARs) #12642     5

有些教程在FindMarkers用的变量是latent.vars = "peak_region_fragments",其实出来的结果差不多,因其相关性很高

代码语言:javascript
复制
cor(pbmc$peak_region_fragments, pbmc$nCount_peaks) # 0.999884

2. peaks附近的基因

peaks附近的基因: ClosestFeature

代码语言:javascript
复制
# 寻找附近的基因
cf <- ClosestFeature(pbmc, regions = rownames(cellType.DARs))# Find the closest feature to a given set of genomic regions
head(cf)

table(rownames(cellType.DARs) %in% cf$query_region) # 对应peaks
# TRUE 
# 12642
table(rownames(cellType.DARs) %in% cf$closest_region) # 对应基因的region
# FALSE 
# 12642

这里就把求出的regions = rownames(cellType.DARs)当成所有细胞群差异的peak,如果着重研究某两个细胞群的差异,regions就需要改变一下

代码语言:javascript
复制
open_CD4_Naive <- rownames(cellType.DARs[cellType.DARs$avg_log2FC > 0.5, ])
open_CD8_Naive <- rownames(cellType.DARs[cellType.DARs$avg_log2FC < -0.5, ])

closest_genes_CD4_Naive <- ClosestFeature(pbmc, regions = open_CD4_Naive)
closest_genes_CD8_Naive <- ClosestFeature(pbmc, regions = open_CD8_Naive)

结果:

  • cf$query_region:输入的peaks
  • cf$closest_region:基因的region
代码语言:javascript
复制
> head(cf)
                          tx_id gene_name         gene_id   gene_biotype type
ENST00000481183 ENST00000481183    TARBP1 ENSG00000059588 protein_coding  gap
ENST00000569649 ENST00000569649       MAF ENSG00000178573 protein_coding  cds
ENST00000508775 ENST00000508775   RNASET2 ENSG00000026297 protein_coding  cds
ENST00000535707 ENST00000535707       CD4 ENSG00000010610 protein_coding  gap
ENST00000541982 ENST00000541982       CD4 ENSG00000010610 protein_coding  utr
ENST00000343150 ENST00000343150      CTSL ENSG00000135047 protein_coding  utr
                          closest_region             query_region distance
ENST00000481183 chr1-234541846-234546190 chr1-234544345-234545516        0
ENST00000569649  chr16-79632682-79633799  chr16-79632101-79636717        0
ENST00000508775 chr6-167365976-167366036 chr6-167362837-167366950        0
ENST00000535707    chr12-6896172-6898654    chr12-6895592-6896474        0
ENST00000541982    chr12-6898702-6898828    chr12-6898094-6899126        0
ENST00000343150   chr9-90340434-90341313   chr9-90340551-90341713        0
> table(cf$gene_biotype)

             lincRNA processed_transcript       protein_coding                 rRNA 
                1017                  114                11490                   21 
> table(cf$type)

 cds exon  gap  utr 
5465 1781 2724 2672 

这里出来的基因类型和位置都可以用饼图来展示

3. 求平均表达量

  • AverageExpression
代码语言:javascript
复制
# 求表达量,画热图
cellType.DARs <- cbind(cellType.DARs, 
                       gene=cf$gene_name, 
                       query_region = cf$query_region, 
                       type = cf$type, 
                       distance=cf$distance)
# 设置阈值
#require logfc.threshold >= 0.25 & p_val_adj < 0.05
cellType.sig.pos.DARs <- cellType.DARs %>% dplyr::filter(avg_log2FC >=0.25 & p_val_adj < 0.05) %>% arrange(desc(avg_log2FC)) 
dim(cellType.sig.pos.DARs) #19  9
#plot--- differentially accessible chromatin regions heatmap
sig.region <- cellType.sig.pos.DARs %>% dplyr::select(query_region) %>% distinct() 

# 求平均表达量
#average fragment of each peak in each cell type
sig.region.mean <- AverageExpression(pbmc, 
                                     features = sig.region$query_region, 
                                     assays = "peaks")

sig.region.mean.scale <- scale(t(sig.region.mean$peaks))
代码语言:javascript
复制
> sig.region.mean.scale[1:3,]
                chr1-234544345-234545516 chr12-6895592-6896474 chr1-213843536-213844175
CD14+ Monocytes               -0.4226576            -0.2815046               -0.3905448
NK dim                        -0.4918933            -0.5234233               -0.4341829
CD4 Memory                     0.9332080             0.6218469                0.4443438
                chr2-112938926-112941640 chr14-59103807-59105663 chr21-46890735-46891216
CD14+ Monocytes               -0.5949016              -0.5212354              -0.2926606
NK dim                        -0.5968046              -0.5656007              -0.3216966
CD4 Memory                     1.1153124              -0.6531111               0.7722765
                chr15-57883649-57884489 chr3-118937835-118938422 chr3-32279336-32281981
CD14+ Monocytes              -0.4343374              -0.66028682             -0.4855104
NK dim                       -0.9293215              -0.15753047             -0.4648244
CD4 Memory                    0.1321672               0.09150125              1.4333515
                chr21-37660946-37662184 chr8-18870780-18872100 chr6-11728007-11729822
CD14+ Monocytes              -0.7490132              0.3656120             0.35311667
NK dim                       -0.7044903             -0.6868159            -0.72428267
CD4 Memory                    1.5498568             -0.3555641             0.05747466
                chr10-35300735-35301082 chr6-167362837-167366950 chr12-6898094-6899126
CD14+ Monocytes             -0.23194611               -0.8103845             1.1220201
NK dim                      -0.32961628               -0.7481198            -0.7828576
CD4 Memory                   0.09374038                0.5397166             0.8174112
                chr19-52191949-52194762 chr12-6899696-6902624 chr10-62491054-62493599 chr9-90340551-90341713
CD14+ Monocytes              -0.3312373             1.2080742              -1.2251931              1.9848987
NK dim                       -0.7100521            -0.8918025              -0.6249917             -0.5224442
CD4 Memory                    1.4545402             0.3754083               0.7368930             -0.2492964

4. 画热图

代码语言:javascript
复制
# ?Heatmap
Heatmap(sig.region.mean.scale, 
        name = "z-score", 
        show_column_dend = F, 
        show_row_dend = F, 
        row_names_side = "left",
        show_column_names = F, 
        row_names_gp = gpar(fontsize = 10), 
        width = unit(10, "cm"), 
        height = unit(8, "cm"),
        # heatmap_legend_param = list(direction = "horizontal")
        show_heatmap_legend=F) 
library(circlize)
# 添加注释
col_fun = colorRamp2(c(-4,0,4), c("blue", "white", "red"))
lgd = Legend(col_fun = col_fun, 
             title = "z-score",
             direction = "horizontal", # 横向注释
             title_position='lefttop')
lgd
draw(lgd, x = unit(20.7, "cm"), y = unit(7.4, "cm")) # 调整注释位置

参考:https://satijalab.org/signac/reference/closestfeature

本文参与 腾讯云自媒体分享计划,分享自微信公众号。
原始发表:2022-12-28,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 生信技能树 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体分享计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 流程:
  • 1. 显著差异的peaks
  • 2. peaks附近的基因
  • 3. 求平均表达量
  • 4. 画热图
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档