前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >单细胞代码解析-妇科癌症单细胞转录组及染色质可及性分析13

单细胞代码解析-妇科癌症单细胞转录组及染色质可及性分析13

原创
作者头像
小胡子刺猬的生信学习123
发布2022-09-04 12:11:35
5480
发布2022-09-04 12:11:35
举报

单细胞代码解析-妇科癌症单细胞转录组及染色质可及性分析1:https://cloud.tencent.com/developer/article/2055573

单细胞代码解析-妇科癌症单细胞转录组及染色质可及性分析2:https://cloud.tencent.com/developer/article/2072069

单细胞代码解析-妇科癌症单细胞转录组及染色质可及性分析3:https://cloud.tencent.com/developer/article/2078159

单细胞代码解析-妇科癌症单细胞转录组及染色质可及性分析4:https://cloud.tencent.com/developer/article/2078348

单细胞代码解析-妇科癌症单细胞转录组及染色质可及性分析5:https://cloud.tencent.com/developer/article/2084580

单细胞代码解析-妇科癌症单细胞转录组及染色质可及性分析6:https://cloud.tencent.com/developer/article/2085385

单细胞代码解析-妇科癌症单细胞转录组及染色质可及性分析7:https://cloud.tencent.com/developer/article/2085705

单细胞代码解析-妇科癌症单细胞转录组及染色质可及性分析8:https://cloud.tencent.com/developer/article/2086805

单细胞代码解析-妇科癌症单细胞转录组及染色质可及性分析9:https://cloud.tencent.com/developer/article/2087563

单细胞代码解析-妇科癌症单细胞转录组及染色质可及性分析10:https://cloud.tencent.com/developer/article/2090290

单细胞代码解析-妇科癌症单细胞转录组及染色质可及性分析11:https://cloud.tencent.com/developer/article/2093123

单细胞代码解析-妇科癌症单细胞转录组及染色质可及性分析12:https://cloud.tencent.com/developer/article/2093208

image.png
image.png

还是对fig3的atac的内容进行解析,这次作者做的主要是右边的内容。

image.png
image.png

代码解析

scATAC-seq processing workflow

image.png
image.png
代码语言:javascript
复制
###########################################################
# Matt Regner
# Franco Lab
# Date: May-December 2020
# 
# Sample: EEC
# Description: This script performs the following tasks  
#         1) scATAC-seq processing
#         2) scRNA-seq/scATAC-seq integration
#         3) Peak calling
###########################################################
.libPaths('/home/regnerm/anaconda3/envs/scENDO_scOVAR/lib/R/library')

source("./filterDoublets_modified.R")
###############################################################
library(scater)
library(dplyr)
library(Seurat)
library(patchwork)
library(SingleCellExperiment)
library(scater)
library(ComplexHeatmap)
library(ConsensusClusterPlus)
##msigdbr:基因集数据库
library(msigdbr)
##fgsea 是一个用于快速预排序基因集富集分析 (GSEA) 的 R 包。
library(fgsea)
library(dplyr)
##tibble是data.frame的一种形式
library(tibble)
library(Signac)
library(ggplot2)
library(stringr)
library(EnsDb.Hsapiens.v86)
library(Seurat)
library(Signac)
library(ggplot2)
library(ensembldb)
library(EnsDb.Hsapiens.v86)
library(ArchR)
library(SingleR)
library(viridis)

set.seed(11)
addArchRThreads(threads = 32) 
addArchRGenome("hg38")

# Set up directories and file variables:
##################################################################################################
SAMPLE.ID <- "EEC"

output.dir <- "."
####################
setwd(output.dir)
####################

inputFiles <- list.files(pattern = "\\.gz$")

sampleNames <- c("3533EL","3571DL","36186L","36639L","366C5L")

encode.all <- read.delim("./GRCh38-ccREs.bed",header =F)
colnames(encode.all)[1:3] <- c("seqnames","start","end")

# Read in matching scRNA-seq
###################################################################################################
rna <- readRDS("./endo_EEC_scRNA_processed.rds")

# Redo differential expression with new cell type markers 
Wilcox.markers <- readRDS("./wilcox_DEGs.rds")

# Create Arrow and ArchR project
##########################################################################
ArrowFiles <- createArrowFiles(
  inputFiles = inputFiles,
  sampleNames = sampleNames,
  filterTSS = 0, #Dont set this too high because you can always increase later
  filterFrags = 0,
  addTileMat = T,
  addGeneScoreMat = F
)

doubScores <- addDoubletScores(
  input = ArrowFiles,
  k = 10, #Refers to how many cells near a "pseudo-doublet" to count.
  knnMethod = "UMAP",useMatrix = "TileMatrix",nTrials=5,LSIMethod = 1,scaleDims = F,
  corCutOff = 0.75,UMAPParams = list(n_neighbors =30, min_dist = 0.3, metric = "cosine", verbose =FALSE),
  dimsToUse = 1:50
)

proj <- ArchRProject(
  ArrowFiles = ArrowFiles,
  outputDirectory =SAMPLE.ID,
  copyArrows = T #This is recommened so that you maintain an unaltered copy for later usage.
)

# Filter out outlier low quality cells and doublets
###############################################################################
# GMM for fragments per cell
library(mclust)

for (i in sampleNames){
  proj.i <- proj[proj$Sample == i]

  # GMM for fragments per cell
  depth.clust <- Mclust(log10(proj.i$nFrags),G = 2)
  proj.i$depth.cluster <- depth.clust$classification
  proj.i$depth.cluster.uncertainty <- depth.clust$uncertainty

  ggPoint(
    x = log10(proj.i$nFrags),
    y = log10(proj.i$TSSEnrichment+1),
    color = as.character(proj.i$depth.cluster),
    xlabel = "log10(unique fragments)",
    ylabel = "log10(TSS Enrichment+1)"
  ) + ggtitle(paste0("GMM classification:\n",i," log10(fragments)"))+
    ggsave(paste0(i,"_depth.pdf"),width = 4,height = 4)

  # GMM for TSS per cell
  TSS.clust <- Mclust(log10(proj.i$TSSEnrichment+1),G = 2)
  proj.i$TSS.cluster <- TSS.clust$classification
  proj.i$TSS.cluster.uncertainty <- TSS.clust$uncertainty

  ggPoint(
    x = log10(proj.i$nFrags),
    y = log10(proj.i$TSSEnrichment+1),
    color = as.character(proj.i$TSS.cluster),
    discrete = T,
    xlabel = "log10(unique fragments)",
    ylabel = "log10(TSS Enrichment+1)"
  ) + ggtitle(paste0("GMM classification:\n",i," TSS Enrichment"))+
    ggsave(paste0(i,"_TSS.pdf"),width = 4,height = 4)


  df.TSS <- data.frame(proj.i$cellNames,proj.i$TSS.cluster,proj.i$TSS.cluster.uncertainty,proj.i$TSSEnrichment)
  df.TSS <- dplyr::filter(df.TSS,proj.i.TSS.cluster == "2")
  df.TSS <- dplyr::filter(df.TSS,proj.i.TSS.cluster.uncertainty <= 0.05)
  saveRDS(df.TSS,paste0("df_TSS_",i,".rds"))

  df.depth <- data.frame(proj.i$cellNames,proj.i$depth.cluster,proj.i$depth.cluster.uncertainty,proj.i$nFrags)
  df.depth <- dplyr::filter(df.depth,proj.i.depth.cluster == "2")
  df.depth <- dplyr::filter(df.depth,proj.i.depth.cluster.uncertainty <= 0.05)
  saveRDS(df.depth,paste0("df_depth_",i,".rds"))

  ggPoint(
    x = log10(proj.i$nFrags),
    y = log10(proj.i$TSSEnrichment+1),
    colorDensity = T,
    continuousSet = "sambaNight",
    xlabel = "log10(unique fragments)",
    ylabel = "log10(TSS Enrichment+1)"
  ) +geom_hline(yintercept = log10(min(df.TSS$proj.i.TSSEnrichment)+1),linetype = "dashed")+
    geom_vline(xintercept = min(log10(df.depth$proj.i.nFrags)),linetype = "dashed")+
    ggtitle(paste0("QC thresholds:\n",i))+
    ggsave(paste0(i,"_QC.pdf"),width = 4,height = 4)

  ggPoint(
    x = log10(proj.i$nFrags),
    y = log10(proj.i$TSSEnrichment+1),
    color = proj.i$DoubletEnrichment,
    discrete = F,
    continuousSet = "sambaNight",
    xlabel = "log10(unique fragments)",
    ylabel = "log10(TSS Enrichment+1)"
  ) +geom_hline(yintercept = min(log10(df.TSS$proj.i.TSSEnrichment+1)),linetype = "dashed")+
    geom_vline(xintercept = min(log10(df.depth$proj.i.nFrags)),linetype = "dashed")+
    ggtitle(paste0("Doublet Enrichment:\n",i))+
    ggsave(paste0(i,"_doublets.pdf"),width = 4,height = 4)

}


for (i in sampleNames[2]){
  proj.i <- proj[proj$Sample == i]

  # GMM for fragments per cell
  ##Mclust:进行聚类
  depth.clust <- Mclust(log10(proj.i$nFrags),G = 2)
  proj.i$depth.cluster <- depth.clust$classification
  proj.i$depth.cluster.uncertainty <- depth.clust$uncertainty

  ggPoint(
    x = log10(proj.i$nFrags),
    y = log10(proj.i$TSSEnrichment+1),
    color = as.character(proj.i$depth.cluster),
    xlabel = "log10(unique fragments)",
    ylabel = "log10(TSS Enrichment+1)"
  ) + ggtitle(paste0("GMM classification:\n",i," log10(fragments)"))+
    ggsave(paste0(i,"_depth.pdf"),width = 4,height = 4)

  # Manually set TSS threshold
  #TSS.clust <- Mclust(log10(proj.i$TSSEnrichment+1),G = 2)
  proj.i$TSS.cluster <- ifelse(log10(proj.i$TSSEnrichment+1) >= 0.80,"2","1")
  proj.i$TSS.cluster.uncertainty <- rep(NA,nrow(proj.i@cellColData))

  ggPoint(
    x = log10(proj.i$nFrags),
    y = log10(proj.i$TSSEnrichment+1),
    color = as.character(proj.i$TSS.cluster),
    discrete = T,
    xlabel = "log10(unique fragments)",
    ylabel = "log10(TSS Enrichment+1)"
  ) + ggtitle(paste0("GMM classification:\n",i," TSS Enrichment"))+
    ggsave(paste0(i,"_TSS.pdf"),width = 4,height = 4)


  df.TSS <- data.frame(proj.i$cellNames,proj.i$TSS.cluster,proj.i$TSS.cluster.uncertainty,proj.i$TSSEnrichment)
  df.TSS <- dplyr::filter(df.TSS,proj.i.TSS.cluster == "2")
  #df.TSS <- dplyr::filter(df.TSS,proj.i.TSS.cluster.uncertainty <= 0.05)
  saveRDS(df.TSS,paste0("df_TSS_",i,".rds"))

  df.depth <- data.frame(proj.i$cellNames,proj.i$depth.cluster,proj.i$depth.cluster.uncertainty,proj.i$nFrags)
  df.depth <- dplyr::filter(df.depth,proj.i.depth.cluster == "2")
  df.depth <- dplyr::filter(df.depth,proj.i.depth.cluster.uncertainty <= 0.05)
  saveRDS(df.depth,paste0("df_depth_",i,".rds"))

  ggPoint(
    x = log10(proj.i$nFrags),
    y = log10(proj.i$TSSEnrichment+1),
    colorDensity = T,
    continuousSet = "sambaNight",
    xlabel = "log10(unique fragments)",
    ylabel = "log10(TSS Enrichment+1)"
  ) +geom_hline(yintercept = log10(min(df.TSS$proj.i.TSSEnrichment)+1),linetype = "dashed")+
    geom_vline(xintercept = min(log10(df.depth$proj.i.nFrags)),linetype = "dashed")+
    ggtitle(paste0("QC thresholds:\n",i))+
    ggsave(paste0(i,"_QC.pdf"),width = 4,height = 4)

  ggPoint(
    x = log10(proj.i$nFrags),
    y = log10(proj.i$TSSEnrichment+1),
    color = proj.i$DoubletEnrichment,
    discrete = F,
    continuousSet = "sambaNight",
    xlabel = "log10(unique fragments)",
    ylabel = "log10(TSS Enrichment+1)"
  ) +geom_hline(yintercept = min(log10(df.TSS$proj.i.TSSEnrichment+1)),linetype = "dashed")+
    geom_vline(xintercept = min(log10(df.depth$proj.i.nFrags)),linetype = "dashed")+
    ggtitle(paste0("Doublet Enrichment:\n",i))+
    ggsave(paste0(i,"_doublets.pdf"),width = 4,height = 4)

}


###############################################################################
dev.off()

# Filter out low quality cells, and remove doublets
##############################################################################

list.depth <- list.files(pattern = "^df_depth")

df.depth <-  data.frame(cellNames=character(),
                        cluster=character(),
                        cluster.uncertainty=character(),
                        nFrags = character())
for (i in list.depth){
  df <- readRDS(i)
  colnames(df) <- c("cellNames","cluster","cluster.uncertainty","nFrags")
  df.depth <- rbind(df.depth,df)
}

list.TSS <- list.files(pattern = "^df_TSS")

df.TSS <-  data.frame(cellNames=character(),
                      cluster=character(),
                      cluster.uncertainty=character(),
                      TSSEnrichment = character())
for (i in list.TSS){
  df <- readRDS(i)
  colnames(df) <- c("cellNames","cluster","cluster.uncertainty","TSSEnrichment")
  df.TSS <- rbind(df.TSS,df)
}


colnames(df.TSS) <- c("cellNames","TSS.cluster","TSS.cluster.uncertainty","TSSEnrichment")
colnames(df.depth) <- c("cellNames","depth.cluster","depth.cluster.uncertainty","nFrags")

cellsPass <- intersect(df.TSS$cellNames,df.depth$cellNames)

cellsFail <-  proj$cellNames[!(proj$cellNames %in% cellsPass)]

# Screen for high quality barcodes (remove non cellular barcodes)
proj.filter <- proj[proj$cellNames %in% cellsPass]


proj <- filterDoublets(proj.filter,filterRatio = 1,cutEnrich = 1,cutScore = -Inf)


plotFragmentSizes(proj)+ggtitle("Fragment Size Histogram")+ggsave("Frags_hist.pdf",width = 6,height = 4)
plotTSSEnrichment(proj)+ggtitle("TSS Enrichment")+ggsave("TSS.pdf",width = 6,height = 4)
###############################################################################################################


# Perform LSI reduction and clustering with ATAC data only
#######################################################################
# Add LSI dimreduc
proj <- addIterativeLSI(
  ArchRProj = proj,
  useMatrix = "TileMatrix",
  name = "IterativeLSI",
  iterations = 4,
  LSIMethod = 2,
  scaleDims = T,
  clusterParams = list( #See Seurat::FindClusters
    resolution = c(0.2),
    sampleCells = 10000,
    n.start = 10
  ),
  UMAPParams = list(n_neighbors =30,
                    min_dist = 0.3,
                    metric = "cosine",
                    verbose =FALSE),
  varFeatures = 25000,
  dimsToUse = 1:50,
  binarize = T,
  corCutOff = 0.75,
  force = T,
  seed = 11
)

proj <- addClusters(
  input = proj,
  reducedDims = "IterativeLSI",
  method = "Seurat",
  name = "ATAC_clusters",
  resolution = 0.7,
  dimsToUse = 1:50,force = T
)

# Add UMAP based on LSI dims
proj <- addUMAP(proj,nNeighbors = 30,minDist = 0.3,dimsToUse = 1:50,metric = "cosine",force = T,reducedDims = "IterativeLSI")

###################################################################################################


# Estimate gene activity in ATAC data and perform cell type annotation:

# Add Gene activity matrix using ArchR model
proj <- addGeneScoreMatrix(proj,matrixName = "ArchRGeneScore",force = T)


getAvailableMatrices(proj)
saveRDS(proj,"proj_LSI_AND_GeneScores.rds")

#  Constrained Integration to only align cells from the same patient tumor

groupList <- SimpleList()
for (i in levels(factor(proj$Sample))){

  rna.sub <- rna[,rna$Sample == i]
  RNA.cells <- colnames(rna.sub)

  idxSample <- BiocGenerics::which(proj$Sample == i)
  cellsSample <- proj$cellNames[idxSample]
  proj.filter <- proj[cellsSample, ]
  ATAC.cells <- proj.filter$cellNames

  groupList[[i]] <- SimpleList(
    ATAC = ATAC.cells,
    RNA = RNA.cells
  )
}



proj <- addGeneIntegrationMatrix(
  ArchRProj = proj,
  useMatrix = "ArchRGeneScore",
  matrixName = "GeneIntegrationMatrix_ArchR",
  reducedDims = "IterativeLSI",
  seRNA = rna,
  groupList = groupList,
  addToArrow = T,
  force= TRUE,
  groupRNA = "cell.type",
  nameCell = "predictedCell_ArchR",
  nameGroup = "predictedGroup_ArchR",
  nameScore = "predictedScore_ArchR",
  plotUMAP = F,
  useImputation = F,
  transferParams = list(dims = 1:50)
)
getAvailableMatrices(proj)
saveRDS(proj,"proj_LSI_GeneScores_Annotations_Int.rds")
proj <- readRDS("proj_LSI_GeneScores_Annotations_Int.rds")
############################################################################################
# Begin Downstream Analysis
#      1)  Plotting RNA/ATAC by sample, by cluster, by predicted label 
#      2)  Marker Gene (RNA/ATAC) intersection
#      3)  Peak2GeneLinks/Coaccessiblity
######################################################

# PART 1: Plotting 
######################################################################################

# Make embedding highlighting by 1) Predicted group ArchR 2) Predicted group Signac
# 3) Sample 4) ATAC-only clusters 
atac.archr <- plotEmbedding(proj,colorBy = "cellColData",name = "predictedGroup_ArchR")
atac.archr.emb <- as.data.frame(atac.archr$data)
atac.archr.emb$cell.type.archr <- atac.archr.emb$color
atac.archr.emb$cell.type.archr <- sub("-", ":", atac.archr.emb$cell.type.archr)
atac.archr.emb$cell.type.archr <- gsub(".*:", "", atac.archr.emb$cell.type.archr)
head(atac.archr.emb)
atac.archr.emb$cell.type.archr <- factor(atac.archr.emb$cell.type.archr, levels = levels(as.factor(rna$cell.type)))
head(atac.archr.emb)

atac <- plotEmbedding(proj,colorBy = "cellColData",name = "Sample")
atac.emb.sample <- as.data.frame(atac$data)
atac.emb.sample$sample <- atac.emb.sample$color
atac.emb.sample$sample <- sub("-", ":", atac.emb.sample$sample )
atac.emb.sample$sample  <- gsub(".*:", "", atac.emb.sample$sample )
head(atac.emb.sample)
head(atac.emb.sample)


atac <- plotEmbedding(proj,colorBy = "cellColData",name = "ATAC_clusters")
atac.emb.cluster <- as.data.frame(atac$data)
atac.emb.cluster$sample <- atac.emb.cluster$color
atac.emb.cluster$sample <- sub("-", ":", atac.emb.cluster$sample )
atac.emb.cluster$sample  <- gsub(".*:", "", atac.emb.cluster$sample )
head(atac.emb.cluster)

atac.emb.all <- cbind(atac.archr.emb[,c(1:2,4)],
                      atac.emb.sample[,4],
                      atac.emb.cluster[,4])

atac.emb.all$plain <- "Plain"

colnames(atac.emb.all) <- c("UMAP1","UMAP2","Predicted.Group.ArchR",
                            "Sample","ATAC_clusters","Blank")

head(atac.emb.all)

var.list <- colnames(atac.emb.all)[3:4]

for (i in 1:length(var.list)){

  ggplot(atac.emb.all,aes_string(x = "UMAP1",y="UMAP2",color = var.list[i]))+
    geom_point(size = .1)+
    theme_classic()+
    ggtitle(paste0("scATAC-seq: ",var.list[i]))+
    theme(plot.title = element_text(face = "bold"))+
    xlab("UMAP_1")+
    ylab("UMAP_2")+
    theme(legend.key.size = unit(0.2, "cm"))+
    guides(colour = guide_legend(override.aes = list(size=3)))+
    ggsave(paste0(var.list[i],"_ATAC.pdf"),width = 8,height = 6)
  
  
}


prediction.scores <- data.frame(ArchR= proj$predictedScore_ArchR)
var.list <- colnames(prediction.scores)

for (i in 1:length(var.list)){
  
  ggplot(prediction.scores,aes_string(x = var.list[i]))+
    geom_histogram(binwidth = 0.025,fill="#000000", color="#e9ecef", alpha=0.9)+
    theme_classic()+
    ggsave(paste0(var.list[i],"_ATAC.pdf"),width = 8,height = 6)

}



# Plot matching scRNA-seq plots:
#####################################################################################
rna.emb <- as.data.frame(rna@reductions$umap@cell.embeddings)
rna.emb$cell.type <- as.factor(rna$cell.type)
rna.emb$sample <- rna$Sample

rna.emb$cell.type <- factor(rna.emb$cell.type,levels = levels(atac.emb.all$Predicted.Group.ArchR))
rna.cell.plot <- ggplot(rna.emb,aes(x = UMAP_1,y=UMAP_2,color = cell.type))+
  geom_point(size = .1)+
  theme_classic()+
  ggtitle("scRNAseq")+
  theme(plot.title = element_text(face = "bold"))+
  xlab("UMAP_1")+
  ylab("UMAP_2")+ 
  theme(legend.key.size = unit(0.2, "cm"))+
  guides(colour = guide_legend(override.aes = list(size=3)))
rna.cell.plot +ggsave("RNA_All_labels.pdf",width = 8,height = 6)
  


rna.emb$sample <- factor(rna.emb$sample,levels = levels(factor(atac.emb.all$Sample)))
rna.sample.plot <-ggplot(rna.emb,aes(x = UMAP_1,y=UMAP_2,color = sample))+
  geom_point(size = .1)+
  theme_classic()+
  ggtitle("scRNAseq")+
  theme(plot.title = element_text(face = "bold"))+
  xlab("UMAP_1")+
  ylab("UMAP_2")+ 
  theme(legend.key.size = unit(0.2, "cm"))+
  guides(colour = guide_legend(override.aes = list(size=3)))+ggsave("RNA_All_sample.pdf")
rna.sample.plot +ggsave("RNA_All_samples.pdf",width = 8,height = 6)

#########################################################################################################


# PART 2: Marker gene ovarlap RNA/ATAC


# Differential pseudo-gene activity analysis: 
############################

######################################################################################
#ArchR
########################################################################################
# DEGs using ATAC labels
markersGS.archr <- getMarkerFeatures(
  ArchRProj = proj,
  useMatrix = "ArchRGeneScore",
  groupBy = "ATAC_clusters",
  bias = c("TSSEnrichment", "log10(nFrags)"),
  testMethod = "wilcoxon"
)


heatmapGS.archr <- markerHeatmap(
  seMarker = markersGS.archr,
  cutOff = "FDR <= 0.01 & Log2FC >= 1.25",
  labelMarkers =NULL,
  transpose = F,
  pal =  viridis(n=256),
  limits = c(-2,2)
)
ComplexHeatmap::draw(heatmapGS.archr, heatmap_legend_side = "bot", annotation_legend_side = "bot")
plotPDF(heatmapGS.archr , name = "GeneScores-Marker-Heatmap_ArchR", width = 8, height = 6, ArchRProj = proj, addDOC = FALSE)




# DEGs using predicted labels (removing small groups)
idxSample <- BiocGenerics::which(proj$predictedScore_ArchR > 0.5)
cellsSample <- proj$cellNames[idxSample]
proj.filter <- proj[cellsSample, ]

popular.groups <- summary(factor(proj.filter$predictedGroup_ArchR))
popular.groups <- popular.groups[popular.groups > 10]
proj.filter$Mode.Label <- ifelse(proj.filter$predictedGroup_ArchR %in% names(popular.groups),TRUE,FALSE)

idxSample <- BiocGenerics::which(proj.filter$Mode.Label == TRUE)
cellsSample <- proj.filter$cellNames[idxSample]
proj.filter <- proj.filter[cellsSample, ]

# DEGs using predicted labels
markersGS.archr.pred <- getMarkerFeatures(
  ArchRProj = proj.filter,
  useMatrix = "ArchRGeneScore",
  groupBy = "predictedGroup_ArchR",
  bias = c("TSSEnrichment", "log10(nFrags)"),
  testMethod = "wilcoxon"
)
heatmapGS.archr.pred <- markerHeatmap(
  seMarker = markersGS.archr.pred,
  cutOff = "FDR <= 0.01 & Log2FC >= 1.25",
  labelMarkers =NULL,
  transpose = F,
  pal =  viridis(n=256),
  limits = c(-2,2)
)
ComplexHeatmap::draw(heatmapGS.archr.pred, heatmap_legend_side = "bot", annotation_legend_side = "bot")
plotPDF(heatmapGS.archr.pred, name = "GeneScores-Marker-Heatmap_ArchR_pred", width = 8, height = 6, ArchRProj = proj.filter, addDOC = FALSE)

# Differential peak analysis:
############################
# ATAC clusters
proj <- addGroupCoverages(ArchRProj = proj,groupBy = "ATAC_clusters",force = T)

pathToMacs2 <- findMacs2()

proj <- addReproduciblePeakSet(
  ArchRProj = proj,
  groupBy = "ATAC_clusters",
  pathToMacs2 = pathToMacs2,force = T
)
proj <- addPeakMatrix(proj,force = T)
proj <- addBgdPeaks(proj)

markersPeaks <- getMarkerFeatures(
  ArchRProj = proj,
  useMatrix = "PeakMatrix",
  groupBy = "ATAC_clusters",
  bias = c("TSSEnrichment", "log10(nFrags)"),
  testMethod = "wilcoxon"
)

heatmapPeaks<- markerHeatmap(
  seMarker = markersPeaks,
  cutOff = "FDR <= 0.01 & Log2FC >= 1.25",
  labelMarkers =NULL,
  transpose = F,
  pal =  viridis(n=256),
  limits = c(-2,2)
)
ComplexHeatmap::draw(heatmapPeaks, heatmap_legend_side = "bot", annotation_legend_side = "bot")
plotPDF(heatmapPeaks, name = "Markers_peaks_ATAC_clusters", width = 8, height = 6, ArchRProj = proj, addDOC = FALSE)


# ArchR predicted labels


# DEGs using predicted labels (removing small groups)
idxSample <- BiocGenerics::which(proj$predictedScore_ArchR >= 0.5)
cellsSample <- proj$cellNames[idxSample]
proj.filter <- proj[cellsSample, ]

popular.groups <- summary(factor(proj.filter$predictedGroup_ArchR))
popular.groups <- popular.groups[popular.groups > 10]
proj.filter$Mode.Label <- ifelse(proj.filter$predictedGroup_ArchR %in% names(popular.groups),TRUE,FALSE)

idxSample <- BiocGenerics::which(proj.filter$Mode.Label == TRUE)
cellsSample <- proj.filter$cellNames[idxSample]
proj.filter <- proj.filter[cellsSample, ]

proj.archr <- addGroupCoverages(ArchRProj = proj.filter,groupBy = "predictedGroup_ArchR",force = T)

pathToMacs2 <- findMacs2()

proj.archr <- addReproduciblePeakSet(
  ArchRProj = proj.archr,
  groupBy = "predictedGroup_ArchR",
  pathToMacs2 = pathToMacs2,force = T
)
proj.archr <- addPeakMatrix(proj.archr,force = T)
proj.archr <- addBgdPeaks(proj.archr,force = T)
markersPeaks.archr <- getMarkerFeatures(
  ArchRProj = proj.archr,
  useMatrix = "PeakMatrix",
  groupBy = "predictedGroup_ArchR",
  bias = c("TSSEnrichment", "log10(nFrags)"),
  testMethod = "wilcoxon"
)

heatmapPeaks.archr<- markerHeatmap(
  seMarker = markersPeaks.archr,
  cutOff = "FDR <= 0.01 & Log2FC >= 1.25",
  labelMarkers =NULL,
  transpose = F,
  pal =  viridis(n=256),
  limits = c(-2,2)
)
ComplexHeatmap::draw(heatmapPeaks.archr, heatmap_legend_side = "bot", annotation_legend_side = "bot")
plotPDF(heatmapPeaks.archr , name = "Markers_peaks_Archr_Predicted_labels", width = 8, height = 6, ArchRProj =  proj.archr, addDOC = FALSE)

saveRDS(proj.archr,"./final_archr_proj_archrGS.rds")
# RNA heatmap
####################################################################################
topN <-Wilcox.markers %>% group_by(cluster) %>% dplyr::filter(p_val_adj <= 0.01) %>% top_n(30, desc(avg_logFC))

# Downsample cells from each cluster
rna.sub <- subset(rna,downsample =300)
rna.sub <- NormalizeData(rna.sub)
rna.sub <- rna.sub[rownames(rna.sub) %in% topN$gene,]
rna.sub <- ScaleData(rna.sub,features = rownames(rna.sub))

mat <- rna.sub@assays$RNA@scale.data

cluster_anno<- rna.sub@meta.data$cell.type

col_fun = circlize::colorRamp2(c(-2, 0, 2),viridis(n = 3))

heatmapRNA <- Heatmap(mat, name = "Expression",
        column_split = factor(cluster_anno),
        cluster_columns =T,
        show_column_dend = F,
        cluster_column_slices = T,
        column_title_gp = gpar(fontsize = 8),
        column_gap = unit(0.1, "mm"),
        cluster_rows = T,
        show_row_dend = FALSE,
        col = col_fun,
        column_title_rot = 90,
        show_column_names = F)
plotPDF(heatmapRNA, name = "Heatmap_RNA", width = 8, height = 6)


#################
# END OF SCRIPT
#################

总结

上面的流程还是比较清楚的,先对ATAC的数据进行基本的过滤,然后跟前面的一个教程里面的内容很像,把两个数据集进行整合,去对atac数据集进行peak分析,最后去寻找差异基因,进行可视化,可视化都分了两种,一种是样本,一种是细胞类型。

里面觉得很好的是画热图的方法,还有对样本循环做数据过滤的过程,作者从前面到现在的数据质量的值都很固定,其实应该是有自己调试过的。

这里面的循环比较适合我这种比较懒得人,还有给两个数据集建立锚得也是单细胞两组学结合用到得。

我觉得作者的配色还挺漂亮的,准备拿自己的数据集进行可视化,学习改一下代码。

我现在有一个感觉,就是一个漂亮的代码写出来不亚于一篇好的paper,如果要是把里面的东西一点一点的讲清楚,也是很考验写东西的人的能力的。

原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。

如有侵权,请联系 cloudcommunity@tencent.com 删除。

原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。

如有侵权,请联系 cloudcommunity@tencent.com 删除。

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 代码解析
  • 总结
相关产品与服务
命令行工具
腾讯云命令行工具 TCCLI 是管理腾讯云资源的统一工具。使用腾讯云命令行工具,您可以快速调用腾讯云 API 来管理您的腾讯云资源。此外,您还可以基于腾讯云的命令行工具来做自动化和脚本处理,以更多样的方式进行组合和重用。
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档