ComplexHeatmap包更新支持pheatmap转换

生信菜鸟团

发布于 2020-05-18 13:24:27

2.4K0

发布于 2020-05-18 13:24:27

文章被收录于专栏：生信菜鸟团

热图一直是一种数据矩阵可视化使用率较高的展示形式，常见包含：

heatmap():用于绘制简单热图的函数；

heatmap.2():绘制增强热图的函数；

d3heatmap:用于绘制交互式热图的R包；

pheatmap是热图中使用频率比较高一个R包，ComplexHeatmap:用于绘制、注释和排列复杂热图。现在ComplexHeatmap 迎来新版本升级，支持pheatmap 参数转换。

新增的 ComplexHeatmap::pheatmap()该功能实际上将中的所有参数映射pheatmap::pheatmap()到中的适当参数ComplexHeatmap::Heatmap()，这意味着可以直接将它转换为一个复杂的热图，例如如下比较

更多的比较信息及参数对照表可点击原文查看，顺着主页信息，学习了复杂热图内容。

ComplexHeatmap复杂热图

单个热图由热图主体和热图组件组成。热图主体可以按行和列进行拆分，支持水平和垂直排列。热图组件是标题，树状图，矩阵名称和热图注释，它们放置在heamap主体的四个侧面上，并支持热图主体进行重新排序或拆分。同时，热图和注释（列注释）也可以垂直排列。

所述ComplexHeatmap包是在面向对象的方式实现。为了描述热图列表，有以下几类：

Heatmap 类：单个热图，其中包含热图主体，行/列名称，标题，树形图和行/列注释。
HeatmapList class：热图和热图注释的列表。
HeatmapAnnotationclass：定义行注释和列注释的列表。热图注释可以是热图的组成部分，也可以独立于热图。

还有一些内部类：

SingleAnnotationclass：定义单个行注释或列注释。该HeatmapAnnotation对象包含对象列表 SingleAnnotation。
ColorMapping类：从值到颜色的映射。主矩阵和注释的颜色映射由ColorMapping类控制。
AnnotationFunctionclass：构造用户定义的注释。这是创建用户定义的注释图形的基础。

ComplexHeatmap是在网格系统下实现的，因此用户需要了解基本的网格功能才能充分利用该软件包。

从单细胞RNASeq可视化细胞异质性

（以下为中文翻译搬运内容）

在此示例中，小鼠T细胞的单细胞RNA-Seq数据可视化以显示细胞的异质性。数据（mouse_scRNAseq_corrected.txt）来自Buettner等人，2015，补充数据1，表格“ 细胞周期校正的基因表达谱 ”。作者在此对数据进行预处理：

# 删除了重复的基因
expr = read.table("data/mouse_scRNAseq_corrected.txt", sep = "\t", header = TRUE)
expr = expr[!duplicated(expr[[1]]), ]
rownames(expr) = expr[[1]]
expr = expr[-1]
expr = as.matrix(expr)

# 超过一半的细胞中未表达的基因被滤除。
expr = expr[apply(expr, 1, function(x) sum(x > 0)/length(x) > 0.5), , drop = FALSE]

该get_correlated_variable_rows()功能在此处定义。它提取在细胞之间可变表达并与其他基因相关的特征基因。

get_correlated_variable_genes = function(mat, n = nrow(mat), cor_cutoff = 0, n_cutoff = 0) {
    ind = order(apply(mat, 1, function(x) {
            q = quantile(x, c(0.1, 0.9))
            x = x[x < q[1] & x > q[2]]
            var(x)/mean(x)
        }), decreasing = TRUE)[1:n]
    mat2 = mat[ind, , drop = FALSE]
    dt = cor(t(mat2), method = "spearman")
    diag(dt) = 0
    dt[abs(dt) < cor_cutoff] = 0
    dt[dt < 0] = -1
    dt[dt > 0] = 1

    i = colSums(abs(dt)) > n_cutoff

    mat3 = mat2[i, ,drop = FALSE]
    return(mat3)
}

Signature genes 被定义为一系列基因，其中每个基因与20多个基因相关，绝对相关性大于0.5。

mat2包含每个基因表达情况，

Since single cell RNASeq data is highly variable and outliers are frequent, gene expression is only scaled within the 10th and 90th quantiles.

mat = get_correlated_variable_genes(expr, cor_cutoff = 0.5, n_cutoff = 20)
mat2 = t(apply(mat, 1, function(x) {
    q10 = quantile(x, 0.1)
    q90 = quantile(x, 0.9)
    x[x < q10] = q10
    x[x > q90] = q90
    scale(x)
}))
colnames(mat2) = colnames(mat)

添加细胞周期基因和核糖核蛋白基因信息。细胞周期基因列表来自Buettner等人，2015，补充表1，表“ Union of Cyclebase and GO genes ”。核糖核蛋白基因来自 GO：0030529。基因列表存储在 mouse_cell_cycle_gene.rds和中mouse_ribonucleoprotein.rds。

cc = readRDS("data/mouse_cell_cycle_gene.rds")
ccl = rownames(mat) %in% cc
cc_gene = rownames(mat)[ccl]

rp = readRDS("data/mouse_ribonucleoprotein.rds")
rpl = rownames(mat) %in% rp

由于按比例缩放每个基因的表达值，一个基因相对于其他基因的表达水平已经失真，因此我们计算一个基因在所有样本中的平均表达，用于比较基因之间的表达水平。

base_mean = rowMeans(mat)

将以下信息进行可视化：

scaled expression，mat2，
base_mean，
基因是否是核糖核蛋白基因rpl，
基因是否是细胞周期基因ccl，
细胞周期基因symbols cc_gene，

对于具有相对高表达水平（大于所有基因的25％的分位数）的细胞周期基因，基因名称显示为文本标签。在第一个热图中，基于树状聚类的两个主要组，在两个树状图的基础上对列树状图进行了铺底，以突出显示这两个亚群。

library(GetoptLong)
ht_list = Heatmap(mat2, col = colorRamp2(c(-1.5, 0, 1.5), c("blue", "white", "red")), 
    name = "scaled_expr", column_title = qq("relative expression for @{nrow(mat)} genes"),
    show_column_names = FALSE, width = unit(8, "cm"),
    heatmap_legend_param = list(title = "Scaled expr")) +
    Heatmap(base_mean, name = "base_expr", width = unit(5, "mm"),
        heatmap_legend_param = list(title = "Base expr")) +
    Heatmap(rpl + 0, name = "ribonucleoprotein", col = c("0" = "white", "1" = "purple"), 
        show_heatmap_legend = FALSE, width = unit(5, "mm")) +
    Heatmap(ccl + 0, name = "cell_cycle", col = c("0" = "white", "1" = "red"), 
        show_heatmap_legend = FALSE, width = unit(5, "mm")) +
    rowAnnotation(link = anno_mark(at = which(ccl & base_mean > quantile(base_mean, 0.25)), 
        labels = rownames(mat)[ccl & base_mean > quantile(base_mean, 0.25)], 
        labels_gp = gpar(fontsize = 10), padding = unit(1, "mm"))) +
    Heatmap(cor(t(mat2)), name = "cor", 
        col = colorRamp2(c(-1, 0, 1), c("green", "white", "red")), 
        show_row_names = FALSE, show_column_names = FALSE, row_dend_side = "right", 
        show_column_dend = FALSE, column_title = "pairwise correlation between genes",
        heatmap_legend_param = list(title = "Correlation"))
ht_list = draw(ht_list, main_heatmap = "cor")
decorate_column_dend("scaled_expr", {
    tree = column_dend(ht_list)$scaled_expr
    ind = cutree(as.hclust(tree), k = 2)[order.dendrogram(tree)]

    first_index = function(l) which(l)[1]
    last_index = function(l) { x = which(l); x[length(x)] }
    x1 = c(first_index(ind == 1), first_index(ind == 2)) - 1
    x2 = c(last_index(ind == 1), last_index(ind == 2))
    grid.rect(x = x1/length(ind), width = (x2 - x1)/length(ind), just = "left",
        default.units = "npc", gp = gpar(fill = c("#FF000040", "#00FF0040"), col = NA))
})

可视化结果