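Single-cell projects keep growing in cell count, and datasets with more than a million cells are now routine (see, for example, "How can million-cell single-cell data be processed faster in R?"). The common answer is to switch to Python, which also lends itself well to GPU acceleration. Some readers, however, feel that Python's visualization is a bit weak. As it happens, the developer of Marsilea has contributed this post to share a revolutionary Python tool for biological data visualization, and it is well worth trying!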

Preface
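Python is a key language in the data science ecosystem, yet its visualization ecosystem lags far behind R's. The recently released Marsilea aims to change that.

Figure: TIOBE language ranking (July 2024), drawn with marsilea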

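When presenting a research result, we often need several charts to show different facets of the same data, so that the data is understood correctly and completely. For example, when presenting a single-cell expression matrix, we might start from a heatmap, attach a bar chart to its side to show cell counts, and add violin plots to show the distribution of each gene. In Marsilea, this visualization paradigm is called composable visualization.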

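marsilea can be installed directly with pip. On the command line, run:
pip install marsilea
The simple example below shows how to create a basic heatmap in marsilea. If you have used ComplexHeatmap, the approach should feel familiar.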
# Import numpy and marsilea
import numpy as np
import marsilea as ma
import marsilea.plotter as mp
# Create some random data
data = np.random.rand(20, 20)
cat = np.random.choice(["A", "B", "C"], 20)
# Initialize the heatmap
h = ma.Heatmap(data, linewidth=1)
# Add a Colors plot on the left,
# with its footprint (size) set to 0.2
# and its gap to the neighbouring plot (pad) set to 0.1
h.add_left(mp.Colors(cat), size=.2, pad=.1)
# Add hierarchical clustering dendrograms on the left and top
h.add_dendrogram("left")
h.add_dendrogram("top")
# Add text labels on the right
h.add_right(mp.Labels(cat), pad=.1)
# Append a bar plot on the right
h.add_right(mp.Bar(data.mean(axis=0)), pad=.1)
# Finally, render
h.render()
This is the heatmap you will see:

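marsilea provides many visualization modules that you can combine freely. You can also set the size of each module and the spacing between modules, which gives you a great deal of control over the final figure.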
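To make the `size` and `pad` arguments more concrete, here is a rough sketch of the bookkeeping involved when several plots are stacked on one side of a main canvas. It is plain Python with no marsilea dependency. The `SidePlot` class, the `stack_offsets` helper, and the sizes 0.5 and 1.0 are all assumptions made up for illustration; marsilea's actual layout engine may work differently.

```python
from dataclasses import dataclass


@dataclass
class SidePlot:
    """A plot attached to one side of the main canvas (hypothetical model)."""
    name: str
    size: float        # room the plot takes along the stacking direction
    pad: float = 0.0   # gap between this plot and its inner neighbour


def stack_offsets(plots):
    """Return (name, start, end) per plot, measured outward from the canvas edge."""
    offsets = []
    cursor = 0.0
    for plot in plots:
        start = cursor + plot.pad   # leave the gap first
        end = start + plot.size     # then reserve room for the plot itself
        offsets.append((plot.name, start, end))
        cursor = end
    return offsets


# Loosely mirrors the right-hand side of the example above: labels, then a bar plot.
# The sizes here are assumed values, not marsilea defaults.
right_side = [
    SidePlot("labels", size=0.5, pad=0.1),
    SidePlot("bar", size=1.0, pad=0.1),
]
for name, start, end in stack_offsets(right_side):
    print(f"{name}: {start:.1f} to {end:.1f}")
```

Each call to `add_right` in the example extends the figure outward in this way, which is why plots added later sit farther from the heatmap.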
Visualizing the single-cell pbmc3k dataset (the code is a bit long, so work through it section by section)
# Import marsilea
import marsilea as ma
import marsilea.plotter as mp
# Import other required packages
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
from sklearn.preprocessing import normalize
# Load the data
pbmc3k = ma.load_data("pbmc3k")
exp = pbmc3k["exp"]
pct_cells = pbmc3k["pct_cells"]
count = pbmc3k["count"]
# Scale each column (gene) to unit L2 norm
matrix = normalize(exp.to_numpy(), axis=0)
cell_cat = ["Lymphoid", "Myeloid", "Lymphoid", "Lymphoid",
            "Lymphoid", "Myeloid", "Myeloid", "Myeloid"]
cell_names = ["CD4 T", "CD14\nMonocytes", "B", "CD8 T",
              "NK", "FCGR3A\nMonocytes", "Dendritic", "Megakaryocytes"]
# Create the individual plots
cells_proportion = mp.SizedMesh(
    pct_cells,
    size_norm=Normalize(vmin=0, vmax=100),
    color="none",
    edgecolor="#6E75A4",
    linewidth=2,
    sizes=(1, 600),
    size_legend_kws=dict(title="% of cells", show_at=[0.3, 0.5, 0.8, 1]),
)
mark_high = mp.MarkerMesh(matrix > 0.7, color="#DB4D6D", label="High")
cell_count = mp.Numbers(count["Value"], color="#fac858", label="Cell Count")
cell_exp = mp.Violin(exp, label="Expression", linewidth=0, color="#ee6666", density_norm="count")
cell_types = mp.Labels(cell_names, align="center")
gene_names = mp.Labels(exp.columns)
# Group plots together
h = ma.Heatmap(matrix, cmap="Greens", label="Normalized\nExpression", width=4.5, height=5.5)
h.add_layer(cells_proportion)
h.add_layer(mark_high)
h.add_right(cell_count, pad=0.1, size=0.7)
h.add_top(cell_exp, pad=0.1, size=0.75, name="exp")
h.add_left(cell_types)
h.add_bottom(gene_names)
h.hsplit(labels=cell_cat, order=["Lymphoid", "Myeloid"])
h.add_left(mp.Chunk(["Lymphoid", "Myeloid"], ["#33A6B8", "#B481BB"]), pad=0.05)
h.add_dendrogram("left", colors=["#33A6B8", "#B481BB"])
h.add_dendrogram("bottom")
h.add_legends("right", align_stacks="center", align_legends="top", pad=0.2)
h.set_margin(0.2)
h.render()
Figure: pbmc3k single-cell visualization
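In the pbmc3k code, `normalize(exp.to_numpy(), axis=0)` rescales each column (gene) to unit length using scikit-learn's default L2 norm, so `matrix > 0.7` marks entries that dominate their column. The sketch below repeats that arithmetic in plain Python on made-up numbers. `l2_normalize_columns` is a hypothetical helper written for this illustration and is not part of scikit-learn or marsilea.

```python
import math


def l2_normalize_columns(rows):
    """Divide every column by its Euclidean (L2) norm; all-zero columns stay zero."""
    n_cols = len(rows[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in rows)) for j in range(n_cols)]
    return [
        [row[j] / norms[j] if norms[j] > 0 else 0.0 for j in range(n_cols)]
        for row in rows
    ]


# Made-up expression values: 3 cell types (rows) x 2 genes (columns).
expression = [
    [3.0, 1.0],
    [4.0, 1.0],
    [0.0, 1.0],
]
normalized = l2_normalize_columns(expression)

# Same idea as `matrix > 0.7` in the example: a boolean mask of "high" entries.
high = [[value > 0.7 for value in row] for row in normalized]

for row, flags in zip(normalized, high):
    print([round(v, 3) for v in row], flags)
```

Only the second row of the first column passes the threshold here, because that gene is concentrated in one cell type, while the evenly expressed second gene never exceeds 0.7.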
Of course, you do not have to draw a heatmap at all. marsilea is not limited to heatmaps: plots of any kind can be drawn and combined!
import marsilea as ma
import marsilea.plotter as mp
import mpl_fontkit as fk
fk.install_fontawesome(verbose=False)
fk.install("Lato", verbose=False)
oils = ma.load_data("cooking_oils")
red = "#cd442a"
yellow = "#f0bd00"
green = "#7e9437"
gray = "#eee"
mapper = {0: "\uf58a", 1: "\uf11a", 2: "\uf567"}
cmapper = {0: "#609966", 1: "#DC8449", 2: "#F16767"}
flavour = [mapper[i] for i in oils["flavour"].values]
flavour_colors = [cmapper[i] for i in oils["flavour"].values]
fat_content = oils[
    ["saturated", "polyunsaturated (omega 3 & 6)", "monounsaturated", "other fat"]
]
fat_stack_bar = mp.StackBar(
    fat_content.T * 100,
    colors=[red, yellow, green, gray],
    width=0.8,
    orient="h",
    label="Fat Content (%)",
    legend_kws={"ncol": 2, "fontsize": 10},
)
fmt = lambda x: f"{x:.1f}" if x > 0 else ""
trans_fat_bar = mp.Numbers(
    oils["trans fat"] * 100,
    fmt=fmt,
    color="#3A98B9",
    label="Trans Fat (%)",
)
flavour_emoji = mp.Labels(
    flavour, fontfamily="Font Awesome 6 Free", text_props={"color": flavour_colors}
)
oil_names = mp.Labels(oils.index.str.capitalize())
fmt = lambda x: f"{int(x)}" if x > 0 else ""
omege_bar = ma.plotter.CenterBar(
    (oils[["omega 3", "omega 6"]] * 100).astype(int),
    names=["Omega 3 (%)", "Omega 6 (%)"],
    colors=["#7DB9B6", "#F5E9CF"],
    fmt=fmt,
    show_value=True,
)
conditions_text = [
    "Control",
    ">230 °C\nDeep-frying",
    "200-229 °C\nStir-frying",
    "150-199 °C\nLight saute",
    "<150 °C\nDressings",
]
colors = ["#e5e7eb", "#c2410c", "#fb923c", "#fca5a5", "#fecaca"]
conditions = ma.plotter.Chunk(conditions_text, colors, rotation=0, padding=10)
cb = ma.ClusterBoard(fat_content.to_numpy(), height=10)
cb.add_layer(fat_stack_bar)
cb.add_left(trans_fat_bar, pad=0.2, name="trans fat")
cb.add_right(flavour_emoji)
cb.add_right(oil_names, pad=0.1)
cb.add_right(omege_bar, size=2, pad=0.2)
order = [
    "Control",
    ">230 °C (Deep-frying)",
    "200-229 °C (Stir-frying)",
    "150-199 °C (Light saute)",
    "<150 °C (Dressings)",
]
cb.hsplit(labels=oils["cooking conditions"], order=order)
cb.add_left(conditions, pad=0.1)
cb.add_dendrogram(
    "left", add_meta=False, colors=colors, linewidth=1.5, size=0.5, pad=0.02
)
cb.add_title(top="Fat in Cooking Oils", fontsize=16)
cb.add_legends("bottom", pad=0.3)
cb.render()
# Flip the x-axis of each trans fat panel (limits run from 4.2 down to 0)
axes = cb.get_ax("trans fat")
for ax in axes:
    ax.set_xlim(4.2, 0)
Figure: fat content of cooking oils
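Both examples above call `hsplit(labels=..., order=...)` to break the rows into labeled groups. Conceptually this resembles a group-by on the row labels with a fixed group order. The dependency-free sketch below shows only that idea; `split_rows` is a hypothetical helper invented for illustration, and marsilea's internals may differ.

```python
def split_rows(labels, order):
    """Group row indices by label, keeping the groups in the requested order."""
    groups = {name: [] for name in order}
    for index, label in enumerate(labels):
        if label not in groups:
            raise ValueError(f"label {label!r} is missing from order")
        groups[label].append(index)
    return groups


# Row labels taken from the pbmc3k example above.
cell_cat = ["Lymphoid", "Myeloid", "Lymphoid", "Lymphoid",
            "Lymphoid", "Myeloid", "Myeloid", "Myeloid"]
groups = split_rows(cell_cat, order=["Lymphoid", "Myeloid"])
for name, rows in groups.items():
    print(name, rows)
```

Note that in the cooking oils example the strings in `order` have to match the values in `oils["cooking conditions"]`, while the separate `conditions_text` list only controls what is printed in the `Chunk` labels.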
marsilea can also be used to draw other visualizations that are common in bioinformatics!

Figure: UpSet plot

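The official scanpy documentation includes a detailed guide on reproducing scanpy's plots with marsilea: https://scanpy.readthedocs.io/en/stable/how-to/plotting-with-marsilea.html Interested readers can go through it on their own. A few examples are shown below.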

Figure: track plot

Figure: stacked violin plot
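The name Marsilea comes from the Latin name of the water clover, a plant shaped like a four-leaf clover, and that shape resembles the figures that composable visualization ends up producing.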
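marsilea will offer a workshop at the scverse conference (https://scverse.org/conference2024/), which begins on September 10. Anyone interested is welcome to attend. Note: scverse is the largest consortium in the single-cell and spatial omics field.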