前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >单细胞转录组实战06: pySCENIC转录因子分析(docker)

单细胞转录组实战06: pySCENIC转录因子分析(docker)

原创
作者头像
生信探索
发布2023-02-21 19:06:08
1.1K0
发布2023-02-21 19:06:08
举报
文章被收录于专栏:生信探索生信探索

准备环境

  • pyscenic
代码语言:shell
复制
micromamba activate SC
  • 安装docker

需要有root权限或者在docker的用户组

代码语言:shell
复制
#1.Update the apt package index and install packages to allow apt to use a repository over HTTPS
sudo apt-get update
sudo apt-get install \
    ca-certificates \
    curl \
    gnupg \
    lsb-release
#2.Add Docker’s official GPG key
sudo mkdir -m 0755 -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
#3.Use the following command to set up the repository
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
#4.install docker
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
#5.chmod
sudo groupadd docker
sudo usermod -aG docker ${USER} # 把非root用户添加到用户组
sudo systemctl restart docker
  • docker pyscenic
代码语言:shell
复制
docker pull aertslab/pyscenic:0.12.1

准备数据库文件

代码语言:shell
复制
#>>>downlod_SCENIC.sh>>>
# 准备数据库 (人)
mkdir -p ~/DataHub/SCENIC
cd ~/DataHub/SCENIC

# get ranking database
wget https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust/gene_based/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather

wget https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust/gene_based/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather.sha1sum.txt

sha1sum -c hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather.sha1sum.txt hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather

# get motif database
wget https://resources.aertslab.org/cistarget/motif2tf/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl

# TF list # 查看文本文件是不是只有一列基因
wget https://github.com/aertslab/pySCENIC/blob/master/resources/hs_hgnc_tfs.txt

#>>>downlod_SCENIC.sh>>>
nohup zsh downlod_SCENIC.sh &> downlod_SCENIC.sh.log &

准备count数据

代码语言:Python
复制
from pathlib import Path

import numpy as np
import pandas as pd
import scanpy as sc
import matplotlib.pyplot as plt
import loompy

OUTPUT_DIR = "output/05.SCENIC"
Path(OUTPUT_DIR).mkdir(parents=True,exist_ok=True)
adata = sc.read_h5ad('output/03.inferCNV/adata.h5')
adata_raw = adata.raw.to_adata()
rownames = dict(Gene=adata_raw.var_names.tolist())
colnames = dict(CellID=adata_raw.obs_names.tolist())
loompy.create(filename=OUTPUT_DIR+"/X.loom", layers=adata_raw.X.transpose(), row_attrs=rownames, col_attrs=colnames)

pyscenic CLI

建议使用下边的docker运行这两个步骤

代码语言:shell
复制
# human
n_jobs=12
mtx_path=X.loom

dir=/home/victor/DataHub/SCENIC
tfs=$dir/hs_hgnc_tfs.txt
feather=$dir/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather
tbl=$dir/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl

# STEP 1/3: Gene regulatory network inference, and generation of co-expression modules
pyscenic grn \
    --seed 1314 \
    --num_workers ${n_jobs} \
    --method grnboost2 \
    --output grn.csv \
    --sparse \
    ${mtx_path} \
    ${tfs}


# STEP 2/3: Regulon prediction (cisTarget)
pyscenic ctx \
    --num_workers ${n_jobs} \
    --mode dask_multiprocessing \
    --mask_dropouts \
    --sparse \
    --output ctx.csv \
    --expression_mtx_fname ${mtx_path} \
    --annotations_fname ${tbl} \
    grn.csv \
    ${feather}
  • 运行pySCENIC
代码语言:shell
复制
cd output/05.SCENIC
micromamba activate SC
nohup zsh ~/Project/SC10X/src/run_SCENIC.sh &> run_SCENIC.sh.log &

docker运行脚本

  • docker脚本
代码语言:shell
复制
#>>>scenic_docker.sh>>>
n_jobs=20
input_dir=/home/victor/DataHub/SCENIC
output_dir=/home/victor/Project/SC10X/output/05.SCENIC

# STEP 1/3: Gene regulatory network inference, and generation of co-expression modules
docker run --rm \
    -v ${input_dir}:/input_data \
    -v ${output_dir}:/output_data \
    aertslab/pyscenic:0.12.1 pyscenic grn \
        --num_workers ${n_jobs} \
        --method grnboost2 \
        --output /output_data/grn.csv \
        --sparse \
        /input_data/X.loom \
        /input_data/hs_hgnc_tfs.txt

# STEP 2/3: Regulon prediction (cisTarget)
docker run --rm \
    -v ${input_dir}:/input_data \
    -v ${output_dir}:/output_data \
    aertslab/pyscenic:0.12.1 pyscenic ctx \
        /output_data/grn.csv \
        /input_data/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather \
        --mask_dropouts \
        --sparse \
        --annotations_fname /input_data/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl \
        --expression_mtx_fname /output_data/X.loom \
        --mode "custom_multiprocessing" \
        --output /output_data/ctx.csv \
        --num_workers ${n_jobs}
#<<<scenic_docker.sh<<<
  • 运行
代码语言:shell
复制
cd output/05.SCENIC
nohup zsh ~/Project/SC10X/src/scenic_docker.sh &> scenic_docker.sh.log &

在Jupyter中运行

代码语言:shell
复制
from pathlib import Path
import operator
import cytoolz

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scanpy as sc
from pyscenic.utils import load_motifs
from pyscenic.aucell import aucell
from pyscenic.binarization import binarize
from pyscenic.plotting import plot_binarization, plot_rss
from pyscenic.transform import df2regulons

import bioquest as bq #https://jihulab.com/BioQuest/bioquest
代码语言:shell
复制
OUTPUT_DIR='output/05.SCENIC'
Path(OUTPUT_DIR).mkdir(parents=True,exist_ok=True)
代码语言:shell
复制
adata = sc.read_h5ad('output/03.inferCNV/adata.h5')
adata_raw = adata.raw.to_adata()

自定义函数

代码语言:shell
复制
def filter_regulons(motifs, db_names=("hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings",)):
    """
    从ctx.csv中筛选重要的regulons,之后再运行AUCell
    """
    motifs.columns = motifs.columns.droplevel(0)
    def contains(*elems):
        def f(context):
            return any(elem in context for elem in elems)
        return f
    # For the creation of regulons we only keep the 10-species databases and the activating modules. We also remove the
    # enriched motifs for the modules that were created using the method 'weight>50.0%' (because these modules are not part
    # of the default settings of modules_from_adjacencies anymore.
    lg = np.fromiter(map(cytoolz.compose(operator.not_, contains('weight>50.0%')), motifs.Context), dtype=np.bool) & \
        np.fromiter(map(contains(*db_names), motifs.Context), dtype=np.bool) & \
        np.fromiter(map(contains('activating'), motifs.Context), dtype=np.bool)
    motifs = motifs.loc[lg,:]

    # We build regulons only using enriched motifs with a NES of 3.0 or higher; we take only directly annotated TFs or TF annotated
    # for an orthologous gene into account; and we only keep regulons with at least 10 genes.
    regulons = list(filter(lambda r: len(r) >= 10, 
        df2regulons(motifs[(motifs.NES >= 3.0) 
        & ((motifs['Annotation'] == 'gene is directly annotated')
        | (motifs['Annotation'].str.startswith('gene is orthologous to')
            & motifs['Annotation'].str.endswith('which is directly annotated for motif')))
        ])))

    # Rename regulons, i.e. remove suffix.
    return list(map(lambda r: r.rename(r.transcription_factor), regulons))

STEP 3/3: Cellular enrichment (AUCell)

  • 从ctx.csv中筛选重要的regulons,之后再运行AUCell
代码语言:shell
复制
df_motifs = load_motifs("output/05.SCENIC/ctx.csv")
regulons = filter_regulons(df_motifs)
  • count数据
代码语言:shell
复制
exp_mtx=pd.DataFrame(adata_raw.X.toarray(),columns=adata_raw.var_names,index=adata_raw.obs_names)
  • 运行aucell
代码语言:shell
复制
auc_mtx = aucell(exp_mtx=exp_mtx, signatures=regulons,seed=1314, num_workers=12) 
auc_mtx.to_csv(OUTPUT_DIR+"/aucell.csv")

STEP 4/3: Regulon activity binarization (optional)

代码语言:shell
复制
auc_mtx=pd.read_csv(OUTPUT_DIR+"/aucell.csv", index_col=0)
bin_mtx, thresholds = binarize(auc_mtx,seed=1314,num_workers=12)
bin_mtx.to_csv("bin_mtx.csv")
thresholds.to_frame().rename(columns={0:'threshold'}).to_csv("thresholds.csv")

输出文件

  • pyscenic grn

Input: raw count或者log后的count

Output: List of adjacencies between a TF and its targets stored in grn.csv

  • pyscenic ctx

Output: List of adjacencies between a TF and its targets stored in ctx.csv

  • pyscenic aucell

Output: aucell.csv

原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。

如有侵权,请联系 cloudcommunity@tencent.com 删除。

原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。

如有侵权,请联系 cloudcommunity@tencent.com 删除。

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 准备环境
    • 准备数据库文件
      • 准备count数据
        • pyscenic CLI
          • docker运行脚本
            • 在Jupyter中运行
              • 自定义函数
                • STEP 3/3: Cellular enrichment (AUCell)
                  • STEP 4/3: Regulon activity binarization (optional)
                    • 输出文件
                    相关产品与服务
                    容器镜像服务
                    容器镜像服务(Tencent Container Registry,TCR)为您提供安全独享、高性能的容器镜像托管分发服务。您可同时在全球多个地域创建独享实例,以实现容器镜像的就近拉取,降低拉取时间,节约带宽成本。TCR 提供细颗粒度的权限管理及访问控制,保障您的数据安全。
                    领券
                    问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档