CellPhoneDB 4.0 Update: Much Faster

生信探索
Published 2023-03-31 21:01:23
This article is part of the column: 生信探索

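Earlier versions of CellPhoneDB depended on an incompatible anndata version, so passing an h5ad file as the count matrix raised an error. CellPhoneDB 4.0 fixes this problem, and it is also very fast: the main workflow finished in a few minutes on a dataset of more than 100,000 cells.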

Code language: Shell
cd ~
mamba create -n cpdb python=3.8
mamba activate cpdb
mamba install -y ipykernel numpy pandas scikit-learn
pip install cellphonedb gseapy -i https://pypi.tuna.tsinghua.edu.cn/simple
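
After installing, it can help to confirm which package versions actually ended up in the environment, since the h5ad problem described above was tied to the anndata dependency. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of CellPhoneDB.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Print the versions of the packages this walkthrough relies on.
for name in ("cellphonedb", "anndata", "gseapy"):
    print(name, installed_version(name))
```

A value of `None` means the package is not installed in the active environment.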

Download the database file

https://www.cellphonedb.org

Count data

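  1. For human genes, use the adata_noramlised_annotated.h5ad file directly; the steps below are not needed.
  2. For mouse genes, the gene names must first be converted to human gene names.
  3. The count input can also be a text file, but an h5ad file is faster.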
Code language: Python
from gseapy import Biomart
import pandas as pd

# Query Ensembl BioMart for mouse genes and their human homologs
bm = Biomart()
m2h_df = bm.query(dataset='mmusculus_gene_ensembl',
                  attributes=['ensembl_gene_id', 'external_gene_name',
                              'hsapiens_homolog_ensembl_gene',
                              'hsapiens_homolog_associated_gene_name'])
# Keep only genes that have a human homolog name
m2h = m2h_df.dropna(subset=['hsapiens_homolog_associated_gene_name'])
# Keep the two gene-name columns: mouse symbol and human symbol
m2h = m2h.loc[:, ['external_gene_name', 'hsapiens_homolog_associated_gene_name']]
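
A homolog table like this can contain one-to-many and many-to-one pairs, which would lead to repeated gene names after renaming. As a possible extra step (not part of the original workflow), the sketch below keeps only one-to-one pairs. The helper name `one_to_one_orthologs` is hypothetical, and the toy table is made up for illustration; it is not real BioMart output.

```python
import pandas as pd


def one_to_one_orthologs(mapping,
                         mouse_col="external_gene_name",
                         human_col="hsapiens_homolog_associated_gene_name"):
    """Keep only pairs in which both the mouse and the human gene occur exactly once."""
    pairs = mapping[[mouse_col, human_col]].dropna().drop_duplicates()
    unique_mouse = ~pairs[mouse_col].duplicated(keep=False)
    unique_human = ~pairs[human_col].duplicated(keep=False)
    return pairs[unique_mouse & unique_human].reset_index(drop=True)


# Toy mapping (illustrative only): two mouse genes share one human gene,
# and one mouse gene maps to two human genes.
toy = pd.DataFrame({
    "external_gene_name": ["Cd4", "Cd8a", "Ccl21a", "Ccl21b", "H2-K1", "H2-K1"],
    "hsapiens_homolog_associated_gene_name": ["CD4", "CD8A", "CCL21", "CCL21", "HLA-A", "HLA-B"],
})
clean = one_to_one_orthologs(toy)
print(clean)
```

Dropping ambiguous pairs loses some genes; whether that trade-off is acceptable depends on the analysis.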
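Code language: Python
import anndata

adata = anndata.read_h5ad('adata_noramlised_annotated.h5ad')
# Use the matrix stored in adata.raw
adata_raw = adata.raw.to_adata()
# Build a minimal AnnData: expression matrix, one cell_type column, gene names.
# CellType2 is the annotation column in this dataset.
bdata = anndata.AnnData(X=adata_raw.X,
    obs=pd.DataFrame({'cell_type': adata_raw.obs.CellType2}, index=adata_raw.obs_names),
    var=pd.DataFrame(index=adata_raw.var_names)
    )
# Keep genes that have a human homolog and rename them to the human gene names
merged = pd.merge(bdata.var, m2h, left_index=True, right_on='external_gene_name')
bdata = bdata[:, merged.external_gene_name].copy()  # copy so the subset is not a view
bdata.var_names = merged.hsapiens_homolog_associated_gene_name.values
bdata.write_h5ad('adata_for_cellphonedb.h5ad', compression='lzf')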

Meta data

Code language: Python
meta_file = pd.DataFrame({'Cell':bdata.obs.index,'cell_type':bdata.obs.cell_type})
meta_file.to_csv("meta_file.csv",index=False)
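
Before running the analysis, it may be worth checking that the barcodes in the meta file and in the count matrix agree. The sketch below is a plain-Python check on two collections of barcodes; the helper name `compare_barcodes` and the toy barcodes are hypothetical.

```python
def compare_barcodes(meta_barcodes, count_barcodes):
    """Report barcodes that appear in only one of the two inputs."""
    meta, counts = set(meta_barcodes), set(count_barcodes)
    return {
        "missing_from_counts": sorted(meta - counts),
        "missing_from_meta": sorted(counts - meta),
    }


# Toy barcodes for illustration; in the workflow above one could pass
# meta_file['Cell'] and bdata.obs_names instead.
report = compare_barcodes(["c1", "c2", "c3"], ["c2", "c3", "c4"])
print(report)
```

Two empty lists mean every cell in the meta file has a matching column in the counts and vice versa.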
  • Delete variables that are no longer needed
Code language: Python
del adata,meta_file,bdata,merged,m2h

cpdb_statistical_analysis_method

Code language: Python
from cellphonedb.src.core.methods import cpdb_statistical_analysis_method

deconvoluted, means, pvalues, significant_means = cpdb_statistical_analysis_method.call(
    cpdb_file_path = './cellphonedb.zip',            # mandatory: CellPhoneDB database zip file.
    meta_file_path = "./meta_file.csv",              # mandatory: file mapping barcodes to cell labels (CSV here).
    counts_file_path = './adata_for_cellphonedb.h5ad',# mandatory: normalized count matrix.
    counts_data = 'hgnc_symbol',                     # defines the gene annotation in the counts matrix.
    microenvs_file_path = None,                      # optional (default: None): defines cells per microenvironment.
    iterations = 1000,                               # number of shufflings performed in the analysis.
    threshold = 0.1,                                 # minimum fraction of cells expressing a gene for it to be used in the analysis (0.1 = 10%).
    threads = 8,                                     # number of threads to use in the analysis.
    debug_seed = 42,                                 # debug random seed; a value >= 0 enables it, a negative value disables it.
    result_precision = 3,                            # sets the rounding for the mean values in significant_means.
    pvalue = 0.05,                                   # p-value threshold to employ for significance.
    subsampling = False,                             # enables subsampling of the data (geometric sketching).
    subsampling_log = False,                         # (mandatory) enable subsampling log1p for non log-transformed data inputs.
    subsampling_num_pc = 100,                        # number of components to subsample via geometric sketching (default: 100).
    subsampling_num_cells = 1000,                    # number of cells to subsample (integer) (default: 1/3 of the dataset).
    separator = '|',                                 # string used to separate cell names in the results dataframes, e.g. "cellA|CellB".
    debug = False,                                   # saves all intermediate tables employed during the analysis in pkl format.
    output_path = './',                              # path to save results.
    output_suffix = None                             # replaces the timestamp in the output file names with a user-defined string (default: None).
    )
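
To give a feel for what `iterations` (the number of shufflings) controls, here is a heavily simplified, standard-library sketch of a label-permutation test for one ligand-receptor pair between two clusters. It is not CellPhoneDB's implementation: the function names, the toy expression values, and the omission of steps such as the expression `threshold` are all assumptions made for illustration.

```python
import random
from statistics import mean


def pair_mean(ligand, receptor, labels, sender, receiver):
    """Average of the ligand mean in `sender` cells and the receptor mean in `receiver` cells."""
    lig = mean(x for x, lab in zip(ligand, labels) if lab == sender)
    rec = mean(x for x, lab in zip(receptor, labels) if lab == receiver)
    return (lig + rec) / 2


def permutation_pvalue(ligand, receptor, labels, sender, receiver,
                       iterations=1000, seed=42):
    """Fraction of label shufflings whose pair mean is at least the observed one."""
    rng = random.Random(seed)
    observed = pair_mean(ligand, receptor, labels, sender, receiver)
    shuffled = list(labels)
    hits = 0
    for _ in range(iterations):
        rng.shuffle(shuffled)  # break the link between cells and cluster labels
        if pair_mean(ligand, receptor, shuffled, sender, receiver) >= observed:
            hits += 1
    return observed, hits / iterations


# Toy data: cluster A expresses the ligand, cluster B expresses the receptor.
labels = ["A"] * 6 + ["B"] * 6
ligand = [5, 6, 5, 6, 5, 6] + [0] * 6
receptor = [0] * 6 + [4, 5, 4, 5, 4, 5]
observed, pvalue = permutation_pvalue(ligand, receptor, labels, "A", "B")
print(observed, pvalue)
```

More shufflings give a finer-grained p-value at the cost of run time, which is why the parameter matters for large datasets.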

Output directory

├── adata.h5ad
├── meta_file.csv
├── statistical_analysis_deconvoluted_03_30_2023_11:11:04.txt
├── statistical_analysis_means_03_30_2023_11:11:04.txt
├── statistical_analysis_pvalues_03_30_2023_11:11:04.txt
└── statistical_analysis_significant_means_03_30_2023_11:11:04.txt
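
The p-value table has one row per interacting pair and one column per pair of cell types joined by the separator. As a rough sketch of how such a table might be reshaped and filtered, the example below uses a small made-up DataFrame in place of the real output; the helper name `significant_pairs` is hypothetical, and the real table carries additional annotation columns that are left out here.

```python
import pandas as pd


def significant_pairs(pvalues, id_col="interacting_pair", alpha=0.05, separator="|"):
    """Reshape a wide p-value table to long format and keep rows with p < alpha."""
    # Cell-pair columns are recognised by the separator in their name.
    pair_cols = [c for c in pvalues.columns if separator in str(c)]
    long = pvalues.melt(id_vars=[id_col], value_vars=pair_cols,
                        var_name="cell_pair", value_name="pvalue")
    return long[long["pvalue"] < alpha].sort_values("pvalue").reset_index(drop=True)


# Toy table for illustration only; with real results one could pass the
# `pvalues` DataFrame returned by the call above.
toy_pvalues = pd.DataFrame({
    "interacting_pair": ["L1_R1", "L2_R2"],
    "T|B": [0.001, 0.8],
    "B|T": [0.2, 0.03],
})
sig = significant_pairs(toy_pvalues)
print(sig)
```

The `alpha` default mirrors the `pvalue = 0.05` threshold used in the call.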

For details on the output files and visualization, see:

https://mp.weixin.qq.com/s/FG8oQJEoM1BRclcaSFC3mw

Reference

https://cellphonedb.readthedocs.io/en/latest/RESULTS-DOCUMENTATION.html

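Original-work statement: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.

In case of infringement, please contact cloudcommunity@tencent.com for removal.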
