
One Pitfall per Step: Converting Single-Cell h5ad Data into an R-Readable Object

生信技能树
Published 2025-02-20 12:19:44
Collected in the column: 生信技能树

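A student in our 生信技能树 marathon teaching group was analyzing single-cell data. The dataset came as an h5ad file, but he did not know Python and was getting quite anxious:

The error:

Data: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE222427

A Bumpy Prelude

At first I thought this would be simple, so I grabbed the first post I found: "Reading single-cell files in h5ad format" (读取h5ad格式的单细胞文件).

The very first step throws an error: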

library(SeuratDisk)
Error: package or namespace load failed for ‘SeuratDisk’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
 there is no package called ‘hdf5r’

Whatever is missing, install it:

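## Use the Westlake University Bioconductor and CRAN mirrors
options(BioC_mirror="https://mirrors.westlake.edu.cn/bioconductor")
options("repos"=c(CRAN="https://mirrors.westlake.edu.cn/CRAN/"))

# Install hdf5r from GitHub (it is also on CRAN: install.packages("hdf5r"))
devtools::install_github("hhoeflin/hdf5r")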

Then try again:

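# Reading h5ad
# Install the GitHub package mojaveazure/seurat-disk yourself:
# remotes::install_github("mojaveazure/seurat-disk")
library(SeuratDisk)
library(patchwork)
#~~~~~ Start reading the data ~~~~~
## h5ad is the AnnData file format used by Python's Scanpy, so it has to be converted first
#~~~~ Read adipose ~~~~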

Convert('./GSE222427/GSM6923183_MC_scRNA.h5ad', "h5seurat", overwrite = TRUE,assay = "RNA")

The progress log:

Warning: Unknown file type: h5ad
Creating h5Seurat file for version 3.1.5.9900
Adding X as data
Adding X as counts
Adding meta.features from var
Adding X_pca as cell embeddings for pca
Adding X_pca_harmony as cell embeddings for pca_harmony
Adding X_umap as cell embeddings for umap
Adding miscellaneous information for pca
Adding standard deviations for pca
Adding miscellaneous information for umap
Adding Major_cluster_colors to miscellaneous data
Adding Phenotype_colors to miscellaneous data
Adding donor_id_colors to miscellaneous data
Adding hvg to miscellaneous data
Adding leiden to miscellaneous data
Adding leiden_colors to miscellaneous data
Adding log1p to miscellaneous data
Adding majority_voting_colors to miscellaneous data
Adding predicted_labels_NC2022_colors to miscellaneous data
Adding predicted_labels_colors to miscellaneous data
Adding rank_genes_groups to miscellaneous data
Adding layer ambiguous as data in assay ambiguous
Adding layer ambiguous as counts in assay ambiguous
Adding layer counts as data in assay counts
Adding layer counts as counts in assay counts
Adding layer matrix as data in assay matrix
Adding layer matrix as counts in assay matrix
Adding layer spliced as data in assay spliced
Adding layer spliced as counts in assay spliced
Adding layer unspliced as data in assay unspliced
Adding layer unspliced as counts in assay unspliced
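
The log shows that, besides X, the file carries five layers (ambiguous, counts, matrix, spliced, unspliced), and Convert writes each of them out as its own assay. For readers who do have a Python environment at hand, the following is a minimal sketch for listing these components before converting. It assumes the `anndata` package is installed; `summarize_adata` and `summarize_h5ad` are hypothetical helper names, not part of any library.

```python
def summarize_adata(adata):
    """Describe the main components of an AnnData-like object.

    Only attributes that AnnData objects expose are used:
    .shape (cells, genes) and the mapping-like .layers, .obsm and .uns.
    """
    return {
        "n_obs": adata.shape[0],                 # number of cells
        "n_vars": adata.shape[1],                # number of genes
        "layers": sorted(adata.layers.keys()),   # extra matrices next to X
        "obsm": sorted(adata.obsm.keys()),       # embeddings such as X_pca, X_umap
        "uns": sorted(adata.uns.keys()),         # unstructured extras
    }


def summarize_h5ad(path):
    """Open an .h5ad file in backed (on-disk) mode and summarize it."""
    import anndata as ad  # imported lazily; assumes anndata is installed

    adata = ad.read_h5ad(path, backed="r")
    try:
        return summarize_adata(adata)
    finally:
        adata.file.close()


# Example call (uses the file path from this article):
# for key, value in summarize_h5ad("./GSE222427/GSM6923183_MC_scRNA.h5ad").items():
#     print(key, value)
```

Backed mode is used here so that the expression matrix is not loaded into memory just to look at the key names.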

A new file now appears in the directory: GSM6923183_MC_scRNA.h5seurat

Now load it:

sce.all <- LoadH5Seurat("./GSE222427/GSM6923183_MC_scRNA.h5seurat")

Truly one error per step:

Validating h5Seurat file
Error: Ambiguous assays

A quick search turned up this post: https://zhuanlan.zhihu.com/p/12861008987#:~:text=%E4%BD%BF%E7%94%A8%20SeuratDisk%20%E5%8C%85%E4%B8%AD%E7%9A%84%20LoadH5Seurat%20%E5%87%BD%E6%95%B0%E6%97%B6%E6%8A%A5%E9%94%99%EF%BC%9A,%E8%A7%A3%E5%86%B3%E5%8A%9E%E6%B3%95%EF%BC%9A%20overwrite%20%3D%20TRUE%2Cassay%20%3D%20%22RNA%22%29

The follow-up error described there is exactly the one I ran into next:

Validating h5Seurat file
Initializing RNA with data
Adding counts for RNA
Adding feature-level metadata for RNA
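Error: Missing required datasets 'levels' and 'values'

The fix from the link above did not work, though. On GitHub this problem turns out to be hotly discussed (https://github.com/mojaveazure/seurat-disk/issues/109): many people report the same error, yet nobody has posted a solution. Let's try the approach mentioned at the end of that page: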

library("Seurat")
library("anndata")
print("Convert from Scanpy to Seurat...")
data <- read_h5ad("example.hd5ad")
data <- CreateSeuratObject(counts = t(data$X), meta.data = data$obs)
print(str(data))

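Still an error! A beginner might well have broken down by this point.

The Ultimate Move

Yet another search, yet another method

I then found one more approach, written up by an outstanding 生信技能树 trainee: "Converting between single-cell Seurat and h5ad formats (R and Python): methods studied and collected" (单细胞Seruat和h5ad数据格式互换(R与python)方法学习和整理).

It requires a Python conda environment named sc, and the Seurat installed in the R environment must be V4: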

# Alternative method: R code
# install.packages("anndata")
library(sceasy)
library(reticulate)
library(Seurat)
# v4
packageVersion("Seurat")
# [1] ‘4.4.0’
library(BiocParallel)
register(MulticoreParam(workers = 4, progressbar = TRUE)) 
use_condaenv(condaenv = '/nas2/zhangj/biosoft/miniconda3/envs/sc')
# Convert h5ad to Seurat
sceasy::convertFormat(obj = "GSE222427/GSM6923183_MC_scRNA.h5ad", from="anndata",to="seurat",outFile = 'scRNA.rds')

# 
# X -> counts
# An object of class Seurat
# 20320 features across 53748 samples within 1 assay
# Active assay: RNA (20320 features, 0 variable features)
# 2 layers present: counts, data
# 3 dimensional reductions calculated: pca, pca_harmony, umap
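
sceasy reports `X -> counts`, meaning that whatever is stored in `X` of the h5ad ends up in the Seurat counts slot. The earlier Convert log lists a `log1p` entry in `uns` and a separate `counts` layer, which hints (an inference from that log, not something verified here) that `X` may hold normalized values rather than raw counts. Below is a minimal Python sketch of a sanity check; `looks_like_counts`, `nonzero_values` and `pick_count_matrix` are hypothetical helper names, and falling back to `layers["counts"]` is an assumption based on that log.

```python
import math


def looks_like_counts(values, max_checked=100_000):
    """Heuristic: raw counts are finite, non-negative whole numbers.

    `values` is any iterable of numbers, e.g. the `.data` array of a
    scipy sparse matrix. Only the first `max_checked` values are inspected.
    """
    for i, v in enumerate(values):
        if i >= max_checked:
            break
        v = float(v)
        if not math.isfinite(v) or v < 0 or v != math.floor(v):
            return False
    return True


def nonzero_values(matrix):
    """Return an iterable over the values of a sparse, dense or nested-list matrix."""
    if hasattr(matrix, "nnz") and hasattr(matrix, "data"):  # scipy.sparse matrix
        return matrix.data
    if hasattr(matrix, "ravel"):                            # numpy array
        return matrix.ravel()
    return [v for row in matrix for v in row]               # plain nested lists


def pick_count_matrix(adata, layer="counts"):
    """Return (matrix, source), preferring X when it already looks like counts."""
    if looks_like_counts(nonzero_values(adata.X)):
        return adata.X, "X"
    if layer in adata.layers and looks_like_counts(nonzero_values(adata.layers[layer])):
        return adata.layers[layer], f"layers[{layer!r}]"
    raise ValueError("no integer-valued matrix found in X or the requested layer")
```

If the check points to a layer rather than `X`, that layer is the one to use as counts when building the Seurat object; otherwise the values placed in the counts slot should be treated as already normalized.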

Load it and take a look:

Seurat in this R environment is V4.

sce.all <- readRDS("scRNA.rds")
sce.all

# Inspect the object
as.data.frame(sce.all@assays$RNA$counts[1:10, 1:2])
head(sce.all@meta.data, 10)
colnames(sce.all@meta.data)
table(sce.all$donor_id) 
table(sce.all$batch) 
sce.all$orig.ident <- sce.all$donor_id

Success:

An object of class Seurat 
20320 features across 53748 samples within 1 assay 
Active assay: RNA (20320 features, 0 variable features)
 2 layers present: counts, data
 3 dimensional reductions calculated: pca, pca_harmony, umap

Converting to a v5 Object

Now load the file in an R environment that has Seurat v5:

# Convert to v5
packageVersion("Seurat")
# [1] ‘5.1.0’
sce.all <- readRDS("scRNA.rds")
sce.all

sce.all_v5 <- CreateSeuratObject(counts = sce.all@assays$RNA@counts, meta.data = sce.all@meta.data)
sce.all_v5
sce.all_v5$orig.ident <- sce.all_v5$donor_id
head(sce.all_v5@meta.data)

library(qs)
qsave(sce.all_v5, file="GSE222427/sce.all_v5.qs")

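Going Further: What If You Use Python?

Then it becomes very simple: first export the main components of the AnnData object from within Python.

Create the file with touch h5ad2rds.ipynb, then open h5ad2rds.ipynb in VS Code.

Run it step by step: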

# Export the data from Python
import scipy.sparse as sparse
import scipy.io as sio
import scipy.stats as stats
import numpy as np
import scanpy as sc
import os

Read the data:

all_data = sc.read_h5ad("./GSE222427/GSM6923183_MC_scRNA.h5ad")
all_data

all_data.var.head()
cellinfo = all_data.obs   # per-cell metadata
geneinfo = all_data.var   # per-gene metadata
mtx=all_data.X            # expression matrix, cells x genes

# Save
cellinfo.to_csv("cellinfo.csv")
geneinfo.to_csv("geneinfo.csv")
sio.mmwrite("sparse_matrix.mtx",mtx)
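
Before moving over to R, it can be worth confirming that the three exported files agree with one another: AnnData stores cells as rows and genes as columns, so the matrix should have as many rows as cellinfo.csv has records and as many columns as geneinfo.csv has records. The sketch below uses only the Python standard library; `mtx_shape`, `csv_record_count` and `check_export` are hypothetical helper names, and the file names are the ones written above.

```python
import csv


def mtx_shape(path):
    """Read (n_rows, n_cols) from the size line of a Matrix Market file."""
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            # Skip the %%MatrixMarket banner, % comments and blank lines
            if line.startswith("%") or not line.strip():
                continue
            fields = line.split()
            return int(fields[0]), int(fields[1])
    raise ValueError(f"no size line found in {path}")


def csv_record_count(path):
    """Count data records in a CSV file, excluding the header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header
        return sum(1 for _ in reader)


def check_export(mtx_path, cell_csv, gene_csv):
    """Compare matrix dimensions with the number of cells and genes exported."""
    n_rows, n_cols = mtx_shape(mtx_path)
    n_cells = csv_record_count(cell_csv)
    n_genes = csv_record_count(gene_csv)
    return {
        "matrix": (n_rows, n_cols),
        "cells": n_cells,
        "genes": n_genes,
        "consistent": (n_rows, n_cols) == (n_cells, n_genes),
    }


# Example call with the files written above:
# print(check_export("sparse_matrix.mtx", "cellinfo.csv", "geneinfo.csv"))
```

A `consistent` value of False at this stage usually means the row and column names assigned later in R would not line up with the matrix.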

Read the files into R and create a Seurat object:

library(Seurat)
library(clustree)
library(cowplot)
library(data.table)
library(ggplot2)
library(patchwork)
library(stringr)
library(qs)
library(Matrix)

mtx <- readMM( "sparse_matrix.mtx" )
mtx[1:4,1:4]
dim(mtx)

cl <- fread( "cellinfo.csv", header = T,data.table = F ) 
head(cl)
dim(cl)

rl <- fread( "geneinfo.csv", header = T,data.table = F ) 
head(rl)
dim(rl)

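# The exported matrix is cells x genes, as in AnnData
rownames(mtx) <- cl$V1   # cell barcodes: first column of cellinfo.csv
colnames(mtx) <- rl$V1   # gene names: first column of geneinfo.csv
mtx[1:4,1:4]
# Seurat expects genes x cells, so transpose
mtx <- t(mtx)
dim(mtx)

meta <- cl
rownames(meta) <- cl$V1
# Should return TRUE: metadata rows match the matrix columns
identical(rownames(meta),colnames(mtx))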

library(Seurat)
sce.all <- CreateSeuratObject(counts = mtx,  meta.data = meta, min.cells = 5)
as.data.frame(sce.all@assays$RNA$counts[1:10, 1:2])
head(sce.all@meta.data, 10)

Perfect!

Originally published 2025-02-20 on the 生信技能树 WeChat public account.