When you don't know which 10x single-cell RNA-seq sample each fastq file belongs to

生信技能树
Published 2022-03-03 13:08:29
Collected in the column: 生信技能树

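We have shared notes on the cellranger pipeline many times on 单细胞天地, and you can look them up there.

The pipeline takes 10x single-cell transcriptome fastq files as input, and those files must follow a naming convention.

If a sample is spread across multiple libraries and flowcells, a single sample can end up with 84 fastq files. I happened to come across exactly this in the data from a paper published in Nature Communications in March 2021, titled "Time-resolved single-cell analysis of Brca1 associated mammary tumourigenesis reveals aberrant differentiation of luminal progenitors" (https://www.nature.com/articles/s41467-021-21783-3).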

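One sample in that dataset, SIGAA11, has no fewer than 84 fastq files. Look closely at the 84 file names and a pattern emerges. Splitting each name on underscores:

  • Field 2 takes 4 values, S37 to S40
  • Field 3 takes 7 values, L003 to L009
  • Field 4 takes 3 values: R1, R2 and I1

That gives 4 x 7 x 3 = 84 fastq files in total.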
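The counting above can be checked with a short R sketch. The file names here are not copied from the dataset; they are reconstructed from the pattern described, assuming the usual Illumina layout `<sample>_S<n>_L<lane>_<read>_001.fastq.gz`, so treat them as hypothetical.

```r
# Hypothetical file names rebuilt from the described pattern
# (assumed layout: <sample>_S<n>_L<lane>_<read>_001.fastq.gz)
grid <- expand.grid(read = c("R1", "R2", "I1"),
                    lane = sprintf("L%03d", 3:9),
                    s    = paste0("S", 37:40),
                    stringsAsFactors = FALSE)
fastq_names <- sprintf("SIGAA11_%s_%s_%s_001.fastq.gz",
                       grid$s, grid$lane, grid$read)

# Split every name on underscores and count the distinct values per field
fields <- do.call(rbind, strsplit(fastq_names, "_", fixed = TRUE))
colnames(fields) <- c("sample", "s_index", "lane", "read", "suffix")
n_per_field <- apply(fields, 2, function(x) length(unique(x)))

print(n_per_field)
print(length(fastq_names))
```

On real data, `fastq_names` would come from `list.files()` on the fastq folder instead of being generated.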

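Of course, not every 10x sample has 84 fastq files. In the vast majority of cases there are just three files, or two, and either is enough to run the cellranger pipeline we shared earlier.

The special case I want to introduce next has 44 fastq files (44 SRA runs, each with a read 1 / read 2 pair) that cannot be matched to 10x samples: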

SRR15860129 week 11 MMTV-PyMT Week11-1
SRR15860128 week 11 MMTV-PyMT Week11-2
SRR15860127 week 11 MMTV-PyMT Week11-3
SRR15860126 week 11 MMTV-PyMT Week11-4
SRR15860125 week 17 MMTV-PyMT Week17-1
SRR15860124 week 17 MMTV-PyMT Week17-2
SRR15860123 week 17 MMTV-PyMT Week17-3
SRR15860122 week 17 MMTV-PyMT Week17-4
SRR15860120 week 17 MMTV-PyMT Week17-5
SRR15860119 week 17 MMTV-PyMT Week17-6
SRR15860118 week 17 MMTV-PyMT Week17-7
SRR15860117 week 17 MMTV-PyMT Week17-8
SRR15860155 week 7 MMTV-PyMT Week7-1
SRR15860154 week 7 MMTV-PyMT Week7-2
SRR15860143 week 7 MMTV-PyMT Week7-3
SRR15860132 week 7 MMTV-PyMT Week7-4
SRR15860121 week 9 MMTV-PyMT Week9-1
SRR15860150 week 9 MMTV-PyMT Week9-10
SRR15860149 week 9 MMTV-PyMT Week9-11
SRR15860148 week 9 MMTV-PyMT Week9-12
SRR15860147 week 9 MMTV-PyMT Week9-13
SRR15860146 week 9 MMTV-PyMT Week9-14
SRR15860145 week 9 MMTV-PyMT Week9-15
SRR15860144 week 9 MMTV-PyMT Week9-16
SRR15860142 week 9 MMTV-PyMT Week9-17
SRR15860141 week 9 MMTV-PyMT Week9-18
SRR15860140 week 9 MMTV-PyMT Week9-19
SRR15860116 week 9 MMTV-PyMT Week9-2
SRR15860139 week 9 MMTV-PyMT Week9-20
SRR15860138 week 9 MMTV-PyMT Week9-21
SRR15860137 week 9 MMTV-PyMT Week9-22
SRR15860136 week 9 MMTV-PyMT Week9-23
SRR15860135 week 9 MMTV-PyMT Week9-24
SRR15860134 week 9 MMTV-PyMT Week9-25
SRR15860133 week 9 MMTV-PyMT Week9-26
SRR15860131 week 9 MMTV-PyMT Week9-27
SRR15860130 week 9 MMTV-PyMT Week9-28
SRR15860115 week 9 MMTV-PyMT Week9-3
SRR15860114 week 9 MMTV-PyMT Week9-4
SRR15860113 week 9 MMTV-PyMT Week9-5
SRR15860112 week 9 MMTV-PyMT Week9-6
SRR15860153 week 9 MMTV-PyMT Week9-7
SRR15860152 week 9 MMTV-PyMT Week9-8
SRR15860151 week 9 MMTV-PyMT Week9-9

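Going by the second column (age), these 44 fastq files would appear to belong to 4 10x samples, so four cellranger runs should be enough. But if you rename the files on that basis and run cellranger, it fails with an error.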
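As a quick check of what the metadata alone suggests, the sketch below rebuilds the run-to-age table from the listing above and counts runs per age. The column names `srr` and `age` are my own labels, not part of the original table.

```r
# Rebuild the run / age table from the listing above
run_num <- 112:155
age <- rep("week 9", length(run_num))
age[run_num %in% 126:129] <- "week 11"
age[run_num %in% c(117:120, 122:125)] <- "week 17"
age[run_num %in% c(132, 143, 154, 155)] <- "week 7"

meta <- data.frame(srr = paste0("SRR15860", run_num),
                   age = age,
                   stringsAsFactors = FALSE)

# Number of runs per age group
runs_per_age <- table(meta$age)
print(runs_per_age)
```

The counts are uneven across the four age groups, which is one hint that an age group may hold more than one 10x library.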

To solve this, I first ran cellranger on each of the 44 fastq files independently, then batch-read the results with the following code:

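rm(list = ls())
library(Seurat)  # provides Read10X() and CreateSeuratObject()
dir <- '/home/data/jmzeng/scRNA/mice-4-stage/tmp/matrix'
# This folder holds one matrix folder per fastq file,
# each produced by an independent cellranger run
samples <- list.files(dir)
samples
sceList <- lapply(samples, function(pro) {
  # pro <- samples[1]
  folder <- file.path(dir, pro)
  print(pro)
  print(folder)
  print(list.files(folder))
  sce <- CreateSeuratObject(counts = Read10X(folder),
                            project = pro,
                            min.cells = 5,
                            min.features = 300)
  return(sce)
})
# lapply() over an unnamed vector returns an unnamed list, so set names explicitly
names(sceList) <- samples
names(sceList)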

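If some of these 44 fastq files come from the same 10x sample, the cell barcodes that cellranger reports for them independently should overlap heavily, while files from different samples should share very few. So I computed the overlap for every pair.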

# Compare every pair of runs: count the shared cell barcodes and divide
# by the sum of both barcode counts (two identical runs give 0.5)
out <- do.call(rbind,
  lapply(seq_along(sceList), function(i) {
    do.call(rbind,
      lapply(seq_along(sceList), function(j) {
        sample1 <- sceList[[i]]@meta.data
        sample2 <- sceList[[j]]@meta.data
        sample1_barcodes <- nrow(sample1)
        sample2_barcodes <- nrow(sample2)
        # The row names of meta.data are the cell barcodes
        both_barcodes <- length(intersect(rownames(sample1), rownames(sample2)))
        rate <- both_barcodes / (sample1_barcodes + sample2_barcodes)
        c(sample1_name = as.character(unique(sample1$orig.ident)),
          sample1_barcodes = sample1_barcodes,
          sample2_name = as.character(unique(sample2$orig.ident)),
          sample2_barcodes = sample2_barcodes,
          rate = rate)
      }))
  }))
save(out, file = 'out.Rdata')

Pairwise comparison of the single-cell matrices from the 44 fastq files:

> head(out)
     sample1_name  sample1_barcodes sample2_name  sample2_barcodes rate                  
[1,] "SRR15860112" "2197"           "SRR15860112" "2197"           "0.5"                 
[2,] "SRR15860112" "2197"           "SRR15860113" "1923"           "0.465776699029126"   
[3,] "SRR15860112" "2197"           "SRR15860114" "2746"           "0.00101153145862836" 
[4,] "SRR15860112" "2197"           "SRR15860115" "2768"           "0.00100704934541793" 
[5,] "SRR15860112" "2197"           "SRR15860116" "2811"           "0.000998402555910543"
[6,] "SRR15860112" "2197"           "SRR15860117" "5100"           "0.00315198026586268"

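If a pair of results shares a high proportion of cell barcodes, the two come from the same sample. Because the rate divides the shared count by the sum of both barcode counts, two identical barcode sets score 0.5, which is the maximum. In the output above, SRR15860112 and SRR15860113 score about 0.466 (1919 shared barcodes out of 2197 and 1923), so they are the same sample, whereas SRR15860112 and SRR15860114 share only 5 barcodes.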
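To make the rate concrete, here is a small sketch on made-up barcodes. The barcode strings and run names are hypothetical. The `jaccard()` helper is not what the article uses; it is shown only as a common alternative whose maximum is 1 rather than 0.5.

```r
# Rate as defined in the article: shared barcodes over the sum of both counts
overlap_rate <- function(a, b) {
  length(intersect(a, b)) / (length(a) + length(b))
}

# Alternative (not used in the article): Jaccard index, shared over union
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Hypothetical barcodes for three runs
pool  <- paste0("BC", 1:1000)
run_a <- pool[1:400]
run_b <- pool[21:420]    # shares 380 barcodes with run_a
run_c <- pool[601:1000]  # shares none with run_a

print(overlap_rate(run_a, run_a))  # identical sets
print(overlap_rate(run_a, run_b))  # same library, nearly identical cells
print(overlap_rate(run_a, run_c))  # different library
print(jaccard(run_a, run_b))
```

With either measure, runs from the same library sit close to the maximum and runs from different libraries sit close to zero, which is what makes the grouping easy to read off.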

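library(reshape2)  # dcast()
library(pheatmap)
# Drop the two barcode-count columns; keep sample1_name, sample2_name, rate
outs <- as.data.frame(out[, -c(2, 4)])
# Long to wide: one row per sample1, one column per sample2
out_wide <- dcast(outs, sample1_name ~ sample2_name, value.var = 'rate')
name <- out_wide[, 1]
out_wide <- out_wide[, -1]
out_wide <- apply(out_wide, 2, as.numeric)
rownames(out_wide) <- name

# The plotting call is not shown in the original; presumably something like:
pheatmap(out_wide)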

Heatmap (figure not preserved): some of the samples share a very high proportion of cell barcodes.

The 44 fastq files turn out to belong to 7 samples, so I wrote out a file list like this:

head(tmp)
                    srr hc     age
SRR15860126 SRR15860126  5 week 11
SRR15860127 SRR15860127  5 week 11
SRR15860128 SRR15860128  5 week 11
SRR15860129 SRR15860129  5 week 11
SRR15860117 SRR15860117  3 week 17
SRR15860118 SRR15860118  3 week 17
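The article does not show how the `hc` column was produced. One plausible route, sketched here purely as an assumption, is hierarchical clustering of the overlap matrix followed by cutting the tree. The matrix below is a toy with six hypothetical runs, and the cut height of 0.25 is an arbitrary choice that works because the within-group and between-group rates are far apart.

```r
# Toy overlap matrix: six hypothetical runs forming three libraries
runs <- paste0("run", 1:6)
rate <- matrix(0.001, nrow = 6, ncol = 6, dimnames = list(runs, runs))
for (g in list(1:2, 3:4, 5:6)) {
  rate[g, g] <- 0.47  # runs from the same library share most barcodes
}
diag(rate) <- 0.5     # a run compared with itself

# Turn similarity into distance (0.5 is the maximum rate), then cluster
d    <- as.dist(0.5 - rate)
tree <- hclust(d, method = "average")
hc   <- cutree(tree, h = 0.25)  # assumed threshold

tmp_toy <- data.frame(srr = names(hc), hc = unname(hc))
print(tmp_toy)
```

On the real data the same steps would start from `out_wide`, and the number of distinct `hc` values should be checked against the blocks visible in the heatmap.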

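Then rename the files accordingly. Each line below gives the new cellranger-style name and the original file it points to: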

1_S1_L001_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860146_1.fastq.gz
1_S1_L001_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860146_2.fastq.gz
1_S1_L002_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860147_1.fastq.gz
1_S1_L002_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860147_2.fastq.gz
1_S1_L003_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860148_1.fastq.gz
1_S1_L003_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860148_2.fastq.gz
1_S1_L004_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860149_1.fastq.gz
1_S1_L004_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860149_2.fastq.gz
1_S1_L005_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860150_1.fastq.gz
1_S1_L005_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860150_2.fastq.gz
1_S1_L006_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860151_1.fastq.gz
1_S1_L006_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860151_2.fastq.gz
1_S1_L007_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860152_1.fastq.gz
1_S1_L007_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860152_2.fastq.gz
1_S1_L008_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860153_1.fastq.gz
1_S1_L008_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860153_2.fastq.gz
1_S2_L001_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860112_1.fastq.gz
1_S2_L001_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860112_2.fastq.gz
1_S2_L002_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860113_1.fastq.gz
1_S2_L002_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860113_2.fastq.gz
1_S2_L003_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860144_1.fastq.gz
1_S2_L003_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860144_2.fastq.gz
1_S2_L004_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860145_1.fastq.gz
1_S2_L004_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860145_2.fastq.gz

2_S1_L001_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860114_1.fastq.gz
2_S1_L001_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860114_2.fastq.gz
2_S1_L002_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860115_1.fastq.gz
2_S1_L002_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860115_2.fastq.gz
2_S1_L003_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860116_1.fastq.gz
2_S1_L003_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860116_2.fastq.gz
2_S1_L004_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860121_1.fastq.gz
2_S1_L004_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860121_2.fastq.gz

3_S1_L001_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860117_1.fastq.gz
3_S1_L001_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860117_2.fastq.gz
3_S1_L002_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860118_1.fastq.gz
3_S1_L002_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860118_2.fastq.gz
3_S1_L003_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860119_1.fastq.gz
3_S1_L003_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860119_2.fastq.gz
3_S1_L004_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860120_1.fastq.gz
3_S1_L004_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860120_2.fastq.gz

4_S1_L001_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860122_1.fastq.gz
4_S1_L001_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860122_2.fastq.gz
4_S1_L002_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860123_1.fastq.gz
4_S1_L002_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860123_2.fastq.gz
4_S1_L003_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860124_1.fastq.gz
4_S1_L003_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860124_2.fastq.gz
4_S1_L004_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860125_1.fastq.gz
4_S1_L004_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860125_2.fastq.gz

5_S1_L001_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860126_1.fastq.gz
5_S1_L001_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860126_2.fastq.gz
5_S1_L002_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860127_1.fastq.gz
5_S1_L002_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860127_2.fastq.gz
5_S1_L003_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860128_1.fastq.gz
5_S1_L003_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860128_2.fastq.gz
5_S1_L004_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860129_1.fastq.gz
5_S1_L004_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860129_2.fastq.gz

6_S1_L001_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860130_1.fastq.gz
6_S1_L001_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860130_2.fastq.gz
6_S1_L002_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860131_1.fastq.gz
6_S1_L002_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860131_2.fastq.gz
6_S1_L003_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860133_1.fastq.gz
6_S1_L003_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860133_2.fastq.gz
6_S1_L004_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860134_1.fastq.gz
6_S1_L004_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860134_2.fastq.gz
6_S1_L005_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860135_1.fastq.gz
6_S1_L005_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860135_2.fastq.gz
6_S1_L006_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860136_1.fastq.gz
6_S1_L006_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860136_2.fastq.gz
6_S1_L007_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860137_1.fastq.gz
6_S1_L007_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860137_2.fastq.gz
6_S1_L008_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860138_1.fastq.gz
6_S1_L008_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860138_2.fastq.gz
6_S2_L001_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860139_1.fastq.gz
6_S2_L001_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860139_2.fastq.gz
6_S2_L002_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860140_1.fastq.gz
6_S2_L002_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860140_2.fastq.gz
6_S2_L003_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860141_1.fastq.gz
6_S2_L003_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860141_2.fastq.gz
6_S2_L004_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860142_1.fastq.gz
6_S2_L004_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860142_2.fastq.gz

7_S1_L001_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860132_1.fastq.gz
7_S1_L001_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860132_2.fastq.gz
7_S1_L002_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860143_1.fastq.gz
7_S1_L002_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860143_2.fastq.gz
7_S1_L003_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860154_1.fastq.gz
7_S1_L003_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860154_2.fastq.gz
7_S1_L004_R1_001.fastq.gz -> /home/PRJNA762594/SRR15860155_1.fastq.gz
7_S1_L004_R2_001.fastq.gz -> /home/PRJNA762594/SRR15860155_2.fastq.gz

With the files named like this, the cellranger pipeline can be run.
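A naming scheme like the one in the listing can be generated from the grouping table rather than typed by hand. The sketch below is one way to do it, not the author's script: `make_cellranger_names()` and its `max_lanes` parameter are hypothetical, and it assumes that, as in the listing, each run becomes one lane and a sample with more than 8 runs rolls over from S1 to S2.

```r
# Hypothetical helper: map each run to a cellranger-style name
# <sample>_S<n>_L<lane>_R<read>_001.fastq.gz, one lane per run
make_cellranger_names <- function(srr, sample_id, max_lanes = 8) {
  # Position of each run within its sample, in input order
  idx     <- ave(seq_along(srr), sample_id, FUN = seq_along)
  s_index <- (idx - 1) %/% max_lanes + 1
  lane    <- (idx - 1) %% max_lanes + 1
  do.call(rbind, lapply(seq_along(srr), function(i) {
    data.frame(
      link   = sprintf("%s_S%d_L%03d_R%d_001.fastq.gz",
                       sample_id[i], s_index[i], lane[i], 1:2),
      target = sprintf("%s_%d.fastq.gz", srr[i], 1:2),
      stringsAsFactors = FALSE
    )
  }))
}

# The six rows printed by head(tmp) earlier
tmp_head <- data.frame(
  srr = c("SRR15860126", "SRR15860127", "SRR15860128",
          "SRR15860129", "SRR15860117", "SRR15860118"),
  hc  = c(5, 5, 5, 5, 3, 3),
  stringsAsFactors = FALSE
)
links <- make_cellranger_names(tmp_head$srr, tmp_head$hc)
print(links)

# Creating the links would then look like this (fastq_dir is a placeholder):
# file.symlink(file.path(fastq_dir, links$target), links$link)
```

Which run gets which lane number is arbitrary here. The scheme only needs every run of one sample to share the same prefix while the full names stay unique.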

The final result is the dimensionality reduction, clustering and biological annotation of the cell groups (figure not preserved).

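For routine single-cell analysis, see our earlier example, 人人都能学会的单细胞聚类分群注释, where we demonstrated the first level of clustering. If you have no grounding in single-cell data analysis yet, start with our 10 introductory lessons.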

Originally published 2022-02-23 on the 生信技能树 WeChat official account.
