前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >使用velocyto进行bam转loom吐血踩坑记录

使用velocyto进行bam转loom吐血踩坑记录

作者头像
生信技能树jimmy
发布2021-09-15 15:43:03
3.6K1
发布2021-09-15 15:43:03
举报
文章被收录于专栏:单细胞天地单细胞天地

分享是一种态度

报错信息如下:

代码语言:javascript
复制
Traceback (most recent call last):
  File "miniconda3/envs/velocyto/bin/velocyto", line 8, in <module>
    sys.exit(cli())
  File "miniconda3/envs/velocyto/lib/python3.8/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "miniconda3/envs/velocyto/lib/python3.8/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "miniconda3/envs/velocyto/lib/python3.8/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "miniconda3/envs/velocyto/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "miniconda3/envs/velocyto/lib/python3.8/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "miniconda3/envs/velocyto/lib/python3.8/site-packages/velocyto/commands/run10x.py", line 112, in run10x
    return _run(bamfile=(bamfile, ), gtffile=gtffile, bcfile=bcfile, outputfolder=outputfolder,
  File "miniconda3/envs/velocyto/lib/python3.8/site-packages/velocyto/commands/_run.py", line 196, in _run
    mask_ivls_by_chromstrand = exincounter.read_repeats(repmask)
  File "miniconda3/envs/velocyto/lib/python3.8/site-packages/velocyto/counter.py", line 347, in read_repeats
    gtf_lines = sorted(gtf_lines, key=sorting_key)
  File "miniconda3/envs/velocyto/lib/python3.8/site-packages/velocyto/counter.py", line 345, in sorting_key
    return (x[0], x[6] == "+", int(x[3]), entry)  # The last element of the touple corresponds to the `last resort comparison`
IndexError: list index out of range

运行日志如下:

代码语言:javascript
复制
2021-09-03 01:07:06,282 - DEBUG - Using logic: Default
2021-09-03 01:07:06,295 - INFO - Read 13734 cell barcodes from Cell_ranger_count/200739E_GDLC_PWZ/outs/filtered_feature_bc_matrix/barcodes.tsv.gz
2021-09-03 01:07:06,296 - DEBUG - Example of barcode: AAACCCAAGATAGTCA and cell_id: 200739E_GDLC_PWZ:AAACCCAAGATAGTCA-1
2021-09-03 01:07:06,306 - WARNING - Your system does not support calling `grep MemAvailable /proc/meminfo` so the memory effort for the samtools command could not be chosen appropriately. 32Gb will be assumed
2021-09-03 01:07:06,306 - DEBUG - Peeking into Cell_ranger_count/200739E_GDLC_PWZ/outs/possorted_genome_bam.bam
2021-09-03 01:07:06,329 - WARNING - Not found cell and umi barcode in entry 10 of the bam file
2021-09-03 01:07:06,329 - WARNING - Not found cell and umi barcode in entry 63 of the bam file
2021-09-03 01:07:06,329 - WARNING - Not found cell and umi barcode in entry 240 of the bam file
2021-09-03 01:07:06,330 - WARNING - Not found cell and umi barcode in entry 544 of the bam file
2021-09-03 01:07:06,331 - WARNING - Not found cell and umi barcode in entry 853 of the bam file
2021-09-03 01:07:06,331 - WARNING - Not found cell and umi barcode in entry 855 of the bam file
2021-09-03 01:07:06,331 - WARNING - Not found cell and umi barcode in entry 860 of the bam file
2021-09-03 01:07:06,336 - WARNING - The file Cell_ranger_count/200739E_GDLC_PWZ/outs/cellsorted_possorted_genome_bam.bam already exists. The sorting step will be skipped and the existing file will be used.
2021-09-03 01:07:06,336 - INFO - Load the annotation from Cell_Ranger/genome/refdata-gex-GRCh38-2020-A/genes/genes.gtf
2021-09-03 01:07:12,783 - DEBUG - Parsing Chromosome GL000009.2 strand - [line 0]
2021-09-03 01:07:12,784 - DEBUG - Done with GL000009.2- [line 7]
2021-09-03 01:07:12,784 - DEBUG - Assigning indexes to genes
2021-09-03 01:07:12,784 - DEBUG - Seen 1 genes until now
2021-09-03 01:07:12,784 - DEBUG - Parsing Chromosome GL000194.1 strand - [line 8]
2021-09-03 01:07:12,784 - DEBUG - Done with GL000194.1- [line 33]
2021-09-03 01:07:12,784 - DEBUG - Assigning indexes to genes
...
...
2021-09-03 01:07:26,663 - DEBUG - Assigning indexes to genes
2021-09-03 01:07:26,664 - DEBUG - Done with Y+ [line 2765967]
2021-09-03 01:07:26,664 - DEBUG - Fixing corner cases of transcript models containg intron longer than 1000Kbp
2021-09-03 01:07:28,588 - DEBUG - Generated 2411536 features corresponding to 199138 transcript models from Cell_Ranger/genome/refdata-gex-GRCh38-2020-A/genes/genes.gtf
2021-09-03 01:07:28,611 - INFO - Load the repeat masking annotation from /share/nas5/zhangj/project/SingleCell_Analysis_Tools/velocyto/hg38_rmsk.gtf
2021-09-03 01:07:28,611 - DEBUG - Reading hg38_rmsk.gtf, the file will be sorted in memory

安装环境命令

代码语言:javascript
复制
# 创建环境并激活
conda create -y -n velocyto python=3.8
conda activate velocyto
python --version

# 安装依赖包
conda install -y numpy scipy cython numba matplotlib scikit-learn h5py click
conda install samtools

# 更新 setuptools 和 pip
pip install --upgrade setuptools
python -m pip install --upgrade pip
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple pysam


# 安装velocyto
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple velocyto

# 检查是否成功
velocyto --help
Usage: velocyto [OPTIONS] COMMAND [ARGS]...

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  run            Runs the velocity analysis outputting a loom file
  run10x         Runs the velocity analysis for a Chromium Sample
  run-dropest    Runs the velocity analysis on DropEst preprocessed data
  run-smartseq2  Runs the velocity analysis on SmartSeq2 data (independent bam file per cell)
  tools          helper tools for velocyto

# 版本
velocyto, version 0.17.17


# 安装4.1.0版本的R
conda install r-base=4.1.0

# 依赖包
conda install -y bioconductor-pcamethods
conda install -y r-velocyto.r

运行代码

代码语言:javascript
复制
# 先给bam按照CB排序
nohup samtools sort -@ 10  -t CB -O BAM -o cellsorted_possorted_genome_bam.bam possorted_genome_bam.bam &

# 运行
velocyto/bin/velocyto run10x -m hg38_rmsk.gtf   Cell_ranger_count/200739C_GDLC_WXJ Cell_Ranger/genome/refdata-gex-GRCh38-2020-A/genes/genes.gtf

gtf下载自UCSC

gtf的内容

refdata-gex-GRCh38-2020-A/genes/genes.gtf

我的barcode里面的内容

代码语言:javascript
复制
zless -S outs/raw_feature_bc_matrix/features.tsv.gz

我一开始定位到了运行日志中的:

代码语言:javascript
复制
2021-09-03 01:07:06,329 - WARNING - Not found cell and umi barcode in entry 10 of the bam file
2021-09-03 01:07:06,329 - WARNING - Not found cell and umi barcode in entry 63 of the bam file
2021-09-03 01:07:06,329 - WARNING - Not found cell and umi barcode in entry 240 of the bam file
2021-09-03 01:07:06,330 - WARNING - Not found cell and umi barcode in entry 544 of the bam file
2021-09-03 01:07:06,331 - WARNING - Not found cell and umi barcode in entry 853 of the bam file
2021-09-03 01:07:06,331 - WARNING - Not found cell and umi barcode in entry 855 of the bam file
2021-09-03 01:07:06,331 - WARNING - Not found cell and umi barcode in entry 860 of the bam file

然后找了帖子看:https://github.com/velocyto-team/velocyto.py/issues?page=2&q=is%3Aissue+is%3Aopen

把基友网站的所有帖子都扫了一个遍,有几个相似的问题,特别是很多人提到上面这个"Not found cell and umi barcode"的问题。我看了我的数据里面有CB这个东西!!!

然后还问了吴老师他的日志有没有这个东西,他说有的!!!

我再看了我的日志,这个警告只是说我的bam有上面的少数几行没有barcode!!!

还有 index out of range:https://github.com/velocyto-team/velocyto.py/issues/183

仔细对比 他的index out of range跟我的不一样其实报错内容不一样

啊啊啊啊 到这里的时候,感觉要奔溃了。。。

我鬼使神差的想着我的repeat_mask.gtf文件是不是下载错了,然后重新去下载了一次

突然发现,前后两次下载的文件大小不一样

前后差的太多了也!!!一开始没在意,但是注意到了。

然后也突然看到了跟我一摸一样的bug的issue:https://github.com/velocyto-team/velocyto.py/issues/263

再次就是确定,因为这个文件下载不完整,导致出错!!!

完整的人的这个文件大小应该在550M左右!

啊,日常怀疑人生

没想到是这种这么弱智的东西导致我一整个晚上坐立不安辗转反侧

本来都是想好了这个帖子前后梳理一下我的过程,人生第一次要跟曾老大求助的!!!

运行日志开始跳过了repeat_mask.gtf部分往下走了:

代码语言:javascript
复制
2021-09-03 02:42:31,326 - DEBUG - Fixing corner cases of transcript models containg intron longer than 1000Kbp
2021-09-03 02:42:35,990 - DEBUG - Generated 2411536 features corresponding to 199138 transcript models from Cell_Ranger/genome/refdata-gex-GRCh38-2020-A/genes/genes.gtf
2021-09-03 02:42:36,015 - INFO - Load the repeat masking annotation from /share/nas5/zhangj/project/SingleCell_Analysis_Tools/velocyto/GRCh38_rmsk.gtf
2021-09-03 02:42:36,015 - DEBUG - Reading /share/nas5/zhangj/project/SingleCell_Analysis_Tools/velocyto/GRCh38_rmsk.gtf, the file will be sorted in memory
2021-09-03 02:43:17,308 - DEBUG - Processed masked annotation .gtf and generated 4525530 intervals to mask!
2021-09-03 02:43:17,661 - INFO - Scan Cell_ranger_count/200739E_GDLC_PWZ/outs/possorted_genome_bam.bam to validate intron intervals
2021-09-03 02:43:20,862 - DEBUG - Reading Cell_ranger_count/200739E_GDLC_PWZ/outs/possorted_genome_bam.bam
2021-09-03 02:43:20,881 - DEBUG - Read first 0 million reads
2021-09-03 02:43:20,881 - DEBUG - Marking up chromosome 1
2021-09-03 02:45:19,495 - DEBUG - Read first 10 million reads

祝大家开心科研每一天!!!

本文参与 腾讯云自媒体同步曝光计划,分享自微信公众号。
原始发表:2021-09-08,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 单细胞天地 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体同步曝光计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 报错信息如下:
  • 运行日志如下:
  • 安装环境命令
  • 运行代码
  • gtf下载自UCSC
    • gtf的内容
      • 我的barcode里面的内容
      • 我一开始定位到了运行日志中的:
      • 运行日志开始跳过了repeat_mask.gtf部分往下走了:
      领券
      问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档