seqtk: the Swiss Army knife for FASTA/FASTQ files

Author: 阿凡亮
Published 2020-04-14 09:10:52
Column: 生物信息学 (Bioinformatics)

Introduction

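The previous post, 只用一行颠覆你处理文件的方式 ("One line that changes how you process files"), mentioned that seqtk can be used to process FASTA/FASTQ files. This post walks through how to use it.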

About seqtk and installation

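Seqtk is a tool for processing FASTA/FASTQ files, developed by Heng Li (https://github.com/lh3). Because it is lightweight and cross-platform, it has become popular with researchers; at the time of writing the project had 602 stars on GitHub. Installing seqtk is as simple as Heng Li's profile picture: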

git clone https://github.com/lh3/seqtk.git;cd seqtk; make

This builds the seqtk binary in the current directory.

Commonly used seqtk subcommands

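  • seq is the most commonly used subcommand.

> seqtk seq
Usage: seqtk seq [options] <in.fq>|<in.fa>
Options:
  -q INT    mask bases with quality lower than INT, i.e. convert them to lowercase (default 0)
  -X INT    mask bases with quality higher than INT (default 255)
  -n CHAR   convert the bases masked by -q or -X to CHAR instead of lowercase
  -l INT    number of bases per output line, at most 2^32-1
  -Q INT    quality offset used by the sequencing platform (default 33)
  -s INT    random seed (default 11)
  -f FLOAT  randomly sample the given fraction of sequences (default: all sequences)
  -M FILE   mask (lowercase) the regions listed in FILE, a BED file or a list of sequence names
  -L INT    drop sequences shorter than INT
  -c        mask the complement of the selected regions (used together with -M)
  -r        reverse complement
  -A        force FASTA output
  -C        drop comments from header lines
  -N        drop sequences containing ambiguous bases
  -1        output only the odd-numbered reads (1st, 3rd, ...)
  -2        output only the even-numbered reads (2nd, 4th, ...)
  -V        shift quality values by '(-Q) - 33'
  -U        convert all bases to uppercase
  -S        strip whitespace from sequences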

We can fold the long sequence lines of a FASTQ file into several shorter lines (-l), reverse-complement them (-r), and write the result as FASTA (-A):

> head -8 test.fq
@A00679:63:HGVWCDSXX:4:1271:5927:18176
CGTTGAGATGACGCTAGTCGCGTTGTGCCGGCCAAGGCGGCGGCGGCGGTTGAGCCAGAGAGTTAGAGGCGGCTCTGTTGCTGCGGTTTTCGCGACGGAGGCGGCCGTTGTTGCCGTGACTCAAACAACGGGGCGGGCCGCCATTGCTGC
+
FFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00679:63:HGVWCDSXX:4:2461:25970:10614
CGTTGAGATGACGCTAGTCGCGTTGTGCCGGCCTAGGCGGCGGCGGCGGTTGAGCCAGAGAGTTAGAGGCGGCTCTGTTGCTGCGGTTTTCGCGACGGAGGCGGCCGTTGTTGCCGTGACTCAAACAACGGGGCGGGCCGCCATTGCTGC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FF
# Fold the sequences at 60 bp per line
> seqtk seq -l 60 -r -A test.fq | head -8
>A00679:63:HGVWCDSXX:4:1271:5927:18176
GCAGCAATGGCGGCCCGCCCCGTTGTTTGAGTCACGGCAACAACGGCCGCCTCCGTCGCG
AAAACCGCAGCAACAGAGCCGCCTCTAACTCTCTGGCTCAACCGCCGCCGCCGCCTTGGC
CGGCACAACGCGACTAGCGTCATCTCAACG
>A00679:63:HGVWCDSXX:4:2461:25970:10614
GCAGCAATGGCGGCCCGCCCCGTTGTTTGAGTCACGGCAACAACGGCCGCCTCCGTCGCG
AAAACCGCAGCAACAGAGCCGCCTCTAACTCTCTGGCTCAACCGCCGCCGCCGCCTAGGC
CGGCACAACGCGACTAGCGTCATCTCAACG
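To make the three options concrete, here is a minimal Python sketch of the same transformation applied to a single record. It is a conceptual illustration only, not seqtk's implementation; the function names (`reverse_complement`, `fold`, `fastq_to_fasta`) are hypothetical, and only the four standard bases are handled.

```python
# Conceptual sketch of `seqtk seq -r -l WIDTH -A` on one record.
# Not seqtk's code; all names below are made up for illustration.

# Translation table mapping each base to its complement (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse the string (what -r asks for)."""
    return seq.translate(COMPLEMENT)[::-1]


def fold(seq, width):
    """Split a sequence into lines of at most `width` bases (what -l asks for)."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


def fastq_to_fasta(name, seq, width=60):
    """Build one FASTA record: a '>' header, then the folded reverse complement.

    The quality string is simply dropped, which is what FASTA output implies.
    """
    return "\n".join([">" + name] + fold(reverse_complement(seq), width))


# The first 10 bases of the first read in test.fq, folded at 4 bases per line.
print(fastq_to_fasta("read1", "CGTTGAGATG", width=4))
```

The first 10 bases of the read (`CGTTGAGATG`) become the last 10 bases of the reverse complement (`CATCTCAACG`), which is how the seqtk output for that read ends.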

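We can also drop sequences shorter than a given length (-L) and mask bases whose quality is below a threshold (-q). Since -A is not given here, the output stays in FASTQ format: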

# Bases with quality below 20 become lowercase; sequences shorter than 100 bp are not output
> seqtk seq -L 100 -q 20 test.fq | head -12
@A00679:63:HGVWCDSXX:4:1271:5927:18176
CGTTGAGATGACGCTAGTCGCGTTGTGCCGGCCAAGGCGGCGGCGGCGGTTGAGCCAGAGAGTTAGAGGCGGCTCTGTTGCTGCGGTTTTCGCGACGGAGGCGGCCGTTGTTGCCGTGACTCAAACAACGGGGCGGGCCGCCATTGCTGC
+
FFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00679:63:HGVWCDSXX:4:2461:25970:10614
CGTTGAGATGACGCTAGTCGCGTTGTGCCGGCCTAGGCGGCGGCGGCGGTTGAGCCAGaGAGTTAGAGGCGGCTCTGTTGCTGCGGTTTTCGCGACGGAGGCGGCCGTTGTTGCCGTGACTCAAACAACGGGGCGGGCCGCCATTGCTGC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FF
@A00679:63:HGVWCDSXX:4:1625:12680:18912
GCGGtTTTCGCGACGGAGGCGGCcGTTgTTGCCGaGACTCAAACAACggGgCGGGCCgCCATTGCtgCTCACgCTGCCGAGCAGCTTCCGCGATCAAgGATGCaGGCCgCCAtCGACGCCTCCgTaGCCGCtGACCtgGgAGAGgGATGG
+
:FFF,FFF:FFF:FF:FFFFFFF,:F:,FFFFF:,:FF:FFFFFFFF,,F,F:FFFF,FFFFFFF,,F:FFF,FFFFF:FFFFFFFFFFFFFFFFFF,FFFFF,FFFF,FFF,FFFF:FFFFF,F,FFFFF,FFFF,,F,F:FF,FF:FF
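The masking rule can be sketched in a few lines of Python. This is an illustration of the idea under the stated assumptions (Phred+33 encoding, i.e. the default -Q 33), not seqtk's implementation; `mask_low_quality` and `long_enough` are hypothetical names.

```python
# Conceptual sketch of `seqtk seq -q MIN_Q -L MIN_LEN` on one FASTQ record.
# Assumes Phred+33 quality encoding. Not seqtk's code.


def mask_low_quality(seq, qual, min_q=20, offset=33):
    """Lowercase every base whose Phred quality is below `min_q`.

    Each quality character encodes a score as ord(char) - offset,
    so 'F' is Q37, ':' is Q25 and ',' is Q11 with offset 33.
    """
    return "".join(
        base.lower() if ord(q) - offset < min_q else base
        for base, q in zip(seq, qual)
    )


def long_enough(seq, min_len=100):
    """Keep a record only if it is at least `min_len` bases long."""
    return len(seq) >= min_len


# First 10 bases and qualities of the third read shown above.
print(mask_low_quality("GCGGTTTTCG", ":FFF,FFF:F"))
```

Only the base under the `,` (Q11) is lowercased; the bases under `:` (Q25) pass the threshold of 20, which matches the pattern of lowercase letters in the third read of the output.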
  • sample draws a random subset of the reads, here a fraction of 0.4; fixing the random seed (-s) makes the result reproducible.
# Use 10 as the seed and extract 40% of all sequences
> seqtk sample -s 10 test.fq 0.4
@A00679:63:HGVWCDSXX:4:1271:5927:18176
CGTTGAGATGACGCTAGTCGCGTTGTGCCGGCCAAGGCGGCGGCGGCGGTTGAGCCAGAGAGTTAGAGGCGGCTCTGTTGCTGCGGTTTTCGCGACGGAGGCGGCCGTTGTTGCCGTGACTCAAACAACGGGGCGGGCCGCCATTGCTGC
+
FFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00679:63:HGVWCDSXX:4:2461:25970:10614
CGTTGAGATGACGCTAGTCGCGTTGTGCCGGCCTAGGCGGCGGCGGCGGTTGAGCCAGaGAGTTAGAGGCGGCTCTGTTGCTGCGGTTTTCGCGACGGAGGCGGCCGTTGTTGCCGTGACTCAAACAACGGGGCGGGCCGCCATTGCTGC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FF
@A00679:63:HGVWCDSXX:4:1625:12680:18912
GCGGtTTTCGCGACGGAGGCGGCcGTTgTTGCCGaGACTCAAACAACggGgCGGGCCgCCATTGCtgCTCACgCTGCCGAGCAGCTTCCGCGATCAAgGATGCaGGCCgCCAtCGACGCCTCCgTaGCCGCtGACCtgGgAGAGgGATGG
+
:FFF,FFF:FFF:FF:FFFFFFF,:F:,FFFFF:,:FF:FFFFFFFF,,F,F:FFFF,FFFFFFF,,F:FFF,FFFFF:FFFFFFFFFFFFFFFFFF,FFFFF,FFFF,FFF,FFFF:FFFFF,F,FFFFF,FFFF,,F,F:FF,FF:FF
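Why the seed matters can be shown with a small Python sketch of seeded fractional sampling. It uses Python's own random number generator, so it will not select the same reads as seqtk for a given seed; `sample_indices` and `sample_records` are hypothetical names.

```python
# Conceptual sketch of seeded fractional subsampling.
# Uses Python's RNG, so the selection differs from seqtk's for the same seed.
import random


def sample_indices(n_records, fraction, seed):
    """Pick each record index independently with probability `fraction`."""
    rng = random.Random(seed)  # private generator, fully determined by the seed
    return [i for i in range(n_records) if rng.random() < fraction]


def sample_records(records, fraction, seed):
    """Return the records chosen by `sample_indices`, in their original order."""
    return [records[i] for i in sample_indices(len(records), fraction, seed)]


# Hypothetical paired-end read names, purely for illustration.
r1 = ["read%d/1" % i for i in range(10)]
r2 = ["read%d/2" % i for i in range(10)]

# The same seed and fraction select the same positions in both lists.
print(sample_records(r1, 0.4, seed=10))
print(sample_records(r2, 0.4, seed=10))
```

Because the selection depends only on the seed, the fraction and the number of records, running it on the R1 and R2 lists with the same seed keeps mates paired. The same reasoning is why one would pass the same -s value when subsampling the two files of a paired-end run.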
  • comp reports the base composition of each sequence. It also accepts a BED file (-r) to restrict the report to sub-regions.
> cat test.bed
A00679:63:HGVWCDSXX:4:1625:12680:18912	4	30
# Output columns: chr, length, #A, #C, #G, #T, #2, #3, #4, #CpG, #tv, #ts, #CpG-ts
# (with -r, the BED start and end are printed in place of the length)
> seqtk comp -r test.bed test.fq
A00679:63:HGVWCDSXX:4:1625:12680:18912	4	30	2	6	10	8	0	0	0	10	0	0	0
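The four base counts can be checked by hand with a short Python sketch, assuming the BED interval is 0-based and half-open (start included, end excluded), as in the BED format. This only reproduces the #A, #C, #G and #T columns; it is not seqtk's implementation, and `composition` is a hypothetical name.

```python
# Conceptual sketch: count A/C/G/T inside a BED interval of one sequence.
# Assumes 0-based, half-open BED coordinates. Not seqtk's code.
from collections import Counter


def composition(seq, start, end):
    """Return the A, C, G, T counts of seq[start:end], ignoring case."""
    counts = Counter(seq[start:end].upper())
    return {base: counts.get(base, 0) for base in "ACGT"}


# First 34 bases of read A00679:63:HGVWCDSXX:4:1625:12680:18912 (uppercased).
read_start = "GCGGTTTTCGCGACGGAGGCGGCCGTTGTTGCCG"

# Interval from test.bed: start 4, end 30.
print(composition(read_start, 4, 30))
```

For the interval 4 to 30 this gives 2 A, 6 C, 10 G and 8 T, the same values as in the comp output above.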
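  • subseq extracts only certain sequences by name, or only certain intervals of them; the output can also be written as one tab-delimited line per sequence (-t).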
> echo "A00679:63:HGVWCDSXX:4:1403:24569:25911" | seqtk subseq test.fq -
@A00679:63:HGVWCDSXX:4:1403:24569:25911
ATTCACTCATGTACACCTTTCTTCCTCCTCTCTTCATCTCCTATCCCAAATATCTATCTCAACCATCTACATGGCTTCATCTCCTCCTTTGTTCCCGTCGTCCGATCCATTTGCTATCTTAGCCTTAGCTAGCTAGCTAGGGTTTCTTGA
+
FFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFF:FFFFFFF::F::FFFFFF::FF:FFFFFFF:FFFF:F:FFFFF:F:F:FFFFFFFFFFFFFFF,:FF,FFF,,FFF,,FFFFFFFF
> echo "A00679:63:HGVWCDSXX:4:1403:24569:25911" | seqtk subseq -t test.fq -
A00679:63:HGVWCDSXX:4:1403:24569:25911	1	ATTCACTCATGTACACCTTTCTTCCTCCTCTCTTCATCTCCTATCCCAAATATCTATCTCAACCATCTACATGGCTTCATCTCCTCCTTTGTTCCCGTCGTCCGATCCATTTGCTATCTTAGCCTTAGCTAGCTAGCTAGGGTTTCTTGA
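The two selection modes, by name and by interval, can be sketched in Python as follows. This is a conceptual illustration on an in-memory dictionary; the names `subseq_by_name` and `subseq_by_region` are hypothetical, the intervals are assumed to be 0-based and half-open as in BED, and the sketch makes no attempt to reproduce seqtk's output formatting.

```python
# Conceptual sketch of the two ways of selecting sub-sequences.
# Sequences are held in a dict {name: sequence}. Not seqtk's code.


def subseq_by_name(sequences, names):
    """Keep only the sequences whose name is in `names`."""
    wanted = set(names)
    return {name: seq for name, seq in sequences.items() if name in wanted}


def subseq_by_region(sequences, regions):
    """Cut out (name, start, end) intervals, 0-based and half-open."""
    return [
        (name, start, end, sequences[name][start:end])
        for name, start, end in regions
        if name in sequences
    ]


# Tiny made-up sequences, purely for illustration.
seqs = {"readA": "ATTCACTCATGT", "readB": "GCGGTTTTCGCG"}

print(subseq_by_name(seqs, ["readA"]))
print(subseq_by_region(seqs, [("readB", 4, 8)]))
```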
  • fqchk reports, for each position along the reads, the base composition, error rate, and quality values across all sequences.
> seqtk fqchk test.fq
min_len: 150; max_len: 150; avg_len: 150.00; 3 distinct quality values
POS	#bases	%A	%C	%G	%T	%N	avgQ	errQ	%low	%high
ALL	900	15.2	31.0	31.3	22.4	0.0	35.2	24.3	4.2	95.8
1	6	16.7	33.3	50.0	0.0	0.0	35.0	31.6	0.0	100.0
2	6	0.0	50.0	33.3	16.7	0.0	37.0	37.0	0.0	100.0
3	6	0.0	33.3	16.7	50.0	0.0	37.0	37.0	0.0	100.0
4	6	0.0	16.7	16.7	66.7	0.0	35.0	31.6	0.0	100.0
5	6	16.7	0.0	33.3	50.0	0.0	30.7	18.6	16.7	83.3
6	6	33.3	16.7	33.3	16.7	0.0	37.0	37.0	0.0	100.0
7	6	33.3	0.0	33.3	33.3	0.0	37.0	37.0	0.0	100.0
8	6	33.3	16.7	0.0	50.0	0.0	37.0	37.0	0.0	100.0
9	6	16.7	16.7	33.3	33.3	0.0	35.0	31.6	0.0	100.0
10	6	0.0	0.0	83.3	16.7	0.0	37.0	37.0	0.0	100.0
11	6	33.3	50.0	16.7	0.0	0.0	37.0	37.0	0.0	100.0
12	6	0.0	33.3	50.0	16.7	0.0	37.0	37.0	0.0	100.0
...
149	6	0.0	0.0	100.0	0.0	0.0	37.0	37.0	0.0	100.0
150	6	16.7	33.3	50.0	0.0	0.0	37.0	37.0	0.0	100.0
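The difference between the avgQ and errQ columns is easy to miss, so here is a hedged Python sketch of one way to compute them for a single position. It assumes that avgQ is the plain mean of the Phred scores and that errQ is the Phred score of the mean error probability; that reading is an assumption on my part, although it does reproduce the numbers in the table. The quality columns used below (`FFFFF:` and `FFFF:,`) are also assumed: the article does not print all six reads, but these combinations are consistent with the three quality characters that occur (F, : and ,) and with the reported averages.

```python
# Conceptual sketch of per-position quality summaries (avgQ and errQ).
# Assumes Phred+33 encoding and that errQ is the Phred score of the mean
# error probability. Not seqtk's code.
import math


def position_quality(qual_column, offset=33):
    """Summarise the quality characters observed at one read position.

    Returns (avg_q, err_q):
      avg_q is the arithmetic mean of the Phred scores;
      err_q is -10 * log10(mean error probability), so a single
      low-quality base pulls it down much more than it pulls avg_q.
    """
    scores = [ord(q) - offset for q in qual_column]
    avg_q = sum(scores) / len(scores)
    mean_error = sum(10 ** (-s / 10) for s in scores) / len(scores)
    err_q = -10 * math.log10(mean_error)
    return avg_q, err_q


# Assumed quality column for position 1: five Q37 bases and one Q25 base.
avg1, err1 = position_quality("FFFFF:")
print(round(avg1, 1), round(err1, 1))

# Assumed quality column for position 5: four Q37, one Q25 and one Q11 base.
avg5, err5 = position_quality("FFFF:,")
print(round(avg5, 1), round(err5, 1))
```

Under these assumptions the sketch gives 35.0 and 31.6 for position 1, and 30.7 and 18.6 for position 5, matching the corresponding rows of the table.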
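  • mergepe: Illumina paired-end sequencing usually produces two files, R1.fq.gz and R2.fq.gz; this command interleaves them so that each read is immediately followed by its mate.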
> cat test.1.fq
@A00679:63:HGVWCDSXX:4:1101:2392:1438 1:N:0:CCTTAATA+CCTTAAT
CCGGCGGGCGCATCGTGGTGGGCTGCATCCCGTACCGCGTGCGGTGCGACGGCGAGCTGGAGGTGCTGGCGATCACGTCCCAGAAGGGGCACGGCATGATGTTCCCCAAGGGCGGGTGGGAGGTGGACGAGTCCATGGACGAGGCCGCCA
+
FFFFFFFFFFFFFFFFF:FFFFF:FFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00679:63:HGVWCDSXX:4:1101:5285:1438 1:N:0:CCTTAATA+CCTTAAT
CCTGGGCTTTCTGATGGTCCTCCCAAGCCTGGATCTTGATCTCTTCTCGCTTGAATCTCGCTAATTACTTGGCCGTCTCCGCCTCCTCCCACGCAGCCGCACGGATCGCGGTTATGTTCTGCGTCGACTGGTCCATGGGCACCGGTTTCA
+
,FFFFF,:F:F:F::FFFFFFF::FFF,FF:,FFF,:FFF:FFFFFFFFFF:F:FFFFFF,FF,,FFFFFF,F:,FFF,,FFFFFFFFFFFFFFFFFFF:FFF:FFFFFFFFFFFFFFFFF:FFFF:F:FF:F:F,FF,FF,FFFF:,F,
@A00679:63:HGVWCDSXX:4:1101:12391:1438 1:N:0:CCTTAATA+CCTTAAT
AGGGGTGGATTGAGAGTGCCTCATCCTATCTGAAGCCCTAATGAAGAGTGAGACTATTCTTGGAGCTGCTCCTACACCATAAAGTGGTGGGATGCTCATTGTAAACCAATGTCCCATCAATGCTTGGAAGGGCTGCACACTTTCAGCACG
+
FFFFFFFF:FFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFF:FFFFFFFFF:FFFFFFFFFFFFFFFFF,FFFFFF:FFFFF:FFFFFFFF:FFF:FFFF::FFFF,FFFFFFFF
> cat test.2.fq
@A00679:63:HGVWCDSXX:4:1101:2392:1438 2:N:0:CCTTAATA+CCTTAAT
GCTCGACGAGCCGCTCCAGCGCCTCGCGCATCCACCAGTGCGGGCACCCGTCCATCACCTGCTGCACCGTTGCCCAGGTGCGCTTGCGGGACGCCATCTCGGGCCACTGGTGGAGCTCGTCGGCGACGCGGAGCGGGAACATGAACCCCT
+
FFFFFFF::FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF:FFFFFFFFFFFFFFFFFFF,FFFFF,FFFFF
@A00679:63:HGVWCDSXX:4:1101:5285:1438 2:N:0:CCTTAATA+CCTTAAT
GCAAGCCAAGAGCCGTCCCGGACCGGGACGCCGGTGAGGGCGACGAGCCCAAACTGCTCCCAGCCAACGACTCCACGGAGGACGCTCGGTGGCCCCAGCGCAGTCGGCTCCTTCATCAGCCACGGCGGAGAATGCAGCAGCTCGGAGCTG
+
:FFFFFFFF:FF::FFFFFFFF:FFFFFFFFFF::FFFFFF:FFFFFFFFFFFFFFFFF,FF:FFFFF:FFF:,FFFFFFF::FFFFF,FF:FFFFFFFFFFFFFFFFFFFFFFF,FFFFFFF:,,FFFF:FFFFFF:FF:FFFFFFFFF
@A00679:63:HGVWCDSXX:4:1101:12391:1438 2:N:0:CCTTAATA+CCTTAAT
TCCTGTCATACTAATATCTTTGTTTCTGGCGATAACACGGAACAGTCGTAGTGGCTTTAGACTACTATGGTACTAGCAAAGAATTGAAAATGTCAAGTGGCTGTAGAGATATTGCAATACGAAAGGTAGCTGTTCATAATGTAGAAATCA
+
FFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFF:F:FFFF,FFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFF:FFFF:FFFF
# After mergepe, R1 and R2 of read A00679:63:HGVWCDSXX:4:1101:2392:1438 sit next to each other
> seqtk mergepe test.1.fq test.2.fq | head -8
@A00679:63:HGVWCDSXX:4:1101:2392:1438 1:N:0:CCTTAATA+CCTTAAT
CCGGCGGGCGCATCGTGGTGGGCTGCATCCCGTACCGCGTGCGGTGCGACGGCGAGCTGGAGGTGCTGGCGATCACGTCCCAGAAGGGGCACGGCATGATGTTCCCCAAGGGCGGGTGGGAGGTGGACGAGTCCATGGACGAGGCCGCCA
+
FFFFFFFFFFFFFFFFF:FFFFF:FFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00679:63:HGVWCDSXX:4:1101:2392:1438 2:N:0:CCTTAATA+CCTTAAT
GCTCGACGAGCCGCTCCAGCGCCTCGCGCATCCACCAGTGCGGGCACCCGTCCATCACCTGCTGCACCGTTGCCCAGGTGCGCTTGCGGGACGCCATCTCGGGCCACTGGTGGAGCTCGTCGGCGACGCGGAGCGGGAACATGAACCCCT
+
FFFFFFF::FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF:FFFFFFFFFFFFFFFFFFF,FFFFF,FFFFF
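Interleaving itself is a simple operation, sketched below in Python on in-memory records. The name check is an extra safety step added for this sketch and is not a claim about what seqtk does; `read_id` and `interleave` are hypothetical names, and records are modelled as (header, sequence, quality) tuples.

```python
# Conceptual sketch of interleaving paired-end reads.
# The mate-name check is an addition for this sketch. Not seqtk's code.


def read_id(header):
    """Return the read name: the header up to the first whitespace."""
    return header.split()[0]


def interleave(r1_records, r2_records):
    """Alternate R1 and R2 records so each read is followed by its mate."""
    if len(r1_records) != len(r2_records):
        raise ValueError("R1 and R2 contain different numbers of reads")
    merged = []
    for rec1, rec2 in zip(r1_records, r2_records):
        if read_id(rec1[0]) != read_id(rec2[0]):
            raise ValueError("mate names differ: %s vs %s" % (rec1[0], rec2[0]))
        merged.extend([rec1, rec2])
    return merged


# Shortened, made-up records that follow the header layout shown above.
r1 = [("@read1 1:N:0:AAA", "ACGT", "FFFF"), ("@read2 1:N:0:AAA", "GGCC", "FFFF")]
r2 = [("@read1 2:N:0:AAA", "TTGA", "FFFF"), ("@read2 2:N:0:AAA", "CCAT", "FFFF")]

for header, seq, qual in interleave(r1, r2):
    print(header, seq)
```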
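  • trimfq: the two ends of a read often have relatively low quality, so to keep only high-quality sequence the reads need trimming (-b sets the number of bases to trim from the 5' end, -e from the 3' end).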
> head -8 test.fq
@A00679:63:HGVWCDSXX:4:1271:5927:18176
CGTTGAGATGACGCTAGTCGCGTTGTGCCGGCCAAGGCGGCGGCGGCGGTTGAGCCAGAGAGTTAGAGGCGGCTCTGTTGCTGCGGTTTTCGCGACGGAGGCGGCCGTTGTTGCCGTGACTCAAACAACGGGGCGGGCCGCCATTGCTGC
+
FFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00679:63:HGVWCDSXX:4:2461:25970:10614
CGTTGAGATGACGCTAGTCGCGTTGTGCCGGCCTAGGCGGCGGCGGCGGTTGAGCCAGAGAGTTAGAGGCGGCTCTGTTGCTGCGGTTTTCGCGACGGAGGCGGCCGTTGTTGCCGTGACTCAAACAACGGGGCGGGCCGCCATTGCTGC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FF
# Trim 30 bp from the 5' end and 30 bp from the 3' end
> seqtk trimfq -b 30 -e 30 test.fq | head -8
@A00679:63:HGVWCDSXX:4:1271:5927:18176
GCCAAGGCGGCGGCGGCGGTTGAGCCAGAGAGTTAGAGGCGGCTCTGTTGCTGCGGTTTTCGCGACGGAGGCGGCCGTTGTTGCCGTGAC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00679:63:HGVWCDSXX:4:2461:25970:10614
GCCTAGGCGGCGGCGGCGGTTGAGCCAGAGAGTTAGAGGCGGCTCTGTTGCTGCGGTTTTCGCGACGGAGGCGGCCGTTGTTGCCGTGAC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
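Fixed-length trimming amounts to slicing the sequence and the quality string with the same bounds, as in this minimal Python sketch. It is an illustration only, not seqtk's implementation; `trim_ends` is a hypothetical name, and the behaviour for reads shorter than the total trim length (an empty result here) is a choice made for the sketch.

```python
# Conceptual sketch of fixed-length end trimming (-b LEFT -e RIGHT).
# Not seqtk's code; reads shorter than LEFT + RIGHT come back empty here.


def trim_ends(seq, qual, left, right):
    """Remove `left` bases from the 5' end and `right` bases from the 3' end.

    Sequence and quality are cut with the same bounds so they stay aligned.
    """
    end = max(len(seq) - right, left)  # avoids the seq[left:-0] pitfall
    return seq[left:end], qual[left:end]


# A made-up 150 bp read: 30 A, then 90 C, then 30 G.
seq = "A" * 30 + "C" * 90 + "G" * 30
qual = "F" * 150

trimmed_seq, trimmed_qual = trim_ends(seq, qual, 30, 30)
print(len(trimmed_seq), len(trimmed_qual))
```

A 150 bp read trimmed by 30 bp on each side leaves 90 bp, which is the length of the reads in the trimfq output above.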
  • rename replaces the name of each sequence with the given prefix followed by a running number.
> seqtk rename test.fq test_prefix | head -8
@test_prefix1
CGTTGAGATGACGCTAGTCGCGTTGTGCCGGCCAAGGCGGCGGCGGCGGTTGAGCCAGAGAGTTAGAGGCGGCTCTGTTGCTGCGGTTTTCGCGACGGAGGCGGCCGTTGTTGCCGTGACTCAAACAACGGGGCGGGCCGCCATTGCTGC
+
FFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFF
@test_prefix2
CGTTGAGATGACGCTAGTCGCGTTGTGCCGGCCTAGGCGGCGGCGGCGGTTGAGCCAGAGAGTTAGAGGCGGCTCTGTTGCTGCGGTTTTCGCGACGGAGGCGGCCGTTGTTGCCGTGACTCAAACAACGGGGCGGGCCGCCATTGCTGC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FF
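  • cutN removes the N runs from sequences, or reports where the N runs are located (-n sets the minimum length of an N run; -g, as used below, prints the coordinates).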
> head -8 testN.fa
>A00679:63:HGVWCDSXX:4:1271:5927:18176
CGTTGAGATGACGCTAGTCGCGTTGTGCCGGCCNNNNCGGCGGCGGCGGTTGAGCCAGAGAGTTAGAGGCGGCTCTGTTGCTNNNNNNNTCGCGACGGAGGCGGCCGTTGTTGCCGTGACTCAAACAACGGGGCGGGCCGCCATTGCTGC
>A00679:63:HGVWCDSXX:4:2461:25970:10614
CGTTGAGATGACGCTAGTCGCGTTGTGCCGGCCTAGGCGGCGGCGGCGGTTGAGCCAGAGAGTTAGAGGCGGCTCTGTTGCTGCGGTTTTCGCGACGGAGGCGGCCGTTGTTGNNNNNACTCAAACAACGGGGCGGGCCGCCATTGCTGC
>A00679:63:HGVWCDSXX:4:1625:12680:18912
GCGGTTTTCGCGACGGAGGCGGCCGTTGTTGCCGAGACTCAAACAACGGGGCGGGCCGCCATTGCTGCTCACGCTGCCGAGCAGCTTCCGCGATCAAGGATGCAGGCCGCCATCGACGCCTCCGTAGCCGCTGACCTGGGAGAGGGATGG
>A00679:63:HGVWCDSXX:4:1329:18557:30592
GCCTTGATGGCGTCGGCATCCCCATCCGCCGCCGGCGACGACATCTCTCCCTGCACTGTTGCCATGTCCGCCTTCCGATGGCTATGCTGCGCTGCCGACGACGGGTCGGCCAGTAACAGTAACTGTCGCAACGGGATGCGCGAGCTGTGG
# Set the minimum N-run length to 1, then print the coordinates of the N runs in each sequence
> seqtk cutN -n 1 -g testN.fa
A00679:63:HGVWCDSXX:4:1271:5927:18176	33	37
A00679:63:HGVWCDSXX:4:1271:5927:18176	82	89
A00679:63:HGVWCDSXX:4:2461:25970:10614	113	118
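Locating N runs is a one-line regular expression, shown in this Python sketch. It reports 0-based, half-open coordinates, which is the convention the output above appears to follow; `n_runs` is a hypothetical name, matching lowercase `n` as well is a choice made for the sketch, and none of this is seqtk's implementation.

```python
# Conceptual sketch: find runs of N of at least `min_len` bases.
# Coordinates are 0-based and half-open. Not seqtk's code.
import re


def n_runs(seq, min_len=1):
    """Return (start, end) for every run of N/n that is at least `min_len` long."""
    pattern = re.compile(r"[Nn]{%d,}" % min_len)
    return [(m.start(), m.end()) for m in pattern.finditer(seq)]


# First 40 bases of the first sequence in testN.fa.
seq_start = "CGTTGAGATGACGCTAGTCGCGTTGTGCCGGCCNNNNCGG"

print(n_runs(seq_start, min_len=1))
print(n_runs(seq_start, min_len=5))
```

With a minimum length of 1 the run of four N is found at 33 to 37, the first line of the cutN output; raising the minimum to 5 skips it.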
Originally published 2019-12-07 on the 生物信息学 WeChat official account.
