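In the previous post, "One line that changes the way you handle files", I mentioned that Seqtk can be used to process FASTA/FASTQ files. This post covers how to actually use seqtk.
Seqtk is a tool for processing FASTA/FASTQ files, developed by the legendary Heng Li (https://github.com/lh3). It is lightweight and cross-platform, which has made it popular with researchers; at the time of writing the project has 602 stars on GitHub. Installing Seqtk is as simple as Heng Li's profile picture: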
git clone https://github.com/lh3/seqtk.git;cd seqtk; make
This builds the seqtk binary.
seq
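This is the most commonly used subcommand.
> seqtk seq
Usage: seqtk seq [options] <in.fq>|<in.fa>
Options:
  -q INT    convert bases with quality lower than INT to lowercase (default 0)
  -X INT    convert bases with quality higher than INT to lowercase (default 255)
  -n CHAR   convert the bases selected by -q or -X to CHAR instead of lowercase
  -l INT    number of bases per line (0 means 2^32-1, i.e. effectively no wrapping)
  -Q INT    quality offset used by the sequencing platform (default 33)
  -s INT    random seed (default 11)
  -f FLOAT  randomly sample the given fraction of sequences (default: all sequences)
  -M FILE   lowercase the regions listed in a BED file, or the sequences listed in a file of names
  -L INT    drop sequences shorter than INT
  -c        mask the complement of the selected regions (only effective with -M)
  -r        reverse complement
  -A        force FASTA output
  -C        drop the comments in the header lines
  -N        drop sequences containing ambiguous bases
  -1        output only the odd-numbered reads
  -2        output only the even-numbered reads
  -V        shift quality values by '(-Q) - 33'
  -U        convert all bases to uppercase
  -S        strip whitespace inside sequences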
We can fold the long sequences of a FASTQ file into several shorter lines (-l), reverse-complement them (-r), and write the result as FASTA (-A).
> head -8 test.fq
@A00679:63:HGVWCDSXX:4:1271:5927:18176
CGTTGAGATGACGCTAGTCGCGTTGTGCCGGCCAAGGCGGCGGCGGCGGTTGAGCCAGAGAGTTAGAGGCGGCTCTGTTGCTGCGGTTTTCGCGACGGAGGCGGCCGTTGTTGCCGTGACTCAAACAACGGGGCGGGCCGCCATTGCTGC
+
FFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00679:63:HGVWCDSXX:4:2461:25970:10614
CGTTGAGATGACGCTAGTCGCGTTGTGCCGGCCTAGGCGGCGGCGGCGGTTGAGCCAGAGAGTTAGAGGCGGCTCTGTTGCTGCGGTTTTCGCGACGGAGGCGGCCGTTGTTGCCGTGACTCAAACAACGGGGCGGGCCGCCATTGCTGC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FF
# Fold each sequence at 60 bp per line
> seqtk seq -l 60 -r -A test.fq | head -8
>A00679:63:HGVWCDSXX:4:1271:5927:18176
GCAGCAATGGCGGCCCGCCCCGTTGTTTGAGTCACGGCAACAACGGCCGCCTCCGTCGCG
AAAACCGCAGCAACAGAGCCGCCTCTAACTCTCTGGCTCAACCGCCGCCGCCGCCTTGGC
CGGCACAACGCGACTAGCGTCATCTCAACG
>A00679:63:HGVWCDSXX:4:2461:25970:10614
GCAGCAATGGCGGCCCGCCCCGTTGTTTGAGTCACGGCAACAACGGCCGCCTCCGTCGCG
AAAACCGCAGCAACAGAGCCGCCTCTAACTCTCTGGCTCAACCGCCGCCGCCGCCTAGGC
CGGCACAACGCGACTAGCGTCATCTCAACG
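To make the two transformations concrete, here is a small Python sketch of reverse-complementing and line folding. It only imitates the effect of `-r` and `-l`; the function names are made up for illustration and this is not how seqtk itself is implemented.

```python
# Illustrative sketch only: imitates the effect of `seqtk seq -r -l 60`.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Complement every base, then reverse the whole string."""
    return seq.translate(COMPLEMENT)[::-1]

def fold(seq, width):
    """Split a sequence into chunks of at most `width` bases."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]

# The first 10 bases of the first read become the last 10 bases of the output.
print(reverse_complement("CGTTGAGATG"))           # CATCTCAACG
print([len(line) for line in fold("A" * 150, 60)])  # [60, 60, 30]
```

A 150 bp read folded at 60 bp therefore takes three sequence lines, which is why `head -8` shows exactly two FASTA records.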
Drop sequences shorter than a given length (-L) and mask bases whose quality is below a given value (-q). Add -A as well if you want FASTA output; the command below omits it, so the output stays FASTQ.
# Bases with quality below 20 are converted to lowercase; sequences shorter than 100 bp are not written
> seqtk seq -L 100 -q 20 test.fq | head -8
@A00679:63:HGVWCDSXX:4:1271:5927:18176
CGTTGAGATGACGCTAGTCGCGTTGTGCCGGCCAAGGCGGCGGCGGCGGTTGAGCCAGAGAGTTAGAGGCGGCTCTGTTGCTGCGGTTTTCGCGACGGAGGCGGCCGTTGTTGCCGTGACTCAAACAACGGGGCGGGCCGCCATTGCTGC
+
FFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00679:63:HGVWCDSXX:4:2461:25970:10614
CGTTGAGATGACGCTAGTCGCGTTGTGCCGGCCTAGGCGGCGGCGGCGGTTGAGCCAGaGAGTTAGAGGCGGCTCTGTTGCTGCGGTTTTCGCGACGGAGGCGGCCGTTGTTGCCGTGACTCAAACAACGGGGCGGGCCGCCATTGCTGC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FF
@A00679:63:HGVWCDSXX:4:1625:12680:18912
GCGGtTTTCGCGACGGAGGCGGCcGTTgTTGCCGaGACTCAAACAACggGgCGGGCCgCCATTGCtgCTCACgCTGCCGAGCAGCTTCCGCGATCAAgGATGCaGGCCgCCAtCGACGCCTCCgTaGCCGCtGACCtgGgAGAGgGATGG
+
:FFF,FFF:FFF:FF:FFFFFFF,:F:,FFFFF:,:FF:FFFFFFFF,,F,F:FFFF,FFFFFFF,,F:FFF,FFFFF:FFFFFFFFFFFFFFFFFF,FFFFF,FFFF,FFF,FFFF:FFFFF,F,FFFFF,FFFF,,F,F:FF,FF:FF
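The masking rule can be sketched in a few lines of Python. This assumes Phred+33 encoding (the default `-Q 33`), where `F` is Q37, `:` is Q25 and `,` is Q11; the helper names are hypothetical and only mirror what `-q` and `-L` appear to do.

```python
# Illustrative sketch only: lowercase bases below a quality threshold,
# assuming Phred+33 encoded quality strings.
def mask_low_quality(seq, qual, min_q, offset=33):
    """Lowercase each base whose Phred quality is below `min_q`."""
    return "".join(
        base.lower() if ord(q) - offset < min_q else base
        for base, q in zip(seq, qual)
    )

def long_enough(seq, min_len):
    """Length filter: keep a read only if it has at least `min_len` bases."""
    return len(seq) >= min_len

# First 8 bases and qualities of the third read shown above.
print(mask_low_quality("GCGGTTTT", ":FFF,FFF", 20))  # GCGGtTTT
```

Only the base under the `,` (Q11) falls below 20, while the one under `:` (Q25) is left alone, matching the lowercase pattern in the output above.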
sample
Sample a given fraction (0.4) of the reads, using a random seed (-s) so that the result is reproducible.
# With seed 10, sample 40% of all sequences
> seqtk sample -s 10 test.fq 0.4
@A00679:63:HGVWCDSXX:4:1271:5927:18176
CGTTGAGATGACGCTAGTCGCGTTGTGCCGGCCAAGGCGGCGGCGGCGGTTGAGCCAGAGAGTTAGAGGCGGCTCTGTTGCTGCGGTTTTCGCGACGGAGGCGGCCGTTGTTGCCGTGACTCAAACAACGGGGCGGGCCGCCATTGCTGC
+
FFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00679:63:HGVWCDSXX:4:2461:25970:10614
CGTTGAGATGACGCTAGTCGCGTTGTGCCGGCCTAGGCGGCGGCGGCGGTTGAGCCAGaGAGTTAGAGGCGGCTCTGTTGCTGCGGTTTTCGCGACGGAGGCGGCCGTTGTTGCCGTGACTCAAACAACGGGGCGGGCCGCCATTGCTGC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FF
@A00679:63:HGVWCDSXX:4:1625:12680:18912
GCGGtTTTCGCGACGGAGGCGGCcGTTgTTGCCGaGACTCAAACAACggGgCGGGCCgCCATTGCtgCTCACgCTGCCGAGCAGCTTCCGCGATCAAgGATGCaGGCCgCCAtCGACGCCTCCgTaGCCGCtGACCtgGgAGAGgGATGG
+
:FFF,FFF:FFF:FF:FFFFFFF,:F:,FFFFF:,:FF:FFFFFFFF,,F,F:FFFF,FFFFFFF,,F:FFF,FFFFF:FFFFFFFFFFFFFFFFFF,FFFFF,FFFF,FFF,FFFF:FFFFF,F,FFFFF,FFFF,,F,F:FF,FF:FF
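Why the seed matters can be shown with a rough Python analogue of fraction-based sampling. The sketch assumes one independent random draw per read, which is my reading of how fraction sampling behaves rather than something stated above; Python's generator is also different from the one seqtk uses, so the same seed will not pick the same reads as seqtk does.

```python
# Illustrative sketch only: seeded fraction sampling, one draw per read.
import random

def sample_fraction(records, fraction, seed=11):
    """Keep each record with probability `fraction`, reproducibly for a given seed."""
    rng = random.Random(seed)
    return [rec for rec in records if rng.random() < fraction]

reads = [f"read{i}" for i in range(1, 7)]
first = sample_fraction(reads, 0.4, seed=10)
second = sample_fraction(reads, 0.4, seed=10)
print(first == second)  # True: same seed, same subset
```

Because every read is drawn independently, the number of reads kept is only approximately 40% of the input, which is especially visible on a tiny file like test.fq. Using the same seed on paired R1 and R2 files is what keeps the sampled mates in sync.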
comp
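If you want to know the base composition of each sequence, this is the command to use. It also accepts a BED file (-r) so that you can inspect sub-intervals.
> cat test.bed
A00679:63:HGVWCDSXX:4:1625:12680:18912 4 30
# Output columns: chr, length, #A, #C, #G, #T, #2, #3, #4, #CpG, #tv, #ts, #CpG-ts
# (with -r, the BED start and end are printed in place of length)
> seqtk comp -r test.bed test.fq
A00679:63:HGVWCDSXX:4:1625:12680:18912 4 30 2 6 10 8 0 0 0 10 0 0 0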
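As a cross-check of the A/C/G/T columns, the counts over a BED interval can be reproduced with a short Python sketch. It assumes BED-style coordinates (0-based start, exclusive end) and only covers the four plain base counts, not the remaining columns; `base_counts` is a hypothetical helper.

```python
# Illustrative sketch only: count A/C/G/T inside a BED-style interval.
from collections import Counter

def base_counts(seq, start=0, end=None):
    """Return [#A, #C, #G, #T] for seq[start:end], ignoring case."""
    counts = Counter(seq[start:end].upper())
    return [counts[base] for base in "ACGT"]

# First 34 bases of read A00679:63:HGVWCDSXX:4:1625:12680:18912.
print(base_counts("GCGGTTTTCGCGACGGAGGCGGCCGTTGTTGCCG", 4, 30))  # [2, 6, 10, 8]
```

The four numbers add up to 26, the length of the interval 4 to 30.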
subseq
If you only want to extract a few sequences by name, or the sequence within a given interval, use this command. It can also write TAB-delimited output, one record per line (-t).
> echo "A00679:63:HGVWCDSXX:4:1403:24569:25911" | seqtk subseq test.fq -
@A00679:63:HGVWCDSXX:4:1403:24569:25911
ATTCACTCATGTACACCTTTCTTCCTCCTCTCTTCATCTCCTATCCCAAATATCTATCTCAACCATCTACATGGCTTCATCTCCTCCTTTGTTCCCGTCGTCCGATCCATTTGCTATCTTAGCCTTAGCTAGCTAGCTAGGGTTTCTTGA
+
FFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFF:FFFFFFF::F::FFFFFF::FF:FFFFFFF:FFFF:F:FFFFF:F:F:FFFFFFFFFFFFFFF,:FF,FFF,,FFF,,FFFFFFFF
> echo "A00679:63:HGVWCDSXX:4:1403:24569:25911" | seqtk subseq -t test.fq -
A00679:63:HGVWCDSXX:4:1403:24569:25911 1 ATTCACTCATGTACACCTTTCTTCCTCCTCTCTTCATCTCCTATCCCAAATATCTATCTCAACCATCTACATGGCTTCATCTCCTCCTTTGTTCCCGTCGTCCGATCCATTTGCTATCTTAGCCTTAGCTAGCTAGCTAGGGTTTCTTGA
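Selecting reads by name boils down to a set lookup. The following Python sketch works on already-parsed `(name, sequence)` pairs; the record layout and the function name are assumptions for illustration, not seqtk's internals.

```python
# Illustrative sketch only: keep the records whose name is in a wanted list.
def subseq_by_name(records, names):
    """Return the (name, seq) pairs whose name appears in `names`, in input order."""
    wanted = set(names)
    return [(name, seq) for name, seq in records if name in wanted]

records = [("read1", "ACGT"), ("read2", "GGCC"), ("read3", "TTAA")]
for name, seq in subseq_by_name(records, ["read3", "read1"]):
    print(name, seq, sep="\t")
```

The output follows the order of the sequence file rather than the order of the name list, since the records are scanned once from top to bottom.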
fqchk
Report, for each base position across all reads, the base composition, error rate, quality values and so on.
> seqtk fqchk test.fq
min_len: 150; max_len: 150; avg_len: 150.00; 3 distinct quality values
POS  #bases  %A    %C    %G     %T    %N   avgQ  errQ  %low  %high
ALL  900     15.2  31.0  31.3   22.4  0.0  35.2  24.3  4.2   95.8
1    6       16.7  33.3  50.0   0.0   0.0  35.0  31.6  0.0   100.0
2    6       0.0   50.0  33.3   16.7  0.0  37.0  37.0  0.0   100.0
3    6       0.0   33.3  16.7   50.0  0.0  37.0  37.0  0.0   100.0
4    6       0.0   16.7  16.7   66.7  0.0  35.0  31.6  0.0   100.0
5    6       16.7  0.0   33.3   50.0  0.0  30.7  18.6  16.7  83.3
6    6       33.3  16.7  33.3   16.7  0.0  37.0  37.0  0.0   100.0
7    6       33.3  0.0   33.3   33.3  0.0  37.0  37.0  0.0   100.0
8    6       33.3  16.7  0.0    50.0  0.0  37.0  37.0  0.0   100.0
9    6       16.7  16.7  33.3   33.3  0.0  35.0  31.6  0.0   100.0
10   6       0.0   0.0   83.3   16.7  0.0  37.0  37.0  0.0   100.0
11   6       33.3  50.0  16.7   0.0   0.0  37.0  37.0  0.0   100.0
12   6       0.0   33.3  50.0   16.7  0.0  37.0  37.0  0.0   100.0
...
149  6       0.0   0.0   100.0  0.0   0.0  37.0  37.0  0.0   100.0
150  6       16.7  33.3  50.0   0.0   0.0  37.0  37.0  0.0   100.0
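The avgQ and errQ columns differ because they average different things. The Python sketch below assumes that avgQ is the plain mean of the Phred scores, that errQ is the Phred score of the mean error probability, and that %low counts bases below Q20. These definitions are my assumption, but they do reproduce the rows for positions 1 and 5 in the table above.

```python
# Illustrative sketch only: per-position quality summary, assuming Phred+33.
import math

def column_stats(quals, offset=33, low_q=20):
    """Return (avgQ, errQ, %low) for the quality characters at one position."""
    phred = [ord(c) - offset for c in quals]
    avg_q = sum(phred) / len(phred)
    # Average the error probabilities first, then convert back to a Phred score.
    mean_err = sum(10 ** (-q / 10) for q in phred) / len(phred)
    err_q = -10 * math.log10(mean_err)
    pct_low = 100 * sum(q < low_q for q in phred) / len(phred)
    return avg_q, err_q, pct_low

# Six reads: five Q37 ('F') and one Q25 (':'), as at position 1.
print(" ".join(f"{v:.1f}" for v in column_stats("FFFFF:")))  # 35.0 31.6 0.0
# Four Q37, one Q25 and one Q11 (','), as at position 5.
print(" ".join(f"{v:.1f}" for v in column_stats("FFFF:,")))  # 30.7 18.6 16.7
```

A single low-quality base pulls errQ down much more than avgQ, so errQ is the more conservative of the two summaries.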
mergepe
Illumina paired-end sequencing usually gives us two files, R1.fq.gz and R2.fq.gz. This command interleaves them so that every R1 read is immediately followed by its R2 mate.
> cat test.1.fq
@A00679:63:HGVWCDSXX:4:1101:2392:1438 1:N:0:CCTTAATA+CCTTAAT
CCGGCGGGCGCATCGTGGTGGGCTGCATCCCGTACCGCGTGCGGTGCGACGGCGAGCTGGAGGTGCTGGCGATCACGTCCCAGAAGGGGCACGGCATGATGTTCCCCAAGGGCGGGTGGGAGGTGGACGAGTCCATGGACGAGGCCGCCA
+
FFFFFFFFFFFFFFFFF:FFFFF:FFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00679:63:HGVWCDSXX:4:1101:5285:1438 1:N:0:CCTTAATA+CCTTAAT
CCTGGGCTTTCTGATGGTCCTCCCAAGCCTGGATCTTGATCTCTTCTCGCTTGAATCTCGCTAATTACTTGGCCGTCTCCGCCTCCTCCCACGCAGCCGCACGGATCGCGGTTATGTTCTGCGTCGACTGGTCCATGGGCACCGGTTTCA
+
,FFFFF,:F:F:F::FFFFFFF::FFF,FF:,FFF,:FFF:FFFFFFFFFF:F:FFFFFF,FF,,FFFFFF,F:,FFF,,FFFFFFFFFFFFFFFFFFF:FFF:FFFFFFFFFFFFFFFFF:FFFF:F:FF:F:F,FF,FF,FFFF:,F,
@A00679:63:HGVWCDSXX:4:1101:12391:1438 1:N:0:CCTTAATA+CCTTAAT
AGGGGTGGATTGAGAGTGCCTCATCCTATCTGAAGCCCTAATGAAGAGTGAGACTATTCTTGGAGCTGCTCCTACACCATAAAGTGGTGGGATGCTCATTGTAAACCAATGTCCCATCAATGCTTGGAAGGGCTGCACACTTTCAGCACG
+
FFFFFFFF:FFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFF:FFFFFFFFF:FFFFFFFFFFFFFFFFF,FFFFFF:FFFFF:FFFFFFFF:FFF:FFFF::FFFF,FFFFFFFF
> cat test.2.fq
@A00679:63:HGVWCDSXX:4:1101:2392:1438 2:N:0:CCTTAATA+CCTTAAT
GCTCGACGAGCCGCTCCAGCGCCTCGCGCATCCACCAGTGCGGGCACCCGTCCATCACCTGCTGCACCGTTGCCCAGGTGCGCTTGCGGGACGCCATCTCGGGCCACTGGTGGAGCTCGTCGGCGACGCGGAGCGGGAACATGAACCCCT
+
FFFFFFF::FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF:FFFFFFFFFFFFFFFFFFF,FFFFF,FFFFF
@A00679:63:HGVWCDSXX:4:1101:5285:1438 2:N:0:CCTTAATA+CCTTAAT
GCAAGCCAAGAGCCGTCCCGGACCGGGACGCCGGTGAGGGCGACGAGCCCAAACTGCTCCCAGCCAACGACTCCACGGAGGACGCTCGGTGGCCCCAGCGCAGTCGGCTCCTTCATCAGCCACGGCGGAGAATGCAGCAGCTCGGAGCTG
+
:FFFFFFFF:FF::FFFFFFFF:FFFFFFFFFF::FFFFFF:FFFFFFFFFFFFFFFFF,FF:FFFFF:FFF:,FFFFFFF::FFFFF,FF:FFFFFFFFFFFFFFFFFFFFFFF,FFFFFFF:,,FFFF:FFFFFF:FF:FFFFFFFFF
@A00679:63:HGVWCDSXX:4:1101:12391:1438 2:N:0:CCTTAATA+CCTTAAT
TCCTGTCATACTAATATCTTTGTTTCTGGCGATAACACGGAACAGTCGTAGTGGCTTTAGACTACTATGGTACTAGCAAAGAATTGAAAATGTCAAGTGGCTGTAGAGATATTGCAATACGAAAGGTAGCTGTTCATAATGTAGAAATCA
+
FFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFF:F:FFFF,FFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFF:FFFF:FFFF
# After mergepe, R1 and R2 of read A00679:63:HGVWCDSXX:4:1101:2392:1438 sit next to each other
> seqtk mergepe test.1.fq test.2.fq | head -8
@A00679:63:HGVWCDSXX:4:1101:2392:1438 1:N:0:CCTTAATA+CCTTAAT
CCGGCGGGCGCATCGTGGTGGGCTGCATCCCGTACCGCGTGCGGTGCGACGGCGAGCTGGAGGTGCTGGCGATCACGTCCCAGAAGGGGCACGGCATGATGTTCCCCAAGGGCGGGTGGGAGGTGGACGAGTCCATGGACGAGGCCGCCA
+
FFFFFFFFFFFFFFFFF:FFFFF:FFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00679:63:HGVWCDSXX:4:1101:2392:1438 2:N:0:CCTTAATA+CCTTAAT
GCTCGACGAGCCGCTCCAGCGCCTCGCGCATCCACCAGTGCGGGCACCCGTCCATCACCTGCTGCACCGTTGCCCAGGTGCGCTTGCGGGACGCCATCTCGGGCCACTGGTGGAGCTCGTCGGCGACGCGGAGCGGGAACATGAACCCCT
+
FFFFFFF::FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF:FFFFFFFFFFFFFFFFFFF,FFFFF,FFFFF
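Interleaving amounts to walking both files in lockstep. Here is a minimal Python sketch on lists of read names, under the assumption that R1 and R2 list their mates in the same order (the usual case for files straight off the sequencer); `interleave` is a hypothetical helper, not seqtk's code.

```python
# Illustrative sketch only: interleave two equally ordered lists of reads.
def interleave(reads1, reads2):
    """Return R1[0], R2[0], R1[1], R2[1], ... assuming mates share the same order."""
    merged = []
    for r1, r2 in zip(reads1, reads2):
        merged.extend([r1, r2])
    return merged

r1 = ["2392:1438/1", "5285:1438/1", "12391:1438/1"]
r2 = ["2392:1438/2", "5285:1438/2", "12391:1438/2"]
print(interleave(r1, r2))
```

If one file has been filtered or reordered independently of the other, a lockstep merge like this would silently pair the wrong reads, so both files should go through identical processing first.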
trimfq
Read ends often have relatively low quality, so to keep the sequences high quality they need to be trimmed (-b: number of bases to trim from the 5' end; -e: number of bases to trim from the 3' end).
> head -8 test.fq
@A00679:63:HGVWCDSXX:4:1271:5927:18176
CGTTGAGATGACGCTAGTCGCGTTGTGCCGGCCAAGGCGGCGGCGGCGGTTGAGCCAGAGAGTTAGAGGCGGCTCTGTTGCTGCGGTTTTCGCGACGGAGGCGGCCGTTGTTGCCGTGACTCAAACAACGGGGCGGGCCGCCATTGCTGC
+
FFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00679:63:HGVWCDSXX:4:2461:25970:10614
CGTTGAGATGACGCTAGTCGCGTTGTGCCGGCCTAGGCGGCGGCGGCGGTTGAGCCAGAGAGTTAGAGGCGGCTCTGTTGCTGCGGTTTTCGCGACGGAGGCGGCCGTTGTTGCCGTGACTCAAACAACGGGGCGGGCCGCCATTGCTGC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FF
# Trim 30 bp from the 5' end and 30 bp from the 3' end
> seqtk trimfq -b 30 -e 30 test.fq | head -8
@A00679:63:HGVWCDSXX:4:1271:5927:18176
GCCAAGGCGGCGGCGGCGGTTGAGCCAGAGAGTTAGAGGCGGCTCTGTTGCTGCGGTTTTCGCGACGGAGGCGGCCGTTGTTGCCGTGAC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00679:63:HGVWCDSXX:4:2461:25970:10614
GCCTAGGCGGCGGCGGCGGTTGAGCCAGAGAGTTAGAGGCGGCTCTGTTGCTGCGGTTTTCGCGACGGAGGCGGCCGTTGTTGCCGTGAC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
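Fixed-length trimming is plain slicing, applied identically to the bases and to the quality string so the two stay aligned. The Python sketch below uses a hypothetical `trim_fixed` helper and covers only the `-b`/`-e` case shown here.

```python
# Illustrative sketch only: cut a fixed number of bases off both ends.
def trim_fixed(seq, qual, left, right):
    """Drop `left` bases from the 5' end and `right` bases from the 3' end."""
    end = len(seq) - right  # computed explicitly so that right=0 keeps the tail
    return seq[left:end], qual[left:end]

seq = "A" * 30 + "C" * 90 + "G" * 30
qual = "F" * 150
trimmed_seq, trimmed_qual = trim_fixed(seq, qual, 30, 30)
print(len(trimmed_seq), len(trimmed_qual))  # 90 90
```

A 150 bp read trimmed by 30 bp on each side leaves 90 bp, matching the length of the trimmed reads above.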
rename
Rename every sequence to the given prefix followed by a running number.
> seqtk rename test.fq test_prefix | head -8
@test_prefix1
CGTTGAGATGACGCTAGTCGCGTTGTGCCGGCCAAGGCGGCGGCGGCGGTTGAGCCAGAGAGTTAGAGGCGGCTCTGTTGCTGCGGTTTTCGCGACGGAGGCGGCCGTTGTTGCCGTGACTCAAACAACGGGGCGGGCCGCCATTGCTGC
+
FFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFF
@test_prefix2
CGTTGAGATGACGCTAGTCGCGTTGTGCCGGCCTAGGCGGCGGCGGCGGTTGAGCCAGAGAGTTAGAGGCGGCTCTGTTGCTGCGGTTTTCGCGACGGAGGCGGCCGTTGTTGCCGTGACTCAAACAACGGGGCGGGCCGCCATTGCTGC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FF
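Judging from the output above, the original read names are discarded and replaced by the prefix plus a 1-based counter. A Python sketch of that naming scheme, with a hypothetical `rename` helper, looks like this:

```python
# Illustrative sketch only: replace each name with prefix + running number.
def rename(names, prefix):
    """Return new names prefix1, prefix2, ... in input order."""
    return [f"{prefix}{i}" for i, _ in enumerate(names, start=1)]

old = ["A00679:63:HGVWCDSXX:4:1271:5927:18176",
       "A00679:63:HGVWCDSXX:4:2461:25970:10614"]
print(rename(old, "test_prefix"))  # ['test_prefix1', 'test_prefix2']
```

Since the old names are not kept, it is worth saving a mapping between old and new names if they need to be traced back later.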
cutN
Cut sequences at their runs of N, or report where the N runs are located (-n sets the minimum length of an N run).
> head -8 testN.fa
>A00679:63:HGVWCDSXX:4:1271:5927:18176
CGTTGAGATGACGCTAGTCGCGTTGTGCCGGCCNNNNCGGCGGCGGCGGTTGAGCCAGAGAGTTAGAGGCGGCTCTGTTGCTNNNNNNNTCGCGACGGAGGCGGCCGTTGTTGCCGTGACTCAAACAACGGGGCGGGCCGCCATTGCTGC
>A00679:63:HGVWCDSXX:4:2461:25970:10614
CGTTGAGATGACGCTAGTCGCGTTGTGCCGGCCTAGGCGGCGGCGGCGGTTGAGCCAGAGAGTTAGAGGCGGCTCTGTTGCTGCGGTTTTCGCGACGGAGGCGGCCGTTGTTGNNNNNACTCAAACAACGGGGCGGGCCGCCATTGCTGC
>A00679:63:HGVWCDSXX:4:1625:12680:18912
GCGGTTTTCGCGACGGAGGCGGCCGTTGTTGCCGAGACTCAAACAACGGGGCGGGCCGCCATTGCTGCTCACGCTGCCGAGCAGCTTCCGCGATCAAGGATGCAGGCCGCCATCGACGCCTCCGTAGCCGCTGACCTGGGAGAGGGATGG
>A00679:63:HGVWCDSXX:4:1329:18557:30592
GCCTTGATGGCGTCGGCATCCCCATCCGCCGCCGGCGACGACATCTCTCCCTGCACTGTTGCCATGTCCGCCTTCCGATGGCTATGCTGCGCTGCCGACGACGGGTCGGCCAGTAACAGTAACTGTCGCAACGGGATGCGCGAGCTGTGG
# Set the minimum N-run length to 1 and print the coordinates of the N runs in each sequence
> seqtk cutN -n 1 -g testN.fa
A00679:63:HGVWCDSXX:4:1271:5927:18176 33 37
A00679:63:HGVWCDSXX:4:1271:5927:18176 82 89
A00679:63:HGVWCDSXX:4:2461:25970:10614 113 118
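The coordinates printed by `-g` look like BED-style intervals (0-based start, exclusive end): the first run of four Ns is reported as 33 to 37. A regular expression gives the same kind of report in Python. This sketch assumes only literal `N`/`n` characters count as gaps, and `n_runs` is a hypothetical helper.

```python
# Illustrative sketch only: locate runs of N with at least `min_len` bases.
import re

def n_runs(seq, min_len=1):
    """Return (start, end) pairs, 0-based and end-exclusive, for each run of N."""
    pattern = re.compile(r"[Nn]{%d,}" % min_len)
    return [(m.start(), m.end()) for m in pattern.finditer(seq)]

print(n_runs("ACGTNNNNACGTNAC"))             # [(4, 8), (12, 13)]
print(n_runs("ACGTNNNNACGTNAC", min_len=2))  # [(4, 8)]
```

Raising the minimum length skips short, isolated Ns and reports only the longer gaps.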