
Split a line onto new lines at each delimiter while keeping the start and end of the line

Stack Overflow user
Asked 2020-07-09 09:00:40
Answers: 2 · Views: 44 · Followers: 0 · Votes: 0

I have a multi-line .txt file that looks like this:

> X 147010263   SNP EXON(MODIFIER|||||FMR1||CODING|NR_033699.1|5|1),EXON(MODIFIER|||||FMR1||CODING|NR_033700.1|5|1),NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|516|FMR1||CODING|NM_001185081.1|5|1),NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|537|FMR1||CODING|NM_001185075.1|5|1),NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|586|FMR1||CODING|NM_001185082.1|5|1),NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|611|FMR1||CODING|NM_001185076.1|5|1),NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|632|FMR1||CODING|NM_002024.5|5|1) NA  11161.p1    NA  A/A 77  A/A 87  A/C 97  A/C 0
> X 147010263   SNP EXON(MODIFIER|||||FMR1||CODING|NR_033699.1|5|1),EXON(MODIFIER|||||FMR1||CODING|NR_033700.1|5|1),NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|516|FMR1||CODING|NM_001185081.1|5|1),NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|537|FMR1||CODING|NM_001185075.1|5|1),NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|586|FMR1||CODING|NM_001185082.1|5|1),NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|611|FMR1||CODING|NM_001185076.1|5|1),NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|632|FMR1||CODING|NM_002024.5|5|1) NA  NA  13829.p1    A/A 46  A/A 83  A/C 17  A/C 0

The fields are tab-separated, and the fourth field contains several comma-separated entries. I know I can split it with tr , '\n', which gives:

X   147010263   SNP EXON(MODIFIER|||||FMR1||CODING|NR_033699.1|5|1)
EXON(MODIFIER|||||FMR1||CODING|NR_033700.1|5|1)
NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|516|FMR1||CODING|NM_001185081.1|5|1)
NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|537|FMR1||CODING|NM_001185075.1|5|1)
NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|586|FMR1||CODING|NM_001185082.1|5|1)
NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|611|FMR1||CODING|NM_001185076.1|5|1)
NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|632|FMR1||CODING|NM_002024.5|5|1) NA  11161.p1    NA  A/A 77  A/A 87  A/C 97  A/C 0
X   147010263   SNP EXON(MODIFIER|||||FMR1||CODING|NR_033699.1|5|1)
EXON(MODIFIER|||||FMR1||CODING|NR_033700.1|5|1)
NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|516|FMR1||CODING|NM_001185081.1|5|1)
NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|537|FMR1||CODING|NM_001185075.1|5|1)
NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|586|FMR1||CODING|NM_001185082.1|5|1)
NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|611|FMR1||CODING|NM_001185076.1|5|1)
NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|632|FMR1||CODING|NM_002024.5|5|1) NA  NA  13829.p1    A/A 46  A/A 83  A/C 17  A/C 0

But what I want is:

X   147010263   SNP EXON(MODIFIER|||||FMR1||CODING|NR_033699.1|5|1) NA  11161.p1    NA  A/A 77  A/A 87  A/C 97  A/C 0
X   147010263   SNP EXON(MODIFIER|||||FMR1||CODING|NR_033700.1|5|1) NA  11161.p1    NA  A/A 77  A/A 87  A/C 97  A/C 0
X   147010263   SNP NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|516|FMR1||CODING|NM_001185081.1|5|1)  NA  11161.p1    NA  A/A 77  A/A 87  A/C 97  A/C 0
X   147010263   SNP NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|537|FMR1||CODING|NM_001185075.1|5|1)  NA  11161.p1    NA  A/A 77  A/A 87  A/C 97  A/C 0
X   147010263   SNP NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|586|FMR1||CODING|NM_001185082.1|5|1)  NA  11161.p1    NA  A/A 77  A/A 87  A/C 97  A/C 0
X   147010263   SNP NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|611|FMR1||CODING|NM_001185076.1|5|1)  NA  11161.p1    NA  A/A 77  A/A 87  A/C 97  A/C 0
X   147010263   SNP NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|632|FMR1||CODING|NM_002024.5|5|1) NA  11161.p1    NA  A/A 77  A/A 87  A/C 97  A/C 0
X   147010263   SNP EXON(MODIFIER|||||FMR1||CODING|NR_033699.1|5|1) NA  NA  13829.p1    A/A 46  A/A 83  A/C 17  A/C 0
X   147010263   SNP EXON(MODIFIER|||||FMR1||CODING|NR_033700.1|5|1) NA  NA  13829.p1    A/A 46  A/A 83  A/C 17  A/C 0
X   147010263   SNP NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|516|FMR1||CODING|NM_001185081.1|5|1)  NA  NA  13829.p1    A/A 46  A/A 83  A/C 17  A/C 0
X   147010263   SNP NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|537|FMR1||CODING|NM_001185075.1|5|1)  NA  NA  13829.p1    A/A 46  A/A 83  A/C 17  A/C 0
X   147010263   SNP NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|586|FMR1||CODING|NM_001185082.1|5|1)  NA  NA  13829.p1    A/A 46  A/A 83  A/C 17  A/C 0
X   147010263   SNP NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|611|FMR1||CODING|NM_001185076.1|5|1)  NA  NA  13829.p1    A/A 46  A/A 83  A/C 17  A/C 0
X   147010263   SNP NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|632|FMR1||CODING|NM_002024.5|5|1) NA  NA  13829.p1    A/A 46  A/A 83  A/C 17  A/C 0

Note that the start of the line (X 147010263, the chromosome and its position) can also differ, for example 3 41278119 or 4 114275304.

How can I achieve this?

Thanks!


2 Answers

Stack Overflow user

Accepted answer

Posted 2020-07-09 09:41:18

A pure bash solution could be:

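#!/bin/bash

# Read tab-separated lines: the first four fields, then everything else in rest.
while IFS=$'\t' read -r f1 f2 f3 f4 rest; do
    # Split the fourth field on commas into the array items.
    IFS=, read -r -a items <<< "$f4"
    # Print one output line per item, repeating the other fields.
    for item in "${items[@]}"; do
        printf "%s\t%s\t%s\t%s\t%s\n" "$f1" "$f2" "$f3" "$item" "$rest"
    done
done < input.txt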

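Explanation:

The outer while loop reads lines until end of file is reached. IFS=$'\t' tells the read builtin to use the tab character as the field separator for the line being processed. The first four fields are assigned to the variables f1, f2, f3, and f4, respectively. The remaining fields, together with the tab characters between them (if any), are assigned to the variable rest (the variable names are not special here; any valid name can be used). The -r option is passed to read so that backslashes do not act as escape characters.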
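As a minimal sketch of the field assignment described above, the snippet below runs the same read call on a single line. The sample line is a shortened, hypothetical stand-in for the real data, and the here string used to feed it is only a convenience for the demonstration.

```bash
#!/bin/bash
# Hypothetical sample line with six tab-separated fields, built with printf.
line=$(printf 'X\t147010263\tSNP\tA(1),B(2)\tNA\t11161.p1')

# Split on tabs: the first four fields go to f1..f4, the remainder to rest.
IFS=$'\t' read -r f1 f2 f3 f4 rest <<< "$line"

printf 'f4=%s\n' "$f4"       # the comma-separated fourth field
printf 'rest=%s\n' "$rest"   # remaining fields, with the tab between them kept
```

One caveat worth knowing: tab is an IFS whitespace character, so read treats a run of consecutive tabs as a single delimiter and empty fields are silently skipped. The sample data uses NA as a placeholder, so it should not be affected, but files with truly empty columns would be.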

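In the body of the while loop, the read builtin reads the contents of the variable f4, which holds the fourth field of the line being processed, splits it into fields using , as the delimiter, and assigns the fields to sequential indices of the array items (as requested by the -a option). The construct command <<< string is called a here string (see Here Strings in the Bash Reference Manual).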
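The comma split can be tried in isolation. This is a small sketch with a hypothetical, shortened fourth field; only the read call itself comes from the answer.

```bash
#!/bin/bash
# Hypothetical fourth field holding three comma-separated annotations.
f4='EXON(a|1),EXON(b|2),NON_SYNONYMOUS_CODING(c|3)'

# The IFS=, assignment applies only to this read; -a fills the array items.
IFS=, read -r -a items <<< "$f4"

echo "count: ${#items[@]}"
echo "first: ${items[0]}"
echo "last:  ${items[2]}"
```

Because IFS is set as a prefix to the read command, the shell's IFS is unchanged afterwards, which is why the outer loop can keep splitting on tabs.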

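The inner for loop (sometimes called a for-each loop) processes each element of the array items in turn. "${items[@]}" expands each element of the array items to a separate word, and the words are assigned in order to the variable item. The printf builtin works much like the function of the same name in the C standard library.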
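To illustrate why the expansion is quoted, here is a short sketch with a hypothetical array in which one element contains a space. With "${items[@]}" each element stays a single word, so the loop runs exactly once per element.

```bash
#!/bin/bash
# Hypothetical array; the second element contains a space on purpose.
items=('EXON(a)' 'two words' 'NON_SYNONYMOUS_CODING(c)')

# Quoted expansion: one loop iteration per array element.
out=$(for item in "${items[@]}"; do
    printf '%s\t%s\n' "X" "$item"
done)

printf '%s\n' "$out"
```

Without the double quotes, the element containing a space would be split into two words and produce an extra output line.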

Votes: 2

Stack Overflow user

Posted 2020-07-09 10:19:48

Using awk. Regarding the start of the line (X 147010263): I assume the records do not actually start with > as the sample data shows.

$ awk '
BEGIN {
    FS=OFS="\t"                                          # tab-delimited
}
{
    n=split($4,a,/,/)                                    # split the 4th by commas
    for(i=1;i<=n;i++)                                    # for all comps of 4th
        for(j=1;j<=NF;j++)                               # and all fields
            printf "%s%s",(j==4?a[i]:$j),(j==NF?ORS:OFS) # output 
}' file

Start of the output:

X       147010263       SNP     EXON(MODIFIER|||||FMR1||CODING|NR_033699.1|5|1) NA      11161.p1        NA      A/A     77      A/A     87      A/C     97      A/C     0
X       147010263       SNP     EXON(MODIFIER|||||FMR1||CODING|NR_033700.1|5|1) NA      11161.p1        NA      A/A     77      A/A     87      A/C     97      A/C     0
...
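For quick experiments, the same idea can be wrapped in a shell function and fed a small input. This is a sketch of a variant, not the answer's exact program: it assigns each piece back to $4 and prints the rebuilt record instead of looping over all fields. The function name expand_field4 and the two-annotation sample line are hypothetical.

```bash
#!/bin/bash
# expand_field4 (hypothetical name): reads tab-separated lines on stdin and
# prints one line per comma-separated entry of the fourth field.
expand_field4() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        n = split($4, a, /,/)      # pieces of the 4th field
        for (i = 1; i <= n; i++) {
            $4 = a[i]              # assigning a field rebuilds $0 using OFS
            print
        }
    }'
}

# Usage with a shortened, hypothetical record:
printf 'X\t1\tSNP\tA,B\tNA\t0\n' | expand_field4
```

As in the original program, a record whose fourth field is empty produces no output, since split returns 0 in that case.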
Votes: 0
Original page content provided by Stack Overflow; translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/62811195
