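I have a file of patterns (fileA.txt) that I need to search for in a large file (fileB.txt), replacing each match with the corresponding pattern from another file (fileC.txt). Example: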
fileB.txt
4472534
8BC4232
3533221
333553D
8645141
2412AAA
I want to search fileB for these patterns:
fileA.txt
BC423
33221
12AAA
Then I want to replace them, line by line, with the patterns from fileC:
fileC.txt
66FF7
11GYT
2HHJK
Expected output:
4472534
866FF72
3511GYT
333553D
8645141
242HHJK
I wrote something like this:
grep -f fileA.txt fileB.txt | xargs sed -i fileC.txt
It finds the patterns correctly, but the substitution is probably incorrect. Any suggestions?
fileA (pattern to search)
CAAGATTTTCTTTGCCGAGACTCAGTGGGG
fileB
>AMP_4 RS0255 CENPF__ENST00000366955.7__6322__30__0.43333__69.25__1 RS0247
CAGTTGTGCAATTTGGTTTTCCAGCTCACA
>AMP_4 RS0451 CENPF__ENST00000366955.7__10108__30__0.5__71.1396__1 RS0247
GAAGCCTGCAGCCCTCACTGGAAATAAACA
>AMP_4 RS0451 CENPF__ENST00000366955.7__9236__30__0.5__69.816__1 RS0332
CAAGATTTTCTTTGCCGAGACTCAGTGGGG
>AMP_4 RS0451 CENPF__ENST00000366955.7__8140__30__0.43333__68.033__1RS0255
GAGCTCCTTCAATTGATCTTTGCTGCTCTT
fileC (pattern to replace)
GGAGGATGGTGCCTGAATCTACTGGGCTCC
Posted on 2021-02-12 10:00:37
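paste fileA fileC \
| awk 'NR==FNR { mapping[$1] = $2; next }   # first input (stdin): search pattern -> replacement
       {
           for (pat in mapping) {
               gsub(pat, mapping[pat])       # replace every match of pat in the current line
           }
           print
       }' - fileB
Posted on 2021-02-12 09:09:43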
This looks like a job for awk. Please try the following, written and tested in GNU awk with the samples shown.
awk '
FNR==NR{
arr[$0]=FNR
next
}
FILENAME=="fileC.txt"{
arrVal[++count]=$0
next
}
FILENAME=="fileB.txt"{
for(key in arr){
if(sub(key,arrVal[arr[key]])){
break
}
}
print
}
' fileA.txt fileC.txt fileB.txt
The output will be as follows.
4472534
866FF72
3511GYT
333553D
8645141
242HHJK
Explanation: a detailed, line-by-line explanation of the code above.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition which will be TRUE when fileA.txt is being read.
arr[$0]=FNR ##Creating arr with index of current line and value of current line number.
next ##next will skip all further statements from here.
}
FILENAME=="fileC.txt"{ ##Checking condition if file name is fileC.txt then do following.
arrVal[++count]=$0 ##Creating arrVal with index of count increasing value of 1 and having current line as its value.
next ##next will skip all further statements from here.
}
FILENAME=="fileB.txt"{ ##Checking condition if file name is fileB.txt then do following.
for(key in arr){ ##Traversing through array arr here.
if(sub(key,arrVal[arr[key]])){ ##If key is found in the current line, substitute its first match with arrVal[arr[key]], the fileC.txt line at the same line number; sub() returns 1 on success.
break ##Come out of loop to save some cycles.
}
}
print ##Printing current line here.
}
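' fileA.txt fileC.txt fileB.txt ##Mentioning Input_file names here.
Note: instead of the FILENAME checks above, we could also test ARGIND (GNU awk only), which holds the index of the input file currently being read.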
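As a rough sketch of the note above about dropping the hardcoded file names: one portable option is to count input files as they start (FNR==1), so the program depends only on argument order. GNU awk exposes the same information directly as ARGIND. The names fileno, pat, rep, n and the scratch file names below are hypothetical, chosen for illustration; the sample data is copied from the question. This variant also stores patterns by line number, so they are tried in file order rather than in the unspecified order of for(key in arr).

```shell
#!/bin/sh
# Sample data copied from the question, written to a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' BC423 33221 12AAA > "$workdir/patterns.txt"
printf '%s\n' 66FF7 11GYT 2HHJK > "$workdir/replacements.txt"
printf '%s\n' 4472534 8BC4232 3533221 333553D 8645141 2412AAA > "$workdir/data.txt"

result=$(awk '
FNR == 1    { fileno++ }                        # a new input file has started
fileno == 1 { pat[FNR] = $0; n = FNR; next }    # 1st file: search patterns, by line number
fileno == 2 { rep[FNR] = $0; next }             # 2nd file: replacements, by line number
{
    for (i = 1; i <= n; i++) {
        if (sub(pat[i], rep[i])) break          # stop at the first pattern that matches
    }
    print
}' "$workdir/patterns.txt" "$workdir/replacements.txt" "$workdir/data.txt")

printf '%s\n' "$result"
rm -rf "$workdir"
```

Two caveats under these assumptions: an empty input file never triggers FNR==1, so the counter would be off by one, and each pattern is still treated as a regular expression, so metacharacters in the pattern file would need escaping.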
Posted on 2021-02-12 10:04:36
You can use sed to generate a sed script that performs the replacements:
sed "$(paste fileA.txt fileC.txt | sed 's/\(.*\)\t\(.*\)/s@\1@\2@g/')" fileB.txt
https://stackoverflow.com/questions/66168741