I'm trying to remove duplicates in fields (and replace them with blanks), but only if the preceding fields are the same. For example:
Sample input:
France  Paris     Museum of Fine Arts         blabala
France  Paris     Museum of Fine Arts         blajlk
France  Paris     Yet another museum          lqmsjdf
France  Paris     Museum of National History  mlqskjf
France  Bordeaux  Museum of Fine Arts         qsfsqf
France  Bordeaux  City Hall                   lmqjflqsk
France  Bordeaux  City Hall                   lqkjfqlskjflqskfj
Spain   Madrid    Museum of Fine Arts         lqksjfh
Spain   Madrid    Museum of Fine Arts         qlmfjlqsjf
Spain   Barcelona City Hall                   nvqjvvnqk
Spain   Barcelona Museum of Fine Arts         lmkqjflqksfj
Desired output:
France  Paris     Museum of Fine Arts         blabala
                                              blajlk
                  Yet another museum          lqmsjdf
                  Museum of National History  mlqskjf
        Bordeaux  Museum of Fine Arts         qsfsqf
                  City Hall                   lmqjflqsk
                                              lqkjfqlskjflqskfj
Spain   Madrid    Museum of Fine Arts         lqksjfh
                                              qlmfjlqsjf
        Barcelona City Hall                   nvqjvvnqk
                  Museum of Fine Arts         lmkqjflqksfj
Thanks in advance for your help.
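The rule as stated is stricter than comparing each field on its own: a field should only be blanked when it, and every field to its left, repeats the previous line. Here is a minimal sketch of that rule as a POSIX sh function wrapping awk, assuming tab-separated input as the answers below do; the function name dedup_cascade is hypothetical.

```shell
# Hypothetical helper: blank field i only when fields 1..i all repeat the
# previous line's values; the last field is always printed. Tab-separated input.
dedup_cascade() {
    awk -F '\t' '
    BEGIN { OFS = FS }
    {
        same = 1                          # prefix still matches previous line?
        for (i = 1; i < NF; i++) {
            cur = $i                      # remember the original value
            if (same && (cur "") == (prev[i] ""))   # compare as strings
                $i = ""                   # repeated prefix field: blank it
            else
                same = 0                  # first difference ends the prefix
            prev[i] = cur
        }
        print
    }' "$@"
}
```

Usage: dedup_cascade inputfile (or pipe the data into it). With this rule, a line "France Lyon Museum of Fine Arts" following "France Paris Museum of Fine Arts" keeps "Museum of Fine Arts" because the city changed, whereas a per-field comparison would blank it.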
Posted 2011-01-25 03:22:41
Try this:
awk -F '\t' 'BEGIN {OFS=FS} {if ($1 == prev1) $1 = ""; else prev1 = $1; if ($2 == prev2) $2 = ""; else prev2 = $2; if ($3 == prev3) $3 = ""; else prev3 = $3; print}' inputfile
Here is a shorter version that works for any number of fields (the last field is always printed):
awk -F '\t' 'BEGIN {OFS=FS} {for (i=1; i<=NF-1;i++) if ($i == prev[i]) $i = ""; else prev[i] = $i; print}' inputfile
The output won't be aligned for on-screen display, but it will have the correct number of tabs.
The output will look like this:
field1 TAB field2 TAB field3 TAB field4
TAB TAB TAB field4
TAB TAB field3 TAB field4
TAB field2 TAB field3 TAB field4
etc.
If you need the columns aligned, that's possible too.
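One way to align the tab-separated output afterwards is a second, two-pass awk that records the widest entry of each column and then pads every field to that width, so blanked fields still occupy their column. This is a minimal sketch, assuming the deduplicated output is piped in; the function name align_tsv is hypothetical, and widths are measured with length(), so the alignment assumes one display column per character. It pads by hand rather than relying on column -t, since some versions of that tool merge adjacent separators and would swallow the empty fields.

```shell
# Hypothetical helper: pad each tab-separated column to its widest entry so
# that blanked fields keep their place. Holds the whole input in memory.
align_tsv() {
    awk -F '\t' '
    function pad(s, w) {                  # right-pad s with spaces to width w
        while (length(s) < w) s = s " "
        return s
    }
    {
        line[NR] = $0                     # first pass: remember the line
        for (i = 1; i <= NF; i++)
            if (length($i) > width[i]) width[i] = length($i)
    }
    END {
        for (n = 1; n <= NR; n++) {       # second pass: print padded columns
            nf = split(line[n], f, "\t")
            out = ""
            for (i = 1; i < nf; i++)
                out = out pad(f[i], width[i]) "  "   # two-space gutter
            print out f[nf]               # last column is left unpadded
        }
    }' "$@"
}
```

For example, append | align_tsv to either of the awk commands above.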
Edit:
This version lets you specify which fields to deduplicate:
#!/usr/bin/awk -f
BEGIN {
    FS = "\t"; OFS = FS
    # the first argument is the list of fields to deduplicate, not a file
    deduplist = ARGV[1]
    ARGV[1] = ""
    split(deduplist, tmp, " ")
    for (i in tmp) dedup[tmp[i]] = 1
}
{
    for (i = 1; i <= NF; i++)
        if (i in dedup) {
            if ($i == prev[i])
                $i = ""
            else
                prev[i] = $i
        }
    # prevent printing lines that are completely blank because
    # it's an exact duplicate of the preceding line and all fields
    # are being deduplicated
    if ($0 !~ /^[[:blank:]]*$/)
        print
}
Run it like this:
./script.awk "2 3" inputfile
to deduplicate fields 2 and 3.
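Clearing ARGV[1] works, but awk's -v option is the more common way to pass a parameter into a program. A minimal sketch of the same field-selectable deduplication with the list passed through -v, wrapped in a shell function whose name, dedup_fields, is hypothetical:

```shell
# Hypothetical variant of script.awk: the field list arrives via -v instead of
# being taken from ARGV[1], so no ARGV entry has to be cleared.
# Usage: dedup_fields "2 3" inputfile
dedup_fields() {
    field_list=$1; shift                  # first argument: fields to deduplicate
    awk -F '\t' -v fields="$field_list" '
    BEGIN {
        OFS = FS
        n = split(fields, tmp, " ")
        for (k = 1; k <= n; k++) dedup[tmp[k]] = 1
    }
    {
        for (i = 1; i <= NF; i++)
            if (i in dedup) {
                if ($i == prev[i]) $i = ""
                else prev[i] = $i
            }
        # skip lines that became entirely blank (exact repeats)
        if ($0 !~ /^[[:blank:]]*$/) print
    }' "$@"
}
```

Called as dedup_fields "2 3" inputfile, it should behave like ./script.awk "2 3" inputfile, and it also reads standard input when no file is given.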
Posted 2011-01-25 02:59:40
Try this Perl one-liner:
perl -F"\t" -lnae '@O=@F;if(!$x){$x=1}else{for($i=0;$i<=$#S;$i++){$F[$i]=""if($S[$i] eq "" || $S[$i] eq $F[$i])}};print join "\t",@F;@S=@O;'
I'm assuming the fields are tab-separated.
https://stackoverflow.com/questions/4785566