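I have a text file like this:
this is the code ;rfc1234;rfc1234
this is the code ;rfc1234;rfc1234;rfc1234;rfc1234

How can I squeeze the repeated words in the file down to a single word, like this:
this is the code ;rfc1234
this is the code ;rfc1234

I tried the "tr" command, but it can only squeeze repeated characters.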
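To illustrate the limitation, here is a minimal sketch assuming the attempt used tr -s (the exact tr invocation is not given in the question). tr -s only collapses runs of one repeated character, so it has nothing to squeeze in the sample line, where the repetition is a whole word.

```shell
# Sample line taken from the question's input.
line='this is the code ;rfc1234;rfc1234'

# tr -s ';' squeezes runs of consecutive ';' characters only.
# The sample has no such run, so the line comes back unchanged.
printf '%s\n' "$line" | tr -s ';'

# By contrast, a run of the same character is collapsed to one.
printf '%s\n' 'a;;;b' | tr -s ';'
```

The first command prints the line unchanged and the second prints a;b, which is why a regex-based tool such as sed or awk is needed for repeated words.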
Posted on 2018-05-22 21:37:44
sed 's/\(;[^;]*\).*/\1/' file
Posted on 2018-05-22 15:51:59
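With sed, for arbitrary repeated strings prefixed with ";":
$ sed -E 's/(;[^;]+)(\1)+/\1/g' file
Or, if you want to delete everything after the first token without checking whether it matches the preceding tokens:
$ sed -E 's/(\S);.*/\1/' file
Explanation
(;[^;]+) captures a string that starts with a semicolon. (\1)+ requires that capture to be followed by the same string one or more times. /\1/g replaces the whole chain with a single instance, and repeats this for every such chain on the line.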
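The difference between the two commands only shows up when the tokens are not all identical. The following is a minimal sketch of that comparison, assuming GNU sed (for \S and for backreferences with -E). The function names squeeze_repeats and first_token_only and the sample line with rfc5678 are hypothetical, introduced only for illustration.

```shell
# Hypothetical helper: collapse a token that is immediately repeated
# (";x;x;x" becomes ";x"). Tokens that differ are left alone, as long
# as one is not a prefix of the next (see the note below the block).
squeeze_repeats() {
  sed -E 's/(;[^;]+)(\1)+/\1/g' "$@"
}

# Hypothetical helper: keep the first token and drop everything after it,
# whether or not the later tokens are repeats.
first_token_only() {
  sed -E 's/(\S);.*/\1/' "$@"
}

sample='this is the code ;rfc1234;rfc1234;rfc5678'
printf '%s\n' "$sample" | squeeze_repeats    # this is the code ;rfc1234;rfc5678
printf '%s\n' "$sample" | first_token_only   # this is the code ;rfc1234
```

One caveat: the backreference matches text rather than whole tokens, so with input such as ";rfc12;rfc123" the first command also matches, and the result is ";rfc123".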
Posted on 2018-05-22 15:33:53
The following awk may help here. It looks at all the items in the last field of Input_file and keeps only the unique values among them.
awk '{num=split($NF,array,";");for(i=1;i<=num;i++){if(!array1[array[i]]++){val=val?val ";" array[i]:array[i]}};NF--;print $0";"val;val="";delete array;delete array1}' Input_file
Adding a non-one-liner form of the solution now as well.
awk '
{
  num=split($NF,array,";");
  for(i=1;i<=num;i++){
    if(!array1[array[i]]++){
      val=val?val ";" array[i]:array[i]
    }
  };
  NF--;
  print $0";"val;
  val="";
  delete array;
  delete array1
}' Input_file
Explanation:
awk '
{
  num=split($NF,array,";");              ##Split the last field of the line on ";" into array; num holds the number of elements.
  for(i=1;i<=num;i++){                   ##Loop from i=1 up to num, incrementing i by 1 each time.
    if(!array1[array[i]]++){             ##True only the first time the value array[i] is seen; array1 counts occurrences of each value.
      val=val?val ";" array[i]:array[i]  ##Append array[i] to val, separated by ";" when val is already non-empty.
    }
  };
  NF--;                                  ##Decrement NF (number of fields) to drop the last field from the current line.
  print $0";"val;                        ##Print the current line (without its last field), then ";" and the value of val.
  val="";                                ##Reset val.
  delete array;                          ##Delete the array named array.
  delete array1                          ##Delete the array named array1.
}' Input_file                            ##Input_file is the name of the input file.
https://stackoverflow.com/questions/50470988
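As a variant on the awk answer's idea of keeping only unique values, here is a minimal sketch that splits the whole line on ";" instead of only the last field, so the text before the first ";" is kept exactly as written. The function name dedup_tokens and the sample line with rfc5678 are hypothetical, and the sketch assumes every token after the first ";" should be deduplicated.

```shell
# Hypothetical helper: keep the text before the first ";" as-is, then
# append each ";"-separated token only the first time it is seen.
dedup_tokens() {
  awk -F';' '{
    out = $1                 # everything before the first ";"
    split("", seen)          # portable way to clear the array per line
    for (i = 2; i <= NF; i++)
      if (!seen[$i]++)       # true only on the first occurrence
        out = out ";" $i
    print out
  }' "$@"
}

printf '%s\n' 'this is the code ;rfc1234;rfc5678;rfc1234' | dedup_tokens
# this is the code ;rfc1234;rfc5678
```

Unlike a backreference-based sed substitution, this also removes duplicates that are not adjacent, at the cost of a slightly longer script.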