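I'm currently using grep to extract specific text from each line of a file. It extracts the matches successfully, but I'd like it to keep the lines that have no match (leaving them as empty lines).
So far I have tried this (to pull the city name out of each line):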
grep -o -P '(?<="city":").*?(?=")' input.txt
Sample input:
email":"addictedtotlick7@gmail.com","last_name":"THOMPSON","first_name":"ERIN",,"__v":0,,,,"state":"NY","city":"north tonawanda"}
first_name":"chris","last_name":"caul",,"email":"dawgzn@mail.com",,,,"__v":0}
email":"lesliebo993@hotmail.com",,"first_name":"LESLIE","last_name":"RAMBO",,"city":"DOTHAN","state":"AL",,,"__v":0,
email":"malala@yahoo.com",,,"state":"GA","city":"NORCROSS",,"last_name":"KEO","first_name":"CATHY",,"__v":0,
email":"kdela@gmail.com",,"state":"FL","city":"HOLLYWOOD",,"last_name":"DE LA CRUZ","first_name":"KIDA",,"__v":0,
Expected output:
north tonawanda

DOTHAN
NORCROSS
HOLLYWOOD
I'm happy to try something simpler in sed, but I'd rather avoid awk, since I have to process large files and I'm not sure I have enough RAM.
Posted on 2019-05-29 08:21:57
You can do this with GNU awk.
gawk '{print index($0, "\"city\":\"") == 0 ? "" : gensub(/.*\"city\":\"([^\"]*).*/, "\\1", 1);}' file > newfile
This means: if the line does not contain "city":" (index($0, "\"city\":\"") == 0), then (?) print an empty line (""), otherwise (:) print the result of the regex substitution gensub(/.*\"city\":\"([^\"]*).*/, "\\1", 1):
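.* - any 0+ characters
\"city\":\" - a "city":" substring
([^\"]*) - capturing group 1 (\1): any 0+ characters other than "
.* - any 0+ characters
The result is the value of Group 1. We need gensub because it gives access to capturing group values, and gensub is a GNU awk extension, hence the need for gawk.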
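For comparison, a minimal sketch of a portable variant that does not need gawk: POSIX awk has no gensub(), but match() sets RSTART and RLENGTH for the matched text, and substr() can then cut the value out after the 8-character prefix "city":". The function name extract_city and the sample lines are made up for illustration, and the sketch assumes city values never contain escaped quotes.

```shell
#!/bin/sh
# Hypothetical helper: print the "city" value of each input line,
# or an empty line when the line has no "city" key.
# Uses only POSIX awk features (match, RSTART, RLENGTH, substr).
extract_city() {
  awk '{
    if (match($0, /"city":"[^"]*/)) {
      # RSTART/RLENGTH span the whole match; skip the 8-character prefix "city":"
      print substr($0, RSTART + 8, RLENGTH - 8)
    } else {
      print ""
    }
  }' "$@"
}

# Usage on two made-up lines: prints "north tonawanda", then an empty line.
printf '%s\n' '"state":"NY","city":"north tonawanda"}' '"first_name":"chris"}' | extract_city
```

Like the gawk version, this reads one line at a time, so it can be run as extract_city input.txt > newfile.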
Posted on 2019-05-29 15:59:34
You can try Perl.
$ perl -nle ' if(/"city":"(.*?)"/) { print $1 } else { print "" } ' input.txt
north tonawanda

DOTHAN
NORCROSS
HOLLYWOOD
$
Posted on 2019-05-29 13:08:13
Sed:
sed 's/.*city":"\([^"]*\).*/|\1/; /^[^|]/s/.*//; s/^|//'
https://stackoverflow.com/questions/56355517
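The sed one-liner works in three steps, using | as a temporary marker for lines that contained a city. Below is the same script spread over several lines with comments, wrapped in a hypothetical shell function extract_city_sed (the function name and sample lines are illustrative only).

```shell
#!/bin/sh
# Hypothetical wrapper around the sed answer, with each step commented.
extract_city_sed() {
  sed '
    # 1. If the line has a city, replace the whole line with a | marker plus the city value
    s/.*city":"\([^"]*\).*/|\1/
    # 2. A line that does not start with the marker had no city: blank it
    /^[^|]/s/.*//
    # 3. Strip the marker from the lines that matched
    s/^|//
  ' "$@"
}

# Usage on three made-up lines: prints "HOLLYWOOD", an empty line, then "DOTHAN".
printf '%s\n' 'x,"city":"HOLLYWOOD",,' 'no city' '"city":"DOTHAN","state":"AL"' | extract_city_sed
```

One caveat of the marker approach: an unmatched line that already begins with | survives step 2 and only loses its first character in step 3. That cannot happen with the sample input, where every line starts with email or first_name.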