我想从第三个字段捕捉csv线,直到双引号(")
more test
"linux02","PLD26","net2-thrift-netconf","net.driver.memory","2"
"linux02","PLD26","net2-thrift-netconf","net.executor.cores","2"
"linux02","PLD26","net2-thrift-netconf","net.executor.instances","2"
"linux02","PLD26","net2-thrift-netconf","net.executor.memory","2"
"linux02","PLD26","net2-thrift-netconf","net.sql.shuffle.partitions","141"
"linux02","PLD26","net2-thrift-netconf","net.dynamicAllocation.enabled","true"
"linux02","PLD26","net2-thrift-netconf","net.dynamicAllocation.initialExecutors","2"
"linux02","PLD26","net2-thrift-netconf","net.dynamicAllocation.minExecutors","2"
"linux02","PLD26","net2-thrift-netconf","net.dynamicAllocation.maxExecutors","20"
我试过这个
sed s'/,/ /g' test | awk '{print $3","$4","$5}' | sed s'/"//g'
,,
net2-thrift-netconf,net.driver.memory
net2-thrift-netconf,net.executor.cores
net2-thrift-netconf,net.executor.instances
net2-thrift-netconf,net.executor.memory
net2-thrift-netconf,net.sql.shuffle.partitions
net2-thrift-netconf,net.dynamicAllocation.enabled
net2-thrift-netconf,net.dynamicAllocation.initialExecutors
net2-thrift-netconf,net.dynamicAllocation.minExecutors
net2-thrift-netconf,net.dynamicAllocation.maxExecutors
,,
但是我的语法有问题,因为这个语法也打印",“第二种语法不优雅。
预期产出:
net2-thrift-netconf,net.driver.memory,2
net2-thrift-netconf,net.executor.cores,2
net2-thrift-netconf,net.executor.instances,2
net2-thrift-netconf,net.executor.memory,2
net2-thrift-netconf,net.sql.shuffle.partitions,141
net2-thrift-netconf,net.dynamicAllocation.enabled,true
net2-thrift-netconf,net.dynamicAllocation.initialExecutors,2
net2-thrift-netconf,net.dynamicAllocation.minExecutors,2
net2-thrift-netconf,net.dynamicAllocation.maxExecutors,20
发布于 2018-03-21 10:07:12
只使用sed
:
sed -E 's/"//g; s/^([^,]*,){2}//' infile
s/"//g
,去掉所有双引号。^([^,]*,){2}
,从乞求行开始,去掉所有后面的逗号,最多重复两次。或者使用awk
:
awk -F\" '{$1=$2=$3=$4=$5=""}1' OFS="" infile
发布于 2018-03-21 10:03:46
看起来这只是一个问题,或者删除引号,然后从第三个字段打印到行尾:
$ tr -d \" < file | cut -d, -f3-
net2-thrift-netconf,net.driver.memory,2
net2-thrift-netconf,net.executor.cores,2
net2-thrift-netconf,net.executor.instances,2
net2-thrift-netconf,net.executor.memory,2
net2-thrift-netconf,net.sql.shuffle.partitions,141
net2-thrift-netconf,net.dynamicAllocation.enabled,true
net2-thrift-netconf,net.dynamicAllocation.initialExecutors,2
net2-thrift-netconf,net.dynamicAllocation.minExecutors,2
net2-thrift-netconf,net.dynamicAllocation.maxExecutors,20
因此,tr -d \"
从第三个到最后一个,
-separated字段中删除引号和cut -d, -f3-
打印。
发布于 2018-03-21 13:20:13
您确实应该对CSV数据使用正确的CSV解析器。这里有一种使用红宝石的方法
ruby -rcsv -e '
CSV.foreach(ARGV.shift) do |row|
wanted = row.drop(2) # ignore first 2 fields
puts CSV.generate_line(wanted, :force_quotes=>false)
end
' test
net2-thrift-netconf,net.driver.memory,2
net2-thrift-netconf,net.executor.cores,2
net2-thrift-netconf,net.executor.instances,2
net2-thrift-netconf,net.executor.memory,2
net2-thrift-netconf,net.sql.shuffle.partitions,141
net2-thrift-netconf,net.dynamicAllocation.enabled,true
net2-thrift-netconf,net.dynamicAllocation.initialExecutors,2
net2-thrift-netconf,net.dynamicAllocation.minExecutors,2
net2-thrift-netconf,net.dynamicAllocation.maxExecutors,20
或者是一字一句
ruby -rcsv -e 'CSV.foreach(ARGV.shift) {|r| puts CSV.generate_line(r.drop(2), :force_quotes=>false)}' test
https://unix.stackexchange.com/questions/432522
复制相似问题