Sum values across duplicate rows using Bash

Stack Overflow user
Asked on 2020-06-20 00:00:17
3 answers · 172 views · 0 followers · 0 votes

I've seen different versions of this question on Stack Overflow, but none that addresses this particular use case.

Goal

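Find duplicate rows based on the entire row (not just one column), ignoring the value in the last column. Remove all but one row from each set of duplicates, but first sum the last-column values across those duplicates and show the total in the last column of the row that remains. I'd like to do this in Bash.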

Use case

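I have a table with one row per page on a website, listing the number of views that page has received along with some other metadata. However, some rows in the table represent the same page and differ only in their view counts. Those views need to be added together to show each page's all-time views.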

Example

Original file:

url,title,tag,version,guide,views
"https://website.com/1-1/section/product/page-title","Page Title 1",tag-1,"1-1","guide-1",100
"https://website.com/2-2/section/product/page-title","Page Title 2",tag-2,"2-2","guide-2",5
"https://website.com/1-1/section/product/page-title","Page Title 1",tag-1,"1-1","guide-1",15
"https://website.com/3-3/section/product/page-title","Page Title 3",tag-3,"3-3","guide-3",10
"https://website.com/3-3/section/product/page-title","Page Title 3",tag-3,"3-3","guide-3",20
"https://website.com/4-4/section/product/page-title","Page Title 4",tag-4,"4-4","guide-4",7
"https://website.com/3-3/section/product/page-title","Page Title 3",tag-3,"3-3","guide-3",30

Desired file:

url,title,tag,version,guide,views
"https://website.com/1-1/section/product/page-title","Page Title 1",tag-1,"1-1","guide-1",115
"https://website.com/2-2/section/product/page-title","Page Title 2",tag-2,"2-2","guide-2",5
"https://website.com/3-3/section/product/page-title","Page Title 3",tag-3,"3-3","guide-3",60
"https://website.com/4-4/section/product/page-title","Page Title 4",tag-4,"4-4","guide-4",7

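Ideally I would share every script iteration I've tried and break down where each one succeeds and fails, but this is over my head and I'm struggling to get even that far. My approach has been to take answers to similar Stack Overflow questions (all of them in awk, which makes sense to me) and change the columns being compared. But because some of those answers compare only a single column, the results of my modifications were inconsistent and strange. The scripts are complex enough that I have a hard time understanding why.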

Could anyone offer some guidance on how to work out an answer, or an example that points me in the right direction? Thanks in advance if so.
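For reference, a minimal sketch of one way to turn the original file into the desired file, assuming a standard awk, that the view count is always the last comma-separated field, and that the header should pass through unchanged. The function name sum_duplicates is hypothetical. It records each key the first time that key appears, so pages come out in first-appearance order (for this sample that coincides with the desired file); a bare "for (key in sum)" loop in awk makes no ordering promise.

```shell
#!/usr/bin/env bash
# Sum the last column of rows that are identical in every other column,
# keeping one row per group, in first-appearance order.
# sum_duplicates is a hypothetical helper name; it reads CSV on stdin.
sum_duplicates() {
    awk '
        BEGIN { FS = OFS = "," }
        NR == 1 { print; next }                  # pass the header through
        {
            num = $NF                            # the views count (last field)
            sub(/,[^,]*$/, "")                   # drop ",<views>"; the rest is the key
            if (!($0 in sum)) order[++n] = $0    # remember first-seen order
            sum[$0] += num
        }
        END { for (i = 1; i <= n; i++) print order[i], sum[order[i]] }
    '
}

# Usage with the sample data from the question; prints the desired file.
sum_duplicates <<'EOF'
url,title,tag,version,guide,views
"https://website.com/1-1/section/product/page-title","Page Title 1",tag-1,"1-1","guide-1",100
"https://website.com/2-2/section/product/page-title","Page Title 2",tag-2,"2-2","guide-2",5
"https://website.com/1-1/section/product/page-title","Page Title 1",tag-1,"1-1","guide-1",15
"https://website.com/3-3/section/product/page-title","Page Title 3",tag-3,"3-3","guide-3",10
"https://website.com/3-3/section/product/page-title","Page Title 3",tag-3,"3-3","guide-3",20
"https://website.com/4-4/section/product/page-title","Page Title 4",tag-4,"4-4","guide-4",7
"https://website.com/3-3/section/product/page-title","Page Title 3",tag-3,"3-3","guide-3",30
EOF
```

Keeping the order in a separate indexed array costs one extra array entry per distinct page, which avoids depending on gawk-only sorting or an external sort.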


3 answers

Stack Overflow user

Accepted answer

Posted 2020-06-20 01:03:12

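This works whether or not any of the quoted fields can contain a comma (for example, if a field shown here with placeholder text such as "Page Title 1" is really something like "I, Robot - Page 1"):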

$ awk '
    BEGIN { FS=OFS="," }
    NR==1 { print; next }
    { num=$NF; sub(/,[^,]*$/,""); sum[$0]+=num }
    END { for (key in sum) print key, sum[key] }
' file
url,title,tag,version,guide,views
"https://website.com/2-2/section/product/page-title","Page Title 2",tag-2,"2-2","guide-2",5
"https://website.com/4-4/section/product/page-title","Page Title 4",tag-4,"4-4","guide-4",7
"https://website.com/1-1/section/product/page-title","Page Title 1",tag-1,"1-1","guide-1",115
"https://website.com/3-3/section/product/page-title","Page Title 3",tag-3,"3-3","guide-3",60
Votes: 4
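A quick way to check the embedded-comma claim in the accepted answer: its program, unchanged apart from being wrapped in a function (the name sum_last_column is hypothetical), run on two made-up rows whose title is "I, Robot - Page 1". Because sub(/,[^,]*$/,"") removes only the text after the last comma, earlier commas stay inside the key.

```shell
#!/usr/bin/env bash
# The accepted answer's awk program, wrapped in a hypothetical function name.
sum_last_column() {
    awk '
        BEGIN { FS = OFS = "," }
        NR == 1 { print; next }                             # keep the header
        { num = $NF; sub(/,[^,]*$/, ""); sum[$0] += num }   # key = row minus ",views"
        END { for (key in sum) print key, sum[key] }
    '
}

# Made-up rows whose quoted title contains a comma.
sum_last_column <<'EOF'
url,title,views
"https://website.com/5-5/page","I, Robot - Page 1",3
"https://website.com/5-5/page","I, Robot - Page 1",4
EOF
# prints:
# url,title,views
# "https://website.com/5-5/page","I, Robot - Page 1",7
```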

Stack Overflow user

Posted 2020-06-20 00:27:53

One way, using GNU datamash:

$ echo "url,title,tag,version,guide,views" && datamash --header-in -st, -g1,2,3,4,5 sum 6 < input.txt
url,title,tag,version,guide,views
"https://website.com/1-1/section/product/page-title","Page Title 1",tag-1,"1-1","guide-1",115
"https://website.com/2-2/section/product/page-title","Page Title 2",tag-2,"2-2","guide-2",5
"https://website.com/3-3/section/product/page-title","Page Title 3",tag-3,"3-3","guide-3",60
"https://website.com/4-4/section/product/page-title","Page Title 4",tag-4,"4-4","guide-4",7

Or with awk:

$ awk -F, 'NR==1 { print; next }
           { groups[$1 "," $2 "," $3 "," $4 "," $5] += $6 }
           END { PROCINFO["sorted_in"] = "@ind_str_asc" # Sorted output when using GNU awk
                 for (g in groups) print g "," groups[g]
           }' input.txt
url,title,tag,version,guide,views
"https://website.com/1-1/section/product/page-title","Page Title 1",tag-1,"1-1","guide-1",115
"https://website.com/2-2/section/product/page-title","Page Title 2",tag-2,"2-2","guide-2",5
"https://website.com/3-3/section/product/page-title","Page Title 3",tag-3,"3-3","guide-3",60
"https://website.com/4-4/section/product/page-title","Page Title 4",tag-4,"4-4","guide-4",7
Votes: 2
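The awk version in this answer hard-codes $1 through $5 as the key and $6 as the value, so it assumes exactly six comma-separated fields; a comma inside a quoted title would push the view count out of $6. A sketch of a variant that instead rebuilds the key from fields 1 through NF-1 (rejoining them with the same single-character FS reproduces the original text exactly, embedded commas included) and uses sort instead of the gawk-only PROCINFO["sorted_in"] for ordering. The function name sum_by_all_but_last is hypothetical, and LC_ALL=C is an assumption chosen to make the sort order byte-wise and reproducible.

```shell
#!/usr/bin/env bash
# Group by every field except the last, sum the last field, sort the groups.
# sum_by_all_but_last is a hypothetical helper name; it reads CSV on stdin.
sum_by_all_but_last() {
    local header
    IFS= read -r header && printf '%s\n' "$header"   # emit the header unsorted
    awk 'BEGIN { FS = OFS = "," }
         {
             key = $1
             for (i = 2; i < NF; i++) key = key FS $i   # fields 1..NF-1, rejoined
             groups[key] += $NF
         }
         END { for (g in groups) print g, groups[g] }' | LC_ALL=C sort
}

# Usage (made-up rows; one title contains a comma):
printf '%s\n' 'url,title,views' '"/b","I, Robot",2' '"/a","Dune",5' '"/b","I, Robot",3' |
    sum_by_all_but_last
# prints:
# url,title,views
# "/a","Dune",5
# "/b","I, Robot",5
```

Reading the header with the shell's read builtin keeps it out of the sort, so it stays on the first line without relying on how awk flushes output relative to a pipe.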

Stack Overflow user

Posted 2020-06-20 01:27:06

Another awk:

$ awk -F, -v OFS=, 'NR==1 {print; next} 
                          {v=$NF; NF--; a[$0]+=v} 
                    END   {for(k in a) print k,a[k] | "sort"}' file

url,title,tag,version,guide,views
"https://website.com/1-1/section/product/page-title","Page Title 1",tag-1,"1-1","guide-1",115
"https://website.com/2-2/section/product/page-title","Page Title 2",tag-2,"2-2","guide-2",5
"https://website.com/3-3/section/product/page-title","Page Title 3",tag-3,"3-3","guide-3",60
"https://website.com/4-4/section/product/page-title","Page Title 4",tag-4,"4-4","guide-4",7

Explanation

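Print the header line as-is. For every other line, save the value (the last field) and decrement the field count so that the remaining record ($0) becomes the key, then add the value to an accumulator indexed by that key, so rows with equal keys are summed. At the end, print each key with its total and pipe the result through sort.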
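To make the explanation concrete, a small sketch that performs the same steps per row (save $NF, decrement NF, add to a[$0]) and prints the running total after each one. The function name trace_sums and the simplified rows, modeled on the three "Page Title 3" entries with shortened URLs and fewer columns, are illustrative. Like the answer, it assumes an awk such as gawk or mawk in which decrementing NF rebuilds $0 from the remaining fields joined by OFS, which is why -v OFS=, is needed to keep the key comma-separated.

```shell
#!/usr/bin/env bash
# Print the accumulator after each data row, using the same steps as the answer.
# trace_sums is a hypothetical helper name; it reads CSV on stdin.
trace_sums() {
    awk -F, -v OFS=, '
        NR == 1 { next }                          # skip the header line
        {
            v = $NF                               # save the value (last field)
            NF--                                  # drop it; $0 is rebuilt with OFS
            a[$0] += v                            # add to the total for this key
            print "row " NR ": " $0 " +" v " -> " a[$0]
        }
    '
}

# Usage with simplified rows modeled on the "Page Title 3" entries.
printf '%s\n' 'url,title,views' '"/3-3","Page Title 3",10' '"/3-3","Page Title 3",20' '"/3-3","Page Title 3",30' |
    trace_sums
# with gawk or mawk, prints:
# row 2: "/3-3","Page Title 3" +10 -> 10
# row 3: "/3-3","Page Title 3" +20 -> 30
# row 4: "/3-3","Page Title 3" +30 -> 60
```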

Votes: 2
Page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/62480103
