blocks|key|1523770|text|对原始数据使用awk|type|unstyled|depth|inlineStyleRanges|offset|length|style|CODE|entityRanges|data|1523771|你要找的是一份打破控制的报告。这一次，维基百科的词条在这个问题上没有多大帮助。示例数据显示为已排序；因此，此解决方案假定数据已排序(但如果数据未排序，则在awk脚本之前添加排序操作非常简单；哦，因为数据来自数据库，所以数据库管理系统可以很好地对数据进行排序)。|1523772|出于测试目的，我创建了一个包含以下内容的文件awk.script：|1523773|{+++f1+=+$1
++++f2+=+substr($2,+1,+5)
++++if+(oldf1+!=+f1)
++++{
++++++++if+(oldf1+!=+0)
++++++++{
++++++++++++summary+=+summary+";"+oldf2+"="+f2_count
++++++++++++printf("%25s=%25d%25s\n",+oldf1,+f1_count,+summary)
++++++++}
++++++++oldf1+=+f1
++++++++f1_count+=+0
++++++++oldf2+=+f2
++++++++f2_count+=+0
++++++++summary+=+""
++++}
++++else+if+(oldf2+!=+f2)
++++{
++++++++summary+=+summary+";"+oldf2+"="+f2_count
++++++++oldf2+=+f2
++++++++f2_count+=+0
++++}
++++f1_count%2B%2B
++++f2_count%2B%2B
}
END+{
++++if+(oldf1+!=+0)
++++{
++++++++summary+=+summary+";"+oldf2+"="+f2_count
++++++++printf("%25s=%25d%25s\n",+oldf1,+f1_count,+summary)
++++}
}|code-block|syntax|javascript|1523774|并将这七行样本数据放入名为data的文件中，然后运行：|1523775|$+awk+-f+awk.script+data
3277654321=4;33301=2;33310=1;33320=1
3291234567=3;33399=2;33301=1
$|1523776|让DBMS做更多的工作|1523777|目前，数据类似于查询的输出，例如：|1523778|SELECT+Field1,+Field2
++FROM+SomeTable
+ORDER+BY+Field1,+Field2|1523779|通过让DBMS生成第一个字段、第二个字段的前5个字符和条目数计数，可以使输出更适合您的报表：|1523780|SELECT+field1,+SUBSTR(field2,+1,+5)+AS+field2,+COUNT(*)+AS+number
++FROM+SomeTable
+GROUP+BY+field1,+field2
+ORDER+BY+field1,+field2|1523781|这样就减少了通过网络传输的数据，如果数据库是远程的，这会有很大帮助。您还有一个更简单的报告。数据文件变为(data2)：|1523782|3277654321+33301+2
3277654321+33310+1
3277654321+33320+1
3291234567+33399+2
3291234567+33301+1|1523783|awk脚本变为(awk.script2)：|1523784|{+++
++++if+(oldf1+!=+$1)
++++{
++++++++if+(oldf1+!=+0)
++++++++++++printf("%25s=%25d%25s\n",+oldf1,+f1_count,+summary)
++++++++oldf1+=+$1
++++++++f1_count+=+0
++++++++summary+=+""
++++}
++++summary+=+summary+";"+$2+"="+$3
++++f1_count+%2B=+$3
}
END+{
++++if+(oldf1+!=+0)
++++++++printf("%25s=%25d%25s\n",+oldf1,+f1_count,+summary)
}|1523785|示例运行：|1523786|$+awk+-f+awk.script2+data2
3277654321=4;33301=2;33310=1;33320=1
3291234567=3;33399=2;33301=1
$|1523787|1523788|1523789|根据您的数据库管理系统以及它是否支持子查询中的GROUP_CONCAT和ORDER+BY子句，您可以注意到rici+suggested“它并没有那么凌乱”。|1523790|1523791|以下内容在SQLite3中似乎运行良好；对于MySQL，您需要在GROUP_CONCAT中将逗号更改为单词分隔符：|blockquote|1523792|1523793|SELECT+field1+%7C%7C+"=“%7C%7C+SUM(+count2+)+%7C%7C+";”%7C%7C+group_concat(field2+%7C%7C+"=“%7C%7C+count2，";")+AS+field+FROM+(SELECT+field1，SUBSTR(+field2，1，5)+AS+field2，COUNT(*)+AS+count2+FROM+tmp+GROUP+BY+field1，field2+ORDER+BY+field1，field2)+GROUP+BY+field1+ORDER+BY+field1|1523794|1523795|注意，据我所知，子查询中的GROUP_CONCAT和ORDER+BY子句都不是由ISO标准SQL定义的，因此并不是所有的数据库管理系统都支持这些特性。(由于原因，ORDER+BY功能被省略了，但推理不包括“正交性”的考虑。)|1523796|如果数据库管理系统以您需要的格式生成数据，则不需要awk脚本对其进行后处理。什么才是最好的，最终将取决于你还在做什么。通常，在有意义的地方使用DBMS进行计算。请不要使用数据库管理系统进行所有的格式化- I期望报告生成与分页等在数据库管理系统之外完成-但如果可以说服它生成您需要的数据，请务必让它完成这项工作。|1523797|entityMap|0|LINK|mutability|MUTABLE|url|https://stackoverflow.com/users/1566221/rici|1|https://stackoverflow.com/questions/28001979/bash-script-group-and-count-by-a-specific-field?noredirect=1#comment44396600_28001979^0|7|3|0|25|3|0|M|A|0|0|D|4|0|0|0|0|0|0|0|1H|5|0|0|0|3|8|B|0|0|0|0|0|0|N|C|10|8|1H|4|0|1M|9|1|0|0|0|0|0|0|D|C|Q|8|0|P|3|0^^$0|@$1|2|3|4|5|6|7|24|8|@$9|25|A|26|B|C]]|D|@]|E|$]]|$1|F|3|G|5|6|7|27|8|@$9|28|A|29|B|C]]|D|@]|E|$]]|$1|H|3|I|5|6|7|2A|8|@$9|2B|A|2C|B|C]]|D|@]|E|$]]|$1|J|3|K|5|L|7|2D|8|@]|D|@]|E|$M|N]]|$1|O|3|P|5|6|7|2E|8|@$9|2F|A|2G|B|C]]|D|@]|E|$]]|$1|Q|3|R|5|L|7|2H|8|@]|D|@]|E|$M|N]]|$1|S|3|T|5|6|7|2I|8|@]|D|@]|E|$]]|$1|U|3|V|5|6|7|2J|8|@]|D|@]|E|$]]|$1|W|3|X|5|L|7|2K|8|@]|D|@]|E|$M|N]]|$1|Y|3|Z|5|6|7|2L|8|@]|D|@]|E|$]]|$1|10|3|11|5|L|7|2M|8|@]|D|@]|E|$M|N]]|$1|12|3|13|5|6|7|2N|8|@$9|2O|A|2P|B|C]]|D|@]|E|$]]|$1|14|3|15|5|L|7|2Q|8|@]|D|@]|E|$M|N]]|$1|16|3|17|5|6|7|2R|8|@$9|2S|A|2T|B|C]|$9|2U|A|2V|B|C]]|D|@]|E|$]]|$1|18|3|19|5|L|7|2W|8|@]|D|@]|E|$M|N]]|$1|1A|3|1B|5|6|7|2X|8|@]|D|@]|E|$]]|$1|1C|3|1D|5|L|7|2Y|8|@]|D|@]|E|$M|N]]|$1|1E|3|-4|5|6|7|2Z|8|@]|D|@]|E|$]]|$1|1F|3|T|5|6|7|30|8|@]|D|@]|E|$]]|$1|1G|3|1H|5|6|7|31|8|@$9|32|A|33|B|C]|$9|34|A|35|B|C]]|D|@$9|36|A|37|1|38]|$9|39|A|3A|1|3B]]|E|$]]|$1|1I|3|-4|5|6|7|3C|8|@]|D|@]|E|$]]|$1|1J|3|1K|5|1L|7|3D|8|@]|D|@]|E|$]]|$1|1M|3|-4|5|6|7|3E|8|@]|D|@]|E|$]]|$1|1N|3|1O|5|6|7|3F|8|@]|D|@]|E|$]]|$1|1P|3|-4|5|6|7|3G|8|@]|D|@]|E|$]]|$1|1Q|3|1R|5|6|7|3H|8|@$9|3I|A|3J|B|C]|$9|3K|A|3L|B|C]]|D|@]|E|$]]|$1|1S|3|1T|5|6|7|3M|8|@$9|3N|A|3O|B|C]]|D|@]|E|$]]|$1|1U|3|-4|5|6|7|3P|8|@]|D|@]|E|$]]]|1V|$1W|$5|1X|1Y|1Z|E|$20|21]]|22|$5|1X|1Y|1Z|E|$20|23]]]]

<h3>Using <code>awk</code> on the original data</h3>

What you're seeking is a control-break report. For once, the Wikipedia entry isn't much help on the subject. The sample data is shown sorted; this solution assumes that the data is sorted, therefore (but it is trivial to add a sort operation before the <code>awk</code> script if it isn't sorted; OTOH, since the data comes from a database, the DBMS could perfectly well sort the data).

For testing purposes, I created a file <code>awk.script</code> containing:

<pre><code>{ f1 = $1
 f2 = substr($2, 1, 5)
 if (oldf1 != f1)
 {
 if (oldf1 != 0)
 {
 summary = summary ";" oldf2 "=" f2_count
 printf("%s=%d%s\n", oldf1, f1_count, summary)
 }
 oldf1 = f1
 f1_count = 0
 oldf2 = f2
 f2_count = 0
 summary = ""
 }
 else if (oldf2 != f2)
 {
 summary = summary ";" oldf2 "=" f2_count
 oldf2 = f2
 f2_count = 0
 }
 f1_count++
 f2_count++
}
END {
 if (oldf1 != 0)
 {
 summary = summary ";" oldf2 "=" f2_count
 printf("%s=%d%s\n", oldf1, f1_count, summary)
 }
}
</code></pre>

And put the seven lines of sample data into a file called <code>data</code>, and then ran:

<pre><code>$ awk -f awk.script data
3277654321=4;33301=2;33310=1;33320=1
3291234567=3;33399=2;33301=1
$
</code></pre>

<h3>Make the DBMS do more work</h3>

At the moment, the data is similar to the output from a query such as:

<pre><code>SELECT Field1, Field2
 FROM SomeTable
 ORDER BY Field1, Field2
</code></pre>

The output could be made better for your report by having the DBMS generate the first field, the first 5 characters of the second field, and a count of the number of entries:

<pre><code>SELECT field1, SUBSTR(field2, 1, 5) AS field2, COUNT(*) AS number
 FROM SomeTable
 GROUP BY field1, field2
 ORDER BY field1, field2
</code></pre>

You then have less data being transferred over the wire, which helps a lot if the database is remote. You also have a simpler report. The data file becomes (<code>data2</code>):

<pre><code>3277654321 33301 2
3277654321 33310 1
3277654321 33320 1
3291234567 33399 2
3291234567 33301 1
</code></pre>

The <code>awk</code> script becomes (<code>awk.script2</code>):

<pre><code>{ 
 if (oldf1 != $1)
 {
 if (oldf1 != 0)
 printf("%s=%d%s\n", oldf1, f1_count, summary)
 oldf1 = $1
 f1_count = 0
 summary = ""
 }
 summary = summary ";" $2 "=" $3
 f1_count += $3
}
END {
 if (oldf1 != 0)
 printf("%s=%d%s\n", oldf1, f1_count, summary)
}
</code></pre>

Sample run:

<pre><code>$ awk -f awk.script2 data2
3277654321=4;33301=2;33310=1;33320=1
3291234567=3;33399=2;33301=1
$
</code></pre>

<hr>

<h3>Make the DBMS do even more work</h3>

Depending on your DBMS and whether it supports <code>GROUP_CONCAT</code> and <code>ORDER BY</code> clauses in sub-queries, you can note that <a href="https://stackoverflow.com/users/1566221/rici">rici</a> <a href="https://stackoverflow.com/questions/28001979/bash-script-group-and-count-by-a-specific-field?noredirect=1#comment44396600_28001979">suggested</a> "It's not that messy, IMHO".

<blockquote>
 The following seems to work fine in SQLite3; for MySQL, you'd need to change the comma to the word SEPARATOR in GROUP_CONCAT:

<pre><code>SELECT field1 || "=" || SUM(count2) || ";" ||
 group_concat(field2 || "=" || count2, ";") AS fields
 FROM (SELECT field1, SUBSTR(field2, 1, 5) AS field2, COUNT(*) AS count2
 FROM tmp
 GROUP BY field1, field2
 ORDER BY field1, field2
 )
 GROUP BY field1
 ORDER BY field1
</code></pre>
</blockquote>

Note that both <code>GROUP_CONCAT</code> and <code>ORDER BY</code> clauses in sub-queries are not defined by ISO standard SQL, as far as I know, so not all DBMS will support the features. (The ORDER BY feature was omitted for reason, but the reasoning didn't include considerations of 'orthogonality'.)

If the DBMS produces the data in the format you need, there's no need for an <code>awk</code> script to post-process it. What's best will ultimately depend on what else you're doing. Generally, use the DBMS to do calculations where it makes sense. IMO, don't use the DBMS for all formatting — I expect report generation with pagination etc to be done outside the DBMS proper — but if it can be persuaded to generate the data you need, by all means make it do the work.

blocks|key|5935096|text|朋友们，我想分享一个“优雅”的解决方案。感谢其他社区用户，他们为我提供了一些建议。|type|unstyled|depth|inlineStyleRanges|entityRanges|data|5935097|awk+++++'NR>0+++{C1[$1]%2B%2B
+++++++++++++++++C2[$1,substr($2,1,5)]%2B%2B
++++++++++++++++}
+++++++++END+{for+(c2+in+C2)+{split+(c2,+cx,+SUBSEP);+print+cx[1]+"="+C1[cx[1]]+";"+cx[2]+"="+C2[c2]}}
++++++++'+SUBSEP=";"+out.txt+%7C+sort+%7C+awk+++++'$1+!=+L++++++++{printf+"%25s%25s",+LT,+$1;+L=$1;+LT="\n"}
++++++++++++++++++++++++{printf+";%25s",+$2}
+++++++++END+{printf+"\n"}
++++++++'+FS=";"

3277654321=4;33301=2;33310=1;33320=1
3291234567=3;33399=2;33301=1|code-block|syntax|javascript|5935098|rici，这不是我要求别人为我写代码的情况。这只是一个大脚本中非常小的一部分，所以我只是请求帮助如何做一件小事情。我对不同的方法很感兴趣，这就是为什么我不提供任何代码示例的原因。感谢所有参与这个问题的SO用户，我仍然愿意尝试不同的方法。|5935099|entityMap^0|0|0|0^^$0|@$1|2|3|4|5|6|7|K|8|@]|9|@]|A|$]]|$1|B|3|C|5|D|7|L|8|@]|9|@]|A|$E|F]]|$1|G|3|H|5|6|7|M|8|@]|9|@]|A|$]]|$1|I|3|-4|5|6|7|N|8|@]|9|@]|A|$]]]|J|$]]

People, I would like to share an "elegant" solution. Thanks are flying to other community user who drive me suggesting some steps.

<pre><code>awk 'NR&gt;0 {C1[$1]++
 C2[$1,substr($2,1,5)]++
 }
 END {for (c2 in C2) {split (c2, cx, SUBSEP); print cx[1] "=" C1[cx[1]] ";" cx[2] "=" C2[c2]}}
 ' SUBSEP=";" out.txt | sort | awk '$1 != L {printf "%s%s", LT, $1; L=$1; LT="\n"}
 {printf ";%s", $2}
 END {printf "\n"}
 ' FS=";"

3277654321=4;33301=2;33310=1;33320=1
3291234567=3;33399=2;33301=1
</code></pre>

And rici, this is not the case I ask someone to write code for me. This is a very little part of a big script, so I just ask help on how to do a little thing. I'm interesting in different approach, that's why I ask without providing any code examples. Thanks to all SO users who partecipating to this question, I'm still open to try different approach.

sorry if I open a new question, but it's not related to a previous one since now I need a bash command to analyze the output.

I have an output from query stored in a file like this: 

<pre><code>3277654321 333011123456789
3277654321 333015123456789
3277654321 333103123456789
3277654321 333201123456789
3291234567 333991123456789
3291234567 333991123456789
3291234567 333011123456789
</code></pre>

What I need is a bash command to count the field1 and field2 having the same first 5 digits and report an output like this:

<pre><code>3277654321=4;33301=2;33310=1;33320=1 
3291234567=3;33399=2;33301=1
</code></pre>

Thanks
Lucas.

Bash script group and count by a specific field

翻译质量差，导致语言生硬或混乱。

没有提供实际的解决方法或示例。

解答不清晰，无法理解或解决问题。

页面排版不美观，阅读体验差。

文章

问答

视频

学习中心

腾讯云实验室

直播

竞赛

腾讯云代码分析专区

腾讯iOA零信任安全管理系统专区

腾讯云架构师技术同盟交流圈

腾讯云数据库专区

腾讯云顾问专区

腾讯云原生专区

腾讯混元专区

腾讯云TCE专区

腾讯云Lighthouse专区

腾讯云HAI专区

腾讯云Edgeone专区

腾讯云存储专区

腾讯云智能专区

腾讯轻联专区 

腾讯云开发专区

TAPD专区

腾讯轻量云游戏服专区

腾讯云最具价值专家

腾讯云架构师技术同盟

腾讯云创作之星

腾讯云开发者先锋

腾讯云代码助手

云原生构建

TAPD 敏捷项目管理

Cloud Studio

SDK中心

API中心

命令行工具

涵盖代码开发、场景应用、自动测试全流程，助你从零构建专属AI助手

一站式MCP教程库，解锁AI应用新玩法

如果我打开了一个新问题，很抱歉，但它与上一个问题无关，因为现在我需要一个bash命令来分析输出。我将query的输出存储在一个文件中，如下所示：3277654321    3330111234567893277654321    3330151234567893277654321    3331031234567893...

问Bash脚本按特定字段分组和计数
EN

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问Bash脚本按特定字段分组和计数EN