<pre><code>awk 'BEGIN{print "Ident - MD,Node ID,Date,Time,Sub Seq#,NO2..."}
     FNR &gt; 1 {print}' *.csv &gt; bigfile.csv
</code></pre>

<code>FNR</code> resets to 1 at the start of each file that awk processes, whereas <code>NR</code> keeps counting across all files, so <code>NR == FNR</code> holds only while the first file is being read.
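To make the difference between the two counters concrete, here is a minimal sketch that prints both for every line. The scratch files <code>f1</code> and <code>f2</code> and their contents are made up for the demonstration.

```shell
# Hypothetical scratch files; any two small files would do.
tmpdir=$(mktemp -d)
printf 'Name,Roll\nA,10\n' > "$tmpdir/f1"
printf 'Name,Roll\nC,56\n' > "$tmpdir/f2"

# Print the file name, the global record number (NR) and the
# per-file record number (FNR) for every line.
counters=$(cd "$tmpdir" && awk '{ print FILENAME, NR, FNR }' f1 f2)
printf '%s\n' "$counters"
rm -rf "$tmpdir"
```

The third column starts again at 1 for <code>f2</code> while the second column keeps climbing, which is why <code>FNR &gt; 1</code> skips the first line of every file.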

<hr>

A small Illustration (of course with my test data)

<pre><code>$ cat f1
Name,Roll
A,10
B,5
$ cat f2
Name,Roll
C,56
D,44
$ awk 'BEGIN{print "Naam,RollNo"}FNR &gt; 1{print}' f*&gt;final
$ cat final 
Naam,RollNo
A,10
B,5
C,56
D,44
</code></pre>

Note

As you can see, the new header for the final file goes in awk's <code>BEGIN</code> section, which is executed only once, before any input is read.

<hr>

Coming to your objective

<blockquote>
 Every row that I want from the original .csv files has the entry "MD"
 the first column
</blockquote>

<pre><code>awk 'BEGIN{FS=",";print "Ident - MD,Node ID,Date,Time,Sub Seq#,NO2..."}
     FNR &gt; 1 &amp;&amp; $1 == "MD" &amp;&amp; NF == 51 {print}' *.csv &gt; bigfile.csv
</code></pre>

Notes

This one differs from the first, general case in a few ways:

<ul>
<li>It sets <code>,</code> as the field separator (<code>FS</code>).</li>
<li><code>FNR &gt; 1 &amp;&amp; $1 == "MD" &amp;&amp; NF == 51</code> means: skip the header (<code>FNR == 1</code>) and print a line only when its first field is exactly MD (<code>$1 == "MD"</code>) and it has 51 fields (<code>NF == 51</code>).</li>
</ul>
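To see the three conditions working together, here is a small sketch on made-up data. The file <code>sample.csv</code>, its contents and the awk variable <code>want</code> are assumptions for illustration only; <code>want=3</code> stands in for the real field count of 51.

```shell
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.csv" <<'EOF'
Ident,Node,Value
MD,1,10
XX,2,20
MD,3,30,extra
MD,4,40
EOF

# Keep a line only if it is not the header, has exactly `want` fields,
# and its first field is exactly "MD".
kept=$(awk -F, -v want=3 'FNR > 1 && NF == want && $1 == "MD"' "$tmpdir/sample.csv")
printf '%s\n' "$kept"
rm -rf "$tmpdir"
```

The header, the <code>XX</code> row and the four-field row are all dropped, leaving only the two well-formed <code>MD</code> rows.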

<hr>

The Idiomatic way

As <a href="https://stackoverflow.com/users/1072112/ghoti">@ghoti</a> mentioned in his comment:

<blockquote>
 awk's "default" command is already <code>{print}</code>
</blockquote>

So the action can be dropped, and the script above can be rewritten as:

<pre><code>awk 'BEGIN{FS=",";print "Ident - MD,Node ID,Date,Time,Sub Seq#,NO2..."}
 (FNR &gt; 1 &amp;&amp; NF == 51 &amp;&amp; $1 == "MD")' *.csv &gt; bigfile.csv
</code></pre>

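A fancy one-liner would look like this (note <code>FNR</code> rather than <code>NR</code>, so that the header of every file is skipped, not just the first):

<pre><code>awk -F',' 'FNR &gt; 1 &amp;&amp; $1 ~ /^MD/ &amp;&amp; NF == 51 { print }' *.csv &gt; /someotherpath/bigfile.csv
</code></pre>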

Instead of a fancy one-liner, a more careful approach with a complete <code>bash</code> script would be something like:

<pre><code>#!/bin/bash

# Assumes the '.csv' files use a single ',' as the field separator.

for i in *.csv; do
    [ -e "$i" ] || continue   # handles the case where no *.csv files are present
    # awk runs once per file here, so NR restarts at 1 and NR &gt; 1 skips each header
    awk -F',' 'NR &gt; 1 &amp;&amp; $1 ~ /^MD/ &amp;&amp; NF == 51 { print }' "$i"
done &gt; /someotherpath/bigfile.csv   # redirect once, so later files do not overwrite earlier ones
</code></pre>

The crux of the solution is <code>awk</code>'s <code>NR</code> and <code>NF</code> variables: <code>NR</code> holds the number of the current record (row) and <code>NF</code> the number of fields in that record. <code>NR &gt; 1</code> skips the header when awk reads a single file (with several files in one invocation, <code>FNR &gt; 1</code> is needed to skip every header), <code>$1 ~ /^MD/</code> keeps only the lines whose first column starts with <code>MD</code>, and <code>NF == 51</code> keeps only the lines containing exactly 51 fields.
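One thing worth checking on your own data is how strict the match on the first column needs to be. The sketch below compares an anchored regex, an unanchored regex and a plain string comparison; the three sample rows are invented for the comparison.

```shell
# Made-up rows: only the first has a first field that is exactly "MD".
rows='MD,1
MDX,2
AMD,3'

# Anchored regex: the first field only has to start with "MD".
starts_with=$(printf '%s\n' "$rows" | awk -F, '$1 ~ /^MD/')
# Unanchored regex: "MD" may appear anywhere in the first field.
contains=$(printf '%s\n' "$rows" | awk -F, '$1 ~ /MD/')
# String comparison: the first field has to be exactly "MD".
exact=$(printf '%s\n' "$rows" | awk -F, '$1 == "MD"')

printf 'starts with:\n%s\ncontains:\n%s\nexact:\n%s\n' "$starts_with" "$contains" "$exact"
```

If the first cell is always literally <code>MD</code>, all three behave the same; <code>$1 == "MD"</code> is the safest choice when other values such as <code>MDX</code> could appear.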

I'm going to go out on a limb here and assume that the line you're adding with <code>sed</code> is actually the headers that you're stripping off.

If that's the case, I'd suggest you skip the <code>sed</code> line, and just tell awk to strip the first line on files that are not the first one.

Next, if you only want lines containing the text <code>MD</code> in the first field, you can test that with a simple regex.

<pre><code>awk -F, '
 FNR==1 &amp;&amp; NR &gt; 1 { next } # skip the header on all but the first file
 NF != 51 { next } # skip this line if field count is wrong
 $1 ~ /MD/ # print the line if the first field matches
' *.csv &gt; /path/to/outputfile.csv
</code></pre>

<ul>
<li>The <code>-F,</code> option tells awk to split fields using a comma as field separator.</li>
<li><code>NR</code> is the total number of records processed, while <code>FNR</code> is the current record number in the current file.</li>
<li>A condition with no commands assumes <code>print</code> as the command (printing the current line).</li>
</ul>

You can of course put this entire awk script on one line if you like. I split it out for easier reading.

If your outputfile.csv is in the same directory where you are getting your "glob" of input csv files, then be aware that the new file will be created by the shell, not by awk, and might also be processed as an input file. This could be a concern if you were planning to append your redirect to an existing file with <code>&gt;&gt;</code>.
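The simplest way I know to avoid this is to keep the output outside the directory that the glob reads. Here is a rough sketch; the <code>in</code> and <code>out</code> directories, the file names and the sample rows are all hypothetical.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/in" "$workdir/out"
printf 'h1,h2\nMD,1\n' > "$workdir/in/a.csv"
printf 'h1,h2\nMD,2\n' > "$workdir/in/b.csv"

# The output lives in a sibling directory, so in/*.csv can never match it.
awk -F, 'FNR > 1 && $1 == "MD"' "$workdir"/in/*.csv > "$workdir/out/combined.csv"

# Running the same command again gives the same result, because the
# previous output is not among the inputs.
awk -F, 'FNR > 1 && $1 == "MD"' "$workdir"/in/*.csv > "$workdir/out/combined.csv"

result=$(cat "$workdir/out/combined.csv")
printf '%s\n' "$result"
rm -rf "$workdir"
```

Giving the output a different extension and renaming it afterwards would work too, as long as the final name does not match the input glob on the next run.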

UPDATE

As you've mentioned that the headers you're adding are different from the ones you strip off, you can still avoid using a separate command like sed, by changing the awk script to something like this:

<pre><code>awk -F, '
 BEGIN {
 print "Ident - MD,Node ID,Date,Time,Sub Seq#,NO2..."
 }
 FNR==1 { next } # skip the header on all files
 NF != 51 { next } # skip this line if field count is wrong
 $1 ~ /MD/ # print the line if the first field matches
' *.csv &gt; /path/to/outputfile.csv
</code></pre>

Commands within awk's <code>BEGIN</code> block are executed before any input lines are processed, so if you print new headers there, they will appear at the beginning of your (redirected) output. (Note that there is a similar <code>END</code> block if you want to generate a footer/summary/etc after all input has been processed.)
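As a quick sketch of <code>BEGIN</code> and <code>END</code> used together, the following prints a new header first and a row count last. The sample file, the two-column header and the counter name <code>kept</code> are made up for the example.

```shell
tmpdir=$(mktemp -d)
printf 'old,header\nMD,1\nXX,2\nMD,3\n' > "$tmpdir/data.csv"

report=$(awk -F, '
    BEGIN      { print "Ident,Value" }                # new header, printed once before any input
    FNR == 1   { next }                               # drop the original header
    $1 == "MD" { print; kept++ }                      # keep matching rows and count them
    END        { print "# rows kept: " (kept + 0) }   # summary after all input has been read
' "$tmpdir/data.csv")
printf '%s\n' "$report"
rm -rf "$tmpdir"
```

If the output is going to be parsed as CSV afterwards, a footer line like this would be one more bad row, so in practice you would probably send the summary somewhere other than the data file.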

I'm another bash scripting newbie (having just discovered it, it blew my mind! It's so exciting)
What I want to do is have a script that compiles a LOT of .csv files into just one bigfile.csv, removing the headers, and inserting my own header. I discovered the following solution: 

<pre><code>awk 'FNR &gt; 1' *.csv &gt; bigfile.csv
sed -i 1i"Ident - MD,Node ID,Date,Time,Sub Seq#,NO2..." bigfile.csv
</code></pre>

Great! But when I try and use this file for analysis I'm getting errors because of bad lines. I had a look at it and indeed, there are a few crazy entries in there.

Luckily, every row that I want from the original .csv files has the entry "MD" in the first column. So does anyone know how I can tell awk to only take the lines from the .csv files that have "MD" in their first cell?

EDIT: Thanks for your help guys, it worked a charm!
Unfortunately there's still some weird data in there

<pre><code>CParserError: Error tokenizing data. C error: Expected 51 fields in line 6589, saw 54
</code></pre>

With a simple adjustment, is there a way to again only take lines with 51 fields?

Bash Scripting compiling specific csv rows
