
Parsing a JSON file with the jq tool

Stack Overflow user
Asked 2019-05-30 10:43:51
5 answers, 346 views, 0 followers, 1 vote

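I have the following nested JSON file. I would like to parse it with the jq tool and print it as a table, as shown at the end.

The structure of input.json is as follows: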

{
 "document":{
  "page":[
     {
        "@index":"0",
        "image":{
           "@data":"ABC",
           "@format":"png",
           "@height":"620.00",
           "@type":"base64encoded",
           "@width":"450.00",
           "@x":"85.00",
           "@y":"85.00"
        }
     },
     {
        "@index":"1",
        "row":[
           {
              "column":[
                 {
                    "text":""
                 },
                 {
                    "text":{
                       "#text":"Text1",
                       "@fontName":"Arial",
                       "@fontSize":"12.0",
                       "@height":"12.00",
                       "@width":"71.04",
                       "@x":"121.10",
                       "@y":"83.42"
                    }
                 }
              ]
           },
           {
              "column":[
                 {
                    "text":""
                 },
                 {
                    "text":{
                       "#text":"Text2",
                       "@fontName":"Arial",
                       "@fontSize":"12.0",
                       "@height":"12.00",
                       "@width":"101.07",
                       "@x":"121.10",
                       "@y":"124.82"
                    }
                 }
              ]
           }
        ]
     },
     {
        "@index":"2",
        "row":[
           {
              "column":{
                 "text":{
                    "#text":"Text3",
                    "@fontName":"Arial",
                    "@fontSize":"12.0",
                    "@height":"12.00",
                    "@width":"363.44",
                    "@x":"85.10",
                    "@y":"69.62"
                 }
              }
           },
           {
              "column":{
                 "text":{
                    "#text":"Text4",
                    "@fontName":"Arial",
                    "@fontSize":"12.0",
                    "@height":"12.00",
                    "@width":"382.36",
                    "@x":"85.10",
                    "@y":"83.42"
                 }
              }
           },
           {
              "column":{
                 "text":{
                    "#text":"Text5",
                    "@fontName":"Arial",
                    "@fontSize":"12.0",
                    "@height":"12.00",
                    "@width":"435.05",
                    "@x":"85.10",
                    "@y":"97.22"
                 }
              }
           }
        ]
     },
     {
        "@index":"3"
     }
  ]
 }
}

Based on the answers to the question "Parsing nested json with jq", I tried the following command, but it does not work:

$ cat file.json | jq .document.page[].row | ["#text", "@x", "@y"] | @csv

The output I would like to get is:

#text @x     @y
Text1 121.10 83.42
Text2 121.10 124.82
Text3 85.10  69.62
Text4 85.10  83.42
Text5 85.10  97.22

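How can I do this?

Thanks.

Update

Thank you very much for the help. I have now tried it with the real, longer file.

I was able to adapt peak's first solution as follows: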

["#text", "@data", "@fontName", "@fontSize", "@format", "@height", "@type", "@width", "@x", "@y"], 
( .. 
| objects 
| select(has("#text","@data")) 
| [.["#text", "@data", "@fontName", "@fontSize", "@format", "@height", "@type", "@width", "@x", "@y"]]
)  
| @tsv

With the new input, I get this table:

+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| #text         | @data | @fontName | @fontSize | @format | @height | @type         | @width | @x     | @y     |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
|               | ABC   |           |           | png     | 620     | base64encoded | 450    | 85     | 85     |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text ä 1      |       | Tahoma    | 12        |         | 12      |               | 427.79 | 85.1   | 69.62  |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text ¢76      |       | Tahoma    | 12        |         | 12      |               | 270.5  | 85.1   | 690.72 |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text % 5      |       | Tahoma    | 12        |         | 12      |               | 130.84 | 358.86 | 690.72 |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text 7Ç8      |       | Tahoma    | 12        |         | 12      |               | 115.95 | 85.1   | 704.52 |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text • 2 Wñ79 |       | Tahoma    | 8         |         | 8.04    |               | 398.16 | 121.1  | 68.06  |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text          |       | Tahoma    | 12        |         | 12      |               | 101.5  | 85.1   | 83.42  |
|   » 1 A\\\\CÓ |       |           |           |         |         |               |        |        |        |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text 12       |       | Tahoma    | 12        |         | 12      |               | 312.26 | 189.83 | 83.42  |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text 82       |       | Tahoma    | 12        |         | 12      |               | 44.99  | 85.1   | 97.22  |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text 31       |       | Tahoma    | 8         |         | 8.04    |               | 381.83 | 133.1  | 95.66  |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+

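If possible, how can I add the following 3 columns (counter, page, and row) so that I can tell which page and row each line corresponds to?

The expected output would look like this: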

+-------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| counter | page | row | #text             | @data | @fontName | @fontSize | @format | @height | @type         | @width | @x     | @y     |
+-------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 1     | 0    |     |                   | ABC   |           |           | png     | 620     | base64encoded | 450    | 85     | 85     |
+-------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 2     | 1    | 0   | Text ä 1          |       | Tahoma    | 12        |         | 12      |               | 427.79 | 85.1   | 69.62  |
+-------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 3     | 1    | 1   | Text ¢76          |       | Tahoma    | 12        |         | 12      |               | 270.5  | 85.1   | 690.72 |
+-------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 4     | 1    | 1   | Text % 5          |       | Tahoma    | 12        |         | 12      |               | 130.84 | 358.86 | 690.72 |
+-------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 5     | 2    | 2   | Text 7Ç8          |       | Tahoma    | 12        |         | 12      |               | 115.95 | 85.1   | 704.52 |
+-------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 6     | 2    | 0   | Text • 2 Wñ79     |       | Tahoma    | 8         |         | 8.04    |               | 398.16 | 121.1  | 68.06  |
+-------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 7     | 2    | 1   | Text  » 1 A\\\\CÓ |       | Tahoma    | 12        |         | 12      |               | 101.5  | 85.1   | 83.42  |
+-------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 8     | 2    | 1   | Text 12           |       | Tahoma    | 12        |         | 12      |               | 312.26 | 189.83 | 83.42  |
+-------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 9     | 2    | 2   | Text 82           |       | Tahoma    | 12        |         | 12      |               | 44.99  | 85.1   | 97.22  |
+-------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 10    | 2    | 2   | Text 31           |       | Tahoma    | 8         |         | 8.04    |               | 381.83 | 133.1  | 95.66  |
+-------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+

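Here is a new, more representative input file: input2.json

The JSON structure in the image below shows the page and row numbers present in the JSON file, along with the values they contain.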


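5 Answers

Stack Overflow user

Accepted answer

Posted 2019-05-31 06:56:14

Processing input2.json

The second set of requirements, the one corresponding to input2.json, needs context-dependent information, so context cannot be ignored and the solution below uses a "drill-down" approach. The code is somewhat hard to follow unless you understand foreach, so I will just mention that the approach essentially uses a state variable, {counter, page, row}, to keep track of the three counters.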

["counter", "page", "row", "#text", "@data", "@fontName", "@fontSize", "@format", "@height", "@type", "@width", "@x", "@y"], 
(foreach (.document.page[] | objects) as $page ({page: -1, counter: 0};
  .page += 1
  | foreach ($page | .row[]?) as $row (.row=-1;
    .row += 1
    | foreach ($row | (.column | (if type == "array" then .[] else . end )) | .text | objects) as $x (.;
      .counter += 1
      | .out = [.counter, .page, .row, $x["#text", "@data", "@fontName", "@fontSize", "@format", "@height", "@type", "@width", "@x", "@y"]]
      ; . )
      ; . )
      ; .out )
)
| @tsv

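This produces the desired TSV, except for the first data row, because that entry has no row. I have given one way to include the first row in my answer to "Relate elements in table form from Json file with jq".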
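For readers who have not used foreach before, here is a minimal sketch of the state-variable idea on its own. The items "a", "b", "c" and the single counter field are made up for illustration; the program above threads three counters through nested foreach calls in the same way.

```shell
# foreach SOURCE as $x (INIT; UPDATE; EXTRACT)
out=$(jq -n -c '
  foreach ("a", "b", "c") as $item
    ( {counter: 0};          # initial state
      .counter += 1;         # update, run once per $item
      [.counter, $item] )    # value emitted after each update
')
printf '%s\n' "$out"
```

Each emitted array pairs the running counter with the current item, which is the same pattern the answer uses to prefix every table row with counter, page, and row.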

Votes: 1

Stack Overflow user

Posted 2019-05-30 14:18:38

Here is a simple (perhaps too simple?) approach that focuses on embedded JSON objects that have a "#text" key:

["#text", "@x", "@y"],       # the header
( ..
  | objects
  | select(has("#text"))  
  | [.["#text", "@x", "@y"]] # a row
) 
| @csv

Given this program and the sample input, invoking jq with the -r option produces:

"#text","@x","@y"
"Text1","121.10","83.42"
"Text2","121.10","124.82"
"Text3","85.10","69.62"
"Text4","85.10","83.42"
"Text5","85.10","97.22"

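If you do not want the quotation marks, and are willing to risk output that is not strictly CSV, one option is to use join(",") instead of @csv at the end of the pipeline.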
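A minimal sketch of the join(",") variant follows. The sample file is a hypothetical cut-down version of the question's input.json (two text cells only), written to a temporary file so the command is self-contained.

```shell
# Hypothetical cut-down sample; key names follow the question's input.json.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"document":{"page":[
  {"@index":"1","row":[{"column":[{"text":""},
     {"text":{"#text":"Text1","@x":"121.10","@y":"83.42"}}]}]},
  {"@index":"2","row":[{"column":
     {"text":{"#text":"Text3","@x":"85.10","@y":"69.62"}}}]}
]}}
EOF

# Same program as above, with join(",") in place of @csv.
out=$(jq -r '
  ["#text", "@x", "@y"],
  ( .. | objects | select(has("#text")) | [.["#text", "@x", "@y"]] )
  | join(",")
' "$sample")
printf '%s\n' "$out"
rm -f "$sample"
```

Note that join(",") does no escaping, so a value that itself contains a comma or a quotation mark would make the line ambiguous. That is the risk mentioned above.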

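Variants

You might prefer to use @tsv instead of @csv.

If you need a stricter way of selecting the relevant embedded objects, then replacing .. with .. | .text? may be enough.

If not, additional filters can be added according to the specific requirements.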
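To illustrate what the stricter selection buys, here is a sketch on a made-up input in which a "note" object (hypothetical, not part of the question's data) also carries a "#text" key. With .. | .text? only objects stored under a "text" key are considered, so the note is skipped.

```shell
# Hypothetical input: "note" has "#text" but is not stored under a "text" key.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"document":{"page":[
  {"note":{"#text":"not a text cell","@x":"0","@y":"0"}},
  {"row":[{"column":{"text":{"#text":"Text3","@x":"85.10","@y":"69.62"}}}]}
]}}
EOF

out=$(jq -r '
  ( .. | .text? | objects | select(has("#text")) | [.["#text", "@x", "@y"]] )
  | @tsv
' "$sample")
printf '%s\n' "$out"
rm -f "$sample"
```

The ? suppresses the errors that .text would otherwise raise on arrays and strings encountered during the recursive descent.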

Votes: 2

Stack Overflow user

Posted 2019-05-30 14:33:16

Here is a solution that "drills down" through the structure, and is therefore rather tedious:

["#text", "@x", "@y"],
( .document.page[]
  | .row[]?
  | .column
  | (if type == "array" then .[] else . end)
  | .text
  | objects
  | [.["#text", "@x", "@y"]]
)
| @tsv

This is meant to be used with the -r command-line option.
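A quick way to see why -r matters here, using a hard-coded row rather than the real input: without it, jq prints the @tsv result as a JSON string with the tabs escaped; with it, the raw tab-separated line is printed.

```shell
# Without -r: a JSON string literal, tabs shown as \t.
quoted=$(jq -n '["Text1", "121.10", "83.42"] | @tsv')
# With -r: the raw string, with real tab characters.
raw=$(jq -n -r '["Text1", "121.10", "83.42"] | @tsv')

printf '%s\n' "$quoted"
printf '%s\n' "$raw"
```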

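I used @tsv because it produces output similar to the expected output given in the question. As mentioned elsewhere on this page, there are other options, such as using join/1.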
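The if type == "array" step in the program above is there because "column" is sometimes an array of cells and sometimes a single object. A small sketch on made-up data (the values "a", "b", "c" are placeholders) shows how that step flattens both shapes into one stream:

```shell
# "column" is an array in the first row and a single object in the second.
out=$(printf '%s' '[{"column":[{"text":"a"},{"text":"b"}]},{"column":{"text":"c"}}]' |
  jq -r '.[] | .column | (if type == "array" then .[] else . end) | .text')
printf '%s\n' "$out"
```

After this step every cell is handled the same way, regardless of how the row was encoded.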

Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/56370993
