
Merging and reformatting JSON files with jq

Stack Overflow user
Asked 2022-07-27 18:50:16
2 answers, 120 views, 0 followers, 0 votes

For Twitter data, I have two different JSON files that were produced by Python code. The first JSON file looks like this:

A.json

{"tweet_id": "1212024242595926028", "username": "THPDPIO", "created_at": "2019-12-31T14:54:32.000Z", "tweets": {"0": "Folks, it\u2019s simple!! You know what to do and what not to do...don\u2019t drink and drive and you don\u2019t have to worry about ANY consequences.  Btw...jail is just a small part of it...think about the possibility killing someone or killing yourself...again it\u2019s simple... \ud83d\ude42 #stayhome"}}
{"tweet_id": "1212024242595926028", "username": "TheAliciaRanae", "created_at": "2019-12-31T15:11:51.000Z", "tweets": {"1": "@THPDPIO Stay home and drink and pass out and leave everyone else alone lol that\u2019s what I\u2019ll be doing lol HAPPY NEW YEAR!"}}
{"tweet_id": "1212024242595926028", "username": "duane4343", "created_at": "2019-12-31T15:21:37.000Z", "tweets": {"1": "@THPDPIO Happy New Year"}}
{"tweet_id": "1212024242595926028", "username": "HollyBr34731868", "created_at": "2019-12-31T15:24:25.000Z", "tweets": {"1": "@THPDPIO Hope everyone has a safe night."}}

{"tweet_id": "1211503874395254785", "username": "UNDPoliceDept", "created_at": "2019-12-30T04:26:46.000Z", "tweets": {"0": "Typical North Dakotan.... #BestCopsAround #NoTravelAdvised #StayHome"}}
{"tweet_id": "1211503874395254785", "username": "UNDPoliceDept", "created_at": "2019-12-30T04:27:40.000Z", "tweets": {"1": "@NDHighwayPatrol"}}
{"tweet_id": "1211503874395254785", "username": "BorgenEthan", "created_at": "2019-12-30T05:28:48.000Z", "tweets": {"1": "@UNDPoliceDept Nah i definitely look like the first one"}}

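Using jq, I wrote a filter to select the fields I want (NB: this was done at https://jqplay.org): {tweet_id: .tweet_id, username: .username, reply: .tweets} | group_by(.tweet_id), but I get this error:

jq: error (at :1): Cannot index string with string "tweet_id"

The output I want for the first file looks like the sample below: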

{
"tweet_id": "1212024242595926028",
"username": "THPDPIO",
"reply": {
  "0": "Folks, it’s simple!! You know what to do and what not to do...don’t drink and drive and you don’t have to worry about ANY consequences.  Btw...jail is just a small part of it...think about the possibility killing someone or killing yourself...again it’s simple...  #stayhome",
  "1": "@THPDPIO Stay home and drink and pass out and leave everyone else alone lol that’s what I’ll be doing lol HAPPY NEW YEAR!",
  "1": "@THPDPIO Happy New Year",
  "1": "@THPDPIO Hope everyone has a safe night."
}}
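
A note on the sample above: the "reply" object repeats the key "1" three times. The following is a minimal Python sketch (Python is an assumption here, chosen because the question says the files came from Python code; the shortened strings are hypothetical) of what a standard parser does with such an object. Python's json module keeps only the last value for a repeated key, so a list is the safer container if every reply has to survive.

```python
import json

# Hypothetical, shortened version of the desired output:
# the key "1" appears three times inside "reply".
sample = """
{
  "tweet_id": "1212024242595926028",
  "reply": {
    "0": "original tweet",
    "1": "first reply",
    "1": "second reply",
    "1": "third reply"
  }
}
"""

parsed = json.loads(sample)

# json.loads silently keeps the last value for a repeated key,
# so only two entries remain under "reply".
print(parsed["reply"])

# A list of single-key objects keeps every reply.
replies = [
    {"0": "original tweet"},
    {"1": "first reply"},
    {"1": "second reply"},
    {"1": "third reply"},
]
print(len(replies))
```

This is why both answers below return "reply" as an array of objects rather than a single object.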

My problem: linking all replies to a specific tweet_id

The second file:

B.json

{
"author_id": 80083199,
"tweet_id": 1212150612026151000,
"username": "CTVdavidspence",
"author_followers": 19572,
"author_tweets": 73406,
"author_description": "Retired broadcast Meteorologist. 2017 RTDNA Lifetime Achievement Award. Best of Calgary 2018, 2019.  AHS Patient and Family Advisor (volunteer)",
"author_location": "Calgary",
"text": "The Trans Canada Highway near #Sicamous BC.  #stayhome   Image from @DriveBC.",
"created_at": 1577834201000,
"retweets": 12,
"replies": 4,
"likes": 27,
"quote_count": 0}
{
"author_id": 848959032370921500,
"tweet_id": 1212024242595926000,
"username": "THPDPIO",
"author_followers": 4626,
"author_tweets": 2383,
"author_description": "Police Sgt.",
"author_location": "Terre Haute, IN",
"text": "Folks, it’s simple!! You know what to do and what not to do...don’t drink  and drive and you don’t have to worry about ANY consequences.  Btw...jail is just a  small part of it...think about the possibility killing someone or killing  yourself...again it’s simple... \n#stayhome",
"created_at": 1577804072000,
"retweets": 11,
"replies": 9,
"likes": 84,
"quote_count": 1}

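I want to merge these into a final JSON file in which tweet_id serves as the reference, with the replies and username as the keys that are merged in. The final JSON file would then look like this: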

{
"author_id": 848959032370921500,
"tweet_id": 1212024242595926000,
"username": "THPDPIO",
"author_followers": 4626,
"author_tweets": 2383,
"author_description": "Police Sgt.",
"author_location": "Terre Haute, IN",
"text": "Folks, it’s simple!! You know what to do and what not to do...don’t drink  and drive and you don’t have to worry about ANY consequences.  Btw...jail is just a  small part of it...think about the possibility killing someone or killing  yourself...again it’s simple... \n#stayhome",
"created_at": 1577804072000,
"retweets": 11,
"replies": 9,
"likes": 84,
"quote_count": 1,
"reply": {
  "0": "Folks, it’s simple!! You know what to do and what not to do...don’t drink and drive and you don’t have to worry about ANY consequences.  Btw...jail is just a small part of it...think about the possibility killing someone or killing yourself...again it’s simple...  #stayhome",
  "1": "@THPDPIO Stay home and drink and pass out and leave everyone else alone lol that’s what I’ll be doing lol HAPPY NEW YEAR!",
  "1": "@THPDPIO Happy New Year",
  "1": "@THPDPIO Hope everyone has a safe night."
}}

I would appreciate any help with this. Thank you.


2 Answers

Stack Overflow user

Accepted answer

Posted 2022-07-27 20:58:25

As for the second issue, you could use JOIN on B.json together with a custom-built index (e.g. one derived from the first issue) on A.json:

jq --slurpfile A A.json --slurpfile B B.json -n '
  JOIN(
    $A | reduce group_by(.tweet_id)[] as $g (
      {}; .[$g[0].tweet_id].reply += ($g | map(.tweets))
    );
    $B[]; .tweet_id | @text; add
  )
'
Output:
{
  "author_id": 80083199,
  "tweet_id": 1212150612026151000,
  "username": "CTVdavidspence",
  "author_followers": 19572,
  "author_tweets": 73406,
  "author_description": "Retired broadcast Meteorologist. 2017 RTDNA Lifetime Achievement Award. Best of Calgary 2018, 2019.  AHS Patient and Family Advisor (volunteer)",
  "author_location": "Calgary",
  "text": "The Trans Canada Highway near #Sicamous BC.  #stayhome   Image from @DriveBC.",
  "created_at": 1577834201000,
  "retweets": 12,
  "replies": 4,
  "likes": 27,
  "quote_count": 0
}
{
  "author_id": 848959032370921500,
  "tweet_id": 1212024242595926000,
  "username": "THPDPIO",
  "author_followers": 4626,
  "author_tweets": 2383,
  "author_description": "Police Sgt.",
  "author_location": "Terre Haute, IN",
  "text": "Folks, it’s simple!! You know what to do and what not to do...don’t drink  and drive and you don’t have to worry about ANY consequences.  Btw...jail is just a  small part of it...think about the possibility killing someone or killing  yourself...again it’s simple... \n#stayhome",
  "created_at": 1577804072000,
  "retweets": 11,
  "replies": 9,
  "likes": 84,
  "quote_count": 1,
  "reply": [
    {
      "0": "Folks, it’s simple!! You know what to do and what not to do...don’t drink and drive and you don’t have to worry about ANY consequences.  Btw...jail is just a small part of it...think about the possibility killing someone or killing yourself...again it’s simple...  #stayhome"
    },
    {
      "1": "@THPDPIO Stay home and drink and pass out and leave everyone else alone lol that’s what I’ll be doing lol HAPPY NEW YEAR!"
    },
    {
      "1": "@THPDPIO Happy New Year"
    },
    {
      "1": "@THPDPIO Hope everyone has a safe night."
    }
  ]
}
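
For readers more at home in Python, the same index-then-join idea can be sketched as follows. This is only an illustrative analogue of the jq command above, not a replacement for it: the function names and the miniature records are hypothetical. It also assumes that converting a B.json tweet_id to a string yields exactly the key used in A.json. In the data as printed in the question that is not the case (B.json shows 1212024242595926000 while A.json has "1212024242595926028"), so the ids would need to agree before either approach can match them.

```python
import json


def build_reply_index(a_records):
    """Group A records by tweet_id, collecting each record's "tweets" object."""
    index = {}
    for rec in a_records:
        entry = index.setdefault(rec["tweet_id"], {"reply": []})
        entry["reply"].append(rec["tweets"])
    return index


def join_replies(b_records, index):
    """Merge the indexed replies into every B record whose tweet_id matches."""
    merged = []
    for rec in b_records:
        key = str(rec["tweet_id"])  # B stores numbers, A stores strings
        # Records without a match are passed through unchanged.
        merged.append({**rec, **index.get(key, {})})
    return merged


# Hypothetical miniature data; ids are chosen so that str(B id) == A id.
a_records = [
    {"tweet_id": "101", "username": "alice", "tweets": {"0": "hello"}},
    {"tweet_id": "101", "username": "bob", "tweets": {"1": "@alice hi"}},
]
b_records = [
    {"tweet_id": 101, "username": "alice", "likes": 3},
    {"tweet_id": 202, "username": "carol", "likes": 1},
]

result = join_replies(b_records, build_reply_index(a_records))
print(json.dumps(result, indent=2))
```

As in the jq output, a record with no replies in the index (here tweet_id 202) comes out without a "reply" field.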

Demo

Votes: 1

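Stack Overflow user

Posted 2022-07-27 20:28:22

This answer addresses the first issue:

Linking all replies to a specific tweet_id

As @pmf pointed out, your sample output would not achieve that goal. Also, group_by requires an array as input. So consider: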

< A.json jq -n '
  def accumulate_by(stream; f):
    reduce stream as $x ({}; (f + [$x|f]) as $v | . + $x | f = $v );
    
  [inputs | {tweet_id, username, reply: .tweets}] | group_by(.tweet_id)
  | map( accumulate_by(.[]; .reply ))
'

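Note that, by design, this ignores "conflicts" in the .username values; you may want to give that further thought. In any case, with your sample, the result is: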

[
  {
    "tweet_id": "1211503874395254785",
    "username": "BorgenEthan",
    "reply": [
      {
        "0": "Typical North Dakotan.... #BestCopsAround #NoTravelAdvised #StayHome"
      },
      {
        "1": "@NDHighwayPatrol"
      },
      {
        "1": "@UNDPoliceDept Nah i definitely look like the first one"
      }
    ]
  },
  {
    "tweet_id": "1212024242595926028",
    "username": "HollyBr34731868",
    "reply": [
      {
        "0": "Folks, it’s simple!! You know what to do and what not to do...don’t drink and drive and you don’t have to worry about ANY consequences.  Btw...jail is just a small part of it...think about the possibility killing someone or killing yourself...again it’s simple...  #stayhome"
      },
      {
        "1": "@THPDPIO Stay home and drink and pass out and leave everyone else alone lol that’s what I’ll be doing lol HAPPY NEW YEAR!"
      },
      {
        "1": "@THPDPIO Happy New Year"
      },
      {
        "1": "@THPDPIO Hope everyone has a safe night."
      }
    ]
  }
]
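
The accumulation step can also be sketched in Python, which may make the "conflict" behaviour easier to see. This is a rough analogue under stated assumptions, not a translation of the jq internals: the function name accumulate_replies and the sample records are hypothetical, and the input is assumed to be already reduced to tweet_id, username and reply. Within each group, later records overwrite earlier scalar fields, which is why the last username in each group appears in the result above.

```python
from itertools import groupby


def accumulate_replies(records):
    """Group records by tweet_id and collect their "reply" objects in a list."""
    # sorted() is stable, so records keep their input order within a group.
    ordered = sorted(records, key=lambda r: r["tweet_id"])
    result = []
    for _, group in groupby(ordered, key=lambda r: r["tweet_id"]):
        merged = {}
        replies = []
        for rec in group:
            merged.update(rec)  # later records overwrite e.g. "username"
            replies.append(rec["reply"])
        merged["reply"] = replies
        result.append(merged)
    return result


# Hypothetical miniature input.
records = [
    {"tweet_id": "2", "username": "alice", "reply": {"0": "hello"}},
    {"tweet_id": "2", "username": "bob", "reply": {"1": "@alice hi"}},
    {"tweet_id": "1", "username": "carol", "reply": {"0": "hey"}},
]

grouped = accumulate_replies(records)
print(grouped)
```

If the original poster's username should win instead, one option is to skip overwriting fields that are already set in merged.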
Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/73143200
