For Twitter data, I have two different JSON files that were returned by my Python code. The first JSON file looks like this:
A.json
{"tweet_id": "1212024242595926028", "username": "THPDPIO", "created_at": "2019-12-31T14:54:32.000Z", "tweets": {"0": "Folks, it\u2019s simple!! You know what to do and what not to do...don\u2019t drink and drive and you don\u2019t have to worry about ANY consequences. Btw...jail is just a small part of it...think about the possibility killing someone or killing yourself...again it\u2019s simple... \ud83d\ude42 #stayhome"}}
{"tweet_id": "1212024242595926028", "username": "TheAliciaRanae", "created_at": "2019-12-31T15:11:51.000Z", "tweets": {"1": "@THPDPIO Stay home and drink and pass out and leave everyone else alone lol that\u2019s what I\u2019ll be doing lol HAPPY NEW YEAR!"}}
{"tweet_id": "1212024242595926028", "username": "duane4343", "created_at": "2019-12-31T15:21:37.000Z", "tweets": {"1": "@THPDPIO Happy New Year"}}
{"tweet_id": "1212024242595926028", "username": "HollyBr34731868", "created_at": "2019-12-31T15:24:25.000Z", "tweets": {"1": "@THPDPIO Hope everyone has a safe night."}}
{"tweet_id": "1211503874395254785", "username": "UNDPoliceDept", "created_at": "2019-12-30T04:26:46.000Z", "tweets": {"0": "Typical North Dakotan.... #BestCopsAround #NoTravelAdvised #StayHome"}}
{"tweet_id": "1211503874395254785", "username": "UNDPoliceDept", "created_at": "2019-12-30T04:27:40.000Z", "tweets": {"1": "@NDHighwayPatrol"}}
{"tweet_id": "1211503874395254785", "username": "BorgenEthan", "created_at": "2019-12-30T05:28:48.000Z", "tweets": {"1": "@UNDPoliceDept Nah i definitely look like the first one"}}

Using jq, I wrote a filter to select the fields I want (NB: this was done on https://jqplay.org). The filter was:
{tweet_id: .tweet_id, username: .username, reply: .tweets} | group_by(.tweet_id)
but the error I get is:
jq: error (at :1): Cannot index string with string "tweet_id"
The output I want for the first file looks like the sample below:
{
"tweet_id": "1212024242595926028",
"username": "THPDPIO",
"reply": {
"0": "Folks, it’s simple!! You know what to do and what not to do...don’t drink and drive and you don’t have to worry about ANY consequences. Btw...jail is just a small part of it...think about the possibility killing someone or killing yourself...again it’s simple... #stayhome",
"1": "@THPDPIO Stay home and drink and pass out and leave everyone else alone lol that’s what I’ll be doing lol HAPPY NEW YEAR!",
"1": "@THPDPIO Happy New Year",
"1": "@THPDPIO Hope everyone has a safe night."
}}
My question: how to link all replies to a specific tweet_id.
The second file:
B.json
{
"author_id": 80083199,
"tweet_id": 1212150612026151000,
"username": "CTVdavidspence",
"author_followers": 19572,
"author_tweets": 73406,
"author_description": "Retired broadcast Meteorologist. 2017 RTDNA Lifetime Achievement Award. Best of Calgary 2018, 2019. AHS Patient and Family Advisor (volunteer)",
"author_location": "Calgary",
"text": "The Trans Canada Highway near #Sicamous BC. #stayhome Image from @DriveBC.",
"created_at": 1577834201000,
"retweets": 12,
"replies": 4,
"likes": 27,
"quote_count": 0}
{
"author_id": 848959032370921500,
"tweet_id": 1212024242595926000,
"username": "THPDPIO",
"author_followers": 4626,
"author_tweets": 2383,
"author_description": "Police Sgt.",
"author_location": "Terre Haute, IN",
"text": "Folks, it’s simple!! You know what to do and what not to do...don’t drink and drive and you don’t have to worry about ANY consequences. Btw...jail is just a small part of it...think about the possibility killing someone or killing yourself...again it’s simple... \n#stayhome",
"created_at": 1577804072000,
"retweets": 11,
"replies": 9,
"likes": 84,
"quote_count": 1}

I want to merge this with my first JSON file, so that tweet_id is the reference and the replies and username become the keys that are merged in. The final JSON file would then look like this:
{
"author_id": 848959032370921500,
"tweet_id": 1212024242595926000,
"username": "THPDPIO",
"author_followers": 4626,
"author_tweets": 2383,
"author_description": "Police Sgt.",
"author_location": "Terre Haute, IN",
"text": "Folks, it’s simple!! You know what to do and what not to do...don’t drink and drive and you don’t have to worry about ANY consequences. Btw...jail is just a small part of it...think about the possibility killing someone or killing yourself...again it’s simple... \n#stayhome",
"created_at": 1577804072000,
"retweets": 11,
"replies": 9,
"likes": 84,
"quote_count": 1,
"reply": {
"0": "Folks, it’s simple!! You know what to do and what not to do...don’t drink and drive and you don’t have to worry about ANY consequences. Btw...jail is just a small part of it...think about the possibility killing someone or killing yourself...again it’s simple... #stayhome",
"1": "@THPDPIO Stay home and drink and pass out and leave everyone else alone lol that’s what I’ll be doing lol HAPPY NEW YEAR!",
"1": "@THPDPIO Happy New Year",
"1": "@THPDPIO Hope everyone has a safe night."
}}

I would appreciate any help with this. Thank you.
Answered 2022-07-27 20:58:25
As for the second question, you can use JOIN on B.json together with a custom index (e.g. the one from the first question) built from A.json:
jq --slurpfile A A.json --slurpfile B B.json -n '
  JOIN(
    $A | reduce group_by(.tweet_id)[] as $g (
      {}; .[$g[0].tweet_id].reply += ($g | map(.tweets))
    );
    $B[]; .tweet_id | @text; add
  )
'
{
"author_id": 80083199,
"tweet_id": 1212150612026151000,
"username": "CTVdavidspence",
"author_followers": 19572,
"author_tweets": 73406,
"author_description": "Retired broadcast Meteorologist. 2017 RTDNA Lifetime Achievement Award. Best of Calgary 2018, 2019. AHS Patient and Family Advisor (volunteer)",
"author_location": "Calgary",
"text": "The Trans Canada Highway near #Sicamous BC. #stayhome Image from @DriveBC.",
"created_at": 1577834201000,
"retweets": 12,
"replies": 4,
"likes": 27,
"quote_count": 0
}
{
"author_id": 848959032370921500,
"tweet_id": 1212024242595926000,
"username": "THPDPIO",
"author_followers": 4626,
"author_tweets": 2383,
"author_description": "Police Sgt.",
"author_location": "Terre Haute, IN",
"text": "Folks, it’s simple!! You know what to do and what not to do...don’t drink and drive and you don’t have to worry about ANY consequences. Btw...jail is just a small part of it...think about the possibility killing someone or killing yourself...again it’s simple... \n#stayhome",
"created_at": 1577804072000,
"retweets": 11,
"replies": 9,
"likes": 84,
"quote_count": 1,
"reply": [
{
"0": "Folks, it’s simple!! You know what to do and what not to do...don’t drink and drive and you don’t have to worry about ANY consequences. Btw...jail is just a small part of it...think about the possibility killing someone or killing yourself...again it’s simple... #stayhome"
},
{
"1": "@THPDPIO Stay home and drink and pass out and leave everyone else alone lol that’s what I’ll be doing lol HAPPY NEW YEAR!"
},
{
"1": "@THPDPIO Happy New Year"
},
{
"1": "@THPDPIO Hope everyone has a safe night."
}
]
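}

Answered 2022-07-27 20:28:22
This answer addresses the first question:
Link all replies to a specific tweet_id
As @pmf pointed out, your sample output would not achieve that goal. Also, group_by requires an array as its input. So consider: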
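< A.json jq -n '
  # Fold a stream of objects into one object, collecting each value of f into an array.
  def accumulate_by(stream; f):
    reduce stream as $x ({}; (f + [$x|f]) as $v | . + $x | f = $v );
  [inputs | {tweet_id, username, reply: .tweets}] | group_by(.tweet_id)
  | map( accumulate_by(.[]; .reply ))
'
Note that, by design, this ignores "conflicts" among the .username values (the last username in each group wins); you may want to give that further thought. In any case, with your sample, the result is: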
[
{
"tweet_id": "1211503874395254785",
"username": "BorgenEthan",
"reply": [
{
"0": "Typical North Dakotan.... #BestCopsAround #NoTravelAdvised #StayHome"
},
{
"1": "@NDHighwayPatrol"
},
{
"1": "@UNDPoliceDept Nah i definitely look like the first one"
}
]
},
{
"tweet_id": "1212024242595926028",
"username": "HollyBr34731868",
"reply": [
{
"0": "Folks, it’s simple!! You know what to do and what not to do...don’t drink and drive and you don’t have to worry about ANY consequences. Btw...jail is just a small part of it...think about the possibility killing someone or killing yourself...again it’s simple... #stayhome"
},
{
"1": "@THPDPIO Stay home and drink and pass out and leave everyone else alone lol that’s what I’ll be doing lol HAPPY NEW YEAR!"
},
{
"1": "@THPDPIO Happy New Year"
},
{
"1": "@THPDPIO Hope everyone has a safe night."
}
]
}
]

https://stackoverflow.com/questions/73143200
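Since the question says the files come from Python code, the same two steps (group the replies by tweet_id, then attach them to the matching record from the second file) can also be sketched in Python. This is only an illustration under stated assumptions, not part of either answer: the names A_LINES, B_RECORDS, group_replies and attach_replies are hypothetical, the sample records are shortened stand-ins, and tweet_id is kept as a string on both sides. That last point matters because in the samples above B.json shows the id as a number ending in ...000 while A.json has the string ending in ...028, so the two would not match character for character. The sketch also shows why the desired output with three "1" keys in one object cannot hold all the replies.

```python
import json
from collections import defaultdict

# Hypothetical stand-ins for A.json (one JSON object per line), texts shortened.
A_LINES = """\
{"tweet_id": "1212024242595926028", "username": "THPDPIO", "tweets": {"0": "Folks, it's simple!!"}}
{"tweet_id": "1212024242595926028", "username": "duane4343", "tweets": {"1": "@THPDPIO Happy New Year"}}
{"tweet_id": "1211503874395254785", "username": "UNDPoliceDept", "tweets": {"0": "Typical North Dakotan...."}}
"""

# Hypothetical stand-ins for B.json records. ASSUMPTION: tweet_id is a string here.
B_RECORDS = [
    {"tweet_id": "1212024242595926028", "username": "THPDPIO", "likes": 84},
    {"tweet_id": "1212150612026151000", "username": "CTVdavidspence", "likes": 27},
]


def group_replies(lines):
    """Map each tweet_id to the list of its 'tweets' objects, in input order."""
    grouped = defaultdict(list)
    for line in lines.splitlines():
        if line.strip():
            record = json.loads(line)
            grouped[record["tweet_id"]].append(record["tweets"])
    return dict(grouped)


def attach_replies(records, replies_by_id):
    """Return copies of the records, adding 'reply' where the tweet_id has replies."""
    merged = []
    for record in records:
        out = dict(record)
        key = str(record["tweet_id"])
        if key in replies_by_id:
            out["reply"] = replies_by_id[key]
        merged.append(out)
    return merged


# A JSON object with a repeated key keeps only one value when parsed
# (Python's json module keeps the last one), hence a list of replies instead.
DUPLICATE = json.loads('{"1": "a", "1": "b"}')

replies = group_replies(A_LINES)
result = attach_replies(B_RECORDS, replies)
print(json.dumps(result, indent=2))
```

Records without any replies are passed through unchanged, which mirrors the first object in the jq output above. If the ids really are stored as numbers in one file, they would have to be written out as strings at the source before any exact-match join can work.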