首页
学习
活动
专区
工具
TVP
发布
社区首页 >问答首页 >在python中解析json文件中的tweet文本

在python中解析json文件中的tweet文本
EN

Stack Overflow用户
提问于 2015-09-09 10:43:30
回答 2查看 5.7K关注 0票数 1

我正在使用Python2.7和来自Twitter的流API来获取json文件中与特定主题相关的tweet。我想解析json文件中每个tweet的文本属性。我的代码如下:

代码语言:javascript
复制
import json

tweets = []
for line in open('science.json'):
    try:
        tweets.append(json.loads(line))
    except:
        continue

print len(tweets)

for tweet in tweets:
    try:
        print tweet['text']
    except:
        pass

我的json文件示例如下,其中包括两条tweet:

代码语言:javascript
复制
{"created_at":"Wed Sep 02 11:20:18 +0000 2015","id":639035115457388544,"id_str":"639035115457388544","text":"Police investigate suspected hit-and-run on Oak Street http:\/\/t.co\/B9NnybzIB1 via @lvshepard","source":"\u003ca href=\"http:\/\/twitter.com\" rel=\"nofollow\"\u003eTwitter Web Client\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":628694263,"id_str":"628694263","name":"Beth LeBlanc","screen_name":"THBethLeBlanc","location":"","url":null,"description":"Lead investigative reporter for the Times Herald, covering St. Clair and Sanilac County. Retweets, follows, or friends don't equal endorsements.","protected":false,"verified":false,"followers_count":654,"friends_count":235,"listed_count":16,"favourites_count":255,"statuses_count":3966,"created_at":"Fri Jul 06 20:18:26 +0000 2012","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"C6E2EE","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme2\/bg.gif","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme2\/bg.gif","profile_background_tile":false,"profile_link_color":"1F98C7","profile_sidebar_border_color":"C6E2EE","profile_sidebar_fill_color":"DAECF4","profile_text_color":"663B12","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/2581088079\/s8mdnulnej4xi3prumpm_normal.jpeg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/2581088079\/s8mdnulnej4xi3prumpm_normal.jpeg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/628694263\/1401363471","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"trends":[],"urls":[{"url":"http:\/\/t.co\/B9NnybzIB1","expanded_url":"http:\/\/bwne.ws\/1EydNoJ","display_url":"bwne.ws\/1EydNoJ","indices":[55,77]}],"user_mentions":[{"screen_name":"LVShepard","name":"Liz Shepard","id":85887731,"id_str":"85887731","indices":[82,92]}],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1441192818880"}

/*
* 提示:该行代码过长,系统自动注释不进行高亮。一键复制会移除系统注释 
* {"created_at":"Wed Sep 02 11:20:21 +0000 2015","id":639035127427923968,"id_str":"639035127427923968","text":"RT @LindaSuhler: Still waiting for Obama to comment on #BlackLivesMatter\u2019s involvement in the deaths of police officers\n***crickets*** http\u2026","source":"\u003ca href=\"http:\/\/twitter.com\" rel=\"nofollow\"\u003eTwitter Web Client\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":3025146671,"id_str":"3025146671","name":"Douglas MacCrae","screen_name":"DougMac156","location":"Socialist Republic of New York","url":null,"description":"Retired, working toward a right to work nation.  Conservative, Christian, NRA member and veteran, US Army.","protected":false,"verified":false,"followers_count":245,"friends_count":305,"listed_count":4,"favourites_count":315,"statuses_count":1350,"created_at":"Sun Feb 08 16:12:22 +0000 2015","utc_offset":-14400,"time_zone":"Eastern Time (US & Canada)","geo_enabled":true,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"000000","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"89C9FA","profile_sidebar_border_color":"000000","profile_sidebar_fill_color":"000000","profile_text_color":"000000","profile_use_background_image":false,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/569517416997068801\/Xb0McW3C_normal.jpeg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/569517416997068801\/Xb0McW3C_normal.jpeg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/3025146671\/1426075331","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":{"created_at":"Wed Sep 02 11:03:50 +0000 2015","id":639030968423215104,"id_str":"639030968423215104","text":"Still waiting for Obama to comment on #BlackLivesMatter\u2019s involvement in the deaths of police officers\n***crickets*** http:\/\/t.co\/6cZ82w2Yz3","source":"\u003ca href=\"http:\/\/www.hootsuite.com\" rel=\"nofollow\"\u003eHootsuite\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":347627434,"id_str":"347627434","name":"Linda Suhler, Ph.D.","screen_name":"LindaSuhler","location":"Scottsdale, Arizona","url":null,"description":"Unapologetic Constitutional Conservative #1A #2A #NRA #StandWithIsrael #TCOT #CCOT #LNYHBT #ProLife #SupportMilitary #BlueLivesMatter #GodBlessAmerica","protected":false,"verified":false,"followers_count":146662,"friends_count":111351,"listed_count":1345,"favourites_count":12296,"statuses_count":111887,"created_at":"Wed Aug 03 03:00:30 +0000 2011","utc_offset":-25200,"time_zone":"Arizona","geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"0040FF","profile_background_image_url":"http:\/\/pbs.twimg.com\/profile_background_images\/435620049223573504\/FnzC5S_5.jpeg","profile_background_image_url_https":"https:\/\/pbs.twimg.com\/profile_background_images\/435620049223573504\/FnzC5S_5.jpeg","profile_background_tile":true,"profile_link_color":"FF0F0F","profile_sidebar_border_color":"FFFFFF","profile_sidebar_fill_color":"A0C5C7","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/460575901046943744\/8mLlMfiL_normal.jpeg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/460575901046943744\/8mLlMfiL_normal.jpeg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/347627434\/1420230815","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":34,"favorite_count":25,"entities":{"hashtags":[{"text":"BlackLivesMatter","indices":[38,55]}],"trends":[],"urls":[],"user_mentions":[],"symbols":[],"media":[{"id":638810803416625152,"id_str":"638810803416625152","indices":[118,140],"media_url":"http:\/\/pbs.twimg.com\/media\/CN2DCohUcAA4Yf1.jpg","media_url_https":"https:\/\/pbs.twimg.com\/media\/CN2DCohUcAA4Yf1.jpg","url":"http:\/\/t.co\/6cZ82w2Yz3","display_url":"pic.twitter.com\/6cZ82w2Yz3","expanded_url":"http:\/\/twitter.com\/LindaSuhler\/status\/638810803823472640\/photo\/1","type":"photo","sizes":{"small":{"w":340,"h":267,"resize":"fit"},"medium":{"w":500,"h":394,"resize":"fit"},"large":{"w":500,"h":394,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"}},"source_status_id":638810803823472640,"source_status_id_str":"638810803823472640"}]},"extended_entities":{"media":[{"id":638810803416625152,"id_str":"638810803416625152","indices":[118,140],"media_url":"http:\/\/pbs.twimg.com\/media\/CN2DCohUcAA4Yf1.jpg","media_url_https":"https:\/\/pbs.twimg.com\/media\/CN2DCohUcAA4Yf1.jpg","url":"http:\/\/t.co\/6cZ82w2Yz3","display_url":"pic.twitter.com\/6cZ82w2Yz3","expanded_url":"http:\/\/twitter.com\/LindaSuhler\/status\/638810803823472640\/photo\/1","type":"photo","sizes":{"small":{"w":340,"h":267,"resize":"fit"},"medium":{"w":500,"h":394,"resize":"fit"},"large":{"w":500,"h":394,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"}},"source_status_id":638810803823472640,"source_status_id_str":"638810803823472640"}]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en"},"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"BlackLivesMatter","indices":[55,72]}],"trends":[],"urls":[],"user_mentions":[{"screen_name":"LindaSuhler","name":"Linda Suhler, Ph.D.","id":347627434,"id_str":"347627434","indices":[3,15]}],"symbols":[],"media":[{"id":638810803416625152,"id_str":"638810803416625152","indices":[139,140],"media_url":"http:\/\/pbs.twimg.com\/media\/CN2DCohUcAA4Yf1.jpg","media_url_https":"https:\/\/pbs.twimg.com\/media\/CN2DCohUcAA4Yf1.jpg","url":"http:\/\/t.co\/6cZ82w2Yz3","display_url":"pic.twitter.com\/6cZ82w2Yz3","expanded_url":"http:\/\/twitter.com\/LindaSuhler\/status\/638810803823472640\/photo\/1","type":"photo","sizes":{"small":{"w":340,"h":267,"resize":"fit"},"medium":{"w":500,"h":394,"resize":"fit"},"large":{"w":500,"h":394,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"}},"source_status_id":638810803823472640,"source_status_id_str":"638810803823472640"}]},"extended_entities":{"media":[{"id":638810803416625152,"id_str":"638810803416625152","indices":[139,140],"media_url":"http:\/\/pbs.twimg.com\/media\/CN2DCohUcAA4Yf1.jpg","media_url_https":"https:\/\/pbs.twimg.com\/media\/CN2DCohUcAA4Yf1.jpg","url":"http:\/\/t.co\/6cZ82w2Yz3","display_url":"pic.twitter.com\/6cZ82w2Yz3","expanded_url":"http:\/\/twitter.com\/LindaSuhler\/status\/638810803823472640\/photo\/1","type":"photo","sizes":{"small":{"w":340,"h":267,"resize":"fit"},"medium":{"w":500,"h":394,"resize":"fit"},"large":{"w":500,"h":394,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"}},"source_status_id":638810803823472640,"source_status_id_str":"638810803823472640"}]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1441192821734"}
*/

运行此脚本后,我得到的tweet数量为895条,但它只打印了大约30条tweet并停止。有什么理由做朋友吗?

EN

回答 2

Stack Overflow用户

发布于 2015-09-09 15:47:58

看起来您是从一个具有自己的格式的json文件中逐行读取的。

不是逐行阅读,而是尝试:

代码语言:javascript
复制
with open('science.json') as tweet_data:
    json_data = json.load(tweet_data)

这样,您可以一次读取所有数据,并可以使用json模块将其作为类似文件的对象加载。

如果这不起作用,那么您可能会在该文件中如何格式化您的json时遇到问题。

此外,不需要在with语句中使用tryexcept语句。with语句已经在使用它了!:)

票数 0
EN

Stack Overflow用户

发布于 2017-10-16 05:14:21

首先加载数据

代码语言:javascript
复制
all_data = json.loads(data)

然后获取tweet

代码语言:javascript
复制
tweet = all_data["text"]
票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/32470132

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档