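I'm trying to scrape the 9gag comment section to do some sentiment analysis and label posts as positive or negative. The end goal is to train on data from thousands of posts and predict a post's sentiment from its comment count, post count, top ten comments, and title.
I managed to scrape the titles and the fresh and hot sections, but when it comes to scraping the comments, the HTML parser does not show the relevant tags. I tried different libraries such as BS4, requests, pattern, and urllib/urllib2, and even tried 'html.parser' instead of lxml.
My question: does the 9gag comment section restrict scraping? If not, is there any reason a parser would fail to get all the tags?
Update #2: this is the code I am using: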
import requests
from bs4 import BeautifulSoup

url = "http://9gag.com/gag/a1Mzz1D"  # a plain string is enough for requests.get
req = requests.get(url)
soup = BeautifulSoup(req.text, 'html.parser')
print(soup.findAll("div", attrs={"class": "comment-embed"}))
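The output is an empty list.
Answered 2016-06-06 21:51:50
The comments are loaded with React, so they are not in the static HTML. You can still do a little parsing and get all the data you need as JSON: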
import ast
import re

import requests
from bs4 import BeautifulSoup
from urlparse import urljoin  # Python 2; on Python 3 use: from urllib.parse import urljoin

base = "http://9gag.com/"
# These are the params needed to get the JSON.
params = {"appId": "",
          "url": "",
          "count": "10",
          "level": "2",
          "order": "score",
          "mentionMapping": "true",
          "origin": "9gag.com"}
# Endpoint the page requests its comments from.
js = "http://comment-cdn.9gag.com/v1/cacheable/comment-list.json"

with requests.session() as s:
    r = s.get(base)
    soup = BeautifulSoup(r.content, "lxml")
    # Links to each actual post page.
    links = [urljoin(base, a["href"]) for a in soup.select("a.badge-evt.point")]
    for link in links:
        cont = s.get(link).content
        soup = BeautifulSoup(cont, "lxml")
        # The params are all in the script body.
        script = soup.find("script", text=re.compile('appId')).text
        # Convert to a dict so we can pull what we need by key.
        data = ast.literal_eval(script[script.find("{"):script.rfind("}") + 1])
        params["appId"] = data["appId"]
        params["url"] = data["url"]
        page_json = s.get(js, params=params).json()
        for dct in page_json["payload"]["comments"]:
            print(dct)
If we run that code using only the first URL returned, we get:
In [28]: with requests.session() as s:
   ....:     r = s.get(base)
   ....:     soup = BeautifulSoup(r.content,"lxml")
   ....:     links = [urljoin(base, a["href"]) for a in soup.select("a.comment.badge-evt")][:1]
   ....:     for link in links:
   ....:         cont = s.get(link).content
   ....:         soup = BeautifulSoup(cont,"lxml")
   ....:         script = soup.find("script", text=re.compile('appId')).text
   ....:         data = ast.literal_eval(script[script.find("{"):script.rfind("}") + 1])
   ....:         params["appId"] = data["appId"]
   ....:         params["url"] = data["url"]
   ....:         page_json = s.get(js, params=params).json()
   ....:         for dct in page_json["payload"]["comments"]:
   ....:             print(dct)
   ....:
{u'hasNext': True, u'dislikeCount': 0, u'text': u'This is so awkward to watch ... and funny', u'userId': u'u_13759018032623', u'likeCount': 343, u'orderKey': u'score_00000000004834_14651297124662', u'children': [{u'hasNext': False, u'dislikeCount': 0, u'text': u'@twistedpickle.and also fake.', u'userId': u'u_145548331532421082', u'likeCount': 26, u'children': [], u'isCollapsed': 0, u'mediaText': u'@twistedpickle.and also fake.', u'section': u'', u'mentionMapping': {u'@twistedpickle': u'aBL7q1'}, u'commentId': u'c_146513113612585611', u'type': u'text', u'status': 0, u'parent': u'c_146512971246623391', u'timestamp': 1465131136, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'savage_ali', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/34323189_100_45.jpg', u'timestamp': u'1455483315', u'userId': u'u_145548331532421082', u'hashedAccountId': u'anbN66n', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/savage_ali'}, u'accountId': u'34323189', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146513113612585611', u'level': 2, u'suppData': {}, u'richtext': u'@twistedpickle.and also fake.', u'childrenTotal': 0, u'isAnonymous': 0}], u'isCollapsed': 0, u'mediaText': u'This is so awkward to watch ... 
and funny', u'section': u'', u'mentionMapping': {u'dummy': u''}, u'commentId': u'c_146512971246623391', u'type': u'text', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1465129712, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'twistedpickle', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/1870095_100_1.jpg', u'timestamp': u'1375901803', u'userId': u'u_13759018032623', u'hashedAccountId': u'aBL7q1', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/twistedpickle'}, u'accountId': u'1870095', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146512971246623391', u'level': 1, u'suppData': {}, u'richtext': u'This is so awkward to watch ... and funny', u'childrenTotal': 19, u'isAnonymous': 0}
{u'hasNext': True, u'dislikeCount': 0, u'text': u'Hahaha PANTURA', u'userId': u'u_143454521023534763', u'likeCount': 231, u'orderKey': u'score_00000000004076_14649387351969', u'children': [{u'hasNext': False, u'dislikeCount': 0, u'text': u'@deadfight nussittuna nukut paremmin', u'userId': u'u_141790386790069041', u'likeCount': 39, u'children': [], u'isCollapsed': 0, u'mediaText': u'@deadfight nussittuna nukut paremmin', u'section': u'', u'mentionMapping': {u'@deadfight': u'aYLgpy7'}, u'commentId': u'c_146513018381635287', u'type': u'text', u'status': 0, u'parent': u'c_146493873519691145', u'timestamp': 1465130183, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'lady_kappa', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/22251683_100_38.jpg', u'timestamp': u'1417903867', u'userId': u'u_141790386790069041', u'hashedAccountId': u'a5K8b5N', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/lady_kappa'}, u'accountId': u'22251683', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146513018381635287', u'level': 2, u'suppData': {}, u'richtext': u'@deadfight nussittuna nukut paremmin', u'childrenTotal': 0, u'isAnonymous': 0}], u'isCollapsed': 0, u'mediaText': u'Hahaha PANTURA', u'section': u'', u'mentionMapping': {u'dummy': u''}, u'commentId': u'c_146493873519691145', u'type': u'text', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1464938735, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'deadfight', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/27180133_100_2.jpg', u'timestamp': u'1434545210', u'userId': u'u_143454521023534763', u'hashedAccountId': u'aYLgpy7', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/deadfight'}, u'accountId': u'27180133', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': 
u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146493873519691145', u'level': 1, u'suppData': {}, u'richtext': u'Hahaha PANTURA', u'childrenTotal': 16, u'isAnonymous': 0}
{u'hasNext': True, u'dislikeCount': 0, u'text': u'http://i.memeful.com/media/post/oMJ28xM_700wa_0.gif', u'userId': u'u_141680114571912397', u'likeCount': 225, u'orderKey': u'score_00000000003373_14649381081078', u'children': [{u'hasNext': False, u'dislikeCount': 0, u'text': u'@shogun_ka_yo up you go', u'userId': u'u_144283683005248817', u'likeCount': 2, u'children': [], u'isCollapsed': 0, u'mediaText': u'@shogun_ka_yo up you go', u'section': u'', u'mentionMapping': {u'@shogun_ka_yo': u'aMQRLRW'}, u'commentId': u'c_146513150738658348', u'type': u'text', u'status': 0, u'parent': u'c_146493810810784782', u'timestamp': 1465131507, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'dergermanyball', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/29998985_100_29.jpg', u'timestamp': u'', u'userId': u'u_144283683005248817', u'hashedAccountId': u'a1dpXrY', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/dergermanyball'}, u'accountId': u'29998985', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146513150738658348', u'level': 2, u'suppData': {}, u'richtext': u'@shogun_ka_yo up you go', u'childrenTotal': 0, u'isAnonymous': 0}], u'isCollapsed': 0, u'mediaText': u'http://i.memeful.com/media/post/oMJ28xM_700wa_0.gif', u'section': u'', u'mentionMapping': {u'dummy': u''}, u'commentId': u'c_146493810810784782', u'type': u'media', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1464938108, u'embedMediaMeta': {u'embedImage': {u'type': u'ANIMATED', u'image': {u'url': u'http://img-comment-fun.9cache.com/media/287e9c03142644331422775855_700w_0.jpg', u'width': 400, u'height': 206}, u'animated': {u'url': u'http://img-comment-fun.9cache.com/media/287e9c03142644331422775855_700wa_0.gif', u'width': 400, u'height': 206}, u'video': {u'url': u'http://img-comment-fun.9cache.com/media/287e9c03142644331422775855_700wv_0.mp4', u'width': 400, u'height': 
206}}}, u'user': {u'displayName': u'shogun_ka_yo', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/22391718_100_2.jpg', u'timestamp': u'1416801145', u'userId': u'u_141680114571912397', u'hashedAccountId': u'aMQRLRW', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/shogun_ka_yo'}, u'accountId': u'22391718', u'permissions': []}, u'isUrl': 1, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146493810810784782', u'level': 1, u'suppData': {}, u'richtext': u'[url]http://i.memeful.com/media/post/oMJ28xM_700wa_0.gif[/url]', u'childrenTotal': 4, u'isAnonymous': 0}
{u'hasNext': True, u'dislikeCount': 0, u'text': u'Now imagine if the genders were reversed', u'userId': u'u_143552720523387146', u'likeCount': 179, u'orderKey': u'score_00000000003144_14651301155438', u'children': [{u'hasNext': False, u'dislikeCount': 0, u'text': u'@rednotash hush little one. You're making sense now', u'userId': u'u_141363015125977644', u'likeCount': 77, u'children': [], u'isCollapsed': 0, u'mediaText': u'@rednotash hush little one. You're making sense now', u'section': u'', u'mentionMapping': {u'@rednotash': u'aOv8RMy'}, u'commentId': u'c_146513114535963914', u'type': u'text', u'status': 0, u'parent': u'c_146513011554386056', u'timestamp': 1465131145, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'srslydude', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/default-avatar/1_59_100_v0.jpg', u'timestamp': u'1413630151', u'userId': u'u_141363015125977644', u'hashedAccountId': u'aYwvpZx', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/srslydude'}, u'accountId': u'21558777', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146513114535963914', u'level': 2, u'suppData': {}, u'richtext': u'@rednotash hush little one. 
You're making sense now', u'childrenTotal': 0, u'isAnonymous': 0}], u'isCollapsed': 0, u'mediaText': u'Now imagine if the genders were reversed', u'section': u'', u'mentionMapping': {u'dummy': u''}, u'commentId': u'c_146513011554386056', u'type': u'text', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1465130115, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'rednotash', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/27823975_100_5.jpg', u'timestamp': u'1435527205', u'userId': u'u_143552720523387146', u'hashedAccountId': u'aOv8RMy', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/rednotash'}, u'accountId': u'27823975', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146513011554386056', u'level': 1, u'suppData': {}, u'richtext': u'Now imagine if the genders were reversed', u'childrenTotal': 9, u'isAnonymous': 0}
{u'hasNext': True, u'dislikeCount': 0, u'text': u'Never let your waif follow you? Well she wouldnt follow you if you werent a dickhead. Women have the sixth sense . We know whats going on.', u'userId': u'u_145321627176216569', u'likeCount': 78, u'orderKey': u'score_00000000002462_14651303108023', u'children': [{u'hasNext': False, u'dislikeCount': 0, u'text': u'@marshmallowww What if I tell you that gender has nothing to do with it? Men have that "sixth sense" too.', u'userId': u'u_143741207696358239', u'likeCount': 56, u'children': [], u'isCollapsed': 0, u'mediaText': u'@marshmallowww What if I tell you that gender has nothing to do with it? Men have that "sixth sense" too.', u'section': u'', u'mentionMapping': {u'@marshmallowww': u'ab693MB'}, u'commentId': u'c_146513102333226094', u'type': u'text', u'status': 0, u'parent': u'c_146513031080236628', u'timestamp': 1465131023, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'the_hidden', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/28267060_100_15.jpg', u'timestamp': u'1437412076', u'userId': u'u_143741207696358239', u'hashedAccountId': u'aop4wG2', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/the_hidden'}, u'accountId': u'28267060', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146513102333226094', u'level': 2, u'suppData': {}, u'richtext': u'@marshmallowww What if I tell you that gender has nothing to do with it? Men have that "sixth sense" too.', u'childrenTotal': 0, u'isAnonymous': 0}], u'isCollapsed': 0, u'mediaText': u'Never let your waif follow you? Well she wouldnt follow you if you werent a dickhead. Women have the sixth sense . 
We know whats going on.', u'section': u'', u'mentionMapping': {u'dummy': u''}, u'commentId': u'c_146513031080236628', u'type': u'text', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1465130310, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'marshmallowww', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/33477821_100_134.jpg', u'timestamp': u'1453216271', u'userId': u'u_145321627176216569', u'hashedAccountId': u'ab693MB', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/marshmallowww'}, u'accountId': u'33477821', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146513031080236628', u'level': 1, u'suppData': {}, u'richtext': u'Never let your waif follow you? Well she wouldnt follow you if you werent a dickhead. Women have the sixth sense . We know whats going on.', u'childrenTotal': 20, u'isAnonymous': 0}
{u'hasNext': True, u'dislikeCount': 0, u'text': u'But is correct that she can hit him? i mean, "no violence" right? if SHE is drunk and doing stupid things, and the husband go and hit her, is correct too? because equality.', u'userId': u'u_143329792027606743', u'likeCount': 54, u'orderKey': u'score_00000000001796_14651298735006', u'children': [{u'hasNext': False, u'dislikeCount': 0, u'text': u'@pcmasteracer yes it's correct', u'userId': u'u_143073218849877360', u'likeCount': 9, u'children': [], u'isCollapsed': 0, u'mediaText': u'@pcmasteracer yes it's correct', u'section': u'', u'mentionMapping': {u'@pcmasteracer': u'avnOvdq'}, u'commentId': u'c_146513013516459530', u'type': u'text', u'status': 0, u'parent': u'c_146512987350064451', u'timestamp': 1465130135, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'kkakuka97', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/26450856_100_3.jpg', u'timestamp': u'1430732188', u'userId': u'u_143073218849877360', u'hashedAccountId': u'a4j4NWy', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/kkakuka97'}, u'accountId': u'26450856', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146513013516459530', u'level': 2, u'suppData': {}, u'richtext': u'@pcmasteracer yes it's correct', u'childrenTotal': 0, u'isAnonymous': 0}], u'isCollapsed': 0, u'mediaText': u'But is correct that she can hit him? i mean, "no violence" right? if SHE is drunk and doing stupid things, and the husband go and hit her, is correct too? 
because equality.', u'section': u'', u'mentionMapping': {u'dummy': u''}, u'commentId': u'c_146512987350064451', u'type': u'text', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1465129873, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'pcmasteracer', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/default-avatar/1_62_100_v0.jpg', u'timestamp': u'1433297920', u'userId': u'u_143329792027606743', u'hashedAccountId': u'avnOvdq', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/pcmasteracer'}, u'accountId': u'27225255', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146512987350064451', u'level': 1, u'suppData': {}, u'richtext': u'But is correct that she can hit him? i mean, "no violence" right? if SHE is drunk and doing stupid things, and the husband go and hit her, is correct too? because equality.', u'childrenTotal': 7, u'isAnonymous': 0}
{u'hasNext': False, u'dislikeCount': 0, u'text': u'I can hear the 'BONG!'', u'userId': u'u_13987497367750', u'likeCount': 30, u'orderKey': u'score_00000000001168_14650124142865', u'children': [{u'hasNext': False, u'dislikeCount': 0, u'text': u'@yajirobe__ but not boing', u'userId': u'u_13775281935884', u'likeCount': 4, u'children': [], u'isCollapsed': 0, u'mediaText': u'@yajirobe__ but not boing', u'section': u'', u'mentionMapping': {u'@yajirobe__': u'avgE1Y5'}, u'commentId': u'c_146513060674619430', u'type': u'text', u'status': 0, u'parent': u'c_146501241428653553', u'timestamp': 1465130606, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'siophang', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/11455251_100_2.jpg', u'timestamp': u'1377528193', u'userId': u'u_13775281935884', u'hashedAccountId': u'aBQK6qO', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/siophang'}, u'accountId': u'11455251', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146513060674619430', u'level': 2, u'suppData': {}, u'richtext': u'@yajirobe__ but not boing', u'childrenTotal': 0, u'isAnonymous': 0}], u'isCollapsed': 0, u'mediaText': u'I can hear the 'BONG!'', u'section': u'', u'mentionMapping': {u'dummy': u''}, u'commentId': u'c_146501241428653553', u'type': u'text', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1465012414, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'yajirobe__', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/16992199_100_5.jpg', u'timestamp': u'1398749736', u'userId': u'u_13987497367750', u'hashedAccountId': u'avgE1Y5', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/yajirobe__'}, u'accountId': u'16992199', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146501241428653553', u'level': 1, 
u'suppData': {}, u'richtext': u'I can hear the 'BONG!'', u'childrenTotal': 1, u'isAnonymous': 0}
{u'hasNext': False, u'dislikeCount': 0, u'text': u'http://i.memeful.com/media/post/PRoPBdo_700wa_0.gif', u'userId': u'u_13907047642371', u'likeCount': 21, u'orderKey': u'score_00000000000967_14649476233018', u'children': [{u'hasNext': False, u'dislikeCount': 0, u'text': u'@kaylaruffalo mfw', u'userId': u'u_13907047642371', u'likeCount': 0, u'children': [], u'isCollapsed': 0, u'mediaText': u'@kaylaruffalo mfw', u'section': u'', u'mentionMapping': {u'@kaylaruffalo': u'adYKGQj'}, u'commentId': u'c_146494763324897147', u'type': u'text', u'status': 0, u'parent': u'c_146494762330186947', u'timestamp': 1464947633, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'kaylaruffalo', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/16005886_100_9.jpg', u'timestamp': u'1390704764', u'userId': u'u_13907047642371', u'hashedAccountId': u'adYKGQj', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/kaylaruffalo'}, u'accountId': u'16005886', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146494763324897147', u'level': 2, u'suppData': {}, u'richtext': u'@kaylaruffalo mfw', u'childrenTotal': 0, u'isAnonymous': 0}], u'isCollapsed': 0, u'mediaText': u'http://i.memeful.com/media/post/PRoPBdo_700wa_0.gif', u'section': u'', u'mentionMapping': {u'dummy': u''}, u'commentId': u'c_146494762330186947', u'type': u'media', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1464947623, u'embedMediaMeta': {u'embedImage': {u'type': u'ANIMATED', u'image': {u'url': u'http://img-comment-fun.9cache.com/media/872be169144077120242844098_700w_0.jpg', u'width': 500, u'height': 400}, u'animated': {u'url': u'http://img-comment-fun.9cache.com/media/872be169144077120242844098_700wa_0.gif', u'width': 500, u'height': 400}, u'video': {u'url': u'http://img-comment-fun.9cache.com/media/872be169144077120242844098_700wv_0.mp4', u'width': 500, u'height': 400}}}, u'user': 
{u'displayName': u'kaylaruffalo', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/16005886_100_9.jpg', u'timestamp': u'1390704764', u'userId': u'u_13907047642371', u'hashedAccountId': u'adYKGQj', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/kaylaruffalo'}, u'accountId': u'16005886', u'permissions': []}, u'isUrl': 1, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146494762330186947', u'level': 1, u'suppData': {}, u'richtext': u'[url]http://i.memeful.com/media/post/PRoPBdo_700wa_0.gif[/url]', u'childrenTotal': 1, u'isAnonymous': 0}
{u'hasNext': False, u'dislikeCount': 0, u'text': u'Look at the dude in the red shirt run XD', u'userId': u'u_144176454299618603', u'likeCount': 15, u'orderKey': u'score_00000000000806_14651298710300', u'children': [{u'hasNext': False, u'dislikeCount': 0, u'text': u'@crazybrownguy he knew he was next', u'userId': u'u_13976607580627', u'likeCount': 1, u'children': [], u'isCollapsed': 0, u'mediaText': u'@crazybrownguy he knew he was next', u'section': u'', u'mentionMapping': {u'@crazybrownguy': u'agGWL5q'}, u'commentId': u'c_146514413390208345', u'type': u'text', u'status': 0, u'parent': u'c_146512987103009031', u'timestamp': 1465144133, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'lightfoot2012', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/17248879_100_6.jpg', u'timestamp': u'1397660758', u'userId': u'u_13976607580627', u'hashedAccountId': u'axZPvbp', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/lightfoot2012'}, u'accountId': u'17248879', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146514413390208345', u'level': 2, u'suppData': {}, u'richtext': u'@crazybrownguy he knew he was next', u'childrenTotal': 0, u'isAnonymous': 0}], u'isCollapsed': 0, u'mediaText': u'Look at the dude in the red shirt run XD', u'section': u'', u'mentionMapping': {u'dummy': u''}, u'commentId': u'c_146512987103009031', u'type': u'text', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1465129871, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'crazybrownguy', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/29662036_100_10.jpg', u'timestamp': u'1441764542', u'userId': u'u_144176454299618603', u'hashedAccountId': u'agGWL5q', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/crazybrownguy'}, u'accountId': u'29662036', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, 
u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146512987103009031', u'level': 1, u'suppData': {}, u'richtext': u'Look at the dude in the red shirt run XD', u'childrenTotal': 1, u'isAnonymous': 0}
{u'hasNext': True, u'dislikeCount': 0, u'text': u'http://i.memeful.com/media/post/kRp6z2w_700wa_0.gif', u'userId': u'u_144337172763285563', u'likeCount': 5, u'orderKey': u'score_00000000000626_14651301539010', u'children': [{u'hasNext': False, u'dislikeCount': 0, u'text': u'@wat_ya_doin I agree with that wife', u'userId': u'u_144337172763285563', u'likeCount': 3, u'children': [], u'isCollapsed': 0, u'mediaText': u'@wat_ya_doin I agree with that wife', u'section': u'', u'mentionMapping': {u'@wat_ya_doin': u'ay8yRoM'}, u'commentId': u'c_146513018506335085', u'type': u'text', u'status': 0, u'parent': u'c_146513015390105680', u'timestamp': 1465130185, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'wat_ya_doin', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/29948571_100_6.jpg', u'timestamp': u'', u'userId': u'u_144337172763285563', u'hashedAccountId': u'ay8yRoM', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/wat_ya_doin'}, u'accountId': u'29948571', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146513018506335085', u'level': 2, u'suppData': {}, u'richtext': u'@wat_ya_doin I agree with that wife', u'childrenTotal': 0, u'isAnonymous': 0}], u'isCollapsed': 0, u'mediaText': u'http://i.memeful.com/media/post/kRp6z2w_700wa_0.gif', u'section': u'', u'mentionMapping': {u'dummy': u''}, u'commentId': u'c_146513015390105680', u'type': u'media', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1465130153, u'embedMediaMeta': {u'embedImage': {u'type': u'ANIMATED', u'image': {u'url': u'http://img-comment-fun.9cache.com/media/be90178a145186181304494323_700w_0.jpg', u'width': 319, u'height': 260}, u'animated': {u'url': u'http://img-comment-fun.9cache.com/media/be90178a145186181304494323_700wa_0.gif', u'width': 319, u'height': 260}, u'video': {u'url': u'http://img-comment-fun.9cache.com/media/be90178a145186181304494323_700wv_0.mp4', 
u'width': 318, u'height': 260}}}, u'user': {u'displayName': u'wat_ya_doin', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/29948571_100_6.jpg', u'timestamp': u'', u'userId': u'u_144337172763285563', u'hashedAccountId': u'ay8yRoM', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/wat_ya_doin'}, u'accountId': u'29948571', u'permissions': []}, u'isUrl': 1, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146513015390105680', u'level': 1, u'suppData': {}, u'richtext': u'[url]http://i.memeful.com/media/post/kRp6z2w_700wa_0.gif[/url]', u'childrenTotal': 3, u'isAnonymous': 0}
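The script-parsing step used above can be looked at in isolation. The following is a minimal sketch, not the site's actual markup: the `initComments(...)` wrapper and the `a_example` value are hypothetical stand-ins for whatever the page's script really contains, and `extract_params` is a helper name introduced here for illustration. It slices from the first `{` to the last `}` and evaluates the result as a Python literal.

```python
import ast


def extract_params(script_text):
    """Return the {...} object found in script_text as a dict.

    Slices from the first "{" to the last "}" and evaluates the slice
    as a Python literal, the same approach as the code above.
    """
    start = script_text.find("{")
    end = script_text.rfind("}") + 1  # rfind gives -1 when missing, so end == 0
    if start == -1 or end == 0:
        raise ValueError("no object literal found in script text")
    return ast.literal_eval(script_text[start:end])


# Hypothetical script body; the real page's script may look different.
sample_script = """
    initComments({
        "appId": "a_example",
        "url": "http://9gag.com/gag/a4YM4n1"
    });
"""

data = extract_params(sample_script)
print(data["appId"])
print(data["url"])
```

Note that `ast.literal_eval` only accepts Python literals, so an object containing JavaScript `true`, `false`, or `null` would raise an error; in that case `json.loads` on the same slice may be the better fit.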
For example, we can pull the text from each dct and then iterate over dct["children"] to get the replies:
In [30]: params = {"appId": "",
   ....:           "url": "",
   ....:           "count": "2",
   ....:           "level": "2",
   ....:           "order": "score",
   ....:           "mentionMapping": "true",
   ....:           "origin": "9gag.com"}
In [31]: js = "http://comment-cdn.9gag.com/v1/cacheable/comment-list.json"
In [32]: with requests.session() as s:
   ....:     r = s.get(base)
   ....:     soup = BeautifulSoup(r.content,"lxml")
   ....:     links = [urljoin(base, a["href"]) for a in soup.select("a.badge-evt.point")][:1]
   ....:     for link in links:
   ....:         cont = s.get(link).content
   ....:         soup = BeautifulSoup(cont,"lxml")
   ....:         script = soup.find("script", text=re.compile('appId')).text
   ....:         data = ast.literal_eval(script[script.find("{"):script.rfind("}") + 1])
   ....:         params["appId"] = data["appId"]
   ....:         params["url"] = data["url"]
   ....:         page_json = s.get(js, params=params).json()
   ....:         for dct in page_json["payload"]["comments"]:
   ....:             print(dct["text"])
   ....:             for child in dct["children"]:
   ....:                 print(child["text"])
   ....:
Once again this is a post made by someone who has no idea what true love is. True love is jealous, painful, and difficult. It's a battle it always will be. You're either fighting yourself to be a better person, fighting life to give the other person the life they deserve or fighting the other person. But true love is worth all of it, its also beautiful, kind, gentle and warm. No relationship is perfect. There is not "8 ways to know". The one for you is the one who will put up with your shit but at the same time make you want to make yourself a better person. Your true love will get on your nerves, piss you off, hurt you, but they will also love you, hold you up when you can't and forgive you. True love is when you find someone you can stand beside through anything, someone who would never want to hurt you When you find someone you can trust no matter what. No one is perfect and there is more than one person in the world you can fall in love with, but when you find that person, you fi
@celticdraconian this Is so true
Comment complaining that this will lead straight to the "friendzone"
Comment saying the "Friendzone" is not a thing.
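Since replies are nested under `children`, a small recursive generator is one way to flatten the tree into rows for later sentiment labelling. This is a sketch only: `iter_comments` is a name introduced here, and the sample is trimmed from the first comment in the earlier output, keeping just the `text`, `likeCount`, and `children` keys.

```python
def iter_comments(comments, level=1):
    """Yield (level, text, likeCount) for each comment and its replies, depth-first."""
    for comment in comments:
        yield level, comment["text"], comment.get("likeCount", 0)
        # Replies live under "children"; recurse one level deeper.
        for row in iter_comments(comment.get("children", []), level + 1):
            yield row


# Trimmed from the first comment in the output shown earlier.
sample = [
    {"text": "This is so awkward to watch ... and funny",
     "likeCount": 343,
     "children": [
         {"text": "@twistedpickle.and also fake.",
          "likeCount": 26,
          "children": []},
     ]},
]

rows = list(iter_comments(sample))
for level, text, likes in rows:
    print("%d %4d %s" % (level, likes, text))
```

In the real loop you would pass `page_json["payload"]["comments"]` instead of `sample`.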
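You can see that I changed the count param to 2. To get all of the data, you may be able to set it to a very high number, e.g. "count": "1000"; that corresponds to what you get by repeatedly loading more comments on the page.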
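To see what such a request looks like, the query string can be built with the standard library alone. This is a sketch under assumptions: `build_comment_url` is a helper name introduced here, `a_example` is a placeholder appId, and whether the endpoint still exists or honours a large `count` is not something this snippet checks.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

# Endpoint taken from the code above.
ENDPOINT = "http://comment-cdn.9gag.com/v1/cacheable/comment-list.json"


def build_comment_url(app_id, post_url, count=10):
    """Return the comment-list URL for one post with the given count."""
    params = [
        ("appId", app_id),
        ("url", post_url),
        ("count", str(count)),
        ("level", "2"),
        ("order", "score"),
        ("mentionMapping", "true"),
        ("origin", "9gag.com"),
    ]
    return ENDPOINT + "?" + urlencode(params)


# "a_example" is a placeholder; the real appId comes from the page's script.
full = build_comment_url("a_example", "http://9gag.com/gag/a4YM4n1", count=1000)
print(full)
```

Passing the same dictionary through `params=` in `requests`, as the code above does, builds an equivalent query string for you.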
https://stackoverflow.com/questions/37666562