
urllib2 and requests don't find all HTML tags from 9gag.com

Stack Overflow user
Asked 2016-06-06 20:43:00
Answers: 2  Views: 444  Followers: 0  Score: 3

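I'm trying to scrape the 9gag comment section to do some sentiment analysis and label comments as positive or negative. The end goal is to train on data from thousands of posts and predict a post's sentiment from the number of comments, the post count, the top ten comments and the post title.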

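I managed to scrape the titles and the Fresh and Hot sections, but when it comes to scraping the comments, the HTML parser won't show the relevant tags. I've tried different libraries such as BS4, requests, pattern and urllib/urllib2, and I even tried 'html.parser' instead of lxml.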

My question: does 9gag restrict scraping of its comment section? If not, is there a reason why the parsers don't get all the tags?

Update #2: here is the code I'm using:

    import requests
    from bs4 import BeautifulSoup

    url = "http://9gag.com/gag/a1Mzz1D"
    req = requests.get(url)
    soup = BeautifulSoup(req.text, 'html.parser')
    print(soup.find_all("div", attrs={"class": "comment-embed"}))

The output is just an empty list.
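A quick way to tell whether the parser or the page is at fault is to check whether the class appears at all in the HTML the server actually returns. The sketch below uses only the standard library's `html.parser`; the helper names `ClassCounter` and `count_tags` are hypothetical, and the two sample strings are made-up stand-ins for a server-rendered page and a page whose comments are filled in later by JavaScript. In practice you would pass it `req.text` from the code above.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags named `tag` whose class attribute contains `cls`."""

    def __init__(self, tag, cls):
        super().__init__()
        self.tag = tag
        self.cls = cls
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        for name, value in attrs:
            # class="a b c" can hold several space-separated classes;
            # value is None for a bare attribute, so guard against it.
            if name == "class" and value and self.cls in value.split():
                self.count += 1


def count_tags(html, tag="div", cls="comment-embed"):
    parser = ClassCounter(tag, cls)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical samples: comments present in the served HTML vs. an empty
# placeholder that a JavaScript framework would populate in the browser.
server_rendered = '<div class="comment-embed">nice</div><div class="comment-embed">lol</div>'
js_placeholder = '<div id="comments"></div><script src="app.js"></script>'

print(count_tags(server_rendered))  # 2
print(count_tags(js_placeholder))   # 0
```

If the count is 0 for the downloaded page, switching between lxml, html.parser or any other parser will not help, because the tags are simply not in the document that was fetched.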


2 Answers

Stack Overflow user

Accepted answer

Answered 2016-06-06 21:51:50

The data is loaded using React, but you can do some parsing and get all the data you need in JSON format:

import ast
import re
from urllib.parse import urljoin  # Python 2: from urlparse import urljoin

import requests
from bs4 import BeautifulSoup

base = "http://9gag.com/"

# these are the params to get the json.
params = {"appId": "",
          "url": "",
          "count": "10",
          "level": "2",
          "order": "score",
          "mentionMapping": "true",
          "origin": "9gag.com"}

# comment endpoint as seen in the browser's network tab (without the "Request URL:" label)
js = "http://comment-cdn.9gag.com/v1/cacheable/comment-list.json"

with requests.Session() as s:
    r = s.get(base)
    soup = BeautifulSoup(r.content, "lxml")
    # links to each actual page.
    links = [urljoin(base, a["href"]) for a in soup.select("a.badge-evt.point")]
    for link in links:
        cont = s.get(link).content
        soup = BeautifulSoup(cont, "lxml")
        # the params are all in the script body
        script = soup.find("script", string=re.compile("appId")).text
        # convert to dict so we can pull what we need by key
        data = ast.literal_eval(script[script.find("{"):script.rfind("}") + 1])
        params["appId"] = data["appId"]
        params["url"] = data["url"]
        page_json = s.get(js, params=params).json()
        for dct in page_json["payload"]["comments"]:
            print(dct)
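The fragile step in this code is turning the inline `<script>` text into a dict. Below is a minimal, offline sketch of just that step, using the same `find("{")` / `rfind("}")` slicing. The helper name `extract_object` and the sample script body are hypothetical (the real page's script is not shown here). It tries `json.loads` first and falls back to `ast.literal_eval`, because a JavaScript object literal is not guaranteed to be valid JSON or a valid Python literal: JS `true`/`false`/`null` break `literal_eval`, while single-quoted strings break `json.loads`.

```python
import ast
import json


def extract_object(script_text):
    """Return the outermost {...} literal in script_text as a dict.

    Assumes the first "{" and the last "}" delimit the object we want,
    which fails if the script contains other braces (e.g. functions).
    """
    start = script_text.find("{")
    end = script_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no object literal found")
    literal = script_text[start:end + 1]
    try:
        # Valid JSON: double-quoted keys/strings, true/false/null.
        return json.loads(literal)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        # Python-style literal: single quotes, True/False/None.
        return ast.literal_eval(literal)


# Hypothetical script body; the real variable name and keys may differ.
sample = 'var config = {"appId": "example-app-id", "url": "http://9gag.com/gag/a4YM4n1", "enabled": true};'
data = extract_object(sample)
print(data["appId"], data["url"])
```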

If we run that code with only the first URL returned, we get:

In [28]: with requests.session() as s:
   ....:         r = s.get(base)
   ....:         soup = BeautifulSoup(r.content,"lxml")
   ....:         links = [urljoin(base, a["href"]) for a in soup.select("a.comment.badge-evt")][:1]
   ....:         for link in links:
   ....:                 cont = s.get(link).content
   ....:                 soup = BeautifulSoup(cont,"lxml")
   ....:                 script = soup.find("script", text=re.compile('appId')).text
   ....:                 data = ast.literal_eval(script[script.find("{"):script.rfind("}") + 1])
   ....:                 params["appId"] = data["appId"]
   ....:                 params["url"] = data["url"]
   ....:                 page_json = s.get(js, params=params).json()
   ....:                 for dct in page_json["payload"]["comments"]:
   ....:                         print(dct)
   ....:             
{u'hasNext': True, u'dislikeCount': 0, u'text': u'This is so awkward to watch ... and funny', u'userId': u'u_13759018032623', u'likeCount': 343, u'orderKey': u'score_00000000004834_14651297124662', u'children': [{u'hasNext': False, u'dislikeCount': 0, u'text': u'@twistedpickle.and also fake.', u'userId': u'u_145548331532421082', u'likeCount': 26, u'children': [], u'isCollapsed': 0, u'mediaText': u'@twistedpickle.and also fake.', u'section': u'', u'mentionMapping': {u'@twistedpickle': u'aBL7q1'}, u'commentId': u'c_146513113612585611', u'type': u'text', u'status': 0, u'parent': u'c_146512971246623391', u'timestamp': 1465131136, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'savage_ali', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/34323189_100_45.jpg', u'timestamp': u'1455483315', u'userId': u'u_145548331532421082', u'hashedAccountId': u'anbN66n', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/savage_ali'}, u'accountId': u'34323189', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146513113612585611', u'level': 2, u'suppData': {}, u'richtext': u'@twistedpickle.and also fake.', u'childrenTotal': 0, u'isAnonymous': 0}], u'isCollapsed': 0, u'mediaText': u'This is so awkward to watch ... 
and funny', u'section': u'', u'mentionMapping': {u'dummy': u''}, u'commentId': u'c_146512971246623391', u'type': u'text', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1465129712, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'twistedpickle', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/1870095_100_1.jpg', u'timestamp': u'1375901803', u'userId': u'u_13759018032623', u'hashedAccountId': u'aBL7q1', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/twistedpickle'}, u'accountId': u'1870095', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146512971246623391', u'level': 1, u'suppData': {}, u'richtext': u'This is so awkward to watch ... and funny', u'childrenTotal': 19, u'isAnonymous': 0}
{u'hasNext': True, u'dislikeCount': 0, u'text': u'Hahaha PANTURA', u'userId': u'u_143454521023534763', u'likeCount': 231, u'orderKey': u'score_00000000004076_14649387351969', u'children': [{u'hasNext': False, u'dislikeCount': 0, u'text': u'@deadfight nussittuna nukut paremmin', u'userId': u'u_141790386790069041', u'likeCount': 39, u'children': [], u'isCollapsed': 0, u'mediaText': u'@deadfight nussittuna nukut paremmin', u'section': u'', u'mentionMapping': {u'@deadfight': u'aYLgpy7'}, u'commentId': u'c_146513018381635287', u'type': u'text', u'status': 0, u'parent': u'c_146493873519691145', u'timestamp': 1465130183, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'lady_kappa', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/22251683_100_38.jpg', u'timestamp': u'1417903867', u'userId': u'u_141790386790069041', u'hashedAccountId': u'a5K8b5N', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/lady_kappa'}, u'accountId': u'22251683', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146513018381635287', u'level': 2, u'suppData': {}, u'richtext': u'@deadfight nussittuna nukut paremmin', u'childrenTotal': 0, u'isAnonymous': 0}], u'isCollapsed': 0, u'mediaText': u'Hahaha PANTURA', u'section': u'', u'mentionMapping': {u'dummy': u''}, u'commentId': u'c_146493873519691145', u'type': u'text', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1464938735, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'deadfight', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/27180133_100_2.jpg', u'timestamp': u'1434545210', u'userId': u'u_143454521023534763', u'hashedAccountId': u'aYLgpy7', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/deadfight'}, u'accountId': u'27180133', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': 
u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146493873519691145', u'level': 1, u'suppData': {}, u'richtext': u'Hahaha PANTURA', u'childrenTotal': 16, u'isAnonymous': 0}
{u'hasNext': True, u'dislikeCount': 0, u'text': u'http://i.memeful.com/media/post/oMJ28xM_700wa_0.gif', u'userId': u'u_141680114571912397', u'likeCount': 225, u'orderKey': u'score_00000000003373_14649381081078', u'children': [{u'hasNext': False, u'dislikeCount': 0, u'text': u'@shogun_ka_yo up you go', u'userId': u'u_144283683005248817', u'likeCount': 2, u'children': [], u'isCollapsed': 0, u'mediaText': u'@shogun_ka_yo up you go', u'section': u'', u'mentionMapping': {u'@shogun_ka_yo': u'aMQRLRW'}, u'commentId': u'c_146513150738658348', u'type': u'text', u'status': 0, u'parent': u'c_146493810810784782', u'timestamp': 1465131507, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'dergermanyball', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/29998985_100_29.jpg', u'timestamp': u'', u'userId': u'u_144283683005248817', u'hashedAccountId': u'a1dpXrY', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/dergermanyball'}, u'accountId': u'29998985', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146513150738658348', u'level': 2, u'suppData': {}, u'richtext': u'@shogun_ka_yo up you go', u'childrenTotal': 0, u'isAnonymous': 0}], u'isCollapsed': 0, u'mediaText': u'http://i.memeful.com/media/post/oMJ28xM_700wa_0.gif', u'section': u'', u'mentionMapping': {u'dummy': u''}, u'commentId': u'c_146493810810784782', u'type': u'media', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1464938108, u'embedMediaMeta': {u'embedImage': {u'type': u'ANIMATED', u'image': {u'url': u'http://img-comment-fun.9cache.com/media/287e9c03142644331422775855_700w_0.jpg', u'width': 400, u'height': 206}, u'animated': {u'url': u'http://img-comment-fun.9cache.com/media/287e9c03142644331422775855_700wa_0.gif', u'width': 400, u'height': 206}, u'video': {u'url': u'http://img-comment-fun.9cache.com/media/287e9c03142644331422775855_700wv_0.mp4', u'width': 400, u'height': 
206}}}, u'user': {u'displayName': u'shogun_ka_yo', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/22391718_100_2.jpg', u'timestamp': u'1416801145', u'userId': u'u_141680114571912397', u'hashedAccountId': u'aMQRLRW', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/shogun_ka_yo'}, u'accountId': u'22391718', u'permissions': []}, u'isUrl': 1, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146493810810784782', u'level': 1, u'suppData': {}, u'richtext': u'[url]http://i.memeful.com/media/post/oMJ28xM_700wa_0.gif[/url]', u'childrenTotal': 4, u'isAnonymous': 0}
{u'hasNext': True, u'dislikeCount': 0, u'text': u'Now imagine if the genders were reversed', u'userId': u'u_143552720523387146', u'likeCount': 179, u'orderKey': u'score_00000000003144_14651301155438', u'children': [{u'hasNext': False, u'dislikeCount': 0, u'text': u'@rednotash hush little one. You're making sense now', u'userId': u'u_141363015125977644', u'likeCount': 77, u'children': [], u'isCollapsed': 0, u'mediaText': u'@rednotash hush little one. You're making sense now', u'section': u'', u'mentionMapping': {u'@rednotash': u'aOv8RMy'}, u'commentId': u'c_146513114535963914', u'type': u'text', u'status': 0, u'parent': u'c_146513011554386056', u'timestamp': 1465131145, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'srslydude', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/default-avatar/1_59_100_v0.jpg', u'timestamp': u'1413630151', u'userId': u'u_141363015125977644', u'hashedAccountId': u'aYwvpZx', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/srslydude'}, u'accountId': u'21558777', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146513114535963914', u'level': 2, u'suppData': {}, u'richtext': u'@rednotash hush little one. 
You're making sense now', u'childrenTotal': 0, u'isAnonymous': 0}], u'isCollapsed': 0, u'mediaText': u'Now imagine if the genders were reversed', u'section': u'', u'mentionMapping': {u'dummy': u''}, u'commentId': u'c_146513011554386056', u'type': u'text', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1465130115, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'rednotash', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/27823975_100_5.jpg', u'timestamp': u'1435527205', u'userId': u'u_143552720523387146', u'hashedAccountId': u'aOv8RMy', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/rednotash'}, u'accountId': u'27823975', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146513011554386056', u'level': 1, u'suppData': {}, u'richtext': u'Now imagine if the genders were reversed', u'childrenTotal': 9, u'isAnonymous': 0}
{u'hasNext': True, u'dislikeCount': 0, u'text': u'Never let your waif follow you? Well she wouldnt follow you if you werent a dickhead. Women have the sixth sense . We know whats going on.', u'userId': u'u_145321627176216569', u'likeCount': 78, u'orderKey': u'score_00000000002462_14651303108023', u'children': [{u'hasNext': False, u'dislikeCount': 0, u'text': u'@marshmallowww What if I tell you that gender has nothing to do with it? Men have that "sixth sense" too.', u'userId': u'u_143741207696358239', u'likeCount': 56, u'children': [], u'isCollapsed': 0, u'mediaText': u'@marshmallowww What if I tell you that gender has nothing to do with it? Men have that "sixth sense" too.', u'section': u'', u'mentionMapping': {u'@marshmallowww': u'ab693MB'}, u'commentId': u'c_146513102333226094', u'type': u'text', u'status': 0, u'parent': u'c_146513031080236628', u'timestamp': 1465131023, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'the_hidden', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/28267060_100_15.jpg', u'timestamp': u'1437412076', u'userId': u'u_143741207696358239', u'hashedAccountId': u'aop4wG2', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/the_hidden'}, u'accountId': u'28267060', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146513102333226094', u'level': 2, u'suppData': {}, u'richtext': u'@marshmallowww What if I tell you that gender has nothing to do with it? Men have that "sixth sense" too.', u'childrenTotal': 0, u'isAnonymous': 0}], u'isCollapsed': 0, u'mediaText': u'Never let your waif follow you? Well she wouldnt follow you if you werent a dickhead. Women have the sixth sense . 
We know whats going on.', u'section': u'', u'mentionMapping': {u'dummy': u''}, u'commentId': u'c_146513031080236628', u'type': u'text', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1465130310, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'marshmallowww', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/33477821_100_134.jpg', u'timestamp': u'1453216271', u'userId': u'u_145321627176216569', u'hashedAccountId': u'ab693MB', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/marshmallowww'}, u'accountId': u'33477821', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146513031080236628', u'level': 1, u'suppData': {}, u'richtext': u'Never let your waif follow you? Well she wouldnt follow you if you werent a dickhead. Women have the sixth sense . We know whats going on.', u'childrenTotal': 20, u'isAnonymous': 0}
{u'hasNext': True, u'dislikeCount': 0, u'text': u'But is correct that she can hit him? i mean, "no violence" right? if SHE is drunk and doing stupid things, and the husband go and hit her, is correct too? because equality.', u'userId': u'u_143329792027606743', u'likeCount': 54, u'orderKey': u'score_00000000001796_14651298735006', u'children': [{u'hasNext': False, u'dislikeCount': 0, u'text': u'@pcmasteracer yes it's correct', u'userId': u'u_143073218849877360', u'likeCount': 9, u'children': [], u'isCollapsed': 0, u'mediaText': u'@pcmasteracer yes it's correct', u'section': u'', u'mentionMapping': {u'@pcmasteracer': u'avnOvdq'}, u'commentId': u'c_146513013516459530', u'type': u'text', u'status': 0, u'parent': u'c_146512987350064451', u'timestamp': 1465130135, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'kkakuka97', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/26450856_100_3.jpg', u'timestamp': u'1430732188', u'userId': u'u_143073218849877360', u'hashedAccountId': u'a4j4NWy', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/kkakuka97'}, u'accountId': u'26450856', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146513013516459530', u'level': 2, u'suppData': {}, u'richtext': u'@pcmasteracer yes it's correct', u'childrenTotal': 0, u'isAnonymous': 0}], u'isCollapsed': 0, u'mediaText': u'But is correct that she can hit him? i mean, "no violence" right? if SHE is drunk and doing stupid things, and the husband go and hit her, is correct too? 
because equality.', u'section': u'', u'mentionMapping': {u'dummy': u''}, u'commentId': u'c_146512987350064451', u'type': u'text', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1465129873, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'pcmasteracer', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/default-avatar/1_62_100_v0.jpg', u'timestamp': u'1433297920', u'userId': u'u_143329792027606743', u'hashedAccountId': u'avnOvdq', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/pcmasteracer'}, u'accountId': u'27225255', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146512987350064451', u'level': 1, u'suppData': {}, u'richtext': u'But is correct that she can hit him? i mean, "no violence" right? if SHE is drunk and doing stupid things, and the husband go and hit her, is correct too? because equality.', u'childrenTotal': 7, u'isAnonymous': 0}
{u'hasNext': False, u'dislikeCount': 0, u'text': u'I can hear the 'BONG!'', u'userId': u'u_13987497367750', u'likeCount': 30, u'orderKey': u'score_00000000001168_14650124142865', u'children': [{u'hasNext': False, u'dislikeCount': 0, u'text': u'@yajirobe__ but not boing', u'userId': u'u_13775281935884', u'likeCount': 4, u'children': [], u'isCollapsed': 0, u'mediaText': u'@yajirobe__ but not boing', u'section': u'', u'mentionMapping': {u'@yajirobe__': u'avgE1Y5'}, u'commentId': u'c_146513060674619430', u'type': u'text', u'status': 0, u'parent': u'c_146501241428653553', u'timestamp': 1465130606, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'siophang', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/11455251_100_2.jpg', u'timestamp': u'1377528193', u'userId': u'u_13775281935884', u'hashedAccountId': u'aBQK6qO', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/siophang'}, u'accountId': u'11455251', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146513060674619430', u'level': 2, u'suppData': {}, u'richtext': u'@yajirobe__ but not boing', u'childrenTotal': 0, u'isAnonymous': 0}], u'isCollapsed': 0, u'mediaText': u'I can hear the 'BONG!'', u'section': u'', u'mentionMapping': {u'dummy': u''}, u'commentId': u'c_146501241428653553', u'type': u'text', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1465012414, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'yajirobe__', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/16992199_100_5.jpg', u'timestamp': u'1398749736', u'userId': u'u_13987497367750', u'hashedAccountId': u'avgE1Y5', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/yajirobe__'}, u'accountId': u'16992199', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146501241428653553', u'level': 1, 
u'suppData': {}, u'richtext': u'I can hear the 'BONG!'', u'childrenTotal': 1, u'isAnonymous': 0}
{u'hasNext': False, u'dislikeCount': 0, u'text': u'http://i.memeful.com/media/post/PRoPBdo_700wa_0.gif', u'userId': u'u_13907047642371', u'likeCount': 21, u'orderKey': u'score_00000000000967_14649476233018', u'children': [{u'hasNext': False, u'dislikeCount': 0, u'text': u'@kaylaruffalo mfw', u'userId': u'u_13907047642371', u'likeCount': 0, u'children': [], u'isCollapsed': 0, u'mediaText': u'@kaylaruffalo mfw', u'section': u'', u'mentionMapping': {u'@kaylaruffalo': u'adYKGQj'}, u'commentId': u'c_146494763324897147', u'type': u'text', u'status': 0, u'parent': u'c_146494762330186947', u'timestamp': 1464947633, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'kaylaruffalo', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/16005886_100_9.jpg', u'timestamp': u'1390704764', u'userId': u'u_13907047642371', u'hashedAccountId': u'adYKGQj', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/kaylaruffalo'}, u'accountId': u'16005886', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146494763324897147', u'level': 2, u'suppData': {}, u'richtext': u'@kaylaruffalo mfw', u'childrenTotal': 0, u'isAnonymous': 0}], u'isCollapsed': 0, u'mediaText': u'http://i.memeful.com/media/post/PRoPBdo_700wa_0.gif', u'section': u'', u'mentionMapping': {u'dummy': u''}, u'commentId': u'c_146494762330186947', u'type': u'media', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1464947623, u'embedMediaMeta': {u'embedImage': {u'type': u'ANIMATED', u'image': {u'url': u'http://img-comment-fun.9cache.com/media/872be169144077120242844098_700w_0.jpg', u'width': 500, u'height': 400}, u'animated': {u'url': u'http://img-comment-fun.9cache.com/media/872be169144077120242844098_700wa_0.gif', u'width': 500, u'height': 400}, u'video': {u'url': u'http://img-comment-fun.9cache.com/media/872be169144077120242844098_700wv_0.mp4', u'width': 500, u'height': 400}}}, u'user': 
{u'displayName': u'kaylaruffalo', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/16005886_100_9.jpg', u'timestamp': u'1390704764', u'userId': u'u_13907047642371', u'hashedAccountId': u'adYKGQj', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/kaylaruffalo'}, u'accountId': u'16005886', u'permissions': []}, u'isUrl': 1, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146494762330186947', u'level': 1, u'suppData': {}, u'richtext': u'[url]http://i.memeful.com/media/post/PRoPBdo_700wa_0.gif[/url]', u'childrenTotal': 1, u'isAnonymous': 0}
{u'hasNext': False, u'dislikeCount': 0, u'text': u'Look at the dude in the red shirt run XD', u'userId': u'u_144176454299618603', u'likeCount': 15, u'orderKey': u'score_00000000000806_14651298710300', u'children': [{u'hasNext': False, u'dislikeCount': 0, u'text': u'@crazybrownguy he knew he was next', u'userId': u'u_13976607580627', u'likeCount': 1, u'children': [], u'isCollapsed': 0, u'mediaText': u'@crazybrownguy he knew he was next', u'section': u'', u'mentionMapping': {u'@crazybrownguy': u'agGWL5q'}, u'commentId': u'c_146514413390208345', u'type': u'text', u'status': 0, u'parent': u'c_146512987103009031', u'timestamp': 1465144133, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'lightfoot2012', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/17248879_100_6.jpg', u'timestamp': u'1397660758', u'userId': u'u_13976607580627', u'hashedAccountId': u'axZPvbp', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/lightfoot2012'}, u'accountId': u'17248879', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146514413390208345', u'level': 2, u'suppData': {}, u'richtext': u'@crazybrownguy he knew he was next', u'childrenTotal': 0, u'isAnonymous': 0}], u'isCollapsed': 0, u'mediaText': u'Look at the dude in the red shirt run XD', u'section': u'', u'mentionMapping': {u'dummy': u''}, u'commentId': u'c_146512987103009031', u'type': u'text', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1465129871, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'crazybrownguy', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/29662036_100_10.jpg', u'timestamp': u'1441764542', u'userId': u'u_144176454299618603', u'hashedAccountId': u'agGWL5q', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/crazybrownguy'}, u'accountId': u'29662036', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, 
u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146512987103009031', u'level': 1, u'suppData': {}, u'richtext': u'Look at the dude in the red shirt run XD', u'childrenTotal': 1, u'isAnonymous': 0}
{u'hasNext': True, u'dislikeCount': 0, u'text': u'http://i.memeful.com/media/post/kRp6z2w_700wa_0.gif', u'userId': u'u_144337172763285563', u'likeCount': 5, u'orderKey': u'score_00000000000626_14651301539010', u'children': [{u'hasNext': False, u'dislikeCount': 0, u'text': u'@wat_ya_doin I agree with that wife', u'userId': u'u_144337172763285563', u'likeCount': 3, u'children': [], u'isCollapsed': 0, u'mediaText': u'@wat_ya_doin I agree with that wife', u'section': u'', u'mentionMapping': {u'@wat_ya_doin': u'ay8yRoM'}, u'commentId': u'c_146513018506335085', u'type': u'text', u'status': 0, u'parent': u'c_146513015390105680', u'timestamp': 1465130185, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'wat_ya_doin', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/29948571_100_6.jpg', u'timestamp': u'', u'userId': u'u_144337172763285563', u'hashedAccountId': u'ay8yRoM', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/wat_ya_doin'}, u'accountId': u'29948571', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146513018506335085', u'level': 2, u'suppData': {}, u'richtext': u'@wat_ya_doin I agree with that wife', u'childrenTotal': 0, u'isAnonymous': 0}], u'isCollapsed': 0, u'mediaText': u'http://i.memeful.com/media/post/kRp6z2w_700wa_0.gif', u'section': u'', u'mentionMapping': {u'dummy': u''}, u'commentId': u'c_146513015390105680', u'type': u'media', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1465130153, u'embedMediaMeta': {u'embedImage': {u'type': u'ANIMATED', u'image': {u'url': u'http://img-comment-fun.9cache.com/media/be90178a145186181304494323_700w_0.jpg', u'width': 319, u'height': 260}, u'animated': {u'url': u'http://img-comment-fun.9cache.com/media/be90178a145186181304494323_700wa_0.gif', u'width': 319, u'height': 260}, u'video': {u'url': u'http://img-comment-fun.9cache.com/media/be90178a145186181304494323_700wv_0.mp4', 
u'width': 318, u'height': 260}}}, u'user': {u'displayName': u'wat_ya_doin', u'avatarUrl': u'http://accounts-cdn.9gag.com/media/avatar/29948571_100_6.jpg', u'timestamp': u'', u'userId': u'u_144337172763285563', u'hashedAccountId': u'ay8yRoM', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/wat_ya_doin'}, u'accountId': u'29948571', u'permissions': []}, u'isUrl': 1, u'isLike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4YM4n1#cs_comment_id=c_146513015390105680', u'level': 1, u'suppData': {}, u'richtext': u'[url]http://i.memeful.com/media/post/kRp6z2w_700wa_0.gif[/url]', u'childrenTotal': 3, u'isAnonymous': 0}
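For the sentiment-analysis goal in the question, it can help to flatten this nested structure into rows. The sketch below walks `children` recursively and picks the top comments by `likeCount`. The field names (`text`, `likeCount`, `level`, `children`) are taken from the output above; the helper names are hypothetical, the small `payload` is a hand-made sample in the same shape rather than real data, and nothing guarantees the endpoint still returns these fields.

```python
def iter_comments(comments):
    """Yield every comment dict depth-first, including nested replies."""
    for comment in comments:
        yield comment
        # Replies are nested under "children"; a missing key means no replies.
        yield from iter_comments(comment.get("children", []))


def to_rows(comments):
    """Flatten comments into (level, likeCount, text) tuples."""
    return [(c.get("level"), c.get("likeCount", 0), c.get("text", ""))
            for c in iter_comments(comments)]


def top_comments(comments, n=10):
    """Return the n top-level comments with the most likes."""
    return sorted(comments, key=lambda c: c.get("likeCount", 0), reverse=True)[:n]


# Hand-made sample shaped like page_json["payload"]["comments"].
payload = {"comments": [
    {"text": "first", "likeCount": 5, "level": 1,
     "children": [{"text": "reply", "likeCount": 1, "level": 2, "children": []}]},
    {"text": "second", "likeCount": 9, "level": 1, "children": []},
]}

rows = to_rows(payload["comments"])
print(rows)  # [(1, 5, 'first'), (2, 1, 'reply'), (1, 9, 'second')]
print([c["text"] for c in top_comments(payload["comments"], n=1)])  # ['second']
```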

For example, we can pull the text out of each dct and then iterate over dct["children"] to get the replies:

In [30]: params = {"appId": "",
   ....:           "url": "",
   ....:           "count": "2",
   ....:           "level": "2",
   ....:           "order": "score",
   ....:           "mentionMapping": "true",
   ....:           "origin": "9gag.com"}

In [31]: js = "http://comment-cdn.9gag.com/v1/cacheable/comment-list.json"

In [32]: with requests.session() as s:
   ....:         r = s.get(base)
   ....:         soup = BeautifulSoup(r.content,"lxml")
   ....:         links = [urljoin(base, a["href"]) for a in soup.select("a.badge-evt.point")][:1]
   ....:         for link in links:
   ....:                 cont = s.get(link).content
   ....:                 soup = BeautifulSoup(cont,"lxml")
   ....:                 script = soup.find("script", text=re.compile('appId')).text
   ....:                 data = ast.literal_eval(script[script.find("{"):script.rfind("}") + 1])
   ....:                 params["appId"] = data["appId"]
   ....:                 params["url"] = data["url"]
   ....:                 page_json = s.get(js, params=params).json()
   ....:                 for dct in page_json["payload"]["comments"]:
   ....:                         print(dct["text"])
   ....:                         for child in dct["children"]:
   ....:                                 print(child["text"])
   ....:                 

Once again this is a post made by someone who has no idea what true love is. True love is jealous, painful, and difficult. It's a battle it always will be. You're either fighting yourself to be a better person, fighting life to give the other person the life they deserve or fighting the other person. But true love is worth all of it, its also beautiful, kind, gentle and warm.  No relationship is perfect. There is not "8 ways to know". The one for you is the one who will put up with your shit but at the same time make you want to make yourself a better person. Your true love will get on your nerves, piss you off, hurt you, but they will also love you, hold you up when you can't and forgive you. True love is when you find someone you can stand beside through anything, someone who would never want to hurt you  When you find someone you can trust no matter what. No one is perfect and there is more than one person in the world you can fall in love with, but when you find that person, you fi
@celticdraconian this Is so true
Comment complaining that this will lead straight to the "friendzone"
Comment saying the "Friendzone" is not a thing.

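You can see I changed the count param to 2. To get all the data, you can set it to a very high number such as "count": "1000"; if you keep loading more comments on the page, you will get all of the data.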
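`requests` encodes the `params` dict into the query string, so raising `count` only changes one key. This minimal sketch uses the standard library's `urllib.parse.urlencode` to show roughly the URL that `s.get(js, params=params)` requests (requests' own encoding may differ in small details). The `appId` and `url` values are placeholders, and whether the server honors a very large `count` is an assumption worth checking against the `hasNext` field in the response.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

COMMENT_LIST = "http://comment-cdn.9gag.com/v1/cacheable/comment-list.json"

params = {"appId": "example-app-id",           # placeholder; read from the page in practice
          "url": "http://9gag.com/gag/a4YM4n1",  # placeholder post URL
          "count": "10",
          "level": "2",
          "order": "score",
          "mentionMapping": "true",
          "origin": "9gag.com"}

# Copy the dict with a larger count instead of mutating the shared one.
big = dict(params, count="1000")
full_url = COMMENT_LIST + "?" + urlencode(big)
print(full_url)
print(parse_qs(urlsplit(full_url).query)["count"])  # ['1000']
```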

Score: 2

Stack Overflow user

Answered 2016-06-06 20:50:13

Their comments are loaded via React, so you will need something that executes JavaScript to scrape the comment section.

A few to get you started:


Score: 1
Original page content provided by Stack Overflow. Translation support by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/37666562
