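I'm writing a web crawler. I extracted the title and main discussion from this link, but I can't find any of the comments in the page source (Ctrl+U -> Ctrl+F for the comment text). I think the comments are generated by JavaScript. Can I extract them?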
Posted on 2016-07-27 21:32:54
RT uses a service from spot.im for its comments.
You need to make two POST requests: the first to https://api.spot.im/me/network-token/spotim to get a token, and the second to https://api.spot.im/conversation-read/spot/sp_6phY2k0C/post/353493/get to get the comments as JSON.
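The 353493 in the second URL matches the number at the start of the article slug (https://www.rt.com/usa/353493-clinton-speech-affairs-silence/), so both endpoints can probably be derived from the article URL alone. A minimal sketch of that step, assuming RT article URLs always look like `/<section>/<id>-<slug>/`; the helper names `extract_post_id` and `build_endpoints` are made up for illustration and are not part of any spot.im API:

```python
import re

SPOT_ID = "sp_6phY2k0C"  # spot.im ID for RT, copied from the URL above
API_BASE = "https://api.spot.im"


def extract_post_id(article_url):
    """Return the digits that precede the slug (assumed URL layout)."""
    match = re.search(r"/(\d+)-", article_url)
    if match is None:
        raise ValueError("no post ID found in " + article_url)
    return match.group(1)


def build_endpoints(article_url, spot_id=SPOT_ID):
    """Return (token_url, comments_url) for the two POST requests."""
    post_id = extract_post_id(article_url)
    token_url = API_BASE + "/me/network-token/spotim"
    comments_url = (API_BASE + "/conversation-read/spot/" + spot_id
                    + "/post/" + post_id + "/get")
    return token_url, comments_url
```

Anchoring the pattern on `/<digits>-` rather than on the first run of digits avoids picking up an unrelated number elsewhere in the URL, at the cost of raising an error if the layout assumption does not hold.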
I wrote a quick script to do this:
import requests
import re
import json


def get_rt_comments(article_url):
    spotim_spotId = 'sp_6phY2k0C'  # spotim id for RT
    # The post id is the first run of digits in the article URL
    post_id = re.search(r'([0-9]+)', article_url).group(0)

    # First request: get a token
    r1 = requests.post('https://api.spot.im/me/network-token/spotim').json()
    spotim_token = r1['token']

    payload = {
        "count": 25,  # number of comments to fetch
        "sort_by": "best",
        "cursor": {"offset": 0, "comments_read": 0},
        "host_url": article_url,
        "canonical_url": article_url
    }

    # Second request: get the comments as JSON
    r2_url = 'https://api.spot.im/conversation-read/spot/' + spotim_spotId + '/post/' + post_id + '/get'
    r2 = requests.post(r2_url, data=json.dumps(payload),
                       headers={'X-Spotim-Token': spotim_token, "Content-Type": "application/json"})

    return r2.json()


if __name__ == '__main__':
    url = 'https://www.rt.com/usa/353493-clinton-speech-affairs-silence/'
    comments = get_rt_comments(url)
    print(comments)

Posted on 2016-07-27 16:25:34
https://stackoverflow.com/questions/38607502