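I have written a small Python program that scrapes Instagram profiles to extract data and display various statistics. I can collect data from the first 9 photos on a profile (or however many are shown on the initial load), but I can't load any additional photos because of the infinite-scroll mechanism. I have read online about scraping pages with infinite scroll, and people say you need to replicate the request that loads the additional images. So far I have not been able to replicate that request. Can anyone help?
Thanks!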
Posted on 2018-06-17 22:33:30
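There is no need to write all of that code yourself; several libraries already replicate these requests.
https://github.com/ping/instagram_private_api
is one such library.
A solution using this library: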
from instagram_private_api import Client
user_name = 'YOUR_USERNAME'
password = 'YOUR_PASSWORD'
username_to_scrape = 'USERNAME_TO_SCRAPE'
all_posts = []
api = Client(user_name, password)
posts = api.username_feed(username_to_scrape)  # Gets the first 12 posts
# The response is a dict; the posts themselves are in its "items" list
all_posts.extend(posts.get("items", []))
# Extract the value *next_max_id* from the above response, this is needed to load the next 12 posts
next_max_id = posts.get("next_max_id")
if next_max_id:
    # Request the next page for the same profile, starting from the cursor
    next_page_posts = api.username_feed(username_to_scrape, max_id=next_max_id)
    all_posts.extend(next_page_posts.get("items", []))
This is just a quick example to get you started.
Update: saving and loading cookies
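To go beyond two pages, the same cursor can be followed in a loop. The following is a minimal sketch rather than part of the library: `collect_all_items`, `fetch_page`, and `max_pages` are hypothetical names, and the fake pages stand in for real responses so the example runs offline. It assumes each response is a dict with an `items` list and an optional `next_max_id`.

```python
from typing import Any, Callable, Dict, List, Optional


def collect_all_items(
    fetch_page: Callable[[Optional[str]], Dict[str, Any]],
    max_pages: int = 50,
) -> List[Dict[str, Any]]:
    """Follow next_max_id cursors until none is returned or max_pages is hit.

    fetch_page(max_id) must return one feed response as a dict;
    max_id is None for the first page.
    """
    items: List[Dict[str, Any]] = []
    max_id: Optional[str] = None
    for _ in range(max_pages):
        page = fetch_page(max_id)
        items.extend(page.get("items", []))
        max_id = page.get("next_max_id")
        if not max_id:
            # No cursor in the response means there are no further pages.
            break
    return items


# Stand-in for the real API (hypothetical data, not real Instagram output).
FAKE_PAGES: Dict[Optional[str], Dict[str, Any]] = {
    None: {"items": [{"id": 1}, {"id": 2}], "next_max_id": "a"},
    "a": {"items": [{"id": 3}], "next_max_id": "b"},
    "b": {"items": [{"id": 4}]},
}


def fake_fetch(max_id: Optional[str]) -> Dict[str, Any]:
    return FAKE_PAGES[max_id]


all_items = collect_all_items(fake_fetch)
print(len(all_items))
```

With the real client, `fetch_page` could be a small wrapper that calls `api.username_feed(username_to_scrape)` when `max_id` is None and `api.username_feed(username_to_scrape, max_id=max_id)` otherwise. The `max_pages` cap is only a safety limit; adding a short pause between requests is probably wise to avoid being rate limited.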
# Saving cookies
cookies = api.cookie_jar.dump()
with open("cookies.pkl", "wb") as save_cookies:
    save_cookies.write(cookies)
# Loading cookies
with open("cookies.pkl", "rb") as read_cookies:
    cookies = read_cookies.read()
# Pass cookies to Client to resume session
api = Client(user_name, password, cookie=cookies)
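On the very first run there is no cookie file yet, so the loading step above would fail. One way to handle that is sketched below; `load_saved_cookies` and `save_cookies` are hypothetical helpers, and the bytes written here are a placeholder rather than a real cookie jar dump.

```python
import os
import tempfile
from typing import Optional


def load_saved_cookies(path: str) -> Optional[bytes]:
    """Return the saved cookie bytes, or None if nothing usable is stored."""
    try:
        with open(path, "rb") as fh:
            data = fh.read()
    except FileNotFoundError:
        return None
    # Treat an empty file the same as a missing one.
    return data or None


def save_cookies(path: str, cookies: bytes) -> None:
    """Write the cookie bytes (e.g. the result of api.cookie_jar.dump())."""
    with open(path, "wb") as fh:
        fh.write(cookies)


# Offline round trip in a temporary directory with placeholder bytes.
demo_path = os.path.join(tempfile.mkdtemp(), "cookies.pkl")
before = load_saved_cookies(demo_path)
save_cookies(demo_path, b"placeholder")
after = load_saved_cookies(demo_path)
print(before, after)
```

The result could then be passed as `Client(user_name, password, cookie=load_saved_cookies("cookies.pkl"))`, on the assumption that the client performs a normal login when `cookie` is None. Saved sessions can also expire, so a fallback to a fresh login is likely still needed.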
https://stackoverflow.com/questions/50897523