Sometimes, when I try to scrape Instagram media by appending ?__a=1 to the end of the URL,
for example https://www.instagram.com/p/CP-Kws6FoRS/?__a=1
I get this response:
{
  "__ar": 1,
  "error": 1357004,
  "errorSummary": "Sorry, something went wrong",
  "errorDescription": "Please try closing and re-opening your browser window.",
  "payload": null,
  "hsrp": {
    "hblp": {
      "consistency": {
        "rev": 1005622141
      }
    }
  },
  "lid": "7104767527440109183"
}
Why is this response returned, and what should I do to fix it? Also, is there another way to get the video and photo URLs?
Posted on 2022-06-11 07:12:29
I solved this by adding &__d=dis to the query string at the end of the URL, like this: https://www.instagram.com/p/CFr6G-whXxp/?__a=1&__d=dis
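A minimal sketch of that fix in Python: it sets __a=1 and __d=dis on a post URL while keeping any query parameters that are already present. The helper name with_json_params is hypothetical, and there is no guarantee that Instagram continues to honor these parameters.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def with_json_params(url: str) -> str:
    """Return url with __a=1 and __d=dis set in its query string."""
    parts = urlparse(url)
    # Keep existing parameters; a dict preserves their insertion order.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["__a"] = "1"
    query["__d"] = "dis"
    return urlunparse(parts._replace(query=urlencode(query)))


print(with_json_params("https://www.instagram.com/p/CFr6G-whXxp/"))
```

Building the query with urlencode rather than string concatenation avoids producing a duplicate __a=1 when the URL already carries it.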
Posted on 2022-06-09 01:47:20
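I believe I may have found a workaround:
Use https://i.instagram.com/api/v1/users/web_profile_info/?username={username} to get the user's information and most recent posts. data.user in the response is the same as graphql.user from https://i.instagram.com/{username}/?__a=1.
Extract the media id from the HTML of https://instagram.com/p/{post_shortcode}. It is in the tag <meta property="al:ios:url" content="instagram://media?id={media_id}">.
Call https://i.instagram.com/api/v1/media/{media_id}/info with the extracted media id to get the same response as https://instagram.com/p/{post_shortcode}/?__a=1.
A few important points:
The User-Agent is very important. When I re-sent the request from the Firefox developer tools, with the User-Agent that Firefox generates, it returned the "Sorry, something went wrong" error.
The script reads Instagram cookies from Firefox through browser_cookie3. To read them from Chrome instead, use: cookiejar = browser_cookie3.chrome(domain_name='instagram.com')
Here is the full script. Let me know if this helps!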
import os
from datetime import datetime, timedelta

import bs4 as bs
import browser_cookie3
import requests

# Setup.
username = "<username>"
output_path = "C:\\some\\path"
headers = {
    "User-Agent": "Mozilla/5.0 (Linux; Android 9; GM1903 Build/PKQ1.190110.001; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/75.0.3770.143 Mobile Safari/537.36 Instagram 103.1.0.15.119 Android (28/9; 420dpi; 1080x2260; OnePlus; GM1903; OnePlus7; qcom; sv_SE; 164094539)"
}


def download_post_media(post: dict, media_list: list, number: int):
    output_filename = f"{output_path}/{username}"
    # Create the per-user output directory if it does not exist yet.
    os.makedirs(output_filename, exist_ok=True)
    # Timestamp used in the file name (note the fixed +5 hour offset).
    post_time = datetime.fromtimestamp(int(post["taken_at_timestamp"])) + timedelta(hours=5)
    output_filename += f"/{username}_{post_time.strftime('%Y%m%d%H%M%S')}_{post['shortcode']}_{number}"
    current_media_json = media_list[number - 1]
    if current_media_json['media_type'] == 1:
        # Image.
        media_ext = ".jpg"
        media_url = current_media_json["image_versions2"]['candidates'][0]['url']
    elif current_media_json['media_type'] == 2:
        # Video.
        media_ext = ".mp4"
        media_url = current_media_json["video_versions"][0]['url']
    else:
        # Unknown media type: nothing to download.
        return
    output_filename += media_ext
    response = send_request_get_response(media_url)
    with open(output_filename, 'wb') as f:
        f.write(response.content)


def send_request_get_response(url):
    # Reuse the Instagram cookies stored in the local Firefox profile.
    cookiejar = browser_cookie3.firefox(domain_name='instagram.com')
    return requests.get(url, cookies=cookiejar, headers=headers)


# Use the /api/v1/users/web_profile_info/ API to get the user's information and most recent posts.
profile_api_url = f"https://i.instagram.com/api/v1/users/web_profile_info/?username={username}"
profile_api_response = send_request_get_response(profile_api_url)
# data.user is the same as graphql.user from ?__a=1.
timeline_json = profile_api_response.json()["data"]["user"]["edge_owner_to_timeline_media"]
for post in timeline_json["edges"]:
    # Get the HTML page of the post.
    post_response = send_request_get_response(f"https://instagram.com/p/{post['node']['shortcode']}")
    html = bs.BeautifulSoup(post_response.text, 'html.parser')
    # Find the meta tag containing the link to the post's media.
    meta = html.find(attrs={"property": "al:ios:url"})
    if meta is None:
        # No media id on this page; skip the post.
        continue
    media_id = meta.attrs['content'].replace("instagram://media?id=", "")
    # Use the media id to get the same response as ?__a=1 for the post.
    media_api_url = f"https://i.instagram.com/api/v1/media/{media_id}/info"
    media_api_response = send_request_get_response(media_api_url)
    media_json = media_api_response.json()["items"][0]
    if 'carousel_media_count' in media_json:
        # Multiple-media post.
        media = list(media_json['carousel_media'])
    else:
        # Single-media post.
        media = [media_json]
    for media_number in range(1, len(media) + 1):
        download_post_media(post['node'], media, media_number)
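The script indexes into each JSON response without checking for the error payload shown in the question. A small guard could look like the sketch below. The function name is_error_payload is hypothetical, and the check relies only on two fields visible in that sample response (error and payload), so other failure shapes may well exist.

```python
import json


def is_error_payload(data) -> bool:
    """Return True if data has the shape of the error response from the question."""
    # The sample error carries an "error" code and a null "payload".
    return isinstance(data, dict) and "error" in data and data.get("payload") is None


# Abbreviated copy of the error response quoted in the question.
sample_error = json.loads(
    '{"__ar": 1, "error": 1357004, '
    '"errorSummary": "Sorry, something went wrong", "payload": null}'
)
# Made-up minimal success shape, for illustration only.
sample_ok = {"items": [{"media_type": 1}]}

print(is_error_payload(sample_error))
print(is_error_payload(sample_ok))
```

Calling such a check on the parsed response before reading ["items"] or ["data"] would let the script skip or retry a post instead of failing with a KeyError.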
Posted on 2022-06-11 03:47:06
User-Agent:
Mozilla/5.0 (Linux; Android 9; GM1903 Build/PKQ1.190110.001; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/75.0.3770.143 Mobile Safari/537.36 Instagram 103.1.0.15.119 Android (28/9; 420dpi; 1080x2260; OnePlus; GM1903; OnePlus7; qcom; sv_SE; 164094539)
An alternative endpoint to /?__a=1 (note that you must send the User-Agent above when using it):
https://i.instagram.com/api/v1/users/web_profile_info/?username={username}
It gives the same result: graphql.user in the ?__a=1 response corresponds to data.user in this one.
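Because the two endpoints differ only in where the user object sits, code written for ?__a=1 can be adapted with a small accessor. This is a sketch under that assumption: extract_user is a hypothetical helper, and the sample dictionaries are made-up minimal payloads, not real responses.

```python
def extract_user(payload: dict):
    """Return the user object from either response shape, or None if absent."""
    # web_profile_info nests the user under "data"; ?__a=1 nests it under "graphql".
    for key in ("data", "graphql"):
        section = payload.get(key)
        if isinstance(section, dict) and "user" in section:
            return section["user"]
    return None


# Made-up minimal payloads, for illustration only.
from_profile_api = {"data": {"user": {"username": "example"}}}
from_a1_endpoint = {"graphql": {"user": {"username": "example"}}}

print(extract_user(from_profile_api) == extract_user(from_a1_endpoint))
```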
https://stackoverflow.com/questions/72467565