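I want to fetch the right-hand summary box (the knowledge panel) for any Google search, and I have found a unique marker I can use to extract it. But when I fetch the results page for a Google query with Python, the response from Google doesn't contain that right-hand box.
Please help me fetch the full content of a Google query.
Code to fetch the Google query page: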
import requests
url = 'https://www.google.co.in/search?q=dhoni'
r = requests.get(url)
content = r.text
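with open('query.html', 'w', encoding='utf-8') as f:
    f.write(content)  # save the fetched HTML

PS: After running the code above and viewing the saved file in a browser, the right-hand box doesn't appear, which suggests its content isn't fetched along with the rest of the page.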
Posted on 2021-10-21 09:37:36
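It's not because of JavaScript, as was suggested. It's because no user-agent is specified. A user-agent is needed to make the request look like a "real" user visit: a bot or browser can send a fake user-agent string to announce itself as a different client. Without one, requests sends its default python-requests user-agent, so Google can tell the request comes from a script and may return different HTML.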
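To make the difference visible, here is a minimal sketch that prints the user-agent requests would send by default and the one it sends once a header is passed. It only prepares the request and never contacts Google; the URL and query are just the ones from the question.

```python
import requests

# What requests announces when no user-agent is given
default_ua = requests.utils.default_user_agent()
print(default_ua)  # starts with "python-requests/"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

# Build the request without sending it, to inspect what would go over the wire
session = requests.Session()
prepared = session.prepare_request(
    requests.Request("GET", "https://www.google.com/search",
                     params={"q": "dhoni"}, headers=headers)
)
print(prepared.headers["User-Agent"])  # the browser-like string from `headers`
print(prepared.url)
```

Preparing the request this way is only for inspection; a normal `requests.get(..., headers=headers)` call sends the same header.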
Pass a user-agent:
headers = {
    'User-agent':
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582'
}
requests.get('URL', headers=headers)
Code and example in the online IDE:
from bs4 import BeautifulSoup
import requests, lxml  # lxml is not called directly, but must be installed for the 'lxml' parser

headers = {
    "User-agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}
params = {
    'q': 'dhoni',
    'hl': 'en',
    'gl': 'uk'  # if set to "us" (United States) it would be a different HTML layout with different CSS selectors
}
html = requests.get('https://www.google.com/search', headers=headers, params=params)
soup = BeautifulSoup(html.text, 'lxml')

# title, subtitle and description of the right-hand box
title = soup.select_one('#rhs .mfMhoc span, .qrShPb').text
subtitle = soup.select_one('.wwUB2c span').text
try:
    snippet = soup.select_one('.zsYMMe+ span').text
except AttributeError:  # select_one() returned None
    snippet = None
print(f"{title}\n{subtitle}\n{snippet}\n")

# key/value rows of the right-hand box
for result in soup.select(".rVusze"):
    key_element = result.select_one(".w8qArf").text
    if result.select_one(".kno-fv"):
        value_element = result.select_one(".kno-fv").text.replace(": ", "")
    else:
        value_element = None  # or pass
    key_link = f'https://www.google.com{result.select_one(".w8qArf a")["href"]}'
    try:
        key_value_link = f'https://www.google.com{result.select_one(".kno-fv a")["href"]}'
    except (TypeError, KeyError):  # no link inside the value element
        key_value_link = None  # or pass
    print(f"{key_element}{value_element}\nkey_link: {key_link}\nkey_value_link: {key_value_link}")
--------------
# long output
'''
MS Dhoni
Indian cricketer
Mahendra Singh Dhoni, is a former Indian international cricketer who captained the Indian national team in limited-overs formats from 2007 to 2017 and in Test cricket from 2008 to 2014. He is widely regarded as one of the greatest in the history of cricket.
Born: 7 July 1981 (age 40 years), Ranchi, India
key_link: https://www.google.com/search?hl=en&gl=uk&q=ms+dhoni+born&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDfWEstOttIvSM0vyEkFUkXF-XlWSflFeYtYeXOLFVIy8vMyFUB8ABdR4Gk1AAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4Q6BMoAHoECF8QAg
key_value_link: https://www.google.com/search?hl=en&gl=uk&q=Ranchi&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDdWAjMNS0pKzLTEspOt9AtS8wtyUoFUUXF-nlVSflHeIla2oMS85IzMHayMANdQE388AAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4QmxMoAXoECF8QAw
Height: 1.8 m
key_link: https://www.google.com/search?hl=en&gl=uk&q=ms+dhoni+height&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDfWks1OttIvLsgvKinWLyjKj08sychJLUm1ykjNTM8oWcTKn1uskJKRn5epABEBAAIai08-AAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4Q6BMoAHoECHIQAg
key_value_link: None
Full name: Mahendra Singh Pansingh Dhoni
key_link: https://www.google.com/search?hl=en&gl=uk&q=ms+dhoni+full+name&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDfWks5OttIvSM0vyEkFUkXF-XlWaaU5OQp5ibmpi1iFcosVUjLy8zIV4IIAqvu4TT8AAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4Q6BMoAHoECHUQAg
key_value_link: None
Spouse: Sakshi Dhoni (m. 2010)
key_link: https://www.google.com/search?hl=en&gl=uk&q=ms+dhoni+spouse&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDfWkshOttIvSM0vyEkFUkXF-XlWxQX5pcWpi1j5c4sVUjLy8zIVICIAPnRCyzkAAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4Q6BMoAHoECHQQAg
key_value_link: https://www.google.com/search?hl=en&gl=uk&q=Sakshi+Dhoni&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDdW4gIxs0xSjIwytCSyk630C1LzC3JSgVRRcX6eVXFBfmlx6iJWnuDE7OKMTAWXjPy8zB2sjADGY2n9RQAAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4QmxMoAXoECHQQAw
Salary: 1.8 million USD (2016)
key_link: https://www.google.com/search?hl=en&gl=uk&q=ms+dhoni+salary&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDfWks1OttIvLsgvKinWLyjKj08sychJLUm1Kk7MSSyqXMTKn1uskJKRn5epABEBAGZmveY-AAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4Q6BMoAHoECG4QAg
key_value_link: None
Parents: Pan Singh, Devaki Devi
key_link: https://www.google.com/search?hl=en&gl=uk&q=ms+dhoni+parents&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDfWksxOttIvSM0vyEkFUkXF-XlWBYlFqXklxYtYBXKLFVIy8vMyFaBCAFvhf287AAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4Q6BMoAHoECGIQAg
key_value_link: https://www.google.com/search?hl=en&gl=uk&q=Pan+Singh&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDdW4gIxsypNM0zNtSSzk630C1LzC3JSgVRRcX6eVUFiUWpeSfEiVs6AxDyF4My89IwdrIwAlGBEk0MAAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4QmxMoAXoECGIQAw
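'''
Alternatively, you can achieve the same thing with the Google Knowledge Graph API from SerpApi. It's a paid API with a free plan.
The difference in your case is that you don't need to figure out and build everything from scratch, since that's already done for the end user. The only thing that really needs to be done is to pull the data you want out of the structured JSON.
Code to integrate: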
import os
from serpapi import GoogleSearch
params = {
    "api_key": os.getenv("API_KEY"),
    "engine": "google",
    "q": "dhoni",
    "hl": "en",
}
search = GoogleSearch(params)
results = search.get_dict()
print(results['knowledge_graph'])
---------------
'''
{
  "title": "MS Dhoni",
  "description": "Mahendra Singh Dhoni, is a former Indian international cricketer who captained the Indian national team in limited-overs formats from 2007 to 2017 and in Test cricket from 2008 to 2014. He is widely regarded as one of the greatest in the history of cricket.",
  "source": {
    "name": "Wikipedia",
    "link": "https://en.wikipedia.org/wiki/MS_Dhoni"
  },
  "born": "July 7, 1981 (age 40 years), Ranchi, India",
  "born_links": [
    {
      "text": "Ranchi, India",
      "link": "https://www.google.com/search?q=Ranchi&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDdWAjMNS0pKzLTEspOt9AtS8wtyUoFUUXF-nlVSflHeIla2oMS85IzMHayMANdQE388AAAA&sa=X&ved=2ahUKEwjOzafLmNvzAhWignIEHY5ZBMcQmxMoAHoFCJUBEAI"
    }
  ]
  ... # other data
}
'''
You can read more about this in the blog post I wrote on how to reduce the chance of being blocked while web scraping.
Disclaimer, I work for SerpApi.
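Since the result is a plain dictionary, individual fields can be read defensively in case a query has no knowledge panel. A minimal sketch: `panel_summary` is a made-up helper name, the sample data is a hand-copied and shortened subset of the output above rather than a live API response, and it assumes the `knowledge_graph` key is simply absent when there is no panel.

```python
def panel_summary(results):
    """Return the main knowledge panel fields, or None if there is no panel."""
    kg = results.get("knowledge_graph")
    if not kg:
        return None
    return {
        "title": kg.get("title"),
        "description": kg.get("description"),
        # "source" may be missing, so fall back to an empty dict
        "source_link": kg.get("source", {}).get("link"),
    }

# Shortened sample shaped like the output shown above (not a real response)
sample = {
    "knowledge_graph": {
        "title": "MS Dhoni",
        "source": {"name": "Wikipedia", "link": "https://en.wikipedia.org/wiki/MS_Dhoni"},
        "born": "July 7, 1981 (age 40 years), Ranchi, India",
    }
}

summary = panel_summary(sample)
print(summary)
print(panel_summary({}))  # no knowledge panel in the response
```

Using `.get()` instead of `results['knowledge_graph']` avoids a `KeyError` for queries that return no right-hand box.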
Posted on 2022-03-08 03:31:42
Copying and pasting this code doesn't work at all.
https://stackoverflow.com/questions/17372139