
Fetching the right-hand summary box of a Google search with Python

Stack Overflow user
Asked 2013-06-28 19:21:44
Answers: 2 · Views: 484 · Followers: 0 · Votes: 0

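I want to fetch the right-hand summary box of any Google search, and I have found a unique tag that I can use to extract it. However, when I fetch the content of a Google search query with Python, that right-hand box is missing from Google's response.

Please help me fetch the full content of a Google query.

Code to fetch the Google query page: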

import requests

url = 'https://www.google.co.in/search?q=dhoni'
r = requests.get(url)
content = r.text
# Save the fetched HTML so it can be opened in a browser
with open('query.html', 'w', encoding='utf-8') as f:
    f.write(content)

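PS: After running the code above and opening the saved file in a browser, the right-hand box does not appear, which shows that its content is not being fetched along with the rest of the page.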

2 Answers

Stack Overflow user

Answered 2021-10-21 09:37:36

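Contrary to what berry mentioned, this is not because of JavaScript. It happens because no user-agent is specified. A user-agent header is needed for the request to look like a "real" user visit; a bot or browser can send a fake user-agent string to announce itself as a different client. Without one, requests identifies itself with its default python-requests user-agent.

Pass a user-agent: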

headers = {
    'User-agent':
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582'
}

requests.get('URL', headers=headers)
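
As a rough offline illustration of the same idea, the sketch below builds (but does not send) a request with a custom user-agent and prints what would go on the wire. It swaps in the standard library's urllib for requests so that it runs without extra packages; the header string and the query values are the ones used in this answer, and the variable names are only illustrative.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Same header and query values as in the answer; nothing is sent over the network.
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
    )
}
params = {"q": "dhoni", "hl": "en"}

# Encode the query parameters into the search URL.
url = "https://www.google.com/search?" + urlencode(params)
request = Request(url, headers=headers)

# urllib stores header names capitalized, so the key becomes "User-agent".
print(request.full_url)
print(request.get_header("User-agent"))
```

Inspecting the request before sending it is a cheap way to confirm that the header really is attached, and that adjacent string literals were joined with the spaces you intended.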

Code and example in the online IDE:

from bs4 import BeautifulSoup
import requests, lxml  # lxml is imported only to confirm the parser used below is installed

headers = {
    "User-agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

params = {
  'q': 'dhoni',
  'hl': 'en',
  'gl': 'uk'  # if set to "us" (United States), the HTML layout and CSS selectors would be different
}

html = requests.get('https://www.google.com/search', headers=headers, params=params)
soup = BeautifulSoup(html.text, 'lxml')

title = soup.select_one('#rhs .mfMhoc span, .qrShPb').text
subtitle = soup.select_one('.wwUB2c span').text

# select_one() returns None when nothing matches, so .text raises AttributeError
try:
    snippet = soup.select_one('.zsYMMe+ span').text
except AttributeError:
    snippet = None

print(f"{title}\n{subtitle}\n{snippet}\n")

for result in soup.select(".rVusze"):
    key_element = result.select_one(".w8qArf").text

    if result.select_one(".kno-fv"):
        value_element = result.select_one(".kno-fv").text.replace(": ", "")
    else:
        value_element = None

    key_link = f'https://www.google.com{result.select_one(".w8qArf a")["href"]}'

    # Not every value has a link: indexing None raises TypeError, a missing href raises KeyError
    try:
        key_value_link = f'https://www.google.com{result.select_one(".kno-fv a")["href"]}'
    except (TypeError, KeyError):
        key_value_link = None

    print(f"{key_element}{value_element}\nkey_link: {key_link}\nkey_value_link: {key_value_link}")


--------------
# long output
'''
MS Dhoni
Indian cricketer
Mahendra Singh Dhoni, is a former Indian international cricketer who captained the Indian national team in limited-overs formats from 2007 to 2017 and in Test cricket from 2008 to 2014. He is widely regarded as one of the greatest in the history of cricket.

Born: 7 July 1981 (age 40 years), Ranchi, India
key_link: https://www.google.com/search?hl=en&gl=uk&q=ms+dhoni+born&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDfWEstOttIvSM0vyEkFUkXF-XlWSflFeYtYeXOLFVIy8vMyFUB8ABdR4Gk1AAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4Q6BMoAHoECF8QAg
key_value_link: https://www.google.com/search?hl=en&gl=uk&q=Ranchi&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDdWAjMNS0pKzLTEspOt9AtS8wtyUoFUUXF-nlVSflHeIla2oMS85IzMHayMANdQE388AAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4QmxMoAXoECF8QAw
Height: 1.8 m
key_link: https://www.google.com/search?hl=en&gl=uk&q=ms+dhoni+height&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDfWks1OttIvLsgvKinWLyjKj08sychJLUm1ykjNTM8oWcTKn1uskJKRn5epABEBAAIai08-AAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4Q6BMoAHoECHIQAg
key_value_link: None
Full name: Mahendra Singh Pansingh Dhoni
key_link: https://www.google.com/search?hl=en&gl=uk&q=ms+dhoni+full+name&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDfWks5OttIvSM0vyEkFUkXF-XlWaaU5OQp5ibmpi1iFcosVUjLy8zIV4IIAqvu4TT8AAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4Q6BMoAHoECHUQAg
key_value_link: None
Spouse: Sakshi Dhoni (m. 2010)
key_link: https://www.google.com/search?hl=en&gl=uk&q=ms+dhoni+spouse&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDfWkshOttIvSM0vyEkFUkXF-XlWxQX5pcWpi1j5c4sVUjLy8zIVICIAPnRCyzkAAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4Q6BMoAHoECHQQAg
key_value_link: https://www.google.com/search?hl=en&gl=uk&q=Sakshi+Dhoni&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDdW4gIxs0xSjIwytCSyk630C1LzC3JSgVRRcX6eVXFBfmlx6iJWnuDE7OKMTAWXjPy8zB2sjADGY2n9RQAAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4QmxMoAXoECHQQAw
Salary: 1.8 million USD (2016)
key_link: https://www.google.com/search?hl=en&gl=uk&q=ms+dhoni+salary&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDfWks1OttIvLsgvKinWLyjKj08sychJLUm1Kk7MSSyqXMTKn1uskJKRn5epABEBAGZmveY-AAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4Q6BMoAHoECG4QAg
key_value_link: None
Parents: Pan Singh, Devaki Devi
key_link: https://www.google.com/search?hl=en&gl=uk&q=ms+dhoni+parents&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDfWksxOttIvSM0vyEkFUkXF-XlWBYlFqXklxYtYBXKLFVIy8vMyFaBCAFvhf287AAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4Q6BMoAHoECGIQAg
key_value_link: https://www.google.com/search?hl=en&gl=uk&q=Pan+Singh&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDdW4gIxsypNM0zNtSSzk630C1LzC3JSgVRRcX6eVUFiUWpeSfEiVs6AxDyF4My89IwdrIwAlGBEk0MAAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4QmxMoAXoECGIQAw
'''
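
Before applying selectors like the ones above, it can help to check whether the fetched HTML contains the right-hand panel at all. The following is a minimal sketch using only the standard library's html.parser. It assumes the panel is wrapped in an element with id "rhs", as the `#rhs` selector in this answer suggests; Google's markup may change, and the two HTML strings are invented for illustration. `IdFinder` and `has_element_with_id` are hypothetical helper names.

```python
from html.parser import HTMLParser


class IdFinder(HTMLParser):
    """Records whether any start tag carries the target id attribute."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if dict(attrs).get("id") == self.target_id:
            self.found = True


def has_element_with_id(html, target_id="rhs"):
    """Return True if the HTML contains an element with the given id."""
    finder = IdFinder(target_id)
    finder.feed(html)
    return finder.found


# Invented sample markup, not real Google output.
with_panel = '<div id="main"><div id="rhs"><span>MS Dhoni</span></div></div>'
without_panel = '<div id="main"><p>results only</p></div>'

print(has_element_with_id(with_panel))
print(has_element_with_id(without_panel))
```

The same function could be called on `html.text` from the code above, or on the contents of a saved file such as query.html, to tell a missing panel apart from a selector that no longer matches.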

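Alternatively, you can achieve the same thing with the Google Knowledge Graph API from SerpApi. It is a paid API with a free plan.

The difference in your case is that you don't need to figure out and build everything from scratch, since that is already done for the end user. The only thing that really needs to be done is to pick out the data you want from a structured JSON string.

Code to integrate: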

import os
from serpapi import GoogleSearch 

params = {
    "api_key": os.getenv("API_KEY"),
    "engine": "google",
    "q": "dhoni",
    "hl": "en",
}

search = GoogleSearch(params)
results = search.get_dict()

print(results['knowledge_graph'])

---------------
'''
{
  "title": "MS Dhoni",
  "description": "Mahendra Singh Dhoni, is a former Indian international cricketer who captained the Indian national team in limited-overs formats from 2007 to 2017 and in Test cricket from 2008 to 2014. He is widely regarded as one of the greatest in the history of cricket.",
  "source": {
    "name": "Wikipedia",
    "link": "https://en.wikipedia.org/wiki/MS_Dhoni"
  },
  "born": "July 7, 1981 (age 40 years), Ranchi, India",
  "born_links": [
    {
      "text": "Ranchi, India",
      "link": "https://www.google.com/search?q=Ranchi&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDdWAjMNS0pKzLTEspOt9AtS8wtyUoFUUXF-nlVSflHeIla2oMS85IzMHayMANdQE388AAAA&sa=X&ved=2ahUKEwjOzafLmNvzAhWignIEHY5ZBMcQmxMoAHoFCJUBEAI"
    }
  ]
... # other data
}
'''
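
Once the result is a plain dictionary, individual fields can be read defensively. The sketch below works on a trimmed copy of the output shown above (with the link shortened), so it runs without an API key. It assumes that the "knowledge_graph" key can be absent for some queries; `summarize` is a hypothetical helper name, not part of SerpApi.

```python
# Trimmed copy of the response shown above; the link is shortened for readability.
results = {
    "knowledge_graph": {
        "title": "MS Dhoni",
        "born": "July 7, 1981 (age 40 years), Ranchi, India",
        "born_links": [
            {"text": "Ranchi, India", "link": "https://www.google.com/search?q=Ranchi"}
        ],
    }
}


def summarize(results):
    """Pick a few fields, tolerating a missing knowledge graph (assumed possible)."""
    graph = results.get("knowledge_graph") or {}
    return {
        "title": graph.get("title"),
        "born": graph.get("born"),
        "born_places": [item.get("text") for item in graph.get("born_links", [])],
    }


print(summarize(results))
print(summarize({}))  # no knowledge graph: every field falls back to None or []
```

Using `.get()` with fallbacks avoids the KeyError that `results['knowledge_graph']` would raise if that key were missing.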

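You can read more about this in the blog post I wrote on how to reduce the chance of being blocked while web scraping.

Disclaimer: I work for SerpApi.

Votes: 1

Stack Overflow user

Answered 2022-03-08 03:31:42

Copying and pasting this code does not work at all.

Votes: 0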
Original page content provided by Stack Overflow; translation support provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/17372139
