首页
学习
活动
专区
圈层
工具
发布
首页
学习
活动
专区
圈层
工具
MCP广场
社区首页 >问答首页 >无法获得google搜索结果python

无法获得google搜索结果python
EN

Stack Overflow用户
提问于 2014-11-28 07:29:33
回答 2查看 2.2K关注 0票数 1

我正在建立一个脚本来刮谷歌的搜索结果。我已经够到这里了。

代码语言:javascript
复制
import urllib
keyword = "google"
print urllib.urlopen("https://www.google.co.in/search?q=" + keyword).read()

但我的答覆如下:

代码语言:javascript
复制
<!DOCTYPE html><html lang=en><meta charset=utf-8><meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width"><title>Error 403 (Forbidden)!!1</title><style>*{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/errors/logo_sm_2.png) no-repeat}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/errors/logo_sm_2_hr.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/errors/logo_sm_2_hr.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/errors/logo_sm_2_hr.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:55px;width:150px}</style><a href=//www.google.com/><span id=logo aria-label=Google></span></a><p><b>403.</b> <ins>That’s an error.</ins><p>Your client does not have permission to get URL <code>/search?q=google</code> from this server.  (Client IP address: 117.196.168.89)<br><br>
Please see Google's Terms of Service posted at http://www.google.com/terms_of_service.html
<BR><BR><P>If you believe that you have received this response in error, please <A HREF="http://www.google.com/support/bin/request.py?contact_type=user&hl=en">report</A> your problem. However, please make sure to take a look at our Terms of Service (http://www.google.com/terms_of_service.html). In your email, please send us the <b>entire</b> code displayed below.  Please also send us any information you may know about how you are performing your Google searches-- for example, "I'm using the Opera browser on Linux to do searches from home.  My Internet access is through a dial-up account I have with the FooCorp ISP." or "I'm using the Konqueror browser on Linux to search from my job at myFoo.com.  My machine's IP address is 10.20.30.40, but all of myFoo's web traffic goes through some kind of proxy server whose IP address is 10.11.12.13."  (If you don't know any information like this, that's OK.  But this kind of information can help us track down problems, so please tell us what you can.)</P><P>We will use all this information to diagnose the problem, and we'll hopefully have you back up and searching with Google again quickly!</P>
<P>Please note that although we read all the email we receive, we are not always able to send a personal response to each and every email.  So don't despair if you don't hear back from us!</P>
<P>Also note that if you do not send us the <b>entire</b> code below, <i>we will not be able to help you</i>.</P><P>Best wishes,<BR>The Google Team</BR></P><BLOCKQUOTE>/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/<BR>
aef0-l8vNw3cWys_OWGKrv6VYDewUx0bhWxSeo2Mk4vGTZSoh<BR>
MdeNZki3vp-kzRGjrBTseg6uGBypibuTNGSeJoPRkDPCOFkyA<BR>
YBVgssaJaqSibV7khohBnsUVRVZqALwIe2lD6pdddMQIZ-Zg2<BR>
WEE-rO-ZackE5L2gwlmHZHP2oWML3ZlGgUL6CAbMbFmzVda38<BR>
ZYYVZLKBcjY1gSLk-FSzBc7QQnp0vrhkY6LnrALX94oK7Yrml<BR>
bKX-5KmpyhsI7aW3da5Rt5nt0K9PVPbKvpZ1LN-hdRqg749K6<BR>
T4v8mGfuH6BHSQUAPW1Byx_Wy1TGsyhZJQ02jrz7K0RBg4r0i<BR>
9O6Rs7-FFRzESkiyzRQaExUdpBpl3Mmguh1JXR_yxDJre9R7u<BR>
3AWKfCkt8BxKuv37oAIslM2Caor4QBXSNrq1F7zUetx8HxmaW<BR>
pX_6KsXyjs3-Pfq5NKOuzNCjatrhXdKC74NmNHztTPJU-4MzV<BR>
kUPuUehnDYgcgGAVYLLGiWvG4Scm8G2Gq2UnacMQsZ5BB7rgY<BR>
DXJnZwbMbVX53-llhCMeQfBTteOWIfWQR2FOyc-tuaRHX6c3N<BR>
rzpNDX9ZufFfOXRNkaORCZxkSEoX1xDBq0VGdkkCfwlUdG9Jq<BR>
prYBPnpRyhjxjC3c4n68AuEYHtMTVmbK-fyMtcWLMTVXzIrYS<BR>
EjACpMTnHRavhYza4ZJgs4SViS4FrsmJ0P3CdyLLayR0xMFM6<BR>
m7rxy-zaABo7iof_re5PKcFP6EYqD0Wm-ZlLksUh2a1LVaAsq<BR>
sSqnPPqq5qCu0z8wQe5jeGCRCY2vrT5HWmYNJbhyCyN_HiHGR<BR>
bHDb8f3_OcgAHsT7zv1a4FOG4B0JztqskzYmssBb-ezvErkp6<BR>
uZtwiKJc30F30RpQhKEb_rPjhpwc5dr3MUsTuki2j2tBSQl_O<BR>
kjFef_Jvl3u8TPQY5c6dqUSQv--p0N95Jv-WehS32lvyUbeEB<BR>
mN7ZC8oCFj06BRn5NaU9P8p1d7fmYyxyta2dZ21UfaRMhX8TZ<BR>
VgKiSDVyMO2GZ09bUEFGW4KvvTJDyQT_UMkCsahrv2MP_yI-D<BR>
fwEArSXvPIpyESHeyPXfFN-Z9_OuVwGDU2riHFIWgw5IPwtER<BR>
e0Ukzrn2iwGHHL8j2JdSNbunrifS-RqkK2hgQl16-TfqN11NL<BR>
Lgwtt-Kp3XL86K61Qq7lU-NxB8BOO_i-QOQszn6uRmb3VR__Q<BR>
T_0E9FULbsR9kgTyXDKQmOQ-3qeaFlz4in9V9PJ<BR>
+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+<BR></BLOCKQUOTE>

谷歌难道不允许自己的网页被刮掉吗?

EN

回答 2

Stack Overflow用户

发布于 2014-11-28 08:26:49

您也可以在headers中伪造urllib以获得结果。

类似于:

代码语言:javascript
复制
import urllib2

keyword = "google"
url = "https://www.google.co.in/search?q=" + keyword

# Build a opener
opener = urllib2.build_opener()

# In case you have proxy then u need to build a ProxyHandler opener 
#opener = urllib2.build_opener(urllib2.ProxyHandler(proxies={"http": "http://proxy.corp.ads:8080"}))

# To fake the browser
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
print opener.open(url).read()
票数 1
EN

Stack Overflow用户

发布于 2021-04-08 15:11:26

谷歌用不同的user-agent来对待你的脚本(如果你在使用requests,它将是python-requests) --参见更多更多

你只需要指定浏览器user-agent (Chrome,Mozilla,Edge,IE,Safari.)因此,谷歌将把它当作一个“用户”AKA,假装一个真正的浏览器访问。

如果您正在使用requests库,那么您可以这样指定它(其他网站的用户代理列表)

代码语言:javascript
复制
import requests

headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

response = requests.get(
  'https://www.google.com/search?q=pizza is awesome', headers=headers).text

我回答了如何刮谷歌搜索结果标题,摘要和链接与示例代码这里的问题。

或者,您可以使用第三方Google搜索引擎结果API或来自SerpApi的Google有机结果API。这是一个免费试用的付费API。

检查游乐场以测试并查看输出。

获取原始HTML响应的代码:

代码语言:javascript
复制
import os, urllib
from serpapi import GoogleSearch

params = {
    "engine": "google",
    "q": "london",
    "api_key": os.getenv("API_KEY"),
}

search = GoogleSearch(params)
results = search.get_dict()

html = results['search_metadata']['raw_html_file']
print(urllib.request.urlopen(html).read())

免责声明,我为SerpApi工作。

票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/27183863

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档