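Scraping with BeautifulSoup: problems with elements that share a class name - Tencent Cloud Developer Community

python, beautifulsoup

I have been using Python and BeautifulSoup to scrape data, and so far it has worked. But I am stuck on the website below. Target site: My goal is to scrape the lyrics from that site, but it always returns a blank result or a "NoneType object has no attribute" error. I have been struggling with this for the past 15 days and cannot figure out where the problem is or how to fix it. Here is the code I am using. import pymysql import requests from bs4 import Beautifulsoup r=requests.get("https://www.lyricshindisong.in/2020/04/chnda-re-chnda-re-ch

Views 0 | Asked 2020-04-11 | Votes 5

1 answer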

Duplicate entries when scraping with Python BeautifulSoup

python, image, beautifulsoup, duplicates, screen-scraping

This scrapes images from 4chan's photography board. The problem is that it scrapes the same image twice. I don't understand why I'm getting duplicate photos; it would be great if someone could help. from bs4 import BeautifulSoup import requests import re import urllib2 import os def get_soup(url,header): return BeautifulSoup(urllib2.urlopen(urllib2.Request(url, headers=header)), 'lxml') image_type = "image_name

Views 0 | Asked 2016-06-13 | Votes 1

3 answers
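One plausible cause of the duplicates, not confirmed by the excerpt, is that each image is linked twice in the markup (for example once from the thumbnail and once from the file name). A minimal sketch that de-duplicates the collected links while keeping their order; the sample markup, class names, and the helper `unique_image_links` are hypothetical:

```python
from bs4 import BeautifulSoup

IMAGE_SUFFIXES = (".jpg", ".png", ".gif")


def unique_image_links(html):
    """Return image hrefs in document order, each one only once."""
    soup = BeautifulSoup(html, "html.parser")
    hrefs = [
        a["href"]
        for a in soup.find_all("a", href=True)
        if a["href"].lower().endswith(IMAGE_SUFFIXES)
    ]
    # dict.fromkeys keeps the first occurrence of each key and preserves order
    return list(dict.fromkeys(hrefs))


# Hypothetical markup: the first image is linked from two places
sample = """
<a class="fileThumb" href="//i.example.org/1.jpg"><img src="//i.example.org/1s.jpg"></a>
<div class="fileText"><a href="//i.example.org/1.jpg">1.jpg</a></div>
<a class="fileThumb" href="//i.example.org/2.png"></a>
"""
links = unique_image_links(sample)
```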

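What am I doing wrong in this web-scraping code?

python, beautifulsoup

I ran into a problem while trying to do web scraping. I'm not very used to programming, so I really don't know what I'm doing wrong (though I know some basics). I'm trying to scrape the web with Python and Beautiful Soup. Here is the code: import requests from bs4 import BeautifulSoup URL = 'http://www.lotece.com.br/v2/' page = requests.get(URL) soup = BeautifulSoup(page.content, 'html.parser') results = soup.find(class = 'dataResultado')

Views 2 | Asked 2020-02-25 | Votes 2

4 answers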
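As written, `soup.find(class = 'dataResultado')` is a syntax error, because `class` is a reserved word in Python. BeautifulSoup accepts `class_` (with a trailing underscore) or an `attrs` dictionary instead. A minimal sketch on inline markup; the sample HTML and its date value are made up for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page
html = '<div class="dataResultado">25/02/2020</div>'
soup = BeautifulSoup(html, "html.parser")

# Keyword form: note the trailing underscore
results = soup.find(class_="dataResultado")

# Equivalent dictionary form
same = soup.find(attrs={"class": "dataResultado"})

# find() returns None when nothing matches, so guard before using the result
text = results.get_text(strip=True) if results is not None else None
```

If `find` still returns `None` on the live page, the element may be added by JavaScript after the page loads, in which case it is not present in the HTML that `requests` receives.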

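Scraping only the first link with Python

python, beautifulsoup

Hello, this is the page source from which I want to scrape the first link using BeautifulSoup. view-source: I want to scrape the first article here, so it should be "Trust Wallet Now Supports Lumens, 4 More Tokens". I'm trying to do this in Python. I'm using this code, but it scrapes all the links and I only want the first one. with open('binanceblog1.html', 'w') as article: before13 = requests.get("https://www.binance.com/en/blog"

Views 33 | Asked 2019-03-28 | Votes 0

1 answer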
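`find_all` returns every match, while `find` returns only the first one (or `None`). A minimal sketch of the difference; the markup below is hypothetical and only borrows the article title quoted in the question:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the blog listing page
html = """
<div class="articles">
  <a href="/en/blog/1">Trust Wallet Now Supports Lumens, 4 More Tokens</a>
  <a href="/en/blog/2">Another post</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

all_links = soup.find_all("a", href=True)  # every link on the page
first = soup.find("a", href=True)          # only the first link, or None

first_title = first.get_text(strip=True) if first else None
first_href = first["href"] if first else None
```

On a real page the first `<a>` is often a navigation link, so it usually helps to call `find` on the container that holds the articles rather than on the whole document.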

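Selenium WebDriver does not return a Wikipedia table

python, selenium, web-scraping, selenium-chromedriver, webdriverwait

I'm trying to scrape a table containing the results of all US presidential elections, and I want to use Selenium for this. I believe the table I want to scrape is generated by a client-side script (JavaScript), so I tried waiting for the presence of a particular tag before scraping the site. Note: I tried scraping the page directly with Beautiful Soup, but always got a "None" response. Here is my code. from selenium import webdriver from bs4 import BeautifulSoup from selenium.webdriver.chrome.options import Options from selenium.webdriver.support.ui import

Views 16 | Asked 2020-11-21 | Votes 1

Answer accepted

1 answer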

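How to introduce fault tolerance in Beautiful Soup

python, beautifulsoup

I'm interested in scraping many different websites as quickly as possible. The URLs can cause all kinds of scraping problems; for example, they may point to files rather than sites, or they may not point to any real content at all. The problem I haven't been able to solve is what to do when BeautifulSoup hangs, or fails for some reason without exiting. There needs to be a way to stop HTML parsing if it does not finish within X seconds. This seems very important, and apparently I'm not the only one who thinks so; the most relevant information I found seems to be on this site. So, given that it is hard to terminate a hung call (such as BeautifulSoup(text)) after a certain time has elapsed, what should I do?

Views 2 | Asked 2014-12-24 | Votes 2

Answer accepted

3 answers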

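Screen scraping in Python

python, screen-scraping

Although I've done some screen scraping in R, I'm new to screen scraping in Python. I'm trying to scrape the Yelp website: I want the name of every insurance agency returned by a Yelp search. For most scraping tasks I can do the following, but I always have trouble parsing the XML. import urllib2 from BeautifulSoup import BeautifulSoup soup = BeautifulSoup(urllib2.urlopen('http://www.yelp.com/search?find_desc=insurance+agency&ns=1&find_loc=Austin'

Views 0 | Asked 2011-06-30 | Votes 3

Answer accepted

2 answers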

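How do I save attachments from a website using Beautiful Soup?

python, beautifulsoup, get, python-requests

I've written code to scrape the attachments on a website. Essentially, it scrapes the attachments' hyperlinks. I can't figure out a way to save those attachments directly to a local location. import requests import pandas as pd from requests import get url = 'https://www.amfiindia.com/research-information/amfi-monthly' response = get(url,verify=False) import bs4 from bs4 import BeautifulSoup html_soup = BeautifulSoup(re

Views 0 | Asked 2020-06-20 | Votes 0

1 answer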

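BeautifulSoup cannot find a deeply nested tag ID

web-scraping, beautifulsoup

I'm trying to scrape NBA data from a website, but I've run into a problem where BeautifulSoup drops deeply nested tags. I tried soup.find(id='opponent-stats-per_game') to grab the "Opponent Per Game Stats" table, but I get None. If I look for a div higher up the tree, its deeper children are cut off. Can anyone give me some guidance? I'm fairly new to web scraping with BeautifulSoup.

Views 1 | Asked 2019-04-03 | Votes 0

Answer accepted

2 answers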
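One common reason an element is visible in the browser but missing from the parsed tree is that the server ships it inside an HTML comment and a script uncomments it later. Whether that applies to the site in the question is an assumption; the excerpt does not say. A minimal sketch that falls back to searching inside comments; the helper name `find_in_comments` and the sample markup are hypothetical:

```python
from bs4 import BeautifulSoup, Comment


def find_in_comments(html, element_id):
    """Find an element by id, also looking inside HTML comments."""
    soup = BeautifulSoup(html, "html.parser")
    hit = soup.find(id=element_id)
    if hit is not None:
        return hit
    # Comment nodes are strings; re-parse each one as its own document
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        inner = BeautifulSoup(str(comment), "html.parser")
        hit = inner.find(id=element_id)
        if hit is not None:
            return hit
    return None


# Hypothetical markup: the table only exists inside a comment
sample = """
<div id="all_opponent">
<!-- <table id="opponent-stats-per_game"><tr><td>110.2</td></tr></table> -->
</div>
"""
table = find_in_comments(sample, "opponent-stats-per_game")
cell = table.td.get_text(strip=True) if table is not None else None
```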

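How do I find the correct XPath and loop over a table?

python, selenium, web-scraping, beautifulsoup, python-requests

I want to get all the values from the "Elektriciteit" table on a web page. But after endless attempts to find the correct XPath with Selenium, I still cannot scrape the table. I tried using "Inspect" and copying the XPath from the table to determine the table's length, so that I could scrape it later. After that failed I tried using "contain", but that was not successful either. Then I tried a few approaches with BeautifulSoup, without any result. #%% import pandas as pd from selenium import webdriver import pandas as pd #%% powerhouse Elektriciteit NL bas

Views 0 | Asked 2019-07-24 | Votes 1

Answer accepted

2 answers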

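BeautifulSoup does not scrape dynamic content

python, html, dynamic, beautifulsoup

My problem is that I want to get the related links from this page: If I inspect the element in Chrome or Safari, I can see <div id="outer_related_articles"> and all the articles listed. If I try to scrape it with BeautifulSoup, it gets the page and everything except the related articles. Here is what I have so far: import urllib2 from bs4 import BeautifulSoup url = "http://support.apple.com/kb/TS1538" response = urllib2.urlopen(url) soup = Be

Views 2 | Asked 2013-04-07 | Votes 1

1 answer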

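Repeatedly scraping an HTTP text-file page

python-3.x

I've successfully written code to scrape an https text page. The page updates automatically every 60 seconds. I used BeautifulSoup4. Here are my two questions: 1) How do I set up a loop that re-scrapes the page every 60 seconds? 2) Since the page has no HTML tags, how can I scrape only specific lines of data? I was thinking I might have to save the scraped page as a CSV file and then extract the data I need from the saved page. However, I'd like to do all of this without saving the page to my local machine, and I hope some Python package can do it all for me without saving the page. import bs4 as bs import urllib sauce = urllib.urlopen("

Views 0 | Asked 2019-03-30 | Votes 0

Answer accepted

1 answer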
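Because the page is plain text, no HTML parser is needed: the text can be split into lines and filtered in memory, and the 60-second repeat is a loop around `time.sleep`. A minimal sketch, assuming the wanted lines can be identified by a keyword; `matching_lines`, `poll`, and the injectable `fetch` and `sleep` parameters are hypothetical names introduced here so the logic can be tried without a network:

```python
import time


def matching_lines(text, keyword):
    """Keep only the lines of a plain-text page that contain the keyword."""
    return [line for line in text.splitlines() if keyword in line]


def poll(fetch, keyword, rounds, interval=60, sleep=time.sleep):
    """Call fetch() `rounds` times, pausing `interval` seconds in between.

    fetch is any zero-argument callable that returns the page text.
    """
    results = []
    for i in range(rounds):
        results.append(matching_lines(fetch(), keyword))
        if i < rounds - 1:
            sleep(interval)
    return results
```

For the live page, `fetch` could be something like `lambda: urllib.request.urlopen(url).read().decode()`, where `url` is the address of the text page; nothing is written to disk.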

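Downloading the complete HTML page with Selenium

python, selenium

I'm learning web scraping with Python, Selenium and BeautifulSoup. Currently I'm trying to scrape the hot searches on Google Trends. This is my current code. However, I realized that the full HTML is not downloaded; I only get the content for the most recent few dates. What can I do to fix this? from selenium import webdriver from bs4 import BeautifulSoup googleURL = "http://www.google.com/trends/hottrends#pn=p5" browser = webdriver.Firefox() browser.get(googl

Views 1 | Asked 2013-05-17 | Votes 15

1 answer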

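How to check whether an image has already been saved in a directory

django

I created a mobile-phone comparison website project in Django. I'm using BeautifulSoup4 to scrape phone details. I render the scraped data as JSON in the browser rather than storing it in a database, and I download the phone images. When a user searches for a phone, the code runs, fetches the data, and saves the phone's image in a directory. The problem is that when another user searches for the same phone, the program scrapes the data and downloads the image again, but the image file name is already in the directory, so an error occurs: the file name already exists in the directory. What I'm asking is: is there a way to check whether the image file name is already in the directory? Below is my code for saving images to the directory. media_root = 'C:/Users/Goku/PycharmProjects/mysite/media

Views 21 | Asked 2019-05-03 | Votes 0

1 answer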
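The standard library can answer the "is this file already there?" question directly with `pathlib.Path.exists` (or `os.path.exists`). A minimal sketch that writes the image only when no file of that name exists yet; `save_once` and its parameters are hypothetical names, not part of the asker's project:

```python
from pathlib import Path


def save_once(directory, filename, data):
    """Write data to directory/filename unless that file already exists.

    Returns True if the file was written, False if it was already present.
    """
    target = Path(directory) / filename
    if target.exists():
        return False
    # Create the directory on first use
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return True
```

With this shape, the download itself can also be skipped when the file is already present, by checking `Path(directory, filename).exists()` before requesting the image.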

How do I use a URL in Python?

python, url, src

I'm using BeautifulSoup and Selenium to scrape img elements. Some img src values have '.jpg' and some don't. Here is my code. book_img = soup.find_all('em', {'class': 'imgBdr'}) img_url = book_img[0].find('img')['src'] if '.jpg' in str(img_url): img = img_url else: img = img_url + '.jpg' img_na

Views 9 | Asked 2020-09-16 | Votes 0

1 answer
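A substring test such as `'.jpg' in img_url` can misfire when the URL carries a query string, and appending to the whole URL puts the suffix after the query. A minimal sketch that inspects only the path component; it keeps the asker's idea of appending `.jpg` but, as a labeled difference, leaves any URL that already has some extension untouched. The helper name `ensure_jpg` and the example URLs are hypothetical:

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def ensure_jpg(url):
    """Append '.jpg' to the URL path if the path has no file extension."""
    parts = urlsplit(url)
    _, ext = posixpath.splitext(parts.path)
    if ext:
        return url  # already has an extension such as .jpg or .png
    return urlunsplit(parts._replace(path=parts.path + ".jpg"))


plain = ensure_jpg("https://example.com/img/book1")
kept = ensure_jpg("https://example.com/img/book2.jpg?size=200")
with_query = ensure_jpg("https://example.com/img/book3?size=200")
```

Whether the server actually serves an image at the modified address depends on the site; this only handles the string manipulation.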

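Scraping pages whose URL does not change with BeautifulSoup4 [Python]

python, web-scraping, beautifulsoup

So I want to scrape all the dates from Clash of Stats. It has multiple pages, and the URL doesn't change when you turn the page. How can I scrape all the dates on which a player joined a new clan? URL: https://www.clashofstats.com/players/pink-panther-VL029CJ2/history/log Here is my code so far: from emoji import UNICODE_EMOJI import requests from bs4 import BeautifulSoup link = f'https://www.clashofstats.com/players/{"Pink Panther"}-{str

Views 14 | Asked 2021-05-03 | Votes 1

1 answer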

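Reducing the overhead of creating BeautifulSoup objects

python, beautifulsoup

I'm very new to web scraping and to the BeautifulSoup library in Python, so I ran into this problem: I have to download and scrape content from a large number of web pages. Downloading them is not a problem, but when I create a BeautifulSoup object for each page (in order to parse it), my program becomes very slow. I'm asking whether there is a way to reduce this overhead, and perhaps avoid creating a brand-new BeautifulSoup object for every new page I want to analyze. Here is the code I run: for action in actions[:100]: #Here I download the pages I need curr_url = base_url

Views 0 | Asked 2020-10-08 | Votes 0

1 answer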
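A new soup per page is hard to avoid, but the cost of each one can often be reduced by building only the part of the tree that is needed. BeautifulSoup's `SoupStrainer`, passed as `parse_only`, does this. A minimal sketch that keeps only `<a>` tags; which tags matter for the asker's pages is an assumption, and `extract_hrefs` is a hypothetical helper:

```python
from bs4 import BeautifulSoup, SoupStrainer

# Build the strainer once and reuse it for every page
only_links = SoupStrainer("a")


def extract_hrefs(html):
    """Parse only the <a> tags of a page and return their href values."""
    soup = BeautifulSoup(html, "html.parser", parse_only=only_links)
    return [a["href"] for a in soup.find_all("a", href=True)]


page = "<html><body><p>intro</p><a href='/a'>A</a><div><a href='/b'>B</a></div></body></html>"
hrefs = extract_hrefs(page)
```

Switching the parser to `lxml`, when it is installed, is another frequently suggested way to speed up parsing, though the gain depends on the documents.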

PermissionError when saving an image with PIL

python, python-3.x, python-imaging-library

I'm making a Bing image scraper that saves images into a directory in the project folder, but when it runs, Pillow's .save() raises this error: PermissionError: [Errno 13] Permission denied: './scraped_images/' Here is the code I wrote, using Python 3.7 and Pillow 5.3.0 from bs4 import BeautifulSoup import requests from PIL import Image from io import BytesIO search = input("Search for: &

Views 3 | Asked 2018-11-24 | Votes 0

1 answer

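How to scrape text data from the Google search info box

python, beautifulsoup, request

I need to scrape text data from the info box of the Google search engine. If someone searches Google for the keyword "Siemens", a small info box appears on the right side of the search results. I want to collect some text information from that info box. How can I do that with requests and BeautifulSoup? Below is some code I wrote. from bs4 import BeautifulSoup as BS import requests from googlesearch import search from googleapiclient.discovery import build url = 'https://www.google.com/search?

Views 75 | Asked 2019-03-15 | Votes 1

Answer accepted

1 answer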

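How to print a table from Wikipedia

python

I'm trying to scrape a Wikipedia page as a small task to learn web scraping. The link I want to scrape is: https://en.wikipedia.org/wiki/List_of_countries_by_population_in_2000 I want to list the countries on that Wikipedia page by population. I've checked the HTML markup, and the table is available under class = wikitable. But when I run my code, it prints the results of other tables, which sit to the right of the wikitable and have the class name wikitable float right. Can anyone help me figure out what went wrong? import requests website_url = requests.get(

Views 33 | Asked 2019-09-26 | Votes 0

Answer accepted

1 answer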
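Searching with `class_="wikitable"` matches every element whose class list contains `wikitable`, so a table with `class="wikitable float right"` matches too. Comparing the full class list is one way to keep only the table whose class is exactly `wikitable`. A minimal sketch on made-up markup that mimics the two kinds of table described in the question:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with one side table and one main table
html = """
<table class="wikitable float right"><tr><td>sidebar</td></tr></table>
<table class="wikitable"><tr><td>China</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# Matches both tables, because class matching works per class token
all_wikitables = soup.find_all("table", class_="wikitable")

# Keep only tables whose class attribute is exactly "wikitable"
exact = [t for t in all_wikitables if t.get("class") == ["wikitable"]]
first_cell = exact[0].td.get_text(strip=True)
```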

Web scraping with BeautifulSoup and Git Bash and exporting to CSV

python, python-2.7, beautifulsoup, scrapy

So I've been scraping a website that has a table. Ideally I'd like to scrape it into an Excel sheet and keep it in table form. I'll post what I have; I've used both Scrapy and BeautifulSoup, and I have problems with both. Any help would be great! import requests import csv from bs4 import BeautifulSoup url = 'https://pcpartpicker.com/products/video-card/' r = requests.get(url) html = r.text soup = BeautifulSoup(html, 'l

Views 1 | Asked 2018-11-23 | Votes 0

2 answers
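Turning an HTML table into CSV (which Excel can open) is mostly a matter of walking the rows and cells and handing each row to `csv.writer`. A minimal sketch, assuming the table is present in the downloaded HTML rather than rendered by JavaScript; `table_to_csv` and the sample rows are hypothetical:

```python
import csv
import io

from bs4 import BeautifulSoup


def table_to_csv(html):
    """Convert the first <table> in the HTML to CSV text."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return ""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in table.find_all("tr"):
        # Header cells (th) and data cells (td) are treated alike
        cells = [c.get_text(strip=True) for c in row.find_all(["th", "td"])]
        if cells:
            writer.writerow(cells)
    return buffer.getvalue()


sample = """
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>GTX 1060</td><td>$199.99</td></tr>
</table>
"""
csv_text = table_to_csv(sample)
```

To write a file instead of a string, the same loop can target `open("cards.csv", "w", newline="")`, where the file name is only an example.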

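Scraping with BeautifulSoup, but values need to be entered first?

python, web-scraping, beautifulsoup, python-requests

I'm new to web scraping and not very familiar with requests and BeautifulSoup either. I'm trying to scrape an aspx website with BeautifulSoup. But to get the values I want to scrape, I first need to select a dropdown value, enter an ID, and then press submit. Is this possible? Any help would be greatly appreciated!

Views 4 | Asked 2020-09-08 | Votes 0

3 answers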
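ASP.NET WebForms pages typically carry hidden fields such as `__VIEWSTATE` and `__EVENTVALIDATION` that have to be sent back with the form. One common approach is to read those hidden inputs from the first response, add the dropdown and ID values, and POST the combined data. A minimal sketch of building that payload; the field names `ddlYear` and `txtId` and the helper `build_form_payload` are hypothetical:

```python
from bs4 import BeautifulSoup


def build_form_payload(html, user_values):
    """Collect hidden form fields and merge in the values the user would enter."""
    soup = BeautifulSoup(html, "html.parser")
    payload = {
        field["name"]: field.get("value", "")
        for field in soup.find_all("input", attrs={"type": "hidden"})
        if field.get("name")
    }
    payload.update(user_values)
    return payload


# Hypothetical form markup
sample = """
<form>
  <input type="hidden" name="__VIEWSTATE" value="abc123">
  <input type="hidden" name="__EVENTVALIDATION" value="xyz">
  <select name="ddlYear"><option value="2020">2020</option></select>
  <input type="text" name="txtId">
</form>
"""
payload = build_form_payload(sample, {"ddlYear": "2020", "txtId": "42"})
```

The payload would then be sent with something like `session.post(url, data=payload)` on a `requests.Session`, and the response parsed with BeautifulSoup as usual. Some forms also expect the submit button's own name and value in the payload.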

Web scraping: duplicates when writing to CSV

python, web-scraping, beautifulsoup

I'm trying to scrape the URLs of all the posts on this website: I'm new to Python and web scraping. My code works, but it produces a lot of duplicates. What am I doing wrong? import requests from bs4 import BeautifulSoup import csv startURL = 'http://esencjablog.pl/' f = csv.writer(open('test.csv', 'a+', newline='')) f.writerow(['adres']) def parseLinks(url):

Views 0 | Asked 2018-04-09 | Votes 0

1 answer

Web scraping problem in Python: the HTML cannot be read?

python-3.x, web-scraping, beautifulsoup

I've been web scraping with Python for a while, and recently I ran into this problem: BeautifulSoup seems unable to read the HTML. For example, I'm trying to scrape this website. Here is my code from bs4 import BeautifulSoup import requests url_episode = 'https://www.thetvdb.com/series/initial-d/episodes/4889010' print(url_episode) getdetail_episode = requests.get(url_episode) soup = BeautifulSoup(getde

Views 1 | Asked 2020-04-06 | Votes 0

1 answer

Scraping track links from a YouTube playlist with Beautiful Soup

python, web-scraping, beautifulsoup

I'm trying to scrape all the song links from my playlist. Here is my code from selenium import webdriver from time import sleep from bs4 import BeautifulSoup from urllib.request import urlopen import re playlist = 'minimal_house' url = 'https://www.youtube.com/channel/UCt2GxiTBN_RiE-cbP0cmk5Q/playlists' html = urlopen(url)

Views 1 | Asked 2020-05-29 | Votes 1

Answer accepted

1 answer

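How to scrape images from Google News?

python, django, web-scraping, beautifulsoup

I'm still somewhat new to web scraping. I'm trying to scrape article images from a Google page and display them in a Django template. I've been following a Towards Data Science tutorial, and I'm just trying to get the img data from each article div, only to check whether I can extract the data. The format should look like this: However, my code currently returns an empty dictionary, which tells me I'm not targeting the images correctly. Any suggestions from those with more experience are welcome. from django.shortcuts import render, HttpResponse, redirect from django.contrib import messages from .models impor

Views 2 | Asked 2021-07-15 | Votes 0

Answer accepted

2 answers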

Scraping a table from the web in Python returns an empty table

python

I need to scrape a table from a website using the BeautifulSoup library in Python. The URL is https://www.nytimes.com/interactive/2021/world/covid-vaccinations-tracker.html When I run this code, I get an empty table: import requests from bs4 import BeautifulSoup # vaacineProgressResponse = requests.get("https://www.nytimes.com/interactive/2021/world/covid-vacc

Views 0 | Asked 2021-04-18 | Votes 3

Answer accepted

1 answer

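Extracting text from HTML with Python

python, beautifulsoup, isodate

Hopefully someone can help me. I'm fairly new to Python, but I want to scrape data from a site that, unfortunately, requires an account. However, I'm unable to extract the date (i.e. 2017-06-01). <li class="latest-value-item"> <div class="latest-value-label">Date</div> <div class="latest-value">2017-06-01</div> </li> <li class="latest-value-item

Views 2 | Asked 2017-06-02 | Votes 2

Answer accepted

3 answers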
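Given the markup quoted in the question, one approach is to walk the `latest-value-item` entries, find the one whose label reads "Date", and take its `latest-value` sibling. A minimal sketch; the first list item is copied from the question, while the second item and the helper `value_for_label` are hypothetical, added because the original snippet is cut off:

```python
from bs4 import BeautifulSoup

html = """
<li class="latest-value-item">
  <div class="latest-value-label">Date</div>
  <div class="latest-value">2017-06-01</div>
</li>
<li class="latest-value-item">
  <div class="latest-value-label">Price</div>
  <div class="latest-value">42</div>
</li>
"""
soup = BeautifulSoup(html, "html.parser")


def value_for_label(soup, label):
    """Return the text of the value div that belongs to the given label."""
    for item in soup.find_all("li", class_="latest-value-item"):
        name = item.find("div", class_="latest-value-label")
        value = item.find("div", class_="latest-value")
        if name and value and name.get_text(strip=True) == label:
            return value.get_text(strip=True)
    return None


date_text = value_for_label(soup, "Date")
```

Because the site requires an account, the HTML has to be fetched with an authenticated session first; the parsing step is the same either way.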

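Getting the first 100 job results from Indeed with BeautifulSoup in Python

python, web-scraping, beautifulsoup

I'm new to web scraping in Python. I want to scrape the first 100 job results from Indeed, but I can only scrape the first page of results, i.e. the first 10. I'm using the BeautifulSoup framework. Here is my code; can anyone help me solve this? import urllib2 from bs4 import BeautifulSoup import json URL = "https://www.indeed.co.in/jobs?q=software+developer&l=Bengaluru%2C+Karnataka" soup = BeautifulSoup(urllib2.urlopen(URL).read(

Views 7 | Asked 2019-03-11 | Votes 1

Answer accepted

1 answer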
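Paginated result lists are usually scraped by generating one URL per page and parsing each in turn. A minimal sketch of the URL generation, assuming the site pages its results through a `start` offset in steps of 10; that parameter name and step are assumptions about the site, not something the excerpt confirms, and `result_page_urls` is a hypothetical helper:

```python
from urllib.parse import urlencode

BASE = "https://www.indeed.co.in/jobs"


def result_page_urls(query, location, total=100, page_size=10):
    """Build one search URL per result page (assumed 'start' offset parameter)."""
    urls = []
    for start in range(0, total, page_size):
        params = {"q": query, "l": location, "start": start}
        urls.append(BASE + "?" + urlencode(params))
    return urls


urls = result_page_urls("software developer", "Bengaluru, Karnataka")
```

Each URL would then be fetched and parsed exactly like the first page, with the per-page results appended to one list.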

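Python: getting a JavaScript file from an href tag in HTML

javascript, python, html, web, web-scraping

Consider a website like the following: As you can see, the site contains links to PDF files that are referenced by href tags in the page source, for example: <a href="javascript:$('form_cofo_pdf_view_B000114563.PDF').submit();">B000114563.PDF</a> I want to open the underlying file with Python, effectively scraping the results. req = urllib2.Request("link.com") page = urllib2.urlopen(req) soup = BeautifulSoup(page) link

Views 6 | Asked 2016-09-09 | Votes 1

2 answers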

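Python Qt4: refreshing and scraping again

python, qt4

I've written a small piece of Python code to scrape a table from a web page. It uses Qt4 for the scraping. Now the problem is that I need to scrape the data every 5 minutes. I'm thinking of refreshing the page and scraping again. How do I refresh the page every 5 minutes and scrape again? Below is the code I use for scraping. import sys from BeautifulSoup import BeautifulSoup from PyQt4.QtGui import * from PyQt4.QtCore import * from PyQt4.QtWebKit import * from lxml import html import redis from time import sl

Views 2 | Asked 2016-11-23 | Votes 0

1 answer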

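Web scraping a table from a website

python, web-scraping, beautifulsoup

Hi, I'm trying to scrape and parse all the table data from a web page. So I wrote the code below, but it doesn't show any data. I've read through the answers to related questions but couldn't figure out the problem. from BeautifulSoup import BeautifulSoup from urllib2 import urlopen import re url='https://html5test.com/' data=urlopen(url) parse=BeautifulSoup(data).findAll('div', attrs={'class': 'resultsTabl

Views 2 | Asked 2017-05-17 | Votes 0

3 answers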

Python BeautifulSoup: scraping tables

python, html, web-scraping, beautifulsoup, html-parsing

I'm trying to build a table scraper with BeautifulSoup. I wrote this Python code: import urllib2 from bs4 import BeautifulSoup url = "http://dofollow.netsons.org/table1.htm" # change to whatever your url is page = urllib2.urlopen(url).read() soup = BeautifulSoup(page) for i in soup.find_all('form'): print i.attrs[

Views 0 | Asked 2013-09-24 | Votes 27

Answer accepted

5 answers

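Python/BeautifulSoup: scraping data from a web page

python, beautifulsoup

I'm a beginner at Python programming and I'm trying to learn how to scrape web pages. What I want to do is scrape data from this page. I'm trying to scrape the ISSUE DATE from the page above (if you open the page, you can see ISSUE DATE). I'm having some trouble with it. This is the code I wrote for it. import BeautifulSoup import urllib2 url = "http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&

Views 0 | Asked 2012-04-10 | Votes 0

2 answers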

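Web scraping HTML: including the tags inside all links

python, regex, hyperlink, beautifulsoup

I'm using BeautifulSoup with Python 3.5, and I'm trying to scrape a website for all h-tags (so all h1, h2, and so on). My problem is getting the program to open the other links on the site and scrape their tags too. So, suppose I have a website with a navigation menu containing some links that run through the whole site, and all of those pages contain some kind of h-tag. How would I scrape everything on the site I choose? Here is the code I've used so far to scrape the h1 tags at a specific URL: import requests from bs4 import BeautifulSoup url = "http://dsv.su.se/en/research" r = reques

Views 0 | Asked 2016-04-19 | Votes 2

1 answer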
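The task splits into two steps per page: collect every heading, and collect the same-site links to visit next. `find_all` accepts a compiled regular expression for the tag name, and `urljoin` resolves relative links. A minimal sketch of the per-page step; the helper `headings_and_links` and the sample markup are hypothetical, and the crawl loop around it (a queue of URLs plus a set of visited ones) is left out:

```python
import re
from urllib.parse import urljoin, urlsplit

from bs4 import BeautifulSoup

HEADING = re.compile(r"^h[1-6]$")


def headings_and_links(html, page_url):
    """Return (headings, same-site links) found on one page."""
    soup = BeautifulSoup(html, "html.parser")
    headings = [(h.name, h.get_text(strip=True)) for h in soup.find_all(HEADING)]
    host = urlsplit(page_url).netloc
    links = []
    for a in soup.find_all("a", href=True):
        absolute = urljoin(page_url, a["href"])
        # Stay on the same host and skip links already collected
        if urlsplit(absolute).netloc == host and absolute not in links:
            links.append(absolute)
    return headings, links


sample = """
<h1>Research</h1><h2>Areas</h2>
<a href="/en/education">Education</a>
<a href="http://other.example/x">External</a>
<a href="/en/education">Education again</a>
"""
headings, links = headings_and_links(sample, "http://dsv.su.se/en/research")
```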

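Downloading images from encrypted-tbn0.gstatic.com with Python

python

What I'm doing is scraping images from Google. My script has the image links, but they are in this format. I opened one and it is the image, but I can't download it with urllib.urlretrieve(image link, image). Does anyone know another way to download it? I'm using Python 2.7 import requests from bs4 import BeautifulSoup import urllib def run(): palabra ='pez' response = requests.get('https://www.google.com/search?q={}&hl=es&sxsrf=ALeK

Views 0 | Asked 2020-08-03 | Votes 1

3 answers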

Web scraping: scraping multiple pages with Python

python, web-scraping, beautifulsoup

from bs4 import BeautifulSoup import requests url = 'https://uk.trustpilot.com/review/thread.com' for pg in range(1, 10): pg = url + '?page=' + str(pg) soup = BeautifulSoup(page.content, 'lxml') for paragraph in soup.find_all('p'): print(paragraph.text) I want

Views 1 | Asked 2019-01-13 | Votes 3

Answer accepted

1 answer
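In the excerpt above, the loop builds each page URL in `pg` but never requests it, and `page` is not defined anywhere in the visible code. A minimal sketch of the same loop with the request made inside it; the download is passed in as a `fetch` callable (a hypothetical parameter) so the logic can be exercised without network access, and `html.parser` is used here in place of `lxml` only to avoid the extra dependency:

```python
from bs4 import BeautifulSoup


def scrape_paragraphs(base_url, pages, fetch):
    """Fetch each numbered page and collect the text of its <p> tags.

    fetch is any callable that takes a URL and returns the page HTML.
    """
    texts = []
    for number in pages:
        page_url = f"{base_url}?page={number}"
        html = fetch(page_url)  # the request happens once per page
        soup = BeautifulSoup(html, "html.parser")
        texts.extend(p.get_text(strip=True) for p in soup.find_all("p"))
    return texts
```

With requests, a call could look like `scrape_paragraphs(url, range(1, 10), lambda u: requests.get(u).text)`.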

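Scraping a table that appears on click with Python

python, html, selenium, beautifulsoup, scrape

I want to scrape information from this page. Specifically, I want to scrape the table that appears when you click "View all" under "Top 10 Holdings" (you have to scroll down the page a little). I'm new to web scraping and tried to do it with BeautifulSoup. However, there seems to be a problem because I need to account for the "onclick" function. In other words, the HTML I scrape directly from the page doesn't include the table I want. I'm a bit confused about my next step: should I use something like Selenium, or is there a simpler or more efficient way to handle this? Thanks. My current code is: from bs4 import BeautifulSoup import requests Soup = BeautifulSoup

Views 1 | Asked 2017-09-10 | Votes 1

Answer accepted

2 answers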

Unable to scrape links from DuckDuckGo search results

python, html, web-scraping, beautifulsoup

I want to scrape the first link from the DuckDuckGo search results. I wrote the following code: import requests from bs4 import BeautifulSoup class Bse: def currentPrice(self,symbol): headers = { "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:84.0) Gecko/20100101 Firefox/84.0" }

Views 6 | Asked 2021-04-02 | Votes 0

1 answer

soup.findAll returns an empty array

python, arrays, web-scraping, beautifulsoup, findall

Beautiful Soup's findAll function returns an empty array. I know this happens when the content can't be found, but there is content matching my search criteria, so I'm not sure what's wrong. The code is below: # Import libraries import requests import urllib.request import lxml import html5lib import time from bs4 import BeautifulSoup # Set the URL you want to webscrape from url = 'https://tokcount.com/?user=mrsam993' # C

Views 25 | Asked 2021-08-03 | Votes 0

1 answer

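Using a regular expression for class with Beautiful Soup

python, html, regex, beautifulsoup, html-parsing

I'm using BeautifulSoup to make scraping easier. I know there are more than 5 divs on the page that I want to scrape. Their names differ, but there is a pattern. The divs are: divnewthing divnew divnewstring and so on, so the pattern is the regular expression divnew*. At the moment I'm using: soup.find('div', {"class": "divnew"}) I'd like to use a regular expression somehow. Can anyone help?

Views 3 | Asked 2015-06-23 | Votes 1

Answer accepted

1 answer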
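BeautifulSoup accepts a compiled regular expression wherever it accepts a class string, so the `divnew` prefix can be matched directly. A minimal sketch using the three class names from the question; the surrounding markup and the fourth div are made up:

```python
import re

from bs4 import BeautifulSoup

# Class names from the question, wrapped in hypothetical markup
html = """
<div class="divnewthing">a</div>
<div class="divnew">b</div>
<div class="divnewstring">c</div>
<div class="other">d</div>
"""
soup = BeautifulSoup(html, "html.parser")

# ^divnew matches any class that starts with "divnew"
matches = soup.find_all("div", class_=re.compile(r"^divnew"))
texts = [d.get_text() for d in matches]
```

Note that the pattern `divnew*` from the question means "divne followed by zero or more w"; a prefix match is written `^divnew`.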

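How do I locate an element in Selenium when it has multiple class names?

python, html, selenium, web-scraping, beautifulsoup

I'm scraping data from a website where a lot of text is hidden under "See more" tabs. With Selenium I click all such buttons, then scrape with BeautifulSoup. However, some buttons have extra spaces in their HTML markup. Copying and pasting them into browser.find_element_by_class_name('') always produces an error. class="pv-profile-section__see-more-inline pv-profile-section__text-truncate-toggle artdeco-button artdeco-button--tertiary artdec

Views 7 | Asked 2021-12-14 | Votes 0

Answer accepted

2 answers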
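A class-name locator expects a single class, so a space-separated `class` attribute pasted into it fails. One common workaround is a CSS selector in which each class is prefixed with a dot. A minimal sketch of that conversion; `classes_to_css` is a hypothetical helper, and it assumes the class names contain no characters that would need CSS escaping:

```python
def classes_to_css(class_attr, tag=""):
    """Turn a space-separated class attribute into a CSS selector.

    "a  b c" with tag "button" becomes "button.a.b.c"; repeated or
    stray whitespace in the attribute is ignored.
    """
    names = class_attr.split()
    return tag + "".join("." + name for name in names)


selector = classes_to_css(
    "pv-profile-section__see-more-inline  artdeco-button ", tag="button"
)
```

The selector can then be used with Selenium's CSS locator, for example `driver.find_elements(By.CSS_SELECTOR, selector)` with `By` imported from `selenium.webdriver.common.by`. Matching on just one or two of the most specific classes is often enough.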

BeautifulSoup does not extract all forms from a web page

python, html, forms, web-scraping, beautifulsoup

I'd like to extract all forms from a given website using Python 3 and BeautifulSoup. Below is an example that does this but fails to extract all the forms: from urllib.request import urlopen from bs4 import BeautifulSoup url = 'https://www.qantas.com/au/en.html' data = urlopen(url) parser = BeautifulSoup(data, 'html.parser') forms = parser.find_all('form') for f

Views 0 | Asked 2017-03-27 | Votes 2

2 answers

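BeautifulSoup parser does not split correctly by tag

python, python-2.7, parsing, web-scraping, beautifulsoup

I'm scraping a website and then trying to split it into paragraphs. Looking at the scraped text, I can see very clearly that some paragraph separators are not being split correctly. See the code below to reproduce the problem! from bs4 import BeautifulSoup import requests link = "http://www.presidency.ucsb.edu/ws/index.php?pid=111395" response = requests.get(link) soup = BeautifulSoup(response.content, 'html.parser') paras = soup.fin

Views 14 | Asked 2016-07-24 | Votes 1

Answer accepted

1 answer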

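Newspaper3k shortcomings: how to scrape only the article HTML? Python

python, html, python-3.x, web-scraping, python-newspaper

Hello, and thanks for your help. I've been using Python and Newspaper3k to scrape websites, but I've noticed that some functions are... well... not working. In particular, I can only scrape the article HTML for about 1/10 of sites, or even fewer. Here is my code: from newspaper import Article url = pageurl.com article = Article(url, keep_article_html = True, language ='en') article.download() article.parse() print(article.title + "\n" +

Views 61 | Asked 2020-07-17 | Votes 1

Answer accepted

1 answer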

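Scraping video descriptions with BeautifulSoup

python, web-scraping, beautifulsoup, youtube

I'm trying to scrape the links in a video description on YouTube, but the list always comes back empty. I've tried changing the tag I scrape from, but neither the output nor the error message changes. Here is the code I use: from bs4 import BeautifulSoup import requests source = requests.get('https://www.youtube.com/watch?v=gqUqGaXipe8').text soup = BeautifulSoup(source, 'lxml') link = [i['href'] for i in soup.findAll

Views 12 | Asked 2021-09-08 | Votes 0

Answer accepted

2 answers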

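Beginner Python scraping

python

I'm a beginner in Python. I'm working on a web-scraping project in which I want to look up the meaning and part of speech of some words in the Cambridge Dictionary and export them to Excel. Here is my code: pip install bs4 pip install requests from bs4 import BeautifulSoup import requests headers = {"User-Agent" : "xxxxxxx"} r=requests.get('https://dictionary.cambridge.org/dictionary/english/happy', headers

Views 7 | Asked 2022-02-07 | Votes -1

2 answers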

Python 3.5.2 web scraping: list index out of range

python, web-scraping, beautifulsoup

I'm new to web scraping and am trying to scrape the full content of the restaurant details so that I can continue with further scraping. import requests from bs4 import BeautifulSoup import urllib url = "https://www.foodpanda.in/restaurants" r=requests.get(url) soup=BeautifulSoup(r.content,"html.parser") print(soup.find_all("Section",class_="js-infscroll-load-m

Views 5 | Asked 2016-09-21 | Votes 0

2 answers

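Scraping an AJAX e-commerce site with Python

python, ajax, web, beautifulsoup, screen-scraping

I'm having trouble scraping an e-commerce website with BeautifulSoup. I did some Googling but still can't solve the problem. See the images: Chrome F12: Result: Here is the website I'm trying to scrape: "" Problem: when I open Inspect Element (F12) in Google Chrome, I can see the product names, prices, and so on. But when I run my Python program, I don't get the same code and tags in the Python output. After Googling, I found that this website uses AJAX requests to fetch its data. Can anyone help me get this product data by scraping the AJAX site? I want to display the data in a table. My code: import requests f

Views 0 | Asked 2019-01-28 | Votes 2

Answer accepted

1 answer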

How to scrape data from a list of URLs scraped with Python?

python, web-scraping, beautifulsoup, orange

I'm trying to use BeautifulSoup4 in Orange to scrape data from a list of URLs scraped from the same website. I've successfully scraped data from a single page when I set the URL manually. from urllib.request import urlopen from bs4 import BeautifulSoup import requests import csv import re url = "https://data.ushja.org/awards-standings/zone-points.aspx?year=2021&zone=1&section=1901" req =

Views 21 | Asked 2021-07-23 | Votes 1

Answer accepted