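BeautifulSoup not getting web data | Scraping web data with BeautifulSoup | BeautifulSoup returns no data - Tencent Cloud Developer Community

I'm a beginner. I would like to know how to scrape YouTube comments from the web with BeautifulSoup. I'm stuck here. Can someone help me with the code? Here is what I have written: import requests from bs4 import BeautifulSoup r = requests.get("https://www.youtube.com/watch?v=kffacxfA7G4" req =r.conten soup = BeautifulSoup(req,'html.parser') print(soup.prettify()) all

Views 1 · Asked 2018-04-18 · Votes 3

Answer accepted

2 answers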

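Selenium Python: page content is empty after the page updates

I am using Selenium with Python and BeautifulSoup to scrape data. I need the site's HTML after the 'Life' button has been clicked. I can find and click the button, but the new HTML is not returned to me. I thought the HTML was being returned too soon after the click, so I added a sleep. Even so, it only returns the empty div with class 'Collapsible__contentInner'. from bs4 import BeautifulSoup from selenium import webdriver from selenium.webdriver.common.by import By from selenium.webdriver.s

Views 0 · Asked 2020-08-24 · Votes 0

1 answer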

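Printing the HTML of a Selenium WebElement in Python

I am working on a web scraping project using Selenium WebDriver in Python. How do I print the HTML of a selenium.WebElement? I plan to parse that HTML with BeautifulSoup to extract the data I am interested in. Thanks

Views 19 · Asked 2011-11-30 · Votes 5

Answer accepted

1 answer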

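BeautifulSoup website scraping - HTML parsing

I am trying to scrape data from a website with beautifulsoup4 and retrieve only the information between the HTML tags so I can put it into an Excel document. Currently I can only get the page's entire HTML. import sys import urllib3 import xlsxwriter import lxml page = requests.get('genericurlhere.com') soup = BeautifulSoup(page.text, 'html.parser') f = csv.writer(open('web_scrape.csv', 'w'))

Views 0 · Asked 2018-10-20 · Votes 0

Answer accepted

2 answers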
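A minimal sketch of keeping only the text between tags, assuming the data sits in `<p>` elements (the tag name and sample markup are placeholders). `get_text()` drops the markup, and the standard `csv` module writes one value per row. The question's snippet also uses `requests`, `BeautifulSoup` and `csv` without importing them.

```python
import csv
import io

from bs4 import BeautifulSoup


def texts_between_tags(html, tag_name="p"):
    """Return the text inside every <tag_name> element, with the markup stripped."""
    soup = BeautifulSoup(html, "html.parser")
    # get_text(" ", strip=True) trims each nested string and joins them with a space
    return [tag.get_text(" ", strip=True) for tag in soup.find_all(tag_name)]


def write_rows(texts, stream):
    """Write one text value per CSV row to an already-open text stream."""
    writer = csv.writer(stream)
    for text in texts:
        writer.writerow([text])


sample_html = "<html><body><p>First <b>item</b></p><p>Second item</p></body></html>"
texts = texts_between_tags(sample_html)

buffer = io.StringIO()  # for a real file: open("web_scrape.csv", "w", newline="")
write_rows(texts, buffer)
```

The resulting CSV opens directly in Excel. `xlsxwriter`, which the question already imports, can write a native .xlsx file instead.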

How do I convert scraped multi-line content into a list?

I am trying to convert the scraped content into a list so I can manipulate the data, but I get the following error: TypeError: 'NoneType' object is not callable #! /usr/bin/python from urllib import urlopen from BeautifulSoup import BeautifulSoup import os import re # Copy all of the content from the provided web page webpage = urlopen("http://www.optionstrategist.com/calculators/free-volat

Views 2 · Asked 2013-01-20 · Votes 1

1 answer
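Two points may help here. First, `str.splitlines()` turns a block of scraped multi-line text into a list. Second, BeautifulSoup treats an unknown attribute on a tag as a search for a child tag with that name. A misspelled or non-existent method (for example `soup.findall`) therefore evaluates to `None`, and calling it raises `TypeError: 'NoneType' object is not callable`. The sketch uses bs4 and made-up sample text. The question uses the older BeautifulSoup 3 package, where the same lookup rule applies to method names it does not define, such as `find_all`.

```python
from bs4 import BeautifulSoup


def lines_to_list(text):
    """Split scraped multi-line text into a list of stripped, non-empty lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


sample_html = "<pre>\nline one\nline two\n\nline three\n</pre>"
soup = BeautifulSoup(sample_html, "html.parser")

# find_all is a real method, so this returns a list of matching tags.
pre_tags = soup.find_all("pre")
rows = lines_to_list(pre_tags[0].get_text())

# A misspelled method name is treated as a child-tag lookup:
# soup.findall means soup.find("findall"), which finds nothing and gives None.
# Calling it, as in soup.findall("pre"), raises the TypeError from the question.
misspelled = soup.findall
```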

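If we can use Selenium, why do we still need a parser like BeautifulSoup?

I am currently using Selenium to scrape data from some websites. Unlike with urllib, I don't seem to need a parser such as BeautifulSoup to parse the HTML: I can simply find an element with Selenium and use WebElement.text to get the data I need. I have seen some people use both Selenium and BeautifulSoup for web crawling. Is that really necessary? What special features does bs4 offer that would improve the crawling process? Thanks.

Views 3 · Asked 2017-04-02 · Votes 8

Answer accepted

3 answers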

Getting meta tag content by name with Beautiful Soup and Python

I am trying to get the metadata from this website (here is the code). import requests from bs4 import BeautifulSoup source = requests.get('https://www.svpboston.com/').text soup = BeautifulSoup(source, features="html.parser") title = soup.find("meta", name="description") image = soup.find("meta", name=

Views 5 · Asked 2021-03-08 · Votes 0

Answer accepted

1 answer
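The first parameter of `find()` is already called `name` (it means the tag name). Because of that, `soup.find("meta", name="description")` cannot filter on the HTML `name` attribute, and Python raises a `TypeError` about multiple values for `name`. Passing the attribute through the `attrs` dictionary avoids the clash. This is a minimal sketch on inline sample HTML. The `og:image` property is an assumed example of a second meta tag, not taken from the site.

```python
from bs4 import BeautifulSoup

sample_html = """
<html><head>
  <meta name="description" content="Example site description">
  <meta property="og:image" content="https://example.com/logo.png">
</head><body></body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Pass the HTML "name" attribute via attrs, because find()'s own first
# parameter is also called "name" (the tag name).
description_tag = soup.find("meta", attrs={"name": "description"})
image_tag = soup.find("meta", attrs={"property": "og:image"})

# .get() returns None instead of raising if the attribute is missing.
description = description_tag.get("content") if description_tag else None
image = image_tag.get("content") if image_tag else None
```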

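Scraping the NSE 52-week high table with Python

I am fairly new to web scraping. I tried to collect the data with Selenium and Beautiful Soup, but I couldn't. Can someone help me get the table data from the link below, or download the CSV file with Python? """ print("Start") from nsetools import Nse import pandas as pd import requests from urllib.request import urlopen from bs4 import BeautifulSoup import time import urllib.request nse_web = "https://www.nseindia.c

Views 3 · Asked 2022-10-06 · Votes 0

1 answer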

Why does my scraper return an empty list when I try to scrape https://www.airtel.in/myplan-infinity/?

import requests from bs4 import BeautifulSoup url = "https://www.airtel.in/recharge-online?icid=header_new" source = requests.get(url) Soup = BeautifulSoup(source.text, "html.parser") info = Soup.find_all(class_="right-content") print(info)

Views 1 · Asked 2022-09-21 · Votes 0

1 answer

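Scraping data from URLs: how to retrieve every page when some IDs are missing and the last page ID is unknown

I want to extract data from a set of web pages. Here is an example of the URL: My problem is that the "id=" number in the URL changes from page to page. I want to loop through and retrieve every page in the database. Some IDs will be missing (for example, there may be pages with id=3 and id=6 but none with id=4 or id=5). I also don't know the final ID (the last page in the database could be id=100000 or id=1000000000). I know the code I need will somehow build a list of numbers and then loop over them with the code below, extracting the text of each page (parsing the text itself is a job for another day): import urllib2 from bs4 import BeautifulSoup

Views 1 · Asked 2018-02-05 · Votes 0

Answer accepted

2 answers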
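One common approach is to walk the IDs upward, skip IDs that return nothing, and stop after a long run of consecutive missing IDs. This sketch rests on several assumptions. It is written for Python 3 (`urllib2.urlopen` became `urllib.request.urlopen`), and it treats a missing page as an HTTP 404. `fetch_page`, `scan_ids`, the URL template and the `max_misses` threshold are hypothetical names or values to tune. A real gap longer than `max_misses` would end the scan early. The fetch function is passed in so the loop can be tried offline.

```python
import urllib.error
import urllib.request


def fetch_page(url, timeout=10):
    """Hypothetical fetcher: return the page text, or None if the page is missing (HTTP 404)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            return None
        raise


def scan_ids(fetch, url_template, start=1, max_misses=100):
    """Yield (id, page_text) for each existing page, stopping after
    max_misses consecutive missing IDs."""
    misses = 0
    page_id = start
    while misses < max_misses:
        text = fetch(url_template.format(page_id))
        if text is None:
            misses += 1
        else:
            misses = 0  # a hit resets the run of consecutive misses
            yield page_id, text
        page_id += 1
```

Usage with the real site would look like `for page_id, html in scan_ids(fetch_page, "https://example.com/page?id={}"):`, where the URL is a placeholder. Each `html` can then be passed to BeautifulSoup.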

A suitable JavaScript parser to use with urlopen

I am trying the following: from urllib2 import urlopen from BeautifulSoup import BeautifulSoup url = 'http://search.wcad.org/Property-Detail?PropertyQuickRefID=R000017&PartyQuickRefID=O0532572' soup = BeautifulSoup(urlopen(url).read()) print soup The print statement shows a very complex text structure, which makes it hard to extract variables. What is a better way to extract a variable such as Legal Description?

Views 0 · Asked 2017-05-24 · Votes 0

2 answers

Unclear error when using urllib2 and Beautiful Soup

This code block always ends up in the "except" branch, and my terminal shows no specific error. What am I doing wrong? Any help is appreciated! from bs4 import BeautifulSoup import csv import urllib2 # get page source and create a BeautifulSoup object based on it try: print("Fetching page.") page = urllib2.open("http://siph0n.net") soup = BeautifulSo

Views 0 · Asked 2016-07-28 · Votes 0

1 answer
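`urllib2` has no `open()` function; the function is `urlopen()`. The `try` block therefore most likely fails with an `AttributeError`, and the bare `except` swallows it. Printing the caught exception makes such errors visible. This is a minimal Python 3 sketch, where `urllib2.urlopen` became `urllib.request.urlopen`. The helper name `fetch_text` is illustrative.

```python
import urllib.request


def fetch_text(url, timeout=10):
    """Fetch a page and return its decoded body, reporting the real error instead of hiding it."""
    try:
        print("Fetching page.")
        # The function is urlopen(), not open()
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except Exception as exc:  # narrow this to specific exceptions once they are known
        print(f"Fetching {url!r} failed: {exc!r}")
        raise
```

The returned text can then be passed to `BeautifulSoup(text, "html.parser")` as before.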

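Can't access the values in a chart on a website with BeautifulSoup/Requests

I am trying to access specific data on the following website: https://koronawirusunas.pl/ I am looking for the last entry in the green chart/column, which is shown as 69513. The only place I found this number is line 2028 of the page source. That line contains all of the chart's entries, separated by date. I can access any other number in the body section, e.g. with: import requests from bs4 import BeautifulSoup source = requests.get('https://koronawirusunas.pl').text soup = BeautifulSoup(source, '

Views 12 · Asked 2021-01-27 · Votes 1

Answer accepted

2 answers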
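When a chart's values appear only inside a `<script>` block, BeautifulSoup sees them as one text string rather than as elements. One way around this is to locate that script, pull the array out with a regular expression, and parse it with `json`. This is a sketch under assumptions: the variable name `chartData`, the field names and the sample values are hypothetical, and the real page's script will look different.

```python
import json
import re

from bs4 import BeautifulSoup

sample_html = """
<html><body>
<script>
  var chartData = [{"date": "2021-01-25", "value": 100}, {"date": "2021-01-26", "value": 120}];
</script>
</body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Find the <script> whose text defines the (hypothetical) chartData variable.
script = soup.find("script", string=re.compile(r"chartData"))

# Capture the JSON array literal assigned to chartData and decode it.
match = re.search(r"chartData\s*=\s*(\[.*?\]);", script.string, re.DOTALL)
points = json.loads(match.group(1))
last_value = points[-1]["value"]
```

If the page's array is a JavaScript literal rather than valid JSON (single quotes, unquoted keys), `json.loads` will fail. In that case the regular expression has to capture the individual values instead.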

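Parsing link URLs with Beautiful Soup

I am using Beautiful Soup (BS4) and Python to scrape Yellow Pages data through the Wayback Machine/web archive. I can easily get the business name and phone number, but when I try to retrieve the business's website URL, I only get back the whole div tag. #Import Dependencies from splinter import Browser from bs4 import BeautifulSoup import requests import pandas as pd # Path to chromedriver !which chromedriver # Set the executable path and initialize the

Views 2 · Asked 2020-11-17 · Votes 0

Answer accepted

1 answer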
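Selecting the `<a>` tag inside the div and reading its `href` attribute returns just the URL instead of the whole div. This is a minimal sketch. The `result` and `website-link` class names, the sample markup and `page_url` are hypothetical placeholders for whatever the archived page actually uses. `urljoin` is included because archived pages may contain relative links.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

sample_html = """
<div class="result">
  <h2>Example Plumbing</h2>
  <a class="website-link" href="http://plumbing.example/">Website</a>
</div>
<div class="result">
  <h2>Example Bakery</h2>
  <a class="website-link" href="/web/2015/http://bakery.example/">Website</a>
</div>
"""

# Hypothetical URL of the archived page, used to resolve relative links.
page_url = "https://web.archive.org/"

soup = BeautifulSoup(sample_html, "html.parser")
websites = []
for result in soup.find_all("div", class_="result"):
    link = result.find("a", class_="website-link")
    # link["href"] is the URL string itself, not the surrounding markup.
    if link is not None and link.has_attr("href"):
        websites.append(urljoin(page_url, link["href"]))
```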

Can't display the content between span tags

Here is my code so far: import requests from bs4 import BeautifulSoup def web_crawler(max_pages): page = 1 while page <= max_pages: url = "https://www.kupindo.com/Knjige/artikli/1_strana_" + str(page) source_code = requests.get(url) plain_text = source_code.text

Views 4 · Asked 2017-02-25 · Votes 0

Answer accepted

1 answer

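Web scraping links in a table with BeautifulSoup returns NoneType and an empty table

I am trying to scrape all of the N-MFP2 forms on the page, then follow each link and scrape the information in the form. However, I still cannot retrieve the forms. I have tried several scraping methods, including BeautifulSoup and Selenium, but the returned content is empty and I cannot get to the row data. I would appreciate your help, as I have been working on this for more than three hours. My code is as follows: # Create an URL object url = 'https://www.sec.gov/edgar/browse/?CIK=843781' page = requests.get(url) soup = BeautifulSoup(page.content, 'h

Views 3 · Asked 2022-01-13 · Votes -3

1 answer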

Why isn't my scraper working? Python 3 - Requests, BeautifulSoup

I have been following along for a long time and built a web scraper similar to the one in the video. Language: Python import requests from bs4 import BeautifulSoup def spider(max_pages): page = 1 while page <= max_pages: url = 'https://www.aliexpress.com/category/7/computer-office.html?trafficChannel=main&catName=computer-office&CatId=7&

Views 8 · Asked 2021-05-01 · Votes 0

Answer accepted

1 answer

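BeautifulSoup - saving scraped data into rows and columns

I have just started web scraping with Python and am slowly making progress. I hope someone can help me. I want to scrape every aircraft in the Icelandic aircraft register. I wrote a script that extracts all of the data from the table and prints it to the screen, as follows: from bs4 import BeautifulSoup import requests import pandas as pd url = "https://www.icetra.is/aviation/aircraft/register/" page = requests.get(url) soup = BeautifulSoup(page.text, 'html.par

Views 5 · Asked 2022-02-25 · Votes 0

Answer accepted

1 answer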
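This sketch turns an HTML table into rows and columns. It collects the cell texts of each `<tr>` into a list, then writes the list of lists with `csv.writer`; the same list could also be passed to `pandas.DataFrame`. The table markup, column names and values are made up for illustration and are not taken from the Icelandic register.

```python
import csv
import io

from bs4 import BeautifulSoup

sample_html = """
<table>
  <tr><th>Registration</th><th>Type</th></tr>
  <tr><td>XX-001</td><td>Example 1</td></tr>
  <tr><td>XX-002</td><td>Example 2</td></tr>
</table>
"""


def table_to_rows(table):
    """Return a list of rows, each a list of the cell texts in that <tr>."""
    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all(["th", "td"])  # header and data cells, in document order
        rows.append([cell.get_text(strip=True) for cell in cells])
    return rows


soup = BeautifulSoup(sample_html, "html.parser")
rows = table_to_rows(soup.find("table"))

buffer = io.StringIO()  # for a real file: open("aircraft.csv", "w", newline="")
csv.writer(buffer).writerows(rows)
```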

Scraping an HTML table with BS4

I have been trying to extract the table data from https://www.binance.com/en/futures/funding-history/0. The code below extracts only part of the table (just the headers). import requests from bs4 import BeautifulSoup r = requests.get('https://www.binance.com/en/futures/funding-history/0') soup = BeautifulSoup(r.text, 'html.parser') resultsTable = soup.fin

Views 7 · Asked 2020-12-30 · Votes 0

Answer accepted

1 answer

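Converting HTML data to text - Python

I am using the Selenium driver to extract data points from LinkedIn profiles. In this case I want to extract each skill from the Skills section, but the data comes out as HTML. When I try to convert the HTML to text, I get the attached error message. from parsel import Selector from selenium import webdriver from selenium.webdriver.common.keys import Keys from bs4 import BeautifulSoup driver = webdriver.Chrome('/Users/davidcraven/

Views 2 · Asked 2019-04-28 · Votes 0

Answer accepted

4 answers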

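Why is the page source different between Selenium and BeautifulSoup?

As the title says, I am scraping data from a Vietnamese website. At first I used BeautifulSoup, but it did not return the data as it appears in the HTML source shown in Chrome; the numbers were hidden. When I switched to Selenium to get the HTML source, it returned the result I wanted, with all the numbers shown. The code is as follows, using bs4: import requests from bs4 import BeautifulSoup url = "https://webgia.com/lai-suat/" req = requests.get(url) soup = BeautifulSoup(req.text, "lxml")

Views 20 · Asked 2022-06-06 · Votes 0

1 answer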

BeautifulSoup does not find, findAll, or get a div by ID

I have been at this nonstop for the past two days. I am trying to use BeautifulSoup to get a specific div by ID, as follows: import requests from bs4 import BeautifulSoup r = requests.get('www.example.com', cookies=cookies_dict) soup = BeautifulSoup(r.content, 'html.parser') div_text = soup.get('div', {'id': 'this_div_id'}).text

Views 6 · Asked 2017-11-29 · Votes 0

1 answer
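In BeautifulSoup, `Tag.get()` reads an attribute of the tag it is called on, much like `dict.get`. So `soup.get('div', {...})` looks up an attribute literally named `div` on the document and returns the default value, a plain dict, which has no `.text`. Searching the tree is done with `find()`/`find_all()`. Separately, `requests.get('www.example.com')` fails because the URL has no scheme; it needs to be `https://www.example.com`. This is a minimal sketch on inline HTML.

```python
from bs4 import BeautifulSoup

sample_html = '<html><body><div id="this_div_id">Hello <b>there</b></div></body></html>'
soup = BeautifulSoup(sample_html, "html.parser")

# Wrong: get() reads an attribute of `soup` itself, so it just returns the default.
wrong = soup.get("div", {"id": "this_div_id"})

# Right: find() searches the tree; id= filters on the id attribute.
div = soup.find("div", id="this_div_id")
div_text = div.get_text() if div is not None else None
```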

Scraping a constantly changing integer from a website

I am trying to extract numeric data from a website. I tried to retrieve it with a simple web scraper: from mechanize import Browser from bs4 import BeautifulSoup mech = Browser() url = "http://www.oanda.com/currency/live-exchange-rates/" page = mech.open(url) html = page.read() soup = BeautifulSoup(html) data1 = soup.find(id='EUR_USD-b-in

Views 3 · Asked 2014-02-18 · Votes 1

Answer accepted

1 answer

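How to get table values from a website with Python

I am using a Python script to get specific text from a website (or from one of two others). I want to get all the result values, such as "Page Authority" and "Domain Authority", and filter them out. I am using Python 2.7 and BeautifulSoup to extract the data, with the following code: def parse_url() url = "https://beswick.net/api-code/state-of-digital-example.php" domain = 'http://www.google.com' mozID=" " mozSEC=" " def parse_url(): r =

Views 5 · Asked 2018-03-01 · Votes 0

1 answer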

BeautifulSoup doesn't return the Twitch.tv viewer count

I am trying to use Python to scrape the viewer counts on www.twitch.tv/directory. I have tried a basic BeautifulSoup script: url= 'https://www.twitch.tv/directory' html= urlopen(url) soup = BeautifulSoup(url, "html5lib") #also tried using html.parser, lxml soup.prettify() This gives me the HTML, but without the actual viewer numbers. Then I tried using param data. param = {"action": "ge

Views 0 · Asked 2018-10-06 · Votes 3

Answer accepted

1 answer

Python scraper: finding data in a column

I am building my first web scraper and am trying to get the number 41,110, which is stored in a column on the web page. My code is below. How can I find this number and print it? from bs4 import BeautifulSoup import requests web_page = 'https://mcassessor.maricopa.gov/mcs.php?q=14014003N' web_header = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML,

Views 0 · Asked 2018-01-11 · Votes 1

2 answers
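If the number sits in a table cell next to a label cell, one approach is to locate the label text, read the neighbouring cell, and strip the thousands separator before converting. The label "Assessed Value", the parcel number and the markup below are hypothetical, and the assessor page's real structure may differ. Only the value 41,110 comes from the question.

```python
from bs4 import BeautifulSoup

sample_html = """
<table>
  <tr><td>Parcel</td><td>123-45-678</td></tr>
  <tr><td>Assessed Value</td><td>41,110</td></tr>
</table>
"""


def value_for_label(soup, label):
    """Return the text of the cell right after the <td> whose text equals `label`."""
    label_cell = soup.find("td", string=label)
    if label_cell is None:
        return None
    value_cell = label_cell.find_next_sibling("td")
    return value_cell.get_text(strip=True) if value_cell else None


soup = BeautifulSoup(sample_html, "html.parser")
raw = value_for_label(soup, "Assessed Value")
# Remove the thousands separator before converting to int.
number = int(raw.replace(",", "")) if raw else None
```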

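Web scraping - content not shown in the page source

I am trying to scrape information from a website. All the data seems to be generated as repeating cards, but I cannot find it when I view the page source. I have tried a web driver such as Selenium, but I still cannot see the content I want to scrape. I want to extract all the repeated data for each entry. driver = webdriver.Chrome(ChromeDriverManager().install(), options=chrome_options) url = 'https://foreclosures.cabarruscounty.us/' driver.get(url) web_url = driver.page_sou

Views 1 · Asked 2020-08-04 · Votes 0

1 answer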

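Getting color codes when web scraping with Beautiful Soup

I am using Beautiful Soup in Python for web scraping. The text on the website shows names in a red font, and I need the color code. I am using the website's text as training data for NER (proper names only). How can I get the color code with Beautiful Soup? My code currently looks like this: from bs4 import BeautifulSoup import requests req = requests.get('https://www.islamweb.net/ar/library/index.php?page=bookcontents&idfrom=1&idto=272&bk_no=86&ID=2'

Views 2 · Asked 2021-08-07 · Votes 1

Answer accepted

1 answer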
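Color is normally carried in markup attributes, such as an inline `style="color: ..."` declaration or the legacy `<font color="...">` attribute, so it can be read like any other attribute. This sketch assumes one of those two forms. The page may instead use a CSS class, which would have to be looked up in its stylesheet, so check the page source first. The sample markup is made up.

```python
import re

from bs4 import BeautifulSoup

sample_html = """
<p>Narrated by <span style="color: #ff0000">Name A</span> from
<font color="red">Name B</font> and plain text.</p>
"""

# Matches "color: value" but not "background-color: value".
COLOR_IN_STYLE = re.compile(r"(?<![\w-])color\s*:\s*([^;]+)")


def colored_texts(soup):
    """Return (text, color) pairs for elements whose color is set inline."""
    found = []
    for tag in soup.find_all(style=COLOR_IN_STYLE):
        color = COLOR_IN_STYLE.search(tag["style"]).group(1).strip()
        found.append((tag.get_text(strip=True), color))
    for tag in soup.find_all("font", color=True):  # legacy <font color="...">
        found.append((tag.get_text(strip=True), tag["color"]))
    return found


soup = BeautifulSoup(sample_html, "html.parser")
pairs = colored_texts(soup)
```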

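Lambda function to create two different columns in a pandas DataFrame

I have a pandas DataFrame with HTML text fields, from which I want to derive two fields: a count of the tags, and clean text without any tags. I use BeautifulSoup for both. For example: df_ads['content_elements_cnt'] = df_ads['content'].apply(lambda x: dict(Counter([element.name for element in BeautifulSoup(x).html if element.name != None]))) df_ads['content_refined'] = df_ads['

Views 1 · Asked 2022-01-23 · Votes 0

Answer accepted

1 answer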

Web scraping with bs4: TypeError - 'NoneType' object is not subscriptable

I am trying to extract the "videos" from the URL and print how many there are in the console, but I get this error: TypeError: 'NoneType' object is not subscriptable Here is my code: import requests from bs4 import BeautifulSoup Web_url = "https://watch.plex.tv/show/hannibal/season/1/episode/9" r = requests.get(Web_url) soup = BeautifulSoup(r.content, 'html.parse

Views 3 · Asked 2022-09-30 · Votes -1

Answer accepted

1 answer
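`find()` returns `None` when nothing matches. Indexing that result, for example `soup.find(...)['href']`, then raises `TypeError: 'NoneType' object is not subscriptable`. Checking the result first, or using `find_all()` (which returns a possibly empty list), avoids the crash. If the page builds the content with JavaScript, it is not in `r.content` at all, and these checks will simply report that nothing was found. The sketch uses hypothetical markup and class names.

```python
from bs4 import BeautifulSoup

sample_html = """
<ul>
  <li class="video"><a href="/v/1">Clip 1</a></li>
  <li class="video"><a href="/v/2">Clip 2</a></li>
</ul>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# find() returns None when nothing matches; guard before indexing.
section = soup.find("section", id="videos")
section_id = section["id"] if section is not None else None

# find_all() returns a list (possibly empty), so counting is always safe.
videos = soup.find_all("li", class_="video")
video_count = len(videos)
hrefs = [li.a["href"] for li in videos if li.a is not None]
```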

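Downloading the complete HTML page with Selenium

I am learning web scraping with Python, Selenium and BeautifulSoup. I am currently trying to scrape the hot searches on Google Trends, and this is my current code. However, I realized that the complete HTML is not downloaded; I only get the content for the last few dates. What can I do to fix this? from selenium import webdriver from bs4 import BeautifulSoup googleURL = "http://www.google.com/trends/hottrends#pn=p5" browser = webdriver.Firefox() browser.get(googl

Views 1 · Asked 2013-05-17 · Votes 15

1 answer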

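Can a Python-based web scraper get the result of a JavaScript function?

I am trying to build a Python-based web scraper to get the price of gold from a website. When I run the code, it returns the span I am looking for, but the span is empty: <span id="oz_display"></span>. I inspected the site, and it seems to run some JavaScript to replace the value: jQuery("#oz_display").html("$ jQuery "$1"). How can I get this data? import re from bs4 import BeautifulSoup from urllib.request import urlopen m

Views 0 · Asked 2019-12-14 · Votes 0

Answer accepted

1 answer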

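Unable to scrape COVID-19 data from a web table

I am using Python to study the spread of COVID-19 in Kosovo. The problem is that scraping the table from the web returns an empty result. The page contains a table, and I need its records. I have tried many ways to extract the records without success; the latest code I used only gets the headers: import requests import pandas as pd from bs4 import BeautifulSoup link = 'https://corona-ks.info/?lang=en' # get web data req = requests.get(link) # parse web data soup = BeautifulSoup(req.co

Views 2 · Asked 2021-03-18 · Votes 1

Answer accepted

1 answer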

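Listing the names of Excel files at a URL

I am trying to do web scraping with Python and BeautifulSoup by following a tutorial, but I am stuck after a successful requests.get(url). I define the elements I want to extract (the names of the Excel files that appear on the website) by tag and class, where the class contains the string "file-id-..." (the "..." being the file's id). However, all I get are empty lists. My goal is to list all the Excel file names at this URL and then open them in a for loop, to extract specific monthly data from the National Labour Office, whose page keeps the same structure throughout the year. labour_office_web_text = requests.get("url").text soup

Views 2 · Asked 2021-09-16 · Votes 1

Answer accepted

1 answer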
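`class_="file-id-"` matches only a class value exactly equal to that string. To match any class that starts with or contains `file-id-`, there are two options. One is to pass a regular expression, which BeautifulSoup tests against each individual class value. The other is a CSS attribute selector with `select()`. This sketch uses made-up markup; the real tag names and link layout are assumptions.

```python
import re

from bs4 import BeautifulSoup

sample_html = """
<div class="downloads">
  <a class="file file-id-101" href="/files/report-january.xlsx">report-january.xlsx</a>
  <a class="file file-id-102" href="/files/report-february.xlsx">report-february.xlsx</a>
  <a class="file file-id-103" href="/files/notes.pdf">notes.pdf</a>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# A regex is tested against each class value, so "file-id-101" matches.
links = soup.find_all("a", class_=re.compile(r"^file-id-"))

# Equivalent CSS selector: class attribute containing the substring.
same_links = soup.select('a[class*="file-id-"]')

# Keep only the Excel files, judged by the link's file extension.
excel_names = [a.get_text(strip=True) for a in links
               if a.get("href", "").lower().endswith((".xls", ".xlsx"))]
```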

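Scraping a table from a website

Hi, I am trying to scrape and parse all the table data from a web page, so I wrote the code below, but it doesn't show any data. I went through the existing questions and answers but couldn't figure out the problem. from BeautifulSoup import BeautifulSoup from urllib2 import urlopen import re url='https://html5test.com/' data=urlopen(url) parse=BeautifulSoup(data).findAll('div', attrs={'class': 'resultsTabl

Views 2 · Asked 2017-05-17 · Votes 0

3 answers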

How to pass a web element to BeautifulSoup

I got the web elements like this: elements = browser.find_elements_by_xpath("//*[contains(text(), 'Open Until')]") Now I need to pass this element to soup to find the next element and the previous sibling. I am trying this: soup = BeautifulSoup(elements,'html.parser') What should I write instead? ??? soup = BeautifulSoup(elements.source,'html.parser') ??? Please advise.

Views 0 · Asked 2018-11-19 · Votes 1

1 answer
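`BeautifulSoup()` needs markup (a string), not Selenium `WebElement` objects, and `find_elements_by_xpath` returns a list of them. A `WebElement` exposes its own markup through `get_attribute("outerHTML")`, which can then be parsed. Each element's HTML is parsed in isolation, though, so it has no siblings. To navigate siblings, parsing the whole `driver.page_source` is usually simpler. In newer Selenium releases the lookup is written `driver.find_elements(By.XPATH, ...)`. The sketch uses a purely illustrative stand-in class, `FakeElement`, instead of a real `WebElement`, so that it runs without a browser.

```python
from bs4 import BeautifulSoup


def elements_to_soups(elements):
    """Parse each Selenium-style element's outerHTML into its own BeautifulSoup tree."""
    return [BeautifulSoup(el.get_attribute("outerHTML"), "html.parser") for el in elements]


class FakeElement:
    """Illustrative stand-in for selenium's WebElement; only get_attribute is mimicked."""

    def __init__(self, outer_html):
        self._outer_html = outer_html

    def get_attribute(self, name):
        return self._outer_html if name == "outerHTML" else None


elements = [FakeElement("<p>Open Until 9 PM</p>"), FakeElement("<p>Open Until 5 PM</p>")]
soups = elements_to_soups(elements)
texts = [s.p.get_text() for s in soups]
```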

Python PhantomJS does not load the web page correctly

I have a problem: scraping this link gives me the data from another link, which is the home page itself. Any idea why this happens? I am using PhantomJS with Selenium and Beautiful Soup. # The standard library modules import os import sys import re import sqlite3 import locale # The wget module import wget import time import calendar from datetime import datetime # The BeautifulSoup module from bs4 import Be

Views 1 · Asked 2017-07-07 · Votes 0

Answer accepted

1 answer

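Handling an infinite-scroll UI in BeautifulSoup

I am working out how to scrape the LinkedIn page https://www.linkedin.com/mynetwork/invite-connect/connections/, but the infinite scroll seems to make it impossible. How can I deal with it? I don't want to use Selenium, because later I want to turn this into a web service. import bs4 from bs4 import BeautifulSoup import requests def scraping(webpage): headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X

Views 27 · Asked 2020-01-19 · Votes 1

Answer accepted

1 answer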

404: "GET /search?q=books HTTP/1.1" 404 -

I am new to web scraping and building APIs, and I ran into an error while scraping an e-commerce website. My Python code is below; please guide me through it. When I run it on localhost, I get "The requested URL was not found on the server." from flask import Flask , request , jsonify from bs4 import BeautifulSoup import requests app = Flask(__name__) @app.route('/',methods=['GET']) def API(): if request.method == 'GET'

Views 49 · Asked 2020-07-14 · Votes 0

1 answer

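Beautiful Soup findAll doesn't find all the information

I am trying to parse HTML with the BeautifulSoup library, but past a certain point I cannot retrieve the nested divs/classes: "findAll" does not return all of those tags. This site uses Bootstrap, and the information I want is inside an accordion component. Does BeautifulSoup conflict with Bootstrap, or am I not parsing the site correctly? I want to get the store locations, such as address and postal code. The code I am using: req = Request('https://www.needs.ca/en/store-locator/', headers={'User-Agent': 'Mozil

Views 8 · Asked 2022-02-26 · Votes -1

2 answers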

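Beautiful Soup only returns JavaScript code?

I want to scrape data from the website below. I tried to get the data through the Network tab, but it returned nothing. Then I tried BeautifulSoup, but it only returns JavaScript and an empty tbody tag, even though the Elements tab shows the data in the table. import requests from bs4 import BeautifulSoup url = 'https://dell.secure.force.com/FAP' headers = { 'Connection': 'keep-alive' } data = { 'pt': "f

Views 5 · Asked 2022-02-03 · Votes 1

1 answer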

Web scraping with BeautifulSoup in Python returns None

I am trying to get some text from http://rss.cnn.com/rss/money_markets.rss, but when I run my code I always get None as the output. If it helps, I am trying to get all the subheadings on the page and the text below them. Thanks! import requests import bs4 from bs4 import BeautifulSoup web = requests.get("http://rss.cnn.com/rss/money_markets.rss") start = bs4.BeautifulSoup(web.text, 'lxml') scrap

Views 34 · Asked 2020-01-26 · Votes 0

1 answer
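The CNN link is an RSS (XML) feed, not an HTML page. HTML parsers treat `<link>` as an empty (void) element, so its text can end up detached from the tag, and other lookups may behave unexpectedly. An XML parser is a better fit. This stdlib-only sketch uses `xml.etree.ElementTree`. In practice the feed text would come from `requests.get(...).content`; the sample feed below is made up. BeautifulSoup's `"xml"` parser, which requires lxml, is an alternative.

```python
import xml.etree.ElementTree as ET

sample_rss = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First headline</title>
      <link>https://example.com/a</link>
      <description>Text under the first headline.</description>
    </item>
    <item>
      <title>Second headline</title>
      <link>https://example.com/b</link>
      <description>Text under the second headline.</description>
    </item>
  </channel>
</rss>"""


def parse_items(xml_text):
    """Return (title, link, description) for each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append((
            item.findtext("title", default="").strip(),
            item.findtext("link", default="").strip(),
            item.findtext("description", default="").strip(),
        ))
    return items


items = parse_items(sample_rss)
```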

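Unable to scrape <h3> tags from a page

It seems I can scrape every tag and class on this page except h3; it keeps returning None or an empty list. I am trying to get this h3 tag: ...on the following web page: This is the code I am using: from bs4 import BeautifulSoup import requests from pprint import pprint from bs4 import BeautifulSoup URL = "https://www.empireonline.com/movies/features/best-movies-2/" response = requests.get(URL) web_html = r

Views 3 · Asked 2022-03-17 · Votes 1

Answer accepted

1 answer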

BeautifulSoup4 extracts extra text that the element inspector doesn't show

I have a script that tries to get the name of a Twitter handle. Here is the script: import requests, re, bs4, lxml from bs4 import BeautifulSoup url = 'https://web.archive.org/web/20150623154546/https:/twitter.com/biz/status/600839913286672384' r = requests.get(url).text c = re.compile('.*fullname js-action-profile-name show-popup-with-

Views 3 · Asked 2022-02-23 · Votes 0

2 answers

How to log in to a website and scrape it with Python 3

I want to log in to Facebook Messenger and parse the HTML. import requests from bs4 import BeautifulSoup import webbrowser page = requests.get("https://www.messenger.com", auth= ('username', 'password')) soup = BeautifulSoup(page, 'html.parser') print(soup) I got this from another Stack Overflow question, but it throws this error: File "

Views 0 · Asked 2018-11-30 · Votes 1

Answer accepted

1 answer

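Using Python to scrape a table by iterating only after delayed data has loaded?

I am trying to scrape data with Python (the requests and BeautifulSoup4 libraries, plus Selenium). When I try to get some data from the website, the data loads after a delay, and I get an empty value back. I understand that for this task I have to use WebDriverWait. import requests from bs4 import BeautifulSoup # selenium imports from selenium import webdriver from selenium.webdriver.common.keys import Keys from selenium.webdriver.support.ui import

Views 0 · Asked 2018-06-24 · Votes 2

Answer accepted

2 answers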

Web scraping call in Python returns empty values

I am trying to get the last traded price (LTP) of various commodities from the MCX website in Python 2.0. Below is the code I am using. import requests from bs4 import BeautifulSoup url = 'https://www.mcxindia.com/market-data/market-watch' page = requests.get(url) soup = BeautifulSoup(page.text, 'html.parser') soup.findAll('div',attrs={'class':'l

Views 17 · Asked 2018-09-09 · Votes 1

Answer accepted

1 answer

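Second scraper: if statement

I am working on my second Python scraper and keep running into the same problem. I want to scrape the website shown in the code below. I want to be able to enter a parcel number and check whether its property use code matches. However, I am not sure my scraper finds the right row in the table, and I am also not sure how to write an if statement for when the use code is not 3730. Any help would be greatly appreciated. from bs4 import BeautifulSoup import requests parcel = input("Parcel Number: ") web = "https://mcassessor.maricopa.gov/mcs.php?q=" web_page =

Views 0 · Asked 2018-01-30 · Votes 0

2 answers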

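Web scraping: the website shows my scraper different content

I am developing a university project that collects data about my team to compute statistics, do some data manipulation, and so on. I get the data from a website. I want to get the data for different seasons, but when I run the code I get different content from what the website shows, for example for the 2014 statistics: import requests from bs4 import BeautifulSoup def scrap_web(page): pageTree = requests.get(page) pageSoup = BeautifulSoup(pageTree.content, 'html.parser') TeamPage = pageSoup.find

Views 4 · Asked 2020-01-13 · Votes 0

Answer accepted

1 answer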

Different results from web scraping

I am trying to do web scraping and used the following code: import mechanize from bs4 import BeautifulSoup url = "http://www.thehindu.com/archive/web/2010/06/19/" br = mechanize.Browser() htmltext = br.open(url).read() link_dictionary = {} soup = BeautifulSoup(htmltext) for tag_li in soup.findAll('li', attrs={"da

Views 3 · Asked 2013-11-11 · Votes 2

Answer accepted