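Why does BeautifulSoup's find() keep returning elements whose class/id differs from the one I passed? - Tencent Cloud Developer Community

I'm learning Python, and after scraping a web page I don't quite understand the response format. Why am I not getting the response I expect from this code? import requests from bs4 import BeautifulSoup quote_page = 'https://www.bloomberg.com/quote/SPX:IND' page = requests.get(quote_page).text soup = BeautifulSoup(page, "lxml") price_box = soup.find('span', class_="pric

Views: 0 | Asked: 2019-08-10 | Votes: 0

Answer accepted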

2 answers

Why does BeautifulSoup's select() method return an empty list?

import requests from bs4 import BeautifulSoup response = requests.get('https://stackoverflow.com/questions') soup = BeautifulSoup(response.text, 'html.parser') questions = soup.select('.question-summary') print(questions) This returns: [] According to the Python course I paid for, this should not happen. Why does this code return

Views: 3 | Asked: 2022-10-08 | Votes: -1

Answer accepted

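2 answers

Beautiful Soup only returns JavaScript code?

I want to scrape data from the website below. I tried to get the data from the Network tab, but it returned nothing. Then I tried BeautifulSoup, but it only returns JavaScript with an empty tbody tag. In Inspect Element, however, the table does show data. import requests from bs4 import BeautifulSoup url = 'https://dell.secure.force.com/FAP' headers = { 'Connection': 'keep-alive' } data = { 'pt': "f

Views: 5 | Asked: 2022-02-03 | Votes: 1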

3 answers

Getting the "name" attribute with Beautiful Soup

from bs4 import BeautifulSoup source_code = """<a href="#" name="One"></a> <a href="#" name="Two"></a>""" soup = BeautifulSoup(source_code) print soup.a['name'] #prints 'One' Using Beaut

Views: 4 | Asked: 2013-10-24 | Votes: 2

Answer accepted

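13 answers

Beautiful Soup and extracting a div and its contents by ID

soup.find("tagName", { "id" : "articlebody" }) Why does this not return the <div id="articlebody"> ... </div> tags and everything between them? It returns nothing. I know the element exists because I can see it in the output of soup.prettify(). soup.find("div", { "id" : "articlebody" }) does not work either. (Edit: I found that BeautifulSoup was not parsing my page correctly, which probably means the page I'm trying to parse is not properly

Views: 65 | Asked: 2010-01-26 | Votes: 188

Answer accepted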

1 answer

Using get_text() on only one HTML class - Python, BeautifulSoup

I'm trying to access the text of a single HTML class only. I tried BeautifulSoup, but I always get either the same error message or all the items in that tag. My code.py from urllib.request import urlopen from bs4 import BeautifulSoup import requests import re url = "https://www.auchandirect.pl/auchan-warszawa/pl/pepsi-cola-max-niskokaloryczny-napoj-gazowany-o-smaku-cola/p-98502176" r

Views: 0 | Asked: 2018-10-20 | Votes: 1

Answer accepted

3 answers

Selenium between HTML tags

What is the best way to pass all of the HTML from a JavaScript-generated page to BeautifulSoup? I'm currently using: from selenium import webdriver from selenium.common.exceptions import NoSuchElementException from selenium.webdriver.common.keys import Keys from BeautifulSoup import BeautifulSoup browser = webdriver.Firefox() browser.get("http://w

Views: 1 | Asked: 2012-10-13 | Votes: 2

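3 answers

How to understand "recursive" in Python with BeautifulSoup

I'm working with the "recursive" argument of BeautifulSoup in Python. I've read the official documentation and many questions, but I still don't understand it. from bs4 import BeautifulSoup s = "<div>C<p><strong>A</strong>B</p></div>" soup = BeautifulSoup(s, 'html.parser') print(soup.find("p", recursive=False)) gives None. Is that because we cannot find <

Views: 2 | Asked: 2021-11-08 | Votes: 1

Answer accepted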
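
A minimal sketch of what recursive=False does with the markup from the question above, assuming bs4 is installed and using the built-in html.parser. The variable names top_level, from_div and anywhere are illustrative only and do not come from the original question.

```python
from bs4 import BeautifulSoup

s = "<div>C<p><strong>A</strong>B</p></div>"
soup = BeautifulSoup(s, "html.parser")

# recursive=False only inspects the direct children of the object find() is
# called on. The only direct child of soup is <div>, so <p> is not found.
top_level = soup.find("p", recursive=False)

# <p> is a direct child of <div>, so the same search starting at soup.div works.
from_div = soup.div.find("p", recursive=False)

# The default (recursive=True) searches all descendants.
anywhere = soup.find("p")

print(top_level)
print(from_div)
print(anywhere)
```

In other words, the result depends on which node the search starts from, not on whether the tag exists somewhere in the document.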

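1 answer

What is the difference between the CSS selectors [class^=test] and [class|=test]?

As far as I can see, [class^=test] and [class|=test] match the same elements. Is there any real difference? Why would I use one rather than the other?

Views: 0 | Asked: 2016-10-19 | Votes: 3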
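
The two selectors can be compared with a small plain-Python model of their matching rules. This is only a sketch of the rules as described in the CSS Selectors specification, not a selector engine; the helper names prefix_match and dash_match and the sample values are made up for illustration. Both rules are applied to the whole attribute string, not to individual class names.

```python
def prefix_match(attr_value: str, needle: str) -> bool:
    """Model of [attr^=needle]: the whole attribute value starts with needle."""
    return bool(needle) and attr_value.startswith(needle)


def dash_match(attr_value: str, needle: str) -> bool:
    """Model of [attr|=needle]: value is exactly needle, or starts with needle + '-'."""
    return attr_value == needle or attr_value.startswith(needle + "-")


# Hypothetical class attribute values to compare the two rules on.
samples = ["test", "test-box", "testing", "test box", "box test"]
for value in samples:
    print(value, prefix_match(value, "test"), dash_match(value, "test"))
```

Under this model the two selectors agree on "test" and "test-box" but differ on "testing" and "test box", which only the prefix rule accepts.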

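2 answers

Unexpected behavior of the `find` function in Common Lisp

Searching a sequence for nil with find always returns NIL. (find nil #(a b nil c)) -> NIL (find nil #(a b c)) -> NIL The same happens if the sequence is a list. However, member behaves as I expect: (member nil '(a b nil c)) -> (NIL C) Why was find designed to work this way? Note that position behaves as I expect: (position nil #(a b nil c)) -> 2 (position nil #(a b c)) -> NIL

Views: 6 | Asked: 2019-03-08 | Votes: 3

Answer accepted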

2 answers

AttributeError: 'NoneType' object has no attribute 'get' in Python 3 using Beautiful Soup

I'm trying to scrape a website using Python, but I get the following error. Traceback (most recent call last): File "c:\Users\My PC Buddy\python\scraper\scraper.py", line 11, in link = product.find("a", {"class": "product.find"}).get('href') AttributeError: 'NoneType' object has no attribute 'get' Here is my code: import requests from bs4 import BeautifulSoup import pandas as pd bas

Views: 2 | Asked: 2021-05-15 | Votes: 0
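
This kind of error usually means find() matched nothing and returned None. A minimal sketch of guarding against that, assuming bs4 is installed; the markup, the class names product and product-link, and the variable names are hypothetical and are not taken from the truncated question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second product has no matching <a class="product-link">.
html = """
<div class="product"><a class="product-link" href="/item/1">One</a></div>
<div class="product"><span>No link here</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

links = []
for product in soup.find_all("div", class_="product"):
    anchor = product.find("a", class_="product-link")
    if anchor is None:
        # find() returns None when nothing matches; calling .get() on None
        # is what raises the AttributeError, so skip this product instead.
        continue
    links.append(anchor.get("href"))

print(links)
```

Checking for None also makes it easier to see which element failed to match, for example when the class name in the selector is wrong or the content is rendered by JavaScript.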

1 answer

Python BeautifulSoup table scraping

My HTML has several tables. The first table is: <table> <tr> <td> <div id="string"> </div> </td> </tr> </table> The rest are of the form: <table class="confluenceTable" data-csvtable="1"> <tbody> <

Views: 2 | Asked: 2016-07-24 | Votes: 1

Answer accepted

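3 answers

Using BeautifulSoup4 with Google Translate

I'm currently working through the web scraping section of AutomateTheBoringStuff and trying to write a script that uses BeautifulSoup4 to extract translated words from Google Translate. I inspected the HTML of a page where "Explanation" is the translated word: <span id="result_box" class="short_text" lang="en"> <span class>Explanation</span> </span> Using BeautifulSoup4, I tried different selectors, but none of them returned the translated word. Below is

Views: 7 | Asked: 2016-07-19 | Votes: 4

Answer accepted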

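2 answers

The "class" attribute returns a list while other attributes return a value

BeautifulSoup is very handy for HTML parsing in Python, but the result of the code below confuses me. from bs4 import BeautifulSoup tr =""" <table> <tr class="passed" id="row1"><td>t1</td></tr> <tr class="failed" id="row2"><td>t2</td></tr> </table> "&#

Views: 5 | Asked: 2016-07-26 | Votes: 1

Answer accepted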
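
A short sketch of the behavior the title describes, assuming bs4 is installed and using html.parser. Beautiful Soup treats class as a multi-valued attribute, so it comes back as a list, while id comes back as a plain string. The second class name "extra" is added here purely for illustration and is not in the original markup.

```python
from bs4 import BeautifulSoup

html = """
<table>
<tr class="passed" id="row1"><td>t1</td></tr>
<tr class="failed extra" id="row2"><td>t2</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
rows = soup.find_all("tr")

first_class = rows[0]["class"]   # a list, even with a single class name
first_id = rows[0]["id"]         # a plain string
second_class = rows[1]["class"]  # one list entry per class name

# Join the list when a single string is needed.
joined = " ".join(second_class)

print(first_class, first_id, second_class, joined)
```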

1 answer

A TypeError is shown after running Python code (O'Reilly sample code).

I followed the sample code from the O'Reilly book ": Collecting More Data from the Modern Web" and found that it raises an error. Versions: Python 3.7.3, BeautifulSoup4. The code is as follows: from urllib.request import urlopen from bs4 import BeautifulSoup import re import random import datetime import codecs import ssl ssl._create_default_https_context = ssl._create_unverified_context random.seed(datetim

Views: 2 | Asked: 2019-07-07 | Votes: 0

2 answers

Parsing a web page with Python

I'm trying to parse a web page (forums.macrumors.com) and get a list of all the posted threads. So far I have the following: import urllib2 import re from BeautifulSoup import BeautifulSoup address = "http://forums.macrumors.com/forums/os/" website = urllib2.urlopen(address) website_html = website.read() text = urllib

Views: 3 | Asked: 2015-06-22 | Votes: 0

Answer accepted

1 answer

Error extracting HTML with BeautifulSoup

This is the code I'm using, initially to extract the time shown in the top-left corner. import qgrid import webbrowser import requests from bs4 import BeautifulSoup page = requests.get('http://www.meteo.gr/cf.cfm?city_id=14') #sending the request to take the html file. soup = BeautifulSoup(page.content, 'html.parser') #creating beautifulSoup

Views: 0 | Asked: 2018-05-12 | Votes: 3

Answer accepted

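3 answers

Using the output of soup.findall() as input to the re module for further text manipulation

I'm trying to extract text from a web page with BeautifulSoup. I want to pass the output of soup.findall() as input to the re module to clean the data further. Plain-text input works fine, but passing the output of soup.findall() raises the following error. Traceback (most recent call last): File "scpe2.py", line 18, in url = re.search( ', univ) File "/usr/lib/python2.7/re.py", line 142, in search return _compile(pattern, flags).search(string) TypeError: expected string or buffer Printing the soup.findall() variable works.

Views: 6 | Asked: 2013-11-24 | Votes: 0

Answer accepted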

1 answer

No output after using requests and BeautifulSoup in PyCharm

I want to get some headlines from the New York Times website. I have two questions. Question 1: this is my code, but it gives me no output. Does anyone know what I need to change? import requests from bs4 import BeautifulSoup url = 'https://www.nytimes.com' r = requests.get(url) soup = BeautifulSoup(r.text, "html.parser") a = soup.find_all(class_="balancedHeadline") for story_heading in a:

Views: 24 | Asked: 2020-04-12 | Votes: 0

1 answer

Discord on_raw_reaction_add / Nextcord.py reaction roles

I've spent a long time trying to work out why this does not work. The part that fails is where I get the role, but I don't know why. Does anyone know? @client.event async def on_raw_reaction_add(payload): message_id = 999984925216358470 if message_id == payload.message_id: user = payload.member guild_user = user.guild emoji = payload.emoji.name if emoji == "<:

Views: 11 | Asked: 2022-07-22 | Votes: 0

Answer accepted

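2 answers

Beautiful Soup only returns the last URL of a txt file

I'm trying to parse a set of URLs from a txt file, but Beautiful Soup only returns the content of the last URL. The URLs point to movie reviews on the LetterBoxD website. For example, if the file has 10 URLs, I get "none" for the first 9 and only the 10th is returned correctly. Can anyone help me? from bs4 import BeautifulSoup import requests with open('list_of_urls.txt', 'r') as f: x = f.readlines() for url in x: page = requests.get(

Views: 18 | Asked: 2020-10-24 | Votes: 0

Answer accepted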

1 answer

Writing to CSV results in each letter getting its own cell.

I have some code that parses HTML with BeautifulSoup and prints the code. (gist link, if interested): import csv import requests from bs4 import BeautifulSoup import lxml r = requests.post('https://opir.fiu.edu/instructor_evals/instr_eval_result.asp', data={'Term': '1175', 'Coll': 'CBADM'}) soup = Beautiful

Views: 3 | Asked: 2017-11-06 | Votes: 1

Answer accepted
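
The symptom in the title typically appears when a plain string is passed to csv.writer.writerow(), because writerow() iterates over its argument and a string iterates character by character. A minimal standard-library sketch, writing to an in-memory buffer instead of a file; the sample values are borrowed from the request parameters above purely as placeholder data.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer)

writer.writerow("CBADM")            # a str is split into one cell per character
writer.writerow(["CBADM"])          # a one-element list stays in a single cell
writer.writerow(["1175", "CBADM"])  # one list element per cell

output = buffer.getvalue()
print(output)
```

Wrapping each row's values in a list (or tuple) before calling writerow() keeps every value in its own cell.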

1 answer

Why does SequenceType.dropFirst(_:) return Self.SubSequence rather than Self?

Trying to understand protocol SequenceType { associatedtype SubSequence @warn_unused_result func dropFirst(_ n: Int) -> Self.SubSequence /* ... */ } Why do we need associatedtype SubSequence, and why not Self, SequenceType

Views: 0 | Asked: 2016-08-19 | Votes: 3

Answer accepted

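1 answer

Python BeautifulSoup loop

I'm trying to use BeautifulSoup to assign the URL from a specific iteration of a for loop (given by the "position" variable) to a variable, but I don't understand why it does not work (the output is the full list, and I only want the selected one). Any help is much appreciated. Thanks! position = int(input('Enter position:')) n = int(0) tags = soup('a') for tag in tags: if n<position: n=n+1 else: x=tag.get('href', None) pri

Views: 8 | Asked: 2022-07-05 | Votes: 0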

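1 answer

Confusion about bs4 import methods and their effect on attributes

I know versions of this question have been asked before, but I'm still confused and would like to settle my doubts once and for all if possible. If I use from bs4 import BeautifulSoup my soup assignment is soup = BeautifulSoup(html, "lxml") If I do this instead: from bs4 import BeautifulSoup as bs4 my soup assignment is soup = bs4(html, "lxml") Finally, if I use: import bs4 my soup assignment is soup = bs4.BeautifulSoup(html, "lxml

Views: 2 | Asked: 2019-04-10 | Votes: 0

Answer accepted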

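1 answer

Selecting a table with BeautifulSoup

There are several tables on the website. I want to select one of them, and that is my problem. When I write: g_data=soup.find_all("table",{"class":"awT votegroup votegroup7 wH episodesList"}, {"id":"sezon7"}) it finds exactly table no. 7, but when I write the following code: html_1=("table",{"class":"awT votegroup votegroup7 wH episodesList"}, {"

Views: 4 | Asked: 2014-11-10 | Votes: 0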

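3 answers

TypeError: 'NoneType' object is not iterable with BeautifulSoup

I'm very new to Python, and this is probably a very simple kind of error, but I can't work out what is wrong. I'm trying to get links that contain a specific substring from a website, but when I do I get "TypeError: 'NoneType' object is not iterable". I believe the problem is related to the links I get from the website. Does anyone know what is wrong here? from bs4 import BeautifulSoup from urllib.request import urlopen html_page = urlopen("http://www.scoresway.com/?sport=soccer&page=competition&am

Views: 3 | Asked: 2017-04-20 | Votes: 0

Answer accepted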

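2 answers

AttributeError: 'NoneType' object has no attribute 'find' (cryptocurrency web scraper)

I'm working on a cryptocurrency widget that scrapes HTML for specific cryptocurrency prices in real time, but I'm running into AttributeError: 'NoneType' object has no attribute 'find'. It happened suddenly, and I really can't work out why it came out of nowhere. I had run the code many times before with absolutely no problems. My two questions are: why did this suddenly happen, and how do I fix it? from tkinter import * import requests from bs4 import BeautifulSoup from tkinter.ttk import * from time import strftime def get_c

Views: 12 | Asked: 2021-12-06 | Votes: 2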

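1 answer

Web scraping with Selenium and Beautiful Soup does not work

I'm trying to get the authors and comment contents from a website, but I found that its page source differs from what Inspect Element shows. I tried BeautifulSoup but could not get anything from it. So I tried Selenium, and still could not get anything. I inspected the elements on the site and entered the class names in Selenium, but it still returns nothing. Here is the code I wrote. web = "https://www.regulations.gov/document?D=WHD-2020-0007-0609" #Selenium driver = webdriver.Chrome() driver.get(web) name = driver.find_elem

Views: 3 | Asked: 2021-01-25 | Votes: 0

Answer accepted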

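2 answers

Finding elements with CSS selectors/Selenium

My code goes to a website and clicks through each row (of a table), which opens a new window. I want to scrape one piece of information from each new window, but I'm having trouble using a CSS selector to get this field () from selenium import webdriver from bs4 import BeautifulSoup import pandas as pd import time import requests driver = webdriver.Chrome() productlink=[] driver.get('https://aaaai.planion.com/Web.User/SearchSessions?ACCOUN

Views: 1 | Asked: 2021-02-01 | Votes: 0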

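1 answer

Why is a website's source code different from the code in the browser? (for web scraping)

I'm working on my first programming project, so forgive any incorrect jargon. My goal: I'm trying to scrape my local library's website. The end goal is to be able to renew books on the site automatically. Progress: I have successfully used Python, Selenium and Webdriver to log in to the library's website and reach the "checkout" page to view the items on loan. I then used Beautiful Soup to extract the HTML of the site's "checkout" page. Problem: when I inspect the website (right-click in Chrome and select ''), the source code differs from the code I get when I right-click in Chrome and select 'View source'. When viewing the source, the HTML matches what my Python code extracts, but it is missing all the information I want to scrape. When inspecting the site, however, the HTML does contain what I want to scrap

Views: 0 | Asked: 2020-01-20 | Votes: 1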

1 answer

BeautifulSoup get_text returns a NoneType object

I'm trying out BeautifulSoup for web scraping, and I need to extract the headlines from this, specifically from the "More" headlines section. This is the code I've tried so far. import requests from bs4 import BeautifulSoup from csv import writer response = requests.get('https://www.cnbc.com/finance/?page=1') soup = BeautifulSoup(response.text,'html.parser') posts = soup.find_all(id='p

Views: 12 | Asked: 2018-08-05 | Votes: 4

Answer accepted

1 answer

Handling IndexError: list index out of range

import requests from bs4 import BeautifulSoup from lxml import etree import csv with open('1_colonia.csv', 'r', encoding='utf-8') as csvfile: reader = csv.reader(csvfile, delimiter=';') next(reader) # skip the header row for row in reader: url =

Views: 6 | Asked: 2022-09-19 | Votes: 1

Answer accepted

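2 answers

Why does the find function not work in BeautifulSoup?

I want to extract the product name from a block of HTML. I'm using BeautifulSoup. The problem is that when I get the product name with select() it returns the expected data, but when I try find() it returns nothing. Why does find() not work here? from bs4 import BeautifulSoup data = '''<span id="productTitle" class="a-size-large"> Alien 3 </span>''' soup = BeautifulSoup(data) print(sou

Views: 0 | Asked: 2018-03-10 | Votes: 1

Answer accepted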

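2 answers

Selenium's Webdriver.execute_script() returns an empty list while Chrome's dev tools return a populated list

I'm trying to use Selenium's Webdriver.execute_script() to get a list of elements from Reddit's home page. (Before you recommend PRAW: Reddit is not actually the site I want to get elements from; I'm just using it as an example.) Although the script works fine when I run it in Chrome's dev tools console, Selenium's method returns only an empty list, which should be populated with the title elements of Reddit posts. import urllib from selenium import webdriver from BeautifulSoup import BeautifulSoup #Path to the chromedri

Views: 1 | Asked: 2014-11-25 | Votes: 0

Answer accepted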

2 answers

BeautifulSoup4 find method

I'm trying to get some numbers from Yahoo Finance with Python 3, but all I get is "None". from bs4 import BeautifulSoup import requests source = requests.get('https://finance.yahoo.com/quote/SWCH? p=SWCH&.tsrc=fin-srch').text soup = BeautifulSoup(source, 'lxml') price = soup.find('span', class_='Trsdu(0.3s) Fw(b)

Views: 1 | Asked: 2018-11-11 | Votes: 0

Answer accepted

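19 answers

How to find elements by class

I'm having trouble using Beautifulsoup to parse HTML elements with the "class" attribute. The code looks like this: soup = BeautifulSoup(sdata) mydivs = soup.findAll('div') for div in mydivs: if (div["class"] == "stylelistrow"): print div I get an error on the same line after the script finishes. File "./beautifulcoding.py", line 130, in getlanguage

Views: 6 | Asked: 2011-02-18 | Votes: 532

Answer accepted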
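
One plausible cause of the error is a div with no class attribute, where div["class"] raises KeyError; the truncated traceback does not confirm this. A hedged sketch using the bs4 package (the question itself uses older Python 2 syntax), with hypothetical markup standing in for sdata:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first <div> has no class attribute at all.
sdata = """
<div>no class</div>
<div class="stylelistrow">row 1</div>
<div class="stylelistrow highlighted">row 2</div>
"""
soup = BeautifulSoup(sdata, "html.parser")

# Let Beautiful Soup filter by class instead of indexing div["class"] by hand.
by_keyword = soup.find_all("div", class_="stylelistrow")

# When looping manually, .get() returns a default instead of raising KeyError.
by_loop = [
    div for div in soup.find_all("div")
    if "stylelistrow" in div.get("class", [])
]

texts = [div.get_text() for div in by_keyword]
print(texts)
```

The class_ keyword matches any element whose class list contains the given name, so the div with two classes is included as well.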

1 answer

Web scraping with Python to extract data

I'm using the code below. Everything works except the "affiliation" part. It returns an error: AttributeError: 'NoneType' object has no attribute 'text'. Without .text, it returns everything in the class. The whole code: import requests import bs4 import re headers = {'User-Agent':'Mozilla/5.0'} url = 'http://pubs.acs.org/toc/jacsat/139/5' html = requests.get(url, headers=hea

Views: 1 | Asked: 2017-02-10 | Votes: 1

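1 answer

Problem designing a data scraping function with BS4

The data I need is in 2 different tag + class combinations. I want my function to search under both combinations and display the data found under both. The two combinations are mutually exclusive: if one is present, the other is not. The code I'm using is: # -*- coding: cp1252 -*- import csv import urllib2 import sys import urllib import time from bs4 import BeautifulSoup from itertools import islice def match_both2(arg1,arg2): if arg1 == 'div&#

Views: 0 | Asked: 2013-02-11 | Votes: 0

Answer accepted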

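3 answers

Python scraping with Beautiful Soup 3: how to get text from a div

Here is the HTML I'm having trouble with, <div id="id" class="class"> text </div> If I have the soup in a variable, div = find('div', attrs={'class': 'class'}) how do I get the text part of the div? I've tried several of these approaches separately. text = div.get_text() text = div.string text = div.text When I run type(div) it is of type BeautifulSoup.Tag, but when I call

Views: 7 | Asked: 2017-09-19 | Votes: 0

Answer accepted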

1 answer

Why does findAll sometimes return an empty array?

I use BeautifulSoup to get YouTube video ids. import time import requests from bs4 import BeautifulSoup # BeautifulSoup get video id url = "https://www.youtube.com/results?search_query=" + trailerCnName request = requests.get(url) time.sleep(1) soup = BeautifulSoup(request.text, "

Views: 24 | Asked: 2020-06-29 | Votes: 0

1 answer

Python recursion returns different results

Python's return values differ and I don't understand why: from bs4 import BeautifulSoup def recurse_table(table, table_list): if table.find_next("table") is not None: recurse_table(table.find_next("table"), table_list) table_list.append(table) return table_list fp = open("tc4400_cs_2.html"

Views: 20 | Asked: 2020-04-15 | Votes: 1

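1 answer

Python & BeautifulSoup: how to extract the value of a tag?

<a href="link" target="_blank" class="xXx text-user topic-author" sl-processed="1"> diamonds </a> I want to extract the pseudo 'diamonds', which is inside the 'a' tag, with BeautifulSoup. I've tried many things, but it always gives me "None". What I think should work is txt = soup.find('a', {'class': 'xXx text-user topic-author

Views: 2 | Asked: 2017-10-24 | Votes: 0

Answer accepted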

3 answers

Beautiful Soup error with a hyphen "-" in the class name?

I'm using Python 2.7 + BeautifulSoup 4.4.1 e = BeautifulSoup(data) s1 = e.find("div", class_="one").get_text() # Successful s2 = e.find("div", class_="two-three").get_text() # ERROR

Views: 6 | Asked: 2016-04-25 | Votes: 2

Answer accepted

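1 answer

How to scrape data from a web page that uses react.js with Selenium in Python?

I'm having some difficulty scraping a website that uses react.js and don't know why this happens. This is the site's HTML: What I want to do is click the button with class: play-pause-button btn btn -naked. However, when I load the page with the Mozilla gecko webdriver, an exception is thrown: Message: Unable to locate element: .play-pause-button btn btn-naked This makes me think I may need to do something else to get this element. This is my code so far: driver.get("https://drawittoknowit.com/cou

Views: 0 | Asked: 2019-08-28 | Votes: 3

Answer accepted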

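1 answer

A different href is returned for a link.

In Python, I'm using the requests module and BS4 to search the web with duckduckgo.com. I manually searched for 'hello' and used the developer tools to find that the first result title is <a class="result__a" href="http://example.com">. Then I used the following code to get the href with Python: html = requests.get('http://duckduckgo.com/html/?q=hello').content soup = BeautifulSoup4(html, 'html.parser

Views: 0 | Asked: 2018-10-14 | Votes: 2

Answer accepted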

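1 answer

jQuery hasClass() method not returning the correct value

Hi. I'm trying to change the background color of an element and its children, so I wrote a function that is called recursively. There are some conditions: if an element has the class="irmNDrdnVal" or id="irmNDatePickerContainer" attribute, its background color should not be changed. The code is as follows: <!DOCTYPE HTML PUBLIC> <html> <style> .newClass{ color:red; background:green; } </style> <script type="text

Views: 0 | Asked: 2012-12-29 | Votes: 0

Answer accepted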

1 answer

jQuery: sorting HTML table rows by date

I've been using a script I got from stackoverflow to sort a table by date strings, but the code has stopped working. May I ask if anyone knows why? $('tr.Entries').each(function() { var $this = $(this), t = this.cells[1].textContent.split('-'); $this.data('_ts', new Date(t[2], t[1] - 1, t[0]).getTime()); }).sort(function(a, b) { return $(a).da

Views: 1 | Asked: 2018-11-21 | Votes: 0

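2 answers

What does load mean on Magento objects?

I'm trying to learn to code through Magento, and I must admit I'm a bit confused by the concept of object chaining in it. In fact, I don't know when to load and when I can avoid it. For example: $product = Mage::getModel('catalog/product')->load($item->getProductId()); In this case I want to get the product information from the product ID; why do I need to load it? ($item comes from a loop over all the products in the order) Here, I don't need to do any load: $customer = $payment->getOrder()->getCustomer(); I apologise in advance for my silly question: with my

Views: 1 | Asked: 2011-02-16 | Votes: 4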

6 answers

I created an NSArray instance, so why is its class __NSArrayI rather than NSArray?

I have the following code: id anArray = [NSArray arrayWithObjects:@1, @2, nil]; NSLog(@"anArrayClass - %@", [anArray class]); NSLog(@"NSArrayClass - %@", [NSArray class]); I expected both outputs to be NSArray, but the result is: 2016-08-18 21:08:53.628 TestUse[9279:939745] anArrayClass - __NSArrayI 2016-08-18 21:08:53.629 TestUse[927

Views: 4 | Asked: 2016-08-18 | Votes: 0

Answer accepted