BeautifulSoup/Regex: Find a specific value in an href; BeautifulSoup4: find multiple href links with specific text; Python beautifulsoup4: find href links in find_all results - Tencent Cloud Developer Community

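python, beautifulsoup

I'm writing a Python 3 script that scrapes a website and checks whether a product is in stock. The problem I've run into is searching for the product name in the hyperlinks I scrape with BeautifulSoup. The product name contains a space, so it is really two words, and I think that is what causes the problem. Passing in product_name, example: "Blue Truck". Example link: <a href="https://example.com/products/">Blue Truck</a> soup = BeautifulSoup(driver.page_source, 'html.parser') print("

Views: 0 | Asked: 2020-09-22 | Votes: 0

1 answer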
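A minimal sketch of one way to match link text that contains a space. It assumes the product name appears in the anchor's visible text; the second link and the helper name `find_product_links` are hypothetical, and the markup is inlined instead of coming from `driver.page_source`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the example link in the question.
html = """
<a href="https://example.com/products/">Blue Truck</a>
<a href="https://example.com/other/">Red Car</a>
"""


def find_product_links(page_html, product_name):
    """Return the hrefs of anchors whose visible text contains product_name."""
    soup = BeautifulSoup(page_html, "html.parser")
    # Normalise whitespace and case so "Blue   Truck" and "blue truck" both match.
    wanted = " ".join(product_name.split()).lower()
    hrefs = []
    for a in soup.find_all("a", href=True):
        text = " ".join(a.get_text().split()).lower()
        if wanted in text:
            hrefs.append(a["href"])
    return hrefs


product_links = find_product_links(html, "Blue Truck")
print(product_links)
```

Comparing normalised text rather than the raw string keeps the match working when the page wraps or pads the product name.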

Extracting a string from an href tag with Python 2.7.x

python, regex, python-2.7, beautifulsoup

I'm currently using BeautifulSoup4 to extract href tags from an HTML page. I'm using a BeautifulSoup4 query, which works well and returns the 'a href' tags I'm looking for. An example of what it returns: "<a href="manage/foldercontent.html?folder=Pictures" style="background-image: url(shares/Pictures/DefaultPicture.png)" target="content_window" title=

Views: 3 | Asked: 2015-06-30 | Votes: 0

Answer accepted

2 answers

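Extracting a specific value from an <a href> tag

python, web-scraping

I need to scrape just the number inside this tag from a website. I'm using Python and BeautifulSoup, by the way. <p class="cell-link"> <a href="/#/miner-list/offline-list">17</a> </p> I have looked online for solutions, but because the site refreshes automatically every 5 minutes, I can't seem to find a way to get this number. Any suggestions would be very helpful.

Views: 1 | Asked: 2021-07-08 | Votes: 1

5 answers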
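A small sketch of reading the number once the markup is available. The helper name `read_cell_link_number` is hypothetical. It also assumes the HTML handed to BeautifulSoup already contains the value; if the page fills it in with JavaScript, a plain HTTP fetch may not include it.

```python
from bs4 import BeautifulSoup

# The fragment quoted in the question.
html = '<p class="cell-link"> <a href="/#/miner-list/offline-list">17</a> </p>'


def read_cell_link_number(page_html):
    """Return the integer inside <p class="cell-link"><a>, or None if absent."""
    soup = BeautifulSoup(page_html, "html.parser")
    link = soup.select_one("p.cell-link a")
    if link is None:
        return None
    text = link.get_text(strip=True)
    return int(text) if text.isdigit() else None


offline_count = read_cell_link_number(html)
print(offline_count)
```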

BeautifulSoup/Regex: Find a specific value in an href

javascript, python, html, regex, beautifulsoup

Using the code below, I'm trying to find the value at the end of the href. Is there a way to extract the href and find the value after page= with BeautifulSoup/Regex? from bs4 import BeautifulSoup import requests import json import re request = requests.get('https://www.goodreads.com/quotes/tag/fun?page=1') soup = BeautifulSoup(request.text, 'html.parser') findNext = soup.find(

Views: 76 | Asked: 2018-01-27 | Votes: 2

Answer accepted

1 answer
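Once an href string is in hand, the value after page= can be read either by parsing the query string or with a regular expression. The sketch below shows both using only the standard library; the function names and the sample hrefs are illustrative, not taken from the original answers.

```python
import re
from urllib.parse import urlsplit, parse_qs


def page_from_query(href):
    """Parse the query string and return the page parameter as an int, or None."""
    values = parse_qs(urlsplit(href).query).get("page")
    return int(values[0]) if values else None


def page_from_regex(href):
    """Same idea with a regular expression anchored on '?page=' or '&page='."""
    match = re.search(r"[?&]page=(\d+)", href)
    return int(match.group(1)) if match else None


print(page_from_query("https://www.goodreads.com/quotes/tag/fun?page=2"))
print(page_from_regex("/quotes/tag/fun?page=13"))
```

Parsing the query string is usually the more robust choice when the URL may carry several parameters in any order.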

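BeautifulSoup: finding a partial string in a tag

python, string, beautifulsoup, tags, find

For some reason, BeautifulSoup is suddenly unable to find the contents of any of my tags in the new Python code I've started. I've used BeautifulSoup for about a year and have never had this problem. I can successfully ingest a JSON payload in Python with ".json()" and pass it to BeautifulSoup using html.parser, and that works every time. Now I'm trying to read a raw MySQL field, feed it to Python as a text string, and parse and manipulate it with BeautifulSoup, without success. I just want to load a text string, as in this example, and search based on a text string (Bea

Views: 2 | Asked: 2021-04-22 | Votes: 0

2 answers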

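Get all links, URLs, and URIs from HTML

python, regex

I want to get every kind of link, URL, and URI from HTML: basically anything I could put in the Chrome address bar that would take me to another page or download a file. So far I have this code, but it only gets links from href, and only from a tags, whereas I want every type of link, from src, source, <source>, and even links "embedded" in the HTML. import requests as rq from bs4 import BeautifulSoup def get_links(url): data = rq.get(url).content.decode() soup =

Views: 2 | Asked: 2021-03-30 | Votes: 0

Answer accepted

1 answer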
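One possible approach, sketched under assumptions: walk every tag and collect the attributes that commonly carry URLs. The attribute list `URL_ATTRS`, the helper `collect_urls`, and the sample markup are all hypothetical, and the sketch does not try to find URLs embedded in inline scripts or text.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Attributes that commonly hold a URL; an illustrative list, not an exhaustive one.
URL_ATTRS = ("href", "src", "action", "poster")

# Hypothetical page fragment.
html = (
    '<a href="/a.html">A</a>'
    '<img src="img/b.png">'
    '<video><source src="https://cdn.example.com/c.mp4"></video>'
    '<script src="/js/d.js"></script>'
)


def collect_urls(page_html, base_url):
    """Return absolute URLs from URL-bearing attributes, in document order."""
    soup = BeautifulSoup(page_html, "html.parser")
    found = []
    for tag in soup.find_all(True):  # True matches every tag
        for attr in URL_ATTRS:
            value = tag.get(attr)
            if value:
                absolute = urljoin(base_url, value)
                if absolute not in found:
                    found.append(absolute)
    return found


all_urls = collect_urls(html, "https://example.com/dir/page.html")
print(all_urls)
```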

Unable to define a regular expression for re.compile and pass it to BeautifulSoup

regex, python-2.7, beautifulsoup

I'm currently practicing the basics of accessing the web with Python. I'm following a tutorial on YouTube and was guided through the code below. from urllib2 import urlopen, HTTPError from BeautifulSoup import BeautifulSoup import re url="http://getbusinessreviews.org/" try: webpage = urlopen(url).read except HTTPError, e: if e.code == 404: e.msg = 'd

Views: 2 | Asked: 2015-11-22 | Votes: 1

Answer accepted

2 answers

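Beautiful Soup select on Google images returns an empty list

python, beautifulsoup, web-crawler

I want to retrieve information using BeautifulSoup. I checked a number of Stack Overflow posts, but still can't retrieve the information. I want the (li) information for each tile (image), such as the href; however, find_all and select_one return an empty list or None. Could you help me get the following href value of the anchor tag with class "e0WtYb HpzMff PJLMUc"? href="/entity/claude-monet/m01xnj?categoryId=artist" Here is what I've tried. import requests from bs4 import BeautifulSoup url = 'https://artsand

Views: 14 | Asked: 2021-12-05 | Votes: 3

Answer accepted

1 answer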

Recursively searching a website's links with httplib2 and BeautifulSoup

python-2.7, beautifulsoup, httplib2

I use the following to get all external JavaScript references from a web page. How can I modify the code so that it searches not just that URL but all pages of the website? import httplib2 from BeautifulSoup import BeautifulSoup, SoupStrainer http = httplib2.Http() status, response = http.request('https://stackoverflow.com') for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('script&

Views: 5 | Asked: 2017-10-02 | Votes: 0

2 answers

BeautifulSoup find_all limited to 50 results?

python, beautifulsoup

I'm trying to get results from a page with BeautifulSoup: req_url = 'http://www.xscores.com/soccer/livescores/25-02' request = requests.get(req_url) content = request.content soup = BeautifulSoup(content, "html.parser") scores = soup.find_all('tr', {'style': 'height:18px;'}, limit=None)

Views: 3 | Asked: 2017-02-27 | Votes: 7

Answer accepted

1 answer

I want to take a URL as input from the user in Python.

python-3.x, url

I want the user to enter the website name and the maximum number of pages of that website they want to crawl, but I can't find a solution. Here is my code import requests from bs4 import * from urllib import request url1 = input("Enter url you want to crawl:") max_pages1 = int(input("Enter no. of pages you want to crawl:")) def web_crawler(max_pages,url): page = 1 w

Views: 10 | Asked: 2017-02-19 | Votes: 0

2 answers

Python: Trouble getting the URL from an href with BeautifulSoup

python, html, beautifulsoup, href

I'm learning how to do web scraping in Python, starting with BeautifulSoup. I've run into a problem I don't know how to solve; here is my code snippet: from bs4 import BeautifulSoup import requests start_url = "https://www1.interactivebrokers.com/en/index.php?f=2222&exch=nasdaq&showcategories=STK#productbuffer" # Download the HTML from start_url: downloaded_ht

Views: 6 | Asked: 2020-11-04 | Votes: 0

Answer accepted

2 answers

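Regular expression to extract an attribute value

c#, html, regex

What is a quick way to extract the title attribute values from HTML like the following: ... <li><a href="/wiki/Proclo" title="Proclo">Proclo</a></li> <li><a href="/wiki/Proclus" title="Proclus">Proclus</a></li> <li><a href="/wiki/Ptolemy" title="Ptolemy">

Views: 0 | Asked: 2011-04-03 | Votes: 11

Answer accepted

1 answer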

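beautifulsoup4: get the href from anchor elements with a specific attribute value

python, parsing, beautifulsoup

I'm trying to parse the href values of multiple anchor elements on a page that have the attribute itemprop with the value url, using BeautifulSoup4. For example, extract /pages/page from <a itemprop="url" href="/pages/page"></a>, but there are several such items on a page, so I'd like to put them in an array. I was thinking of something like soup("span", html = True, {'itemprop' : 'name' })

Views: 2 | Asked: 2016-10-28 | Votes: 1

1 answer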
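A minimal sketch of collecting those href values into a list with `find_all` and an attribute filter. Only the first anchor comes from the question; the other elements are hypothetical. Note that the call quoted in the question places a positional argument after a keyword argument, which Python rejects as a syntax error.

```python
from bs4 import BeautifulSoup

html = """
<a itemprop="url" href="/pages/page"></a>
<a itemprop="url" href="/pages/other"></a>
<a href="/pages/ignored"></a>
<span itemprop="name">Not a link</span>
"""

soup = BeautifulSoup(html, "html.parser")

# attrs filters on itemprop="url"; href=True keeps only anchors that have an href.
itemprop_hrefs = [
    a["href"] for a in soup.find_all("a", attrs={"itemprop": "url"}, href=True)
]
print(itemprop_hrefs)
```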

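BeautifulSoup: scrape a specific href whose class is repeated at least twice

python, web-scraping, beautifulsoup

I have to get data tables from various web pages, such as , including the part with the website's URL. The problem is that the href with the class "vermell_nobullet" is repeated at least twice more than I need. How can I extract the specific "vermell_nobullet" class with the website's href? My code from bs4 import BeautifulSoup import lxml import requests def parse_url(url): response = requests.get(url) content = response.content parsed_response = BeautifulSoup(content,

Views: 8 | Asked: 2021-12-23 | Votes: 0

Answer accepted

1 answer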

BeautifulSoup: getting data from classes whose names change

python, web-scraping, beautifulsoup

I'm using BeautifulSoup to scrape an HTML page where the information I need is stored in code like this: <a class=" l00_PR_lTitleonContext" href="site0.html"> Title 0 </a> <a class=" l01_PR_lTitleonContext" href="site1.html"> Title 1 </a> <a class=" l02_PR_lTitleonContext" href="site2.ht

Views: 1 | Asked: 2022-06-27 | Votes: 0

Answer accepted

2 answers
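When only a numeric part of the class name changes, one option is to pass a compiled regular expression as the `class_` filter. The sketch below assumes the class names follow the `lNN_PR_lTitleonContext` shape seen in the question; the third anchor is hypothetical.

```python
import re

from bs4 import BeautifulSoup

html = """
<a class=" l00_PR_lTitleonContext" href="site0.html"> Title 0 </a>
<a class=" l01_PR_lTitleonContext" href="site1.html"> Title 1 </a>
<a class="unrelated" href="skip.html"> Skip </a>
"""

# Assumed naming scheme: 'l', some digits, then a fixed suffix.
title_class = re.compile(r"^l\d+_PR_lTitleonContext$")

soup = BeautifulSoup(html, "html.parser")
titles = [
    (a["href"], a.get_text(strip=True))
    for a in soup.find_all("a", class_=title_class)
]
print(titles)
```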

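How do I get all the "href" values with BeautifulSoup in Python? I've tried many times, but it doesn't work.

python, beautifulsoup, python-requests, for-in-loop

How do I get all the "href" values with BeautifulSoup in Python? I've tried many times, but it always fails. Whether I use the 'soup.find' or the 'soup.find_all' method to get the 'href', it doesn't work. Python version: 3.10 !pip install requests import requests import time import pandas as pd from bs4 import BeautifulSoup productlink = [] headers = {'User-Agent':'Mozilla/5.0 (Linux; Android 6.

Views: 6 | Asked: 2021-12-04 | Votes: 0

Answer accepted

2 answers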

What regular expression string should I use with Beautiful Soup's findAll?

python, regex, web-scraping, beautifulsoup

I have links in HTML of the form <a href="/downloadsServlet?docid=abc" target="_blank">Report 1</a> <a href="/downloadsServlet?docid=ixyz" target="_blank">Fetch Report 2 </a> I can get a list of links of the above form using BeautifulSoup. My code is as follows from bs4 import BeautifulSoup html_page = urllib2.u

Views: 27 | Asked: 2017-01-20 | Votes: 2

Answer accepted

1 answer
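A sketch of filtering anchors by an href pattern and pulling out the docid value, using the two links quoted in the question plus one hypothetical unrelated link. The pattern is an assumption about what the asker wants to match.

```python
import re

from bs4 import BeautifulSoup

html = """
<a href="/downloadsServlet?docid=abc" target="_blank">Report 1</a>
<a href="/downloadsServlet?docid=ixyz" target="_blank">Fetch Report 2 </a>
<a href="/about">About</a>
"""

# The question mark is escaped because it is a regex metacharacter.
docid_pattern = re.compile(r"/downloadsServlet\?docid=(\w+)")

soup = BeautifulSoup(html, "html.parser")
doc_ids = []
for a in soup.find_all("a", href=docid_pattern):
    # Re-run the pattern on the matched href to capture the id itself.
    doc_ids.append(docid_pattern.search(a["href"]).group(1))
print(doc_ids)
```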

How do I search a web page for multiple keywords? This only takes one keyword

python

from bs4 import BeautifulSoup import time import smtplib # True by default while True: url = "https://www.google.com" browser = mechanize.Browser() browser.open(url) response = browser.response().read() soup = BeautifulSoup(response, "lxml") count = 1 if str(soup).find("English") == -1: # wait 60 seconds (c

Views: 1 | Asked: 2021-12-25 | Votes: -1

1 answer

Navigating with BeautifulSoup

python, html, beautifulsoup, html-parsing, python-requests

I'm a bit confused about how to navigate HTML with BeautifulSoup. import requests from bs4 import BeautifulSoup url = 'http://examplewebsite.com' source = requests.get(url) content = source.content soup = BeautifulSoup(source.content, "html.parser") # Now I navigate the soup for a in soup.findAll('a'):

Views: 5 | Asked: 2015-10-29 | Votes: 8

Answer accepted

2 answers

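Can't get the correct href value when crawling the SQLite website with BeautifulSoup in Python

python, html, sqlite, beautifulsoup, web-crawler

I'm trying to use BeautifulSoup to get the SQLite download links on the SQLite download page. I can see the correct href values when I inspect the page in Chrome. However, I can't get those href values in Python with the code below. import urllib.request import re from bs4 import BeautifulSoup url = "https://www.sqlite.org/download.html" data = urllib.request.urlopen(url).read() parsed_html = BeautifulSoup(data, 'html.

Views: 9 | Asked: 2022-08-18 | Votes: 0

Answer accepted

2 answers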

Regex to find a string in a list in Python 3

python, regex

How do I get base.php?id=5314 from the list? import urllib.parse import urllib.request from bs4 import BeautifulSoup url = 'http://www.fansubs.ru/search.php' values = {'Content-Type:' : 'application/x-www-form-urlencoded', 'query' : 'Boku dake ga Inai Machi' } d = {} data

Views: 1 | Asked: 2016-03-06 | Votes: 3

Answer accepted

1 answer
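A sketch of picking such an entry out of a list of strings with a regular expression. The contents of `links` are hypothetical, since the question's actual list is not shown; only `base.php?id=5314` comes from the question.

```python
import re

# Hypothetical list of href strings collected from a results page.
links = ["index.php", "base.php?id=5314", "forum/viewtopic.php?t=1"]

# Escape '.' and '?' so they match literally; \d+ accepts any numeric id.
base_pattern = re.compile(r"^base\.php\?id=\d+$")

base_links = [link for link in links if base_pattern.search(link)]
print(base_links)
```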

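How do I scrape web pages using a search term without knowing the tags/classes?

python, python-3.x, web-scraping, beautifulsoup, scrapy

I'm working on a project that uses Python 3.7 and BeautifulSoup 4 to implement a scraping solution. Note: I searched for a solution to my problem but couldn't find one, because it differs from the usual scraping approach. So please don't mark this as a duplicate! The project has two parts: we have already scraped the Google search result URLs (e.g., the top 5) for a search term. Then we have to scrape those result URLs for information related to the search term, so we don't know the actual classes/tags of those result pages. So how do we get information related to the search term from a web page without knowing the actual tags/classes? Here is what I've done so far: soup = Beautiful

Views: 0 | Asked: 2019-06-13 | Votes: 3

2 answers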

Beautiful Soup: How do I get a specific link from a list?

python, python-3.x, list, web-scraping

Using BeautifulSoup, how do I get the links from a web page, store them in a list, and then print out a particular link? This is what I have so far: from urllib.request import urlopen from bs4 import BeautifulSoup html = urlopen("https://example.com/") content = BeautifulSoup(html.read(), "html.parser") for link in content.find_all("a"): print(link.get("

Views: 48 | Asked: 2021-02-25 | Votes: 1

Answer accepted

2 answers

BeautifulSoup - parsing numeric values from a file

python, python-3.x, web-scraping, beautifulsoup

I want to parse 1 & 1999 out of the tags: ''' <li><a **href="/1/"**>|<</a></li> <li><a accesskey="p" **href="/1999/"** rel="prev">< Prev</a></li> <li><a href="//c.xkcd.com/random/comic/">Random&

Views: 52 | Asked: 2018-06-01 | Votes: 1

2 answers
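A sketch of extracting the numbers from hrefs of the form /N/. The markup below is a cleaned-up version of the fragment in the question (bold markers removed, angle brackets escaped), and the pattern assumes the number is the entire path.

```python
import re

from bs4 import BeautifulSoup

html = """
<li><a href="/1/">|&lt;</a></li>
<li><a accesskey="p" href="/1999/" rel="prev">&lt; Prev</a></li>
<li><a href="//c.xkcd.com/random/comic/">Random</a></li>
"""

# Matches hrefs that are exactly a slash, digits, and a closing slash.
number_path = re.compile(r"^/(\d+)/$")

soup = BeautifulSoup(html, "html.parser")
comic_numbers = [
    int(number_path.match(a["href"]).group(1))
    for a in soup.find_all("a", href=number_path)
]
print(comic_numbers)
```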

Beautiful Soup: Getting the contents of child nodes

python, beautifulsoup

I have the following Python code: def scrapeSite(urlToCheck): html = urllib2.urlopen(urlToCheck).read() from BeautifulSoup import BeautifulSoup soup = BeautifulSoup(html) tdtags = soup.findAll('td', { "class" : "c" }) for t in tdtags: print t.encode('latin1

Views: 1 | Asked: 2010-10-21 | Votes: 1

Answer accepted

1 answer

BeautifulSoup can't find a specific tag

python, beautifulsoup

I want to scrape stock names from Euronext. The problem is that BeautifulSoup can't find the <td>...</td> tags that hold the stock names. The page has: <td class="stocks-name sorting_1" data-order="1000MERCIS"><a href="/en/product/equities/FR0010285965-ALXP/1000mercis/almil/quotes" data-order="1000MERCIS" data-title-hover="1

Views: 29 | Asked: 2021-07-02 | Votes: 0

Answer accepted

2 answers

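Searching for images in 'a' links with BeautifulSoup in Python

python, beautifulsoup

I want to get all the <a href=''> links that contain images (jpg, png, jpeg). First, I found that I can get the links with this Beautiful Soup code for a in soup.find_all('a', href=True): print "Found the URL:", a['href'] but I get all the strings, and I only want the images. from bs4 import BeautifulSoup import requests import re url = requests.get("https://8ch.net/a/res/

Views: 0 | Asked: 2018-10-16 | Votes: 0

Answer accepted

1 answer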
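One way to keep only image links is to filter href values by file extension with a regular expression. This is a sketch under assumptions: the markup is hypothetical, and the pattern expects the extension at the very end of the href, so URLs with a query string after the file name would need a looser pattern.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical links; only the extensions matter for this example.
html = """
<a href="/a/src/1.jpg">one</a>
<a href="/a/src/2.PNG">two</a>
<a href="/a/res/3.html">three</a>
<a href="/a/src/4.jpeg">four</a>
"""

# jpe?g covers both .jpg and .jpeg; IGNORECASE also accepts .PNG or .JPG.
image_href = re.compile(r"\.(?:jpe?g|png)$", re.IGNORECASE)

soup = BeautifulSoup(html, "html.parser")
image_links = [a["href"] for a in soup.find_all("a", href=image_href)]
print(image_links)
```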

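Extracting child hrefs into a list with BeautifulSoup

python, beautifulsoup, href, urllib2

I'm learning Python and using BeautifulSoup to scrape some web pages. What I want to do is find the child 'a' of the first 'td', extract the href, and add it to the list. How and where do I add the href alongside the cell text? import urllib2 from BeautifulSoup import BeautifulSoup def listify(table): """Convert an html table to a nested list""" result = [] rows = table.findAll('t

Views: 2 | Asked: 2013-01-10 | Votes: 0

Answer accepted

1 answer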

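Need to limit BeautifulSoup href results to the first occurrence, or account for open parentheses in the href string.

python, beautifulsoup, href, limit

I only want the <a href data dissemination files in the Full Replacement Monthly NPI File section of . There are other <a href data dissemination files in the Weekly Incremental NPI Files section that I don't want. Here is the code that gets all the NPPES data dissemination files in both the monthly and weekly sections: import subprocess import re from bs4 import BeautifulSoup import requests import wget def get_urls(soup): urls = [] for a in soup.find_all('a', href=True): ul = a.find_all(text

Views: 8 | Asked: 2022-11-30 | Votes: 0

Answer accepted

3 answers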

Remove all internal JavaScript

javascript, python, python-2.7, beautifulsoup

Below is a simple piece of BeautifulSoup code containing two internal JavaScript blocks (don't blame the JavaScript, it's only for testing purposes). from bs4 import BeautifulSoup html = """ <html><head><title>The Dormouse's story</title> <script> var x = 5; var y = 6; document.getElementById("demo").innerHTML = x + y; //docu

Views: 0 | Asked: 2014-05-07 | Votes: 2

3 answers

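Getting an href by its text with Beautiful Soup

python, beautifulsoup

I'm using requests and Beautiful Soup to find all href links in a web page that contain specific text. I've managed that, but if the text is on a new line, Beautiful Soup doesn't "see" it and doesn't return that link. soup = BeautifulSoup(webpageAdress, "lxml") path = soup.findAll('a', href=True, text="Something3") print(path) Example: like this, it returns the href for the Something3 text: ... <a href="page1/somethingC.aspx">Someth

Views: 5 | Asked: 2019-04-10 | Votes: 0

Answer accepted

3 answers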
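A plain string filter is compared against the whole text node, so surrounding newlines and indentation can prevent a match. One workaround, sketched below, is to compare the stripped text instead. The helper name `hrefs_by_text` and the first anchor are hypothetical; the second anchor reuses the href from the question.

```python
from bs4 import BeautifulSoup

html = """
<a href="page1/somethingA.aspx">Something1</a>
<a href="page1/somethingC.aspx">
    Something3
</a>
"""


def hrefs_by_text(page_html, wanted):
    """Return hrefs of anchors whose text equals wanted after stripping whitespace."""
    soup = BeautifulSoup(page_html, "html.parser")
    return [
        a["href"]
        for a in soup.find_all("a", href=True)
        if a.get_text(strip=True) == wanted
    ]


something3_links = hrefs_by_text(html, "Something3")
print(something3_links)
```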

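Parsing an HTML table with Python BeautifulSoup

python, html, beautifulsoup

I'm trying to use BeautifulSoup to parse an HTML table that I uploaded, in order to get the three columns (0 to 735, 0.50 to 1.0, and 0.5 to 0.0) as lists. To explain why, I want the integers 0-735 to be keys and the decimals to be values. After reading many other posts, I came up with the following, which doesn't quite create the lists I want. All it does is display the text in the table, as shown here from bs4 import BeautifulSoup soup = BeautifulSoup(open("fide.html")) table = soup.find('table') rows = table.findAll('tr'

Views: 0 | Asked: 2013-06-23 | Votes: 5

Answer accepted

1 answer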

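Regex: match a substring but ignore occurrences inside HTML

php, regex

I need to remove a numeric code, prefixed with an underscore, from strings that may or may not also contain the same substring inside an HTML tag. Example: remove _1234 from the following string: this is my string_1234 <a href="link_1234">this is my html nested string_1234</a> I only thought of: $regex = '#\_(\d+)$#'; $name = preg_replace($regex, '', $name); But I also need to remove the part in the HREF, so I

Views: 5 | Asked: 2022-09-28 | Votes: 0

1 answer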

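BeautifulSoup find() doesn't match existing text

html, parsing, beautifulsoup

Using BeautifulSoup, I'm trying to find a <th>text</th> tag with a regular expression. But the regex doesn't match text that is confirmed to be there. soup.find('th', text=re.compile("test")) See the "black horse" output in this test case import re from bs4 import BeautifulSoup html = """ <table> <tr> <th>brown dog</th> <th><a

Views: 5 | Asked: 2020-05-13 | Votes: 1

Answer accepted

2 answers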
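A likely cause, offered as an assumption because the question's markup is truncated: a string filter is checked against the tag's `.string`, which is `None` when the tag has more than one child. The sketch below uses a hypothetical second header cell to show the difference between the string filter and a function that searches the tag's combined text.

```python
import re

from bs4 import BeautifulSoup

# The first <th> is from the question; the second is a hypothetical mixed-content cell.
html = """
<table><tr>
  <th>brown dog</th>
  <th><a href="#">black</a> horse test</th>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
pattern = re.compile("test")

# The second <th> has two children, so its .string is None and this finds nothing.
by_string = soup.find("th", string=pattern)

# Searching get_text() looks at all nested text, so the cell is found.
by_text = soup.find(lambda tag: tag.name == "th" and pattern.search(tag.get_text()))

print(by_string)
print(by_text.get_text())
```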

Using regex with Python + Beautiful Soup

python, parsing

I have an HTML page like this: <td class="subject windowbg2"> <div> <span id="msg_152617"> <a href= SOME INFO THAT I WANT </a> </span> </div> <div> <span id="msg_465412"> <a href= SOME INFO THAT I WANT</a> </span&

Views: 4 | Asked: 2014-05-23 | Votes: 1

Answer accepted

1 answer

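Beautiful Soup gets stuck without processing

python, web-scraping, beautifulsoup

I'm trying to understand BeautifulSoup, and I'm trying to find all the links on facebook.com and iterate over each of them. Below is my code. It works fine, but once it finds Linkedin.com and iterates over it, it gets stuck at some point after this URL - . When I run Linkedin.com on its own, I have no problems. Could this be a limitation of the Ubuntu operating system I'm using? import urllib2 import BeautifulSoup import re def main_process(response): print "Main process started"

Views: 1 | Asked: 2014-03-26 | Votes: 1

Answer accepted

3 answers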

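(Python 3) - Regex problem: no matches returned

python, python-3.x, regex, web-scraping, pastebin

A bit of backstory. I'm trying to scrape pastebin's archive page and get only the IDs of the pastes. IDs are 8 characters long, and an example link to a paste looks like this: "" The code I've written so far can get all the data from the tags, but it also retrieves unnecessary information. import requests import re from bs4 import BeautifulSoup def get_recent_id(): URL = requests.get('https://pastebin.com/archive', verify=False) href_regex = r"<a href=\

Views: 3 | Asked: 2021-05-27 | Votes: 0

2 answers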
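Rather than running a regex over raw HTML, one option is to let BeautifulSoup find the anchors and apply the pattern to each href. This sketch assumes a paste link is a slash followed by exactly 8 alphanumeric characters; the markup and IDs are made up, and any ordinary site path that happens to be 8 letters long would also match and would need to be excluded separately.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical archive fragment with invented paste IDs.
html = """
<a href="/archive">Archive</a>
<a href="/Ab3dEf9h">Untitled</a>
<a href="/tools">Tools</a>
<a href="/zY8xW7v6">config dump</a>
"""

# Assumed shape of a paste link: '/' plus exactly 8 letters or digits.
paste_href = re.compile(r"^/([A-Za-z0-9]{8})$")

soup = BeautifulSoup(html, "html.parser")
paste_ids = [
    paste_href.match(a["href"]).group(1)
    for a in soup.find_all("a", href=paste_href)
]
print(paste_ids)
```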

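Regular expressions in Python - scraping data from a website

python

I'm new to Python, and I'm trying to extract XML files from a website and load them into a database. I've been using the Beautiful Soup module in Python, but I can't pull in the specific XML files I want. In the site's source code it looks like this: <a href="ReportName I want 20130101.XML">ReportName.XML</a> <a href="ReportName I want 20120101.XML">ReportName.XML</a> <<a href="ReportName I do

Views: 0 | Asked: 2013-01-23 | Votes: 1

1 answer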

How to parse JavaScript content with BeautifulSoup

javascript, python, parsing, beautifulsoup, python-re

I'm having trouble parsing some variables in HTML. Here is a sample of the HTML: <script type="text/javascript"> var ASPath = "\/modules\/pm_advancedsearch4\/"; var ASSearchUrl = "https:\/\/golf-land.fr\/module\/pm_advancedsearch4\/advancedsearch4"; var as4_orderBySalesAsc = "Meilleures ventes

Views: 8 | Asked: 2022-03-08 | Votes: -1

2 answers

How to extract URLs from HTML

python, regex, beautifulsoup, urlopen

I'm new to web scraping. This is what I'm doing: from urllib.request import urlopen from bs4 import BeautifulSoup import re html = urlopen("http://chgk.tvigra.ru/letopis/?2016/2016_spr#27mar") soup = BeautifulSoup(html, "html.parser") res = soup.find_all('a', {'href': re.compile("r'\b?20\b&

Views: 3 | Asked: 2017-08-02 | Votes: 0

Answer accepted

2 answers

TypeError: expected string when using re.findall on web page text - why?

python, regex

I'm trying to learn how to screen-scrape with BeautifulSoup. from urllib import urlopen from BeautifulSoup import BeautifulSoup import re webpage = urlopen('http://feeds.feedburner.com/zenhabits').read() patFinderTitle = re.compile('<h4 class="itemtitle"><a href=(.*)</a></h4>') fin

Views: 0 | Asked: 2011-05-16 | Votes: 0

1 answer

Getting a page footer from a Chinese website with Python Beautiful Soup

python, web-scraping, beautifulsoup

I'm trying to get data from a Chinese website. I've located where it is in the HTML, but need help extracting the text. What I have so far: from bs4 import BeautifulSoup import requests page = 'http://sbj.speiyou.com/search/index/subject:/grade:12/gtype:time' r = requests.get(page) r.encoding = 'utf-8' soup = BeautifulSoup(r.text) div = soup.find('div', class

Views: 2 | Asked: 2014-04-03 | Votes: 0

Answer accepted

3 answers

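BeautifulSoup scraping results not showing

python, web-scraping, beautifulsoup, lxml.html

I'm playing with BeautifulSoup to scrape data from a website. So I decided to scrape a site listing the 100 greatest movies of all time. Here is the link to the web page: I imported the HTML from the site, and I can use Beautiful Soup on it. But when I try to get the list of 100 movie titles, I get an empty list. Below is the code I wrote. import requests from bs4 import BeautifulSoup URL = "https://www.empireonline.com/movies/features/best-movies-2/" response = requests.get(URL) top100_webp

Views: 20 | Asked: 2022-05-09 | Votes: -2

Answer accepted

1 answer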

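Regex + Beautiful Soup

python, regex, beautifulsoup

I've isolated a line of HTML with BeautifulSoup that I want to run a regex on, but I keep getting AttributeError: 'NoneType' object has no attribute 'groups'. I read another Stack Overflow question, but I can't see what I need to do to fix my version of the problem. Here is the relevant part of the code (url is provided): Update with Rob's correct regex, which still throws that attribute error: soup = BeautifulSoup(urlopen(url).read()).find("div",{"id":"page"}

Views: 0 | Asked: 2015-05-20 | Votes: 0

Answer accepted

1 answer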

My callback function and regular expression can't get Beautiful Soup to work

python, regex, web-scraping, beautifulsoup

So, I'm trying to use the following code to scrape all tags whose href attribute matches the pattern /how-to-use/a-zA-Z+ from a website. The code is as follows: import requests from bs4 import BeautifulSoup import re webpage = requests.get('https://www.talkenglish.com/vocabulary/top-1500-nouns.aspx').content soup = BeautifulSoup(webpage, "html.parser") def has_how_to_use(tag):

Views: 13 | Asked: 2021-10-27 | Votes: 1

Answer accepted

1 answer
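A sketch of a filter function that `find_all` can call for every tag. The body of `has_how_to_use` is not shown in the question, so the version below is a guess at its intent; the pattern assumes the path segment after /how-to-use/ is a run of letters, and the markup is hypothetical.

```python
import re

from bs4 import BeautifulSoup

# Assumed pattern: '/how-to-use/' followed by one or more ASCII letters.
HOW_TO_USE = re.compile(r"/how-to-use/[a-zA-Z]+")

# Hypothetical fragment; the real page structure is not shown in the question.
html = """
<a href="/how-to-use/time">time</a>
<a href="/vocabulary/top-1500-nouns.aspx">nouns</a>
<a name="anchor-without-href">no href</a>
<a href="/how-to-use/people">people</a>
"""


def has_how_to_use(tag):
    """Return True for <a> tags whose href matches the assumed pattern."""
    return (
        tag.name == "a"
        and tag.has_attr("href")
        and bool(HOW_TO_USE.search(tag["href"]))
    )


soup = BeautifulSoup(html, "html.parser")
how_to_use_links = [a["href"] for a in soup.find_all(has_how_to_use)]
print(how_to_use_links)
```

Checking `has_attr("href")` first avoids a `KeyError` on anchors that have no href at all.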

What does the error TypeError: 'NoneType' object is not iterable mean?

python

from bs4 import BeautifulSoup import requests import time urls = ['http://www.soku.com/search_playlist/q_python_orderby_1_limitdate_0?site=14&page={}&spm=a2h0k.8191403.0.00'.format(str(i)) for i in range(1,30,1)] def UUrl(urls): def Url(url): single_urls = [] t

Views: 0 | Asked: 2017-03-11 | Votes: 1

1 answer

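How to use a web crawler to open URLs and get their content

python, web-crawler

I'm trying to use a web crawler to get news content from Sports, Home, World, Business, and Technology. I have this code, which scrapes the page titles and URLs. How do I take each page's URL, open it, and get its body content? #python code import requests from bs4 import BeautifulSoup url = "https://www.aaa.com" page = requests.get(url) soup = BeautifulSoup(page.content, 'html.parser') print(soup.prettify()) headlines = sou

Views: 2 | Asked: 2021-11-30 | Votes: 0

Answer accepted

1 answer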

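How to scrape YouTube video descriptions with Beautiful Soup

python, selenium, web-scraping, beautifulsoup, youtube

I'm trying to scrape a list of YouTube videos, and I want to collect the YouTube description of each video. However, I haven't succeeded and don't understand why. Any help is much appreciated. (The YouTube video in question: ) element_titles = driver.find_elements_by_id("video-title") result = requests.get(element_titles[1].get_attribute("href")) soup = BeautifulSoup(result.content) description = str(soup.find("div

Views: 11 | Asked: 2022-05-23 | Votes: 1

Answer accepted

2 answers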

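How to extract the href attribute from HTML source

python, html, web-scraping, beautifulsoup

This is the HTML source I'm working with: <a href="/people/charles-adams" class="gridlist__link"> So what I want to do is extract the href attribute, which in this case should be "/people/charles-adams", with the BeautifulSoup module. I need this because I want to get the HTML source of a particular web page for the soup.findAll method. But I'm struggling to extract such an attribute from the web page. Can anyone help me with this? P.S.: I'm using this method to get the HTML source with the Python module BeautifulSoup: request = reque

Views: 24 | Asked: 2019-09-23 | Votes: 0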
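A minimal sketch of reading the href from anchors with that class and turning it into an absolute URL for a follow-up request. The anchor text, the second link, and the base URL `https://example.com` are hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

html = """
<a href="/people/charles-adams" class="gridlist__link">Charles Adams</a>
<a href="/about" class="nav__link">About</a>
"""

soup = BeautifulSoup(html, "html.parser")

# A tag behaves like a dictionary for its attributes, so a["href"] reads the value.
relative_links = [
    a["href"] for a in soup.find_all("a", class_="gridlist__link", href=True)
]

# Resolve against a placeholder base URL before fetching the linked page.
absolute_links = [urljoin("https://example.com", href) for href in relative_links]

print(relative_links)
print(absolute_links)
```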