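Finding text only within a BeautifulSoup element | Finding text content with BeautifulSoup | BeautifulSoup fails to find elements correctly - Tencent Cloud Developer Community

python, regex, beautifulsoup

I want to use BeautifulSoup to identify a split point in a large text document, so I wrote a regular expression to find the tag in which a specific string occurs. The problem is that it fails when the string I am searching for contains further formatting or child nodes. t1 = BeautifulSoup("<p class=\"p p8\"><strong>Question-And-Answer</strong></p>") t2 = BeautifulSoup("<p class=\"p p8\"><strong>Question

Views: 0 · Asked: 2019-02-07 · Votes: 3

Answer accepted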
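
One possible way to handle the nested-formatting case is to match the regular expression against the combined text of each candidate tag rather than against a single string node. The sketch below is only an illustration: `t1` comes from the question, while `t2` and the helper `find_split_point` are hypothetical, since the original second example is cut off.

```python
import re
from bs4 import BeautifulSoup

pattern = re.compile(r"Question-And-Answer")

# t1 is the markup quoted in the question.
t1 = BeautifulSoup(
    '<p class="p p8"><strong>Question-And-Answer</strong></p>', "html.parser"
)
# t2 is a hypothetical variant in which the heading is split across two child tags.
t2 = BeautifulSoup(
    '<p class="p p8"><strong>Question</strong><em>-And-Answer</em></p>',
    "html.parser",
)


def find_split_point(soup):
    """Return the first <p> whose combined text matches the pattern, or None."""
    return soup.find(
        lambda tag: tag.name == "p" and bool(pattern.search(tag.get_text()))
    )


match1 = find_split_point(t1)
match2 = find_split_point(t2)
print(match1)
print(match2)
```

Because `get_text()` concatenates the text of all descendants, the match no longer depends on how the heading is broken up by inline tags.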

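1 answer

How to scrape text data from the Google search information panel

python, beautifulsoup, request

I need to scrape text data from the information panel of the Google search engine. When someone searches Google for the keyword "Siemens", a small information panel appears to the right of the search results. I want to collect some text from that panel. How can I do this with requests and BeautifulSoup? Here is some code I wrote. from bs4 import BeautifulSoup as BS import requests from googlesearch import search from googleapiclient.discovery import build url = 'https://www.google.com/search?

Views: 75 · Asked: 2019-03-15 · Votes: 1

Answer accepted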

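1 answer

Extracting a child href into a list with BeautifulSoup

python, beautifulsoup, href, urllib2

I am learning Python and using BeautifulSoup to scrape some web pages. What I want to do is find the first 'a' child of a 'td', extract its href, and add it to a list. How and where do I add the href to the cell text? import urllib2 from BeautifulSoup import BeautifulSoup def listify(table): """Convert an html table to a nested list""" result = [] rows = table.findAll('t

Views: 2 · Asked: 2013-01-10 · Votes: 0

Answer accepted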

1 answer

Unable to get CSS classes from the Google search page

python, beautifulsoup, spell-checking

I am parsing a Google search, but the list I get back is empty. I want to use Google's "Did you mean?" suggestion for spell checking. import requests from bs4 import BeautifulSoup import urllib.parse text = "i an you ate goode maan" data = urllib.parse.quote_plus(text) url = 'https://translate.google.com/?source=osdd#view=home&op=translate&sl=auto&tl=en&t

Views: 28 · Asked: 2019-11-30 · Votes: 3

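1 answer

Populating a text box based on a drop-down selection

javascript

I am trying to populate a text box based on the selection in a select box. I have sample code (see below). The problem is this: if I comment out the <form> tag, the code below works; if I uncomment the <form> tag, it does not. Could you look at the code below and tell me what I am doing wrong? P.S. I need the form because I have many other form fields, such as text boxes and select drop-downs; I only need to populate one text box for one selection. The code is as follows: <html> <script type="text/javascript"> function test() { var sel

Views: 3 · Asked: 2014-04-11 · Votes: 2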

1 answer

How to avoid the NavigableString error in BeautifulSoup and find the text of an href?

beautifulsoup

Here is what I have: import requests from bs4 import BeautifulSoup from urllib.parse import urljoin url = "http://python.beispiel.programmierenlernen.io/index.php" doc = requests.get(url).content soup = BeautifulSoup(doc, "html.parser") for i in soup.find("div", {"class":"

Views: 6 · Asked: 2019-11-10 · Votes: 1

Answer accepted

3 answers

Python Selenium cannot see the data

javascript, python, selenium

OK, I am new to programming with Python and HTML/JavaScript, but so far I have managed to scrape a site or two. However, I have come across this website and it is driving me crazy. My code is: #import libraries from urllib.request import urlopen as ureq from bs4 import BeautifulSoup as soup from selenium import webdriver from selenium.webdriver.support.ui import WebDriverWait from selenium.webdriver.support

Views: 0 · Asked: 2017-11-18 · Votes: 2

2 answers

A suitable JavaScript parser to use with urlopen

python, python-2.7, beautifulsoup

I am trying the following: from urllib2 import urlopen from BeautifulSoup import BeautifulSoup url = 'http://search.wcad.org/Property-Detail?PropertyQuickRefID=R000017&PartyQuickRefID=O0532572' soup = BeautifulSoup(urlopen(url).read()) print soup The print statement shows a very complex text structure from which it is hard to extract variables. What is a better way to extract a variable such as Legal Description

Views: 0 · Asked: 2017-05-24 · Votes: 0

3 answers

Python web scraping fails without any warning

python, web-scraping, beautifulsoup, python-requests

I am trying to scrape some text from web pages and save it to text files with the following code (I am opening the links from a text file named links.txt): import requests import csv import random import string import re from bs4 import BeautifulSoup #Create random string of specific length def randStr(chars = string.ascii_uppercase + string.digits, N=10): return ''.join(ra

Views: 44 · Asked: 2021-08-26 · Votes: 1

Answer accepted

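2 answers

Unable to extract text from an HTML page in Python

python, beautifulsoup, html-parsing

I am very new to web scraping. I read about BeautifulSoup and tried to use it, but I cannot extract the text with the given class name "company-desc-and-sort-container". I cannot even extract the title from the HTML page. Here is the code I tried: from BeautifulSoup import BeautifulSoup import requests url= 'http://fortune.com/best-companies/' r = requests.get(url) soup = BeautifulSoup(r.text) #print soup.prettify()[0:10

Views: 5 · Asked: 2016-12-20 · Votes: 1

Answer accepted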

1 answer

Iterating over the "class" attribute in HTML in Python?

python, html, beautifulsoup

I have the HTML string of a website. Here is part of it. <p class="news-body"> <a href="/ci/content/player/45568.html" target="new">Paul Harris,</a> the South African spinner, is to retire at the end of the season, bringing to an end a 14-year first-class career. </p> <p class=&#

Views: 1 · Asked: 2013-01-10 · Votes: 1

2 answers

BeautifulSoup returns "None"

python, html, beautifulsoup

I am trying to get the text of an h2 tag, but it shows me 'None'. The tag does exist. I tried changing the parser for page.content from 'html.parser' to lxml and so on. It still does not work. from bs4 import BeautifulSoup import requests headers = {"User-agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36"} url = 'http://ww

Views: 0 · Asked: 2019-07-21 · Votes: 1

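2 answers

Extracting the text node inside a tag that has child elements in beautifulsoup4

python, web-scraping, beautifulsoup

The HTML I am parsing and scraping contains the following code: <li> <span> 929</span> Serve Returned </li> How can I use BeautifulSoup to extract only the text node of the <li> (in this case "Serve Returned")? .string does not work because the <li> has a child element, and .text also returns the text inside the <span>.

Views: 2 · Asked: 2015-04-23 · Votes: 6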
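
A minimal sketch of one way to do this, assuming the `html.parser` backend: restrict the search to the direct children of the `<li>` and keep only string nodes. The variable names are illustrative.

```python
from bs4 import BeautifulSoup

html = "<li> <span> 929</span> Serve Returned </li>"
li = BeautifulSoup(html, "html.parser").li

# recursive=False limits the search to direct children of <li>;
# string=True keeps only text nodes, so the text inside <span> is skipped.
direct_strings = li.find_all(string=True, recursive=False)
direct_text = "".join(direct_strings).strip()
print(direct_text)
```

The text inside the `<span>` is a grandchild of the `<li>`, which is why it is left out.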

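1 answer

Selenium - XPath - searching for an element by innerHTML

html, python-2.7, selenium, xpath, beautifulsoup

I am learning Selenium and have a good understanding of XPath. One problem I have run into is that a web page contains an element I want to select whose id and class are generated dynamically. I have tried the following: code = driver.find_element_by_xpath("//*[contains(@text='someUniqueString')]") However, the element has no text. Instead, it is a <code> element containing JSON. <code style="display: none" id="something-crazy-dynamic

Views: 0 · Asked: 2017-03-14 · Votes: 11

Answer accepted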

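2 answers

Printing the last <td> in Beautiful Soup

python, beautifulsoup, html-table, html-parsing

I have to read from a complex HTML document in which the tables have no ID and each table has an undefined number of tr tags. I want to print the text in the td cells of the last <tr> tag. While parsing the tree, I could not find anything that prints the last child. I want to print 4, 4.1, 4.2 <table border=0 bgcolor=#000000 cellspacing=1 width="100%" <tr bgcolor="#FFFFFF"> <td>1</td> <td>1.1</td> <td>1.2</td&

Views: 5 · Asked: 2014-03-28 · Votes: 2

Answer accepted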
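
As a rough sketch of the idea, the last row can be taken by indexing the list of `<tr>` tags with `[-1]`. The table below is a cleaned-up, hypothetical stand-in for the truncated markup in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical, well-formed stand-in for the table in the question.
html = """
<table border="0" cellspacing="1" width="100%">
  <tr bgcolor="#FFFFFF"><td>1</td><td>1.1</td><td>1.2</td></tr>
  <tr bgcolor="#FFFFFF"><td>4</td><td>4.1</td><td>4.2</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all returns a list-like ResultSet, so [-1] selects the last <tr>.
last_row = soup.find("table").find_all("tr")[-1]
cells = [td.get_text(strip=True) for td in last_row.find_all("td")]
print(", ".join(cells))
```

If the document has several tables, the same indexing can be applied per table inside a loop over `soup.find_all("table")`.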

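2 answers

How to skip the first x <p> tags in a div and print the rest?

python, web-scraping

I have been learning Python for the past few days. Today I came to a topic called web scraping. I am trying to scrape all the p tags in a div except the first 3. Since the p tags have no class or id, I cannot find a way to exclude them. My code: from bs4 import BeautifulSoup data = '''<div class="one"> <p style="color:red">Dummy Text</p> <p style="color:red">Unwanted Text</p>

Views: 0 · Asked: 2021-08-25 · Votes: 2

Answer accepted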
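
Since `find_all` returns a list-like result, one simple approach is to slice off the first three entries. In the sketch below the first two paragraphs come from the question; the remaining ones are invented placeholders because the original data is cut off.

```python
from bs4 import BeautifulSoup

# The last three paragraphs are hypothetical; the question's data is truncated.
data = '''<div class="one">
<p style="color:red">Dummy Text</p>
<p style="color:red">Unwanted Text</p>
<p style="color:red">Also Unwanted</p>
<p>Keep 1</p>
<p>Keep 2</p>
</div>'''
soup = BeautifulSoup(data, "html.parser")

paragraphs = soup.find("div", class_="one").find_all("p")
# Slicing from index 3 drops the first three <p> tags.
wanted = [p.get_text() for p in paragraphs[3:]]
print(wanted)
```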

3 answers

Finding the string index of a tag in BeautifulSoup

python, html, string, beautifulsoup

Does BeautifulSoup provide a way to get the string index of a tag, or of its text, within the HTML string it came from? For example: from bs4 import BeautifulSoup html_doc = """ <html><head><title>The Dormouse's story</title></head> <body> <p class="title"><b>The Dormouse's story</b></p>

Views: 0 · Asked: 2017-06-15 · Votes: 4

1 answer

Getting text from a <span> with Beautiful Soup

python-3.x, beautifulsoup, html-parsing

I want to get the text from a span tag, but I have run into a problem. I wrote this, import bs4 as bs import urllib.request page = urllib.request.urlopen('http://www.accuweather.com/en/az/baku/27103/current-weather/27103').read() soup = bs.BeautifulSoup(page, 'html.parser') print(soup.find_all('li', class_='wind')) It

Views: 4 · Asked: 2016-11-07 · Votes: 0

Answer accepted

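1 answer

Extracting an HTML table with Beautiful Soup returns duplicate elements

python, html, python-3.x, beautifulsoup

This is my second attempt at web scraping, and I have run into a strange error. The end goal is to scrape the table and put each row into an SQL table, but this is a 90s-style nested table with no divs and no classes. I have read about using the structure, but I do not know how. I have been wondering whether I could start from a certain row as the "structure", but that has its limits too. The current challenge is that this code picks up duplicate elements before moving on to the next row, and ignores other elements. import bs4 import urllib from urllib.request import urlopen as uReq from bs4 import BeautifulSoup as soup my_url = 'http://www.tex

Views: 1 · Asked: 2018-09-07 · Votes: 0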

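3 answers

Finding the names and codes of all airports

python, beautifulsoup

I am struggling to scrape the data to get the text I need. I want to find the line for Aberdeen and all the lines after it that contain airport information. Here is a picture of the HTML hierarchy: I tried to locate the text elements in class "i1" with the following code: import requests from bs4 import BeautifulSoup page = requests.get('http://www.airportcodes.org/') soup = BeautifulSoup(page.text, 'html.parser') table = soup.find('div',attrs={"

Views: 2 · Asked: 2020-06-21 · Votes: 1

Answer accepted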

19 answers

How to find elements by class

python, html, web-scraping, beautifulsoup

I am having trouble using BeautifulSoup to parse HTML elements with a "class" attribute. The code looks like this soup = BeautifulSoup(sdata) mydivs = soup.findAll('div') for div in mydivs: if (div["class"] == "stylelistrow"): print div I get an error on the same line after the script finishes. File "./beautifulcoding.py", line 130, in getlanguage

Views: 6 · Asked: 2011-02-18 · Votes: 532

Answer accepted
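
A likely cause of the error is that `div["class"]` raises a `KeyError` for any `<div>` without a class attribute. A minimal sketch of the usual bs4 approach is to let `find_all` filter by class instead; the `sdata` string here is a made-up sample, not the asker's page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the asker's sdata.
sdata = (
    '<div class="stylelistrow">row 1</div>'
    "<div>no class attribute</div>"
    '<div class="stylelistrow other">row 2</div>'
)
soup = BeautifulSoup(sdata, "html.parser")

# class_ matches tags that have "stylelistrow" among their CSS classes,
# and silently skips tags that have no class attribute at all.
mydivs = soup.find_all("div", class_="stylelistrow")
texts = [div.get_text() for div in mydivs]
print(texts)
```

The trailing underscore in `class_` is needed because `class` is a reserved word in Python.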

3 answers

Finding text in a web page

python, html

I am using this code to connect to supremenewyork and find out whether an item is in stock or sold out. The search does not seem to work import requests from bs4 import BeautifulSoup url = 'https://www.supremenewyork.com/shop/jackets/xqfitokcd/sqt46dyvb' res = requests.get(url) html_page = res.content soup = BeautifulSoup(html_page, 'html.parser') text = soup.find_all(

Views: 2 · Asked: 2020-04-25 · Votes: 1

Answer accepted

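2 answers

Parsing HTML: why must this document be parsed as text rather than as tags?

python, html, python-3.x, beautifulsoup

I am using a Python module that scrapes a site, and I noticed in the code below that it handles different tables differently: def player_stats(request, stat, numeric=False, s_index=False): """ """ supported_tables = ["totals", "per_minute", "per_poss", "advanced", "playoffs_per_

Views: 3 · Asked: 2021-01-14 · Votes: 0

Answer accepted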

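1 answer

Text extraction: tried every method and still stuck.

python, beautifulsoup, webpage, extraction, persian

I want to extract some text from a web page. I searched Stack Overflow (and other sites) for a suitable method. I tried HTML2TEXT, BeautifulSoup, NLTK, and some other manual methods, for example: HTML2TEXT works offline (on saved pages), but I need to do this online. BS4 does not work properly with Unicode (my page uses UTF-8 Persian encoding) and does not extract the text. It also returns HTML tags and code. I only need the rendered text. NLTK cannot handle my Persian text. Even trying to open the page with urllib.request.urlopen produces some errors. So, as you can see, I am stuck after trying several methods.

Views: 3 · Asked: 2015-01-16 · Votes: 0

Answer accepted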

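3 answers

Scraping text in h3 and div tags with BeautifulSoup and Python

python, html, selenium, beautifulsoup, web-crawler

I have no experience with Python, BeautifulSoup, Selenium, and so on, but I am keen to scrape data from a website and store it as a CSV file. A single sample of the data I need is coded as follows (one row of data). <div class="box effect"> <div class="row"> <div class="col-lg-10"> <h3>HEADING</h3> <div><i class="fa user"></i>

Views: 0 · Asked: 2017-10-25 · Votes: 7

Answer accepted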

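2 answers

With BeautifulSoup, how do I get the text of a specific selector only, without the text of its children?

python, web-scraping, beautifulsoup, html-parsing

I do not know how to write BeautifulSoup code so that it gives only the text in the selected tag. I get more, such as the text of its child(ren)! For example: from bs4 import BeautifulSoup soup = BeautifulSoup('<div id="left"><ul><li>"I want this text"<a href="someurl.com"> I don\'t want this text</a><p>I don\'t want this e

Views: 4 · Asked: 2016-09-28 · Votes: 3

Answer accepted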

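3 answers

How to extract the text inside a tag together with its inner tags?

python, beautifulsoup

I want to parse an HTML page with BeautifulSoup. I want to extract the text inside a tag without removing the HTML tags. For example, sample input: <a class="fl" href="https://stackoverflow.com/questio..."> Angular2 <b>Router link not working</b> </a> Sample output: 'Angular2 <b>Router link not working</b>' I tried this: from bs4 import

Views: 10 · Asked: 2019-10-11 · Votes: 2

Answer accepted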
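
One way to keep the inner markup, sketched here under the assumption that bs4's `Tag.decode_contents()` is acceptable for the task, is to serialize the children of the `<a>` tag rather than calling `get_text()`. The `href` value is a placeholder because the URL in the question is elided.

```python
from bs4 import BeautifulSoup

# The href is a placeholder; the URL in the question is abbreviated.
html = (
    '<a class="fl" href="https://example.com/placeholder"> Angular2 '
    "<b>Router link not working</b> </a>"
)
soup = BeautifulSoup(html, "html.parser")

link = soup.find("a", class_="fl")
# decode_contents() renders the tag's children as markup, without the <a> itself.
inner_html = link.decode_contents().strip()
print(inner_html)
```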

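1 answer

How to get the href from an <a> when searching by text with Beautiful Soup

python, beautifulsoup

I am using Selenium and BeautifulSoup to extract data. The page is paginated. I know this link exists somewhere on the page: <a href="/DP/changeQueryPageAction.do?pager.offset=20">[ Next > ]</a> The URL is at a random position on the page, so what I need to do is find the text and extract the href. How do I get bs4 to find the text and give me the href? Thanks

Views: 11 · Asked: 2019-02-24 · Votes: 0

Answer accepted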
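
A small sketch of searching by link text: pass a compiled regular expression as the `string` argument of `find`, then read the `href` attribute of the matched tag. The surrounding `<p>` is invented padding; only the `<a>` tag comes from the question.

```python
import re
from bs4 import BeautifulSoup

# Only the <a> tag is from the question; the <p> is hypothetical filler.
html = (
    "<p>Results 1 to 20</p>"
    '<a href="/DP/changeQueryPageAction.do?pager.offset=20">[ Next &gt; ]</a>'
)
soup = BeautifulSoup(html, "html.parser")

# string=... matches <a> tags whose only child is text matching the pattern.
link = soup.find("a", string=re.compile(r"Next"))
href = link["href"] if link is not None else None
print(href)
```

Checking for `None` before indexing avoids a `TypeError` on pages that have no "Next" link, such as the last page of results.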

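3 answers

jQuery does not work for setting sibling td text in a table

jquery

Hi, I have the following table data: <tr> <td><input type="text" data-usage="payment" /></td> <td><input type="text" data-usage="payment" /></td> <td data-usage="amount"></td> </tr> I then try to make it so that when the user leaves a text box, data-usage="

Views: 4 · Asked: 2013-07-09 · Votes: 1

Answer accepted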

3 answers

Getting data from tags (BeautifulSoup)

python, beautifulsoup

In short: I have a script that walks through the page elements and returns data. But I want it to return the data in order rather than per element. import argparse, os, socket, urllib2, re from bs4 import BeautifulSoup pge = urllib2.urlopen("").read() src = BeautifulSoup(pge) body = src.findAll('body') el = body[0].findChildren() for s in el: cname = s.get('class')

Views: 0 · Asked: 2014-01-21 · Votes: 1

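1 answer

Web scraping in Python with a Google Chrome extension

javascript, python

Hi, I am new to Python and I am scraping a web page. I am using the Google Chrome Developer extension to identify the classes of the objects I want to scrape. However, my code returns an empty result array, even though the screenshot clearly shows these strings in the HTML code. import requests from bs4 import BeautifulSoup url = 'http://www.momondo.de/flightsearch/?Search=true&TripType=2&SegNo=2&SO0=BOS&SD0=LON&SDP0=07-09-2016&SO

Views: 0 · Asked: 2016-08-23 · Votes: 0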

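1 answer

How should I scrape a website's text that is represented by a 'p' tag?

python, web-scraping

I am new to Python and am practicing web scraping by extracting data. I currently face two problems: How do I scrape the text represented by the tag? It is one of many on the page. For example, the first one is just before the author's name. The CSV file I export contains only the headings and not the text. Why? How do I fix this? Here is the code; thank you very much for your help. import requests import pandas as pd from bs4 import BeautifulSoup from pandas import DataFrame import csv import re f = open ('nprtest1.csv', '

Views: 3 · Asked: 2020-01-06 · Votes: 0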

1 answer

Extracting text from a `p` inside a `div` with BeautifulSoup

python, python-3.x, web-scraping, beautifulsoup

I am very new to web scraping with Python, and I am having a hard time extracting nested text from HTML (a p inside a div, to be exact). Here is what I have so far: from bs4 import BeautifulSoup import urllib url = urllib.urlopen('http://meinparlament.diepresse.com/') content = url.read() soup = BeautifulSoup(content, 'lxml') This works fine: links=soup.findAll('a',{'title&#

Views: 3 · Asked: 2016-04-20 · Votes: 9

Answer accepted

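1 answer

Changing the Google Translator text

jquery, css, google-translator-toolkit

Can you change the text on the Google Translator widget so that it shows "Translator" instead of "Select Language"? I tried: $('#google_translate_element').text("Translator"); $('#google_translate_element span').text("Translator");

Views: 2 · Asked: 2015-12-04 · Votes: 0

Answer accepted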

2 answers

Handling the child nodes of an element obtained via XPath with lxml.

python, xpath, lxml

I tried the following sample code on my PC from bs4 import BeautifulSoup from lxml import etree, html import requests URL = "https://en.wikipedia.org/wiki/Nike,_Inc." HEADERS = ({'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 \ (

Views: 7 · Asked: 2022-11-02 · Votes: 0

1 answer

Python `bs4.BeautifulSoup.get_text()` - getting text from the immediate level only

python, beautifulsoup

Suppose I have an HTML snippet and I want to run get_text on the immediate level only: from bs4 import BeautifulSoup s = "<div><p><strong>College Type:</strong> \r\nPrivate Un-aided\r\n</p></div>" soup = BeautifulSoup(s, 'lxml') print soup.find('p').get_text() which prints: College Type: Privat

Views: 2 · Asked: 2016-10-20 · Votes: 2
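
`get_text()` always descends into child tags, so one alternative, sketched below, is to iterate over `.children` and keep only the nodes that are `NavigableString` instances. The sketch uses `html.parser` instead of `lxml` so that it has no extra dependency; that substitution is an assumption, not part of the question.

```python
from bs4 import BeautifulSoup, NavigableString

s = "<div><p><strong>College Type:</strong> \r\nPrivate Un-aided\r\n</p></div>"
p = BeautifulSoup(s, "html.parser").find("p")

# .children yields only direct children; tags such as <strong> are filtered out,
# leaving the text that belongs to <p> itself.
own_text = "".join(
    child for child in p.children if isinstance(child, NavigableString)
).strip()
print(own_text)
```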

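1 answer

BeautifulSoup login - how to get the CSRF field with a specific attribute and value

python, http, cookies, beautifulsoup

I use the following script to authenticate a login to LinkedIn and then scrape the HTML with Beautiful Soup. The login authentication works (I can see my account information), but when I try to load a page I get an "fs.config({"failureRedirect})" error. import cookielib import os import urllib import urllib2 import re import string import sys from bs4 import BeautifulSoup username = "MY USERNAME" password =

Views: 3 · Asked: 2015-02-03 · Votes: 1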

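2 answers

How to remove all unnecessary tags and symbols from an HTML file?

python, html, parsing, beautifulsoup, xbrl

I am trying to use Python's BeautifulSoup or HTMLParser to extract text-only information from 10-K reports (for example, a company's proxy report) on the SEC's EDGAR system. However, the parsers I use do not seem to cope well with the 'txt' format files, which include a large portion of meaningless symbols and tags along with some XBRL information that is not needed at all. When I apply the parser directly to the 'htm' format files, however, it seems to work relatively well. """for Python 3, from urllib.request import urlopen""" from urllib2 import urlo

Views: 2 · Asked: 2017-05-09 · Votes: 2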

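1 answer

BeautifulSoup4 returns an empty list for no apparent reason

python, beautifulsoup, python-3.4

I have some data in a txt file, and I am trying to find some specific words in this file. import re from bs4 import BeautifulSoup with open ("myfile.txt") as f: soup = BeautifulSoup(f) print (soup.find_all("DLC")) The file contains at least 5 occurrences of DLC, but the output is an empty list. I changed soup = BeautifulSoup(f) to soup = BeautifulSoup(f),"html.parser", but it did not help. Why does it return an empty list

Views: 3 · Asked: 2016-05-04 · Votes: 0

Answer accepted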

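3 answers

How to understand "recursive" in BeautifulSoup with Python

python, html, beautifulsoup, web-crawler

I am working with the "recursive" option of BeautifulSoup in Python. I have read the official documentation and many questions, but I still do not understand it. from bs4 import BeautifulSoup s = "<div>C<p><strong>A</strong>B</p></div>" soup = BeautifulSoup(s, 'html.parser') print(soup.find("p", recursive=False)) gives None. Is that because we cannot find <

Views: 2 · Asked: 2021-11-08 · Votes: 1

Answer accepted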
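
The behaviour can be illustrated with the snippet from the question: with `recursive=False`, `find` inspects only the direct children of the object it is called on. The variable names below are illustrative.

```python
from bs4 import BeautifulSoup

s = "<div>C<p><strong>A</strong>B</p></div>"
soup = BeautifulSoup(s, "html.parser")

# The only direct child of the document is <div>, so no <p> is found here.
top_level = soup.find("p", recursive=False)

# <p> is a direct child of <div>, so the same search from <div> succeeds.
inside_div = soup.div.find("p", recursive=False)

# <strong> is a grandchild of <div>, so a non-recursive search from <div> misses it.
strong_from_div = soup.div.find("strong", recursive=False)

print(top_level, inside_div, strong_from_div)
```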

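2 answers

find_all() returns only the first item of the list

python, beautifulsoup

I am having some trouble with BeautifulSoup and the find_all() method. I am trying to get the text between all the p tags, but it returns only the first element of the list. In fact the list has only one item. Why does find_all() return only one item? Here is part of the code I want to extract: <div class="post-content"> <p>If you’re not familiar with Deep Image, it’s an amazing tool which allows you to increase the size of an image and upgrade its qual

Views: 0 · Asked: 2019-08-01 · Votes: 1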

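3 answers

How to get the text between two sets of tags in Python

python, html, url, beautifulsoup, tags

I am trying to get the text between tags, and also between sets of tags. I have tried, but I am not getting what I want. Can anyone help? I would really appreciate it. text = ''' <b>Doc Type: </b>AABB <br /> <b>Doc No: </b>BBBBF <br /> <b>System No: </b>aaa bbb <br /> <b>VCode: </b>040000033 <br /> <b>G Code: </b&

Views: 22 · Asked: 2022-03-17 · Votes: 1

Answer accepted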
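
For label/value markup like this, one possible sketch is to walk the `<b>` tags and read the text node that immediately follows each one via `next_sibling`. Only the complete rows from the question are used; the dictionary name `fields` is illustrative.

```python
from bs4 import BeautifulSoup

# Only the rows that appear in full in the question are included.
text = """
<b>Doc Type: </b>AABB <br />
<b>Doc No: </b>BBBBF <br />
<b>System No: </b>aaa bbb <br />
<b>VCode: </b>040000033 <br />
"""
soup = BeautifulSoup(text, "html.parser")

fields = {}
for label in soup.find_all("b"):
    value = label.next_sibling  # the node directly after </b>
    if isinstance(value, str):  # skip cases where a tag follows instead of text
        key = label.get_text(strip=True).rstrip(":")
        fields[key] = value.strip()

print(fields)
```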

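1 answer

Iterating over all child tags and strings in an XML tag in Python without specifying the child tag names.

python, xml, parsing

My question is a follow-up to an earlier one, but I do not intend to use the answer section of that question. If I have the following part of an XML file: <eligibility> <criteria> <textblock> Inclusion Criteria: - women undergoing cesarean section for any indication - literate in german language Exclusion Criteria: - histor

Views: 0 · Asked: 2018-02-01 · Votes: 0

Answer accepted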

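1 answer

Scraping a WSJ article and retrieving only the text

python, web-scraping

I am trying to scrape the text of an article from The Wall Street Journal (in fact I need several articles, but for now I am just trying to scrape the text of this one WSJ article). I am on Python 3.x and use the code below: import requests from bs4 import BeautifulSoup url = 'https://www.wsj.com/articles/SB120584797987545053' headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firef

Views: 6 · Asked: 2021-12-14 · Votes: 0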

2 answers

BeautifulSoup: finding the text in an <img alt> attribute

python, regex, beautifulsoup

Here is what I get from this line of Python code listm = soup.findAll('td',{'class':'thumb'}) When I iterate over listm, here is an example of one item... <a href="/property-search/property-details/1021206?StrtNum=1507"><img alt="1507 BOSTWICK LN" src="/res/slir/w75-h57-c4:3/propertyimages/20120904/BB/DSCN073

Views: 0 · Asked: 2013-01-28 · Votes: 2

Answer accepted
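
Attribute values can be read with dictionary-style access on a tag, so a rough sketch is to find the `<img>` inside each matching `<td>` and read its `alt`. The `src` value below is a placeholder because the original path is cut off.

```python
from bs4 import BeautifulSoup

# The src value is a placeholder; the path in the question is truncated.
html = (
    '<table><tr><td class="thumb">'
    '<a href="/property-search/property-details/1021206?StrtNum=1507">'
    '<img alt="1507 BOSTWICK LN" src="placeholder.jpg"></a>'
    "</td></tr></table>"
)
soup = BeautifulSoup(html, "html.parser")

listm = soup.find_all("td", {"class": "thumb"})
# alt=True keeps only <img> tags that actually carry an alt attribute.
alts = [img["alt"] for td in listm for img in td.find_all("img", alt=True)]
print(alts)
```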

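2 answers

BeautifulSoup does not return an element's children

python, beautifulsoup

I am new to web scraping and have been using BeautifulSoup to scrape numbers from a gambling site. I am trying to get the text of a certain element, but nothing is returned. Here is my code: r=requests.get('https://roobet.com/crash') soup = bs4.BeautifulSoup(r.text,'lxml') crash = soup.find('div', class_='CrashHistory_2YtPR') print(crash) When I copy the contents of the soup into Notepad and try to find the element with Ctrl+F, I cannot find it. What I am looking for

Views: 0 · Asked: 2020-04-18 · Votes: 0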

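3 answers

jQuery: calculating and replacing the value for each closest element

jquery

I am having some trouble replacing a calculated value. In this function we try to use the value of a hidden text box to calculate a discount and replace the original price with the new discounted value. I can get the discount and the value, but it does not replace the value in the div. I think my "closest" is not working... Any thoughts on what I am doing wrong here? $(document).ready(function() { $("#cal_discount").click(function() { $(".price_val").each(function() { var Percent = $('#percent_dis'

Views: 0 · Asked: 2012-04-28 · Votes: 1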

1 answer

Parsing with BeautifulSoup in Python?

python, beautifulsoup

from BeautifulSoup import BeautifulSoup soup = BeautifulSoup('http://arithmetic.zetamac.com/game?key=96823302') problem = soup.findAll('problem') print(problem) On the web page, problem is text, but that is not what gets printed. What is the problem here?

Views: 6 · Asked: 2016-10-24 · Votes: 0

1 answer

Targeting a ResultSet with find_all

python, python-2.7, beautifulsoup

I am trying to use find_all to get whatever is inside a 'span' tag that is also a direct child of an 'a' tag and has the attribute itemprop="foo". I am using bs4. See below. text = '<a><span itemprop="foo"> TEXT I WANT </span></a> \ <label><span itemprop="foo"> DO NOT WANT </span></label> \ <a><span i

Views: 1 · Asked: 2016-02-29 · Votes: 1

Answer accepted
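
The "direct child" constraint maps naturally onto the CSS child combinator, so one option, sketched here rather than taken from the accepted answer, is `select` with `a > span[itemprop="foo"]`. The first two fragments come from the question; the third, nested one is hypothetical and is there to show that deeper descendants are excluded.

```python
from bs4 import BeautifulSoup

text = (
    '<a><span itemprop="foo"> TEXT I WANT </span></a>'
    '<label><span itemprop="foo"> DO NOT WANT </span></label>'
    # Hypothetical extra case: the span is a grandchild of <a>, not a child.
    '<a><div><span itemprop="foo"> NESTED, NOT WANTED </span></div></a>'
)
soup = BeautifulSoup(text, "html.parser")

# "a > span" matches only spans whose parent is an <a> tag.
spans = soup.select('a > span[itemprop="foo"]')
wanted = [span.get_text(strip=True) for span in spans]
print(wanted)
```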

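1 answer

Python: parsing text between keywords

python, regex, web-scraping, beautifulsoup

I am trying to parse the text on a certain type of web page with BeautifulSoup; the code is as follows: import urllib import re html = urllib.urlopen('http://english.hani.co.kr/arti/english_edition/e_national/714507.html').read() content= str(soup.find("div", class_="article-contents")) So my goal is to parse out at least the first sentence or first few sentences of the first paragraph. Since the paragraphs are not surrounded by <p> tags, so far

Views: 0 · Asked: 2015-10-28 · Votes: 3

Answer accepted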