<a href="http://www.crummy.com/software/BeautifulSoup/bs4/doc/" rel="nofollow"><code>BeautifulSoup</code></a> is <a href="http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser" rel="nofollow">very lenient</a> about the HTML it parses, so you can use it on chunks or fragments of HTML too:

<pre><code># -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

data = u"""
&lt;div id="left-stack"&gt;
 &lt;span&gt;View In iTunes&lt;/span&gt;&lt;/a&gt;
 &lt;span class="price"&gt;£19.99&lt;/span&gt;
 &lt;ul class="list"&gt;
 &lt;li&gt;HD Version&lt;/li&gt;
"""

soup = BeautifulSoup(data, "html.parser")
# .text is "£19.99"; [1:] drops the leading pound sign
print(soup.find('span', class_='price').text[1:])
</code></pre>

Prints:

<pre><code>19.99
</code></pre>
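If BeautifulSoup isn't available, the standard library's <code>html.parser</code> module is also tolerant of fragments like this one (unclosed <code>&lt;div&gt;</code> and <code>&lt;ul&gt;</code>, a stray <code>&lt;/a&gt;</code>), because it reports tags as events and never enforces nesting. A minimal sketch that swaps in <code>html.parser.HTMLParser</code> for BeautifulSoup, assuming the price is the text of the first <code>&lt;span class="price"&gt;</code>; <code>PriceParser</code> is a hypothetical helper, not part of either library:

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Hypothetical helper: collects the text of the first <span class="price">."""

    def __init__(self):
        super().__init__()
        self.in_price = False  # currently inside the price <span>?
        self.price = None      # text of the first price <span>, once seen

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "price" in classes and self.price is None:
            self.in_price = True
            self.price = ""

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_price = False

    def handle_data(self, data):
        if self.in_price:
            self.price += data


# The fragment from the question: unclosed <div>/<ul> and a stray </a>
data = """
<div id="left-stack">
 <span>View In iTunes</span></a>
 <span class="price">£19.99</span>
 <ul class="list">
 <li>HD Version</li>
"""

parser = PriceParser()
parser.feed(data)
parser.close()
print(parser.price[1:])  # 19.99
```

This is an event-based parser, so there is no tree to search; the state flags do the job that <code>find()</code> does in BeautifulSoup.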

You've asked for a regular expression here, but it's not the right tool for parsing HTML. Use <a href="http://www.crummy.com/software/BeautifulSoup/bs4/doc/" rel="nofollow">BeautifulSoup</a> for this.

<pre><code>&gt;&gt;&gt; from bs4 import BeautifulSoup
&gt;&gt;&gt; html = '''
&lt;div id="left-stack"&gt; 
 &lt;span&gt;View In iTunes&lt;/span&gt;&lt;/a&gt;
 &lt;span class="price"&gt;£19.99&lt;/span&gt;
 &lt;ul class="list"&gt;
 &lt;li&gt;HD Version&lt;/li&gt;'''
&gt;&gt;&gt; soup = BeautifulSoup(html, "html.parser")
&gt;&gt;&gt; val = soup.find('span', {'class': 'price'}).text
&gt;&gt;&gt; print(val[1:])
19.99
</code></pre>
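The slice <code>val[1:]</code> assumes the price always has exactly one character (the currency sign) before the digits. A slightly more defensive sketch using only the standard library: the hypothetical <code>parse_price</code> helper pulls out the first number and returns a <code>Decimal</code>, assuming <code>.</code> is the decimal separator and <code>,</code> only groups thousands.

```python
import re
from decimal import Decimal


def parse_price(text):
    """Hypothetical helper: turn a price string such as '£19.99' into a Decimal.

    Assumes '.' is the decimal separator and ',' (if present) only groups thousands.
    """
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        raise ValueError("no price found in %r" % (text,))
    # Drop thousands separators before converting
    return Decimal(match.group().replace(",", ""))


print(parse_price("£19.99"))     # 19.99
print(parse_price("£1,299.00"))  # 1299.00
```

With the answer above, that would be <code>parse_price(val)</code> instead of <code>val[1:]</code>; using <code>Decimal</code> rather than <code>float</code> keeps the pence exact.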

You can still parse this with <code>BeautifulSoup</code>; you don't need the full HTML document:

<pre><code>from bs4 import BeautifulSoup
html="""
&lt;div id="left-stack"&gt;
 &lt;span&gt;View In iTunes&lt;/span&gt;&lt;/a&gt;
 &lt;span class="price"&gt;£19.99&lt;/span&gt;
 &lt;ul class="list"&gt;
 &lt;li&gt;HD Version&lt;/li&gt;
"""

soup = BeautifulSoup(html, "html.parser")
sp = soup.find(attrs={"class": "price"})  # first element of any tag with class="price"
print(sp.text[1:])
# prints: 19.99
</code></pre>

The current BeautifulSoup answers just grab the first <code>&lt;span class="price"&gt;</code> tag, whichever version it belongs to. This finds the price that goes with "HD Version" instead:

<pre><code>from bs4 import BeautifulSoup

soup = """&lt;div id="left-stack"&gt; 
 &lt;span&gt;View In iTunes&lt;/span&gt;&lt;/a&gt;
 &lt;span class="price"&gt;£19.99&lt;/span&gt;
 &lt;ul class="list"&gt;
 &lt;li&gt;HD Version&lt;/li&gt;"""

for HD_Version in (tag for tag in soup('li') if tag.text.lower() == 'hd version'):
 price = HD_Version.parent.findPreviousSibling('span', attrs={'class':'price'}).text
</code></pre>

In general, using regular expressions to parse an irregular language like HTML is asking for trouble. Stick with an established parser.
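The same "price before HD Version" rule can also be expressed as a single pass over parser events, without building a tree. A sketch that swaps in the standard library's <code>html.parser</code> for BeautifulSoup: the hypothetical <code>VersionPriceParser</code> remembers the most recent <code>&lt;span class="price"&gt;</code> text and records it against the label of each <code>&lt;li&gt;</code> that follows. It assumes, as in the snippet, that the price appears before the list and that each label is plain text directly inside its <code>&lt;li&gt;</code>.

```python
from html.parser import HTMLParser


class VersionPriceParser(HTMLParser):
    """Hypothetical helper: maps each <li> label to the last price <span> seen before it."""

    def __init__(self):
        super().__init__()
        self.current_tag = None    # most recently opened tag (cleared by any end tag)
        self.current_classes = []  # class names of that tag
        self.last_price = None     # most recent <span class="price"> text
        self.prices = {}           # lower-cased <li> label -> price text

    def handle_starttag(self, tag, attrs):
        self.current_tag = tag
        self.current_classes = (dict(attrs).get("class") or "").split()

    def handle_endtag(self, tag):
        self.current_tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # skip whitespace between tags
        if self.current_tag == "span" and "price" in self.current_classes:
            self.last_price = text
        elif self.current_tag == "li" and self.last_price is not None:
            # Lower-case the label so lookups are case-insensitive
            self.prices[text.lower()] = self.last_price


html = """<div id="left-stack">
 <span>View In iTunes</span></a>
 <span class="price">£19.99</span>
 <ul class="list">
 <li>HD Version</li>"""

parser = VersionPriceParser()
parser.feed(html)
parser.close()
price = parser.prices.get("hd version")
print(price[1:] if price else None)  # 19.99
```

If the page also listed other versions, each with its own price span before its list, each label would pick up the price nearest before it.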

You can use this regex:

<pre><code>\d+(?:\.\d+)?(?=\D+HD Version)
</code></pre>

<ul>
<li><code>\D+</code> inside the lookahead skips over non-digit characters, effectively asserting that our match (<code>19.99</code>) is the last number before <code>HD Version</code>.</li>
</ul>

Here is a <a href="http://regex101.com/r/eU2sJ8/1" rel="nofollow">regex demo</a>.

Use the <code>i</code> modifier (<code>re.IGNORECASE</code> in Python) to make the matching case-insensitive, and change the <code>+</code> in <code>\D+</code> to <code>*</code> if the number can come directly before <code>HD Version</code>.
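In Python, the pattern can be applied with <code>re.search</code> and <code>re.IGNORECASE</code>. A minimal sketch using the snippet from the question; note that <code>re.match</code> only tries to match at the start of the string, which is one reason the attempt in the question (anchored with <code>^</code> and allowing at most nine characters before <code>HD</code>) returns <code>None</code>:

```python
import re

snippet = """<div id="left-stack">
 <span>View In iTunes</span></a>
 <span class="price">£19.99</span>
 <ul class="list">
 <li>HD Version</li>"""

# The pattern from this answer, made case-insensitive.
# Use \D* instead of \D+ if the number may sit directly before "HD Version".
pattern = re.compile(r"\d+(?:\.\d+)?(?=\D+HD Version)", re.IGNORECASE)

# re.search scans the whole string for the first match.
match = pattern.search(snippet)
print(match.group() if match else None)  # 19.99

# The question's attempt: re.match only tries position 0, and the pattern
# allows at most nine characters before "HD", so it finds nothing here.
attempt = re.match(r'^(\d|.){1,6}...HD\sVersion', snippet)
print(attempt)  # None
```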

I would like to parse the HD price from the following snippet of HTML. I only have fragments of the HTML code, so I cannot use an HTML parser for this.

<pre><code>&lt;div id="left-stack"&gt; 
 &lt;span&gt;View In iTunes&lt;/span&gt;&lt;/a&gt;
 &lt;span class="price"&gt;£19.99&lt;/span&gt;
 &lt;ul class="list"&gt;
 &lt;li&gt;HD Version&lt;/li&gt;
</code></pre>

Basically, the rule would be: "Find the price before the words 'HD Version' (case-insensitive)." Here is what I have so far:

<pre><code>re.match(r'^(\d|.){1,6}...HD\sVersion', string)
</code></pre>

How would I extract the value "19.99" from the above string?

Regex within html tags
