Q: How do I handle a Python regular expression for HTML parsing?

Stack Overflow user

Asked 2018-08-20 03:10:13

2 answers, 0 views, 0 followers, 0 votes

I want to get the value of a hidden input field in HTML.

<input type="hidden" name="fooId" value="12-3456789-1111111111" />

I want to write a regular expression in Python that returns the value of fooId from a line of this form:

<input type="hidden" name="fooId" value="**[id is here]**" />

Can someone provide an example in Python that parses the HTML for this value?

2 Answers

Stack Overflow user

Answered 2018-08-20 11:51:17

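Pyparsing is a good interim step between BeautifulSoup and regex. It is more robust than plain regular expressions, because its HTML tag parsing copes with variations in case, whitespace, and attribute presence, absence, and order, yet it is simpler than BS for this kind of basic tag extraction.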
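
To make the robustness point concrete, here is a minimal sketch using only the standard `re` module. The `STRICT` pattern and the `variants` list are illustrative assumptions, not taken from the question; they show a pattern that hard-codes case, spacing, and attribute order matching only one of several equivalent spellings of the tag.

```python
import re

# Strict pattern: assumes exact case, spacing, and attribute order.
STRICT = re.compile(r'<input type="hidden" name="fooId" value="([^"]*)" />')

# Hypothetical variants of the same tag; an HTML parser treats them alike.
variants = [
    '<input type="hidden" name="fooId" value="a" />',
    '<input name="fooId" type="hidden" value="b" />',   # attributes reordered
    '<INPUT NAME="fooId" type="hidden" value="c" />',   # upper-case names
    '<input  type="hidden"  name="fooId" value="d"/>',  # different whitespace
]

matched = []
for v in variants:
    m = STRICT.search(v)
    if m:
        matched.append(m.group(1))

print(matched)  # only the first variant is found
```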

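Your example is especially simple, since everything you are looking for is in the attributes of the opening "input" tag. Here is a pyparsing example showing several variations on your input tag that would give regexes fits, and also showing how not to match a tag if it is inside a comment: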

html = """<html><body>
<input type="hidden" name="fooId" value="**[id is here]**" />
<blah>
<input name="fooId" type="hidden" value="**[id is here too]**" />
<input NAME="fooId" type="hidden" value="**[id is HERE too]**" />
<INPUT NAME="fooId" type="hidden" value="**[and id is even here TOO]**" />
<!--
<input type="hidden" name="fooId" value="**[don't report this id]**" />
-->
<foo>
</body></html>"""

from pyparsing import makeHTMLTags, withAttribute, htmlComment

# use makeHTMLTags to create tag expression - makeHTMLTags returns expressions for
# opening and closing tags, we're only interested in the opening tag
inputTag = makeHTMLTags("input")[0]

# only want input tags with special attributes
inputTag.setParseAction(withAttribute(type="hidden", name="fooId"))

# don't report tags that are commented out
inputTag.ignore(htmlComment)

# use searchString to skip through the input 
foundTags = inputTag.searchString(html)

# dump out first result to show all returned tags and attributes
print(foundTags[0].dump())
print()

# print out the value attribute for all matched tags
for inpTag in foundTags:
    print(inpTag.value)

Output:

['input', ['type', 'hidden'], ['name', 'fooId'], ['value', '**[id is here]**'], True]
- empty: True
- name: fooId
- startInput: ['input', ['type', 'hidden'], ['name', 'fooId'], ['value', '**[id is here]**'], True]
  - empty: True
  - name: fooId
  - type: hidden
  - value: **[id is here]**
- type: hidden
- value: **[id is here]**

**[id is here]**
**[id is here too]**
**[id is HERE too]**
**[and id is even here TOO]**
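
If a third-party dependency is not an option, roughly the same extraction can be sketched with the standard library's `html.parser`. This is an alternative technique, not part of the pyparsing answer; the class name `HiddenInputCollector` and the `sample` string are hypothetical. `HTMLParser` lower-cases tag and attribute names and routes comments to a separate handler, so commented-out tags are not reported.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect value attributes of hidden <input> tags with a given name."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.values = []

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <input ... />.
        if tag != "input":
            return
        attr = dict(attrs)  # attribute names arrive lower-cased
        if attr.get("type") == "hidden" and attr.get("name") == self.name:
            self.values.append(attr.get("value"))


# Hypothetical sample input, modeled on the HTML in the answer above.
sample = """<html><body>
<input type="hidden" name="fooId" value="12-3456789-1111111111" />
<INPUT NAME="fooId" type="hidden" value="second" />
<!--
<input type="hidden" name="fooId" value="commented out" />
-->
</body></html>"""

collector = HiddenInputCollector("fooId")
collector.feed(sample)
print(collector.values)
```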

0 votes

Stack Overflow user

Answered 2018-08-20 13:08:37

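import re

# inputHTML is assumed to hold the page source as a string
# capture the value attribute of the hidden fooId input
reg = re.compile(r'<input type="hidden" name="fooId" value="([^"]*)" />')
value = reg.search(inputHTML).group(1)
print('Value is', value)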
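
A pattern like the one above depends on exact attribute order, case, and spacing. One way to loosen that while staying with `re` is to match the tag first and read its attributes separately. The sketch below is only an illustration: the helper name `hidden_value` is hypothetical, it assumes double-quoted attribute values with no `>` inside them, and it does not skip HTML comments.

```python
import re

# Match any <input ...> tag regardless of case, capturing its attribute text.
TAG = re.compile(r'<input\b([^>]*)>', re.IGNORECASE)
# Match name="value" pairs; assumes double-quoted values.
ATTR = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')


def hidden_value(html, name):
    """Return the value of the first hidden input called `name`, or None."""
    for tag in TAG.finditer(html):
        # Lower-case attribute names so NAME= and name= are treated alike.
        attrs = {k.lower(): v for k, v in ATTR.findall(tag.group(1))}
        if attrs.get("type") == "hidden" and attrs.get("name") == name:
            return attrs.get("value")
    return None


page = '<INPUT NAME="fooId" type="hidden" value="12-3456789-1111111111" />'
print('Value is', hidden_value(page, "fooId"))
```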

0 votes

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/-100000613
