I want to get the value of a hidden input field in HTML.
<input type="hidden" name="fooId" value="12-3456789-1111111111" />
I want to write a regular expression in Python that returns the value of fooId, given HTML of the form:
<input type="hidden" name="fooId" value="**[id is here]**" />
Can anyone provide a Python example that parses this value out of the HTML?
Posted on 2018-08-20 11:51:17
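Pyparsing is a good intermediate step between BeautifulSoup and regex. It is more robust than regular expressions alone, because its HTML tag parsing tolerates variations in case, whitespace, and attribute presence, absence, or order, yet it is simpler than using BeautifulSoup for this kind of basic tag extraction.
Your example is especially simple, since everything you are looking for is in the attributes of the opening "input" tag. Here is a pyparsing example that shows several variations on the input tag that would give regexes fits, and also shows how NOT to match a tag if it is inside a comment: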
html = """<html><body>
<input type="hidden" name="fooId" value="**[id is here]**" />
<blah>
<input name="fooId" type="hidden" value="**[id is here too]**" />
<input NAME="fooId" type="hidden" value="**[id is HERE too]**" />
<INPUT NAME="fooId" type="hidden" value="**[and id is even here TOO]**" />
<!--
<input type="hidden" name="fooId" value="**[don't report this id]**" />
-->
<foo>
</body></html>"""
from pyparsing import makeHTMLTags, withAttribute, htmlComment
# use makeHTMLTags to create tag expression - makeHTMLTags returns expressions for
# opening and closing tags, we're only interested in the opening tag
inputTag = makeHTMLTags("input")[0]
# only want input tags with these attribute values
inputTag.setParseAction(withAttribute(type="hidden", name="fooId"))
# don't report tags that are commented out
inputTag.ignore(htmlComment)
# use searchString to scan through the input
foundTags = inputTag.searchString(html)
# dump out first result to show all returned tags and attributes
print(foundTags[0].dump())
print()
# print out the value attribute for all matched tags
for inpTag in foundTags:
    print(inpTag.value)
Output:
['input', ['type', 'hidden'], ['name', 'fooId'], ['value', '**[id is here]**'], True]
- empty: True
- name: fooId
- startInput: ['input', ['type', 'hidden'], ['name', 'fooId'], ['value', '**[id is here]**'], True]
- empty: True
- name: fooId
- type: hidden
- value: **[id is here]**
- type: hidden
- value: **[id is here]**
**[id is here]**
**[id is here too]**
**[id is HERE too]**
**[and id is even here TOO]**
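If adding the pyparsing dependency is not an option, a similar filter can be sketched with Python's standard-library html.parser module instead. This is a swapped-in technique, not part of the answer above, and the class name FooIdCollector is hypothetical. HTMLParser lowercases tag and attribute names, and it routes <!-- ... --> blocks to handle_comment, so a tag inside a comment should never reach handle_starttag:

```python
from html.parser import HTMLParser


class FooIdCollector(HTMLParser):
    """Hypothetical helper: collects value= of hidden <input name="fooId"> tags."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive lowercased, so <INPUT NAME=...> matches too;
        # attribute order does not matter once the pairs are put in a dict
        attr_map = dict(attrs)
        if tag == "input" and attr_map.get("type") == "hidden" and attr_map.get("name") == "fooId":
            self.values.append(attr_map.get("value"))

    # Self-closing tags like <input ... />: the default handle_startendtag
    # calls handle_starttag, so no extra override is needed.


html = """<html><body>
<input type="hidden" name="fooId" value="**[id is here]**" />
<blah>
<input name="fooId" type="hidden" value="**[id is here too]**" />
<input NAME="fooId" type="hidden" value="**[id is HERE too]**" />
<INPUT NAME="fooId" type="hidden" value="**[and id is even here TOO]**" />
<!--
<input type="hidden" name="fooId" value="**[don't report this id]**" />
-->
<foo>
</body></html>"""

collector = FooIdCollector()
collector.feed(html)
collector.close()
for value in collector.values:
    print(value)
```

This should print the same four values as the pyparsing version and skip the commented-out tag. Unlike pyparsing, it only needs the standard library, at the cost of writing a small handler class.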
Posted on 2018-08-20 13:08:37
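import re
# Sample input; in practice inputHTML holds the page source you fetched
inputHTML = '<input type="hidden" name="fooId" value="12-3456789-1111111111" />'
# Capture whatever sits inside value="..." of the fooId input
reg = re.compile(r'<input type="hidden" name="fooId" value="([^"]*)" />')
value = reg.search(inputHTML).group(1)
print('Value is', value)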
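The regex above only matches when the attributes appear in exactly that order, letter case, and spacing. A somewhat more tolerant sketch, built around a hypothetical helper hidden_foo_id_values, first matches each <input> tag case-insensitively and then reads its attributes in any order. It assumes double-quoted attribute values, and it still cannot tell that a tag sits inside an HTML comment, which is the gap the pyparsing answer addresses:

```python
import re

# Hypothetical patterns: one <input ...> tag (any case), then name="value" pairs.
# Assumes attribute values are double-quoted and contain no ">" characters.
INPUT_TAG = re.compile(r"<input\b[^>]*>", re.IGNORECASE)
ATTRIBUTE = re.compile(r'([\w:-]+)\s*=\s*"([^"]*)"')


def hidden_foo_id_values(html):
    """Return value= of every hidden <input name="fooId">, in document order."""
    values = []
    for tag in INPUT_TAG.findall(html):
        # Lowercase attribute names so NAME="fooId" and name="fooId" both match
        attrs = {name.lower(): val for name, val in ATTRIBUTE.findall(tag)}
        if attrs.get("type") == "hidden" and attrs.get("name") == "fooId":
            values.append(attrs.get("value"))
    return values


sample = '''<input type="hidden" name="fooId" value="12-3456789-1111111111" />
<INPUT NAME="fooId" type="hidden" value="second" />
<!-- <input type="hidden" name="fooId" value="commented out" /> -->'''

print(hidden_foo_id_values(sample))
# → ['12-3456789-1111111111', 'second', 'commented out']
```

Note that the commented-out value is still reported: regexes see only text, not HTML structure, which is why a real parser is the safer choice once the input varies.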
https://stackoverflow.com/questions/-100000613