Q: Python regular expression fails to match

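Stack Overflow user

Asked 2018-05-31 06:35:28

Answers: 2 · Views: 480 · Followers: 0 · Votes: 1

This pattern works without any problems on https://regex101.com/. Am I missing something? The entire string is on one line.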

import re

def get_title_and_content(html):
    # The sample document is hard-coded here, so the html argument is overwritten.
    html = """<!DOCTYPE html>     <html>       <head>       <title>Change delivery date with Deliv</title>       </head>       <body>       <div class="gkms web">The delivery date can be changed up until the package is assigned to a driver.</div>       </body>     </html>  """
    title_pattern = re.compile(r'<title>(.*?)</title>(.*)')
    # This is the call the question is about: it returns None for this input.
    match = title_pattern.match(html)
    if match:
        print('successfully extract title and answer')
        return match.groups()[0].strip(), match.groups()[1].strip()
    else:
        print('unable to extract title or answer')

python

regex

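2 Answers

Stack Overflow user

Posted 2018-05-31 07:37:14

To summarize the comments:

You should use title_pattern.search(html) instead of title_pattern.match(html).

search() looks for the pattern anywhere in the given string, whereas match() only tries at the very beginning. match = title_pattern.findall(html) can be used in a similar way, but it returns a list of results rather than a single match object.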
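The difference between the three calls can be seen side by side in a minimal sketch. The HTML string below is a shortened stand-in for the one in the question, and the variable names are only illustrative.

```python
import re

# Single-line HTML sample, shortened from the question.
html = (
    "<!DOCTYPE html> <html> <head> "
    "<title>Change delivery date with Deliv</title> "
    "</head> <body> <div>Body text</div> </body> </html>"
)

title_pattern = re.compile(r"<title>(.*?)</title>(.*)")

# match() only tries at position 0, where the string starts with "<!DOCTYPE".
anchored = title_pattern.match(html)

# search() scans forward until the pattern matches somewhere in the string.
found = title_pattern.search(html)

# findall() returns a list; with two groups, each item is a tuple of strings.
all_found = title_pattern.findall(html)

print(anchored)        # None
print(found.group(1))  # Change delivery date with Deliv
print(len(all_found))  # 1
```

Because findall() returns plain tuples rather than a match object, the title would be read as all_found[0][0] instead of through group().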

Also, as already mentioned, using BeautifulSoup will pay off more in the long run, because regular expressions are not well suited to searching HTML.
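As a rough illustration of the parser-based approach, here is a sketch that uses the standard library's html.parser module instead of BeautifulSoup, purely to avoid a third-party dependency. The TitleParser class and extract_title function are hypothetical helpers, not part of the answer.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found inside <title> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only keep text that appears between <title> and </title>.
        if self._in_title:
            self.title_parts.append(data)


def extract_title(html):
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.title_parts).strip()


sample = (
    "<!DOCTYPE html> <html> <head> "
    "<title>Change delivery date with Deliv</title> "
    "</head> <body></body> </html>"
)
title = extract_title(sample)
print(title)  # Change delivery date with Deliv
```

A parser does not depend on the document being on one line or on where the tag sits in the string, which is the main reason it tends to hold up better than a regex.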

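Votes: 0

Stack Overflow user

Posted 2018-05-31 10:03:23

The comments are correct: re.match() only matches at the beginning of the string. That said, you can prepend a .* to your regex so that the match, still anchored at the start, can reach the <title> tag: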

title_pattern = re.compile(r'.*<title>(.*?)</title>(.*)')
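A short sketch of how the prefixed pattern behaves with match(). The sample string is shortened from the question. The multi-line variant and the re.DOTALL flag are additions for illustration and do not appear in the answer: they show that the approach relies on the string being on one line, since "." does not match a newline by default.

```python
import re

html = (
    "<!DOCTYPE html> <html> <head> "
    "<title>Change delivery date with Deliv</title> "
    "</head> <body> <div>Body text</div> </body> </html>"
)

# The leading .* consumes everything before <title>, so match() succeeds.
prefixed = re.compile(r".*<title>(.*?)</title>(.*)")
m = prefixed.match(html)
title, rest = m.group(1).strip(), m.group(2).strip()

# Same input with a newline before <title>: "." stops at "\n" by default.
multiline = html.replace("<head> ", "<head>\n")
no_match = prefixed.match(multiline)

# re.DOTALL lets "." cross newlines (an added option, not in the answer).
dotall = re.compile(r".*<title>(.*?)</title>(.*)", re.DOTALL)
m2 = dotall.match(multiline)

print(title)        # Change delivery date with Deliv
print(no_match)     # None
print(m2.group(1))  # Change delivery date with Deliv
```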

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/50614223
