How can I use a regular expression to extract the author's name and the publication date?

Stack Overflow user
Asked 2019-02-20 05:48:04
Answers: 1 · Views: 389 · Followers: 0 · Votes: 0

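I am trying to extract the author's name and the publication date from this HTML text.

Here is what I have so far: (authorName) =(".")

However, this only works for this particular case, and I am looking for a general approach. Can you give me some advice on how to tackle this?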

omni_bizObjectId = "13560483";var omni_publicationDate = "2019-01-25T12:00+00:00";var omni_sourceSite ="sfgate";var omni_authorName = "Heather";var omni_authorTitle = "";var omni_premiumStatus = "isPremium";var omni_premiumEndDate = "1893506400";var omni_originalSource = " SF ";var omni_pageNumber = "1";var omni_breakingNewsFlag = "0";omni_localNewsFlag = "1";var omni_isListView = "0";var omni_paywallSite = "1";var omni_displayTemplate = "ard";


1 Answer

Stack Overflow user

Accepted answer

Answered 2019-02-20 06:03:12

You can use this regex to capture the author's name in group 1:

authorName\s+=\s+"([^"]*)"

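This regex matches authorName, then one or more whitespace characters, then an equals sign, then one or more whitespace characters, then a double quote ". It then captures everything up to the next double quote and stores it in group 1, which you can read with m.group(1).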

The Python code below shows how to read the captured data from group 1:

import re

s = 'teacher a prime example of where SF should invest windfall";var omni_bizObjectId = "13560483";var omni_className = "article";var omni_publicationDate = "2019-01-25T12:00:00+00:00";var omni_sourceSite ="sfgate";var omni_authorName = "Heather Knight";var omni_authorTitle = "";var omni_premiumStatus = "isPremium";var omni_premiumEndDate = "1893506400";var omni_originalSource = "SF";var omni_pageNumber = "1";var omni_breakingNewsFlag = "0";var omni_localNewsFlag = "1";var omni_isListView = "0";var omni_paywallSite = "1";var omni_displayTemplate = "ard";'

# Search for the assignment; group 1 holds the text between the quotes
m = re.search(r'authorName\s+=\s+"([^"]*)"', s)
if m:
    print(m.group(1))

This prints only the author's name:

Heather Knight
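
For a more general approach, the same pattern can be parameterized on the variable name. The following is a minimal sketch and not part of the original answer: the helper name `extract_var` is hypothetical, and it assumes that values are double-quoted and contain no escaped quotes. It uses `\s*` rather than `\s+`, so that assignments such as `omni_sourceSite ="sfgate"`, which have no space after the `=`, also match.

```python
import re

def extract_var(text, name):
    """Return the double-quoted value assigned to a variable whose name ends with `name`, or None."""
    # re.escape guards against regex metacharacters in the name;
    # \s* tolerates zero or more whitespace characters on either side of '='.
    pattern = re.escape(name) + r'\s*=\s*"([^"]*)"'
    m = re.search(pattern, text)
    return m.group(1) if m else None

# Shortened sample taken from the string used in the answer
s = 'var omni_sourceSite ="sfgate";var omni_authorName = "Heather Knight";var omni_authorTitle = "";'

print(extract_var(s, 'authorName'))   # Heather Knight
print(extract_var(s, 'sourceSite'))   # sfgate
print(extract_var(s, 'missing'))      # None
```

Returning None for a missing variable lets the caller decide how to handle pages that lack a given field.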

Edit: Thanks to Onyambu for pointing out publicationDate.

As with authorName, you can reuse the regex above, replacing authorName with publicationDate, to capture the publication date:

publicationDate\s+=\s+"([^"]*)"
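
Once captured, the date string can be turned into a `datetime` object if you need to compare or reformat it. This is a sketch under the assumption that the value is always an ISO 8601 timestamp with seconds and a UTC offset, as in the sample; `datetime.fromisoformat` is available from Python 3.7 onward. The variable names `raw` and `published` are only illustrative.

```python
import re
from datetime import datetime

# Shortened sample taken from the string used in the answer
s = 'var omni_publicationDate = "2019-01-25T12:00:00+00:00";var omni_sourceSite ="sfgate";'

m = re.search(r'publicationDate\s+=\s+"([^"]*)"', s)
if m:
    raw = m.group(1)
    # fromisoformat accepts the "+00:00" offset form and returns an aware datetime
    published = datetime.fromisoformat(raw)
    print(raw)               # 2019-01-25T12:00:00+00:00
    print(published.date())  # 2019-01-25
```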

If you want to extract both values with a single regex, you can use this one:

(?i).*publicationdate\s+=\s+"([^"]*)".*authorName\s+=\s+"([^"]*)"

Python code:

import re

s = 'teacher a prime example of where SF should invest windfall";var omni_bizObjectId = "13560483";var omni_className = "article";var omni_publicationDate = "2019-01-25T12:00:00+00:00";var omni_sourceSite ="sfgate";var omni_authorName = "Heather Knight";var omni_authorTitle = "";var omni_premiumStatus = "isPremium";var omni_premiumEndDate = "1893506400";var omni_originalSource = "SF";var omni_pageNumber = "1";var omni_breakingNewsFlag = "0";var omni_localNewsFlag = "1";var omni_isListView = "0";var omni_paywallSite = "1";var omni_displayTemplate = "ard";'

# (?i) makes the match case-insensitive; group 1 is the date, group 2 the author
m = re.search(r'(?i).*publicationdate\s+=\s+"([^"]*)".*authorName\s+=\s+"([^"]*)"', s)
if m:
    print('Publication Date:', m.group(1))
    print('Author Name:', m.group(2))

This prints:

Publication Date: 2019-01-25T12:00:00+00:00
Author Name: Heather Knight
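
Note that the combined regex only matches when publicationDate appears before authorName in the text. If the order may vary, one alternative is to collect every assignment in a single pass and look the fields up by name. This is a sketch rather than the answerer's method; it assumes all variables carry the `omni_` prefix seen in the sample and that values are double-quoted, and the name `fields` is only illustrative.

```python
import re

# Shortened sample taken from the string used in the answer
s = ('var omni_publicationDate = "2019-01-25T12:00:00+00:00";'
     'var omni_sourceSite ="sfgate";'
     'var omni_authorName = "Heather Knight";'
     'var omni_authorTitle = "";')

# With two capture groups, findall returns (name, value) tuples;
# dict() turns that list into a lookup table keyed by variable name.
fields = dict(re.findall(r'omni_(\w+)\s*=\s*"([^"]*)"', s))

print('Publication Date:', fields.get('publicationDate'))
print('Author Name:', fields.get('authorName'))
```

Because the lookup is by key, the result does not depend on the order in which the variables are declared.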
Votes: 3
Original page content provided by Stack Overflow. Translation supported by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/54779598
