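Preface: I am aware of the general opposition to parsing HTML with regex. Please refrain from making any suggestions along those lines.
Explanation.
I have the following regular expression: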
/<div class="panel-body">([^]*?)(<\/div>|$)/gi
It matches everything inside the div with class .panel-body, including the div tag itself.
Full match:
<div class="panel-body">
<a href="#">Link</a>
Line 1
Line 2
Line 3
</div>
It also matches the content when the closing div tag is missing.
Full match:
<div class="panel-body">
<a href="#">Link</a>
Line 1
Line 2
Line 3
Don't match after closing `div`...but match this and below in case closing `div` is removed.
Line below 1
Line below 2
Line below 3
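For reference, a minimal sketch of the current behaviour described above. The two input strings are shortened, hypothetical stand-ins for the samples, and the variable names are illustrative only. With the `g` flag, `String.prototype.match` returns just the full matches, so the opening tag (and the closing tag, when present) is part of every result.

```javascript
var pattern = /<div class="panel-body">([^]*?)(<\/div>|$)/gi;

// Shortened, hypothetical versions of the two samples above.
var closed = '<div class="panel-body">\nLine 1\n</div>\nLine below 1';
var unclosed = '<div class="panel-body">\nLine 1\nLine below 1';

// With the g flag, match() returns an array of full matches only.
var closedMatches = closed.match(pattern);
var unclosedMatches = unclosed.match(pattern);

console.log(closedMatches);   // stops at the closing </div>, tags included
console.log(unclosedMatches); // runs to the end of the string
```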
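Question.
How can I improve my regex so that it does the following:
Match only the content between <div class="panel-body"> and the closing </div> (when there is a closing div tag)
Edit 1:
The string does not start with <div class="panel-body"> but with: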
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Webmin 1.851 on centos.centos (CentOS Linux 7.3.1611)</title>
</head>
<body>
<div>
<div>
<div class="panel-body">
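*Note: the div is not closed until the page has fully loaded, because the output is progressive.
Edit 2:
After the answers were posted, I ran a speed comparison test. It is up to you to decide which solution works best for you.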
Answered 2017-07-30 23:21:16
You can use the DOM parser, which also works with incomplete markup:
function divContent(str) {
// create a new div container
var div = document.createElement('div');
// assign your HTML to div's innerHTML
div.innerHTML = '<html>' + str + '</html>';
// find an element by given className
var el = div.getElementsByClassName("panel-body");
// return the innerHTML of the last matching element (empty string if none)
return (el.length > 0 ? el[el.length-1].innerHTML : "");
}
// extract text from a complete tag:
var html = `<div class="panel-body">
<a href="#">Link</a>
Line 1
Line 2
Line 3
</div>`;
console.log(divContent(html));
// extract text from an incomplete tag:
html = `<div class="panel-body">
<a href="#">Link</a>
Line 1
Line 2
Line 3
Don't match after closing 'div'...but match this and below
in case closing 'div' is removed.
Line below 1
Line below 2
Line below 3`;
console.log(divContent(html));
// OP's edited HTML text
html = `<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Webmin 1.851 on centos.centos (CentOS Linux 7.3.1611)</title>
</head>
<body>
<div>
<div>
<div class="panel-body">`;
console.log(divContent(html));
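Answered 2017-07-30 23:02:17
I can't comment yet, so I'll try an answer. How about non-capturing groups? The whole pattern still has to match, but the only captured group is then the content. In JavaScript that group is at index 1 of the match result; index 0 is still the full match, tags included.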
(?:<div class="panel-body">)([^]*?)(?:<\/div>|$)
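A minimal sketch of how that pattern might be used, assuming `RegExp.prototype.exec` without the `g` flag. `panelBodyContent` is a hypothetical helper name, not something from the answer. The result array holds the full match at index 0 and the captured content at index 1.

```javascript
// Hypothetical helper; the pattern is the one given in the answer above.
function panelBodyContent(input) {
  var pattern = /(?:<div class="panel-body">)([^]*?)(?:<\/div>|$)/i;
  var match = pattern.exec(input);
  // match[0] is the full match (tags included); match[1] is the captured content.
  return match ? match[1] : "";
}

// Closed div: the content stops at the first </div>.
console.log(panelBodyContent('<body><div class="panel-body">Line 1</div>after'));
// Unclosed div: the content runs to the end of the input.
console.log(panelBodyContent('<body><div class="panel-body">Line 1\nLine below 1'));
```

Because `[^]*?` is lazy, a nested `</div>` inside the panel would end the match early; the sketch assumes the panel contains no nested div.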
Answered 2017-07-31 00:40:50
Does it have to be a regex? You can simply look for the opening tag and optionally strip the closing tag if it is present:
function parseContent(input) {
var openingTag = '<div class="panel-body">';
var i = input.indexOf(openingTag);
if (i == -1) {
return ""; // Or something else
}
var closingTag = '</div>';
var closingTagLength = closingTag.length;
var end = input.length - (input.slice(-closingTagLength) === closingTag ? closingTagLength : 0);
return input.slice(i + openingTag.length, end);
}
Edit:
If there can be text after the closing tag, you can use indexOf for that as well:
function parseContent(input) {
var openingTag = '<div class="panel-body">';
var i = input.indexOf(openingTag);
if (i == -1) {
return ""; // Or something else
}
var closingTag = '</div>';
var endIndex = input.indexOf(closingTag, i);
var end = (endIndex === -1 ? input.length : endIndex);
return input.slice(i + openingTag.length, end);
}
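Since the question mentions progressive output, here is a rough sketch of how the second version could be re-run on a growing buffer. The chunk boundaries and the `buffer` and `snapshots` names are assumptions made up for illustration; `parseContent` is repeated in compact form only so the block runs on its own.

```javascript
// Same approach as the second parseContent above, repeated so the sketch is self-contained.
function parseContent(input) {
  var openingTag = '<div class="panel-body">';
  var i = input.indexOf(openingTag);
  if (i === -1) {
    return "";
  }
  var endIndex = input.indexOf('</div>', i);
  return input.slice(i + openingTag.length, endIndex === -1 ? input.length : endIndex);
}

// Hypothetical chunks standing in for progressive output.
var chunks = [
  '<body>\n<div>\n<div class="panel-',
  'body">\nLine 1\n',
  'Line 2\n</div>\ntrailing text'
];

// Re-parse the whole buffer after each chunk arrives.
var buffer = "";
var snapshots = chunks.map(function (chunk) {
  buffer += chunk;
  return parseContent(buffer);
});

console.log(snapshots);
```

Until the opening tag has arrived completely the result is an empty string; after that the extracted content grows with each chunk and stops growing once the closing tag shows up.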
https://stackoverflow.com/questions/45408900