I need to split a string according to the following idea:
const strin = 'test <br><span>test</span> <div>aa</div>8'.split(/<\ *>/i)
console.log(strin)
So the expected output is: ['test', '<br>', '<span>test</span>', '<div>aa</div>', '8']
Posted on 2021-11-01 20:03:18
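As @sebastian-simon mentioned, HTML cannot be "split" with regular expressions alone. The best solution is to use a real HTML parser: the browser already ships with one, and if you are on Node.js you can use JSDOM.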
var str = 'test <br><span>test</span> <fake></fake> <div><p>aa</p></div>8';
var container = document.createElement("div");
container.innerHTML = str; // let an HTML element parse the string for us
// Only top-level nodes are handled here. If you need nested tags,
// traverse each node's childNodes (and their childNodes) yourself.
// childNodes includes text nodes; children does not.
// [...container.childNodes] converts the NodeList into a plain array
// so that we can call .map and .filter on it.
var elmList = [...container.childNodes];
var tags = elmList
  // if elm is a text node, elm.outerHTML is undefined,
  // so we fall back to elm.textContent
  .map(elm => elm.outerHTML ?? elm.textContent)
  .map(elm => elm.trim()) // trim surrounding whitespace
  .filter(elm => elm); // drop empty items
console.log(tags);
https://stackoverflow.com/questions/69801780
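The answer's code only looks at top-level nodes and leaves nested tags to the reader. As a rough sketch of what that traversal could look like, the helper below walks childNodes recursively. The name collectPieces and the choice to emit both a parent's outerHTML and its descendants are assumptions made for illustration, not part of the original answer.

```javascript
// Hypothetical helper (not from the original answer): collect the outerHTML
// of every element and the trimmed text of every non-empty text node,
// descending into nested elements in document order.
function collectPieces(node, out = []) {
  for (const child of node.childNodes) {
    if (child.nodeType === 1) {          // 1 === Node.ELEMENT_NODE
      out.push(child.outerHTML);
      collectPieces(child, out);         // recurse into nested tags
    } else if (child.nodeType === 3) {   // 3 === Node.TEXT_NODE
      const text = child.textContent.trim();
      if (text) out.push(text);          // skip whitespace-only text
    }
  }
  return out;
}

// Browser usage; skipped when no DOM is available (e.g. plain Node.js).
if (typeof document !== "undefined") {
  const container = document.createElement("div");
  container.innerHTML = "test <br><span>test</span> <div><p>aa</p></div>8";
  console.log(collectPieces(container));
  // ['test', '<br>', '<span>test</span>', 'test',
  //  '<div><p>aa</p></div>', '<p>aa</p>', 'aa', '8']
}
```

Under Node.js there is no global document. With the jsdom package installed, one could be obtained with something like new JSDOM("").window.document and the same helper should work unchanged. If only leaf text and tags are wanted, the push of child.outerHTML can be limited to elements that have no element children.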