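I am currently working on an application that splits a long column into shorter ones. To do that I split the entire text into words, but at the moment my regex also splits numbers.
This is what I do: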
str = "This is a long string with some numbers [125.000,55 and 140.000] and an end. This is another sentence.";
sentences = str.replace(/\.+/g,'.|').replace(/\?/g,'?|').replace(/\!/g,'!|').split("|");
The result is:
Array [
"This is a long string with some numbers [125.",
"000,55 and 140.",
"000] and an end.",
" This is another sentence."
]
The desired result would be:
Array [
"This is a long string with some numbers [125.000, 140.000] and an end.",
"This is another sentence"
]
How can I change my regex to achieve this? Are there pitfalls I should watch out for? Or would it be enough to search for ". ", "? " and "! "?
Answered 2013-09-20 18:54:54
You can take advantage of the fact that the next sentence begins with an uppercase letter or a digit.
.*?(?:\.|!|\?)(?:(?= [A-Z0-9])|$)
It splits this text
This is a long string with some numbers [125.000,55 and 140.000] and an end. This is another sentence. Sentences beginning with numbers work. 10 people like that.
into these sentences:
This is a long string with some numbers [125.000,55 and 140.000] and an end.
This is another sentence.
Sentences beginning with numbers work.
10 people like that.
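As a rough sketch of how this pattern might be applied in JavaScript: with the global flag, `String.prototype.match` returns all matches in one call. The function name `splitByNextSentenceStart` is made up for illustration, and the `trim` call is an addition, since every match after the first begins with the space that separated it from the previous sentence.

```javascript
// Hypothetical helper: applies the pattern above with the global flag.
function splitByNextSentenceStart(text) {
  var pattern = /.*?(?:\.|!|\?)(?:(?= [A-Z0-9])|$)/g;
  // match() returns null when nothing matches, so fall back to an empty array.
  var matches = text.match(pattern) || [];
  // Each match after the first starts with the separating space; trim it off.
  return matches.map(function (s) { return s.trim(); });
}

console.log(splitByNextSentenceStart(
  "Numbers like 125.000,55 stay intact. This is another sentence. 10 people like that."
));
```

Note that trailing text with no closing ".", "!" or "?" is not matched by this pattern, and a sentence that starts with a lowercase letter is not split off from the one before it.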
Answered 2013-09-20 18:41:55
A safer option is to use a lookahead to make sure that whatever follows the dot is not a digit.
var str = "This is a long string with some numbers [125.000,55 and 140.000] and an end. This is another sentence.";
// Mark every dot that is not followed by a digit, then split on the marker.
var sentences = str.replace(/\.(?!\d)/g,'.|').split("|");
console.log(sentences);
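If you want to be even safer, you can also check that the character before the dot is not a digit. Since JS did not support lookbehind at the time of writing, you need to capture the preceding character and reuse it in the replacement string.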
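var str = "This is another sentence.1 is a good number";
// $1 restores the captured non-digit; it is empty when the first alternative matched.
var sentences = str.replace(/\.(?!\d)|([^\d])\.(?=\d)/g,'$1.|').split("|");
console.log(sentences);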
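Engines that implement ES2018 do support lookbehind, so the same idea can be written there without the capture group. A minimal sketch, assuming such an engine (for example a recent Node.js); the function name `splitOnNonDecimalDots` is hypothetical, and like the snippets above it assumes the input contains no "|" character.

```javascript
// Hypothetical helper; requires an engine with ES2018 lookbehind support.
function splitOnNonDecimalDots(text) {
  // A dot is treated as a sentence end unless it sits between two digits:
  //   \.(?!\d)        dot not followed by a digit
  //   (?<!\d)\.(?=\d) dot followed by a digit but not preceded by one
  var marked = text.replace(/\.(?!\d)|(?<!\d)\.(?=\d)/g, '.|');
  return marked.split('|')
    .map(function (s) { return s.trim(); })
    .filter(function (s) { return s.length > 0; }); // drop the empty tail
}

console.log(splitOnNonDecimalDots(
  "This is another sentence.1 is a good number. It costs 125.000 today."
));
```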
A simpler solution is to escape the dots inside numbers (for example, by replacing them with $$$$), do the split, and then unescape the dots.
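The escape, split, unescape idea could be sketched roughly as follows. The function name and the placeholder are assumptions made for illustration: a NUL character is used instead of "$$$$" because "$$" in a JavaScript replacement string is itself an escape for a single "$", and the sketch assumes the placeholder never occurs in the input.

```javascript
// Hypothetical helper illustrating escape -> split -> unescape.
function splitWithEscapedDecimals(text) {
  var PLACEHOLDER = '\u0000'; // assumed to be absent from the input text

  // 1. Escape: hide every dot that sits between two digits.
  var escaped = text.replace(/(\d)\.(?=\d)/g, '$1' + PLACEHOLDER);

  // 2. Split: a run of non-terminators followed by its terminators, if any.
  var pieces = escaped.match(/[^.?!]+[.?!]*/g) || [];

  // 3. Unescape: put the hidden dots back and tidy up whitespace.
  return pieces
    .map(function (s) { return s.trim().split(PLACEHOLDER).join('.'); })
    .filter(function (s) { return s.length > 0; });
}

console.log(splitWithEscapedDecimals(
  "This is a long string with some numbers [125.000,55 and 140.000] and an end. This is another sentence."
));
```

Unlike the marker-based snippets, this version does not rely on "|" being absent from the text, and it keeps trailing text that has no closing punctuation.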
Answered 2013-09-20 18:55:31
You forgot to include '\s' in your regexp.
Try this:
var str = "This is a long string with some numbers [125.000,55 and 140.000] and an end. This is another sentence.";
var sentences = str.replace(/\.\s+/g,'.|').replace(/\?\s+/g,'?|').replace(/\!\s+/g,'!|').split("|");
console.log(sentences[0]);
console.log(sentences[1]);
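The three chained replacements can also be folded into one pass with a character class. The sketch below is only an illustration (the function name is hypothetical), and its sample input is chosen to show one pitfall of relying on "terminator plus whitespace": abbreviations such as "Dr." are split as if they ended a sentence.

```javascript
// Hypothetical helper: splits after '.', '?' or '!' when whitespace follows.
function splitOnTerminatorAndSpace(text) {
  // Assumes the input contains no '|' character, as in the snippet above.
  return text.replace(/([.?!])\s+/g, '$1|').split('|');
}

// The number stays intact, but "Dr." is wrongly treated as a sentence end.
console.log(splitOnTerminatorAndSpace("Dr. Smith paid 125.000 euros. He left!"));
```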
https://stackoverflow.com/questions/18914629