I have a multiline string that is delimited by a set of different delimiters:
(Text1)(DelimiterA)(Text2)(DelimiterC)(Text3)(DelimiterB)(Text4)
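I can split this string into its parts using String.split, but it seems I can't get the actual string that matched the delimiter regex.
In other words, this is what I get: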
Text1
Text2
Text3
Text4
This is what I want:
Text1
DelimiterA
Text2
DelimiterC
Text3
DelimiterB
Text4
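Is there any JDK way to split the string using a delimiter regex while also keeping the delimiters?
Posted on 2010-05-17 18:09:37
You want to use lookarounds and split on zero-width matches. Here are some examples: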
public class SplitNDump {
    static void dump(String[] arr) {
        for (String s : arr) {
            System.out.format("[%s]", s);
        }
        System.out.println();
    }
    public static void main(String[] args) {
        dump("1,234,567,890".split(","));
        // "[1][234][567][890]"
        dump("1,234,567,890".split("(?=,)"));
        // "[1][,234][,567][,890]"
        dump("1,234,567,890".split("(?<=,)"));
        // "[1,][234,][567,][890]"
        dump("1,234,567,890".split("(?<=,)|(?=,)"));
        // "[1][,][234][,][567][,][890]"
        dump(":a:bb::c:".split("(?=:)|(?<=:)"));
        // "[][:][a][:][bb][:][:][c][:]"
        dump(":a:bb::c:".split("(?=(?!^):)|(?<=:)"));
        // "[:][a][:][bb][:][:][c][:]"
        dump(":::a::::b b::c:".split("(?=(?!^):)(?
And yes, that is a triply-nested assertion there in the last pattern.
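Applied to the question's input, the same lookaround idea can be sketched as follows. This is a minimal sketch, not part of the original answer: the class name `KeepDelimiters`, the helper `splitKeeping`, and the `<A>`/`<B>`/`<C>` delimiter tokens are hypothetical, and it assumes the delimiter regex has a bounded maximum length, because Java's lookbehind rejects patterns without one.

```java
import java.util.Arrays;

class KeepDelimiters {
    // Hypothetical helper: splits just before and just after every delimiter
    // match, so each delimiter ends up as its own array element.
    // Assumption: delimiterRegex has a bounded length (Java lookbehind requires it).
    static String[] splitKeeping(String text, String delimiterRegex) {
        String before = "(?=" + delimiterRegex + ")";   // zero-width: delimiter starts here
        String after = "(?<=" + delimiterRegex + ")";   // zero-width: delimiter just ended
        return text.split(before + "|" + after);
    }

    public static void main(String[] args) {
        String input = "Text1<A>Text2<C>Text3<B>Text4";
        // Print one part per line, mirroring the layout used in the question.
        for (String part : splitKeeping(input, "<[ABC]>")) {
            System.out.println(part);
        }
    }
}
```

Wrapping the delimiter in both a lookahead and a lookbehind is what keeps it as a separate element; using only one of the two would leave it attached to the neighbouring text, as the comma examples above show.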
Related questions
Java split is eating my characters.
Can you use zero-width matching regex in String split?
How do I convert CamelCase into human-readable names in Java?
Backreferences in lookbehind
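Posted on 2010-02-05 18:36:42
A very naive solution that doesn't involve regex is to perform a string replace on your delimiter, something along these lines (assuming a comma as the delimiter):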
string.replace(FullString, "," , "~,~")
Here you can replace the tilde (~) with an appropriate unique delimiter.
If you then split on that new delimiter, I believe you will get the desired result.
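In Java, the replace-then-split idea might look like the sketch below. The class name `ReplaceThenSplit`, the helper `splitKeeping`, and the choice of `~` as the sentinel are illustrative assumptions; the approach only works if the sentinel never occurs in the input text.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class ReplaceThenSplit {
    // Hypothetical helper: surrounds every literal delimiter with a sentinel,
    // then splits on the sentinel so the delimiter survives as its own element.
    // Assumption: sentinel does not appear anywhere in text.
    static String[] splitKeeping(String text, String delimiter, String sentinel) {
        String marked = text.replace(delimiter, sentinel + delimiter + sentinel);
        // Pattern.quote treats the sentinel as a literal, not as a regex.
        return marked.split(Pattern.quote(sentinel));
    }

    public static void main(String[] args) {
        String[] parts = splitKeeping("1,234,567,890", ",", "~");
        System.out.println(Arrays.toString(parts));
    }
}
```

Two adjacent delimiters produce an empty string between them with this approach, because two sentinels end up next to each other.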
Posted on 2008-11-09 17:52:00
import java.util.regex.*;
import java.util.LinkedList;

public class Splitter {
    private static final Pattern DEFAULT_PATTERN = Pattern.compile("\\s+");

    private Pattern pattern;
    private boolean keep_delimiters;

    public Splitter(Pattern pattern, boolean keep_delimiters) {
        this.pattern = pattern;
        this.keep_delimiters = keep_delimiters;
    }
    public Splitter(String pattern, boolean keep_delimiters) {
        this(Pattern.compile(pattern==null?"":pattern), keep_delimiters);
    }
    public Splitter(Pattern pattern) { this(pattern, true); }
    public Splitter(String pattern) { this(pattern, true); }
    public Splitter(boolean keep_delimiters) { this(DEFAULT_PATTERN, keep_delimiters); }
    public Splitter() { this(DEFAULT_PATTERN); }

    public String[] split(String text) {
        if (text == null) {
            text = "";
        }

        int last_match = 0;
        LinkedList<String> splitted = new LinkedList<String>();

        Matcher m = this.pattern.matcher(text);

        while (m.find()) {
            // Text between the previous delimiter and this one.
            splitted.add(text.substring(last_match,m.start()));

            if (this.keep_delimiters) {
                // The delimiter itself, exactly as matched.
                splitted.add(m.group());
            }

            last_match = m.end();
        }

        // Whatever follows the last delimiter (may be empty).
        splitted.add(text.substring(last_match));

        return splitted.toArray(new String[splitted.size()]);
    }

    public static void main(String[] argv) {
        if (argv.length != 2) {
            System.err.println("Syntax: java Splitter <pattern> <text>");
            return;
        }

        Pattern pattern = null;
        try {
            pattern = Pattern.compile(argv[0]);
        }
        catch (PatternSyntaxException e) {
            System.err.println(e);
            return;
        }

        Splitter splitter = new Splitter(pattern);

        String text = argv[1];
        int counter = 1;
        for (String part : splitter.split(text)) {
            System.out.printf("Part %d: \"%s\"\n", counter++, part);
        }
    }
}
/*
Example:
> java Splitter "\W+" "Hello World!"
Part 1: "Hello"
Part 2: " "
Part 3: "World"
Part 4: "!"
Part 5: ""
*/
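I don't really like the other way, where you get an empty element at the front and at the back. A delimiter is usually not at the beginning or the end of the string, so most of the time you end up wasting two perfectly good array slots.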
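If empty elements are unwanted altogether, one possible variation on the same Matcher loop is to skip empty text segments. The sketch below is an illustration rather than the author's code: the class name `DelimiterTokenizer` and the method `tokenize` are hypothetical, and it returns a List instead of an array.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelimiterTokenizer {
    // Hypothetical variation: returns text pieces and delimiters in order,
    // dropping empty pieces (for example when the input starts or ends
    // with a delimiter).
    static List<String> tokenize(String text, Pattern delimiter) {
        List<String> parts = new ArrayList<>();
        Matcher m = delimiter.matcher(text);
        int last = 0;
        while (m.find()) {
            if (m.start() > last) {
                parts.add(text.substring(last, m.start())); // non-empty text before the delimiter
            }
            if (m.end() > m.start()) {
                parts.add(m.group());                       // the delimiter as matched
            }
            last = m.end();
        }
        if (last < text.length()) {
            parts.add(text.substring(last));                // non-empty tail after the last delimiter
        }
        return parts;
    }

    public static void main(String[] args) {
        // Same input as the example above; the trailing empty part is not produced.
        int counter = 1;
        for (String part : tokenize("Hello World!", Pattern.compile("\\W+"))) {
            System.out.printf("Part %d: \"%s\"%n", counter++, part);
        }
    }
}
```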
Edit:
Fixed the limit cases. Commented source with test cases can be found here:
http://snippets.dzone.com/posts/show/6453
https://stackoverflow.com/questions/2206378