I need to extract all the words that follow the pattern (/[Ee]ach+/) ([tag:NN]+|[tag:NNS]+) (/has+/|/have+/) up to the end of the sentence, but I get an error on line 13. Here is my code:
1 String file="Each campus has one club. Each programme has a unique code, title, level and duration.";
2 Properties props = new Properties();
3 props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
4 StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
5 Annotation document = new Annotation(file);
6 pipeline.annotate(document);
7 List<CoreLabel> tokens = new ArrayList<CoreLabel>();
8 List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
9 for(CoreMap sentence: sentences)
10 {
11     for (CoreLabel token: sentence.get(CoreAnnotations.TokensAnnotation.class))
12         tokens.add(token);
13     TokenSequencePattern pattern = TokenSequencePattern.compile("(/[Ee]ach+/) ([tag:NN]+|[tag:NNS]+) (/has+/|/have+/) [A-Z]");
14     TokenSequenceMatcher matcher = pattern.getMatcher(tokens);
15     while( matcher.find()){
16         JOptionPane.showMessageDialog(rootPane, matcher.group());
17         String matched = matcher.group();
18     }
19     tokens.removeAll(tokens);
20 }

Posted on 2014-07-21 21:04:45
I think you mean this regular expression:
(?i)each[^.]+[.]
As a Java string literal:
"(?i)each[^.]+[.]"
And the Java code that uses it:
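import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.swing.JOptionPane;

String file = "Each campus has one club. Each programme has a unique code, title, level and duration.";
// Case-insensitive "each", then everything up to and including the next period
String pattern = "(?i)each[^.]+[.]";
Pattern compile = Pattern.compile(pattern);
Matcher matcher = compile.matcher(file);
while (matcher.find()) {
    // group(0) is the whole match, i.e. one full sentence starting at "Each"
    JOptionPane.showMessageDialog(null, matcher.group(0));
}

Posted on 2014-07-21 21:23:32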
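As a follow-up to the regex answer above: that pattern returns the whole sentence, while the question asks only for the words after "Each <noun> has/have". A minimal sketch of one way to get just that tail with java.util.regex is shown below. It assumes the noun is a single word matched by \w+, since a plain regex has no access to part-of-speech tags; the class name EachHasExtractor and the method extractTails are hypothetical names used for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EachHasExtractor {
    // "each", one word standing in for the noun (assumption: no POS check),
    // "has" or "have", then capture everything up to the closing period.
    private static final Pattern EACH_HAS = Pattern.compile(
            "(?i)\\beach\\s+\\w+\\s+(?:has|have)\\s+([^.]+)\\.");

    // Returns the text after "Each <word> has/have" for every matching sentence.
    static List<String> extractTails(String text) {
        List<String> tails = new ArrayList<>();
        Matcher m = EACH_HAS.matcher(text);
        while (m.find()) {
            tails.add(m.group(1).trim());
        }
        return tails;
    }

    public static void main(String[] args) {
        String file = "Each campus has one club. "
                + "Each programme has a unique code, title, level and duration.";
        for (String tail : extractTails(file)) {
            System.out.println(tail);
        }
    }
}
```

For the sample text this prints "one club" and "a unique code, title, level and duration". If the noun check really has to rely on NN/NNS tags, a tagger-based approach is still needed; this sketch only approximates it.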
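The slashes you see around regular expressions in many languages, for example
/someregex/
have nothing to do with the regular expression itself: the slashes are an artifact of the host language, and Java is not one of the languages that uses them.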
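To illustrate the point for java.util.regex specifically, here is a small sketch showing that a slash inside a pattern string is treated as an ordinary character to be matched. The class name SlashDemo and the helper found are hypothetical, and the example says nothing about how other pattern libraries parse their own syntax.

```java
import java.util.regex.Pattern;

class SlashDemo {
    // True if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String text = "Each campus has one club.";
        // Without slashes the pattern matches the word "has".
        System.out.println(found("has+", text));        // true
        // With slashes the pattern requires literal '/' characters in the input.
        System.out.println(found("/has+/", text));      // false
        System.out.println(found("/has+/", "a /has/ b")); // true
    }
}
```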
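Once you remove those slashes, fix the alternations, remove the incorrect character classes, and make a few other adjustments, this regular expression should work:
([Ee]ach|tag:NNS?|ha(s|ve)) +\w+

https://stackoverflow.com/questions/24863261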