I am using the Stanford POS tagger in Java, but I don't know how to count the tags of each type. This is what I have so far:

    int pos = 0;
    int end = tagged.length() - 1;
    int nouns = 0;
    int adjectives = 0;
    int adverbs = 0;
    while (pos < (end - 1)) {
        pos++;
        // Slide a three-character window over the tagged string.
        String sequence = tagged.substring(pos - 1, pos + 2);
        //System.out.println(sequence);
        if (sequence.equals("_NN")) {
            nouns++;
        }
        if (sequence.equals("_JJ")) {
            adjectives++;
        }
        if (sequence.equals("_RB")) {
            adverbs++;
        }
    }

Here `tagged` is the tagged string.
Here is an example of a tagged string:

    This_DT is_VBZ a_DT good_JJ sample_NN sentence_NN ._. Here_RB is_VBZ another_DT good_JJ sample_NN sentence_NN ._.

Posted on 2012-11-27 00:41:35
In your case, the following code (although not optimal) can serve as a guide:

    public class Main {
        public static void main(final String[] args) throws Exception {
            final String tagged = "World_NN Big_RBS old_RB stupid_JJ";
            int nouns = 0;
            int adjectives = 0;
            int adverbs = 0;
            // Each token has the form word_TAG; tokens are separated by spaces.
            final String[] tokens = tagged.split(" ");
            for (final String token : tokens) {
                // The tag is everything after the last underscore.
                final int lastUnderscoreIndex = token.lastIndexOf("_");
                final String realToken = token.substring(lastUnderscoreIndex + 1);
                if ("NN".equals(realToken)) {
                    nouns++;
                }
                if ("JJ".equals(realToken)) {
                    adjectives++;
                }
                if ("RB".equals(realToken) || "RBS".equals(realToken)) {
                    adverbs++;
                }
            }
            System.out.println(String.format("Nouns: %d, Adjectives: %d, Adverbs: %d", nouns, adjectives, adverbs));
        }
    }

There is also a fiddle for it.
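
If you want a count for every tag rather than three hard-coded ones, the same split-on-underscore idea can be generalized with a map. The following is only a minimal sketch, not part of the Stanford tagger API: the class name `TagCounter` and the helpers `countTags` and `countByPrefix` are made up for illustration, and it assumes the input is a whitespace-separated string of `word_TAG` tokens using Penn Treebank tags, where nouns start with `NN`, adjectives with `JJ`, and adverbs with `RB`.

```java
import java.util.Map;
import java.util.TreeMap;

public class TagCounter {

    // Hypothetical helper: counts how often each tag occurs in a
    // whitespace-separated string of word_TAG tokens.
    public static Map<String, Integer> countTags(String tagged) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : tagged.trim().split("\\s+")) {
            int sep = token.lastIndexOf('_');
            if (sep < 0) {
                continue; // skip empty tokens and tokens without a tag
            }
            // The tag is everything after the last underscore.
            counts.merge(token.substring(sep + 1), 1, Integer::sum);
        }
        return counts;
    }

    // Hypothetical helper: sums the counts of all tags that start with
    // the given prefix, e.g. "NN" covers NN, NNS, NNP and NNPS.
    public static int countByPrefix(Map<String, Integer> counts, String prefix) {
        int total = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                total += entry.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String tagged = "This_DT is_VBZ a_DT good_JJ sample_NN sentence_NN ._. "
                + "Here_RB is_VBZ another_DT good_JJ sample_NN sentence_NN ._.";
        Map<String, Integer> counts = countTags(tagged);
        System.out.println("Per tag: " + counts);
        System.out.println("Nouns: " + countByPrefix(counts, "NN"));
        System.out.println("Adjectives: " + countByPrefix(counts, "JJ"));
        System.out.println("Adverbs: " + countByPrefix(counts, "RB"));
    }
}
```

For the sample string from the question this reports 4 nouns, 2 adjectives and 1 adverb. Matching on a prefix is a deliberate choice here so that variants such as `NNS` or `RBS` are counted with their base category; if you need exact tags only, look them up in the map directly instead.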
https://stackoverflow.com/questions/13568895