I'm trying to get started with Stanford CoreNLP and can't even get past the first simple example.
Here is my code:
package stanford.corenlp;
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import com.google.common.io.Files;
import edu.stanford.nlp.dcoref.CorefChain;
import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefChainAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
import edu.stanford.nlp.util.CoreMap;
import java.util.logging.Level;
import java.util.logging.Logger;
public class StanfordNLP {
private void test2() {
// creates a StanfordCoreNLP object, with POS tagging, lemmatization, NER, parsing, and coreference resolution
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
// read some text in the text variable
String text = "Now is the time for all good men to come to the aid of their country.";
// create an empty Annotation just with the given text
Annotation document = new Annotation(text);
// run all Annotators on this text
pipeline.annotate(document);
}
public static void main(String[] args) throws IOException {
StanfordNLP nlp = new StanfordNLP();
nlp.test2();
}
}
Here is the stack trace:
Adding annotator tokenize
No tokenizer type provided. Defaulting to PTBTokenizer.
Adding annotator ssplit
Adding annotator pos
Exception in thread "main" edu.stanford.nlp.io.RuntimeIOException: Error while loading a tagger model (probably missing model file)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:791)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:312)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:265)
at edu.stanford.nlp.pipeline.POSTaggerAnnotator.loadModel(POSTaggerAnnotator.java:85)
at edu.stanford.nlp.pipeline.POSTaggerAnnotator.<init>(POSTaggerAnnotator.java:73)
at edu.stanford.nlp.pipeline.AnnotatorImplementations.posTagger(AnnotatorImplementations.java:55)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getNamedAnnotators$42(StanfordCoreNLP.java:496)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getDefaultAnnotatorPool$65(StanfordCoreNLP.java:533)
at edu.stanford.nlp.util.Lazy$3.compute(Lazy.java:118)
at edu.stanford.nlp.util.Lazy.get(Lazy.java:31)
at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:146)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:447)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:150)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:146)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:133)
at stanford.corenlp.StanfordNLP.test2(StanfordNLP.java:93)
at stanford.corenlp.StanfordNLP.main(StanfordNLP.java:108)
Caused by: java.io.IOException: Unable to open "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger" as class path, filename or URL
at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:480)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:789)
... 16 more
C:\Users\Greg\AppData\Local\NetBeans\Cache\8.2\executor-snippets\run.xml:53: Java returned: 1
BUILD FAILED (total time: 0 seconds)
What am I missing?

Answer (posted 2019-12-31 23:05:23)

Updated 2019-12-31 for clarity/reference. Note: the commands below were run in a Linux terminal.
Download CoreNLP from https://stanfordnlp.github.io/CoreNLP/download.html and unzip it:
pwd; ls -l
/mnt/Vancouver/apps/CoreNLP/src-local/zzz
-rw-r--r-- 1 victoria victoria 393239982 Dec 31 14:13 stanford-corenlp-full-2018-10-05.zip
unzip stanford-corenlp-full-2018-10-05.zip
# ...
ls -l
drwxrwxr-x 5 victoria victoria 4096 Oct 8 2018 stanford-corenlp-full-2018-10-05
-rw-r--r-- 1 victoria victoria 393239982 Dec 31 14:13 stanford-corenlp-full-2018-10-05.zip
保存"BasicPipelineExample.java“代码
在一个名为BasicPipelineExample.java
的文件中:
/mnt/Vancouver/apps/CoreNLP/src-local/zzz/BasicPipelineExample.java
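The contents of that file are not reproduced here; if you just want something that compiles, a minimal sketch along the following lines will do. This is not the verbatim BasicPipelineExample from the CoreNLP docs, only a stripped-down example that uses the same Annotation API as the code in the question and compiles against stanford-corenlp-3.9.2.jar:
import java.util.List;
import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;
public class BasicPipelineExample {
    public static void main(String[] args) {
        // build a pipeline with the basic annotators (no parse/coref, to keep it small)
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        // annotate a sample sentence
        Annotation document = new Annotation("Joe Smith was born in California.");
        pipeline.annotate(document);
        // print each token with its part-of-speech and named-entity tag
        List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                String word = token.get(CoreAnnotations.TextAnnotation.class);
                String pos = token.get(CoreAnnotations.PartOfSpeechAnnotation.class);
                String ner = token.get(CoreAnnotations.NamedEntityTagAnnotation.class);
                System.out.println(word + "\t" + pos + "\t" + ner);
            }
        }
    }
}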
Compile it:
pwd ## "sanity check"
/mnt/Vancouver/apps/CoreNLP/src-local/zzz/
javac -cp stanford-corenlp-3.9.2.jar BasicPipelineExample.java -Xdiags:verbose
which produces the class file BasicPipelineExample.class. Then run it from that directory; the .:* classpath wildcard puts every jar in that directory on the classpath, including the CoreNLP models jar (which contains the POS tagger model the stack trace above could not find):
java -cp .:* BasicPipelineExample
Addendum
The code above uses CoreNLP from a Java environment, as described here: https://stanfordnlp.github.io/CoreNLP/api.html#quickstart-with-convenience-wrappers
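For reference, the convenience-wrapper API that page refers to looks roughly like this. This is a sketch, assuming CoreNLP 3.9.x (where CoreDocument and CoreSentence live in edu.stanford.nlp.pipeline); WrapperExample is just an illustrative class name:
import java.util.Properties;
import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.CoreSentence;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
public class WrapperExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        // CoreDocument / CoreSentence wrap the lower-level Annotation / CoreMap objects
        CoreDocument document = new CoreDocument("Joe Smith was born in California.");
        pipeline.annotate(document);
        for (CoreSentence sentence : document.sentences()) {
            System.out.println(sentence.text());
            System.out.println(sentence.posTags());   // one POS tag per token
        }
    }
}
document.sentences() returns one CoreSentence per sentence, each wrapping the underlying CoreMap.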
For those (myself included) who prefer Python, Stanford provides essentially the same functionality in a Python environment, as described in the corresponding client documentation (client.html).
For example,
import stanfordnlp
from stanfordnlp.server import CoreNLPClient
# JSON output [default]:
client = CoreNLPClient(annotators=['tokenize','ssplit','pos','lemma','ner', \
'parse', 'depparse','coref'], timeout=30000, memory='16G')
# Plain-text output (much more compact):
client = CoreNLPClient(annotators='tokenize, ssplit, pos, lemma, ner, parse, \
depparse, coref', output_format='text', timeout=30000, memory='16G')
text = 'Breast cancer susceptibility gene 1 (BRCA1) is a tumor suppressor protein.'
# This auto-starts the client() instance:
ann = client.annotate(text)
# ....
sentence = ann.sentence[0]
print(sentence)
# ... copious output ...
print(ann)
# ... more succinct ...
Note: if you use the output_format='text' parameter, you can do this
print(ann)
but not this
sentence = ann.sentence[0]
print(sentence)
Traceback (most recent call last):
File "<console>", line 1, in <module>
AttributeError: 'str' object has no attribute 'sentence'
Using the stanfordnlp package, you can also set up a pipeline, as described here: https://stanfordnlp.github.io/stanfordnlp/
For example,
import stanfordnlp
stanfordnlp.download('en')
nlp = stanfordnlp.Pipeline()
text = 'Bananas are an excellent source of potassium.'
text_nlp = nlp(text)
text_nlp.sentences[0].print_dependencies()
Lastly, although I find the functionality somewhat limited (compare the Stanford authors' own CoreNLP library), some similar results can be obtained by accessing CoreNLP through spaCy: https://github.com/explosion/spacy-stanfordnlp
import stanfordnlp
from spacy_stanfordnlp import StanfordNLPLanguage
snlp = stanfordnlp.Pipeline(lang="en")
nlp = StanfordNLPLanguage(snlp)
doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")
for token in doc:
print(token.text, token.lemma_, token.pos_, token.dep_)
# ...
https://stackoverflow.com/questions/44910934