NLTK for Python 3: Notes on Version Changes

数据饕餮

Published 2019-01-14 16:05:34

Included in the column: 数据饕餮

1.Here are some changes you may need to make:

grammar: ContextFreeGrammar → CFG, WeightedGrammar → PCFG, StatisticalDependencyGrammar → ProbabilisticDependencyGrammar, WeightedProduction → ProbabilisticProduction
draw.tree: TreeSegmentWidget.node() → TreeSegmentWidget.label(), TreeSegmentWidget.set_node() → TreeSegmentWidget.set_label()
parsers: nbest_parse() → parse()
ccg.parse.chart: EdgeI.next() → EdgeI.nextsym()
Chunk parser: top_node → root_label; chunk_node → chunk_label
WordNet properties are now access methods, e.g. Synset.definition → Synset.definition()
sem.relextract: mk_pairs() → _tree2semi_rel(), mk_reldicts() → semi_rel2reldict(), show_clause() → clause(), show_raw_rtuple() → rtuple()
corpusname.tagged_words(simplify_tags=True) → corpusname.tagged_words(tagset='universal')
util.clean_html() → BeautifulSoup.get_text(). clean_html() has been dropped; install and use BeautifulSoup or another HTML parser instead.
util.ibigrams() → util.bigrams()
util.ingrams() → util.ngrams()
util.itrigrams() → util.trigrams()
metrics.windowdiff → metrics.segmentation.windowdiff(); metrics.windowdiff.demo() was removed.
parse.generate2 was rewritten and merged into parse.generate
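When porting a larger codebase, it can help to search for the old names mechanically before editing anything. Below is a minimal sketch of such a check using only the standard library. The `RENAMES` table is a subset copied from the list above; the helper name `find_old_names` and the sample input are hypothetical and exist only for illustration. It flags candidates for review and is not a replacement for actually reading the code.

```python
import re

# Old name -> NLTK 3 replacement (a subset of the renames listed above).
RENAMES = {
    "ContextFreeGrammar": "CFG",
    "WeightedGrammar": "PCFG",
    "nbest_parse": "parse",
    "ibigrams": "bigrams",
    "ingrams": "ngrams",
    "itrigrams": "trigrams",
    "clean_html": "BeautifulSoup.get_text",
}

# Match whole identifiers only, so "ibigrams" inside a longer name is ignored.
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def find_old_names(source):
    """Return (line_number, old_name, suggestion) for each old name found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            old = match.group(1)
            hits.append((lineno, old, RENAMES[old]))
    return hits


# Hypothetical NLTK 2 style snippet used as input.
sample = (
    "g = ContextFreeGrammar(start, prods)\n"
    "trees = parser.nbest_parse(tokens)\n"
)

for lineno, old, new in find_old_names(sample):
    print(f"line {lineno}: {old} -> {new}")
```

Extending the table with the remaining entries is straightforward, although renames that change an attribute into a method call (such as Synset.definition) still need a manual look.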

2.Creating objects from strings:

Many objects now support a fromstring() method:
tree.Tree.parse() → tree.Tree.fromstring()
tree.Tree() → tree.Tree.fromstring()
chunk.RegexpChunkRule.parse() → chunk.RegexpChunkRule.fromstring()
grammar.parse_cfg() → CFG.fromstring() (same for other types of grammar)
sem.LogicParser.parse() → sem.Expression.fromstring()
sem.DrtParser.parse() → sem.DrtExpression.fromstring()
sem.parse_valuation() → sem.Valuation.fromstring()
sem.parse_type() → sem.Type.fromstring()

Operations on lists of sentences or other items:
tokenize.batch_tokenize() → tokenize.tokenize_sents()
tag.batch_tag() → tag.tag_sents()
parse.batch_parse() → parse.parse_sents()
classify.batch_classify() → classify.classify_many()
sem.batch_interpret() → sem.interpret_sents()
sem.batch_evaluate() → sem.evaluate_sents()
chunk.batch_ne_chunk() → chunk.ne_chunk_sents()

Changes in probability.FreqDist:
fdist.keys() → sorted(fdist)
fdist.inc(x) → fdist[x] += 1
fdist.samples() → sorted(fdist.keys())
fdist.Nr(r) → fdist.Nr()[r]
fdist.Nr_nonzero() → fdist.Nr().items()
cfdist.conditions() → sorted(cfdist.conditions())

Porter stemmer changes:
adjust_case(), cons(), cvc(), doublec(), m(), step1ab(), step1c(), step2(), step3(), step4(), step5(), vowelinstem() made private
ends(), r(), setto() removed
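As an illustration of the FreqDist replacements listed above, here is a small sketch of the counting idiom. As far as I know, FreqDist in NLTK 3 is built on collections.Counter, so the snippet falls back to Counter as a stand-in when NLTK is not installed; that fallback is my own assumption for making the example runnable anywhere and is not part of the original notes. The word list is made up.

```python
try:
    from nltk.probability import FreqDist  # NLTK 3
except ImportError:
    # Stand-in so the sketch still runs without NLTK installed (assumption).
    from collections import Counter as FreqDist

words = ["the", "cat", "saw", "the", "dog"]

fdist = FreqDist()
for word in words:
    fdist[word] += 1      # replaces fdist.inc(word)

samples = sorted(fdist)   # replaces fdist.keys(); alphabetical order
the_count = fdist["the"]  # how many times "the" was counted

print(samples)
print(the_count)
```

Note that sorted(fdist) orders the samples alphabetically. Code that needs the samples ordered by frequency can use most_common(), which both Counter and the Counter-based FreqDist provide.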

3.Removed modules, classes and functions:

classify.svm was removed. For classification based on support vector machines (SVMs) use classify.scikitlearn or scikit-learn directly. See https://github.com/nltk/nltk/issues/450.
probability.GoodTuringProbDist class was removed. See https://github.com/nltk/nltk/issues/381.
HiddenMarkovModelTaggerTransformI and its subclasses were removed. See https://github.com/nltk/nltk/issues/374.
classify.maxent no longer supports algorithms backed by scipy.maxentropy. See https://github.com/nltk/nltk/issues/321.
misc.babelfish was removed. See https://github.com/nltk/nltk/issues/265.
sourcedstring was removed. See https://github.com/nltk/nltk/issues/322.
yamltags was removed. JSON is now preferred instead. See https://github.com/nltk/nltk/issues/540
mallet was removed, including the tag.crf module. See https://github.com/nltk/nltk/issues/104
tag.simplify was removed. See https://github.com/nltk/nltk/issues/483
model was removed. See https://github.com/nltk/nltk/issues?labels=model
corpus.reader.wordnet._lcs_by_depth was removed. See https://github.com/nltk/nltk/issues/422.

4.Miscellaneous changes:

probability.ConditionalProbDist.default_factory now inherits from dict instead of defaultdict
probability.ConditionalProbDistI.default_factory now inherits from dict instead of defaultdict
probability.DictionaryConditionalProbDist.default_factory now inherits from dict instead of defaultdict
tag.senna.SennaTagger → classify.Senna
tag.senna.POSTagger → tag.SennaTagger
tag.senna.CHKTagger → tag.SennaChunkTagger

5.Printing changes (from 3.0.2, see https://github.com/nltk/nltk/issues/804):

classify.decisiontree.DecisionTreeClassifier.pp → pretty_format
metrics.confusionmatrix.ConfusionMatrix.pp → pretty_format
sem.lfg.FStructure.pprint → pretty_format
sem.drt.DrtExpression.pretty → pretty_format
parse.chart.Chart.pp → pretty_format
Tree.pprint() → pformat
FreqDist.pprint → pformat
Tree.pretty_print → pprint
Tree.pprint_latex_qtree → pformat_latex_qtree

Environment variables for third-party software:
These have been normalised; please see Installing Third Party Software

More background on Python 3 and NLTK 3:
http://docs.python.org/2/library/2to3.html
http://docs.python.org/dev/whatsnew/3.0.html
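Coming back to the printing changes in section 5: code that has to run against both older and newer NLTK releases can pick the method by duck typing. The sketch below assumes, based on the Tree.pprint() → pformat rename listed there, that the old pprint() returned the formatted string. The two classes are hypothetical stand-ins rather than real NLTK classes, and `tree_to_text` is a name I made up for illustration.

```python
class NewStyleTree:
    """Hypothetical stand-in for a tree from NLTK 3.0.2 or later."""

    def pformat(self):
        return "(S (NP I) (VP saw))"


class OldStyleTree:
    """Hypothetical stand-in for a tree from an older release."""

    def pprint(self):
        # Assumption: the old pprint() returned the text instead of printing it.
        return "(S (NP I) (VP saw))"


def tree_to_text(tree):
    """Return the bracketed string, preferring the newer pformat() name."""
    pformat = getattr(tree, "pformat", None)
    if callable(pformat):
        return pformat()
    return tree.pprint()


print(tree_to_text(NewStyleTree()))
print(tree_to_text(OldStyleTree()))
```

Checking for pformat first matters because, going by the rename table, pprint still exists on newer trees but under a different meaning.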

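This article is part of the Tencent Cloud self-media syndication program and is shared from the author's personal site/blog.

Originally published: 2018-05-21. In case of infringement, please contact cloudcommunity@tencent.com to have it removed.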
