首页
学习
活动
专区
圈层
工具
发布
首页
学习
活动
专区
圈层
工具
MCP广场
社区首页 >问答首页 >nltk不将$NLTK_DATA添加到搜索路径中?

nltk不将$NLTK_DATA添加到搜索路径中?
EN

Stack Overflow用户
提问于 2016-04-03 07:55:14
回答 2查看 14.9K关注 0票数 4

在linux下,我设置了env $NLTK_DATA('/home/user/data/nltk'),并按预期进行了测试工作。

代码语言:javascript
运行
复制
>>> from nltk.corpus import brown
>>> brown.words()
['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]

但是当运行另一个python脚本时,我得到了:

代码语言:javascript
运行
复制
LookupError: 
**********************************************************************
Resource u'tokenizers/punkt/english.pickle' not found.  Please
use the NLTK Downloader to obtain the resource:  >>>
nltk.download()
Searched in:
- '/home/user/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- u''

如我们所见,在手动追加$NLTK_DATA dir之后,nltk不会将NLTK_DATA添加到搜索路径中:

代码语言:javascript
运行
复制
nltk.data.path.append("/NLTK_DATA_DIR");

脚本按预期运行,问题是:

如何使nltk自动添加$NLTK_DATA到其搜索路径?

EN

Stack Overflow用户

发布于 2016-04-03 08:39:56

如果您不想在运行脚本之前设置$NLTK_DATA,可以在python脚本中使用:

代码语言:javascript
运行
复制
import nltk
nltk.path.append('/home/alvas/some_path/nltk_data/')

例如,让我们将nltk_data移动到NLTK不会自动找到的非标准路径:

代码语言:javascript
运行
复制
alvas@ubi:~$ ls nltk_data/
chunkers  corpora  grammars  help  misc  models  stemmers  taggers  tokenizers
alvas@ubi:~$ mkdir some_path
alvas@ubi:~$ mv nltk_data/ some_path/
alvas@ubi:~$ ls nltk_data/
ls: cannot access nltk_data/: No such file or directory
alvas@ubi:~$ ls some_path/nltk_data/
chunkers  corpora  grammars  help  misc  models  stemmers  taggers  tokenizers

现在,我们使用nltk.path.append()黑客:

代码语言:javascript
运行
复制
alvas@ubi:~$ python
>>> import os
>>> import nltk
>>> nltk.path.append('/home/alvas/some_path/nltk_data/')
>>> nltk.pos_tag('this is a foo bar'.split())
[('this', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('foo', 'JJ'), ('bar', 'NN')]
>>> nltk.data
<module 'nltk.data' from '/usr/local/lib/python2.7/dist-packages/nltk/data.pyc'>
>>> nltk.data.path
['/home/alvas/some_path/nltk_data/', '/home/alvas/nltk_data', '/usr/share/nltk_data', '/usr/local/share/nltk_data', '/usr/lib/nltk_data', '/usr/local/lib/nltk_data']
>>> exit()

让我们把它移回来,看看它是否有效:

代码语言:javascript
运行
复制
alvas@ubi:~$ ls nltk_data
ls: cannot access nltk_data: No such file or directory
alvas@ubi:~$ mv some_path/nltk_data/ .
alvas@ubi:~$ python
>>> import nltk
>>> nltk.data.path
['/home/alvas/nltk_data', '/usr/share/nltk_data', '/usr/local/share/nltk_data', '/usr/lib/nltk_data', '/usr/local/lib/nltk_data']
>>> nltk.pos_tag('this is a foo bar'.split())
[('this', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('foo', 'JJ'), ('bar', 'NN')]

如果您真的很想自动找到nltk_data,请使用以下内容:

代码语言:javascript
运行
复制
import scandir
import os, sys
import time

import nltk

def find(name, path):
    for root, dirs, files in scandir.walk(path):
        if root.endswith(name):
            return root

def find_nltk_data():
    start = time.time()
    path_to_nltk_data = find('nltk_data', '/')
    print >> sys.stderr, 'Finding nltk_data took', time.time() - start
    print >> sys.stderr,  'nltk_data at', path_to_nltk_data
    with open('where_is_nltk_data.txt', 'w') as fout:
        fout.write(path_to_nltk_data)
    return path_to_nltk_data

def magically_find_nltk_data():
    if os.path.exists('where_is_nltk_data.txt'):
        with open('where_is_nltk_data.txt') as fin:
            path_to_nltk_data = fin.read().strip()
        if os.path.exists(path_to_nltk_data):
            nltk.data.path.append(path_to_nltk_data)
        else:
            nltk.data.path.append(find_nltk_data())
    else:
        path_to_nltk_data  = find_nltk_data()
        nltk.data.path.append(path_to_nltk_data)


magically_find_nltk_data()
print nltk.pos_tag('this is a foo bar'.split())

让我们称之为python脚本,test.py

代码语言:javascript
运行
复制
alvas@ubi:~$ ls nltk_data/
chunkers  corpora  grammars  help  misc  models  stemmers  taggers  tokenizers
alvas@ubi:~$ python test.py
Finding nltk_data took 4.27330780029
nltk_data at /home/alvas/nltk_data
[('this', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('foo', 'JJ'), ('bar', 'NN')]
alvas@ubi:~$ mv nltk_data/ some_path/
alvas@ubi:~$ python test.py
Finding nltk_data took 4.75850391388
nltk_data at /home/alvas/some_path/nltk_data
[('this', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('foo', 'JJ'), ('bar', 'NN')]
票数 7
EN
查看全部 2 条回答
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/36382937

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档