
How can I use NLTK in Python and build a pretty table with each word under its part of speech?

Stack Overflow user
Asked 2021-05-08 13:51:28
Answers 1 · Views 145 · Followers 0 · Votes 0

This is the code I use to run my NLTK program:

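import re                                      # regular expressions, used to clean the raw text
import nltk                                    # import the Natural Language Toolkit
from nltk.corpus import PlaintextCorpusReader  # import the PlaintextCorpusReader module
from nltk.corpus import stopwords              # English stop-word list used below
from nltk import word_tokenize, pos_tag, FreqDist
from prettytable import PrettyTable
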
# PARTS of SPEECH Lookup
POSTAGS = {
        'CC':   'conjunction',
        'CD':   'CardinalNumber',
        'DT':   'Determiner',
        'EX':   'ExistentialThere',
        'FW':   'ForeignWord',
        'IN':   'Preposition',
        'JJ':   'Adjective',
        'JJR':  'AdjectiveComparative',
        'JJS':  'AdjectiveSuperlative',
        'LS':   'ListItem',
        'MD':   'Modal',
        'NN':   'Noun',
        'NNS':  'NounPlural',
        'NNP':  'ProperNounSingular',
        'NNPS': 'ProperNounPlural',
        'PDT':  'Predeterminer',
        'POS':  'PossessiveEnding',
        'PRP':  'PersonalPronoun',
        'PRP$': 'PossessivePronoun',
        'RB':   'Adverb',
        'RBR':  'AdverbComparative',
        'RBS':  'AdverbSuperlative',
        'RP':   'Particle',
        'SYM':  'Symbol',
        'TO':   'to',
        'UH':   'Interjection',
        'VB':   'Verb',
        'VBD':  'VerbPastTense',
        'VBG':  'VerbPresentParticiple',
        'VBN':  'VerbPastParticiple',
        'VBP':  'VerbNon3rdPersonSingularPresent',
        'VBZ':  'Verb3rdPersonSingularPresent',
        'WDT':  'WhDeterminer',
        'WP':   'WhPronoun',
        'WP$':  'PossessiveWhPronoun',
        'WRB':  'WhAdverb'
        }

# Read all contents of the corpus
stopWords = set(stopwords.words('english')) 
Corpus    = PlaintextCorpusReader('./CORPUS', '.*')
rawText   = Corpus.raw()
rawText   = re.sub("[^a-zA-Z' ]", ' ', rawText)   
  
# Extract tokens from the raw text
tokens = nltk.word_tokenize(rawText)
filteredTokens = [w for w in tokens if w not in stopWords]
TextCorpus = nltk.Text(filteredTokens)

print("Compiling Vocabulary Frequencies")
print(TextCorpus.vocab())

# Take sampling of the parts of speech found
posTagged = pos_tag(filteredTokens[0:1000])

tblTags = PrettyTable(['Token', 'Part-of-Speech'])

for taggedToken in posTagged:
    tblTags.add_row([taggedToken[0], taggedToken[1]])

print(tblTags.get_string())

This code produces the following:

+-----------------+----------------+
|      Token      | Part-of-Speech |
+-----------------+----------------+
|       LOS       |      NNP       |
|     ANGELES     |      NNP       |
|    CALIFORNIA   |      NNP       |
|    WEDNESDAY    |      NNP       |
|     JANUARY     |      NNP       |
|        A        |      NNP       |
|        M        |      NNP       |
|    DEPARTMENT   |      NNP       |
|        NO       |      NNP       |
|       HON       |      NNP       |
|      LANCE      |      NNP       |
|        A        |      NNP       |

But I want it to look like the table below, with each word under the correct column. When I call .add_row, I can't get each word to land under its correct column.

+------+-----------+------+------------+--------------------+------------------+------+
| Word | Adjective | Noun | NounPlural | ProperNounSingular | ProperNounPlural | Verb |
+------+-----------+------+------------+--------------------+------------------+------+

1 Answer

Stack Overflow user

Answered 2021-05-08 14:59:21

You could try something like this, which builds a DataFrame with one column per part of speech:

import pandas as pd

# DataFrame.append was removed in pandas 2.0, so collect one single-cell
# row per token and build the frame in one step. Tokens whose tag is not
# in POSTAGS are skipped.
rows = [{POSTAGS[tag]: word} for word, tag in posTagged if tag in POSTAGS]
my_df = pd.DataFrame(rows, columns=list(POSTAGS.values()))
my_df = my_df.fillna(" ")
my_df.head(2)
Votes 0
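
For readers who would rather stay with PrettyTable than switch to pandas, the same layout can probably be reached by giving add_row a full-width row: the word in a leading Word column, the word again in the column for its part of speech, and empty strings everywhere else. The following is only a sketch under stated assumptions. The helper names build_rows and render_table are hypothetical, the POSTAGS dictionary is a shortened copy of the one in the question, and sample_tagged is made-up data standing in for the output of pos_tag.

```python
# Shortened, illustrative copy of the question's POSTAGS lookup.
POSTAGS = {
    'JJ':   'Adjective',
    'NN':   'Noun',
    'NNS':  'NounPlural',
    'NNP':  'ProperNounSingular',
    'NNPS': 'ProperNounPlural',
    'VB':   'Verb',
}

# Made-up (word, tag) pairs standing in for the result of pos_tag().
sample_tagged = [
    ('LOS', 'NNP'),
    ('quick', 'JJ'),
    ('dogs', 'NNS'),
    ('run', 'VB'),
    ('the', 'DT'),   # 'DT' is not in the shortened lookup above
]


def build_rows(tagged, postags):
    """Return (header, rows) with one full-width row per tagged token.

    Each row holds the word in the first column and again in the column
    named after its part of speech; every other cell is an empty string.
    Tokens whose tag is missing from `postags` are skipped.
    """
    columns = list(postags.values())
    header = ['Word'] + columns
    rows = []
    for word, tag in tagged:
        name = postags.get(tag)
        if name is None:
            continue
        rows.append([word] + [word if col == name else '' for col in columns])
    return header, rows


def render_table(tagged, postags):
    """Render the rows from build_rows() as a PrettyTable string."""
    # Imported here so build_rows() stays usable without prettytable installed.
    from prettytable import PrettyTable

    header, rows = build_rows(tagged, postags)
    table = PrettyTable(header)
    for row in rows:
        table.add_row(row)   # each row has exactly len(header) cells
    return table.get_string()
```

Calling print(render_table(posTagged, POSTAGS)) with the question's own variables should give one row per token. The key point is that add_row expects as many values as there are field names, so the empty strings are what keep each word in its own column.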
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/67448297
