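Given text data, I want to extract numeric information based on the presence of other information.
For example: "There are 119 people in the gym right now, 100 people in the weight room, 19 on treadmills". If I am interested in "weight room", I want to extract "100"; if I am interested in "treadmills", I want to extract "19".
I can build a dependency tree with Python's spaCy package. Is there a way to extract these relationships from it?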
Posted on 2018-07-04 11:39:21
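If your data is always this clean, you can probably solve this with a simple regular expression: look for the closest number that appears before the string of interest.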
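A minimal sketch of that regex idea, assuming the counts are plain digits and that each count always comes before the place it describes. The function name `number_before` and the sample sentence are illustrative, not part of the original answer.

```python
import re

def number_before(text, keyword):
    """Return the closest number that precedes `keyword`, or None."""
    pos = text.lower().find(keyword.lower())
    if pos == -1:
        return None  # keyword not present
    # Collect every run of digits before the keyword and keep the last one.
    numbers = re.findall(r"\d+", text[:pos])
    return int(numbers[-1]) if numbers else None

sentence = ("There are 119 people in the gym right now, "
            "100 people in the weight room, 19 on treadmills.")

print(number_before(sentence, "weight room"))  # 100
print(number_before(sentence, "treadmills"))   # 19
print(number_before(sentence, "gym"))          # 119
```

This breaks as soon as the word order changes (for example "the weight room has 100 people"), which is where a dependency parse could start to pay off.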
As for dependency parsing with spaCy, this is the output for the sentence.
TEXT        DEP       HEAD    HEAD_POS  CHILDREN
There       expl      are     VERB      []
are         ROOT      are     VERB      [There, people, now, .]
119         nummod    people  NOUN      []
people      attr      are     VERB      [119, in]
in          prep      people  NOUN      [gym]
the         det       gym     NOUN      []
gym         pobj      in      ADP       [the]
right       advmod    now     ADV       []
now         advmod    are     VERB      [right]
.           punct     are     VERB      []
100         nummod    people  NOUN      []
people      ROOT      people  NOUN      [100, in, ,, 19]
in          prep      people  NOUN      [room]
the         det       room    NOUN      []
weight      compound  room    NOUN      []
room        pobj      in      ADP       [the, weight]
,           punct     people  NOUN      []
19          appos     people  NOUN      [on]
on          prep      19      NUM       [treadmills]
treadmills  pobj      on      ADP       []
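This means you would first have to locate "weight room" and then trace back up the tree to 100, which in this case is probably more complicated than using a regular expression.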
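To make the "locate, then trace back" step concrete, here is a rough sketch that walks from the keyword toward the root of the tree. It does not call spaCy: the tokens of the second sentence are transcribed by hand from the parse above, and the head indices are an assumption, since the printed output shows only the head's text. The names `TOKENS` and `count_for` are hypothetical.

```python
# (text, dependency label, index of head) for "100 people in the weight room, 19 on treadmills".
# Head indices are assumed from the printed parse; a ROOT token points to itself.
TOKENS = [
    ("100", "nummod", 1),
    ("people", "ROOT", 1),
    ("in", "prep", 1),
    ("the", "det", 5),
    ("weight", "compound", 5),
    ("room", "pobj", 2),
    (",", "punct", 1),
    ("19", "appos", 1),
    ("on", "prep", 7),
    ("treadmills", "pobj", 8),
]

def count_for(tokens, keyword):
    """Walk from `keyword` toward the root and return the first number found."""
    matches = [i for i, (text, _, _) in enumerate(tokens) if text == keyword]
    if not matches:
        return None
    node = matches[0]
    while True:
        text, _, head = tokens[node]
        # Case 1: the current token is itself a number ("19" heads "on treadmills").
        if text.isdigit():
            return int(text)
        # Case 2: the current token has a numeric modifier ("100" modifies "people").
        for child_text, child_dep, child_head in tokens:
            if child_head == node and child_dep == "nummod" and child_text.isdigit():
                return int(child_text)
        if head == node:  # reached the root without finding a number
            return None
        node = head

print(count_for(TOKENS, "room"))        # 100
print(count_for(TOKENS, "treadmills"))  # 19
```

Even on this one sentence, two different patterns are needed (a `nummod` child versus a numeric ancestor), which illustrates why the regex is the simpler choice for clean data.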
https://datascience.stackexchange.com/questions/25823