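Can anyone tell me how I can take the words surrounding a given word into account? Suppose we have the sentence "Today the weather is fine and we love to walk." With a window size of 5, I would like to get the following:
And so on. Getting bigrams is not a problem: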
bigrams = [p for s in corpus_lemm for p in nltk.bigrams(w for w in s)] #take bigrams inside of each sentence
But how can I take into account the words within a given window size?
Many thanks for your help!
Posted on 2016-11-04 21:33:37
Sorry, I don't have much command of Python, but in JS it can be done as follows. Hopefully you can port it to Python.
var str = "Today the weather is fine and we love to walk.",
    arr = str.split(/\s+/),
    win = 5,
    result = arr.map((w,i,a) => Array(win).fill()
                                          .map((e,j) => a[i + j + -1 * Math.floor(win/2)])
                                          .reduce((p,c) => p ? c ? p + " " + c
                                                                 : p
                                                             : c));
console.log(result);
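Since the question is about Python, here is one way the same idea might be ported. This is only a sketch: the function name `context_windows` and its `win` parameter are made up for illustration, and it assumes plain whitespace tokenization, as in the JS snippet. For each word it collects the words from `win // 2` positions before it up to the end of the window, cutting the window short at either end of the sentence.

```python
def context_windows(text, win=5):
    """Return, for each word, the words inside a window centred on it."""
    words = text.split()
    half = win // 2
    windows = []
    for i in range(len(words)):
        start = max(i - half, 0)  # clamp at the start of the sentence
        stop = i - half + win     # exclusive end; slicing clamps past the end
        windows.append(" ".join(words[start:stop]))
    return windows


result = context_windows("Today the weather is fine and we love to walk.")
for line in result:
    print(line)
```

The first entry is "Today the weather" (there is nothing to the left of "Today"), and the entry for "weather" is the first full window, "Today the weather is fine".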
Following your comment... sticking to the same algorithm, I could extend my answer as follows.
var arr = [1,2,3,4,5,6,7,8],
    win = 5,
    result = arr.map((_,i,a) => Array(win).fill()
                                          .map((e,j) => a[i + j + -1 * Math.floor(win/2)])
                                          .reduce((p,c) => p ? c ? [].concat(p,c)
                                                                 : p
                                                             : c ? c
                                                                 : undefined));
console.log(JSON.stringify(result));
Posted on 2016-11-04 21:55:42
I'm not quite sure I understand the window, but this seems to produce the output you want.
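s = "Today the weather is fine and we love to walk"
words = s.split()
win_len = 5
half_win = win_len // 2
# Words too close to the start to be the centre of a full window
print("\n".join(words[:half_win]))
for i in range(len(words) - win_len + 1):
    window = words[i:i + win_len]
    # print(" ".join(window))
    print(window[len(window) // 2])  # centre word of the current window
# Words too close to the end to be the centre of a full window
print("\n".join(words[-half_win:]))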
Output
Today
the
weather
is
fine
and
we
love
to
walk
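The loop above already builds each full window but prints only its centre word. If the surrounding words are what is needed, a variant along the following lines could return both. This is a sketch under the same whitespace-splitting assumption; `full_windows` is a hypothetical helper name, and positions that cannot be the centre of a complete window are simply skipped.

```python
def full_windows(words, win_len=5):
    """Yield (centre_word, window) for every position with a complete window."""
    for i in range(len(words) - win_len + 1):
        window = words[i:i + win_len]
        yield window[win_len // 2], window


words = "Today the weather is fine and we love to walk".split()
pairs = list(full_windows(words))
for centre, window in pairs:
    print(centre, "->", " ".join(window))
```

With the ten-word example sentence this gives six pairs, from "weather" through "love".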
Posted on 2016-11-04 22:36:59
You can use list.index and list slicing to retrieve the words you need.
def words(text, search, window):
    tokens = text.split()
    i = tokens.index(search)  # index of the first occurrence of `search`
    low = i - window // 2
    high = low + window
    low = max(low, 0)         # clamp so a negative index does not wrap around
    return tokens[low:high]
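Note that list.index only finds the first occurrence and raises ValueError when the word is absent. If a word can appear more than once, something like the sketch below may be more convenient; `windows_around` is a hypothetical name, and exact string matching on whitespace-separated tokens is assumed.

```python
def windows_around(text, search, window=5):
    """Return one window (list of tokens) per occurrence of `search`."""
    tokens = text.split()
    half = window // 2
    found = []
    for i, token in enumerate(tokens):
        if token == search:
            low = max(i - half, 0)     # clamp at the start of the text
            high = i - half + window   # exclusive end; slicing clamps past the end
            found.append(tokens[low:high])
    return found


sentence = "we walk and we love to walk"
print(windows_around(sentence, "we"))
print(windows_around(sentence, "rain"))  # no occurrence: empty list
```

Returning an empty list for a missing word avoids having to wrap the call in try/except.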
https://stackoverflow.com/questions/40430931