
Q: Soft cosine similarity between two sentences

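Stack Overflow user

Asked 2020-01-03 13:07:44

3 answers · 4.6K views · 0 followers · 0 votes

I am trying to find a simple way to compute the soft cosine similarity between two sentences.

Here is my attempt so far: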

from gensim.matutils import softcossim

sent_1 = 'Dravid is a cricket player and a opening batsman'.split()
sent_2 = 'Leo is a cricket player too He is a batsman,baller and keeper'.split()

print(softcossim(sent_1, sent_2, similarity_matrix))

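I don't understand what similarity_matrix is supposed to be. Please help me work out how to build it, and then how to compute the soft cosine similarity in Python.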

python

gensim

cosine-similarity

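3 Answers

Stack Overflow user

Answered 2021-01-21 00:55:10

As of the current Gensim release, 3.8.3, some of the method calls in the question and in the earlier answers are deprecated. Those deprecated functions have been removed in the 4.0.0 beta. I can't seem to include code in a reply to @EliadL, so I am adding a new post instead.

The current way to solve this in Gensim 3.8.3 and 4.0.0 is as follows: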

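import gensim.downloader as api
from gensim import corpora
from gensim.similarities import SparseTermSimilarityMatrix

# WordEmbeddingSimilarityIndex lives in gensim.similarities in 4.x
# and in gensim.models in 3.8.x.
try:
    from gensim.similarities import WordEmbeddingSimilarityIndex
except ImportError:
    from gensim.models import WordEmbeddingSimilarityIndex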

sent_1 = 'Dravid is a cricket player and a opening batsman'.split()
sent_2 = 'Leo is a cricket player too He is a batsman,baller and keeper'.split()

# Download the FastText model
fasttext_model300 = api.load('fasttext-wiki-news-subwords-300')

# Prepare a dictionary and a corpus.
documents = [sent_1, sent_2]
dictionary = corpora.Dictionary(documents)

# Prepare the similarity matrix
similarity_index = WordEmbeddingSimilarityIndex(fasttext_model300)
similarity_matrix = SparseTermSimilarityMatrix(similarity_index, dictionary)

# Convert the sentences into bag-of-words vectors.
sent_1 = dictionary.doc2bow(sent_1)
sent_2 = dictionary.doc2bow(sent_2)

# Compute soft cosine similarity
print(similarity_matrix.inner_product(sent_1, sent_2, normalized=True))
#> 0.68463486
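
For readers still unsure what the similarity matrix does: as far as I understand it, the normalized inner product is the soft cosine measure, (a^T S b) / (sqrt(a^T S a) * sqrt(b^T S b)), where a and b are the bag-of-words vectors and S[i][j] is the similarity between term i and term j. The following is a minimal pure-Python sketch of that formula, not Gensim's implementation. The three-term vocabulary and the numbers in S are invented for illustration.

```python
from math import sqrt

def soft_inner(a, b, S):
    """Return a^T S b for dense vectors a, b and a square matrix S."""
    return sum(a[i] * S[i][j] * b[j]
               for i in range(len(a)) for j in range(len(b)))

def soft_cosine(a, b, S):
    """Soft cosine: a^T S b / (sqrt(a^T S a) * sqrt(b^T S b))."""
    denom = sqrt(soft_inner(a, a, S)) * sqrt(soft_inner(b, b, S))
    return soft_inner(a, b, S) / denom if denom else 0.0

# Hypothetical vocabulary: ["batsman", "player", "keeper"].
# S[i][j] is a made-up similarity between term i and term j.
S = [
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]
doc_a = [1, 0, 0]  # contains only "batsman"
doc_b = [0, 1, 0]  # contains only "player"

# With the identity matrix the measure reduces to the ordinary cosine.
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
print(soft_cosine(doc_a, doc_b, identity))  # 0.0, no shared terms
print(soft_cosine(doc_a, doc_b, S))         # 0.5, related terms count
```

So two sentences with no words in common can still score above zero, as long as the matrix says their words are related. In the Gensim code, the word embeddings are what supply those term-to-term similarities.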

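For users of Gensim v3.8.3, I also found this notebook helpful for understanding soft cosine similarity and how to apply it with Gensim.

For users of the Gensim 4.0.0 beta, this notebook is currently the one to look at.

Votes: 2

Stack Overflow user

Answered 2020-01-21 01:44:28

Using this tutorial: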

import gensim.downloader as api
from gensim import corpora
from gensim.matutils import softcossim

sent_1 = 'Dravid is a cricket player and a opening batsman'.split()
sent_2 = 'Leo is a cricket player too He is a batsman,baller and keeper'.split()

# Download the FastText model
fasttext_model300 = api.load('fasttext-wiki-news-subwords-300')

# Prepare a dictionary and a corpus.
documents = [sent_1, sent_2]
dictionary = corpora.Dictionary(documents)

# Prepare the similarity matrix
similarity_matrix = fasttext_model300.similarity_matrix(dictionary)

# Convert the sentences into bag-of-words vectors.
sent_1 = dictionary.doc2bow(sent_1)
sent_2 = dictionary.doc2bow(sent_2)

# Compute soft cosine similarity
print(softcossim(sent_1, sent_2, similarity_matrix))
#> 0.7909639717134869
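
One detail worth checking in the example sentences: str.split() splits only on whitespace, so 'batsman,baller' ends up as a single token that is unlikely to match a word in the embedding vocabulary. Below is a rough sketch of a stricter tokenizer, together with a stand-in for the (token_id, count) pairs that doc2bow returns. The tokenize and to_bow helpers and the id assignment are hypothetical and not part of Gensim; gensim.utils.simple_preprocess is a ready-made alternative for the tokenizing step.

```python
import re
from collections import Counter

sentence = 'Leo is a cricket player too He is a batsman,baller and keeper'

# Whitespace splitting keeps the comma inside the token.
print(sentence.split()[-3])  # batsman,baller

def tokenize(text):
    """Lowercase and keep runs of letters, digits, and apostrophes (an assumed rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def to_bow(tokens, token2id):
    """Mimic the shape of a bag-of-words vector: sorted (token_id, count) pairs."""
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())

tokens = tokenize(sentence)
# Hypothetical id assignment; a Gensim Dictionary assigns its own ids.
token2id = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
bow = to_bow(tokens, token2id)
print(bow)
```

With this tokenizer, "batsman" and "baller" become separate terms, and repeated words such as "is" and "a" get a count of 2.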

Votes: 1

Stack Overflow user

Answered 2021-05-14 20:42:15

In Gensim 4.0.0 and later, you can use the SoftCosineSimilarity class from gensim.similarities:

from gensim.similarities import SoftCosineSimilarity

# Assumes `dictionary` and `similarity_matrix` have already been built,
# as in the Gensim 3.8.3/4.0.0 answer above.
def find_similarity(query, documents):
    """Return the soft cosine similarity between the query and each document."""
    query = dictionary.doc2bow(query)
    index = SoftCosineSimilarity(
        [dictionary.doc2bow(document) for document in documents],
        similarity_matrix)
    similarities = index[query]
    return similarities

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/59573454
