
Soft cosine similarity between two sentences

Stack Overflow user
Asked 2020-01-03 13:07:44
3 answers, 4.6K views, 0 followers, score 0

I am trying to find a simple way to compute the soft cosine similarity between two sentences.

Here is my attempt so far:

from gensim.matutils import softcossim

sent_1 = 'Dravid is a cricket player and a opening batsman'.split()
sent_2 = 'Leo is a cricket player too He is a batsman,baller and keeper'.split()

print(softcossim(sent_1, sent_2, similarity_matrix))

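I don't understand what similarity_matrix should be. Please help me work out how to build it, and then how to compute soft cosine similarity in Python.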


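3 Answers

Stack Overflow user

Answered 2021-01-21 00:55:10

As of Gensim 3.8.3, the current release at the time of writing, some of the method calls in the question and in the earlier answers are deprecated. These deprecated functions have been removed in the 4.0.0 beta. I could not post code in a comment replying to @EliadL, so I am adding a new answer.

The current way to solve this in Gensim 3.8.3 and 4.0.0 is as follows: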

import gensim.downloader as api
from gensim import corpora
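from gensim.similarities import SparseTermSimilarityMatrix
try:
    # Gensim 4.x location
    from gensim.similarities import WordEmbeddingSimilarityIndex
except ImportError:
    # Gensim 3.8.x location
    from gensim.models import WordEmbeddingSimilarityIndex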

sent_1 = 'Dravid is a cricket player and a opening batsman'.split()
sent_2 = 'Leo is a cricket player too He is a batsman,baller and keeper'.split()

# Download the FastText model
fasttext_model300 = api.load('fasttext-wiki-news-subwords-300')

# Prepare a dictionary and a corpus.
documents = [sent_1, sent_2]
dictionary = corpora.Dictionary(documents)

# Prepare the similarity matrix
similarity_index = WordEmbeddingSimilarityIndex(fasttext_model300)
similarity_matrix = SparseTermSimilarityMatrix(similarity_index, dictionary)

# Convert the sentences into bag-of-words vectors.
sent_1 = dictionary.doc2bow(sent_1)
sent_2 = dictionary.doc2bow(sent_2)

# Compute soft cosine similarity.
# Note: released Gensim 4.x versions expect a tuple here, e.g. normalized=(True, True).
print(similarity_matrix.inner_product(sent_1, sent_2, normalized=True))
#> 0.68463486

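For Gensim v3.8.3 users, I also found this Notebook helpful for understanding soft cosine similarity and how to apply it with Gensim.

For users of the Gensim 4.0.0 beta, this Notebook is the one to look at for now.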
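To make the role of similarity_matrix more concrete, here is a minimal pure-Python sketch of the soft cosine formula, s(a, b) = aᵀSb / sqrt(aᵀSa · bᵀSb), where S[i][j] is the similarity between vocabulary terms i and j. The three-word vocabulary and the similarity values are made up for illustration, and the helper name soft_cosine is hypothetical (it is not a Gensim function). In the Gensim code above, S is derived from word embeddings instead.

```python
import math


def soft_cosine(a, b, S):
    """Soft cosine similarity of bag-of-words vectors a and b.

    S is a square term-similarity matrix: S[i][j] is the similarity
    between vocabulary term i and vocabulary term j.
    """
    def inner(x, y):
        # Generalised inner product x^T S y
        return sum(x[i] * S[i][j] * y[j]
                   for i in range(len(x))
                   for j in range(len(y)))

    denom = math.sqrt(inner(a, a)) * math.sqrt(inner(b, b))
    return inner(a, b) / denom if denom else 0.0


# Hypothetical vocabulary, for illustration only
vocab = ["batsman", "cricketer", "keyboard"]

# Made-up term similarities: "batsman" and "cricketer" are related
S = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

doc_a = [1, 0, 0]  # contains only "batsman"
doc_b = [0, 1, 0]  # contains only "cricketer"

# With the identity matrix, soft cosine reduces to ordinary cosine
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
hard = soft_cosine(doc_a, doc_b, identity)
soft = soft_cosine(doc_a, doc_b, S)

print(hard, soft)
```

The two documents share no words, so ordinary cosine similarity is zero, while the soft variant gives them credit for containing related terms. That is the whole purpose of the similarity matrix.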

Score: 2

Stack Overflow user

Answered 2020-01-21 01:44:28

Use this tutorial:

import gensim.downloader as api
from gensim import corpora
from gensim.matutils import softcossim

sent_1 = 'Dravid is a cricket player and a opening batsman'.split()
sent_2 = 'Leo is a cricket player too He is a batsman,baller and keeper'.split()

# Download the FastText model
fasttext_model300 = api.load('fasttext-wiki-news-subwords-300')

# Prepare a dictionary and a corpus.
documents = [sent_1, sent_2]
dictionary = corpora.Dictionary(documents)

# Prepare the similarity matrix (Gensim 3.x API; removed in 4.0.0)
similarity_matrix = fasttext_model300.similarity_matrix(dictionary)

# Convert the sentences into bag-of-words vectors.
sent_1 = dictionary.doc2bow(sent_1)
sent_2 = dictionary.doc2bow(sent_2)

# Compute soft cosine similarity
print(softcossim(sent_1, sent_2, similarity_matrix))
#> 0.7909639717134869
Score: 1

Stack Overflow user

Answered 2021-05-14 20:42:15

In Gensim 4.0.0 and later you can use the SoftCosineSimilarity class from gensim.similarities:

from gensim.similarities import SoftCosineSimilarity

# Calculate the soft cosine similarity between the query and each document.
# Assumes `dictionary` and `similarity_matrix` were built beforehand
# (e.g. a corpora.Dictionary and a SparseTermSimilarityMatrix).
def find_similarity(query, documents):
    query = dictionary.doc2bow(query)
    index = SoftCosineSimilarity(
        [dictionary.doc2bow(document) for document in documents],
        similarity_matrix)
    similarities = index[query]
    return similarities
Score: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/59573454
