I am trying to find a simple way to compute the soft cosine similarity between two sentences.
Here is my attempt so far:
from gensim.matutils import softcossim
sent_1 = 'Dravid is a cricket player and a opening batsman'.split()
sent_2 = 'Leo is a cricket player too He is a batsman,baller and keeper'.split()
print(softcossim(sent_1, sent_2, similarity_matrix))
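I don't understand what similarity_matrix is supposed to be. Please help me work out how to build it, and then how to compute the soft cosine similarity in Python.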
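Posted on 2021-01-21 00:55:10
As of Gensim 3.8.3, the current release, some of the method calls used in the question and in earlier answers are deprecated, and those deprecated functions have been removed in the 4.0.0 beta. I can't seem to include code in a reply to @EliadL, so I am adding a new comment.
The current way to solve this in Gensim 3.8.3 and 4.0.0 is as follows: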
import gensim.downloader as api
from gensim import corpora
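from gensim.similarities import SparseTermSimilarityMatrix
try:
    from gensim.similarities import WordEmbeddingSimilarityIndex  # Gensim 4.x
except ImportError:
    from gensim.models import WordEmbeddingSimilarityIndex  # Gensim 3.8.x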
sent_1 = 'Dravid is a cricket player and a opening batsman'.split()
sent_2 = 'Leo is a cricket player too He is a batsman,baller and keeper'.split()
# Download the FastText model
fasttext_model300 = api.load('fasttext-wiki-news-subwords-300')
# Prepare a dictionary and a corpus.
documents = [sent_1, sent_2]
dictionary = corpora.Dictionary(documents)
# Prepare the similarity matrix
similarity_index = WordEmbeddingSimilarityIndex(fasttext_model300)
similarity_matrix = SparseTermSimilarityMatrix(similarity_index, dictionary)
# Convert the sentences into bag-of-words vectors.
sent_1 = dictionary.doc2bow(sent_1)
sent_2 = dictionary.doc2bow(sent_2)
# Compute soft cosine similarity
print(similarity_matrix.inner_product(sent_1, sent_2, normalized=True))
#> 0.68463486
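To make the role of similarity_matrix more concrete, here is a minimal pure-Python sketch of the soft cosine formula, s·S·t / sqrt((s·S·s)(t·S·t)), which is what a normalized inner product under a term-similarity matrix S amounts to. It is an illustration only, not Gensim's implementation: the function name soft_cosine, the three-term vocabulary, and the 0.5 similarity value are all made up for the example.

```python
import math


def soft_cosine(bow_a, bow_b, sim):
    """Soft cosine of two bag-of-words vectors given as (term_id, count) pairs.

    sim[i][j] is the similarity between term i and term j (1.0 on the diagonal).
    """
    def inner(x, y):
        # x^T S y: every pair of terms contributes, weighted by its similarity.
        return sum(cx * sim[i][j] * cy for i, cx in x for j, cy in y)

    norm = math.sqrt(inner(bow_a, bow_a) * inner(bow_b, bow_b))
    return inner(bow_a, bow_b) / norm if norm else 0.0


# Hypothetical three-term vocabulary: 0 = "batsman", 1 = "bowler", 2 = "keeper".
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
related = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]  # made-up values

doc_a = [(0, 1)]  # "batsman"
doc_b = [(1, 1)]  # "bowler"

hard = soft_cosine(doc_a, doc_b, identity)  # identity matrix: plain cosine
soft = soft_cosine(doc_a, doc_b, related)   # related terms still contribute
print(hard, soft)  # 0.0 0.5
```

With the identity matrix the two documents share no terms and score 0.0; once the matrix says that "batsman" and "bowler" are related, the score rises to 0.5. In the Gensim code above, that matrix is derived from word-embedding similarities instead of being written by hand.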
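For users of Gensim v3.8.3, I also found this Notebook helpful for understanding soft cosine similarity and how to apply it with Gensim.
For users of the Gensim 4.0.0 beta, this Notebook is the one worth looking at for now.
Posted on 2020-01-21 01:44:28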
import gensim.downloader as api
from gensim import corpora
from gensim.matutils import softcossim
sent_1 = 'Dravid is a cricket player and a opening batsman'.split()
sent_2 = 'Leo is a cricket player too He is a batsman,baller and keeper'.split()
# Download the FastText model
fasttext_model300 = api.load('fasttext-wiki-news-subwords-300')
# Prepare a dictionary and a corpus.
documents = [sent_1, sent_2]
dictionary = corpora.Dictionary(documents)
# Prepare the similarity matrix
similarity_matrix = fasttext_model300.similarity_matrix(dictionary)
# Convert the sentences into bag-of-words vectors.
sent_1 = dictionary.doc2bow(sent_1)
sent_2 = dictionary.doc2bow(sent_2)
# Compute soft cosine similarity
print(softcossim(sent_1, sent_2, similarity_matrix))
#> 0.7909639717134869
Posted on 2021-05-14 20:42:15
In Gensim 4.0.0 and later you can use the SoftCosineSimilarity class from gensim.similarities:
from gensim.similarities import SoftCosineSimilarity

# Calculate soft cosine similarity between the query and the documents.
# Assumes `dictionary` and `similarity_matrix` have already been built,
# and that `query` and each document are lists of tokens.
def find_similarity(query, documents):
    query = dictionary.doc2bow(query)
    index = SoftCosineSimilarity(
        [dictionary.doc2bow(document) for document in documents],
        similarity_matrix)
    similarities = index[query]
    return similarities
https://stackoverflow.com/questions/59573454