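I've computed text features with SBERT and image features with VGG-16. The text features range from -1.58 to 1.58, while the image features range from 0 to 521. I want to concatenate the text and image features and use them to compute cosine similarity. However, as you may have noticed, the difference in scale means the image features will completely dominate the text features.
My idea is to use something like sklearn's MinMaxScaler to scale the image features down to the same range as the SBERT features. However, I'm not sure this is the best solution in my case, since another approach posted here suggests normalizing both features. In my case, I would say the text features are more important than the image features.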
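A minimal sketch of that rescaling idea in plain Python, assuming the global bounds quoted above (0 to 521 for images, -1.58 to 1.58 for text). The `rescale` helper and the sample vector are hypothetical. Note that sklearn's MinMaxScaler fits a separate min and max per feature column over a dataset, so this fixed-bounds map is a simplification rather than an equivalent.

```python
def rescale(values, src_min, src_max, dst_min, dst_max):
    """Linearly map values from [src_min, src_max] to [dst_min, dst_max]."""
    scale = (dst_max - dst_min) / (src_max - src_min)
    return [dst_min + (v - src_min) * scale for v in values]


# Hypothetical image feature vector; 0 and 521 are the bounds from the question.
image_feature = [0.0, 130.25, 260.5, 521.0]

# Bring the image features into the text feature range [-1.58, 1.58].
scaled_image_feature = rescale(image_feature, 0.0, 521.0, -1.58, 1.58)
print(scaled_image_feature)
```

Because the map is linear, the relative spacing of the image values is preserved; only their range changes.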
SBERT: https://github.com/UKPLab/sentence-transformers
Related question: "Creating one feature by combining two features with different units?"
Posted on 2023-02-23 15:09:20
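In my opinion, you have already found a suitable answer, since that post covers both regular normalization and weighting.
I think that answer normalizes both features, but depending on your project this may be somewhat redundant, because cosine similarity already normalizes by vector length when it is computed.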
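To make the point about built-in normalization concrete, here is a small sketch with made-up toy vectors (all values are hypothetical). It suggests that cosine similarity is unchanged when an entire vector is multiplied by a positive constant, while rescaling only one block of a concatenated vector does change the result, which is why the relative scale of the text and image blocks still matters.

```python
from math import sqrt


def cos_sim(a, b):
    """Cosine similarity of two equal-length lists of numbers."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


# Hypothetical toy features: small text values, large image values.
text1, text2 = [1.0, -0.5], [0.8, 0.3]
image1, image2 = [300.0, 20.0], [10.0, 400.0]

# Plain concatenation: the image block dominates.
base = cos_sim(text1 + image1, text2 + image2)
images_only = cos_sim(image1, image2)

# Multiplying a whole vector by a constant leaves the similarity unchanged.
whole_scaled = cos_sim([10 * x for x in text1 + image1], text2 + image2)

# Rescaling only the image block changes the similarity.
block_scaled = cos_sim(
    text1 + [x / 521 for x in image1],
    text2 + [x / 521 for x in image2],
)

print(base, images_only, whole_scaled, block_scaled)
```

With these toy values, `base` is almost identical to `images_only`, so the text block contributes next to nothing until the blocks are brought onto comparable scales.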
So you can instead map the text feature range onto the image feature range. I suggest something like this:
text_feature_v2 = [ele / 1.58 * 260.5 + 260.5 for ele in text_feature]
concated_feature = [*text_feature_v2, *text_feature_v2, *image_feature]
Here I concatenate the same text feature twice to increase its weight.
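Duplicating the text block doubles its contribution to both the dot product and the squared norm, which gives the same cosine similarity as multiplying the text block by sqrt(2). The sketch below checks that on made-up vectors and generalizes it to an arbitrary weight. The `weighted_concat` helper and its `text_weight` parameter are hypothetical names introduced for illustration, not part of the answer above.

```python
from math import sqrt


def cos_sim(a, b):
    """Cosine similarity of two equal-length lists of numbers."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def weighted_concat(text, image, text_weight=2.0):
    """Scale the text block by sqrt(text_weight), then append the image block."""
    s = sqrt(text_weight)
    return [s * x for x in text] + list(image)


# Hypothetical features, all already in the [0, 521] range.
text1, text2 = [100, 400, 250, 30], [120, 380, 300, 10]
image1, image2 = [500, 10, 40, 300], [20, 450, 310, 5]

# The answer's approach: repeat the text block twice.
duplicated = cos_sim(
    [*text1, *text1, *image1],
    [*text2, *text2, *image2],
)

# Equivalent formulation: weight the text block by 2 without repeating it.
weighted = cos_sim(
    weighted_concat(text1, image1, text_weight=2.0),
    weighted_concat(text2, image2, text_weight=2.0),
)

print(duplicated, weighted)
```

The weighted form keeps the vector length fixed and allows non-integer weights such as `text_weight=1.5`, which repetition cannot express.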
Here is my Python code.
from numpy import dot
from numpy.linalg import norm
from random import randint
def rand_text_feature(dimension=4):
    """Returns a list of `dimension` random integers in [0, 521].

    Stands in for text features already rescaled to the image range.
    """
    return [randint(0, 521) for _ in range(dimension)]


def rand_image_feature(dimension=4):
    """Returns a list of `dimension` random integers in [0, 521]."""
    return [randint(0, 521) for _ in range(dimension)]


def cos_sim(arr1, arr2):
    """Returns the cosine similarity of two arrays."""
    return dot(arr1, arr2) / (norm(arr1) * norm(arr2))
# prepare two pairs of features
text_feature1 = rand_text_feature()
image_feature1 = rand_image_feature()
text_feature2 = rand_text_feature()
image_feature2 = rand_image_feature()
# Prints similarity of texts and images.
print('similarity of two texts')
print(cos_sim(text_feature1, text_feature2))
print('similarity of two images')
print(cos_sim(image_feature1, image_feature2))
# compute cosine similarity traditionally
feature1 = [*text_feature1, *image_feature1]
feature2 = [*text_feature2, *image_feature2]
print('similarity of concatenated feature')
print(cos_sim(feature1, feature2)) 
# compute cosine similarity regarding my proposal
enhanced_feature1 = [*text_feature1, *text_feature1, *image_feature1]
enhanced_feature2 = [*text_feature2, *text_feature2, *image_feature2]
print('similarity of concatenated feature enhancing text')
print(cos_sim(enhanced_feature1, enhanced_feature2))
Here is the result.
similarity of two texts
0.8618949874358144
similarity of two images
0.598022653964154
similarity of concatenated feature
0.7335241784245647
similarity of concatenated feature enhancing text
0.7767832080432862
https://datascience.stackexchange.com/questions/118742