Q: Combining text and image features with different scales

Data Science user

Asked 2023-02-23 13:02:04

Answers: 1, Views: 57, Followers: 0, Votes: 0

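I have computed text features with SBERT and image features with VGG-16. The text features range from -1.58 to 1.58, while the image features range from 0 to 521. I want to concatenate the text and image features and use the result to compute cosine similarity. However, as you may have noticed, the difference in scale means the image features will completely dominate the text features.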

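My idea is to use something like sklearn's MinMaxScaler to scale the image features down to the same range as the features computed by SBERT. However, I am not sure this is the best solution in my case, because another approach posted here suggests normalizing both features. In my case, I would say the text features are more important than the image features.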
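For illustration, the rescaling described in the question can be sketched in plain Python. This is a minimal sketch, not the asker's code: the function name `min_max_rescale` and the sample values are made up, and the bounds are the fixed ranges reported in the question. sklearn's MinMaxScaler applies the same linear formula, but it learns the minimum and maximum per column from the data it is fitted on.

```python
def min_max_rescale(values, src_range, dst_range):
    """Linearly map values from src_range=(lo, hi) onto dst_range=(lo, hi)."""
    src_lo, src_hi = src_range
    dst_lo, dst_hi = dst_range
    if src_hi == src_lo:
        raise ValueError("source range must have non-zero width")
    scale = (dst_hi - dst_lo) / (src_hi - src_lo)
    return [(v - src_lo) * scale + dst_lo for v in values]


# Hypothetical image feature values inside the reported [0, 521] range.
image_feature = [0.0, 130.25, 260.5, 521.0]

# Map them onto the reported text feature range [-1.58, 1.58].
scaled = min_max_rescale(image_feature, (0.0, 521.0), (-1.58, 1.58))
print(scaled)
```

Note that this mapping shifts the values as well as scaling them. A shift changes the angles between vectors, so cosine similarities computed on the rescaled image features will generally differ from those computed on the raw ones.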

SBERT: https://github.com/UKPLab/sentence-transformers

Other approach: "Creating one feature by combining two features with different units?"

bert

feature-scaling

vgg16

1 Answer

Data Science user

Answered 2023-02-23 15:09:20

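In my opinion, you have already found a suitable answer, since that post covers both regular normalization and weighting.

I think that answer normalizes both features, but depending on your project this may be partly redundant, because computing cosine similarity already normalizes each vector by its length. What it does not remove is the difference in scale between the text block and the image block inside the concatenated vector.

Therefore, you can map the text feature range onto the image feature range. I suggest something like this example.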
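The point about normalization can be checked numerically. The following is a small sketch with made-up toy vectors (not real SBERT or VGG-16 output): multiplying a whole concatenated vector by a positive constant leaves the cosine similarity unchanged, while changing the scale of only the text block changes it noticeably.

```python
import math


def cos_sim(a, b):
    """Cosine similarity of two equal-length lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy features: text values are small, image values are large.
text1, image1 = [1.0, -0.5], [300.0, 20.0]
text2, image2 = [-1.0, 0.5], [310.0, 25.0]

feature1 = text1 + image1
feature2 = text2 + image2

image_only = cos_sim(image1, image2)
base = cos_sim(feature1, feature2)

# Scaling the whole vector by a positive constant does not change the cosine.
globally_scaled = cos_sim([10 * x for x in feature1], feature2)

# Scaling only the text block changes how much it contributes.
text_scaled = cos_sim(
    [100 * x for x in text1] + image1,
    [100 * x for x in text2] + image2,
)

print("image only:        ", image_only)
print("concatenated:      ", base)
print("globally scaled:   ", globally_scaled)
print("text block scaled: ", text_scaled)
```

In this toy case the two texts point in opposite directions, yet the plain concatenated similarity stays almost identical to the image-only similarity, which is the domination effect described in the question.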

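# Map text features from [-1.58, 1.58] onto the image range [0, 521].
text_feature_v2 = [ele / 1.58 * 260.5 + 260.5 for ele in text_feature]
# Concatenate the text block twice so it carries more weight than the image block.
concatenated_feature = [*text_feature_v2, *text_feature_v2, *image_feature]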

Here I concatenate two copies of the same text feature to increase its importance.

Here is my Python code.
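As a side note, repeating the text block k times has the same effect on cosine similarity as multiplying the text block by sqrt(k), because both the dot product and the squared norms pick up the same factor k on the text part. The sketch below checks this numerically; the helper names `repeat_text` and `weight_text` and the sample values are made up for illustration.

```python
import math


def cos_sim(a, b):
    """Cosine similarity of two equal-length lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def repeat_text(text, image, k):
    """Concatenate k copies of the text block, followed by the image block."""
    return text * k + image


def weight_text(text, image, k):
    """Scale the text block by sqrt(k), then append the image block."""
    w = math.sqrt(k)
    return [w * x for x in text] + image


# Made-up features, all already in the [0, 521] range.
text1, image1 = [400.0, 10.0, 250.0], [30.0, 500.0, 120.0]
text2, image2 = [380.0, 40.0, 300.0], [450.0, 60.0, 90.0]

k = 2
duplicated = cos_sim(repeat_text(text1, image1, k), repeat_text(text2, image2, k))
weighted = cos_sim(weight_text(text1, image1, k), weight_text(text2, image2, k))

print("duplicated text block:", duplicated)
print("sqrt(k)-weighted text:", weighted)
```

The weighted form keeps the vector shorter and also allows non-integer weights, for example a text weight of 1.5, which duplication cannot express.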

from numpy import dot
from numpy.linalg import norm
from random import randint


def rand_text_feature(dimension=4):
    """Return `dimension` random integers in [0, 521], standing in for text features already rescaled to the image range."""
    res = [randint(0, 521) for _ in range(dimension)]
    return res

def rand_image_feature(dimension=4):
    """Return `dimension` random integers in [0, 521], standing in for image features."""
    res = [randint(0, 521) for _ in range(dimension)]
    return res

def cos_sim(arr1, arr2):
    """Returns Cosine similarity of two arrays."""
    return dot(arr1, arr2)/(norm(arr1)*norm(arr2))

# prepare two pairs of features
text_feature1 = rand_text_feature()
image_feature1 = rand_image_feature()

text_feature2 = rand_text_feature()
image_feature2 = rand_image_feature()



# Prints similarity of texts and images.
print('similarity of two texts')
print(cos_sim(text_feature1, text_feature2))
print('similarity of two images')
print(cos_sim(image_feature1, image_feature2))

# compute cosine similarity traditionally
feature1 = [*text_feature1, *image_feature1]
feature2 = [*text_feature2, *image_feature2]

print('similarity of concatenated feature')
print(cos_sim(feature1, feature2)) 

# compute cosine similarity regarding my proposal
enhanced_feature1 = [*text_feature1, *text_feature1, *image_feature1]
enhanced_feature2 = [*text_feature2, *text_feature2, *image_feature2]

print('similarity of concatenated feature enhancing text')
print(cos_sim(enhanced_feature1, enhanced_feature2))

This is the result.

similarity of two texts
0.8618949874358144
similarity of two images
0.598022653964154
similarity of concatenated feature
0.7335241784245647
similarity of concatenated feature enhancing text
0.7767832080432862

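If the texts are more similar to each other than the images are, my approach prints a higher similarity than plain concatenation; otherwise, it prints a lower one.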
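A related variant, shown here only as a hedged sketch and not as the answerer's method, is to L2-normalize each block separately before weighting. Under that assumption the combined cosine similarity becomes an exact weighted average of the text-only and image-only similarities, so it always lies between the two and moves toward the text similarity as the text weight grows. The function names, the `text_weight` parameter, and the sample values are hypothetical.

```python
import math


def cos_sim(a, b):
    """Cosine similarity of two equal-length lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def combine(text, image, text_weight=2.0):
    """Unit-normalize each block, then scale the text block by sqrt(text_weight)."""
    w = math.sqrt(text_weight)
    return [w * x for x in l2_normalize(text)] + l2_normalize(image)


# Made-up features on their original, very different scales.
text1, image1 = [1.2, -0.4, 0.9], [30.0, 500.0, 120.0]
text2, image2 = [1.0, -0.1, 1.1], [450.0, 60.0, 90.0]

text_sim = cos_sim(text1, text2)
image_sim = cos_sim(image1, image2)
combined = cos_sim(combine(text1, image1), combine(text2, image2))

# With unit-length blocks, the result is (w * text_sim + image_sim) / (w + 1).
expected = (2.0 * text_sim + image_sim) / 3.0

print("text similarity:    ", text_sim)
print("image similarity:   ", image_sim)
print("combined similarity:", combined)
```

Because each block is normalized on its own, no mapping between the [-1.58, 1.58] and [0, 521] ranges is needed in this variant, and the text weight directly controls the balance.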

Votes: 1

Original page content provided by Data Science. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://datascience.stackexchange.com/questions/118742

