A Survey of Zero-Shot Learning: Settings, Methods, and Applications - Reading Notes

百川AI

Published 2022-01-04 08:22:44



Included in the column: 我还不懂对话 (I Still Don't Understand Dialogue)


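Contents of the paper:

An overview of zero-shot learning, divided into three categories according to the learning setting;
The different semantic spaces used in zero-shot learning;
A taxonomy of existing zero-shot learning methods, with representative methods introduced for each category;
A discussion of the different applications of zero-shot learning;
Future research directions for zero-shot learning.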

INTRODUCTION

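Application scenarios for zero-shot learning:

The number of target classes is very large, as is common in computer vision (CV).
The target classes are rare.
The target classes change frequently; for example, a company's product images change style with the times and with fashion trends.
For some tasks, labeled data is very expensive to obtain.

These needs gave rise to ZSL.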

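Definition of zero-shot learning:

Given labeled training data $D^{tr}$ belonging to the seen classes $\mathcal{S}$, zero-shot learning aims to learn a classifier $f^{u}(\cdot): X \rightarrow \mathcal{U}$ that predicts the labels $Y^{te} \in \mathcal{U}$ of the testing instances $X^{te}$, where $\mathcal{U}$ denotes the set of unseen classes (disjoint from $\mathcal{S}$).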
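As a minimal sketch of the conditions in this definition, the check below assumes classes are plain strings and uses made-up class names; `check_zsl_split` is a hypothetical helper for illustration, not something from the survey.

```python
def check_zsl_split(train_labels, test_labels, seen, unseen):
    """Check the basic (non-generalized) zero-shot conditions.

    - the seen and unseen class sets are disjoint;
    - every training label is a seen class;
    - every test label is an unseen class.
    """
    seen, unseen = set(seen), set(unseen)
    if seen & unseen:
        return False
    return set(train_labels) <= seen and set(test_labels) <= unseen


# Hypothetical toy split: train on horses and tigers, test on zebras.
seen_classes = {"horse", "tiger"}
unseen_classes = {"zebra"}
train_y = ["horse", "tiger", "horse"]
test_y = ["zebra", "zebra"]

print(check_zsl_split(train_y, test_y, seen_classes, unseen_classes))    # True
print(check_zsl_split(["zebra"], test_y, seen_classes, unseen_classes))  # False
```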

Categories of existing ZSL:

[Figure: categories of existing zero-shot learning (image unavailable)]

Semantic space

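In the semantic space, each class has a corresponding representation. The semantic space is generally a vector space, and each class is represented by its corresponding vector.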

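Let $T$ denote the semantic space. $t_i^s \in T$ is the prototype of the seen class $c_i^s$, and $t_i^u \in T$ is the prototype of the unseen class $c_i^u$. $T^s = \{t_i^s\}_{i=1}^{N_s}$ denotes the set of seen class prototypes in the semantic space, and $T^u = \{t_i^u\}_{i=1}^{N_u}$ the set of unseen class prototypes. $\pi(\cdot): \mathcal{S} \cup \mathcal{U} \rightarrow T$ denotes the mapping from classes to the semantic space. In ZSL, both $T^s$ and $T^u$ are involved in obtaining the classifier $f^{u}(\cdot)$.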
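A toy illustration of this notation, assuming a made-up 3-dimensional semantic space: the dictionary `pi` plays the role of $\pi(\cdot)$, and `T_s`, `T_u` collect the seen and unseen prototypes. All class names and vector values are hypothetical.

```python
# Hypothetical toy classes; S and U are disjoint.
SEEN = ["horse", "tiger"]   # S, so N_s = 2
UNSEEN = ["zebra"]          # U, so N_u = 1

# pi: S ∪ U -> T, a lookup from each class to its prototype
# in a made-up 3-dimensional semantic space T.
pi = {
    "horse": (1.0, 0.0, 1.0),
    "tiger": (0.0, 1.0, 1.0),
    "zebra": (1.0, 1.0, 1.0),
}

# T^s = {t_i^s}, i = 1..N_s and T^u = {t_i^u}, i = 1..N_u
T_s = [pi[c] for c in SEEN]
T_u = [pi[c] for c in UNSEEN]

assert set(pi) == set(SEEN) | set(UNSEEN)  # pi is defined on S ∪ U
print(T_s, T_u)
```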

Learning settings

Class-Inductive Instance-Inductive (CIII) Setting

Only labeled training instances $D^{tr}$ and seen class prototypes $T^s$ are used in model learning.

Class-Transductive Instance-Inductive (CTII) Setting

Labeled training instances $D^{tr}$, seen class prototypes $T^s$ and unseen class prototypes $T^u$ are used in model learning.

Class-Transductive Instance-Transductive (CTIT) Setting

Labeled training instances $D^{tr}$, seen class prototypes $T^s$, unlabeled testing instances $X^{te}$ and unseen class prototypes $T^u$ are used in model learning.

Notation used:

[Figure: table of notation (image unavailable)]

SEMANTIC SPACES

[Figure: categorization of semantic spaces (image unavailable)]

Engineered Semantic Spaces

Engineered semantic spaces: each dimension of the semantic space is designed by humans.

Attribute spaces

Attribute spaces are semantic spaces constructed from a set of attributes: the semantic space is represented by a set of attributes, with each dimension corresponding to one attribute.
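A minimal sketch of an attribute space, assuming four made-up binary attributes: each class prototype is a 0/1 vector with one dimension per attribute. The attribute names, classes and the helper `attribute_prototype` are illustrative only.

```python
# Hypothetical attribute vocabulary; each attribute is one dimension.
ATTRIBUTES = ["has_stripes", "has_hooves", "is_carnivore", "lives_in_water"]


def attribute_prototype(present):
    """Encode a class as a binary vector over ATTRIBUTES (1 = attribute present)."""
    unknown = set(present) - set(ATTRIBUTES)
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    return [1 if a in present else 0 for a in ATTRIBUTES]


horse = attribute_prototype({"has_hooves"})                   # seen
tiger = attribute_prototype({"has_stripes", "is_carnivore"})  # seen
zebra = attribute_prototype({"has_stripes", "has_hooves"})    # unseen
print(horse, tiger, zebra)
```

In this toy example every attribute of the unseen class `zebra` also occurs in at least one seen class, which is the intuition behind transferring attribute knowledge from seen to unseen classes.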

Lexical spaces

Lexical spaces are semantic spaces constructed from a set of lexical items.

Text-keyword spaces

Text-keyword spaces are constructed from a set of keywords extracted from the text description of each class: keywords are extracted, and each keyword is used as one dimension of the constructed space.
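A sketch of building a text-keyword space, assuming naive keyword extraction (lowercased word tokens minus a tiny stop-word list) and keyword counts as coordinates; real systems would use a proper keyword extractor. The descriptions and helper names are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "the", "and", "with", "is", "of", "in"}


def keywords(text):
    """Naive keyword extraction: lowercase word tokens minus stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def build_keyword_space(descriptions):
    """Return (vocabulary, {class: count vector}); each keyword is one dimension."""
    per_class = {c: Counter(keywords(t)) for c, t in descriptions.items()}
    vocab = sorted(set().union(*per_class.values()))
    vectors = {c: [cnt[w] for w in vocab] for c, cnt in per_class.items()}
    return vocab, vectors


descriptions = {
    "zebra": "A horse-like animal with black and white stripes.",
    "tiger": "A large cat with orange fur and black stripes.",
}
vocab, vectors = build_keyword_space(descriptions)
print(vocab)
print(vectors)
```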

Some problem-specific spaces

Some engineered semantic spaces are designed specifically for certain problems.

Learned Semantic Spaces

Learned semantic spaces are obtained from the outputs of machine learning models.

Label-embedding spaces

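These spaces are built from embeddings of the class labels. Class labels consist of words or phrases, so word embedding methods can convert them into vectors in a corresponding vector space.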

There are many embedding methods, including:

word2vec
GloVe

Different corpora also yield different embeddings, e.g., a general-purpose corpus such as Wikipedia or a domain-specific corpus such as Flickr.

A single class can also be given multiple semantic vectors.
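Labels that are phrases still need one vector per label. A common simple choice, assumed here, is to average the vectors of the words in the label. The two tiny embedding tables below are made up and only stand in for embeddings trained on different corpora (for example word2vec or GloVe vectors); using both gives one class several prototypes.

```python
# Hypothetical 2-d word vectors standing in for embeddings from two corpora.
EMB_CORPUS_A = {"polar": [0.9, 0.1], "bear": [0.2, 0.8], "zebra": [0.5, 0.5]}
EMB_CORPUS_B = {"polar": [0.7, 0.3], "bear": [0.1, 0.9], "zebra": [0.6, 0.4]}


def label_embedding(label, table):
    """Embed a (possibly multi-word) class label as the mean of its word vectors."""
    words = label.lower().split()
    vecs = [table[w] for w in words]  # raises KeyError for out-of-vocabulary words
    dim = len(vecs[0])
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]


# One class, several prototypes: one per embedding table (corpus).
prototypes = {
    name: label_embedding("polar bear", table)
    for name, table in [("corpus_a", EMB_CORPUS_A), ("corpus_b", EMB_CORPUS_B)]
}
print(prototypes)
```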

Text-embedding spaces

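These spaces are built by embedding the text description of each class: the class's description text is fed into a model, and the output vector serves as the class representation.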

Image-representation spaces

The class prototypes are obtained from images belonging to each class: the images of a class are fed into a pretrained model (e.g., GoogLeNet), and the output vector is used as the class representation.
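A sketch of image-based prototypes, assuming the per-image feature vectors have already been extracted by a pretrained network and that each class prototype is the mean of its images' features (averaging is one common choice, assumed here). The feature values are made up.

```python
from collections import defaultdict


def class_prototypes(features, labels):
    """Average the feature vectors of each class's images into one prototype.

    `features` are assumed to come from a pretrained network (e.g. GoogLeNet);
    here they are plain lists of floats.
    """
    sums, counts = {}, defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


feats = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0]]
labels = ["zebra", "zebra", "tiger"]
protos = class_prototypes(feats, labels)
print(protos)  # {'zebra': [2.0, 1.0], 'tiger': [0.0, 4.0]}
```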

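Summary of learned semantic spaces. Advantages: 1) they reduce human effort; 2) they can capture information that humans easily overlook. Disadvantage: semantic spaces produced by machine learning models are black boxes, so it is hard to incorporate domain knowledge into them.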

METHODS

Classifier-Based Methods

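Classifier-based methods use a one-vs-rest strategy to learn the zero-shot classifier. For each unseen class $c_i^u \in \mathcal{U}$, a one-vs-rest binary classifier $f_i^u(\cdot): \mathbb{R}^D \rightarrow \{0,1\}$ is learned to decide whether an instance belongs to class $c_i^u$. The final zero-shot classifier $f^u(\cdot)$ consists of these $N_u$ binary classifiers: $\left\{f_{i}^{u}(\cdot) \mid i=1, \ldots, N_{u}\right\}$.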
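A minimal sketch of how per-class binary classifiers can form $f^u(\cdot)$: each classifier here is a hypothetical linear scorer whose positive score means "belongs to this class", and the combined prediction takes the class with the highest score (a common way to handle several or no positive answers, assumed here). The weights are made up; how they are obtained is what the method families below differ on.

```python
def make_binary_classifier(w, b=0.0):
    """Hypothetical linear one-vs-rest scorer: score > 0 means 'belongs to class'."""
    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return score


def zero_shot_predict(x, classifiers):
    """f^u(x): the unseen class whose binary classifier scores x highest."""
    return max(classifiers, key=lambda c: classifiers[c](x))


# One binary classifier per unseen class (made-up weights).
classifiers = {
    "zebra": make_binary_classifier([1.0, -1.0]),
    "okapi": make_binary_classifier([-1.0, 1.0]),
}

x = [2.0, 0.5]
decisions = {c: int(f(x) > 0) for c, f in classifiers.items()}  # {0,1} outputs
print(decisions, zero_shot_predict(x, classifiers))
```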

Correspondence methods

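A prototype in the semantic space is one representation of a class, and the one-vs-rest binary classifier of that class is another. Correspondence methods aim to learn a correspondence function between these two representations (as I understand it, a mapping from one to the other).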
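One simple way to realize a correspondence function, assumed here for illustration and not claimed to be any specific method from the survey, is a linear map from prototypes to the weight vectors of linear one-vs-rest classifiers: fit it by ridge regression on the seen classes, then apply it to unseen prototypes. The prototypes and seen-class weights are made up so that the true map is diag(2, 3); all helper names are hypothetical.

```python
def transpose(A):
    return [list(r) for r in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def solve(M, B):
    """Solve M X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * c for a, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_correspondence(T_s, W_s, lam=1e-6):
    """Ridge regression A = (T_s^T T_s + lam*I)^-1 T_s^T W_s, so that W ≈ T A."""
    Tt = transpose(T_s)
    gram = matmul(Tt, T_s)
    for i in range(len(gram)):
        gram[i][i] += lam
    return solve(gram, matmul(Tt, W_s))


# Seen-class prototypes (rows) and their already-trained classifier weights (rows).
T_s = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_s = [[2.0, 0.0], [0.0, 3.0], [2.0, 3.0]]
A = fit_correspondence(T_s, W_s)

# Unseen class: map its prototype to classifier weights.
T_u = [[0.5, 1.0]]
W_u = matmul(T_u, A)
print(A, W_u)
```

With this toy data `A` comes out very close to [[2, 0], [0, 3]] and `W_u` very close to [[1.0, 3.0]]; the small `lam` only keeps the normal equations well conditioned.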

Relationship methods

Combination methods

Future Directions

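Exploiting the characteristics of the input data

Sensor-based activity recognition can exploit the temporal characteristics of the data;
Object classification can exploit part information;
Video-related problems can exploit multimodal information.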

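Selection of training data

Heterogeneous training and testing data: 1) different semantic types, e.g., the training data are object images while the testing data are scene images; 2) different data types, e.g., the training data are images plus videos while the testing data are videos.
Dynamic selection of training data: assumption 1, the seen classes can be selected dynamically; assumption 2, training instances can be labeled dynamically.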

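Selection and preservation of auxiliary information

The goal of selection is to find more helpful auxiliary information. Current auxiliary information is inspired by the human visual recognition system, but other kinds of information could also serve as auxiliary information, for example human-defined similarity information or learned attributes.

Preservation is needed because learning only the classifier is likely to discard part of the semantic information. For example, some methods use a reconstruction model to preserve more semantic information.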

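More realistic and application-specific problem settings

For example, generalized zero-shot learning requires recognizing both seen and unseen classes at the same time. More realistic and more task-specific settings will be explored. For instance, in some applications the number of classes to recognize is very large, which leads to the large-scale setting. In other cases, training instances and semantic information become available online; some online incremental learning methods learn new attributes and adapt to them in an online manner. Depending on the application, more scenario-specific zero-shot learning problems will be explored.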