
Dense Associative Memory Is Robust to Adversarial Inputs

https://github.com/DimaKrotov/Dense_Associative_Memory/blob/master/Dense_Associative_Memory_training.ipynb

Abstract

Deep neural networks (DNNs) trained in a supervised way suffer from two known problems. First, the minima of the objective function used in learning correspond to data points (also known as rubbish examples or fooling images) that lack semantic similarity with the training data. Second, a clean input can be changed by a small perturbation, often imperceptible to human vision, so that the resulting deformed input is misclassified by the network. These findings emphasize the differences between the ways DNNs and humans classify patterns and raise the question of designing learning algorithms that mimic human perception more accurately than existing methods do.

Our article examines these questions within the framework of dense associative memory (DAM) models. These models are defined by an energy function with higher-order (higher than quadratic) interactions between the neurons. We show that in the limit when the power of the interaction vertex in the energy function is sufficiently large, these models have the following three properties. First, the minima of the objective function are free from rubbish images, so that each minimum is a semantically meaningful pattern. Second, artificial patterns poised precisely on the decision boundary look ambiguous to human subjects and share aspects of both classes that are separated by that decision boundary. Third, adversarial images constructed by models with a small power of the interaction vertex, which are equivalent to DNNs with rectified linear units, fail to transfer to and fool the models with higher-order interactions. This opens up the possibility of using higher-order models for detecting and stopping malicious adversarial attacks. Our results suggest that DAMs with higher-order energy functions are more robust to adversarial and rubbish inputs than DNNs with rectified linear units.
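For intuition, the higher-order energy described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the rectified-polynomial interaction F(x) = max(x, 0)^n, one of the forms considered by Krotov and Hopfield (2016); the memory matrix `xi` and the power `n` below are hand-picked toy values, not trained parameters.

```python
import numpy as np

def dam_energy(sigma, xi, n):
    """Energy of a dense associative memory:
    E(sigma) = -sum_mu F(xi_mu . sigma), with F(x) = max(x, 0)**n.
    Larger n strengthens the higher-order interactions between neurons."""
    overlaps = xi @ sigma                        # one overlap per stored memory
    return -np.sum(np.maximum(overlaps, 0.0) ** n)

# Two hand-picked +/-1 memories over 6 binary neurons (toy values).
xi = np.array([[1, 1, 1, -1, -1, -1],
               [1, -1, 1, -1, 1, -1]], dtype=float)

sigma_stored = xi[0].copy()          # a stored pattern ...
sigma_noisy = xi[0].copy()
sigma_noisy[-1] *= -1                # ... and the same pattern with one bit flipped

# The stored pattern sits deeper in the energy landscape than its corruption.
print(dam_energy(sigma_stored, xi, n=4))   # -> -1312.0
print(dam_energy(sigma_noisy, xi, n=4))    # -> -256.0
```

With n = 4, the clean memory's overlap of 6 contributes 6^4 = 1296 to the (negated) energy, while one flipped bit drops the overlap to 4 and the contribution to 256, so stored patterns are sharply preferred.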

1  Introduction

In a recent paper Krotov and Hopfield (2016) proposed that dense associative memory (DAM) models with higher-order interactions in the energy function learn representations of the data, which strongly depend on the power of the interaction vertex. The network extracts features from the data for small values of this power, but as the power of the interaction vertex is increased, there is a gradual shift to a prototype-based representation, the two extreme regimes of pattern recognition known in cognitive psychology. Remarkably, there is a broad range of powers of the energy function, for which the representation of the data is already in the prototype regime, but the accuracy of classification is still competitive with the best available algorithms (based on DNN with rectified linear units, ReLUs). This suggests that the DAM models might behave very differently compared to the standard methods used in deep learning with respect to adversarial deformations.

In this article, we report three main results. First, using gradient descent in the pixel space, a set of “rubbish” images is constructed that correspond to the minima of the objective function used in training. This is done on the MNIST data set of handwritten digits using different values of the power of the interaction vertex, which is denoted by n. For small values of the power n, these images indeed look like speckled rubbish noise and do not have any semantic content for human vision, a result consistent with Nguyen et al. (2015). However, as the power of the interaction vertex is increased, the images gradually become less speckled and more semantically meaningful. In the limit of very large n ≈ 20, …, 30, these images are no longer rubbish at all. They represent plausible images of handwritten digits that could possibly have been produced by a human. Second, starting from clean images from the data set, a set of adversarial images is constructed in such a way that each image is placed exactly on the decision boundary between two label classes. For small powers n, these images look very similar to the initial clean image with a little bit of speckled noise added, but they are misclassified by the neural network, a result consistent with Szegedy et al. (2013). However, as the power of the interaction vertex is increased, these adversarial images become less and less similar to the initial clean image. In the limit of very large powers, these adversarial images look either like a morphed image of two digits (the initial clean image and another digit from the class that the deformation targets) or like the initial digit superimposed on a ghost image from the target class. Either way, the interpretation of the artificial patterns generated by the neural net on the decision boundary requires the presence of another digit from the target class in addition to the initial seed from the data set and cannot be explained by simply adding noise to the initial clean image. Third, adversarial and rubbish images generated by models with small n can be transferred to and fool another model with a small (but possibly different) value of n. However, they fail to transfer to models with large n. Thus, rubbish and adversarial images generated by models with small n cannot fool models with large n. In contrast, the “rubbish” images generated by models with large n can be transferred to models with small n, but this is not a problem, since those “rubbish” images are actually not rubbish at all and look like credible handwritten digits. These results suggest that DAMs with a large power of the interaction vertex in the energy function better mimic the psychology of human visual perception than DAMs with a small power (at least on the simple MNIST data set). The latter are equivalent to DNNs with ReLUs (Krotov & Hopfield, 2016).
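The pixel-space gradient descent used to construct such images can be sketched on a toy model. This is illustrative only: it attacks a tiny linear softmax classifier with random toy weights, not the paper's DAM objective, but the mechanics (descend the classification loss with respect to the pixels, clipped to the valid pixel range) are the same.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def adversarial_step(x, W, target, lr=0.01):
    """One gradient step in pixel space that increases the probability of
    the target class: x <- clip(x - lr * d(loss)/dx) for cross-entropy loss."""
    p = softmax(W @ x)
    onehot = np.zeros_like(p)
    onehot[target] = 1.0
    grad_x = W.T @ (p - onehot)        # d(cross-entropy)/dx for a linear model
    return np.clip(x - lr * grad_x, 0.0, 1.0)   # keep pixels in [0, 1]

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 16))           # toy model: 3 classes, 16 "pixels"
x0 = rng.uniform(size=16)              # a clean input
target = 2

x = x0.copy()
for _ in range(300):
    x = adversarial_step(x, W, target)

p0, p1 = softmax(W @ x0), softmax(W @ x)
print(bool(p1[target] > p0[target]))   # -> True: the target class gained probability
```

For a DAM, the same loop would descend the model's training objective instead, which is what makes the resulting minima (the “rubbish” images) interpretable probes of what the network has learned.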

7  Discussion and Conclusion

Although modern machine learning techniques outperform humans on many classification tasks, there is a serious concern that they do not understand the structure of the training data. A clear demonstration of this lack of understanding was presented in Szegedy et al. (2013) and Nguyen et al. (2015), which showed two examples of nonsensical predictions of DNNs that contradict human visual perception: adversarial images and rubbish images. In this article, we propose that DAMs with higher-order interactions in the energy function produce more sensible interpretations (consistent with human vision) of adversarial and rubbish images on MNIST. We argue that these models better mimic human visual perception than DNNs with ReLUs.

A possible explanation of adversarial examples, attributing them to neural networks being too linear, was given in Goodfellow et al. (2014). Our explanation follows the same line of thought, with some differences. One result of Krotov and Hopfield (2016) is that DAMs with large powers of the interaction vertex in the energy function are dual to feedforward neural nets with highly nonlinear activation functions: the rectified polynomials of higher degrees. From the perspective of this duality, one might expect that simply replacing the ReLUs in DNNs with higher rectified polynomials would solve the problem of adversarial and rubbish images for a sufficiently large power of the activation function. We tried that and discovered that although DNNs with higher rectified polynomials alone perform better than DNNs with ReLUs from the adversarial perspective, they are worse than DAMs trained with the update rule, equation 2.1. These observations need further comprehensive investigation. Thus, simply changing ReLUs to higher rectified polynomials is not enough to get rid of adversarial problems; other aspects of the training algorithm presented in section 2 are also important.
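The rectified polynomial activation underlying this duality is a one-line generalization of the ReLU: degree n = 1 recovers the ordinary ReLU, while higher degrees make a unit respond far more sharply to strongly matching inputs. A minimal sketch:

```python
import numpy as np

def rectified_poly(x, n):
    """Rectified polynomial of degree n: max(x, 0)**n.
    n = 1 is the ordinary ReLU; larger n suppresses weak activations
    and amplifies strong ones, sharpening the unit's response."""
    return np.maximum(x, 0.0) ** n

x = np.array([-2.0, -0.5, 0.5, 2.0])
print(rectified_poly(x, 1))   # ReLU: negatives clipped to zero
print(rectified_poly(x, 3))   # cubic: 0.5 shrinks to 0.125, 2.0 grows to 8.0
```

The shrink/amplify asymmetry around 1 is what pushes the network from feature-matching toward prototype-matching as n grows.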

For all values of n studied in this article, the classification decision is made based on the votes of both feature and prototype detectors. These votes are “undemocratic,” so that the larger n is, the stronger the prototype votes are. In the extreme limit n → ∞, DAM reduces to the nearest-neighbor classifier. In this regime, DAMs are somewhat similar to RBF networks, in which the hidden units compute a Gaussian weight around a set of memorized templates. RBF networks are resistant to adversarial examples (Goodfellow et al., 2014) but are poor at generalization, as are DAMs at n → ∞. In a sense, the contribution of this study is the identification of a strength of nonlinearity of the neural networks (power n) for which the generalization ability is still strong but the adversarial problems are already sufficiently mitigated. In all four models that we have discussed, the power n is chosen in such a way that features cooperate with prototypes to give good generalization performance. Thus, it is important to keep n large but finite, so that DAMs are not in the pure RBF regime.
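The n → ∞ limit can be seen numerically. When each memory's vote is weighted by its overlap with the query raised to the power n, the best-matching memory dominates all others as n grows, which is exactly nearest-neighbor behavior. A toy sketch (the overlap values below are made up for illustration):

```python
import numpy as np

def vote_weights(overlaps, n):
    """Relative vote of each stored memory when (non-negative)
    overlaps with the query are raised to the power n."""
    w = np.maximum(overlaps, 0.0) ** n
    return w / w.sum()

# A query's overlaps with three stored memories (toy values).
overlaps = np.array([0.9, 0.8, 0.5])

for n in (1, 5, 30):
    # As n grows, the weight concentrates on the best match (index 0).
    print(n, np.round(vote_weights(overlaps, n), 3))
```

At n = 1 the votes are nearly proportional to the overlaps, while by n = 30 the first memory carries essentially all the weight: the "undemocratic" voting described above.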

There are two straightforward ideas for possible extensions of this work. First, it would be interesting to complement the proposed training procedure with adversarial training: in other words, to train the large-n networks using the algorithm of section 2 but on a combination of clean images and adversarial images, along the lines of Goodfellow et al. (2014) and Nøkland (2015). We expect that this should further increase the robustness to adversarial examples and increase the classification accuracy on the clean images. Second, it would be interesting to investigate the proposed methods in the convolutional setting. Naively, one expects that the adversarial problems are more severe in fully connected networks than in convolutional networks. For this reason, we used fully connected networks for our experiments. We expect that the training algorithm of section 2 can be combined with convolutional layers to better describe images.

Although more work is required to fully resolve the problem of adversarial and rubbish images, we believe that this article has identified a promising computational regime that significantly mitigates the vulnerability of neural networks to adversarial and rubbish images and that remains little investigated.

References:

https://arxiv.org/pdf/1701.00939.pdf


This article is shared from the WeChat public account CreateAMind (createamind).


Originally published: 2019-05-31.
