Why does the tf.contrib.layers.instance_norm layer contain a StopGradient op? In other words, why is it needed?

A StopGradient also appears in the simpler op tf.nn.moments (which can serve as a building block of tf.contrib.layers.instance_norm):
x_m, x_v = tf.nn.moments(x, [1, 2], keep_dims=True)

I also found a comment about StopGradient in the tf.nn.moments source code:
# The dynamic range of fp16 is too limited to support the collection of
# sufficient statistics. As a workaround we simply perform the operations
# on 32-bit floats before converting the mean and variance back to fp16
y = math_ops.cast(x, dtypes.float32) if x.dtype == dtypes.float16 else x
# Compute true mean while keeping the dims for proper broadcasting.
mean = math_ops.reduce_mean(y, axes, keepdims=True, name="mean")
# sample variance, not unbiased variance
# Note: stop_gradient does not change the gradient that gets
# backpropagated to the mean from the variance calculation,
# because that gradient is zero
variance = math_ops.reduce_mean(
    math_ops.squared_difference(y, array_ops.stop_gradient(mean)),
    axes,
    keepdims=True,
    name="variance")

So is this just an optimization, since that gradient is always zero?
Posted on 2020-11-11 08:35:30
An attempt at an answer.
This design tells us that when minimizing the second moment, we do not want to propagate the gradient through the first moment. Does that make sense? If we try to minimize E[x^2] - E[x]^2, we would be minimizing E[x^2] while maximizing E[x]^2. The first term reduces the absolute value of each element (pulling them toward the center). The second term's gradient would increase all values uniformly, which does not help minimize the variance but could negatively affect other gradient paths.
So we do not propagate the second moment's gradient through the first moment, because that gradient does not affect the second moment, at least when using plain SGD.
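The shift argument can be checked directly: the gradient component flowing through the E[x]^2 term is the same for every element, i.e. a uniform shift of the whole tensor, and a uniform shift leaves the variance unchanged. A minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=6)

# The decomposition var(x) = E[x^2] - E[x]^2 holds numerically.
var_decomposed = np.mean(x ** 2) - np.mean(x) ** 2
print(np.isclose(var_decomposed, np.var(x)))  # True

# The gradient of the -E[x]^2 term is the same constant for every
# element, a uniform shift of the whole tensor. Shifting all
# elements by the same constant does not change the variance, so
# this gradient path cannot help minimize it.
shift = 0.3
print(np.isclose(np.var(x + shift), np.var(x)))  # True
```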
https://stackoverflow.com/questions/64776769