
CAM, Grad-CAM, Grad-CAM++: Code Implementations and Comparison of CNN Visualization Methods

Source: Tencent Penguin account (Qi'e Hao) - deephub

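When working with neural networks we usually judge a model by its accuracy. In computer vision, however, the best accuracy is not enough: we also want interpretability, that is, an understanding of which features or data points contributed to a decision. Whether a model focuses on the right features matters even more than its accuracy.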

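Three common ways to understand what a CNN is doing are Class Activation Maps (CAM), Gradient-weighted Class Activation Mapping (Grad-CAM) and its improved version, Grad-CAM++. They share the same idea: take the output feature maps of the last convolutional layer and apply weights to them. The result is a heatmap showing which parts of the input image carry high weight, i.e. which regions drive the prediction for the class.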
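The shared idea can be written as one formula. The notation below ($A^k$ for the $k$-th feature map of the last convolutional layer, $w_k^c$ for the weight of that map with respect to class $c$, $L^c$ for the heatmap) is introduced here for illustration and is not taken from the original article:

```latex
L^c_{ij} = \sum_{k} w_k^c \, A^k_{ij}
```

The three methods differ only in how $w_k^c$ is obtained: CAM learns it as the weight of a linear layer trained on the pooled feature maps, Grad-CAM averages the gradients of the class score over each feature map, and Grad-CAM++ sums pixel-weighted positive gradients. Grad-CAM and Grad-CAM++ additionally pass the sum through a ReLU.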

Class Activation Maps

CAM is a method for visualizing what a CNN sees or focuses on when it produces a class output.

Passing an image through the CNN gives us low-resolution feature maps of that image.
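To get a feel for how low that resolution is, take VGG16 (the model used in the comparison at the end): its 3×3 convolutions use "same" padding and keep the spatial size, while each 2×2, stride-2 max-pooling layer halves it. The small sketch below just does that arithmetic; the helper name is ours, not part of any library.

```python
def vgg16_feature_size(input_size: int, num_pools: int) -> int:
    """Hypothetical helper: side length of a VGG16 feature map after `num_pools`
    2x2 stride-2 max-pooling layers ('same'-padded 3x3 convs keep the size)."""
    size = input_size
    for _ in range(num_pools):
        size //= 2  # each pooling layer halves height and width
    return size


# block5_conv3 (the layer used in the code below) comes after the pooling layers of blocks 1-4
side = vgg16_feature_size(224, 4)
print(side)         # 14 -> block5_conv3 outputs 14 x 14 x 512 for a 224 x 224 input
print(224 // side)  # 16 -> upsampling factor needed to lay the heatmap over the image
```

This factor of 16 is why every method below ends by interpolating the heatmap back up to the image size.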

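The idea of CAM is to remove the fully connected layers and replace them with a global average pooling (GAP) layer. The global average of a feature map is simply the mean of all its pixels, so applying GAP to every feature map yields one scalar per map.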
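A minimal NumPy sketch of what GAP does, using random values as a stand-in for real activations (in Keras the same operation is the `GlobalAveragePooling2D` layer):

```python
import numpy as np

# Stand-in for the last conv layer's output for one image: height x width x channels
rng = np.random.default_rng(0)
feature_maps = rng.random((14, 14, 512))

# Global average pooling: one scalar per feature map = mean over all spatial positions
gap = feature_maps.mean(axis=(0, 1))
print(gap.shape)  # (512,)
```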

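Each of these scalars is then multiplied by a weight that expresses how important that feature map is for a particular class. These weights are learned by training a linear model (a fully connected layer) on top of the pooled values.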

The class activation map is then the weighted combination of all the feature maps, using that class's weights.
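A small sketch of that weighted combination, with random stand-in activations and a hypothetical class-weight matrix in place of a trained layer. It also checks a property that makes CAM easy to read: because the class score is GAP followed by a linear layer (bias omitted here), the score equals the spatial average of the CAM, so each pixel's contribution to the score is directly visible.

```python
import numpy as np

rng = np.random.default_rng(0)
num_channels, num_classes = 512, 10
feature_maps = rng.random((14, 14, num_channels))              # A^k, stand-in activations
class_weights = rng.normal(size=(num_channels, num_classes))   # hypothetical Dense kernel after GAP

c = 3  # class of interest (arbitrary choice for the example)

# CAM for class c: weighted sum of the feature maps using that class's column of weights
cam = feature_maps @ class_weights[:, c]   # shape (14, 14)

# Class score as the network computes it: GAP, then the linear layer (no bias)
score = feature_maps.mean(axis=(0, 1)) @ class_weights[:, c]

# The score is exactly the mean of the CAM over all positions
print(np.isclose(score, cam.mean()))  # True
```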

import numpy as np
from tensorflow import keras
from scipy.ndimage import zoom

def generate_cam(input_model, image, layer_name='block5_conv3', H=224, W=224):
    # Assumes a CAM-style network: the output of layer_name feeds a GlobalAveragePooling2D
    # layer, and the model's last layer is the Dense classifier holding the class weights.
    cls = np.argmax(input_model.predict(image))  # Obtain the predicted class
    conv_output = input_model.get_layer(layer_name).output  # Output tensor of the last conv layer
    last_conv_layer_model = keras.Model(input_model.inputs, conv_output)  # Model: image -> feature maps
    class_weights = input_model.layers[-1].get_weights()[0]  # Dense kernel, shape (channels, num_classes)
    class_weights = class_weights[:, cls]  # Weights of the predicted class, one per feature map
    last_conv_output = last_conv_layer_model.predict(image)  # Feature maps, shape (1, h, w, channels)
    last_conv_output = last_conv_output[0]
    cam = np.dot(last_conv_output, class_weights)  # Weighted sum of the feature maps -> (h, w)
    cam = zoom(cam, (H / cam.shape[0], W / cam.shape[1]))  # Spatial interpolation to image size
    cam = cam / np.max(cam)  # Normalize the CAM
    return cam

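CAM's biggest drawback, however, is that the model must be retrained to learn the weights applied after global average pooling. A separate linear model has to be learned for each class, so there are (number of filters in the last conv layer) × (number of classes) weights. The network architecture must also be modified to produce a CAM, which is too invasive a change for existing models. Grad-CAM addresses these drawbacks.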
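The counting above can be made concrete: the retrained head is a weight matrix with one row per feature map and one column per class, and each column is one class's linear model. A sketch with illustrative sizes (512 channels as in VGG16's last block, 1000 classes as in ImageNet) and random values in place of trained weights:

```python
import numpy as np

rng = np.random.default_rng(1)
num_channels, num_classes = 512, 1000
feature_maps = rng.random((14, 14, num_channels))              # stand-in activations
class_weights = rng.normal(size=(num_channels, num_classes))   # one column = one per-class linear model

print(class_weights.size)  # 512000 weights to learn when retraining the GAP + Dense head

# A single tensordot over the channel axis yields a CAM for every class at once
all_cams = np.tensordot(feature_maps, class_weights, axes=([2], [0]))
print(all_cams.shape)  # (14, 14, 1000)
```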

Grad-CAM (Gradient Weighted Class Activation Mapping)

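The idea behind Grad-CAM is to rely on the gradients flowing into the feature maps of the last convolutional layer instead of on learned network weights. These gradients are obtained by backpropagation.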

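This solves not only the retraining problem but also the architecture-modification problem, because only gradients are used and no GAP layer has to be added.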
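A quick numerical check of how Grad-CAM relates to CAM: on a network that does end in GAP plus a linear layer, the gradient of the class score with respect to every position of feature map $k$ is $w_k^c / Z$ ($Z$ = number of spatial positions), so Grad-CAM's averaged gradients recover the CAM weights up to a constant factor. The sketch below uses tiny random stand-in values and central finite differences instead of a deep-learning framework; `score` and the sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
h, w, k = 4, 4, 3                  # tiny sizes so finite differences stay cheap
A = rng.random((h, w, k))          # stand-in feature maps
class_w = rng.normal(size=k)       # hypothetical GAP + linear head weights for one class


def score(feats):
    # Class score of a CAM-style head: global average pooling, then a linear layer
    return feats.mean(axis=(0, 1)) @ class_w


# Central finite differences for d score / d A at every position and channel
eps = 1e-6
grads = np.zeros_like(A)
for idx in np.ndindex(A.shape):
    step = np.zeros_like(A)
    step[idx] = eps
    grads[idx] = (score(A + step) - score(A - step)) / (2 * eps)

gradcam_weights = grads.mean(axis=(0, 1))  # Grad-CAM: average the gradients per channel
print(np.allclose(gradcam_weights, class_w / (h * w)))  # True: proportional to the CAM weights
```

For architectures without a GAP head, the same averaged gradients are still defined, which is what lets Grad-CAM work on an unmodified model.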

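Concretely, we compute the gradients of the top predicted class's score with respect to the feature maps of the last convolutional layer, then apply global averaging to these gradients to obtain one weight per feature map. The dot product of these weights with the feature maps of the last layer is the Grad-CAM output. Applying a ReLU to it then keeps only the parts of the image that contribute positively to the predicted class.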

Finally, the Grad-CAM map is resized to the image size and normalized so that it can be overlaid on the image.
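The steps just described, from gradients to a normalized heatmap, can be sketched in plain NumPy, assuming the activations and gradients have already been computed by a framework. In the notation used earlier the weights are $w_k^c = \frac{1}{Z}\sum_{i,j} \partial y^c / \partial A^k_{ij}$. The helper name is hypothetical, the inputs are random stand-ins, and nearest-neighbour upsampling via `np.kron` replaces the spline interpolation of `scipy.ndimage.zoom` used in the article's code.

```python
import numpy as np


def grad_cam_from_arrays(activations, grads, out_size=224):
    """Hypothetical helper: Grad-CAM heatmap from one image's conv activations and the
    gradients of the class score w.r.t. them, both shaped (h, w, channels).
    Assumes a square map whose side divides out_size."""
    weights = grads.mean(axis=(0, 1))              # global-average the gradients -> one weight per channel
    cam = np.maximum(activations @ weights, 0.0)   # weighted sum of feature maps, then ReLU
    factor = out_size // cam.shape[0]
    cam = np.kron(cam, np.ones((factor, factor)))  # nearest-neighbour upsampling to image size
    peak = cam.max()
    return cam / peak if peak > 0 else cam         # normalize to [0, 1] for overlaying


rng = np.random.default_rng(3)
heatmap = grad_cam_from_arrays(rng.random((14, 14, 512)), rng.normal(size=(14, 14, 512)))
print(heatmap.shape, heatmap.min() >= 0.0, heatmap.max() <= 1.0)  # (224, 224) True True
```

Because of the ReLU and the division by the peak, the result always lies in [0, 1], which is what makes it easy to blend with the image.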

import numpy as np
import tensorflow as tf
from tensorflow import keras
from scipy.ndimage import zoom

def grad_cam(input_model, image, layer_name='block5_conv3', H=224, W=224):
    # Model returning both the chosen conv layer's feature maps and the predictions
    grad_model = keras.Model(input_model.inputs,
                             [input_model.get_layer(layer_name).output, input_model.output])
    with tf.GradientTape() as tape:
        conv_output, predictions = grad_model(image)
        cls = tf.argmax(predictions[0])  # Get the predicted class
        y_c = predictions[:, cls]  # Class score (softmax probability for VGG16; the paper uses the pre-softmax score)
    grads = tape.gradient(y_c, conv_output)  # Gradients of the predicted class w.r.t. the conv output
    output, grads_val = conv_output[0].numpy(), grads[0].numpy()
    weights = np.mean(grads_val, axis=(0, 1))  # Mean of gradients, which acts as our weights
    cam = np.dot(output, weights)  # Grad-CAM output
    cam = np.maximum(cam, 0)  # Applying ReLU
    cam = zoom(cam, (H / cam.shape[0], W / cam.shape[1]))  # Spatial interpolation to image size
    cam = cam / (cam.max() + 1e-8)  # Normalize; epsilon avoids dividing by zero for an all-zero map
    return cam

Grad-CAM++

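Grad-CAM++ builds on Grad-CAM, but it keeps only the positive gradients of the class prediction (similar in spirit to guided backpropagation) and weights them per pixel.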

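The motivation for this refinement is that Grad-CAM has trouble localizing objects that appear multiple times in an image, and objects that occupy only a small area.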

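Grad-CAM++ therefore gives more importance to the gradient pixels that are relevant to the predicted class (the positive gradients), scaling each one by its own pixel-wise factor instead of by the constant factor that Grad-CAM applies to every position. This scaling factor is called alpha in the code.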
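For reference, the weights that alpha feeds into can be written as follows, in the same $A^k$, $w_k^c$ notation as above. Here $g^k_{ij} = \partial S^c / \partial A^k_{ij}$ is the first-order gradient of the class score $S^c$. The Grad-CAM++ paper uses $Y^c = \exp(S^c)$ and assumes the score's second and higher derivatives vanish (a piecewise-linear ReLU network), so the higher-order derivatives of $Y^c$ reduce to $\exp(S^c)$ times powers of $g$. This is what the `exp(y_c)` factors in the code below rely on; the positive factor $\exp(S^c)$ cancels in $\alpha$ and only rescales $w$.

```latex
\begin{aligned}
w_k^c &= \sum_{i,j} \alpha_{ij}^{kc}\,\mathrm{ReLU}\!\left(g^k_{ij}\right) \\
\alpha_{ij}^{kc} &= \frac{\left(g^k_{ij}\right)^2}{2\left(g^k_{ij}\right)^2 + \sum_{a,b} A^k_{ab}\left(g^k_{ij}\right)^3} \\
L^c_{ij} &= \mathrm{ReLU}\!\left(\sum_k w_k^c A^k_{ij}\right)
\end{aligned}
```

For comparison, Grad-CAM corresponds to setting $\alpha_{ij}^{kc} = 1/Z$ at every position and using $g$ without the ReLU.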

import numpy as np
import tensorflow as tf
from tensorflow import keras
from scipy.ndimage import zoom

def grad_cam_plus(input_model, image, layer_name='block5_conv3', H=224, W=224):
    grad_model = keras.Model(input_model.inputs,
                             [input_model.get_layer(layer_name).output, input_model.output])
    with tf.GradientTape() as tape:
        conv_output, predictions = grad_model(image)
        cls = tf.argmax(predictions[0])  # Predicted class
        y_c = predictions[:, cls]  # Class score
    grads = tape.gradient(y_c, conv_output)[0].numpy()  # First-order gradients, shape (h, w, channels)
    conv_output = conv_output[0].numpy()  # Feature maps, shape (h, w, channels)
    # First, second and third derivatives of exp(y_c), computed as exp(y_c) * grads**n
    # (exponential trick of Grad-CAM++, which assumes higher-order derivatives of y_c vanish)
    exp_y_c = np.exp(y_c.numpy()[0])
    conv_first_grad = exp_y_c * grads
    conv_second_grad = exp_y_c * grads ** 2
    conv_third_grad = exp_y_c * grads ** 3
    global_sum = np.sum(conv_output, axis=(0, 1))  # Sum of each feature map, shape (channels,)
    # Alpha values for each spatial location (global_sum broadcasts over the channel axis)
    alpha_num = conv_second_grad
    alpha_denom = conv_second_grad * 2.0 + conv_third_grad * global_sum
    alpha_denom = np.where(alpha_denom != 0.0, alpha_denom, 1.0)
    alphas = alpha_num / alpha_denom
    # Weights: only the positive gradients, scaled per location by the normalized alphas
    weights = np.maximum(conv_first_grad, 0.0)
    alpha_normalization_constant = np.sum(alphas, axis=(0, 1))
    alphas /= np.where(alpha_normalization_constant != 0.0, alpha_normalization_constant, 1.0)
    deep_linearization_weights = np.sum(weights * alphas, axis=(0, 1))  # One weight per channel
    grad_CAM_map = np.sum(deep_linearization_weights * conv_output, axis=2)  # Grad-CAM++ map
    cam = np.maximum(grad_CAM_map, 0)
    cam = zoom(cam, (H / cam.shape[0], W / cam.shape[1]))  # Spatial interpolation to image size
    cam = cam / (np.max(cam) + 1e-8)  # Normalize
    return cam

Results comparison

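Here we used VGG16 to compare the methods on a few images. The figure below shows how differently CAM, Grad-CAM and Grad-CAM++ see the same image. Although all three focus mainly on the subject's upper body, Grad-CAM++ treats the subject as a whole as the important part, CAM treats some parts as highly important features and others as secondary support for its prediction, and Grad-CAM focuses only on the crown and wings as the features that matter for the decision.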

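For the kite image, CAM shows the model attending to everything except the kite (that is, the sky), whereas Grad-CAM shows it attending to the kite, and Grad-CAM++ reinforces this further by enlarging the highlighted important region. Note that the model misclassified this image as a parachute, with the kite class a close second. In other words, CAM actually captured the reason for the error better.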

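Published: 2023-06-08 08:57:49
Original link: https://kuaibao.qq.com/s/20230608A01LB100?refer=cp_1026
Tencent Cloud Developer Community is one of the distribution channels for Tencent Content Open Platform accounts (Penguin accounts) and reposts this content under the Tencent Content Open Platform Service Agreement.
In case of infringement, please contact cloudcommunity@tencent.com for removal.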
