FCN Rewrite Notes

平凡的学生族

Published 2019-05-25 09:59:00

Included in the column: Backend Technology

These are only the notes I took during the rewrite; for the write-up I published formally, see the related reading.

tf.squeeze and tf.expand_dims

Reference: the tf.expand_dims and tf.squeeze functions

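Also, when squeeze_dims is specified (the older, deprecated name of tf.squeeze's axis argument), every listed dimension must have size 1. Otherwise tf.squeeze raises an error.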
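You don't need TensorFlow to see the shape rule. As an assumption for illustration, the sketch below uses NumPy's np.expand_dims and np.squeeze, which behave the same way as the TensorFlow functions for this purpose. The sketch adds a trailing channel axis to a hypothetical batch of annotation maps, removes it again by naming the axis, and shows that squeezing an axis whose size is not 1 is an error.

```python
import numpy as np

# A hypothetical batch of 2 annotation maps of size 4x4 (one label per pixel)
annotation = np.zeros((2, 4, 4), dtype=np.int32)

# Add a trailing channel axis: (2, 4, 4) -> (2, 4, 4, 1)
with_channel = np.expand_dims(annotation, axis=-1)

# Remove it again by naming the axis (in TF1 style: tf.squeeze(x, squeeze_dims=[3]))
squeezed = np.squeeze(with_channel, axis=3)

# Naming an axis whose size is not 1 raises instead of silently doing nothing
try:
    np.squeeze(with_channel, axis=1)  # axis 1 has size 4
    squeeze_failed = False
except ValueError:
    squeeze_failed = True

print(with_channel.shape, squeezed.shape, squeeze_failed)
```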

glob

The glob module
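Below is a minimal sketch of how glob is typically used to collect image paths, for example in a dataset reader. The directory layout and file names are hypothetical and are created in a temporary directory. glob returns paths in arbitrary order, so the results are sorted before use.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Create a few empty, hypothetical files to search
    for name in ["a.jpg", "b.jpg", "c.png", os.path.join("sub", "d.jpg")]:
        path = os.path.join(root, name)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()

    # '*' matches within a single directory level only
    top_jpgs = sorted(os.path.basename(p)
                      for p in glob.glob(os.path.join(root, "*.jpg")))
    # '**' together with recursive=True also descends into subdirectories
    all_jpgs = sorted(os.path.basename(p)
                      for p in glob.glob(os.path.join(root, "**", "*.jpg"), recursive=True))

print(top_jpgs)  # ['a.jpg', 'b.jpg']
print(all_jpgs)  # ['a.jpg', 'b.jpg', 'd.jpg']
```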

collections.namedtuple

Loosely speaking, it is equivalent to a class that has only attributes.

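Example:

import collections

Examples = collections.namedtuple("Examples", "paths, inputs, targets, count, steps_per_epoch")

# paths_batch, inputs_batch, targets_batch, input_paths and steps_per_epoch
# are defined earlier in the original code
examples = Examples(
    paths=paths_batch,
    inputs=inputs_batch,
    targets=targets_batch,
    count=len(input_paths),
    steps_per_epoch=steps_per_epoch,
)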
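The "class with only attributes" analogy can be checked directly. In the sketch below the field values are hypothetical placeholders. Fields can be read by name or by position, instances are immutable like tuples, and "changing" a field means building a new instance with _replace.

```python
import collections

Examples = collections.namedtuple("Examples", "paths, inputs, targets, count, steps_per_epoch")

# Hypothetical values, only to show the behaviour
ex = Examples(paths=["x.jpg"], inputs=None, targets=None, count=1, steps_per_epoch=10)

print(ex.count, ex[3])   # same field, by name and by position
print(ex._fields)        # the field names, in declaration order

try:
    ex.count = 2         # assignment fails: instances are immutable
    is_immutable = False
except AttributeError:
    is_immutable = True

ex2 = ex._replace(count=2)  # returns a new instance; ex itself is unchanged
print(ex2.count, ex.count)  # 2 1
```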

scipy.misc.imread

scipy.misc.imread official documentation

scipy.misc.imresize

scipy.misc.imresize official documentation

I don't yet know what interp='nearest' does; to be filled in later.

resize_image = misc.imresize(image, [resize_size, resize_size], interp='nearest')
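For reference, interp='nearest' selects nearest-neighbour interpolation. Each output pixel copies the value of the closest input pixel, so no new values are blended in. For an exact integer upscaling factor, and assuming pixel-centre alignment, this amounts to repeating each pixel. The NumPy sketch below is an illustration of that idea, not SciPy's implementation.

```python
import numpy as np

def nearest_upscale(image, factor):
    """Nearest-neighbour upscaling by an integer factor along height and width.

    Each input pixel becomes a factor x factor block; channel values are copied unchanged.
    """
    return np.repeat(np.repeat(image, factor, axis=0), factor, axis=1)

image = np.array([[10, 20],
                  [30, 40]], dtype=np.uint8)
print(nearest_upscale(image, 2))
```

With factor 2, every value in the 2x2 input appears as a 2x2 block in the 4x4 result. An (h, w, 3) image works the same way, because only the first two axes are repeated.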

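It also changes the image values: they are linearly rescaled to the 0-255 range, so the minimum becomes 0 and the maximum becomes 255.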

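import numpy as np
from scipy import misc  # misc.imresize was removed in SciPy 1.3; this needs an older SciPy plus Pillow

arr = np.array([[[100, 2, 220], [3, 4, 5]], [[1, 2, 3], [3, 4, 5]]])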

print(type(arr))
print(arr.shape)

resize_size = 4
arr = misc.imresize(arr, [resize_size, resize_size], interp='nearest')
print(type(arr))
print(arr.shape)
print(arr)

Output

<class 'numpy.ndarray'>
(2, 2, 3)
<class 'numpy.ndarray'>
(4, 4, 3)
[[[115   1 255]
  [115   1 255]
  [  2   3   5]
  [  2   3   5]]

 [[115   1 255]
  [115   1 255]
  [  2   3   5]
  [  2   3   5]]

 [[  0   1   2]
  [  0   1   2]
  [  2   3   5]
  [  2   3   5]]

 [[  0   1   2]
  [  0   1   2]
  [  2   3   5]
  [  2   3   5]]]

As the output shows, the matrix values have been changed.
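The changed values can be reproduced by hand. My understanding, which is an assumption based on the SciPy 1.x source and not something the notes above establish, is that imresize first converts the array to an 8-bit image via bytescale. bytescale maps the minimum (1) to 0 and the maximum (220) to 255 linearly. For example, 100 becomes round((100 - 1) × 255 / 219) = round(115.27) = 115. The NumPy re-implementation below applies that mapping and then repeats each pixel 2x2 for interp='nearest'. Together these reproduce the 4x4x3 output shown above.

```python
import numpy as np

def bytescale_like(arr):
    """Map arr.min() -> 0 and arr.max() -> 255 linearly, then round to uint8.

    Illustrative re-implementation modelled on scipy.misc.bytescale.
    """
    data = np.asarray(arr, dtype=np.float64)
    cmin, cmax = data.min(), data.max()
    span = cmax - cmin if cmax > cmin else 1.0  # avoid dividing by zero for constant arrays
    scaled = (data - cmin) * (255.0 / span)
    return (scaled.clip(0, 255) + 0.5).astype(np.uint8)  # +0.5 then truncate = round

arr = np.array([[[100, 2, 220], [3, 4, 5]], [[1, 2, 3], [3, 4, 5]]])
scaled = bytescale_like(arr)

# Nearest-neighbour 2x upscaling: repeat every pixel along height and width
resized = np.repeat(np.repeat(scaled, 2, axis=0), 2, axis=1)
print(resized.shape)
print(resized)
```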

I looked up the cause but didn't read it closely; noting it here:

scipy.misc.imresize changes image range

Oh, this is the reason:

Switch to skimage.transform.resize instead.

BatchDatsetReader._read_images

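self.__channels = True
self.images = np.array([self._transform(filename['image']) for filename in self.files])
self.__channels = False
# Each annotation is 2-D (h, w); add a trailing channel axis so it becomes (h, w, 1).
# axis=-1 replaces the original axis=3, which is out of range for a 2-D array
# and raises an error in newer NumPy versions.
self.annotations = np.array(
    [np.expand_dims(self._transform(filename['annotation']), axis=-1) for filename in self.files])

print("self.images.shape:", self.images.shape)
print("self.annotations.shape:", self.annotations.shape)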

tf.train.Saver

Original code:

saver.save(sess, FLAGS.logs_dir + "model.ckpt", itr)

Official documentation

Usage:

saver.save(sess, 'my-model', global_step=0) ==> filename: 'my-model-0'
...
saver.save(sess, 'my-model', global_step=1000) ==> filename: 'my-model-1000'
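The naming rule in the usage lines above can be sketched as follows. When global_step is given, "-<step>" is appended to the save path. The helper checkpoint_prefix is hypothetical and only mimics the prefix naming; TensorFlow itself typically writes several files sharing that prefix (for example .index, .data-... and .meta). The sketch also shows a pitfall in the original code: FLAGS.logs_dir + "model.ckpt" is plain string concatenation, so logs_dir must end with a slash. os.path.join avoids this.

```python
import os

def checkpoint_prefix(save_path, global_step=None):
    """Hypothetical helper mimicking how tf.train.Saver.save names a checkpoint prefix."""
    if global_step is None:
        return save_path
    return "%s-%d" % (save_path, global_step)

logs_dir = "logs"  # hypothetical value of FLAGS.logs_dir, without a trailing slash
print(checkpoint_prefix(logs_dir + "model.ckpt", 500))               # logsmodel.ckpt-500 (separator missing)
print(checkpoint_prefix(os.path.join(logs_dir, "model.ckpt"), 500))  # logs/model.ckpt-500 on POSIX
```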

Result:

tf.nn.sparse_softmax_cross_entropy_with_logits
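For reference, this loss takes integer class indices as labels, not one-hot vectors. The logits are unnormalized and carry one extra class axis. The loss returns -log(softmax(logits)[label]) at every position.

The NumPy sketch below is a re-implementation for illustration, not TensorFlow's code. It assumes FCN-style shapes: the (N, H, W, 1) annotation is squeezed to (N, H, W) labels, and the logits have shape (N, H, W, num_classes). The batch values are hypothetical.

```python
import numpy as np

def sparse_softmax_cross_entropy(labels, logits):
    """Per-position loss -log(softmax(logits)[label]).

    labels: integer array of shape S (class indices)
    logits: float array of shape S + (num_classes,)
    """
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.int64)
    # Subtract the max before exponentiating for numerical stability (log-sum-exp trick)
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log-probability of the true class at every position
    picked = np.take_along_axis(log_probs, labels[..., np.newaxis], axis=-1)
    return -picked[..., 0]

# Hypothetical batch: 1 image of 2x2 pixels, 3 classes
annotation = np.array([[[[0], [1]], [[2], [1]]]])  # shape (1, 2, 2, 1)
logits = np.zeros((1, 2, 2, 3))                    # every class equally likely
loss = sparse_softmax_cross_entropy(np.squeeze(annotation, axis=3), logits)
print(loss.shape, loss.mean())                     # (1, 2, 2), mean equal to log(3)
```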

Transposed convolution / deconvolution

Explaining transposed convolution through a transposed matrix: Up-sampling with Transposed Convolution
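Here is a small 1-D NumPy sketch of the matrix view. It is an illustration, not any library's implementation, and the kernel and input values are arbitrary.

An ordinary "valid" convolution (strictly, the cross-correlation that deep-learning frameworks compute) with a length-3 kernel can be written as a matrix C that maps 4 values to 2. Multiplying by C.T maps 2 values back to 4. This equals scattering each input value, weighted by the kernel, into the output.

The sketch also shows why "deconvolution" is a misleading name: C.T restores the shape, not the original values.

```python
import numpy as np

def conv_matrix(kernel, in_len, stride=1):
    """Matrix C such that C @ x is a 'valid' 1-D convolution (cross-correlation) of x."""
    k = len(kernel)
    out_len = (in_len - k) // stride + 1
    C = np.zeros((out_len, in_len))
    for i in range(out_len):
        C[i, i * stride:i * stride + k] = kernel
    return C

def transposed_conv_scatter(y, kernel, stride=1):
    """Transposed convolution written directly: each input value adds value * kernel to the output."""
    k = len(kernel)
    out = np.zeros((len(y) - 1) * stride + k)
    for i, v in enumerate(y):
        out[i * stride:i * stride + k] += v * kernel
    return out

kernel = np.array([1.0, 2.0, 3.0])
x = np.array([1.0, 0.0, 2.0, 1.0])

C = conv_matrix(kernel, in_len=4)  # shape (2, 4): the convolution maps 4 values to 2
y = C @ x                          # forward convolution
up = C.T @ y                       # transposed convolution maps 2 values back to 4

print(C.shape, y, up)
print(np.allclose(up, transposed_conv_scatter(y, kernel)))  # True: both views agree
print(np.allclose(up, x))                                   # False: shape restored, values not
```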

Animated demos: Convolution arithmetic

tf.nn.conv2d_transpose

TensorFlow study notes (32): conv2d_transpose ("deconvolution")

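conv2d_transpose checks whether the given output_shape is consistent with inputs. A forward conv2d with the same filter, strides and padding, applied to output_shape, must produce exactly the shape of inputs. If it does not, an error is raised.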
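The check can be sketched per spatial dimension as follows. The sketch uses the usual forward-convolution size formulas for padding='SAME' and 'VALID'. It illustrates the idea under that assumption and is not TensorFlow's actual validation code.

It also shows why output_shape has to be passed explicitly. With stride 2 and SAME padding, both 7 and 8 shrink back to 4, so the input shape alone does not determine the output shape.

```python
def conv_output_len(in_len, kernel, stride, padding):
    """Spatial length produced by a forward conv2d along one dimension."""
    if padding == "SAME":
        return -(-in_len // stride)                 # ceil(in_len / stride)
    if padding == "VALID":
        return -(-(in_len - kernel + 1) // stride)  # ceil((in_len - kernel + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

def output_shape_is_consistent(input_len, output_len, kernel, stride, padding):
    """True if a forward conv on output_len would give back input_len."""
    return conv_output_len(output_len, kernel, stride, padding) == input_len

# Which output lengths are acceptable for an input of length 4, kernel 4, stride 2?
print([n for n in range(1, 12) if output_shape_is_consistent(4, n, kernel=4, stride=2, padding="SAME")])
```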

1. Dataset analysis

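Most of the image data is three-dimensional, (h, w, 3), but a small number of images are grayscale, i.e. two-dimensional, (h, w).

All of the annotation data is two-dimensional, (h, w).

Therefore, when processing the image data, any two-dimensional picture must first be converted to a three-dimensional picture with 3 channels.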

2. Problems encountered

2.1 Problem 1: scipy.misc.imresize is deprecated

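Problem description

The original author's code resizes images with scipy.misc.imresize.

But I found that besides resizing, this function also does extra work on its own. It rescales the values in the array to the 0-255 range (the minimum becomes 0 and the maximum becomes 255), which destroys the original information in the image.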

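import numpy as np
from scipy import misc  # misc.imresize was removed in SciPy 1.3; this needs an older SciPy plus Pillow

arr = np.array([[[100, 2, 220], [3, 4, 5]], [[1, 2, 3], [3, 4, 5]]])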

print(type(arr))
print(arr.shape)

resize_size = 4
arr = misc.imresize(arr, [resize_size, resize_size], interp='nearest')
print(type(arr))
print(arr.shape)
print(arr)

Output

<class 'numpy.ndarray'>
(2, 2, 3)
<class 'numpy.ndarray'>
(4, 4, 3)
[[[115   1 255]
  [115   1 255]
  [  2   3   5]
  [  2   3   5]]

 [[115   1 255]
  [115   1 255]
  [  2   3   5]
  [  2   3   5]]

 [[  0   1   2]
  [  0   1   2]
  [  2   3   5]
  [  2   3   5]]

 [[  0   1   2]
  [  0   1   2]
  [  2   3   5]
  [  2   3   5]]]

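Solution

Only after reading the official documentation did I learn that this function is deprecated. It was deprecated in SciPy 1.0.0 and removed in SciPy 1.3.0.

So I switched all image operations to the skimage library and used skimage.transform.resize for resizing.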

2.2 Problem 2: ValueError: could not broadcast input array from shape (224,224,3) into shape (224,224)

Problem description

Traceback (most recent call last):
  File "test.py", line 8, in <module>
    reader = ImageReader("train")
  File "/root/Desktop/FCN/ImageReader.py", line 58, in __init__
    self.image_list = np.array([self.readImage(record["image"]) for record in self.records])
ValueError: could not broadcast input array from shape (224,224,3) into shape (224,224)

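After switching to skimage, the images could no longer be combined into a single array: processing the pictures in image failed.

Searching Stack Overflow, I found that the elements did not all have the same shape. I had assumed that every picture in image was three-channel, i.e. (h, w, 3). In that case, to get pictures of a fixed size (say 224 × 224), calling skimage.transform.resize would turn each one into (224, 224, 3), and all of them should end up with that shape. But since the pictures could not fit into one array, some of them had evidently not been converted to that shape.

Cause

It turns out that not every picture in image has the form (h, w, 3). Some are grayscale (4 of the 20,210 images), i.e. (h, w). My code did not account for this, so after resizing these grayscale images had the wrong shape: (224, 224) instead of (224, 224, 3).

Solution

These grayscale images must be converted to three channels. It is enough to repeat the values of the single channel three times to form the three channels.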
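A minimal NumPy sketch of that conversion follows. The helper name to_three_channels and the tiny test images are hypothetical.

The channel copies must be stacked on the last axis. Writing np.array([image] * 3) would instead give shape (3, h, w), which still cannot be combined with (h, w, 3) images.

```python
import numpy as np

def to_three_channels(image):
    """Return an (h, w, 3) array; a grayscale (h, w) image gets its channel repeated 3 times."""
    image = np.asarray(image)
    if image.ndim == 2:
        # Equivalent: np.repeat(image[..., np.newaxis], 3, axis=-1)
        return np.stack([image] * 3, axis=-1)
    return image

gray = np.arange(6, dtype=np.uint8).reshape(2, 3)  # hypothetical 2x3 grayscale image
color = np.zeros((2, 3, 3), dtype=np.uint8)        # hypothetical 2x3 RGB image

# With every image converted to (h, w, 3), they fit into one array
batch = np.array([to_three_channels(img) for img in (gray, color)])
print(batch.shape)  # (2, 2, 3, 3)
```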

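2.3 Problem 3: image content is destroyed after resizing

Problem description

After I added preserve_range=True to skimage.transform.resize, the resized images looked completely destroyed when displayed. It seemed that preserving the value range ruined the image's visibility.

Cause

I checked Stack Overflow.

It turns out that pyplot.imshow reads float RGB data on the [0.0, 1.0] scale and clips anything above 1.0. Integer data is read on the 0-255 scale. With preserve_range=True the image stays in the 0-255 range but has dtype float64, so it is treated as [0.0, 1.0] data and cannot be displayed correctly.

In addition, the official documentation describes the preserve_range parameter as follows:

preserve_range : bool, optional
Whether to keep the original range of values. Otherwise, the input image is converted according to the conventions of img_as_float.

So without preserve_range=True, the function indeed rescales the values to floats in [0.0, 1.0], following the img_as_float convention.

Solution

Before displaying the image, convert it with image = np.copy(old_image).astype('uint8'), which changes the dtype from float64 to uint8.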
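Below is a sketch of two ways to make a float64 image on the 0-255 scale displayable. The array is a hypothetical stand-in for the output of resize(..., preserve_range=True).

The first option converts to uint8, as the fix above does. Note that astype('uint8') truncates, so rounding and clipping first is slightly safer. The second option divides by 255 instead, which keeps float precision and gives imshow the [0.0, 1.0] range it expects for float data.

```python
import numpy as np

# Hypothetical resize output with preserve_range=True: float64 values on the 0-255 scale
resized = np.array([[[12.6, 200.4, 254.9]]], dtype=np.float64)

# The fix above: plain astype truncates toward zero (values 12, 200, 254)
truncated = np.copy(resized).astype('uint8')

# Option 1: round and clip before converting to uint8 (values 13, 200, 255)
as_uint8 = np.clip(np.round(resized), 0, 255).astype(np.uint8)

# Option 2: keep floats but rescale to [0.0, 1.0]
as_unit_float = resized / 255.0

print(truncated.dtype, as_uint8.dtype, as_unit_float.max())
```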

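2.4 Inspecting the kernel shapes in the source code

Adding the following code to the source prints the kernel shape of each layer:

Output:

(only part of the output is shown)

From the output I found that the source code uses VGG-19, while the paper uses VGG-16. The two should perform about the same. To stay consistent with the source code, I kept stacking the layers following VGG-19.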
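The printing code and its output were not preserved in these notes. As a hypothetical illustration of what separates the two networks, the sketch below lists the conv kernel shapes (h, w, in_channels, out_channels) implied by the published VGG-16 and VGG-19 configurations.

Both networks use 3x3 kernels in five blocks. VGG-19, however, has four conv layers in each of blocks 3 to 5 instead of three. Layer names such as conv3_4, conv4_4 and conv5_4 therefore only appear in VGG-19.

```python
# Conv layer widths per block, from the VGG paper's configurations D (VGG-16) and E (VGG-19)
VGG_BLOCKS = {
    "vgg16": [[64, 64], [128, 128], [256] * 3, [512] * 3, [512] * 3],
    "vgg19": [[64, 64], [128, 128], [256] * 4, [512] * 4, [512] * 4],
}

def conv_kernel_shapes(blocks, in_channels=3, size=3):
    """Return (layer_name, (h, w, in_channels, out_channels)) for every conv layer."""
    shapes = []
    for b, block in enumerate(blocks, start=1):
        for i, out_channels in enumerate(block, start=1):
            shapes.append(("conv%d_%d" % (b, i), (size, size, in_channels, out_channels)))
            in_channels = out_channels  # the next layer consumes this layer's output
    return shapes

for name, shape in conv_kernel_shapes(VGG_BLOCKS["vgg19"]):
    print(name, shape)
```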

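2.5 Upsampling factor of tf.layers.conv2d_transpose

tf.layers.conv2d_transpose has no output_shape argument. The size of the output image can only be controlled through strides (together with kernel_size and padding).

With padding='same', strides = [2, 2] upsamples by 2x and strides = [8, 8] upsamples by 8x.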
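The upsampling factor follows from the output-length rule for transposed convolutions. The sketch below uses the rule from Keras' deconv_output_length helper. To my understanding, tf.layers.conv2d_transpose computes its output size the same way when no output padding is given, but treat that as an assumption.

With padding='same' the output is exactly input × stride. The default padding='valid' adds max(kernel_size - stride, 0). The feature-map sizes and kernel sizes in the example are hypothetical.

```python
def deconv_output_length(input_length, kernel_size, stride, padding):
    """Output length of a transposed convolution along one spatial dimension."""
    length = input_length * stride
    if padding == "valid":
        length += max(kernel_size - stride, 0)
    elif padding != "same":
        raise ValueError("padding must be 'same' or 'valid'")
    return length

# Hypothetical 28x28 feature map upsampled 8x back to 224x224
print(deconv_output_length(28, kernel_size=16, stride=8, padding="same"))   # 224
print(deconv_output_length(28, kernel_size=16, stride=8, padding="valid"))  # 232
print(deconv_output_length(7, kernel_size=4, stride=2, padding="same"))     # 14
```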

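This article is part of the Tencent Cloud self-media sync program and is shared from the author's personal site/blog.

Originally published: 2018.05.17. In case of infringement, contact cloudcommunity@tencent.com for removal.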
