[Deep-learning auto-colorization: months of work done in seconds] How an open-source neural network colorizes images

新智元

Published 2018-03-22 10:21:26

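Featured in the column: 新智元

[新智元 editor's note] This article walks through the author's implementation of the deep-learning colorization bot by Reddit user Amir Avni. After reading it, you too can build an automatic colorization neural network that rivals master-level hand coloring. As a bonus, even if it fails (like this article's header image, above), you can always call it art :)

Today, colorization is done by hand in Photoshop. A single picture can take months to finish and requires extensive research; a face alone can need up to 20 layers. A colorization bot based on deep neural networks, however, can achieve in seconds what takes months in Photoshop, and the results keep getting more impressive.

Below, we show how to build your own colorization neural network in three steps. The first part explains the core logic. We will build a 40-line neural network as the "Alpha" colorization bot. There is not much magic in this snippet, but it will familiarize you with the basic operations.

Next, we create a neural network that can generalize: the "Beta" version. The Beta bot can color images it has not seen before.

Finally, we combine the neural network with a classifier to get the "final" version. We will use Inception ResNet V2, which has been trained on 1.2 million images. To make the colorization eye-catching, we train our neural network on portraits from Unsplash, a free stock-photo library with highly artistic, design-oriented images.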

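Core technique: automatic colorization = finding the features that link grayscale to color

In this section, I outline how images are rendered, the basics of digital color, and the main logic of the neural network.

A black-and-white image can be represented as a grid of pixels. Each pixel has a value corresponding to its brightness, ranging from 0 to 255, from black to white.

A color image consists of three layers: a red layer, a green layer, and a blue layer. Intuitively, you might think that plants exist only in the green layer. But, as the figure below shows, green leaves appear in all three channels. The layers determine not only color but also brightness.

To get white, for example, you need an equal distribution of all colors. Adding equal amounts of red and blue makes the green brighter. A color image thus uses three layers to encode both color and contrast.

Just as in a black-and-white image, each layer of a color image holds values from 0 to 255. A value of 0 means that layer contributes no color. If all color channels are 0, the pixel is black.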
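
As a small illustration of the three-layer encoding, the sketch below represents pixels as (R, G, B) tuples in plain Python. The leaf pixel values and the Rec. 601 brightness weights are assumptions chosen for this example; they do not come from the original article.

```python
def luminance(rgb):
    """Approximate perceived brightness (0-255) of an (R, G, B) pixel.

    Uses the Rec. 601 luma weights, an assumption made for this sketch.
    """
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


BLACK = (0, 0, 0)              # every channel at 0: no color at all
WHITE = (255, 255, 255)        # equal, maximal amounts of all three colors
LEAF = (60, 140, 50)           # hypothetical leaf pixel: mostly green, but red and blue are non-zero
BRIGHT_LEAF = (120, 140, 110)  # same green value with equal amounts of red and blue added

for name, pixel in [("black", BLACK), ("white", WHITE),
                    ("leaf", LEAF), ("bright leaf", BRIGHT_LEAF)]:
    print(name, pixel, round(luminance(pixel), 1))
```

The brighter leaf keeps the same green value; only the red and blue channels changed, which is why all three layers carry brightness information.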

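A neural network creates a relationship between input values and output values. More precisely, the colorization task requires the network to find the features that link grayscale images to colored ones.

So what the colorization bot is looking for are the features that link a grid of grayscale values to the three color grids.

Alpha version: a basic colorization bot in 40 lines of code

We start with a simple neural network that colors an image of a woman's face (see below).

With just 40 lines of code, we can make the following transformation. The middle image was produced by the neural network, and the image on the right is the original color photo. Note that the network here was trained and tested on the same image; we return to this point in the Beta version.

Color spaces

First, we use an algorithm to change the color channels from RGB to Lab. L stands for lightness, and a and b stand for the color spectra green-red and blue-yellow.

As shown below, a Lab-encoded image has one grayscale layer and packs the three color layers into two. This means we can use the original grayscale image in the final prediction, and we only have two channels to predict.

As a matter of science, about 94% of the cells in the human eye determine brightness; only 6% of our receptors act as color sensors. As the image above shows, the grayscale image is much sharper than the color layers. This is another reason to keep the grayscale image in the final prediction.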
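
To make the Lab channels concrete, here is a minimal pure-Python sketch of the standard sRGB to CIE Lab conversion for a single pixel, assuming the D65 white point. The helper name `srgb_to_lab` is made up for this example; the article's own code relies on scikit-image's `rgb2lab`, which performs the conversion for whole images.

```python
def srgb_to_lab(r, g, b):
    """Convert one sRGB pixel (channels 0-255) to CIE Lab, assuming a D65 white point."""

    def linearize(c):
        # Undo the sRGB gamma curve
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    def f(t):
        delta = 6.0 / 29.0
        return t ** (1.0 / 3.0) if t > delta ** 3 else t / (3 * delta ** 2) + 4.0 / 29.0

    rl, gl, bl = linearize(r), linearize(g), linearize(b)
    # Linear sRGB -> XYZ, each axis divided by the D65 reference white
    x = (0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl) / 0.95047
    y = (0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl) / 1.00000
    z = (0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl) / 1.08883
    fx, fy, fz = f(x), f(y), f(z)
    lightness = 116.0 * fy - 16.0  # L: 0 (black) to 100 (white)
    a = 500.0 * (fx - fy)          # a: negative = green, positive = red
    b_axis = 200.0 * (fy - fz)     # b: negative = blue, positive = yellow
    return lightness, a, b_axis


for name, pixel in [("white", (255, 255, 255)), ("red", (255, 0, 0)),
                    ("green", (0, 255, 0)), ("blue", (0, 0, 255))]:
    print(name, [round(v, 1) for v in srgb_to_lab(*pixel)])
```

For ordinary sRGB colors the a and b values stay within roughly -128 to 128, which is why the code in this article divides them by 128 to obtain training targets between -1 and 1.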

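From black and white to color

Our final prediction works like this. We have a grayscale layer as input, and we want to predict the two color layers of Lab. To create the final color image, we add back the L/grayscale image we used as input, which gives us a Lab image.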
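
The assembly step can be sketched in plain Python with nested lists standing in for image arrays. The helper names `normalize_ab` and `assemble_lab` are hypothetical; the factor 128 mirrors the `Y / 128` and `output * 128` steps in the article's code.

```python
AB_SCALE = 128.0  # the article's code divides a/b by 128 for training and multiplies back afterwards


def normalize_ab(ab_grid):
    """Scale true (a, b) pairs to roughly [-1, 1] so they can serve as training targets."""
    return [[(a / AB_SCALE, b / AB_SCALE) for a, b in row] for row in ab_grid]


def assemble_lab(l_grid, ab_pred):
    """Merge the input L grid with predicted (a, b) pairs in [-1, 1] into Lab pixels."""
    return [
        [(l, a * AB_SCALE, b * AB_SCALE) for l, (a, b) in zip(l_row, ab_row)]
        for l_row, ab_row in zip(l_grid, ab_pred)
    ]


l_grid = [[50.0, 80.0]]                     # grayscale input, one row of two pixels
ab_pred = [[(0.5, -0.25), (0.0, 1.0)]]      # hypothetical network output
print(assemble_lab(l_grid, ab_pred))
```

The L values pass through untouched, so the sharp grayscale detail of the input survives in the colored result.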

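We use convolutional filters to turn one layer into two. You can think of them as the blue/red filters in 3D glasses. Each filter determines what we see in a picture; it can highlight or remove something to extract information from the picture. The network can create a new image from a filter or combine several filters into one image.

In a convolutional neural network, each filter is adjusted automatically to help produce the intended result. We start by stacking hundreds of filters and then narrow them down to two layers, the a and b layers.

Here is the FloydHub code: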

# Imports (assumes the Keras 2.x API of the time, scikit-image and NumPy)
import numpy as np
from keras.layers import Conv2D, UpSampling2D, InputLayer
from keras.models import Sequential
from keras.preprocessing.image import img_to_array, load_img
from skimage.color import rgb2lab, lab2rgb, rgb2gray
from skimage.io import imsave

# Get images (woman.png is expected to be 400x400 pixels)
image = img_to_array(load_img('woman.png'))
image = np.array(image, dtype=float)

# Convert the RGB image to the Lab color space
lab = rgb2lab(1.0/255*image)
X = lab[:,:,0]          # L channel: the grayscale input
Y = lab[:,:,1:] / 128   # a and b channels, scaled to roughly [-1, 1]
X = X.reshape(1, 400, 400, 1)
Y = Y.reshape(1, 400, 400, 2)

# Building the neural network
model = Sequential()
model.add(InputLayer(input_shape=(None, None, 1)))
model.add(Conv2D(8, (3, 3), activation='relu', padding='same', strides=2))
model.add(Conv2D(8, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(16, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(16, (3, 3), activation='relu', padding='same', strides=2))
model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(32, (3, 3), activation='relu', padding='same', strides=2))
model.add(UpSampling2D((2, 2)))
model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
model.add(UpSampling2D((2, 2)))
model.add(Conv2D(16, (3, 3), activation='relu', padding='same'))
model.add(UpSampling2D((2, 2)))
model.add(Conv2D(2, (3, 3), activation='tanh', padding='same'))

# Finish model
model.compile(optimizer='rmsprop', loss='mse')

# Train the neural network (on a single image)
model.fit(x=X, y=Y, batch_size=1, epochs=3000)
print(model.evaluate(X, Y, batch_size=1))

# Output colorizations: rescale a/b and recombine them with the input L channel
output = model.predict(X)
output = output * 128
canvas = np.zeros((400, 400, 3))
canvas[:,:,0] = X[0][:,:,0]
canvas[:,:,1:] = output[0]
imsave("img_result.png", lab2rgb(canvas))
imsave("img_gray_scale.png", rgb2gray(lab2rgb(canvas)))
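
To see how the network above shrinks the image and then restores its size, the following sketch traces the tensor shape through each layer without needing Keras. The layer list is transcribed from the listing above; the helper `trace_shapes` is hypothetical, and the size rule assumes Keras's 'same' padding, where a strided convolution outputs ceil(input / stride) pixels per side.

```python
import math

# (kind, output channels, stride or upsampling factor) for each layer of the Alpha network
ALPHA_LAYERS = [
    ("conv", 8, 2), ("conv", 8, 1), ("conv", 16, 1), ("conv", 16, 2),
    ("conv", 32, 1), ("conv", 32, 2), ("up", None, 2), ("conv", 32, 1),
    ("up", None, 2), ("conv", 16, 1), ("up", None, 2), ("conv", 2, 1),
]


def trace_shapes(size, layers, channels=1):
    """Return the (height, width, channels) shape of the input and after every layer."""
    shapes = [(size, size, channels)]
    for kind, out_channels, factor in layers:
        if kind == "conv":  # 'same' padding: output side = ceil(input side / stride)
            size = math.ceil(size / factor)
            channels = out_channels
        else:               # UpSampling2D repeats rows and columns
            size = size * factor
        shapes.append((size, size, channels))
    return shapes


for shape in trace_shapes(400, ALPHA_LAYERS):
    print(shape)
```

Three stride-2 convolutions take the 400x400 input down to 50x50, and three upsampling layers bring it back to 400x400 with the two output channels a and b.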

The FloydHub command to run the network above:

floyd run --data emilwallner/datasets/colornet/2:data --mode jupyter --tensorboard

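Beta version: coloring images the network has not seen

The Alpha version is not good at coloring images it was not trained on. Next, in the Beta version, we do exactly that by generalizing the network above.

Below are the results of coloring test images with the Beta version.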

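Instead of using ImageNet, we created a public dataset of high-quality images on FloydHub. The pictures come from Unsplash, a collection of openly available creative photos by professional photographers. The dataset includes 9,500 training images and 500 test images, which matches the 95% split used in the code below.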

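Feature extractor

Our neural network finds the features that link grayscale images to their colored versions.

Imagine you had to color black-and-white images but could only see nine pixels at a time. You could scan each image from the top left to the bottom right and try to predict which color each pixel should be.

For example, these nine pixels are the edge of the nostril in the woman's face above. Coloring them well is almost impossible, so you break the task down into steps.

First, you look for simple patterns: a diagonal line, all black pixels, and so on. You look for the same exact pattern in each scanned square and remove the pixels that don't match. This generates 64 new images from your 64 mini filters.

If you scan the images again, you see the same patterns you have already detected. To gain a higher-level understanding of the image, you halve the image size.

You still have only a 3×3 filter to scan each image. But by combining the new nine pixels with the lower-level filters, you can detect more complex patterns. One pixel combination might form a half circle, a small dot, or a line. Again, you repeatedly extract the same pattern from the image. This time, you generate 128 new filtered images.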
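
The scanning procedure described above can be sketched as a simplified 3×3 convolution in plain Python. The function `conv2d_same`, the diagonal filter, and the 4×4 test image are all made up for this illustration; a real Keras `Conv2D` layer learns its filter values and handles padding for strided scans slightly differently.

```python
def conv2d_same(image, kernel, stride=1):
    """Slide a 3x3 kernel over a 2D grid with zero padding, then apply ReLU."""
    height, width = len(image), len(image[0])
    output = []
    for row in range(0, height, stride):
        out_row = []
        for col in range(0, width, stride):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    r, c = row + dr, col + dc
                    if 0 <= r < height and 0 <= c < width:  # outside the image counts as 0
                        total += image[r][c] * kernel[dr + 1][dc + 1]
            out_row.append(max(total, 0.0))  # ReLU keeps matches and zeroes out mismatches
        output.append(out_row)
    return output


# A hand-written filter that responds to a top-left to bottom-right diagonal line
DIAGONAL = [[1, -1, -1],
            [-1, 1, -1],
            [-1, -1, 1]]

image = [[1, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1]]

full = conv2d_same(image, DIAGONAL)            # same size as the input
half = conv2d_same(image, DIAGONAL, stride=2)  # scanning every second pixel halves the size
print(full)
print(half)
```

The filtered image is non-zero only along the diagonal, and the stride-2 scan produces a 2×2 image, which is the halving step that lets later 3×3 filters cover a larger part of the original picture.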

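After a few steps, the filtered images you generate might look like this:

This process is how most neural networks that deal with vision behave; this type of network is known as a convolutional neural network. It combines several filtered images to understand the context in the image.

Here is the FloydHub code for the Beta version: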

# Imports (assumes the Keras 2.x API of the time, scikit-image and NumPy)
import os
import numpy as np
from keras.callbacks import TensorBoard
from keras.layers import Conv2D, UpSampling2D, InputLayer
from keras.models import Sequential
from keras.preprocessing.image import ImageDataGenerator, img_to_array, load_img
from skimage.color import rgb2lab, lab2rgb
from skimage.io import imsave

# Get images
X = []
for filename in os.listdir('../Train/'):
    X.append(img_to_array(load_img('../Train/'+filename)))
X = np.array(X, dtype=float)

# Set up training and test data: the first 95% of the images are used for training
split = int(0.95*len(X))
Xtrain = X[:split]
Xtrain = 1.0/255*Xtrain

# Design the neural network
model = Sequential()
model.add(InputLayer(input_shape=(256, 256, 1)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same', strides=2))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same', strides=2))
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(256, (3, 3), activation='relu', padding='same', strides=2))
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(UpSampling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(UpSampling2D((2, 2)))
model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(2, (3, 3), activation='tanh', padding='same'))
model.add(UpSampling2D((2, 2)))

# Finish model
model.compile(optimizer='rmsprop', loss='mse')

# Image transformer: random shear, zoom, rotation and flips for data augmentation
datagen = ImageDataGenerator(
        shear_range=0.2,
        zoom_range=0.2,
        rotation_range=20,
        horizontal_flip=True)

# Generate training data: L channel as input, scaled a/b channels as target
batch_size = 50
def image_a_b_gen(batch_size):
    for batch in datagen.flow(Xtrain, batch_size=batch_size):
        lab_batch = rgb2lab(batch)
        X_batch = lab_batch[:,:,:,0]
        Y_batch = lab_batch[:,:,:,1:] / 128
        yield (X_batch.reshape(X_batch.shape+(1,)), Y_batch)

# Train model (the TensorBoard callback only takes effect when passed to fit_generator)
tensorboard = TensorBoard(log_dir='/output')
model.fit_generator(image_a_b_gen(batch_size), callbacks=[tensorboard], steps_per_epoch=10000, epochs=1)

# Test images: evaluate on the held-out 5%
Xtest = rgb2lab(1.0/255*X[split:])[:,:,:,0]
Xtest = Xtest.reshape(Xtest.shape+(1,))
Ytest = rgb2lab(1.0/255*X[split:])[:,:,:,1:]
Ytest = Ytest / 128
print(model.evaluate(Xtest, Ytest, batch_size=batch_size))

# Load black and white images
color_me = []
for filename in os.listdir('../Test/'):
    color_me.append(img_to_array(load_img('../Test/'+filename)))
color_me = np.array(color_me, dtype=float)
color_me = rgb2lab(1.0/255*color_me)[:,:,:,0]
color_me = color_me.reshape(color_me.shape+(1,))

# Test model
output = model.predict(color_me)
output = output * 128

# Output colorizations (the result/ directory must already exist)
for i in range(len(output)):
    cur = np.zeros((256, 256, 3))
    cur[:,:,0] = color_me[i][:,:,0]
    cur[:,:,1:] = output[i]
    imsave("result/img_"+str(i)+".png", lab2rgb(cur))

The FloydHub command to run the Beta network:

floyd run --data emilwallner/datasets/colornet/2:data --mode jupyter --tensorboard

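The final version

Our final colorization network has four components. We split the previous network into an encoder and a decoder, with a fusion layer in between.

In parallel with the encoder runs one of today's most powerful classifiers, Inception ResNet v2, a network trained on 1.2 million images. We extract its classification layer and merge it with the output of the encoder.

By transferring what the classifier has learned to the colorization network, the network gains an understanding of what is in the picture. This enables it to match object representations with colorization schemes.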
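
The fusion step can be pictured with a toy example in plain Python: the classifier's output vector is copied to every cell of the encoder's feature grid and appended to the features already there. The function `fuse` and the toy sizes are assumptions for illustration; in the article's code the same idea is expressed with `RepeatVector`, `Reshape` and `concatenate` on a 32×32 grid with 256 encoder features and a 1000-value embedding.

```python
def fuse(encoder_grid, embedding):
    """Append the same classifier embedding to the feature vector of every grid cell."""
    return [[list(features) + list(embedding) for features in row] for row in encoder_grid]


# Toy sizes: a 2x2 grid with 3 features per cell and a 4-value embedding
encoder_grid = [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
                [[0.7, 0.8, 0.9], [1.0, 1.1, 1.2]]]
embedding = [0.0, 0.9, 0.1, 0.0]  # hypothetical class scores from the classifier

fused = fuse(encoder_grid, embedding)
print(len(fused), len(fused[0]), len(fused[0][0]))  # grid height, grid width, features per cell
```

Every location now carries both its local features and the global description of the image, so the following 1×1 convolution can mix the two when choosing colors.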

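Here are some validation images, produced by a network trained on only 20 images.

Most of the images turned out poorly, but thanks to the large validation/test set (2,500 images) I managed to find a few decent-looking ones. Training on more images gave more consistent results, but most of them came out brownish. Here is the full list of experiments I ran, including the validation images.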

# Imports (assumes Keras 2.x on TensorFlow 1.x, scikit-image and NumPy)
import os
import numpy as np
import tensorflow as tf
from keras.applications.inception_resnet_v2 import InceptionResNetV2, preprocess_input
from keras.callbacks import TensorBoard
from keras.layers import Conv2D, UpSampling2D, Input, Reshape, RepeatVector, concatenate
from keras.models import Model
from keras.preprocessing.image import ImageDataGenerator, img_to_array, load_img
from skimage.color import rgb2lab, lab2rgb, rgb2gray, gray2rgb
from skimage.io import imsave
from skimage.transform import resize

# Get images
X = []
for filename in os.listdir('/data/images/Train/'):
    X.append(img_to_array(load_img('/data/images/Train/'+filename)))
X = np.array(X, dtype=float)
Xtrain = 1.0/255*X

# Load weights of the pretrained classifier
inception = InceptionResNetV2(weights=None, include_top=True)
inception.load_weights('/data/inception_resnet_v2_weights_tf_dim_ordering_tf_kernels.h5')
inception.graph = tf.get_default_graph()

# Second model input: the 1000-value classifier output
embed_input = Input(shape=(1000,))

# Encoder
encoder_input = Input(shape=(256, 256, 1,))
encoder_output = Conv2D(64, (3,3), activation='relu', padding='same', strides=2)(encoder_input)
encoder_output = Conv2D(128, (3,3), activation='relu', padding='same')(encoder_output)
encoder_output = Conv2D(128, (3,3), activation='relu', padding='same', strides=2)(encoder_output)
encoder_output = Conv2D(256, (3,3), activation='relu', padding='same')(encoder_output)
encoder_output = Conv2D(256, (3,3), activation='relu', padding='same', strides=2)(encoder_output)
encoder_output = Conv2D(512, (3,3), activation='relu', padding='same')(encoder_output)
encoder_output = Conv2D(512, (3,3), activation='relu', padding='same')(encoder_output)
encoder_output = Conv2D(256, (3,3), activation='relu', padding='same')(encoder_output)

# Fusion: repeat the embedding over the 32x32 grid and concatenate it with the encoder output
fusion_output = RepeatVector(32 * 32)(embed_input)
fusion_output = Reshape(([32, 32, 1000]))(fusion_output)
fusion_output = concatenate([encoder_output, fusion_output], axis=3)
fusion_output = Conv2D(256, (1, 1), activation='relu', padding='same')(fusion_output)

# Decoder
decoder_output = Conv2D(128, (3,3), activation='relu', padding='same')(fusion_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(64, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(32, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = Conv2D(16, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = Conv2D(2, (3, 3), activation='tanh', padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
model = Model(inputs=[encoder_input, embed_input], outputs=decoder_output)

# Create embedding: run the grayscale image (as 3-channel RGB) through the classifier
def create_inception_embedding(grayscaled_rgb):
    grayscaled_rgb_resized = []
    for i in grayscaled_rgb:
        i = resize(i, (299, 299, 3), mode='constant')
        grayscaled_rgb_resized.append(i)
    grayscaled_rgb_resized = np.array(grayscaled_rgb_resized)
    grayscaled_rgb_resized = preprocess_input(grayscaled_rgb_resized)
    with inception.graph.as_default():
        embed = inception.predict(grayscaled_rgb_resized)
    return embed

# Image transformer
datagen = ImageDataGenerator(
        shear_range=0.4,
        zoom_range=0.4,
        rotation_range=40,
        horizontal_flip=True)

# Generate training data
batch_size = 20
def image_a_b_gen(batch_size):
    for batch in datagen.flow(Xtrain, batch_size=batch_size):
        grayscaled_rgb = gray2rgb(rgb2gray(batch))
        embed = create_inception_embedding(grayscaled_rgb)
        lab_batch = rgb2lab(batch)
        X_batch = lab_batch[:,:,:,0]
        X_batch = X_batch.reshape(X_batch.shape+(1,))
        Y_batch = lab_batch[:,:,:,1:] / 128
        # Reuse the embedding computed above instead of running the classifier twice
        yield ([X_batch, embed], Y_batch)

# Train model
tensorboard = TensorBoard(log_dir="/output")
model.compile(optimizer='adam', loss='mse')
model.fit_generator(image_a_b_gen(batch_size), callbacks=[tensorboard], epochs=1000, steps_per_epoch=20)

# Make a prediction on the unseen images
color_me = []
for filename in os.listdir('../Test/'):
    color_me.append(img_to_array(load_img('../Test/'+filename)))
color_me = np.array(color_me, dtype=float)
color_me = 1.0/255*color_me
color_me = gray2rgb(rgb2gray(color_me))
color_me_embed = create_inception_embedding(color_me)
color_me = rgb2lab(color_me)[:,:,:,0]
color_me = color_me.reshape(color_me.shape+(1,))

# Test model
output = model.predict([color_me, color_me_embed])
output = output * 128

# Output colorizations (the result/ directory must already exist)
for i in range(len(output)):
    cur = np.zeros((256, 256, 3))
    cur[:,:,0] = color_me[i][:,:,0]
    cur[:,:,1:] = output[i]
    imsave("result/img_"+str(i)+".png", lab2rgb(cur))

The FloydHub command to run the final network:

floyd run --data emilwallner/datasets/colornet/2:data --mode jupyter --tensorboard

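Key takeaways

Run lots of experiments on small batches before scaling up. After 30 experiments, I was still finding plenty of mistakes. Code that runs is not the same as code that works. I hardly need to tell you how complex debugging a neural network is.
A more diverse dataset makes the colorization brownish.
Every image must have a fixed size, with a consistent aspect ratio.
Creating datasets: disable the .DS_Store file, it drove me crazy; be creative, I ended up using a Chrome console script and an extension to download the files; make a copy of the raw files and structure your clean-up scripts.
Study and learn from public projects. To get a rough idea of how to write the code, I went through 50-100 colorization projects on GitHub.
Things don't always work as expected. At first, my network could only produce red and yellow colors. I initially used a ReLU activation as the final activation; because it only maps numbers to non-negative values, it could not produce the negative values that represent the blue and green spectra. Switching to a tanh activation and mapping the Y values accordingly fixed the problem.
Ask people, and email them!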
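
The activation problem mentioned in the takeaways can be reproduced with a few numbers. This sketch compares a ReLU and a tanh final activation on made-up pre-activation values, then rescales them by 128 as the article's code does; the helper `to_ab` is hypothetical.

```python
import math


def relu(x):
    """ReLU: negative inputs become 0, positive inputs pass through."""
    return max(0.0, x)


def to_ab(activation_output, scale=128.0):
    """Map a final-layer output back to the Lab a/b range, as the article's code does."""
    return activation_output * scale


raw_outputs = [-2.0, -0.5, 0.0, 0.5, 2.0]  # hypothetical pre-activation values
relu_ab = [to_ab(relu(x)) for x in raw_outputs]
tanh_ab = [to_ab(math.tanh(x)) for x in raw_outputs]

print("relu:", [round(v, 1) for v in relu_ab])
print("tanh:", [round(v, 1) for v in tanh_ab])
```

With ReLU the a and b values can never go below zero, so only the red and yellow ends of the two axes are reachable; tanh covers both signs and stays inside the -128 to 128 range.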

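If you want to learn more, here is the Jupyter Notebook for the Alpha version of the colorization bot (https://www.floydhub.com/emilwallner/projects/color/43/code/Alpha-version/alpha_version.ipynb). You can also check out the three versions on FloydHub (https://www.floydhub.com/emilwallner/projects/color/43/code) and GitHub (https://github.com/emilwallner/Coloring-greyscale-images-in-Keras), as well as the code for all the experiments I ran on FloydHub's cloud GPUs (https://www.floydhub.com/emilwallner/projects/color/jobs).

Original article: https://blog.floydhub.com/colorizing-b&w-photos-with-neural-networks/

This article is part of the Tencent Cloud self-media synchronized exposure program and is shared from a WeChat official account.

Originally published: 2017-10-15. In case of infringement, contact cloudcommunity@tencent.com for removal.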
