You Draw, I Guess

故事尾音

Published 2019-12-18 16:10:15

Included in the column: NLP算法工程师之路 (The Road of an NLP Algorithm Engineer)

Introduction

The Quick Draw dataset is a collection of 50 million drawings in 345 categories, all contributed by players of the game Quick, Draw!.

Resources

Dataset: https://console.cloud.google.com/storage/browser/quickdraw_dataset/full/?pli=1
Dataset website: https://quickdraw.withgoogle.com/data
Quick, Draw! online demo: https://quickdraw.withgoogle.com
AutoDraw online demo: https://www.autodraw.com
Related paper: https://arxiv.org/abs/1704.03477

Model compression

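Ever since AlexNet won the ILSVRC 2012 ImageNet image classification competition, convolutional neural networks (CNNs) have swept through the entire field of computer vision. CNN models quickly replaced traditional hand-crafted features and classifiers: they offer an end-to-end approach, have substantially raised accuracy across image benchmark tasks, and in some cases even surpass human accuracy (for example, face recognition on LFW). As CNN models push toward the accuracy limits of computer vision tasks, their depth and size have grown many times over. The core idea behind all model compression methods is to use as few parameters as possible while preserving accuracy.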

The table below compares the size and parameter count of several classic models:

Model	Model size (MB)	Parameters (millions)
AlexNet	>200	60
VGG16	>500	138
GoogLeNet	~50	6.8
Inception-v3	90~100	23.2
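As a rough sanity check on the table, model size can be estimated from the parameter count. The sketch below assumes every parameter is stored as a 32-bit float (4 bytes); that storage assumption and the helper name `estimate_size_mb` are illustrative and not taken from the article.

```python
# Rough estimate: size in MB = parameters x bytes per parameter / 10^6.
# Assumes float32 storage (4 bytes per parameter). Real model files also
# carry metadata, so these numbers only approximate the table above.

def estimate_size_mb(params_millions, bytes_per_param=4):
    """Return the approximate model size in megabytes (1 MB = 10^6 bytes)."""
    return params_millions * 1e6 * bytes_per_param / 1e6

# Parameter counts (in millions) taken from the table above.
models = {"AlexNet": 60, "VGG16": 138, "GoogLeNet": 6.8, "Inception-v3": 23.2}

for name, params in models.items():
    print(f"{name}: ~{estimate_size_mb(params):.0f} MB")
```

The estimates agree with the AlexNet, VGG16 and Inception-v3 rows. The GoogLeNet estimate comes out below the table's ~50 MB, so this is best treated as an order-of-magnitude check.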

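This leads to an awkward situation: models this large can only run on a limited set of platforms and cannot be ported to mobile devices or embedded chips. Even distributing them over the network is off-putting for many users because of the bandwidth required. Large models also pose serious challenges for device power consumption and running speed. As a result, such models are still some distance from practical use.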

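Under these circumstances, making models smaller and faster has become an urgent problem. Researchers proposed a number of CNN compression methods early on, including weight pruning and SVD-based matrix factorization, but their compression ratios and efficiency were far from satisfactory.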

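In recent years, algorithms for shrinking models can be roughly divided into two groups by how they compress: those that compress the numerical values of the weights and those that compress the network architecture. Viewed in terms of computation speed, they can also be divided into methods that only reduce size and methods that reduce size while also improving speed.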

Replacing fully connected layers with GAP

Global Average Pooling (GAP) first appeared in the paper Network in Network, and many later works have continued to use it. Experiments show that Global Average Pooling can indeed improve CNN performance.

Fully Connected layer

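For a long time, fully connected layers were a standard component of CNN classification networks. They are usually followed by an activation function that performs the classification. Assuming that activation is a multi-class softmax, the role of the fully connected layers is to stretch (flatten) the feature map produced by the last convolutional layer into a vector, multiply that vector by a weight matrix to reduce its dimensionality, and feed the result into the softmax layer to obtain a score for each class.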
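The flatten, multiply, softmax pipeline described above can be sketched in NumPy. The feature-map shape (3x3x64), the 100 classes and the random weights are assumptions chosen for illustration, not values fixed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical output of the last convolutional layer: 3x3 spatial, 64 channels.
feature_map = rng.standard_normal((3, 3, 64))
num_classes = 100

# 1) Stretch (flatten) the feature map into a vector of length 3*3*64 = 576.
vector = feature_map.reshape(-1)

# 2) Fully connected layer: one weight per (input, class) pair plus a bias.
weights = rng.standard_normal((vector.size, num_classes)) * 0.01
bias = np.zeros(num_classes)
logits = vector @ weights + bias

# 3) Softmax turns the logits into per-class scores that sum to 1.
exp = np.exp(logits - logits.max())  # subtract the max for numerical stability
scores = exp / exp.sum()

print(vector.shape, logits.shape, weights.size)
```

Even in this small case the weight matrix holds 576 x 100 = 57,600 values, which hints at why fully connected layers dominate the parameter count.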

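Fully connected layers are this central to the network, yet they carry so many parameters that they easily cause overfitting, which is why some techniques, such as dropout, exist specifically to counter it.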

Global Average Pooling

Fully connected layers reduce the dimensionality of the feature map before it reaches softmax, but they also cause overfitting. Could pooling be used in their place?

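The answer is yes. Network in Network replaced the final fully connected layers with GAP, which performs the dimensionality reduction directly and, more importantly, greatly reduces the number of network parameters (in a CNN, the largest share of parameters actually sits in the trailing fully connected layers). The structure of global average pooling is shown in the figure below: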

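The point of GAP is to regularize the whole network structurally and so prevent overfitting. How can we keep the parameter count low, avoiding the overfitting risk of fully connected layers, while still achieving the same transformation? Work directly on the feature map channels: if there are 1000 classes, the last convolutional layer outputs a feature map with exactly 1000 channels, and applying global pooling to that feature map yields a vector of length 1000. This removes the black-box behavior of the fully connected layers and gives each channel a concrete class meaning.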

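Experiments show that this approach is very effective. It has another benefit: the network no longer depends on the size of the input image. Note, however, that using GAP may also slow convergence.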
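A minimal NumPy sketch of global average pooling follows, assuming channels-last feature maps of shape (H, W, C). The function name `global_average_pooling` and the two spatial sizes are illustrative. It shows that the output length depends only on the number of channels, not on the spatial size.

```python
import numpy as np

def global_average_pooling(feature_map):
    """Average each channel over its spatial positions: (H, W, C) -> (C,)."""
    return feature_map.mean(axis=(0, 1))

rng = np.random.default_rng(0)
num_classes = 1000  # one channel per class, as in the text

# Two feature maps with different spatial sizes but the same channel count.
small = rng.standard_normal((7, 7, num_classes))
large = rng.standard_normal((13, 13, num_classes))

print(global_average_pooling(small).shape)
print(global_average_pooling(large).shape)
```

Both calls return a vector of length 1000, and the pooling step itself has no trainable parameters.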

SqueezeNet

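SqueezeNet is a compact network architecture proposed by F. N. Iandola, S. Han, et al. in the 2016 paper "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and < 0.5MB model size". It matches AlexNet-level accuracy with about 50x fewer parameters, and when combined with model compression techniques it is about 510x smaller than the original AlexNet (< 0.5 MB).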

SqueezeNet proposes three architectural design strategies:

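Strategy 1. Replace 3x3 filters with 1x1 filters. This is easy to understand: a 1x1 filter has 1/9 the parameters of a 3x3 filter, so in theory this change can shrink the model by a factor of 9.
Strategy 2. Reduce the number of input channels fed to the 3x3 filters. For a convolutional layer that uses 3x3 filters, the total number of parameters in the layer (ignoring biases) is: \begin{equation} P = N \times C \times 3 \times 3 \end{equation} where N is the number of filters, that is, the number of output channels, and C is the number of input channels. To reduce the parameter count, it is therefore not enough to reduce the number of 3x3 filters; the number of input channels to the 3x3 filters, C in the formula, must be reduced as well.
Strategy 3. Place downsampling as late in the network as possible. In a convolutional neural network, whether a layer's output feature map is downsampled is determined by the stride of the convolutional layer or by a pooling layer. An important observation is that higher-resolution feature maps (delayed downsampling) can yield higher classification accuracy. This is also intuitive, because a higher-resolution input carries more information.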

Here is an example. Suppose the input is 28×28×192 and the output feature map has 128 channels. Applying a 3×3 convolution directly takes 3×3×192×128 = 221184 parameters.

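If we first reduce to 96 channels with a 1×1 convolution and then expand to 128 channels with a 3×3 convolution, the parameter count is 1×1×192×96 + 3×3×96×128 = 129024, a reduction of about 42%. If that saving seems modest, reduce the 1×1 output to 48 channels instead: the count becomes 1×1×192×48 + 3×3×48×128 = 64512, half as many again.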
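The arithmetic in this example can be reproduced with a few lines of Python. The helper `conv_params` is an illustrative name; it implements the bias-free count P = N × C × k × k from Strategy 2.

```python
def conv_params(kernel, in_channels, out_channels):
    """Parameter count of a k x k convolution, ignoring biases."""
    return kernel * kernel * in_channels * out_channels

# Direct 3x3 convolution from 192 to 128 channels.
direct = conv_params(3, 192, 128)

# 1x1 reduction to 96 channels, then 3x3 expansion to 128 channels.
bottleneck_96 = conv_params(1, 192, 96) + conv_params(3, 96, 128)

# Same idea with a narrower 48-channel bottleneck.
bottleneck_48 = conv_params(1, 192, 48) + conv_params(3, 48, 128)

for name, count in [("direct 3x3", direct),
                    ("1x1 to 96, then 3x3", bottleneck_96),
                    ("1x1 to 48, then 3x3", bottleneck_48)]:
    print(f"{name}: {count} ({count / direct:.0%} of direct)")
```

The spatial size 28×28 does not appear in the calculation: it affects the amount of computation, not the number of parameters.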

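In short: first use a 1x1 convolution to reduce the number of channels, then use a 3x3 convolution to expand them again, and the parameter count can drop substantially.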

Modeling in practice

The game can be found here.

A simple CNN model

The model architecture is as follows (see build_model() in the code listing at the end of this article):

Training results: (parameters: 110,052; test accuracy: 92.92%; size: 401 KB)

[TensorBoard curves: loss and accuracy on the training and validation sets]
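The reported parameter count can be cross-checked by hand. The sketch below assumes the simple CNN is the `build_model()` function from the code listing at the end of this article, fed 28x28x1 inputs. It uses the standard counts for convolution and dense layers including biases; the helper names are illustrative.

```python
def conv2d_params(kernel, c_in, c_out):
    """Weights plus one bias per output channel."""
    return kernel * kernel * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out

# Three 2x2 max-poolings shrink 28 -> 14 -> 7 -> 3, so Flatten yields 3*3*64 values.
flat = 3 * 3 * 64

breakdown = {
    "conv 3x3, 1 -> 16": conv2d_params(3, 1, 16),
    "conv 3x3, 16 -> 32": conv2d_params(3, 16, 32),
    "conv 3x3, 32 -> 64": conv2d_params(3, 32, 64),
    "dense 576 -> 128": dense_params(flat, 128),
    "dense 128 -> 100": dense_params(128, 100),
}
total = sum(breakdown.values())

for name, count in breakdown.items():
    print(f"{name}: {count} ({count / total:.1%})")
print("total:", total)
```

The total comes to 110,052, matching the figure reported above, and the first dense layer alone accounts for about two thirds of it.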

1x1 before 3x3

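The model architecture is as follows (see one2three_model() in the code listing at the end of this article):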

Training results: (parameters: 106,236; test accuracy: 92.64%; size: 387 KB)

[TensorBoard curves: loss and accuracy on the training and validation sets]
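To see why the saving is small here, the sketch below compares the third convolution of the simple CNN with its 1x1-then-3x3 replacement. It assumes the layer sizes of `build_model()` and `one2three_model()` in the code listing at the end of this article; the helper name is illustrative.

```python
def conv2d_params(kernel, c_in, c_out):
    """Weights plus one bias per output channel."""
    return kernel * kernel * c_in * c_out + c_out

# Simple CNN: a single 3x3 convolution from 32 to 64 channels.
plain = conv2d_params(3, 32, 64)

# 1x1-then-3x3: squeeze 32 -> 24 channels, then expand 24 -> 64 with 3x3.
squeezed = conv2d_params(1, 32, 24) + conv2d_params(3, 24, 64)

saved = plain - squeezed
print(plain, squeezed, saved)
print(f"share of the 110,052-parameter model saved: {saved / 110052:.1%}")
```

The 3,816 parameters saved equal the gap between the two reported totals (110,052 and 106,236). Only one small layer is affected, while the dense layers that hold most of the parameters are unchanged.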

GAP

The model architecture is as follows (see gap_model() in the code listing at the end of this article):

Training results: (parameters: 29,796; test accuracy: 88.82% (90.88% after 20 epochs); size: 110 KB)

[TensorBoard curves: loss and accuracy on the training and validation sets]
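Nearly all of the reduction comes from the classifier head. A minimal sketch, assuming the `build_model()` and `gap_model()` definitions from the code listing at the end of this article (the helper name is illustrative):

```python
def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out

# Flatten head of the simple CNN: 3*3*64 = 576 values -> Dense(128) -> Dense(100).
flatten_head = dense_params(3 * 3 * 64, 128) + dense_params(128, 100)

# GAP head: one average per channel gives 64 values -> Dense(100).
gap_head = dense_params(64, 100)

# The three convolutional layers are identical in both models.
conv_layers = 160 + 4640 + 18496

print("flatten head:", flatten_head)
print("GAP head:", gap_head)
print("GAP model total:", conv_layers + gap_head)
```

The GAP model total of 29,796 matches the reported figure. Note that this model still keeps one dense layer after the pooling step rather than using one channel per class.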

Summary and comparison

Below is how each method performs after 10 epochs of training under the same settings. On the simple CNN:

On LeNet:

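GAP gives the highest compression ratio but is also the slowest to converge. The 1x1-before-3x3 approach (K1K3) compresses poorly, mainly because neither base model has many feature maps; with more than 100 convolutional layers, the effect might become much more pronounced.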
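The compression ratios behind this conclusion can be computed from the simple-CNN figures reported earlier in this article (parameter counts and saved model sizes). The dictionary layout below is only an illustration.

```python
# Parameter counts and file sizes reported for the simple CNN variants.
results = {
    "simple CNN": {"params": 110052, "size_kb": 401},
    "1x1 before 3x3": {"params": 106236, "size_kb": 387},
    "GAP": {"params": 29796, "size_kb": 110},
}

baseline = results["simple CNN"]
ratios = {}
for name, r in results.items():
    # Ratio > 1 means the variant is smaller than the baseline.
    param_ratio = baseline["params"] / r["params"]
    size_ratio = baseline["size_kb"] / r["size_kb"]
    ratios[name] = (param_ratio, size_ratio)
    print(f"{name}: {param_ratio:.2f}x fewer parameters, {size_ratio:.2f}x smaller file")
```

GAP shrinks the model by a factor of roughly 3.7, while the 1x1-before-3x3 variant saves under 4% of the parameters.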

Code

import os
import numpy as np
from tqdm import tqdm
import keras
from keras import layers
from joblib import dump, load
from keras.callbacks import TensorBoard

root = '/media/sunyan/文档/data'


def load_data(root, vfold_ratio=0.2, max_items_per_class=4000):
    # sort the file list so class indices are the same on every run
    all_files = sorted(os.listdir(root))
    files_paths = [os.path.join(root, i) for i in all_files]
    # initialize variables
    x = np.empty([0, 784])
    y = np.empty([0])
    class_names = []

    # load each data file
    for idx, file in enumerate(tqdm(files_paths)):
        data = np.load(file)
        data = data[0: max_items_per_class, :]
        labels = np.full(data.shape[0], idx)

        x = np.concatenate((x, data), axis=0)
        y = np.append(y, labels)

        class_name, ext = os.path.splitext(os.path.basename(file))
        class_names.append(class_name)

    # randomize the dataset
    permutation = np.random.permutation(y.shape[0])
    x = x[permutation, :]
    y = y[permutation]

    # separate into training and testing
    vfold_size = int(x.shape[0] / 100 * (vfold_ratio * 100))

    x_test = x[0:vfold_size, :]
    y_test = y[0:vfold_size]

    x_train = x[vfold_size:x.shape[0], :]
    y_train = y[vfold_size:y.shape[0]]
    return x_train, y_train, x_test, y_test, class_names


def build_model():
    # Define model
    model = keras.Sequential()
    model.add(layers.Convolution2D(16, (3, 3),
                                   padding='same',
                                   input_shape=x_train.shape[1:], activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Convolution2D(32, (3, 3), padding='same', activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Convolution2D(64, (3, 3), padding='same', activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation='relu'))
    model.add(layers.Dense(100, activation='softmax'))
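    # Compile model. Note: 'top_k_categorical_accuracy' is top-5 accuracy by default.
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['top_k_categorical_accuracy'])
    model.summary()  # summary() prints the table itself and returns None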

    return model


def gap_model():
    # Define model
    model = keras.Sequential()
    model.add(layers.Convolution2D(16, (3, 3),
                                   padding='same',
                                   input_shape=x_train.shape[1:], activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Convolution2D(32, (3, 3), padding='same', activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Convolution2D(64, (3, 3), padding='same', activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.GlobalAveragePooling2D())
    model.add(layers.Dense(100, activation='softmax'))
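    # Compile model. Note: 'top_k_categorical_accuracy' is top-5 accuracy by default.
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['top_k_categorical_accuracy'])
    model.summary()  # summary() prints the table itself and returns None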

    return model

def one2three_model():
    model = keras.Sequential()
    model.add(layers.Convolution2D(16, (3, 3),padding='same',
                                   input_shape=x_train.shape[1:], activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Convolution2D(32, (3, 3), padding='same', activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Convolution2D(24, (1, 1), padding='same', activation='relu'))
    model.add(layers.Convolution2D(64, (3, 3), padding='same', activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation='relu'))
    model.add(layers.Dense(100, activation='softmax'))
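    # Compile model. Note: 'top_k_categorical_accuracy' is top-5 accuracy by default.
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['top_k_categorical_accuracy'])
    model.summary()  # summary() prints the table itself and returns None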

    return model


def lenet():
    model = keras.Sequential()
    # Layer 1: Convolutional. Input = 28x28x1. Output = 28x28x6.
    model.add(keras.layers.Convolution2D(filters=6, kernel_size=(5, 5), strides=(1, 1),
                                         padding='same', input_shape=x_train.shape[1:], activation='relu'))
    # Pooling. Input = 28x28x6. Output = 14x14x6.
    model.add(keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2, padding='valid'))
    # Layer 2: Convolutional. Output = 10x10x16.
    model.add(keras.layers.Convolution2D(filters=16, kernel_size=(5, 5), strides=(1, 1),
                                         padding='valid', activation='relu'))
    # Pooling. Input = 10x10x16. Output = 5x5x16.
    model.add(keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2, padding='valid'))
    # Flatten. Input = 5x5x16. Output = 400.
    model.add(keras.layers.Flatten())
    # Layer 3: Fully Connected. Input = 400. Output = 300.
    model.add(keras.layers.Dense(300, activation='relu'))
    # Layer 4: Fully Connected. Input = 300. Output = 200.
    model.add(keras.layers.Dense(200, activation='relu'))
    # Layer 5: Fully Connected. Input = 200. Output = 100.
    model.add(keras.layers.Dense(100, activation='softmax'))
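    # Compile model. Note: 'top_k_categorical_accuracy' is top-5 accuracy by default.
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['top_k_categorical_accuracy'])
    model.summary()  # summary() prints the table itself and returns None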

    return model


def lenet_one2three():
    model = keras.Sequential()
    # Layer 1: Convolutional. Input = 28x28x1. Output = 28x28x6.
    model.add(keras.layers.Convolution2D(filters=6, kernel_size=(5, 5), strides=(1, 1),
                                         padding='same', input_shape=x_train.shape[1:], activation='relu'))
    # Pooling. Input = 28x28x6. Output = 14x14x6.
    model.add(keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2, padding='valid'))
    # 1x1 bottleneck: reduce 6 channels to 3. Output = 14x14x3.
    model.add(layers.Convolution2D(3, kernel_size=(1, 1), strides=(1, 1), padding='same', activation='relu'))
    # Layer 2: Convolutional. Output = 10x10x16.
    model.add(keras.layers.Convolution2D(filters=16, kernel_size=(5, 5), strides=(1, 1),
                                         padding='valid', activation='relu'))
    # Pooling. Input = 10x10x16. Output = 5x5x16.
    model.add(keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2, padding='valid'))
    # Flatten. Input = 5x5x16. Output = 400.
    model.add(keras.layers.Flatten())
    # Layer 3: Fully Connected. Input = 400. Output = 300.
    model.add(keras.layers.Dense(300, activation='relu'))
    # Layer 4: Fully Connected. Input = 300. Output = 200.
    model.add(keras.layers.Dense(200, activation='relu'))
    # Layer 5: Fully Connected. Input = 200. Output = 100.
    model.add(keras.layers.Dense(100, activation='softmax'))
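    # Compile model. Note: 'top_k_categorical_accuracy' is top-5 accuracy by default.
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['top_k_categorical_accuracy'])
    model.summary()  # summary() prints the table itself and returns None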
    return model

def lenet_gap():
    model = keras.Sequential()
    # Layer 1: Convolutional. Input = 28x28x1. Output = 28x28x6.
    model.add(keras.layers.Convolution2D(filters=6, kernel_size=(5, 5), strides=(1, 1),
                                         padding='same', input_shape=x_train.shape[1:], activation='relu'))
    # Pooling. Input = 28x28x6. Output = 14x14x6.
    model.add(keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2, padding='valid'))
    # Layer 2: Convolutional. Output = 10x10x16.
    model.add(keras.layers.Convolution2D(filters=16, kernel_size=(5, 5), strides=(1, 1),
                                         padding='valid', activation='relu'))
    # Pooling. Input = 10x10x16. Output = 5x5x16.
    model.add(keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2, padding='valid'))
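    # Global average pooling. Input = 5x5x16. Output = 16.
    model.add(layers.GlobalAveragePooling2D())
    # Fully Connected. Input = 16. Output = 64.
    model.add(keras.layers.Dense(64, activation='relu'))
    # Output layer: Fully Connected. Input = 64. Output = 100.
    model.add(keras.layers.Dense(100, activation='softmax'))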
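    # Compile model. Note: 'top_k_categorical_accuracy' is top-5 accuracy by default.
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['top_k_categorical_accuracy'])
    model.summary()  # summary() prints the table itself and returns None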
    return model

def pickle_load():
    with open('x_train.pkl', 'rb') as f:
        x_train= load(f)

    with open('y_train.pkl', 'rb') as f:
        y_train = load(f)

    with open('x_test.pkl', 'rb') as f:
        x_test = load(f)

    with open('y_test.pkl', 'rb') as f:
        y_test = load(f)

    with open('class_names.pkl', 'rb') as f:
        class_names = load(f)

    return x_train, y_train, x_test, y_test, class_names


if __name__ == '__main__':
    # x_train, y_train, x_test, y_test, class_names = load_data(root=root)

    # with open('x_train.pkl', 'wb') as f:
    #     dump(x_train, f)
    #
    # with open('y_train.pkl', 'wb') as f:
    #     dump(y_train, f)
    #
    # with open('x_test.pkl', 'wb') as f:
    #     dump(x_test, f)
    #
    # with open('y_test.pkl', 'wb') as f:
    #     dump(y_test, f)
    #
    # with open('class_names.pkl', 'wb') as f:
    #     dump(class_names, f)

    x_train, y_train, x_test, y_test, class_names = pickle_load()

    num_classes = len(class_names)
    image_size = 28
    print(len(x_train))

    # import matplotlib.pyplot as plt
    # from random import randint
    #
    # idx = randint(0, len(x_train) - 1)  # randint's upper bound is inclusive
    # plt.imshow(x_train[idx].reshape(28, 28))
    # print(class_names[int(y_train[idx].item())])

    # Reshape and normalize
    x_train = x_train.reshape(x_train.shape[0], image_size, image_size, 1).astype('float32')
    x_test = x_test.reshape(x_test.shape[0], image_size, image_size, 1).astype('float32')

    x_train /= 255.0
    x_test /= 255.0

    # Convert class vectors to class matrices
    y_train = keras.utils.to_categorical(y_train, num_classes)
    y_test = keras.utils.to_categorical(y_test, num_classes)

    model = lenet_one2three()

    model.fit(x=x_train, y=y_train, validation_split=0.2, batch_size=256, verbose=2, epochs=10,callbacks=[TensorBoard(log_dir='log')])

    score = model.evaluate(x_test, y_test, verbose=0)
    # score[1] is the compiled metric (top-k categorical accuracy, k=5 by default)
    print('Test accuracy: {:0.2f}%'.format(score[1] * 100))

    model.save('keras.h5')

    with open('class_names.txt', 'w') as file_handler:
        for item in class_names:
            file_handler.write("{}\n".format(item))

Originally published: 2018-11-01
