The Quick Draw dataset is a collection of 50 million drawings across 345 categories, all contributed by players of the Quick, Draw! game.
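Ever since AlexNet won the ILSVRC 2012 ImageNet image classification competition, convolutional neural networks (CNNs) have swept through the entire field of computer vision. CNN models quickly replaced traditional hand-crafted features and classifiers: they offer an end-to-end approach, have substantially raised accuracy across image benchmark tasks, and in some cases even exceed human accuracy (the LFW face recognition task). As CNN models keep approaching the accuracy limits of computer vision tasks, their depth and size have grown many times over. The core idea behind every model compression method is to use as few parameters as possible while preserving accuracy.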
The table below compares the size and parameter count of several classic models:
Model | Model size (MB) | Parameters (millions) |
---|---|---|
AlexNet | >200 | 60 |
VGG16 | >500 | 138 |
GoogLeNet | ~50 | 6.8 |
Inception-v3 | 90~100 | 23.2 |
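This leads to an awkward situation: models this large can only run on a limited set of platforms and cannot be ported to mobile devices or embedded chips at all. Even distributing them over the network is off-putting for many users because of the bandwidth required. Large models also pose serious challenges for device power consumption and inference speed. Such models are therefore still some distance from practical use.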
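To see roughly where the sizes in the table come from, the sketch below multiplies each parameter count by 4 bytes. It assumes weights are stored as 32-bit floats, which is an assumption rather than something the table states; on-disk sizes also depend on the file format and can differ noticeably from this estimate. The helper name `estimate_size_mb` is made up for illustration.

```python
# Assumption: every parameter is stored as a 32-bit float (4 bytes).
BYTES_PER_FLOAT32 = 4


def estimate_size_mb(params_millions, bytes_per_param=BYTES_PER_FLOAT32):
    """Approximate weight storage in MB (1 MB = 10**6 bytes)."""
    total_bytes = params_millions * 1e6 * bytes_per_param
    return total_bytes / 1e6


# Parameter counts (in millions) copied from the table above.
models = {
    "AlexNet": 60,
    "VGG16": 138,
    "GoogLeNet": 6.8,
    "Inception-v3": 23.2,
}

for name, params in models.items():
    print(f"{name}: ~{estimate_size_mb(params):.1f} MB of float32 weights")
```

The estimates for AlexNet, VGG16 and Inception-v3 land in the ranges the table gives, which suggests that the parameter count alone largely determines the model size.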
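Under these circumstances, shrinking and accelerating models has become a pressing problem. Researchers proposed a number of CNN compression methods early on, including weight pruning and SVD-based matrix decomposition, but their compression ratios and efficiency were still far from satisfactory.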
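In recent years, model miniaturization algorithms can be roughly divided into two classes by what they compress: those that compress the numerical values of the weights, and those that compress the network architecture. Viewed in terms of computation speed, they can also be divided into methods that only reduce size and methods that reduce size while also improving speed.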
Global Average Pooling (GAP) first appeared in the paper Network in Network, and many later works have continued to use it. Experiments show that Global Average Pooling can indeed improve CNN performance.
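For a long time, fully connected layers were the standard component of CNN classification networks. The fully connected layers are usually followed by an activation function that performs the classification. Assuming this activation is a multi-class softmax, the role of the fully connected layers is to stretch the feature map produced by the last convolutional layer into a vector, multiply that vector by a weight matrix to reduce its dimension, and feed the result into the softmax layer to obtain a score for each class.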
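The flatten, multiply, softmax pipeline described above can be sketched in NumPy. The sizes used here (a 7×7 feature map with 64 channels, 10 classes) and the random weights are purely illustrative assumptions, not values from the experiments in this post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen only for illustration.
height, width, channels, num_classes = 7, 7, 64, 10
feature_map = rng.standard_normal((height, width, channels))

# 1. Stretch the feature map into a single vector.
flat = feature_map.reshape(-1)  # shape: (7 * 7 * 64,) = (3136,)

# 2. Multiply by the fully connected weights to reduce the dimension.
weights = rng.standard_normal((flat.size, num_classes)) * 0.01
bias = np.zeros(num_classes)
logits = flat @ weights + bias  # shape: (10,)


# 3. Softmax turns the logits into per-class scores that sum to 1.
def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


scores = softmax(logits)

# Number of trainable values in this single fully connected layer.
fc_params = weights.size + bias.size
print(scores.shape, fc_params)
```

Even at this small scale, the single fully connected layer needs 3136 × 10 weights plus 10 biases, and that count grows with the spatial size of the feature map.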
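Fully connected layers are important, but they carry so many parameters that they easily cause overfitting, which is why methods such as dropout were developed specifically to counter it.
Fully connected layers reduce the dimensionality of the feature map so that it can be fed into softmax, but they also cause overfitting. Could pooling be used in place of the fully connected layers?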
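The answer is yes. Network in Network replaces the final fully connected layers with GAP, which performs the dimensionality reduction directly and, more importantly, greatly reduces the number of network parameters (in a CNN, the largest share of parameters actually sits in the fully connected layers at the end). The structure of global average pooling is shown in the figure below: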
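GAP acts as a structural regularizer over the whole network to prevent overfitting. How can we keep the parameter count low, avoiding the overfitting risk of fully connected layers, while still achieving the same transformation? Work directly on the channels of the feature map: if there are 1000 classes, the last convolutional layer outputs a feature map with exactly 1000 channels, and global pooling over that feature map yields a vector of length 1000. This removes the black-box behavior of the fully connected layers and gives each channel a concrete class meaning.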
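Experiments show that this approach is very effective. It has another benefit: the network no longer depends on the size of the input image. Note, however, that using GAP may also slow down convergence.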
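A minimal NumPy sketch of global average pooling, assuming a channels-last feature map of shape (H, W, C). The function name `global_average_pooling` and the two spatial sizes are illustrative choices. It shows that the output length depends only on the number of channels, which is why the input image size stops mattering.

```python
import numpy as np


def global_average_pooling(feature_map):
    """Average each channel over its spatial positions: (H, W, C) -> (C,)."""
    return feature_map.mean(axis=(0, 1))


rng = np.random.default_rng(0)
num_classes = 1000  # one output channel per class, as in the text

# Two feature maps with different spatial sizes but the same channel count.
small = rng.standard_normal((6, 6, num_classes))
large = rng.standard_normal((13, 13, num_classes))

pooled_small = global_average_pooling(small)
pooled_large = global_average_pooling(large)

# Both results are vectors of length 1000, ready to be fed into softmax.
print(pooled_small.shape, pooled_large.shape)
```

The pooling step itself has no trainable parameters, in contrast to a fully connected layer over the flattened feature map.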
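SqueezeNet is a compact network architecture proposed by F. N. Iandola, S. Han, et al. in the 2016 paper "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and < 0.5MB model size". It keeps AlexNet-level accuracy while, combined with model compression, shrinking the model to roughly 1/510 of the original AlexNet size (< 0.5 MB).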
SqueezeNet proposes three network design strategies:
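Here is an example. Suppose the input is 28×28×192 and the output feature map has 128 channels. Applying a 3×3 convolution directly takes 3×3×192×128 = 221184 parameters.
If a 1×1 convolution first reduces the input to 96 channels and a 3×3 convolution then expands it to 128, the parameter count is 1×1×192×96 + 3×3×96×128 = 129024, a reduction of about 42%. That saving is modest, but if the 1×1 output is reduced to 48 channels, the parameter count halves again, to 64512.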
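Experimental results:
In short: first use a 1×1 convolution to reduce the number of channels, then use a 3×3 convolution to expand them again, and the parameter count drops substantially.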
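The arithmetic in the example above can be checked with a few lines of Python. The helper `conv_weights` is a hypothetical name introduced here; like the text, it counts only the kernel weights and ignores biases.

```python
def conv_weights(kernel, in_channels, out_channels):
    """Weights in a kernel x kernel convolution (biases ignored, as in the text)."""
    return kernel * kernel * in_channels * out_channels


# 3x3 convolution applied directly: 192 -> 128 channels.
direct = conv_weights(3, 192, 128)

# 1x1 reduction to 96 channels, then 3x3 expansion to 128.
via_96 = conv_weights(1, 192, 96) + conv_weights(3, 96, 128)

# 1x1 reduction to 48 channels, then 3x3 expansion to 128.
via_48 = conv_weights(1, 192, 48) + conv_weights(3, 48, 128)

print(direct)  # 221184
print(via_96)  # 129024
print(via_48)  # 64512
print(f"saving with 96 channels: {1 - via_96 / direct:.1%}")
print(f"saving with 48 channels: {1 - via_48 / direct:.1%}")
```

The narrower the 1×1 bottleneck, the larger the saving, at the cost of squeezing the information passed to the 3×3 convolution through fewer channels.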
The game can be found here.
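The model architecture is as follows (the parameter count matches the baseline CNN, `build_model`, in the code below):
Training results (parameters: 110,052; test accuracy: 92.92%; size: 401 KB):
[TensorBoard plots: loss and accuracy curves for the training and validation sets]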
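The model architecture is as follows (the parameter count matches the 1×1 then 3×3 variant, `one2three_model`, in the code below):
Training results (parameters: 106,236; test accuracy: 92.64%; size: 387 KB):
[TensorBoard plots: loss and accuracy curves for the training and validation sets]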
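The model architecture is as follows (the parameter count matches the GAP variant, `gap_model`, in the code below):
Training results (parameters: 29,796; test accuracy: 88.82%, or 90.88% after 20 epochs; size: 110 KB):
[TensorBoard plots: loss and accuracy curves for the training and validation sets]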
The results of each method after 10 epochs of training, with otherwise identical settings, are shown below. On the simple CNN:
On LeNet:
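As can be seen, GAP achieves the highest compression ratio but also converges the slowest. The K1K3 approach (a 1×1 convolution followed by a 3×3 convolution) compresses poorly here, mainly because neither base model has enough feature maps; with layers carrying 100 or more feature maps the effect would probably be much more pronounced.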
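The parameter counts reported above for the three simple-CNN variants can be reproduced by hand. The sketch below assumes the layer sizes used in the Keras definitions in this post (28×28×1 input, three 2×2 max-poolings leaving a 3×3 spatial map, 100 output classes); the helper names `conv2d_params` and `dense_params` are made up for illustration.

```python
def conv2d_params(kernel, in_channels, out_channels):
    """Kernel weights plus one bias per output channel."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(in_units, out_units):
    """Weight matrix plus one bias per output unit."""
    return in_units * out_units + out_units


# Shared by all three variants: 3x3 convolutions with 16 and 32 filters.
stem = conv2d_params(3, 1, 16) + conv2d_params(3, 16, 32)

# 28 -> 14 -> 7 -> 3 after three 2x2 max-poolings, with 64 channels at the end.
flat_units = 3 * 3 * 64
fc_head = dense_params(flat_units, 128) + dense_params(128, 100)

baseline = stem + conv2d_params(3, 32, 64) + fc_head
k1k3 = stem + conv2d_params(1, 32, 24) + conv2d_params(3, 24, 64) + fc_head
gap = stem + conv2d_params(3, 32, 64) + dense_params(64, 100)

print(baseline)  # 110052
print(k1k3)      # 106236
print(gap)       # 29796
print(f"GAP compression: {baseline / gap:.2f}x")
print(f"K1K3 compression: {baseline / k1k3:.2f}x")
```

Most of the baseline's parameters sit in the first fully connected layer, which is why removing it with GAP shrinks the model far more than the 1×1 bottleneck does in these small networks.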
import os

import numpy as np
from tqdm import tqdm
import keras
from keras import layers
from keras.callbacks import TensorBoard
from joblib import dump, load

# Directory containing one .npy file per Quick Draw class
root = '/media/sunyan/文档/data'
def load_data(root, vfold_ratio=0.2, max_items_per_class=4000):
    all_files = os.listdir(root)
    files_paths = [os.path.join(root, i) for i in all_files]

    # initialize variables
    x = np.empty([0, 784])
    y = np.empty([0])
    class_names = []

    # load each data file; the file index becomes the class label
    for idx, file in enumerate(tqdm(files_paths)):
        data = np.load(file)
        data = data[0: max_items_per_class, :]
        labels = np.full(data.shape[0], idx)

        x = np.concatenate((x, data), axis=0)
        y = np.append(y, labels)

        class_name, ext = os.path.splitext(os.path.basename(file))
        class_names.append(class_name)

    # randomize the dataset
    permutation = np.random.permutation(y.shape[0])
    x = x[permutation, :]
    y = y[permutation]

    # separate into training and testing
    vfold_size = int(x.shape[0] / 100 * (vfold_ratio * 100))

    x_test = x[0:vfold_size, :]
    y_test = y[0:vfold_size]

    x_train = x[vfold_size:x.shape[0], :]
    y_train = y[vfold_size:y.shape[0]]
    return x_train, y_train, x_test, y_test, class_names
def build_model():
    # Baseline CNN.
    # Relies on the module-level x_train already reshaped to (N, 28, 28, 1).
    model = keras.Sequential()
    model.add(layers.Convolution2D(16, (3, 3),
                                   padding='same',
                                   input_shape=x_train.shape[1:], activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Convolution2D(32, (3, 3), padding='same', activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Convolution2D(64, (3, 3), padding='same', activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation='relu'))
    model.add(layers.Dense(100, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['top_k_categorical_accuracy'])
    model.summary()
    return model
def gap_model():
    # Baseline CNN with Flatten + Dense(128) replaced by global average pooling.
    # Relies on the module-level x_train already reshaped to (N, 28, 28, 1).
    model = keras.Sequential()
    model.add(layers.Convolution2D(16, (3, 3),
                                   padding='same',
                                   input_shape=x_train.shape[1:], activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Convolution2D(32, (3, 3), padding='same', activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Convolution2D(64, (3, 3), padding='same', activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.GlobalAveragePooling2D())
    model.add(layers.Dense(100, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['top_k_categorical_accuracy'])
    model.summary()
    return model
def one2three_model():
    # Baseline CNN with a 1x1 bottleneck in front of the last 3x3 convolution.
    # Relies on the module-level x_train already reshaped to (N, 28, 28, 1).
    model = keras.Sequential()
    model.add(layers.Convolution2D(16, (3, 3), padding='same',
                                   input_shape=x_train.shape[1:], activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Convolution2D(32, (3, 3), padding='same', activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    # 1x1 convolution reduces 32 channels to 24 before the 3x3 convolution
    model.add(layers.Convolution2D(24, (1, 1), padding='same', activation='relu'))
    model.add(layers.Convolution2D(64, (3, 3), padding='same', activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation='relu'))
    model.add(layers.Dense(100, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['top_k_categorical_accuracy'])
    model.summary()
    return model
def lenet():
    # Relies on the module-level x_train already reshaped to (N, 28, 28, 1).
    model = keras.Sequential()
    # Layer 1: Convolutional. Input = 28x28x1. Output = 28x28x6.
    model.add(keras.layers.Convolution2D(filters=6, kernel_size=(5, 5), strides=(1, 1),
                                         padding='same', input_shape=x_train.shape[1:], activation='relu'))
    # Pooling. Input = 28x28x6. Output = 14x14x6.
    model.add(keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2, padding='valid'))
    # Layer 2: Convolutional. Output = 10x10x16.
    model.add(keras.layers.Convolution2D(filters=16, kernel_size=(5, 5), strides=(1, 1),
                                         padding='valid', activation='relu'))
    # Pooling. Input = 10x10x16. Output = 5x5x16.
    model.add(keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2, padding='valid'))
    # Flatten. Input = 5x5x16. Output = 400.
    model.add(keras.layers.Flatten())
    # Layer 3: Fully Connected. Input = 400. Output = 300.
    model.add(keras.layers.Dense(300, activation='relu'))
    # Layer 4: Fully Connected. Input = 300. Output = 200.
    model.add(keras.layers.Dense(200, activation='relu'))
    # Layer 5: Fully Connected. Input = 200. Output = 100.
    model.add(keras.layers.Dense(100, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['top_k_categorical_accuracy'])
    model.summary()
    return model
def lenet_one2three():
    # Relies on the module-level x_train already reshaped to (N, 28, 28, 1).
    model = keras.Sequential()
    # Layer 1: Convolutional. Input = 28x28x1. Output = 28x28x6.
    model.add(keras.layers.Convolution2D(filters=6, kernel_size=(5, 5), strides=(1, 1),
                                         padding='same', input_shape=x_train.shape[1:], activation='relu'))
    # Pooling. Input = 28x28x6. Output = 14x14x6.
    model.add(keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2, padding='valid'))
    # 1x1 bottleneck. Input = 14x14x6. Output = 14x14x3.
    model.add(layers.Convolution2D(3, kernel_size=(1, 1), strides=(1, 1), padding='same', activation='relu'))
    # Layer 2: Convolutional. Input = 14x14x3. Output = 10x10x16.
    model.add(keras.layers.Convolution2D(filters=16, kernel_size=(5, 5), strides=(1, 1),
                                         padding='valid', activation='relu'))
    # Pooling. Input = 10x10x16. Output = 5x5x16.
    model.add(keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2, padding='valid'))
    # Flatten. Input = 5x5x16. Output = 400.
    model.add(keras.layers.Flatten())
    # Layer 3: Fully Connected. Input = 400. Output = 300.
    model.add(keras.layers.Dense(300, activation='relu'))
    # Layer 4: Fully Connected. Input = 300. Output = 200.
    model.add(keras.layers.Dense(200, activation='relu'))
    # Layer 5: Fully Connected. Input = 200. Output = 100.
    model.add(keras.layers.Dense(100, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['top_k_categorical_accuracy'])
    model.summary()
    return model
def lenet_gap():
    # Relies on the module-level x_train already reshaped to (N, 28, 28, 1).
    model = keras.Sequential()
    # Layer 1: Convolutional. Input = 28x28x1. Output = 28x28x6.
    model.add(keras.layers.Convolution2D(filters=6, kernel_size=(5, 5), strides=(1, 1),
                                         padding='same', input_shape=x_train.shape[1:], activation='relu'))
    # Pooling. Input = 28x28x6. Output = 14x14x6.
    model.add(keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2, padding='valid'))
    # Layer 2: Convolutional. Output = 10x10x16.
    model.add(keras.layers.Convolution2D(filters=16, kernel_size=(5, 5), strides=(1, 1),
                                         padding='valid', activation='relu'))
    # Pooling. Input = 10x10x16. Output = 5x5x16.
    model.add(keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2, padding='valid'))
    # Global average pooling. Input = 5x5x16. Output = 16.
    model.add(layers.GlobalAveragePooling2D())
    # Layer 3: Fully Connected. Input = 16. Output = 64.
    model.add(keras.layers.Dense(64, activation='relu'))
    # Layer 4: Fully Connected. Input = 64. Output = 100.
    model.add(keras.layers.Dense(100, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['top_k_categorical_accuracy'])
    model.summary()
    return model
def pickle_load():
    # Load the cached train/test splits written by the commented-out block in main
    with open('x_train.pkl', 'rb') as f:
        x_train = load(f)

    with open('y_train.pkl', 'rb') as f:
        y_train = load(f)

    with open('x_test.pkl', 'rb') as f:
        x_test = load(f)

    with open('y_test.pkl', 'rb') as f:
        y_test = load(f)

    with open('class_names.pkl', 'rb') as f:
        class_names = load(f)
    return x_train, y_train, x_test, y_test, class_names
if __name__ == '__main__':
    # One-time preprocessing: load the raw .npy files and cache the splits.
    # x_train, y_train, x_test, y_test, class_names = load_data(root=root)
    # with open('x_train.pkl', 'wb') as f:
    #     dump(x_train, f)
    #
    # with open('y_train.pkl', 'wb') as f:
    #     dump(y_train, f)
    #
    # with open('x_test.pkl', 'wb') as f:
    #     dump(x_test, f)
    #
    # with open('y_test.pkl', 'wb') as f:
    #     dump(y_test, f)
    #
    # with open('class_names.pkl', 'wb') as f:
    #     dump(class_names, f)

    x_train, y_train, x_test, y_test, class_names = pickle_load()
    num_classes = len(class_names)  # the models above hard-code 100 output units
    image_size = 28
    print(len(x_train))

    # Optional: display a random training sample.
    # import matplotlib.pyplot as plt
    # from random import randint
    #
    # idx = randint(0, len(x_train) - 1)  # randint includes both endpoints
    # plt.imshow(x_train[idx].reshape(28, 28))
    # print(class_names[int(y_train[idx].item())])

    # Reshape and normalize
    x_train = x_train.reshape(x_train.shape[0], image_size, image_size, 1).astype('float32')
    x_test = x_test.reshape(x_test.shape[0], image_size, image_size, 1).astype('float32')

    x_train /= 255.0
    x_test /= 255.0

    # Convert class vectors to class matrices
    y_train = keras.utils.to_categorical(y_train, num_classes)
    y_test = keras.utils.to_categorical(y_test, num_classes)

    # Pick the model variant to train
    model = lenet_one2three()
    model.fit(x=x_train, y=y_train, validation_split=0.2, batch_size=256, verbose=2, epochs=10,
              callbacks=[TensorBoard(log_dir='log')])
    score = model.evaluate(x_test, y_test, verbose=0)
    # score[1] is the compiled metric: top-k categorical accuracy (Keras default k=5)
    print('Test accuracy: {:0.2f}%'.format(score[1] * 100))
    model.save('keras.h5')

    with open('class_names.txt', 'w') as file_handler:
        for item in class_names:
            file_handler.write("{}\n".format(item))