本博客旨在给经典的GoogleNet网络进行详解与代码实现,如有不足或者其他的见解,请在本博客下面留言。
在本篇博客中,我们将实现一个类似于InceptionV2的结构,并用VOC2012的数据集进行网络的训练,验证,与测试。为了快速开发,本次我们把Keras作为代码的框架。
Pascal VOC为图像识别,检测与分割提供了一整套标准化的优秀的数据集,每一年都会举办一次图像识别竞赛。下面是VOC2012,训练集(包括验证集)的下载地址。
VOC2012里面有20类物体的图片,图片总共有1.7万张。我把数据集分成了3个部分,训练集,验证集,测试集,比例为8:1:1。 下面是部分截图:
接着我们使用keras代码来使用这个数据集,代码如下:
IM_WIDTH=224 #图片宽度
IM_HEIGHT=224 #图片高度
batch_size=32 #批的大小
#train data
train_datagen = ImageDataGenerator(
rotation_range=30,
width_shift_range=0.2,
height_shift_range=0.2,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True,
featurewise_center=True
)
train_generator = train_datagen.flow_from_directory(
train_root,
target_size=(IM_WIDTH, IM_HEIGHT),
batch_size=batch_size,
)
#vaild data
vaild_datagen = ImageDataGenerator(
rotation_range=30,
width_shift_range=0.2,
height_shift_range=0.2,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True,
featurewise_center=True
)
vaild_generator = train_datagen.flow_from_directory(
vaildation_root,
target_size=(IM_WIDTH, IM_HEIGHT),
batch_size=batch_size,
)
#test data
test_datagen = ImageDataGenerator(
rotation_range=30,
width_shift_range=0.2,
height_shift_range=0.2,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True,
featurewise_center=True
)
test_generator = train_datagen.flow_from_directory(
test_root,
target_size=(IM_WIDTH, IM_HEIGHT),
batch_size=batch_size,
)
我使用了3个ImageDataGenerator,分别来使用训练集,验证集与测试集的数据。使用ImageDataGenerator需要导入相应的模块,==from keras.preprocessing.image import ImageDataGenerator==。ImageDataGenrator可以用来做数据增强,提高模型的鲁棒性.它里面提供了许多变换,包括图片旋转,对称,平移等等操作。里面的flow_from_directory方法可以从相应的目录里面批量获取图片,这样就可以不用一次性读取所有图片(防止内存不足)。
要想理解Googlenet的结构,第一步必须先知道Inception的结构,因为它是由多个Inception的结构组合而成的。如下图Fig.2所示,(a)表示朴素的版本的inception v1示意图,(b)表示降维版本的Inception v1示意图。
Inception的主要思想基于——一个卷积网络里面的局部稀疏最优结构往往可以由简单可复用的密集组合来近似或者替代。就像(a)里面,1×1,3×3,5×5的卷积层,与3×3的池化层的组合一个inception。这样做的几点说明:
但是这么做,问题又来了,如果提高大卷积核的比例,那么这会意味着计算复杂度的飙升。为此,google的工程师们又提出(b)的这个Inception结构。
下面的Table.1给出了Googlenet是怎么由Inception模块和一些传统的卷积层与池化层构成的。比较Inception(3a)和Inception(5b),我们可以看到大卷积核的滤波器的个数的比例已经提高了。最后需要注意两点,该网络的使用了avg pool来替代第一层全连接层,大大降低了参数的个数。后面在avg pool后面加入全连接层则是为了方便微调的操作。
根据上面的表格我们可以画出这样的一幅图。
#-*- coding: UTF-8 -*-
from keras.layers import Conv2D, MaxPooling2D, AveragePooling2D, ZeroPadding2D
from keras.layers import Flatten, Dense, Dropout,BatchNormalization
from keras.layers import Input, concatenate
from keras.models import Model,load_model
from keras.preprocessing.image import ImageDataGenerator
from keras.utils import plot_model,np_utils
from keras import regularizers
import keras.metrics as metric
import os
# Global Constants
NB_CLASS=20
LEARNING_RATE=0.01
MOMENTUM=0.9
ALPHA=0.0001
BETA=0.75
GAMMA=0.1
DROPOUT=0.4
WEIGHT_DECAY=0.0005
LRN2D_NORM=True
DATA_FORMAT='channels_last' # Theano:'channels_first' Tensorflow:'channels_last'
USE_BN=True
IM_WIDTH=224
IM_HEIGHT=224
EPOCH=50
train_root='/home/faith/keras/dataset/traindata/'
vaildation_root='/home/faith/keras/dataset/vaildationdata/'
test_root='/home/faith/keras/dataset/testdata/'
IM_WIDTH=224
IM_HEIGHT=224
batch_size=32
#train data
train_datagen = ImageDataGenerator(
rotation_range=30,
width_shift_range=0.2,
height_shift_range=0.2,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True,
featurewise_center=True
)
train_generator = train_datagen.flow_from_directory(
train_root,
target_size=(IM_WIDTH, IM_HEIGHT),
batch_size=batch_size,
)
#vaild data
vaild_datagen = ImageDataGenerator(
rotation_range=30,
width_shift_range=0.2,
height_shift_range=0.2,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True,
featurewise_center=True
)
vaild_generator = train_datagen.flow_from_directory(
vaildation_root,
target_size=(IM_WIDTH, IM_HEIGHT),
batch_size=batch_size,
)
#test data
test_datagen = ImageDataGenerator(
rotation_range=30,
width_shift_range=0.2,
height_shift_range=0.2,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True,
featurewise_center=True
)
test_generator = train_datagen.flow_from_directory(
test_root,
target_size=(IM_WIDTH, IM_HEIGHT),
batch_size=batch_size,
)
#normalization
def conv2D_lrn2d(x,filters,kernel_size,strides=(1,1),padding='same',data_format=DATA_FORMAT,dilation_rate=(1,1),activation='relu',use_bias=True,kernel_initializer='glorot_uniform',bias_initializer='zeros',kernel_regularizer=None,bias_regularizer=None,activity_regularizer=None,kernel_constraint=None,bias_constraint=None,lrn2d_norm=LRN2D_NORM,weight_decay=WEIGHT_DECAY):
#l2 normalization
if weight_decay:
kernel_regularizer=regularizers.l2(weight_decay)
bias_regularizer=regularizers.l2(weight_decay)
else:
kernel_regularizer=None
bias_regularizer=None
x=Conv2D(filters=filters,kernel_size=kernel_size,strides=strides,padding=padding,data_format=data_format,dilation_rate=dilation_rate,activation=activation,use_bias=use_bias,kernel_initializer=kernel_initializer,bias_initializer=bias_initializer,kernel_regularizer=kernel_regularizer,bias_regularizer=bias_regularizer,activity_regularizer=activity_regularizer,kernel_constraint=kernel_constraint,bias_constraint=bias_constraint)(x)
if lrn2d_norm:
#batch normalization
x=BatchNormalization()(x)
return x
def inception_module(x,params,concat_axis,padding='same',data_format=DATA_FORMAT,dilation_rate=(1,1),activation='relu',use_bias=True,kernel_initializer='glorot_uniform',bias_initializer='zeros',kernel_regularizer=None,bias_regularizer=None,activity_regularizer=None,kernel_constraint=None,bias_constraint=None,lrn2d_norm=LRN2D_NORM,weight_decay=None):
(branch1,branch2,branch3,branch4)=params
if weight_decay:
kernel_regularizer=regularizers.l2(weight_decay)
bias_regularizer=regularizers.l2(weight_decay)
else:
kernel_regularizer=None
bias_regularizer=None
#1x1
pathway1=Conv2D(filters=branch1[0],kernel_size=(1,1),strides=1,padding=padding,data_format=data_format,dilation_rate=dilation_rate,activation=activation,use_bias=use_bias,kernel_initializer=kernel_initializer,bias_initializer=bias_initializer,kernel_regularizer=kernel_regularizer,bias_regularizer=bias_regularizer,activity_regularizer=activity_regularizer,kernel_constraint=kernel_constraint,bias_constraint=bias_constraint)(x)
#1x1->3x3
pathway2=Conv2D(filters=branch2[0],kernel_size=(1,1),strides=1,padding=padding,data_format=data_format,dilation_rate=dilation_rate,activation=activation,use_bias=use_bias,kernel_initializer=kernel_initializer,bias_initializer=bias_initializer,kernel_regularizer=kernel_regularizer,bias_regularizer=bias_regularizer,activity_regularizer=activity_regularizer,kernel_constraint=kernel_constraint,bias_constraint=bias_constraint)(x)
pathway2=Conv2D(filters=branch2[1],kernel_size=(3,3),strides=1,padding=padding,data_format=data_format,dilation_rate=dilation_rate,activation=activation,use_bias=use_bias,kernel_initializer=kernel_initializer,bias_initializer=bias_initializer,kernel_regularizer=kernel_regularizer,bias_regularizer=bias_regularizer,activity_regularizer=activity_regularizer,kernel_constraint=kernel_constraint,bias_constraint=bias_constraint)(pathway2)
#1x1->5x5
pathway3=Conv2D(filters=branch3[0],kernel_size=(1,1),strides=1,padding=padding,data_format=data_format,dilation_rate=dilation_rate,activation=activation,use_bias=use_bias,kernel_initializer=kernel_initializer,bias_initializer=bias_initializer,kernel_regularizer=kernel_regularizer,bias_regularizer=bias_regularizer,activity_regularizer=activity_regularizer,kernel_constraint=kernel_constraint,bias_constraint=bias_constraint)(x)
pathway3=Conv2D(filters=branch3[1],kernel_size=(5,5),strides=1,padding=padding,data_format=data_format,dilation_rate=dilation_rate,activation=activation,use_bias=use_bias,kernel_initializer=kernel_initializer,bias_initializer=bias_initializer,kernel_regularizer=kernel_regularizer,bias_regularizer=bias_regularizer,activity_regularizer=activity_regularizer,kernel_constraint=kernel_constraint,bias_constraint=bias_constraint)(pathway3)
#3x3->1x1
pathway4=MaxPooling2D(pool_size=(3,3),strides=1,padding=padding,data_format=DATA_FORMAT)(x)
pathway4=Conv2D(filters=branch4[0],kernel_size=(1,1),strides=1,padding=padding,data_format=data_format,dilation_rate=dilation_rate,activation=activation,use_bias=use_bias,kernel_initializer=kernel_initializer,bias_initializer=bias_initializer,kernel_regularizer=kernel_regularizer,bias_regularizer=bias_regularizer,activity_regularizer=activity_regularizer,kernel_constraint=kernel_constraint,bias_constraint=bias_constraint)(pathway4)
return concatenate([pathway1,pathway2,pathway3,pathway4],axis=concat_axis)
def create_model():
#Data format:tensorflow,channels_last;theano,channels_last
if DATA_FORMAT=='channels_first':
INP_SHAPE=(3,224,224)
img_input=Input(shape=INP_SHAPE)
CONCAT_AXIS=1
elif DATA_FORMAT=='channels_last':
INP_SHAPE=(224,224,3)
img_input=Input(shape=INP_SHAPE)
CONCAT_AXIS=3
else:
raise Exception('Invalid Dim Ordering')
x=conv2D_lrn2d(img_input,64,(7,7),2,padding='same',lrn2d_norm=False)
x=MaxPooling2D(pool_size=(3,3),strides=2,padding='same',data_format=DATA_FORMAT)(x)
x=BatchNormalization()(x)
x=conv2D_lrn2d(x,64,(1,1),1,padding='same',lrn2d_norm=False)
x=conv2D_lrn2d(x,192,(3,3),1,padding='same',lrn2d_norm=True)
x=MaxPooling2D(pool_size=(3,3),strides=2,padding='same',data_format=DATA_FORMAT)(x)
x=inception_module(x,params=[(64,),(96,128),(16,32),(32,)],concat_axis=CONCAT_AXIS) #3a
x=inception_module(x,params=[(128,),(128,192),(32,96),(64,)],concat_axis=CONCAT_AXIS) #3b
x=MaxPooling2D(pool_size=(3,3),strides=2,padding='same',data_format=DATA_FORMAT)(x)
x=inception_module(x,params=[(192,),(96,208),(16,48),(64,)],concat_axis=CONCAT_AXIS) #4a
x=inception_module(x,params=[(160,),(112,224),(24,64),(64,)],concat_axis=CONCAT_AXIS) #4b
x=inception_module(x,params=[(128,),(128,256),(24,64),(64,)],concat_axis=CONCAT_AXIS) #4c
x=inception_module(x,params=[(112,),(144,288),(32,64),(64,)],concat_axis=CONCAT_AXIS) #4d
x=inception_module(x,params=[(256,),(160,320),(32,128),(128,)],concat_axis=CONCAT_AXIS) #4e
x=MaxPooling2D(pool_size=(3,3),strides=2,padding='same',data_format=DATA_FORMAT)(x)
x=inception_module(x,params=[(256,),(160,320),(32,128),(128,)],concat_axis=CONCAT_AXIS) #5a
x=inception_module(x,params=[(384,),(192,384),(48,128),(128,)],concat_axis=CONCAT_AXIS) #5b
x=AveragePooling2D(pool_size=(7,7),strides=1,padding='valid',data_format=DATA_FORMAT)(x)
x=Flatten()(x)
x=Dropout(DROPOUT)(x)
x=Dense(output_dim=NB_CLASS,activation='linear')(x)
x=Dense(output_dim=NB_CLASS,activation='softmax')(x)
return x,img_input,CONCAT_AXIS,INP_SHAPE,DATA_FORMAT
def check_print():
# Create the Model
x,img_input,CONCAT_AXIS,INP_SHAPE,DATA_FORMAT=create_model()
# Create a Keras Model
model=Model(input=img_input,output=[x])
model.summary()
# Save a PNG of the Model Build
plot_model(model,to_file='GoogLeNet.png')
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc',metric.top_k_categorical_accuracy])
print 'Model Compiled'
return model
if __name__=='__main__':
if os.path.exists('inception_1.h5'):
model=load_model('inception_1.h5')
else:
model=check_print()
model.fit_generator(train_generator,validation_data=vaild_generator,epochs=EPOCH,steps_per_epoch=train_generator.n/batch_size
,validation_steps=vaild_generator.n/batch_size)
model.save('inception_1.h5')
model.metrics=['acc',metric.top_k_categorical_accuracy]
loss,acc,top_acc=model.evaluate_generator(test_generator,steps=test_generator.n/batch_size)
print 'Test result:loss:%f,acc:%f,top_acc:%f'%(loss,acc,top_acc)
dataset | loss | acc | top5-acc |
---|---|---|---|
Training set | 1.85 | 39.9% | 85.3% |
Vaildation set | 2.01 | 36.6% | 82.0% |
Testing set | 2.08 | 35.7% | 78.1% |
我们可以发现模型最后在测试集上的效果与训练集上的效果有一定程度上的差距,模型出现了一点过拟合。以下是个人对此问题的一些分析:由于采用了L2正则化,BatchNormalization,Dropout等等方法,但是还是出现了过拟合,可能是由于VOC数据比较少,20类物体才1.7万张,相当于每个物体850张左右,要想取得比较好的效果,每个类别物体的图片至少得几千张。所以可以通过降低网络层数,增加数据量,增加训练次数等手段来提高网络的性能。
以下是本博客的引用,再次本人对每个引用的作者表示感谢。读者如果对googlenet这个网络仍然存在一些疑虑,或者想要有更深的理解,可以参考以下的引用。
发布者:全栈程序员栈长,转载请注明出处:https://javaforall.cn/170450.html原文链接:https://javaforall.cn