The CIFAR-10 dataset

The CIFAR-10 classification problem is a common benchmark in machine learning. The dataset consists of 60,000 32×32 RGB color images in 10 classes, with 50,000 images in the training set and 10,000 in the test set. The task is to classify each 32×32-pixel RGB image into one of 10 categories: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. For more information, see the CIFAR-10 website and Alex Krizhevsky's tech report. Related benchmarks include CIFAR-100, which covers 100 object classes, and the ILSVRC competition, which covers 1,000.
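Concretely, each example is a 3-channel 32×32 image, i.e. 3 × 32 × 32 = 3072 values once flattened (the `datadim` used by the training code below). A minimal sketch of the dataset's dimensions and label set:

```python
import numpy as np

# The ten CIFAR-10 categories, indexed 0-9.
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]

# One CIFAR-10 image is 3 channels x 32 x 32 pixels.
channels, height, width = 3, 32, 32
datadim = channels * height * width  # flattened input dimension

# A dummy batch shaped like one mini-batch of 128 images.
batch = np.zeros((128, channels, height, width), dtype=np.float32)

print(len(CIFAR10_CLASSES))          # 10
print(datadim)                       # 3072
print(batch.reshape(128, -1).shape)  # (128, 3072)
```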

Network structure

Code implementation

1. Network structure: simple_cnn.py

```python
#coding:utf-8
'''
Created by huxiaoman 2017.11.27
simple_cnn.py: a simple, hand-designed CNN network structure
'''
import os
from PIL import Image
import numpy as np
import paddle.v2 as paddle

with_gpu = os.getenv('WITH_GPU', '0') == '1'

def simple_cnn(img):
    conv_pool_1 = paddle.networks.simple_img_conv_pool(
        input=img,
        filter_size=5,
        num_filters=20,
        num_channel=3,
        pool_size=2,
        pool_stride=2,
        act=paddle.activation.Relu())
    conv_pool_2 = paddle.networks.simple_img_conv_pool(
        input=conv_pool_1,
        filter_size=5,
        num_filters=50,
        num_channel=20,
        pool_size=2,
        pool_stride=2,
        act=paddle.activation.Relu())
    return conv_pool_2
```

2. Training script: train_simple_cnn.py

```python
#coding:utf-8
'''
Created by huxiaoman 2017.11.27
train_simple_cnn.py: train simple_cnn to classify the cifar10 dataset
'''
import sys, os
import numpy as np
from PIL import Image
import paddle.v2 as paddle
from simple_cnn import simple_cnn

with_gpu = os.getenv('WITH_GPU', '0') == '1'

def main():
    datadim = 3 * 32 * 32
    classdim = 10

    # PaddlePaddle init
    paddle.init(use_gpu=with_gpu, trainer_count=7)

    image = paddle.layer.data(
        name="image", type=paddle.data_type.dense_vector(datadim))

    # option 1. resnet
    # net = resnet_cifar10(image, depth=32)
    # option 2. simple_cnn
    net = simple_cnn(image)

    out = paddle.layer.fc(
        input=net, size=classdim, act=paddle.activation.Softmax())
    lbl = paddle.layer.data(
        name="label", type=paddle.data_type.integer_value(classdim))
    cost = paddle.layer.classification_cost(input=out, label=lbl)

    # Create parameters
    parameters = paddle.parameters.create(cost)

    # Create optimizer
    momentum_optimizer = paddle.optimizer.Momentum(
        momentum=0.9,
        learning_rate=0.1 / 128.0,
        learning_rate_decay_a=0.1,
        learning_rate_decay_b=50000 * 100,
        learning_rate_schedule='discexp')

    # End batch and end pass event handler
    def event_handler(event):
        if isinstance(event, paddle.event.EndIteration):
            if event.batch_id % 100 == 0:
                print "\nPass %d, Batch %d, Cost %f, %s" % (
                    event.pass_id, event.batch_id, event.cost, event.metrics)
            else:
                sys.stdout.write('.')
                sys.stdout.flush()
        if isinstance(event, paddle.event.EndPass):
            # save parameters
            with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
                parameters.to_tar(f)
            result = trainer.test(
                reader=paddle.batch(
                    paddle.dataset.cifar.test10(), batch_size=128),
                feeding={'image': 0,
                         'label': 1})
            print "\nTest with Pass %d, %s" % (event.pass_id, result.metrics)

    # Create trainer
    trainer = paddle.trainer.SGD(
        cost=cost, parameters=parameters, update_equation=momentum_optimizer)

    # Save the inference topology to protobuf.
    inference_topology = paddle.topology.Topology(layers=out)
    with open("inference_topology.pkl", 'wb') as f:
        inference_topology.serialize_for_inference(f)

    trainer.train(
        reader=paddle.batch(
            paddle.reader.shuffle(
                paddle.dataset.cifar.train10(), buf_size=50000),
            batch_size=128),
        num_passes=200,
        event_handler=event_handler,
        feeding={'image': 0,
                 'label': 1})

    # inference
    def load_image(file):
        im = Image.open(file)
        im = im.resize((32, 32), Image.ANTIALIAS)
        im = np.array(im).astype(np.float32)
        # The storage order of the loaded image is W(width),
        # H(height), C(channel). PaddlePaddle requires
        # the CHW order, so transpose them.
        im = im.transpose((2, 0, 1))  # CHW
        # In the training phase, the channel order of CIFAR
        # image is B(Blue), G(green), R(Red). But PIL opens
        # images in RGB mode, so it must swap the channel order.
        im = im[(2, 1, 0), :, :]  # BGR
        im = im.flatten()
        im = im / 255.0
        return im

    test_data = []
    cur_dir = os.path.dirname(os.path.realpath(__file__))
    test_data.append((load_image(cur_dir + '/image/dog.png'), ))

    # users can remove the comments and change the model name
    # with open('params_pass_50.tar', 'r') as f:
    #     parameters = paddle.parameters.Parameters.from_tar(f)

    probs = paddle.infer(
        output_layer=out, parameters=parameters, input=test_data)
    lab = np.argsort(-probs)  # probs and lab are the results of one batch data
    print "Label of image/dog.png is: %d" % lab[0][0]

if __name__ == '__main__':
    main()
```
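The image preprocessing in the inference code above does two reorderings that are easy to get wrong: PIL loads an image as H×W×C in RGB order, while the model expects a flattened C×H×W vector in BGR order. A small numpy sketch of just those two steps, on a dummy 2×2 image with a distinct value per channel so the swap is visible:

```python
import numpy as np

# A dummy 2x2 "image" in HWC layout, RGB channel order.
im = np.zeros((2, 2, 3), dtype=np.float32)
im[:, :, 0] = 1.0  # R
im[:, :, 1] = 2.0  # G
im[:, :, 2] = 3.0  # B

im = im.transpose((2, 0, 1))  # HWC -> CHW
im = im[(2, 1, 0), :, :]      # RGB -> BGR: channel 0 is now Blue
flat = im.flatten()           # C*H*W = 12 values, Blue plane first

print(im.shape)     # (3, 2, 2)
print(im[0, 0, 0])  # 3.0 -- the Blue plane comes first
print(flat[:4])     # [3. 3. 3. 3.]
```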

3. Output

```
I1128 21:44:30.218085 14733 Util.cpp:166] commandline: --use_gpu=True --trainer_count=7
[INFO 2017-11-28 21:44:35,874 layers.py:2539] output for __conv_pool_0___conv: c = 20, h = 28, w = 28, size = 15680
[INFO 2017-11-28 21:44:35,874 layers.py:2667] output for __conv_pool_0___pool: c = 20, h = 14, w = 14, size = 3920
[INFO 2017-11-28 21:44:35,875 layers.py:2539] output for __conv_pool_1___conv: c = 50, h = 10, w = 10, size = 5000
[INFO 2017-11-28 21:44:35,876 layers.py:2667] output for __conv_pool_1___pool: c = 50, h = 5, w = 5, size = 1250
I1128 21:44:35.928449 14733 GradientMachine.cpp:85] Initing parameters..
I1128 21:44:36.056259 14733 GradientMachine.cpp:92] Init parameters done.
Pass 0, Batch 0, Cost 2.302628, {'classification_error_evaluator': 0.9296875}
................................................................................
Pass 199, Batch 200, Cost 0.869726, {'classification_error_evaluator': 0.3671875}
...................................................................................................
Pass 199, Batch 300, Cost 0.801396, {'classification_error_evaluator': 0.3046875}
Label of image/dog.png is: 9
```
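The feature-map sizes in the log above follow directly from the usual convolution/pooling arithmetic (no padding and stride 1 for the convolutions, 2×2 non-overlapping pooling). A quick check in plain Python, assuming those settings:

```python
def conv_out(size, filter_size, stride=1, padding=0):
    # Output side length of a convolution.
    return (size + 2 * padding - filter_size) // stride + 1

def pool_out(size, pool_size, stride):
    # Output side length of a pooling layer.
    return (size - pool_size) // stride + 1

h = conv_out(32, 5)   # conv_pool_0 conv: 28
print(20 * h * h)     # 15680, matching the log
h = pool_out(h, 2, 2) # conv_pool_0 pool: 14
print(20 * h * h)     # 3920
h = conv_out(h, 5)    # conv_pool_1 conv: 10
print(50 * h * h)     # 5000
h = pool_out(h, 2, 2) # conv_pool_1 pool: 5
print(50 * h * h)     # 1250
```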

LeNet-5 network structure

The LeNet-5 architecture was proposed by Yann LeCun in "Gradient-based learning applied to document recognition", where it was validated on 32 × 32 MNIST handwritten-digit images. Let's look at the network structure.

LeNet-5 has 8 layers in total: 1 input layer + 3 convolutional layers (C1, C3, C5) + 2 subsampling layers (S2, S4) + 1 fully connected layer (F6) + 1 output layer; each layer has multiple feature maps (groups of automatically extracted features).

Input layer

CIFAR-10 dataset; each image is 32 × 32

C1 convolutional layer

6 feature maps, 5 × 5 kernels; feature map size 28 × 28

S2 subsampling layer (pooling)

6 feature maps of 14 × 14; 2 × 2 pooling

C3 convolutional layer

16 feature maps, 5 × 5 kernels; feature map size 10 × 10

S4 subsampling layer

16 feature maps of 5 × 5; 2 × 2 pooling

C5 convolutional layer

120 feature maps (the implementation below uses 1 × 1 kernels, so the maps stay 5 × 5)

F6 fully connected layer
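The layer sizes listed above can be verified with the same convolution/pooling arithmetic as before (5×5 convolutions without padding, 2×2 pooling with stride 2, and the 1×1 convolution used for C5 in lenet.py):

```python
def conv_out(size, filter_size, stride=1, padding=0):
    return (size + 2 * padding - filter_size) // stride + 1

def pool_out(size, pool_size, stride):
    return (size - pool_size) // stride + 1

h = conv_out(32, 5)   # C1: 6 feature maps of 28 x 28
assert h == 28
h = pool_out(h, 2, 2) # S2: 6 feature maps of 14 x 14
assert h == 14
h = conv_out(h, 5)    # C3: 16 feature maps of 10 x 10
assert h == 10
h = pool_out(h, 2, 2) # S4: 16 feature maps of 5 x 5
assert h == 5
h = conv_out(h, 1)    # C5 (1x1 conv, as in lenet.py): 120 maps of 5 x 5
print(120 * h * h)    # 3000, matching the training log below
```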

1. Network structure: lenet.py

```python
#coding:utf-8
'''
Created by huxiaoman 2017.11.27
lenet.py: LeNet-5
'''
import os
from PIL import Image
import numpy as np
import paddle.v2 as paddle

with_gpu = os.getenv('WITH_GPU', '0') == '1'

def lenet(img):
    # C1 + S2: 6 feature maps, 5x5 conv, 2x2 pooling
    conv_pool_1 = paddle.networks.simple_img_conv_pool(
        input=img,
        filter_size=5,
        num_filters=6,
        num_channel=3,
        pool_size=2,
        pool_stride=2,
        act=paddle.activation.Relu())
    # C3 + S4: 16 feature maps, 5x5 conv, 2x2 pooling
    conv_pool_2 = paddle.networks.simple_img_conv_pool(
        input=conv_pool_1,
        filter_size=5,
        num_filters=16,
        pool_size=2,
        pool_stride=2,
        act=paddle.activation.Relu())
    # C5: 120 feature maps from a 1x1 convolution
    conv_3 = paddle.layer.img_conv(
        input=conv_pool_2,
        filter_size=1,
        num_filters=120,
        stride=1)
    # F6: fully connected layer (84 units, as in the classic LeNet-5)
    fc = paddle.layer.fc(
        input=conv_3, size=84, act=paddle.activation.Tanh())
    return fc
```

2. Training script: train_lenet.py

```python
#coding:utf-8
'''
Created by huxiaoman 2017.11.27
train_lenet.py: train LeNet-5 to classify the cifar10 dataset
'''
import sys, os
import numpy as np
from PIL import Image
import paddle.v2 as paddle
from lenet import lenet

with_gpu = os.getenv('WITH_GPU', '0') == '1'

def main():
    datadim = 3 * 32 * 32
    classdim = 10

    # PaddlePaddle init
    paddle.init(use_gpu=with_gpu, trainer_count=7)

    image = paddle.layer.data(
        name="image", type=paddle.data_type.dense_vector(datadim))

    # option 1. resnet
    # net = resnet_cifar10(image, depth=32)
    # option 2. lenet
    net = lenet(image)

    out = paddle.layer.fc(
        input=net, size=classdim, act=paddle.activation.Softmax())
    lbl = paddle.layer.data(
        name="label", type=paddle.data_type.integer_value(classdim))
    cost = paddle.layer.classification_cost(input=out, label=lbl)

    # Create parameters
    parameters = paddle.parameters.create(cost)

    # Create optimizer
    momentum_optimizer = paddle.optimizer.Momentum(
        momentum=0.9,
        learning_rate=0.1 / 128.0,
        learning_rate_decay_a=0.1,
        learning_rate_decay_b=50000 * 100,
        learning_rate_schedule='discexp')

    # End batch and end pass event handler
    def event_handler(event):
        if isinstance(event, paddle.event.EndIteration):
            if event.batch_id % 100 == 0:
                print "\nPass %d, Batch %d, Cost %f, %s" % (
                    event.pass_id, event.batch_id, event.cost, event.metrics)
            else:
                sys.stdout.write('.')
                sys.stdout.flush()
        if isinstance(event, paddle.event.EndPass):
            # save parameters
            with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
                parameters.to_tar(f)
            result = trainer.test(
                reader=paddle.batch(
                    paddle.dataset.cifar.test10(), batch_size=128),
                feeding={'image': 0,
                         'label': 1})
            print "\nTest with Pass %d, %s" % (event.pass_id, result.metrics)

    # Create trainer
    trainer = paddle.trainer.SGD(
        cost=cost, parameters=parameters, update_equation=momentum_optimizer)

    # Save the inference topology to protobuf.
    inference_topology = paddle.topology.Topology(layers=out)
    with open("inference_topology.pkl", 'wb') as f:
        inference_topology.serialize_for_inference(f)

    trainer.train(
        reader=paddle.batch(
            paddle.reader.shuffle(
                paddle.dataset.cifar.train10(), buf_size=50000),
            batch_size=128),
        num_passes=200,
        event_handler=event_handler,
        feeding={'image': 0,
                 'label': 1})

    # inference
    def load_image(file):
        im = Image.open(file)
        im = im.resize((32, 32), Image.ANTIALIAS)
        im = np.array(im).astype(np.float32)
        # The storage order of the loaded image is W(width),
        # H(height), C(channel). PaddlePaddle requires
        # the CHW order, so transpose them.
        im = im.transpose((2, 0, 1))  # CHW
        # In the training phase, the channel order of CIFAR
        # image is B(Blue), G(green), R(Red). But PIL opens
        # images in RGB mode, so it must swap the channel order.
        im = im[(2, 1, 0), :, :]  # BGR
        im = im.flatten()
        im = im / 255.0
        return im

    test_data = []
    cur_dir = os.path.dirname(os.path.realpath(__file__))
    test_data.append((load_image(cur_dir + '/image/dog.png'), ))

    # users can remove the comments and change the model name
    # with open('params_pass_50.tar', 'r') as f:
    #     parameters = paddle.parameters.Parameters.from_tar(f)

    probs = paddle.infer(
        output_layer=out, parameters=parameters, input=test_data)
    lab = np.argsort(-probs)  # probs and lab are the results of one batch data
    print "Label of image/dog.png is: %d" % lab[0][0]

if __name__ == '__main__':
    main()
```
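The 'discexp' (discrete exponential) schedule used by the optimizer above decays the learning rate in steps: the base rate is multiplied by `learning_rate_decay_a` once per `learning_rate_decay_b` training samples processed. The sketch below uses the values from this script; the exact formula is my reading of PaddlePaddle v2's schedule, so treat it as an assumption:

```python
def discexp_lr(num_samples_processed,
               learning_rate=0.1 / 128.0,
               decay_a=0.1,
               decay_b=50000 * 100):
    # Discrete exponential decay: multiply the base rate by decay_a
    # once every decay_b samples processed (assumed formula).
    return learning_rate * decay_a ** (num_samples_processed // decay_b)

print(discexp_lr(0))            # base rate: 0.1 / 128 = 0.00078125
print(discexp_lr(50000 * 100))  # after one decay step (x0.1)
print(discexp_lr(50000 * 250))  # after two decay steps (x0.01)
```

With 50,000 training images per pass and num_passes=200, 10,000,000 samples are processed in total, so the rate decays exactly twice over the run.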

3. Output

```
I1129 14:52:44.314946 15153 Util.cpp:166] commandline: --use_gpu=True --trainer_count=7
[INFO 2017-11-29 14:52:50,490 layers.py:2539] output for __conv_pool_0___conv: c = 6, h = 28, w = 28, size = 4704
[INFO 2017-11-29 14:52:50,491 layers.py:2667] output for __conv_pool_0___pool: c = 6, h = 14, w = 14, size = 1176
[INFO 2017-11-29 14:52:50,491 layers.py:2539] output for __conv_pool_1___conv: c = 16, h = 10, w = 10, size = 1600
[INFO 2017-11-29 14:52:50,492 layers.py:2667] output for __conv_pool_1___pool: c = 16, h = 5, w = 5, size = 400
[INFO 2017-11-29 14:52:50,493 layers.py:2539] output for __conv_0__: c = 120, h = 5, w = 5, size = 3000
I1129 14:52:50.545882 15153 GradientMachine.cpp:85] Initing parameters..
I1129 14:52:50.651103 15153 GradientMachine.cpp:92] Init parameters done.
Pass 0, Batch 0, Cost 2.331898, {'classification_error_evaluator': 0.9609375}
......
Pass 199, Batch 300, Cost 0.004373, {'classification_error_evaluator': 0.0}
Label of image/dog.png is: 7
```

A TensorFlow implementation of LeNet-5

A TensorFlow version of LeNet-5 can be trained by following the steps in models/tutorials/image/cifar10/ (https://github.com/tensorflow/models/tree/master/tutorials/image/cifar10). Note that this code includes quite a lot of data processing, weight decay, and regularization to prevent overfitting. According to the official notes, with batch_size=128 it takes about 4 hours on a Tesla K40 to run 100k iterations and reaches about 86% accuracy; running the raw data through without that processing would likely not do as well. The distorted_inputs function in cifar10_input.py is worth studying for its idea of enlarging the dataset through preprocessing, as are the weight and bias decay settings in cifar10.py. At around 10k iterations so far, my cost is 0.98 and accuracy is 78.4%.
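The kind of augmentation distorted_inputs performs (random 24×24 crops of the 32×32 image plus random horizontal flips, among other distortions) can be sketched in plain numpy. The function below is a hypothetical simplification for illustration, not the TensorFlow code itself:

```python
import numpy as np

def distort(image, crop=24, rng=np.random):
    # image: an HWC array, e.g. shape (32, 32, 3).
    h, w, _ = image.shape
    # Random crop to crop x crop, as cifar10_input.py does (24 x 24).
    top = rng.randint(0, h - crop + 1)
    left = rng.randint(0, w - crop + 1)
    out = image[top:top + crop, left:left + crop, :]
    # Random horizontal flip.
    if rng.rand() < 0.5:
        out = out[:, ::-1, :]
    return out

img = np.arange(32 * 32 * 3, dtype=np.float32).reshape(32, 32, 3)
print(distort(img).shape)  # (24, 24, 3)
```

Each call yields a different sub-image of the original, which is what effectively enlarges the training set.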

1. LeNet-5 paper: "Gradient-based learning applied to document recognition"

2. CNN visualization: http://shixialiu.com/publications/cnnvis/demo/

Original link: http://kuaibao.qq.com/s/20180108G0JG6100?refer=cp_1026
