
[Kaggle] Digit Recognizer: Handwritten Digit Recognition (Convolutional Neural Network)

Author: Michael阿明 · Published 2021-02-19


Digit Recognizer competition page

Related posts:

[Hands On ML] 3. Classification (MNIST handwritten digit prediction)

[Kaggle] Digit Recognizer: Handwritten Digit Recognition

[Kaggle] Digit Recognizer: Handwritten Digit Recognition (Simple Neural Network)

04. Convolutional Neural Networks, W1: Convolutional Neural Networks

The simple neural network in the previous post flattened each 28*28 image into a vector, so the spatial relationship between pixels was ignored and that spatial information was lost.
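
As a minimal illustration (a dummy array of my own, not part of the original code), flattening places vertically adjacent pixels 28 positions apart, while the (28, 28, 1) shape fed to a Conv2D layer keeps the 2D neighbourhood intact:

import numpy as np

img = np.arange(28 * 28).reshape(28, 28)   # a dummy 28x28 "image"
flat = img.reshape(-1)                     # shape (784,): the pixel above/below is now 28 steps away
grid = img.reshape(28, 28, 1)              # shape (28, 28, 1): Conv2D input, 2D neighbourhood preserved
print(flat.shape, grid.shape)              # (784,) (28, 28, 1)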

1. Predicting with LeNet

Reference post on the LeNet neural network

1.1 Import packages

from keras import backend as K # backend abstraction layer, works across different Keras backends
from keras.models import Sequential
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.layers.core import Activation
from keras.layers.core import Dense
from keras.layers.core import Flatten
from keras.utils import np_utils
from keras.optimizers import SGD, Adam, RMSprop

import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd

1.2 Build the LeNet model

# Image data format:
# K.image_data_format() == 'channels_last'  -> (height, width, channels), the default for the TensorFlow backend
# K.image_data_format() == 'channels_first' -> (channels, height, width), the Theano-style ordering

class LeNet:
    @staticmethod
    def build(input_shape, classes):
        model = Sequential()
        # conv block 1: 20 filters of size 5x5, 'same' padding keeps 28x28
        model.add(Conv2D(20, kernel_size=5, padding='same',
                         input_shape=input_shape, activation='relu'))
        model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))  # 28x28 -> 14x14

        # conv block 2: 50 filters of size 5x5
        model.add(Conv2D(50, kernel_size=5, padding='same', activation='relu'))
        model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))  # 14x14 -> 7x7

        model.add(Flatten())                      # 7*7*50 = 2450 features
        model.add(Dense(500, activation='relu'))  # fully connected hidden layer

        model.add(Dense(classes, activation='softmax'))  # one probability per class
        return model
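
A small sketch (my addition, not in the original post) of how the data-format comment above could drive the choice of input shape when building the model; the rest of the post assumes the default 'channels_last' format:

if K.image_data_format() == 'channels_first':
    lenet_input_shape = (1, 28, 28)   # (channels, height, width)
else:  # 'channels_last', the TensorFlow default
    lenet_input_shape = (28, 28, 1)   # (height, width, channels)
model = LeNet.build(input_shape=lenet_input_shape, classes=10)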

1.3 Load the data

train = pd.read_csv('train.csv')
y_train_full = train['label']
X_train_full = train.drop(['label'], axis=1)
X_test_full = pd.read_csv('test.csv')
X_train_full.shape

Output:

(42000, 784)
  • Convert the data format and add a channel dimension
X_train = np.array(X_train_full).reshape(-1,28,28) / 255.0
X_test = np.array(X_test_full).reshape(-1,28,28) / 255.0
y_train = np_utils.to_categorical(y_train_full, 10) # convert labels to one-hot encoding

X_train = X_train[:, :, :, np.newaxis]
# (m, 28, 28) -> (m, 28, 28, 1), single channel
X_test = X_test[:, :, :, np.newaxis]

1.4 Define the model

model = LeNet.build(input_shape=(28, 28, 1), classes=10)
  • Define the optimizer and compile the model
opt = Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, decay=0.01)
model.compile(loss="categorical_crossentropy",
              optimizer=opt, metrics=["accuracy"])

Note: if the labels are not one-hot encoded, use loss="sparse_categorical_crossentropy" here instead.
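
For example, a hedged sketch (my addition, not run in the original post) that keeps the labels as integers 0-9 and trains a separate model with the sparse loss, so the one-hot pipeline above is left untouched:

model_sparse = LeNet.build(input_shape=(28, 28, 1), classes=10)
model_sparse.compile(loss="sparse_categorical_crossentropy",
                     optimizer=Adam(learning_rate=0.001), metrics=["accuracy"])
y_train_int = np.array(y_train_full)  # shape (42000,), integer class ids 0-9, no one-hot encoding
# same fit call as below, just with the integer labels:
# model_sparse.fit(X_train, y_train_int, epochs=20, batch_size=128, validation_split=0.2)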

1.5 Training

history = model.fit(X_train, y_train, epochs=20, batch_size=128,
                    validation_split=0.2)
Output:
Epoch 1/20
263/263 [==============================] - 26s 98ms/step - 
loss: 0.2554 - accuracy: 0.9235 - 
val_loss: 0.0983 - val_accuracy: 0.9699
Epoch 2/20
263/263 [==============================] - 27s 103ms/step - 
loss: 0.0806 - accuracy: 0.9761 - 
val_loss: 0.0664 - val_accuracy: 0.9787
...
...
Epoch 20/20
263/263 [==============================] - 25s 97ms/step - 
loss: 0.0182 - accuracy: 0.9953 - 
val_loss: 0.0405 - val_accuracy: 0.9868

By the end of the second epoch the training accuracy is already 97.6%, much better than the previous simple neural network.

  • Model summary
model.summary()
Output:
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 28, 28, 20)        520       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 20)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 14, 14, 50)        25050     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 7, 7, 50)          0         
_________________________________________________________________
flatten (Flatten)            (None, 2450)              0         
_________________________________________________________________
dense (Dense)                (None, 500)               1225500   
_________________________________________________________________
dense_1 (Dense)              (None, 10)                5010      
=================================================================
Total params: 1,256,080
Trainable params: 1,256,080
Non-trainable params: 0
_________________________________________________________________
  • Plot the model architecture
from keras.utils import plot_model
plot_model(model, './model.png', show_shapes=True)

1.6 Plot the training curves

pd.DataFrame(history.history).plot(figsize=(8, 5))
plt.grid(True)
plt.gca().set_ylim(0, 1) # set the vertical range to [0-1]
plt.show()

1.7 Predict and submit

y_pred = model.predict(X_test)
pred = y_pred.argmax(axis=1).reshape(-1)
print(pred.shape)

image_id = pd.Series(range(1,len(pred)+1))
output = pd.DataFrame({'ImageId':image_id, 'Label':pred})
output.to_csv("submission_NN.csv",  index=False)

The LeNet model scores 0.98607, an improvement of about 1.06 percentage points over the simple NN model from the previous post (0.97546).

2. Transfer Learning with VGG16

VGG16 help documentation:

Help on function VGG16 in module tensorflow.python.keras.applications.vgg16:

VGG16(include_top=True, weights='imagenet', input_tensor=None, input_shape=None, pooling=None, classes=1000, classifier_activation='softmax')
    Instantiates the VGG16 model.
    
    Reference paper:
    - [Very Deep Convolutional Networks for Large-Scale Image Recognition](
    https://arxiv.org/abs/1409.1556) (ICLR 2015)
    
    By default, it loads weights pre-trained on ImageNet. Check 'weights' for
    other options.
    
    This model can be built both with 'channels_first' data format
    (channels, height, width) or 'channels_last' data format
    (height, width, channels).
    
    The default input size for this model is 224x224.
    
    Caution: Be sure to properly pre-process your inputs to the application.
    Please see `applications.vgg16.preprocess_input` for an example.
    
    Arguments:
        include_top: whether to include the 3 fully-connected
            layers at the top of the network.
        weights: one of `None` (random initialization),
              'imagenet' (pre-training on ImageNet),
              or the path to the weights file to be loaded.
        input_tensor: optional Keras tensor
            (i.e. output of `layers.Input()`)
            to use as image input for the model.
        input_shape: optional shape tuple, only to be specified
            if `include_top` is False (otherwise the input shape
            has to be `(224, 224, 3)`
            (with `channels_last` data format)
            or `(3, 224, 224)` (with `channels_first` data format).
            It should have exactly 3 input channels,
            and width and height should be no smaller than 32.
            E.g. `(200, 200, 3)` would be one valid value.
        pooling: Optional pooling mode for feature extraction
            when `include_top` is `False`.
            - `None` means that the output of the model will be
                the 4D tensor output of the
                last convolutional block.
            - `avg` means that global average pooling
                will be applied to the output of the
                last convolutional block, and thus
                the output of the model will be a 2D tensor.
            - `max` means that global max pooling will
                be applied.
        classes: optional number of classes to classify images
            into, only to be specified if `include_top` is True, and
            if no `weights` argument is specified.
        classifier_activation: A `str` or callable. The activation function to use
            on the "top" layer. Ignored unless `include_top=True`. Set
            `classifier_activation=None` to return the logits of the "top" layer.
    
    Returns:
      A `keras.Model` instance.
    
    Raises:
      ValueError: in case of invalid argument for `weights`,
        or invalid input shape.
      ValueError: if `classifier_activation` is not `softmax` or `None` when
        using a pretrained top layer.

2.1 Import packages

import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import cv2
from keras.optimizers import Adam
from keras.models import Model
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers import Flatten
from keras.layers import Dense
from keras.layers import Input
from keras.layers import Dropout
from keras.applications.vgg16 import VGG16

2.2 Define the model

vgg16 = VGG16(weights='imagenet',include_top=False,
              input_shape=(32, 32, 3))
# With include_top=False, VGG16 accepts a custom input size: at least 32x32, and exactly 3 channels

mylayer = vgg16.output
mylayer = Flatten()(mylayer)
mylayer = Dense(128, activation='relu')(mylayer)
mylayer = Dropout(0.3)(mylayer)
mylayer = Dense(10, activation='softmax')(mylayer)

model = Model(inputs=vgg16.inputs, outputs=mylayer)

for layer in vgg16.layers:
    layer.trainable = False # freeze all VGG16 layers; only the new top layers are trained

2.3 Data processing

train = pd.read_csv('train.csv')
y_train_full = train['label']
X_train_full = train.drop(['label'], axis=1)
X_test_full = pd.read_csv('test.csv')
  • Duplicate the single-channel data into 3 channels (VGG16 requires 3-channel input) and resize it to 32*32, the minimum resolution VGG16 accepts, as implemented below
def process(data):
    data = np.array(data).reshape(-1,28,28)
    output = np.zeros((data.shape[0], 32, 32, 3))
    for i in range(data.shape[0]):
        img = data[i]
        # duplicate the grayscale image into 3 identical channels
        rgb_array = np.zeros((img.shape[0], img.shape[1], 3), "uint8")
        rgb_array[:, :, 0], rgb_array[:, :, 1], rgb_array[:, :, 2] = img, img, img
        # upscale 28x28 -> 32x32, the minimum input size VGG16 accepts
        pic = cv2.resize(rgb_array, (32, 32), interpolation=cv2.INTER_LINEAR)
        output[i] = pic
    output = output.astype('float32')/255.0  # scale pixel values to [0, 1]
    return output
y_train = np_utils.to_categorical(y_train_full, 10)
X_train = process(X_train_full)
X_test = process(X_test_full)

print(X_train.shape)
print(X_test.shape)

Output:

(42000, 32, 32, 3)
(28000, 32, 32, 3)
  • Take a look at a processed image
img = X_train[0]
plt.imshow(img)
np.set_printoptions(threshold=np.inf) # print full arrays without truncation
# print(X_train[0])

2.4 Compile and train the model

opt = Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, decay=0.01)
model.compile(loss="categorical_crossentropy",
              optimizer=opt, metrics=["accuracy"])
history = model.fit(X_train, y_train, epochs=50, batch_size=128,
                    validation_split=0.2)

Output:

Epoch 1/50
263/263 [==============================] - 101s 384ms/step - 
loss: 0.9543 - accuracy: 0.7212 - 
val_loss: 0.5429 - val_accuracy: 0.8601
...
Epoch 10/50
263/263 [==============================] - 110s 417ms/step - 
loss: 0.3284 - accuracy: 0.9063 - 
val_loss: 0.2698 - val_accuracy: 0.9263
...
Epoch 40/50
263/263 [==============================] - 114s 433ms/step - 
loss: 0.2556 - accuracy: 0.9254 - 
val_loss: 0.2121 - val_accuracy: 0.9389
...
Epoch 50/50
263/263 [==============================] - 110s 420ms/step - 
loss: 0.2466 - accuracy: 0.9272 - 
val_loss: 0.2058 - val_accuracy: 0.9406
  • Model summary
model.summary()

Output:

Model: "functional_15"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_23 (InputLayer)        [(None, 32, 32, 3)]       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 32, 32, 64)        1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 32, 32, 64)        36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 16, 16, 64)        0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 16, 16, 128)       73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 16, 16, 128)       147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 8, 8, 128)         0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 8, 8, 256)         295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 8, 8, 256)         590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 8, 8, 256)         590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 4, 4, 256)         0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 4, 4, 512)         1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 4, 4, 512)         2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 4, 4, 512)         2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 2, 2, 512)         0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 2, 2, 512)         2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 2, 2, 512)         2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 2, 2, 512)         2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 1, 1, 512)         0         
_________________________________________________________________
flatten_19 (Flatten)         (None, 512)               0         
_________________________________________________________________
dense_28 (Dense)             (None, 128)               65664     
_________________________________________________________________
dropout_9 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_29 (Dense)             (None, 10)                1290      
=================================================================
Total params: 14,781,642
Trainable params: 66,954
Non-trainable params: 14,714,688
_________________________________________________________________
  • Plot the model architecture
from keras.utils import plot_model
plot_model(model, './model.png', show_shapes=True)

2.5 Predict and submit

y_pred = model.predict(X_test)
pred = y_pred.argmax(axis=1).reshape(-1)
print(pred.shape)
print(pred)
image_id = pd.Series(range(1,len(pred)+1))
output = pd.DataFrame({'ImageId':image_id, 'Label':pred})
output.to_csv("submission_NN.csv",  index=False)

Submission score: 0.93696

This is likely because the VGG16 weights were pre-trained on 224*224 images, while we feed it 28*28 digits upscaled to 32*32, so the pretrained weights may not transfer well to this data.
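
A hedged follow-up sketch (my addition, not run in the original experiment): unfreeze only the block5 convolutional layers of VGG16 and fine-tune them at a lower learning rate, so the highest-level pretrained filters can adapt to the upscaled digit images:

# unfreeze only block5 (block5_conv1/2/3 and block5_pool); earlier blocks stay frozen
for layer in vgg16.layers:
    layer.trainable = layer.name.startswith('block5')

# re-compile with a smaller learning rate so the unfrozen weights change gently
model.compile(loss="categorical_crossentropy",
              optimizer=Adam(learning_rate=1e-4), metrics=["accuracy"])
# history_ft = model.fit(X_train, y_train, epochs=10, batch_size=128,
#                        validation_split=0.2)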

