I'm trying to write my own neural network as a learning exercise. Specifically, I'm trying to create a neural network that recognises handwritten digits. I'm using sklearn's digits dataset, but I wrote the neural network myself.
Simple tests, such as OR and AND gates, were successful, so I believe backpropagation is implemented correctly, but I've found that after training, the network still performs poorly on the 8x8-pixel images of handwritten digits. I currently have 64 inputs (the 8x8 image) and 10 outputs (one per digit), with 2 hidden layers of size 4 each. I suspect the multiple outputs are causing the problem, because the network usually settles on activations of about 0.1 for every output.
Possible ideas:
1) Are the multiple outputs causing the problem?
2) Do I need a better error function?
3) Do I just need to train the network for longer, with a lower learning rate?
(Image: network predictions for a 1; i.e. after training the output should be ~[0, 1, 0, 0, 0, 0, 0, 0, 0, 0])
Has anyone come across a similar problem, or can anyone tell me where I'm going wrong? Thanks for your patience if this has already been asked, but I couldn't find it! Code below:
EDIT: charlesreid1 is right, this is because my network architecture was too simple to handle the task. Specifically, I had two layers of 4 neurons each trying to process 64 inputs. Changing my hidden layers to 3 layers of 100 neurons each gets me an accuracy score of about 90% (counting an output greater than 0.7 as a positive result).
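For concreteness, the fix described in this edit only changes the network-construction step of the script below. A minimal sketch, assuming the rest of the script is unchanged (the 100-neuron sizes are the ones quoted above):
# Sketch of the fix from the EDIT: three hidden layers of 100 neurons
# instead of the size-4 layers used in the original script below.
network = Neural_Network(input_size=64, output_size=10)
network.add_hidden_layer(size=100)
network.add_hidden_layer(size=100)
network.add_hidden_layer(size=100)
network.train2(input_training_data=training_images_list,
               input_training_labels=training_labels)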
# Import our dependencies
import numpy as np
from sklearn import datasets

class Neural_Network():
    # Initialising function
    def __init__(self, input_size, output_size, niteration=100000):
        np.random.seed(1)
        self.niteration = niteration
        self.layer_sizes = np.array([input_size, output_size])
        self.weights = list()
        self.error = np.array([])
        # initialise random weights
        self._recreate_weights()

    def _recreate_weights(self):
        # Recreate the weights after adding a hidden layer
        self.weights = list()
        for i in np.arange(len(self.layer_sizes) - 1):
            weights = np.random.rand(self.layer_sizes[i], self.layer_sizes[i + 1]) * 2 - 1
            self.weights.append(weights)
        self.momentum = [i * 0 for i in self.weights]

    def add_hidden_layer(self, size):
        # Add a new hidden layer to our neural network
        self.layer_sizes = np.insert(self.layer_sizes, -1, size)
        self._recreate_weights()

    def _sigmoid(self, x, deriv=False):
        if deriv:
            return self._sigmoid(x, deriv=False) * (1 - self._sigmoid(x, deriv=False))
        else:
            return 1.0 / (1 + np.exp(-x))

    def predict(self, input_single, deriv=False, layer_output=False):
        data_current_layer = input_single
        output_list = list()
        output_list.append(np.array([data_current_layer]))
        for i in np.arange(len(self.layer_sizes) - 1):
            data_current_layer = self._sigmoid(np.dot(data_current_layer, self.weights[i]), deriv)
            output_list.append(np.array([data_current_layer]))
        return output_list
    def train2(self, input_training_data, input_training_labels):
        for iterations in np.arange(self.niteration):
            # Loop over all training sets niteration times
            updates = [i * 0 for i in self.weights]  # Used for storing the update to the weights
            mean_error = np.array([])  # used for calculating the mean error
            for i in np.arange(len(input_training_data)):  # For each training example
                activations = list()  # Store all my activations in a list
                activations.append(np.array([input_training_data[i]]))
                for j in np.arange(len(self.layer_sizes) - 1):
                    # Calculate the activations for every layer
                    z = np.dot(activations[-1], self.weights[j])
                    a = self._sigmoid(z, deriv=False)
                    activations.append(a)
                error = list()
                error.append(activations[-1] - np.array([input_training_labels[i]]))
                for j in np.arange(len(self.layer_sizes) - 2):
                    # Calculate the error term for each layer, working backwards
                    j2 = (-1 * j) - 1
                    j3 = j2 - 1
                    d = np.dot(error[j], self.weights[j2].T) * activations[j3] * (1 - activations[j3])
                    error.append(d)
                for j in np.arange(len(self.layer_sizes) - 1):
                    # Calculate the gradient of the error with respect to the weights
                    j2 = (-1 * j) - 1
                    updates[j] += np.dot(activations[j].T, error[j2])
                mean_error = np.append(mean_error, np.sum(np.abs(error[0])))
            updates = [0.001 * i / len(input_training_data) for i in updates]  # Apply the learning rate
            self.error = np.append(self.error, np.mean(mean_error))
            for i in np.arange(len(self.weights)):
                # Update using a momentum term
                self.momentum[i] -= updates[i]
                self.weights[i] += self.momentum[i]
                self.momentum[i] *= 0.9
            if np.mod(iterations, 1000) == 0:
                # Visually keep track of the error
                print(iterations, self.error[-1])
# Main Loop
# Read in the dataset and divide into a training and test set
data = datasets.load_digits()
images = data.images
labels = data.target
targets = data.target_names
training_images = images[:int(len(labels) * 0.8)]
training_labels = labels[:int(len(labels) * 0.8)]
training_images = images[:10]  # NOTE: overrides the 80% split, training on only 10 examples
training_labels = labels[:10]
test_images = images[int(len(labels) * 0.8):]
test_labels = labels[int(len(labels) * 0.8):]

# Flatten the training and test images using ravel. CAN PROBABLY DO THIS BEFORE DIVIDING THEM UP.
training_images_list = list()
for i in training_images:
    training_images_list.append(np.ravel(i))
test_images_list = list()
for i in test_images:
    test_images_list.append(np.ravel(i))

# Change the training and test labels into a more usable (one-hot) format.
training_labels_temp = np.zeros([np.size(training_labels), 10])
for i in np.arange(np.size(training_labels)):
    training_labels_temp[i, training_labels[i]] = 1
training_labels = training_labels_temp
test_labels_temp = np.zeros([np.size(test_labels), 10])
for i in np.arange(np.size(test_labels)):
    test_labels_temp[i, test_labels[i]] = 1
test_labels = test_labels_temp

# Build the neural network: input - hidden layers - output
if True:
    network = Neural_Network(input_size=64, output_size=10)
    network.add_hidden_layer(size=4)
    network.add_hidden_layer(size=4)
    network.add_hidden_layer(size=4)

# Train the network on our training set
# print(network.weights)
network.train2(input_training_data=training_images_list, input_training_labels=training_labels)
# print(network.weights)

# Calculate the error on our test set
# network.calculate_error(test_set=test_images, test_labels=test_labels)
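The calculate_error method in the final commented-out line was never implemented. A minimal sketch of a test-set accuracy check (hypothetical, not part of the original code), using the predict() method above and the 0.7 threshold mentioned in the edit:
# Hypothetical evaluation helper: count a prediction as correct when the
# most active output matches the true digit and its activation exceeds 0.7.
correct = 0
for x, y in zip(test_images_list, test_labels):
    output = network.predict(x)[-1].ravel()  # final-layer activations, shape (10,)
    if np.argmax(output) == np.argmax(y) and output.max() > 0.7:
        correct += 1
print("test accuracy:", correct / len(test_images_list))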
The problem is definitely in your network architecture: specifically, the first hidden layer. You're feeding the 8x8 input into a hidden layer with 4 neurons. First, that isn't enough neurons: the information contained in the 64 pixels gets washed out passing through only four of them. A second problem (which would probably disappear with enough neurons) is that, because your predict() function uses a dot product, every neuron is fully connected to the entire input.
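You can see the bottleneck directly by printing the weight shapes of the network described in the question (two size-4 hidden layers; this just inspects the class defined above):
network = Neural_Network(input_size=64, output_size=10)
network.add_hidden_layer(size=4)
network.add_hidden_layer(size=4)
for w in network.weights:
    print(w.shape)
# (64, 4)  <- all 64 pixel values get squeezed into just 4 numbers
# (4, 4)
# (4, 10)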
The task of recognising handwritten digits is intrinsically about the spatial arrangement of the pixels, so your network should exploit that knowledge. You should feed different parts of the input image to different neurons in the first layer. That gives those neurons the chance to amplify or dampen weak signals depending on where the pixels sit in the image (for example, a strong signal in a corner makes a 1 unlikely, a strong signal in the centre makes a 0 unlikely, and so on); see the sketch below for one crude way to do this.
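A minimal numpy sketch of the idea, short of full convolutions: give each first-layer unit its own patch of the image rather than all 64 pixels. This is a hypothetical preprocessing step, not part of the question's code; the patch size and random weights are purely illustrative.
import numpy as np

def patch_responses(image_8x8, patch=4):
    # Split the 8x8 image into non-overlapping patch x patch blocks and give
    # each block its own weight vector: each unit only sees its local region.
    rng = np.random.default_rng(0)
    responses = []
    for r in range(0, 8, patch):
        for c in range(0, 8, patch):
            block = image_8x8[r:r + patch, c:c + patch].ravel()
            w = rng.uniform(-1, 1, size=block.size)  # local weights (untrained here)
            responses.append(1.0 / (1 + np.exp(-w @ block)))
    return np.array(responses)  # one local activation per block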
Generalising this idea is what convolutional neural networks are all about, and it's why they are so effective at image-recognition tasks. There's also a nice piece from the publisher O'Reilly called Not Another MNIST Tutorial, which, true to its name, isn't just another tutorial, but shows some very useful visualisations for understanding what's going on.
The long and short of it is this: AND/OR is a very simple task, but you've jumped straight to a very complex one, and your neural network architecture needs a corresponding jump in complexity. Convolutional neural networks typically follow a structural pattern: convolution layers that extract local spatial features, pooling layers that downsample, often repeated, then fully connected layers feeding a softmax output.
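As a concrete illustration of that pattern, here is a sketch in Keras (Keras is not used anywhere in the question, so treat this as an assumption-laden example of the conv-pool-dense shape rather than a prescription; all layer sizes are illustrative):
from tensorflow import keras

# Sketch of the usual conv -> pool -> dense pattern for 8x8 digit images.
model = keras.Sequential([
    keras.layers.Input(shape=(8, 8, 1)),
    keras.layers.Conv2D(16, kernel_size=3, activation="relu", padding="same"),
    keras.layers.MaxPooling2D(pool_size=2),
    keras.layers.Conv2D(32, kernel_size=3, activation="relu", padding="same"),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])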
For more complex tasks, larger CNNs combine these layers into bigger nested architectures and sub-networks. Knowing which combination of layers to use is an art and takes a lot of experimentation (hence the popularity of GPUs, which make iterating and experimenting faster). But for greyscale handwritten digits, you'll see a big improvement simply by making use of what you already know about the task at hand: namely, that the network should take advantage of the spatial structure.
https://stackoverflow.com/questions/46406559