A handwriting recognition system.
Some images of handwritten numbers (digits 0-9 only) were processed with image-processing software so that they all have the same size and color: each one is a 32x32 black-and-white image. These binary images were then converted to text format, as in the following two text files:
The first one shows the digit 0 and the second one shows the digit 2.
There are 2000 training examples similar to the figures above, with roughly 200 samples of each digit.
Our goal is to recognize the number in a newly given image by applying the KNN algorithm.
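For reference, here is a minimal sketch of a KNN classifier in plain Python, assuming Euclidean distance and a simple majority vote among the k nearest training examples. The name knn_classify and its parameters are hypothetical, and this is not necessarily the exact classifier built at the beginning of the article.

```python
import math
from collections import Counter


def knn_classify(in_x, data_set, labels, k):
    """Return the majority label among the k training vectors closest to in_x."""
    # Euclidean distance from the query vector to every training vector
    distances = [math.dist(in_x, row) for row in data_set]
    # Indices of the training examples, sorted by increasing distance
    nearest = sorted(range(len(distances)), key=distances.__getitem__)[:k]
    # Majority vote among the labels of the k nearest neighbours
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Tiny usage example with hypothetical 2-D points
train = [[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]]
train_labels = ["A", "A", "B", "B"]
print(knn_classify([0.1, 0.0], train, train_labels, 3))  # B
```

For the digit images, each training vector would simply be the 1024 pixels of one 32x32 image instead of a 2-D point.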
(1) Convert the images (text files as shown above) into lists that can be used by our classifier, because we are going to use the simple KNN algorithm built at the beginning of this article.
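A minimal sketch of step (1), assuming each text file holds 32 lines of 32 '0'/'1' characters. The function name img2vector and the demo file are hypothetical. The grid is flattened row by row into a single list of 1024 integers, so a 32x32 image becomes one point in a 1024-dimensional space.

```python
import os
import tempfile


def img2vector(filename, size=32):
    """Read a size x size text image of '0'/'1' characters into a flat list of ints."""
    vector = []
    with open(filename) as f:
        for _ in range(size):
            line = f.readline().strip()
            # Each character is one pixel: 1 for ink, 0 for background
            vector.extend(int(ch) for ch in line[:size])
    if len(vector) != size * size:
        raise ValueError(f"expected {size * size} pixels, got {len(vector)}")
    return vector


# Usage: write a hypothetical 32x32 image whose first row is all ink
demo_rows = ["1" * 32] + ["0" * 32] * 31
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("\n".join(demo_rows) + "\n")
vec = img2vector(tmp.name)
os.remove(tmp.name)
print(len(vec), sum(vec))  # 1024 32
```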
(2) Get the label set, which includes a label for each training example. The label of a training example is the number it represents, and we can read it from the name of the file that stores the example. For example, file 9_45.txt stores an example whose label is 9. The following code segment completes two tasks: it builds the array of training examples and it builds the label set.
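The label extraction in step (2) can be sketched as below, assuming every file name follows the digit_index.txt pattern described above (for example 9_45.txt). The helper name label_from_filename is hypothetical.

```python
import os


def label_from_filename(filename):
    """Return the digit label encoded in names like '9_45.txt' (digit_index.txt)."""
    stem = os.path.splitext(os.path.basename(filename))[0]  # '9_45'
    return int(stem.split("_")[0])                           # 9


print(label_from_filename("9_45.txt"))  # 9
```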
Notes on the Python functions used:
In this code segment we use the function listdir(directory) from the os module, which returns the names of the entries (files and subdirectories) in the given directory.
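A minimal sketch of the whole loading step, assuming all training files sit in one directory and follow the digit_index.txt naming. The function load_training_set and the throwaway demo directory are hypothetical. os.listdir supplies bare file names (no paths), so each name is joined with the directory before opening, and each file is flattened into a 1024-element list.

```python
import os
import tempfile


def load_training_set(directory, size=32):
    """Return (examples, labels) for every '<digit>_<index>.txt' file in directory."""
    examples, labels = [], []
    for name in sorted(os.listdir(directory)):  # file names only, no paths
        if not name.endswith(".txt"):
            continue
        labels.append(int(name.split("_")[0]))  # '9_45.txt' -> 9
        with open(os.path.join(directory, name)) as f:
            rows = [line.strip() for line in f if line.strip()]
        # Flatten the size x size grid of '0'/'1' characters into one list
        examples.append([int(ch) for row in rows[:size] for ch in row[:size]])
    return examples, labels


# Usage with a throwaway directory holding two hypothetical examples
with tempfile.TemporaryDirectory() as d:
    for name, ch in [("0_0.txt", "0"), ("9_45.txt", "1")]:
        with open(os.path.join(d, name), "w") as f:
            f.write((ch * 32 + "\n") * 32)
    training, train_labels = load_training_set(d)
print(len(training), train_labels)  # 2 [0, 9]
```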
(3) Test our classifier and calculate the error rate.
We have about 900 test examples, similar to the training examples, for measuring the error rate of our classifier. Since we only need to test the classifier, there is no need to store all the test examples in one big array; we can simply iterate over the test examples and classify each one individually with our simple KNN algorithm. The error rate is the number of misclassified test examples divided by the total number of test examples.
The whole function to test the classifier is:
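The following is a sketch of such a test function, not the article's original code. It assumes the training and test files live in two separate directories with digit_index.txt names, uses k=3 by default (an arbitrary choice), and reads each test file only at the moment it is classified. All function names and the synthetic demo images are hypothetical.

```python
import math
import os
import tempfile
from collections import Counter


def read_vector(path, size=32):
    """Flatten a size x size text image of '0'/'1' characters into a list of ints."""
    with open(path) as f:
        rows = [line.strip() for line in f if line.strip()]
    return [int(ch) for row in rows[:size] for ch in row[:size]]


def knn_classify(x, data, labels, k):
    """Majority label among the k training vectors nearest to x (Euclidean)."""
    nearest = sorted(range(len(data)), key=lambda i: math.dist(x, data[i]))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]


def handwriting_test(train_dir, test_dir, k=3):
    """Classify every test file against the training set and return the error rate."""
    names = sorted(n for n in os.listdir(train_dir) if n.endswith(".txt"))
    data = [read_vector(os.path.join(train_dir, n)) for n in names]
    labels = [int(n.split("_")[0]) for n in names]
    errors = total = 0
    # Test files are read and classified one at a time; they are never stored together
    for name in sorted(os.listdir(test_dir)):
        if not name.endswith(".txt"):
            continue
        predicted = knn_classify(read_vector(os.path.join(test_dir, name)), data, labels, k)
        actual = int(name.split("_")[0])
        total += 1
        if predicted != actual:
            errors += 1
            print(f"{name}: predicted {predicted}, actual {actual}")
    return errors / total if total else 0.0


def write_image(path, top_ink):
    """Write a synthetic 32x32 image with ink in the top half (or the bottom half)."""
    rows = ["1" * 32 if (r < 16) == top_ink else "0" * 32 for r in range(32)]
    with open(path, "w") as f:
        f.write("\n".join(rows) + "\n")


# Usage: hypothetical digit 0 = top-half ink, digit 1 = bottom-half ink
with tempfile.TemporaryDirectory() as train_dir, tempfile.TemporaryDirectory() as test_dir:
    for name, top in [("0_0.txt", True), ("0_1.txt", True), ("1_0.txt", False), ("1_1.txt", False)]:
        write_image(os.path.join(train_dir, name), top)
    for name, top in [("0_0.txt", True), ("1_0.txt", False)]:
        write_image(os.path.join(test_dir, name), top)
    error_rate = handwriting_test(train_dir, test_dir, k=3)
print(error_rate)  # 0.0
```

Only the training set is held in memory; each test vector is built, classified and discarded, which matches the reasoning in step (3).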
(4) In short, given a new image, we first convert it to a matrix, then convert this matrix to a vector that our simple KNN algorithm can use, and finally apply the KNN algorithm to get a prediction for the new image.
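A sketch of this pipeline for a new image, assuming the image has already been resized to a 32x32 grayscale matrix with values from 0 (black) to 255 (white), and that dark pixels count as ink. The threshold of 128, the function names and the tiny training set are hypothetical.

```python
import math
from collections import Counter


def matrix_to_vector(matrix, threshold=128):
    """Binarize a grayscale matrix (0=black..255=white) and flatten it row by row.

    Dark pixels (value < threshold) become 1 (ink); light pixels become 0.
    """
    return [1 if pixel < threshold else 0 for row in matrix for pixel in row]


def knn_classify(x, data, labels, k):
    """Majority label among the k training vectors nearest to x (Euclidean)."""
    nearest = sorted(range(len(data)), key=lambda i: math.dist(x, data[i]))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]


# Hypothetical training images: digit 0 = ink in left half, digit 1 = ink in right half
left = [[0 if c < 16 else 255 for c in range(32)] for _ in range(32)]
right = [[255 if c < 16 else 0 for c in range(32)] for _ in range(32)]
training = [matrix_to_vector(left), matrix_to_vector(left), matrix_to_vector(right)]
labels = [0, 0, 1]

# A "new" image: the left-half pattern with one smudged pixel
new_image = [row[:] for row in left]
new_image[0][0] = 255
print(knn_classify(matrix_to_vector(new_image), training, labels, k=3))  # 0
```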
(5) Notice that even though some images represent the same number, that number may appear in various shapes because it is handwritten. So our goal is to train the algorithm to recognize both different numbers and different shapes of the same number.