# Softmax Classifier

The softmax classifier is closely related to logistic regression; in fact, softmax is the multi-class generalization of logistic regression. Since we now have multiple classes, we need a probability for each class. The softmax formula is:

$$p(y = i \mid x) = \frac{e^{z_i}}{\sum_{j=1}^{k} e^{z_j}}$$

Why take such a roundabout route when in the end we just want the largest score? ① What we really need is max, but max has a drawback: it is not differentiable. So we need a function that approximates max. exp is the exponential function, so larger values grow faster, which separates the maximum out while remaining differentiable; this design also makes each feature's influence on the probability multiplicative. ② Since softmax evolved from logistic regression, it naturally uses the cross-entropy loss function

$$L = -\log p(y = t \mid x) = -\log \frac{e^{z_t}}{\sum_{j=1}^{k} e^{z_j}},$$

where $t$ is the target class. This form is very concise, and it is consistent with linear regression (which uses the mean-squared-error objective) and binary classification (which uses the cross-entropy objective). The main implementation flow: first comes the exp normalization, which gives the probability that the current sample belongs to each class.
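As a quick numeric sketch of the two formulas above (plain NumPy, a 3-class toy example, not part of the classifier below):

```python
import numpy as np

# Raw scores (logits) for one sample over 3 classes.
z = np.array([2.0, 1.0, 0.1])

# Softmax: exponentiate, then normalize so the outputs sum to 1.
p = np.exp(z) / np.sum(np.exp(z))
print(p)        # the largest score gets the largest probability
print(p.sum())  # sums to 1

# Cross-entropy loss when the target class is 0: -log p[0].
target = 0
loss = -np.log(p[target])
print(loss)
```

Note how exp amplifies the gap between scores: a score difference of 1.0 between classes 0 and 1 becomes a probability ratio of e ≈ 2.72.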

### Code Implementation

```python
import numpy as np
from keras.datasets import mnist


class DataPreprocessing(object):
    def loadFile(self):
        # Load MNIST, scale pixels to [0, 1], and flatten each 28x28 image
        # into a 784-dimensional row vector.
        (x_train, x_target_train), (x_test, x_target_test) = mnist.load_data()
        x_train = x_train.astype('float32') / 255.0
        x_test = x_test.astype('float32') / 255.0
        x_train = x_train.reshape(len(x_train), np.prod(x_train.shape[1:]))
        x_test = x_test.reshape(len(x_test), np.prod(x_test.shape[1:]))
        x_train = np.mat(x_train)
        x_test = np.mat(x_test)
        x_target_train = np.mat(x_target_train)
        x_target_test = np.mat(x_target_test)
        return x_train, x_target_train, x_test, x_target_test

    def Calculate_accuracy(self, target, prediction):
        score = 0
        for i in range(len(target)):
            if target[i] == prediction[i]:
                score += 1
        return score / len(target)

    def predict(self, test, weights):
        # The class with the largest score also has the largest softmax
        # probability, so argmax over the raw scores suffices for prediction.
        h = test * weights
        return h.argmax(axis=1)


def gradientAscent(feature_data, label_data, k, maxCycle, alpha):
    '''
    input:  feature_data(mat)  feature matrix, one sample per row
            label_data(list)   target class of each sample
            k(int)             number of classes
            maxCycle(int)      max iterations
            alpha(float)       learning rate
    output: weights(mat)       learned weight matrix (n x k)
    '''
    preprocessing = DataPreprocessing()
    x_train, x_target_train, x_test, x_target_test = preprocessing.loadFile()
    x_target_train = x_target_train.tolist()[0]
    x_target_test = x_target_test.tolist()[0]
    m, n = np.shape(feature_data)
    weights = np.mat(np.ones((n, k)))
    i = 0
    while i <= maxCycle:
        err = np.exp(feature_data * weights)
        if i % 100 == 0:
            print('cost score : ', cost(err, label_data))
            train_predict = preprocessing.predict(x_train, weights)
            test_predict = preprocessing.predict(x_test, weights)
            print('Train_accuracy : ', preprocessing.Calculate_accuracy(x_target_train, train_predict))
            print('Test_accuracy : ', preprocessing.Calculate_accuracy(x_target_test, test_predict))
        # Normalize: err becomes minus the softmax probability of each class ...
        rowsum = -err.sum(axis=1)
        rowsum = rowsum.repeat(k, axis=1)
        err = err / rowsum
        # ... then adding 1 in the target column turns each row into
        # (indicator - probability), the gradient of the log-likelihood.
        for x in range(m):
            err[x, label_data[x]] += 1
        weights = weights + (alpha / m) * feature_data.T * err
        i += 1
    return weights


def cost(err, label_data):
    # Average cross-entropy: -log of the normalized probability the model
    # assigns to each sample's true class.
    m = np.shape(err)[0]
    sum_cost = 0.0
    for i in range(m):
        if err[i, label_data[i]] / np.sum(err[i, :]) > 0:
            sum_cost -= np.log(err[i, label_data[i]] / np.sum(err[i, :]))
    return sum_cost / m


if __name__ == '__main__':
    preprocessing = DataPreprocessing()
    x_train, x_target_train, x_test, x_target_test = preprocessing.loadFile()
    x_target_train = x_target_train.tolist()[0]
    gradientAscent(x_train, x_target_train, 10, 100000, 0.001)
```
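The update step above relies on the fact that the gradient of the cross-entropy loss with respect to the scores is softmax minus the one-hot target (so for gradient *ascent* on the log-likelihood, the sign flips to indicator minus probability). A small standalone finite-difference check of that identity (a sketch, independent of the classifier code):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift by max for numerical stability
    return e / e.sum()

z = np.array([0.5, -1.2, 0.3])
target = 2

# Analytic gradient of the loss -log softmax(z)[target] w.r.t. z:
# softmax probabilities, with 1 subtracted at the target index.
grad = softmax(z).copy()
grad[target] -= 1.0

# Finite-difference approximation of the same gradient, for comparison.
eps = 1e-6
num = np.zeros_like(z)
for i in range(len(z)):
    zp, zm = z.copy(), z.copy()
    zp[i] += eps
    zm[i] -= eps
    num[i] = (-np.log(softmax(zp)[target]) + np.log(softmax(zm)[target])) / (2 * eps)

print(np.max(np.abs(grad - num)))  # tiny: the two gradients agree
```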
