
Part 5: Logistic Regression in Machine Learning (II)

Author: ACM算法日常 · Published 2018-08-07 (originally posted 2018-08-03 on the ACM算法日常 WeChat public account)


If anything here is unclear, have a look at the earlier articles in this series to pick up the basic concepts.

"Recognizing cats with a logistic regression classifier"

(This simplified version does not include the regularization term.)

First, set up your Python environment (Python 3 is recommended).

The steps are as follows:

1. Setup
2. Data processing
3. Writing the functions
4. Training the model
5. Testing
6. Further optimization

Here is the material you will need:

https://pan.baidu.com/s/1tnMHvLWB_qXyuoPiBgnhaQ

1. Setup

# Notes

· numpy: the standard scientific-computing library for Python
· matplotlib: used for plotting
· h5py: for working with H5-format files
· PIL and scipy: for testing the model on your own images

# Import the necessary libraries and files

import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
import PIL
from PIL import Image
from scipy import ndimage
from lr_utils import load_dataset    # requires the file lr_utils.py

# Jupyter magic that displays plots inside the notebook
%matplotlib inline
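If you don't already have lr_utils.py, the sketch below is a minimal version. It assumes the download above contains the standard files datasets/train_catvnoncat.h5 and datasets/test_catvnoncat.h5 with keys train_set_x, train_set_y, test_set_x, test_set_y, and list_classes; adjust the paths and key names to match your copy.

# lr_utils.py -- minimal loader, assuming the standard cat/non-cat h5 files
import numpy as np
import h5py

def load_dataset():
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])   # image tensors
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])   # labels (0/1)

    test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])

    classes = np.array(test_dataset["list_classes"][:])            # class names

    # reshape the labels into row vectors of shape (1, m)
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes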

2. Data processing

# Load the data (cat / non-cat)
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()



# Display one of the training examples
index = 16
plt.imshow(train_set_x_orig[index])

m_train = train_set_x_orig.shape[0]    # number of training examples
m_test = test_set_x_orig.shape[0]      # number of test examples
num_px = train_set_x_orig.shape[1]     # height/width of each (square) image

# Flatten each image from (num_px, num_px, 3) into a column of length num_px*num_px*3
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T

# Standardize: pixel values lie in [0, 255], so divide by 255.
# The model below trains on train_set_x/test_set_x, so this step must not be skipped.
train_set_x = train_set_x_flatten / 255.
test_set_x = test_set_x_flatten / 255.
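To confirm the reshape did what you expect, it helps to print the shapes. The values in the comments assume the standard dataset from the link above (209 training images and 50 test images at 64x64 pixels):

print("train_set_x_orig shape: " + str(train_set_x_orig.shape))        # (209, 64, 64, 3)
print("train_set_x_flatten shape: " + str(train_set_x_flatten.shape))  # (12288, 209)
print("train_set_y shape: " + str(train_set_y.shape))                  # (1, 209)
print("test_set_x_flatten shape: " + str(test_set_x_flatten.shape))    # (12288, 50)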

3. Writing the functions

Here we turn the math formulas from the previous article into Python code.
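For reference, these are the formulas being implemented (m is the number of examples, and X holds one example per column):

a = sigmoid(w^T x + b), where sigmoid(z) = 1 / (1 + e^(-z))
J = -(1/m) * sum( y*log(a) + (1-y)*log(1-a) )
dw = (1/m) * X(A - Y)^T
db = (1/m) * sum(A - Y)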

def sigmoid(x):
    # sigmoid(x) = 1 / (1 + e^(-x));
    # np.asarray lets the function accept plain Python lists as well as arrays
    x = np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.exp(-x))

def initialize_with_zeros(dim):
    w = np.zeros((dim, 1))
    b = 0
    #assert(w.shape == (dim, 1))
    #assert(isinstance(b, float) or isinstance(b, int))
    return w, b

print(sigmoid([0, 2]))    # expected: [0.5  0.88079708]
dim = 4
w, b = initialize_with_zeros(dim)
print("w=" + str(w) + "\nb=" + str(b))

# For image inputs, w will have shape (num_px * num_px * 3, 1)

def propagate(w, b, X, Y):
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)    # compute activation
    # cross-entropy cost
    cost = -(np.sum(np.dot(Y, np.log(A).T) + np.dot((1 - Y), np.log(1 - A).T))) / m
    # gradients of the cost with respect to w and b
    dw = (np.dot(X, (A - Y).T)) / m
    db = (np.sum(A - Y)) / m
    #assert(dw.shape == w.shape)
    #assert(db.dtype == float)
    cost = np.squeeze(cost)
    #assert(cost.shape == ())
    grads = {"dw": dw,
             "db": db}
    return grads, cost
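A quick sanity check on toy values (the numbers are made up purely to exercise the shapes; the expected outputs in the comments were computed by hand):

w_t = np.array([[1.], [2.]])
b_t = 2.
X_t = np.array([[1., 2., -1.], [3., 4., -3.2]])
Y_t = np.array([[1, 0, 1]])
grads_t, cost_t = propagate(w_t, b_t, X_t, Y_t)
print("dw = " + str(grads_t["dw"]))    # approx [[0.99845601], [2.39507239]]
print("db = " + str(grads_t["db"]))    # approx 0.00145558
print("cost = " + str(cost_t))         # approx 5.8015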

def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost=False):
    """
    num_iterations: how many optimization steps to run (the number of training iterations)
    """
    costs = []
    a = learning_rate
    for i in range(num_iterations):
        # forward and backward propagation
        grads, cost = propagate(w, b, X, Y)
        # retrieve the derivatives
        dw = grads["dw"]
        db = grads["db"]
        # gradient-descent update
        w = w - a * dw
        b = b - a * db
        # record the cost every 100 iterations
        if i % 100 == 0:
            costs.append(cost)
            if print_cost:
                print("Cost after iteration %i: %f" % (i, cost))
    params = {"w": w,
              "b": b}
    grads = {"dw": dw,
             "db": db}
    return params, grads, costs
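Continuing the toy example from above (the values in the comments are the approximate results you should see):

params_t, grads_t, costs_t = optimize(w_t, b_t, X_t, Y_t,
                                      num_iterations=100, learning_rate=0.009)
print("w = " + str(params_t["w"]))    # approx [[0.19033591], [0.12259159]]
print("b = " + str(params_t["b"]))    # approx 1.92535983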

def predict(w, b, X):
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)
    A = sigmoid(np.dot(w.T, X) + b)
    # threshold the activations at 0.5
    for i in range(A.shape[1]):
        if A[0, i] >= 0.5:
            Y_prediction[0, i] = 1
        else:
            Y_prediction[0, i] = 0
    #assert(Y_prediction.shape == (1, m))
    return Y_prediction
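And a quick check of predict with hypothetical parameter values (expected output computed by hand):

w_t = np.array([[0.1124579], [0.23106775]])
b_t = -0.3
X_t = np.array([[1., -1.1, -3.2], [1.2, 2., 0.1]])
print("predictions = " + str(predict(w_t, b_t, X_t)))    # expected: [[1. 1. 0.]]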

def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.5, print_cost=False):
    # initialize parameters with zeros
    w, b = initialize_with_zeros(X_train.shape[0])
    # gradient descent
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost=print_cost)
    # retrieve the learned parameters
    w = parameters["w"]
    b = parameters["b"]
    # predict on the training / test sets
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)
    # print train / test accuracy
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train": Y_prediction_train,
         "w": w,
         "b": b,
         "learning_rate": learning_rate,
         "num_iterations": num_iterations}
    return d

4. Training the model

d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 2000, learning_rate = 0.005, print_cost = True)

After 2000 training iterations, the model reaches about 99% accuracy on the training data but only about 70% on the test data, a sign that it is overfitting the training set.

5. Testing

# An example of classifying a single test image.
index = 5
plt.imshow(test_set_x[:, index].reshape((num_px, num_px, 3)))
print("y = " + str(test_set_y[0, index]) +
      ", you predicted that it is a \"" +
      classes[int(d["Y_prediction_test"][0, index])].decode("utf-8") +
      "\" picture.")

6. Further optimization

# Plot the learning curve (cost vs. iterations)
costs = np.squeeze(d['costs'])
plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')
plt.title("Learning rate =" + str(d["learning_rate"]))
plt.show()
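One easy experiment from here is to compare several learning rates and watch how the cost curves differ. A sketch, reusing the model function and the data variables defined above:

# Compare learning rates by overlaying their learning curves
learning_rates = [0.01, 0.001, 0.0001]
models = {}
for lr in learning_rates:
    print("learning rate is: " + str(lr))
    models[str(lr)] = model(train_set_x, train_set_y, test_set_x, test_set_y,
                            num_iterations=1500, learning_rate=lr, print_cost=False)
    print('\n' + "-------------------------------------------------" + '\n')

for lr in learning_rates:
    plt.plot(np.squeeze(models[str(lr)]["costs"]),
             label=str(models[str(lr)]["learning_rate"]))

plt.ylabel('cost')
plt.xlabel('iterations (hundreds)')
plt.legend(loc='upper center', shadow=True)
plt.show()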