# Logistic Regression

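## The Logistic Regression Model

Logistic regression passes the linear score $f(\boldsymbol{x})=\boldsymbol{w}^{T}\boldsymbol{x}$ through the sigmoid function $\sigma$, so the output $y$ lies in $(0, 1)$ and is read as the probability of the positive class:

$$
y=\sigma(f(\boldsymbol{x}))=\sigma\left(\boldsymbol{w}^{T} \boldsymbol{x}\right)=\frac{1}{1+e^{-\boldsymbol{w}^{T} \boldsymbol{x}}}
$$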
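The model formula can be evaluated directly with NumPy. The following is a minimal sketch rather than the author's code: the function names `sigmoid` and `predict_proba`, and the weights and inputs, are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)).

    Computed as exp(-log(1 + exp(-z))) via logaddexp, which avoids
    overflow for large |z|.
    """
    return np.exp(-np.logaddexp(0.0, -np.asarray(z, dtype=float)))

def predict_proba(w, X):
    """Return sigma(w^T x) for every row x of X."""
    return sigmoid(X @ w)

# Hypothetical weights and inputs, for illustration only.
w = np.array([1.0, -2.0])
X = np.array([[0.0, 0.0],
              [2.0, 0.5],
              [-1.0, 1.0]])
p = predict_proba(w, X)
print(p)
```

The scores $\boldsymbol{w}^{T}\boldsymbol{x}$ for the three rows are 0, 1 and -3, so the first probability is exactly 0.5 and the others fall on either side of it.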

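## Loss Function

For a binary label $y \in \{0, 1\}$, let $p$ denote the model's predicted probability of the positive class:

$$
P(y | \boldsymbol{x})=\begin{cases} p, & y=1 \\ 1-p, & y=0 \end{cases}
$$

Both cases can be written as a single expression. For sample $n$, with $p_{n}=\sigma\left(\boldsymbol{w}^{T} \boldsymbol{x}_{n}\right)$:

$$
P\left(y_{n} | \boldsymbol{x}_{n}\right)=p_{n}^{y_{n}}\left(1-p_{n}\right)^{1-y_{n}}
$$

Assuming the $N$ samples are independent, the likelihood of the whole data set is the product:

$$
P=\prod_{n=1}^{N} p_{n}^{y_{n}}\left(1-p_{n}\right)^{1-y_{n}}
$$

Taking the logarithm gives the log-likelihood:

$$
\begin{aligned}
L &=\sum_{n=1}^{N} \ln \left(p_{n}^{y_{n}}\left(1-p_{n}\right)^{1-y_{n}}\right) \\
&=\sum_{n=1}^{N}\left(y_{n} \ln p_{n}+\left(1-y_{n}\right) \ln \left(1-p_{n}\right)\right)
\end{aligned}
$$

Maximizing $L$ is equivalent to minimizing the loss $-L$, which is the cross-entropy loss.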
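The log-likelihood can be computed in a few lines of NumPy. This is a sketch under stated assumptions: labels are 0 or 1 as in the formulas above, and the function name `log_likelihood` and the sample data are hypothetical.

```python
import numpy as np

def log_likelihood(w, X, y):
    """L(w) = sum_n [ y_n ln p_n + (1 - y_n) ln(1 - p_n) ], y_n in {0, 1}."""
    z = X @ w
    # ln p_n = -ln(1 + e^{-z_n}) and ln(1 - p_n) = -ln(1 + e^{z_n});
    # logaddexp evaluates both without overflow.
    log_p = -np.logaddexp(0.0, -z)
    log_one_minus_p = -np.logaddexp(0.0, z)
    return np.sum(y * log_p + (1.0 - y) * log_one_minus_p)

# Hypothetical data: three samples, two features.
X = np.array([[1.0, 0.5],
              [1.0, -1.5],
              [1.0, 2.0]])
y = np.array([1.0, 0.0, 1.0])
w0 = np.zeros(2)

L0 = log_likelihood(w0, X, y)
print(L0)   # with w = 0 every p_n is 0.5, so L = 3 * ln(0.5)
print(-L0)  # the corresponding loss
```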

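## Gradient Computation

Recall the expression for $p$:

$$
p=\frac{1}{1+e^{-\boldsymbol{w}^{T} \boldsymbol{x}}}
$$

The expression for $1-p$ is:

$$
1-p=\frac{e^{-\boldsymbol{w}^{T} \boldsymbol{x}}}{1+e^{-\boldsymbol{w}^{T} \boldsymbol{x}}}
$$

The derivative of $p$ with respect to $\boldsymbol{w}$ is:

$$
p^{\prime}=p(1-p) \boldsymbol{x}
$$

The derivative of $1-p$ with respect to $\boldsymbol{w}$ is:

$$
(1-p)^{\prime}=-p(1-p) \boldsymbol{x}
$$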
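The two derivative formulas above follow from the chain rule. A short derivation, using the shorthand $z=\boldsymbol{w}^{T}\boldsymbol{x}$ (a symbol introduced here only for convenience):

$$
\begin{aligned}
p^{\prime} &=\frac{\partial}{\partial \boldsymbol{w}}\left(1+e^{-z}\right)^{-1} \\
&=\frac{e^{-z}}{\left(1+e^{-z}\right)^{2}} \boldsymbol{x} \\
&=\frac{1}{1+e^{-z}} \cdot \frac{e^{-z}}{1+e^{-z}} \boldsymbol{x} \\
&=p(1-p) \boldsymbol{x}
\end{aligned}
$$

Since $1-p$ differs from $-p$ only by a constant, $(1-p)^{\prime}=-p^{\prime}=-p(1-p) \boldsymbol{x}$.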

Applying these derivatives to the log-likelihood $L$, with $p_{n}=\sigma\left(\boldsymbol{w}^{T} \boldsymbol{x}_{n}\right)$ for sample $n$, gives the gradient:

$$
\begin{aligned}
\nabla L(\boldsymbol{w}) &=\nabla\left(\sum_{n=1}^{N}\left(y_{n} \ln p_{n}+\left(1-y_{n}\right) \ln \left(1-p_{n}\right)\right)\right) \\
&=\sum_{n=1}^{N}\left(y_{n} \frac{1}{p_{n}} p_{n}^{\prime}+\left(1-y_{n}\right) \frac{1}{1-p_{n}}\left(1-p_{n}\right)^{\prime}\right) \\
&=\sum_{n=1}^{N}\left(y_{n}\left(1-p_{n}\right) \boldsymbol{x}_{n}-\left(1-y_{n}\right) p_{n} \boldsymbol{x}_{n}\right) \\
&=\sum_{n=1}^{N}\left(y_{n}-p_{n}\right) \boldsymbol{x}_{n}
\end{aligned}
$$
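One way to gain confidence in the closed-form gradient is to compare it with a finite-difference approximation of the log-likelihood. The sketch below assumes labels in {0, 1} and uses randomly generated data; all function names are hypothetical helpers, not part of the original post.

```python
import numpy as np

def sigmoid(z):
    return np.exp(-np.logaddexp(0.0, -z))

def log_likelihood(w, X, y):
    z = X @ w
    return np.sum(-y * np.logaddexp(0.0, -z) - (1.0 - y) * np.logaddexp(0.0, z))

def gradient(w, X, y):
    """Analytic gradient of the log-likelihood: sum_n (y_n - p_n) x_n."""
    p = sigmoid(X @ w)
    return X.T @ (y - p)

def numerical_gradient(f, w, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = np.zeros_like(w)
    for k in range(w.size):
        step = np.zeros_like(w)
        step[k] = eps
        g[k] = (f(w + step) - f(w - step)) / (2.0 * eps)
    return g

# Random data for the check (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = (rng.random(50) < 0.5).astype(float)
w = rng.normal(size=3)

g_analytic = gradient(w, X, y)
g_numeric = numerical_gradient(lambda v: log_likelihood(v, X, y), w)
max_diff = np.max(np.abs(g_analytic - g_numeric))
print(max_diff)  # should be tiny if the derivation is right
```

Because this is the gradient of the log-likelihood, gradient ascent moves along it; a minimizer of the loss $-L$ would use its negative.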

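## Decision Boundary of Logistic Regression

The decision boundary is the set of points where the predicted probability equals 0.5:

$$
\frac{1}{1+e^{-\boldsymbol{w}^{T} \boldsymbol{x}}}=0.5
$$

This requires

$$
e^{-\boldsymbol{w}^{T} \boldsymbol{x}}=1=e^{0}
$$

and therefore

$$
-\boldsymbol{w}^{T} \boldsymbol{x}=0
$$

That is, $\boldsymbol{w}^{T} \boldsymbol{x}=0$, so the decision boundary is a hyperplane and logistic regression is a linear classifier.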
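In practice this means that thresholding the probability at 0.5 gives the same labels as checking the sign of $\boldsymbol{w}^{T}\boldsymbol{x}$. A minimal sketch, in which the function names, weights and points are made up for illustration and the first feature is assumed to be a constant 1 acting as a bias term:

```python
import numpy as np

def predict_by_probability(w, X, threshold=0.5):
    """Label 1 when sigma(w^T x) >= threshold, else 0."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return (p >= threshold).astype(int)

def predict_by_sign(w, X):
    """Label 1 when w^T x >= 0, else 0 (same rule as a 0.5 threshold)."""
    return (X @ w >= 0).astype(int)

# Hypothetical weights: boundary is -1 + 2*x1 + x2 = 0.
w = np.array([-1.0, 2.0, 1.0])
X = np.array([[1.0, 0.0, 0.0],    # score -1: negative side
              [1.0, 1.0, 0.0],    # score  1: positive side
              [1.0, 0.0, 1.0],    # score  0: on the boundary
              [1.0, 0.25, 0.5]])  # score  0: on the boundary

labels_prob = predict_by_probability(w, X)
labels_sign = predict_by_sign(w, X)
print(labels_prob, labels_sign)
```

Points exactly on the boundary have probability 0.5; assigning them to class 1 here is a convention of this sketch.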

## Code

Note that this implementation uses labels $y \in \{-1, +1\}$ rather than $\{0, 1\}$, so the per-sample log-likelihood is $\ln \sigma\left(y_{i} \boldsymbol{\beta}^{T} \boldsymbol{x}_{i}\right)$. The `SyntheticClassifierData` class is not shown in the original listing; the version below is an assumed stand-in so that the script runs.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import fmin_bfgs


def sigmoid(x):
    """ Logistic function 1 / (1 + exp(-x)), computed without overflow. """
    return np.exp(-np.logaddexp(0.0, -x))


class SyntheticClassifierData():
    """ Stand-in data generator (an assumption: the original listing does
    not include this class). Labels are -1 or +1. """

    def __init__(self, N, d):
        # Two slightly different class means in d dimensions.
        means = .05 * np.random.randn(2, d)
        self.X_train, self.Y_train = self._sample(N, d, means)
        self.X_test, self.Y_test = self._sample(N, d, means)

    @staticmethod
    def _sample(N, d, means):
        y = np.random.randint(0, 2, size=N)
        X = np.random.random((N, d)) + means[y, :]
        return X, 2.0 * y - 1


class LogisticRegression():
    """ A simple logistic regression model with L2 regularization (zero-mean
    Gaussian priors on parameters). Labels are expected to be -1 or +1. """

    def __init__(self, x_train=None, y_train=None, x_test=None, y_test=None,
                 alpha=.1, synthetic=False):
        # Set L2 regularization strength
        self.alpha = alpha
        # Set the data.
        self.set_data(x_train, y_train, x_test, y_test)
        # Initialize parameters to zero, for lack of a better choice.
        self.betas = np.zeros(self.x_train.shape[1])

    def negative_lik(self, betas):
        return -1 * self.lik(betas)

    def lik(self, betas):
        """ Log-likelihood of the data under the given parameters. """
        # Data term: sum_i log sigmoid(y_i * betas . x_i), written as
        # -log(1 + exp(-margin)) for numerical stability.
        l = 0
        for i in range(self.n):
            margin = self.y_train[i] * np.dot(betas, self.x_train[i, :])
            l -= np.logaddexp(0.0, -margin)
        # Log-prior (L2 penalty); betas[0] is excluded from the penalty.
        # Uses the betas argument, not self.betas, so the optimizer sees it.
        for k in range(1, self.x_train.shape[1]):
            l -= (self.alpha / 2.0) * betas[k]**2
        return l

    def train(self):
        """ Define the gradient and hand it off to a scipy gradient-based
        optimizer. """
        # Derivative of the negative log-likelihood with respect to beta_k
        # (negated because the optimizer minimizes).
        dB_k = lambda B, k: (k > 0) * self.alpha * B[k] - np.sum([
            self.y_train[i] * self.x_train[i, k] *
            sigmoid(-self.y_train[i] * np.dot(B, self.x_train[i, :]))
            for i in range(self.n)])

        # The full gradient is just an array of componentwise derivatives
        dB = lambda B: np.array([dB_k(B, k)
                                 for k in range(self.x_train.shape[1])])
        # Optimize
        self.betas = fmin_bfgs(self.negative_lik, self.betas, fprime=dB)

    def set_data(self, x_train, y_train, x_test, y_test):
        """ Take data that's already been generated. """
        self.x_train = x_train
        self.y_train = y_train
        self.x_test = x_test
        self.y_test = y_test
        self.n = y_train.shape[0]
        self.n_test = y_test.shape[0]

    def training_reconstruction(self):
        """ Predicted P(y = +1) for each training point. """
        p_y1 = np.zeros(self.n)
        for i in range(self.n):
            p_y1[i] = sigmoid(np.dot(self.betas, self.x_train[i, :]))
        return p_y1

    def test_predictions(self):
        """ Predicted P(y = +1) for each test point. """
        p_y1 = np.zeros(self.n_test)
        for i in range(self.n_test):
            p_y1[i] = sigmoid(np.dot(self.betas, self.x_test[i, :]))
        return p_y1

    def plot_training_reconstruction(self):
        # .5 + .5 * y maps the -1/+1 labels onto 0/1 for plotting.
        plt.plot(np.arange(self.n), .5 + .5 * self.y_train, 'bo')
        plt.plot(np.arange(self.n), self.training_reconstruction(), 'rx')
        plt.ylim([-.1, 1.1])

    def plot_test_predictions(self):
        plt.plot(np.arange(self.n_test), .5 + .5 * self.y_test, 'yo')
        plt.plot(np.arange(self.n_test), self.test_predictions(), 'rx')
        plt.ylim([-.1, 1.1])


if __name__ == "__main__":
    # Create 20 dimensional data set with 25 points -- this will be
    # susceptible to overfitting.
    data = SyntheticClassifierData(25, 20)

    # Run for a variety of regularization strengths
    alphas = [0, .001, .01, .1]
    for j, a in enumerate(alphas):
        # Create a new learner, but use the same data for each run
        lr = LogisticRegression(x_train=data.X_train, y_train=data.Y_train,
                                x_test=data.X_test, y_test=data.Y_test,
                                alpha=a)

        print("Initial likelihood:")
        print(lr.lik(lr.betas))

        # Train the model
        lr.train()

        # Display execution info
        print("Final betas:")
        print(lr.betas)
        print("Final lik:")
        print(lr.lik(lr.betas))

        # Plot the results
        plt.subplot(len(alphas), 2, 2*j + 1)
        lr.plot_training_reconstruction()
        plt.ylabel("Alpha=%s" % a)
        if j == 0:
            plt.title("Training set reconstructions")

        plt.subplot(len(alphas), 2, 2*j + 2)
        lr.plot_test_predictions()
        if j == 0:
            plt.title("Test set predictions")
    plt.show()
```

## Softmax

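For $K$ classes, the softmax model generalizes logistic regression: each class $k$ has its own parameter vector $\theta_{k}$, and the exponentiated scores are normalized into probabilities:

$$
h_{\theta}(x)=\left[\begin{array}{c}{P(y=1 | x ; \theta)} \\ {P(y=2 | x ; \theta)} \\ {\vdots} \\ {P(y=K | x ; \theta)}\end{array}\right]=\frac{1}{\sum_{j=1}^{K} \exp \left(\theta_{j}^{T} x\right)}\left[\begin{array}{c}{\exp \left(\theta_{1}^{T} x\right)} \\ {\exp \left(\theta_{2}^{T} x\right)} \\ {\vdots} \\ {\exp \left(\theta_{K}^{T} x\right)}\end{array}\right]
$$

The loss is the negative log-likelihood over the $n$ training samples, where $\mathbf{1}\{\cdot\}$ is the indicator function:

$$
J(\theta)=-\left[\sum_{i=1}^{n} \sum_{k=1}^{K} \mathbf{1}\left\{y^{(i)}=k\right\} \ln \frac{\exp \left(\theta_{k}^{T} x^{(i)}\right)}{\sum_{j=1}^{K} \exp \left(\theta_{j}^{T} x^{(i)}\right)}\right]
$$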
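The softmax probabilities and the loss $J(\theta)$ can be sketched in NumPy as follows. The names `softmax` and `softmax_loss`, the parameter layout (one row of `Theta` per class) and the sample data are assumptions made for this example; class labels are 0-based in the code, whereas the formulas count from 1.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax. Subtracting the row maximum leaves the result
    unchanged but keeps exp from overflowing."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def softmax_loss(Theta, X, y):
    """J(theta) = -sum_i ln P(y_i | x_i; theta).

    Theta: (K, d) parameters, X: (n, d) inputs, y: (n,) labels in 0..K-1.
    """
    scores = X @ Theta.T
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # The indicator 1{y_i = k} simply selects the log-probability of the
    # true class for each sample.
    return -log_probs[np.arange(X.shape[0]), y].sum()

# Hypothetical setup: K = 3 classes, d = 2 features, n = 2 samples.
Theta = np.zeros((3, 2))
X = np.array([[1.0, 2.0],
              [0.5, -1.0]])
y = np.array([0, 2])

probs = softmax(X @ Theta.T)
loss = softmax_loss(Theta, X, y)
print(probs)  # all entries are 1/3 when Theta is zero
print(loss)   # equals 2 * ln(3) in that case
```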

