# Notes on Andrew Ng's Coursera Course "Neural Networks and Deep Learning" (2): Neural Network Basics, Logistic Regression

### 2. Logistic Regression

\hat y = w^Tx+b

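The Sigmoid function is a nonlinear, S-shaped function whose output is confined to the interval (0,1). It is commonly used as an activation function in neural networks. Logistic regression passes the linear output above through it, so the prediction becomes \hat y=\sigma(w^Tx+b). The Sigmoid function is defined as:

Sigmoid(z)=\frac{1}{1+e^{-z}}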

\sigma'(z)=\sigma(z)(1-\sigma(z))
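The derivative identity above can be sanity-checked numerically. The following is a minimal sketch using only the standard library; the helper names `sigmoid_prime` and `numeric_derivative` and the sample points are chosen here for illustration and do not come from the course.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    """Closed-form derivative sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def numeric_derivative(f, z, eps=1e-6):
    """Central finite difference, used only as a sanity check."""
    return (f(z + eps) - f(z - eps)) / (2.0 * eps)

# Compare the closed form with the finite-difference estimate.
for z in (-2.0, 0.0, 3.0):
    print(z, sigmoid_prime(z), numeric_derivative(sigmoid, z))
```

The two columns should agree to several decimal places, which is a quick way to catch sign or algebra mistakes before using the derivative in backpropagation.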

### 3. Logistic Regression Cost Function

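A loss function measures how close the predicted output \hat y is to the true label y. Squared error could be used, but combined with the Sigmoid output it makes the optimization problem non-convex, which hinders gradient descent from reaching the global optimum. We therefore construct a different loss function that is convex:

L(\hat y,y)=-(y\log \hat y+(1-y)\log (1-\hat y))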

J(w,b)=\frac1m\sum_{i=1}^mL(\hat y^{(i)},y^{(i)})=-\frac1m\sum_{i=1}^m[y^{(i)}\log \hat y^{(i)}+(1-y^{(i)})\log (1-\hat y^{(i)})]
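To make the loss L and the cost J concrete, here is a small sketch that evaluates both on a few made-up predictions. The values in `y_hats` and `ys` are hypothetical, and the code assumes every prediction lies strictly between 0 and 1 so that the logarithms are defined.

```python
import math

def loss(y_hat, y):
    """Cross-entropy loss for one example; y is 0 or 1, 0 < y_hat < 1."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def cost(y_hats, ys):
    """Cost J: the average loss over the m examples."""
    m = len(ys)
    return sum(loss(y_hat, y) for y_hat, y in zip(y_hats, ys)) / m

# Hypothetical predictions and labels, chosen only for illustration.
y_hats = [0.9, 0.2, 0.6]
ys = [1, 0, 1]

# A confident correct prediction costs less than a confident wrong one.
print(loss(0.9, 1), loss(0.1, 1))
print(cost(y_hats, ys))
```

When y = 1 only the first term is active and the loss shrinks as \hat y approaches 1; when y = 0 only the second term is active and the loss shrinks as \hat y approaches 0.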

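With the cost function derived, note that it is a function of the unknown parameters w and b. Our goal is to iteratively compute the best values of w and b, those that minimize the cost function and bring it as close to zero as possible. Gradient descent does this by repeatedly applying the following updates, where \alpha is the learning rate: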

w:=w-\alpha\frac{\partial J(w,b)}{\partial w}

b:=b-\alpha\frac{\partial J(w,b)}{\partial b}

### 6. More Derivative Examples

Andrew gives more complex examples of computing derivatives; they are omitted here.

### 8. Derivatives with a Computation Graph

\frac{\partial J}{\partial a}=\frac{\partial J}{\partial v}\cdot \frac{\partial v}{\partial a}=3\cdot 1=3
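The chain-rule product above can be traced in code. This sketch assumes the computation graph J = 3v with v = a + u and u = bc, which is consistent with the factors 3 and 1 shown above but is not spelled out in these notes; the input values are likewise illustrative.

```python
def forward(a, b, c):
    """Forward pass through the assumed graph: u = b*c, v = a+u, J = 3*v."""
    u = b * c
    v = a + u
    J = 3 * v
    return u, v, J

def backward(a, b, c):
    """Backward pass: apply the chain rule from J back to each input."""
    dJ_dv = 3.0            # J = 3v
    dJ_da = dJ_dv * 1.0    # v = a + u, so dv/da = 1
    dJ_du = dJ_dv * 1.0    # v = a + u, so dv/du = 1
    dJ_db = dJ_du * c      # u = b*c, so du/db = c
    dJ_dc = dJ_du * b      # u = b*c, so du/dc = b
    return dJ_da, dJ_db, dJ_dc

# Illustrative input values (assumed, not taken from the notes).
a, b, c = 5.0, 3.0, 2.0
print(forward(a, b, c))
print(backward(a, b, c))
```

The backward pass reuses `dJ_dv` for every upstream derivative, which is the reason derivatives are computed from right to left through the graph.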

### 9. Logistic Regression Gradient Descent

da=\frac{\partial L}{\partial a}=-\frac ya+\frac{1-y}{1-a}

dz=\frac{\partial L}{\partial z}=\frac{\partial L}{\partial a}\cdot \frac{\partial a}{\partial z}=(-\frac ya+\frac{1-y}{1-a})\cdot a(1-a)=a-y

dw_1=\frac{\partial L}{\partial w_1}=\frac{\partial L}{\partial z}\cdot \frac{\partial z}{\partial w_1}=x_1\cdot dz=x_1(a-y)

dw_2=\frac{\partial L}{\partial w_2}=\frac{\partial L}{\partial z}\cdot \frac{\partial z}{\partial w_2}=x_2\cdot dz=x_2(a-y)

db=\frac{\partial L}{\partial b}=\frac{\partial L}{\partial z}\cdot \frac{\partial z}{\partial b}=1\cdot dz=a-y

w_1:=w_1-\alpha\ dw_1

w_2:=w_2-\alpha\ dw_2

b:=b-\alpha\ db
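The derivation above, for a single example with two features, can be sketched as follows. The parameter values, the example (x1, x2, y), and the learning rate `alpha` are hypothetical; the code computes dz through the chain rule so it can be compared with the simplified form a - y.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def single_example_gradients(w1, w2, b, x1, x2, y):
    """Gradients of the loss L(a, y) for one example with two features."""
    z = w1 * x1 + w2 * x2 + b
    a = sigmoid(z)
    da = -y / a + (1 - y) / (1 - a)   # dL/da
    dz = da * a * (1 - a)             # chain rule; simplifies to a - y
    return a, dz, x1 * dz, x2 * dz, dz  # a, dz, dw1, dw2, db

# Hypothetical parameters, example, and learning rate.
w1, w2, b, alpha = 0.1, -0.2, 0.0, 0.5
x1, x2, y = 1.0, 2.0, 1

a, dz, dw1, dw2, db = single_example_gradients(w1, w2, b, x1, x2, y)
print(dz, a - y)   # the two values agree up to rounding

# One gradient descent step on this single example.
w1 -= alpha * dw1
w2 -= alpha * dw2
b -= alpha * db
```

In practice the simplified form dz = a - y is preferred, since it avoids dividing by a or 1 - a when the prediction is close to 0 or 1.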

### 10. Gradient Descent on m Examples

z^{(i)}=w^Tx^{(i)}+b

\hat y^{(i)}=a^{(i)}=\sigma(z^{(i)})

J(w,b)=\frac1m\sum_{i=1}^mL(\hat y^{(i)},y^{(i)})=-\frac1m\sum_{i=1}^m[y^{(i)}\log \hat y^{(i)}+(1-y^{(i)})\log (1-\hat y^{(i)})]

The partial derivatives of the cost function with respect to w and b can be written as averages over the m examples:

dw_1=\frac1m\sum_{i=1}^mx_1^{(i)}(a^{(i)}-y^{(i)})

dw_2=\frac1m\sum_{i=1}^mx_2^{(i)}(a^{(i)}-y^{(i)})

db=\frac1m\sum_{i=1}^m(a^{(i)}-y^{(i)})

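    import math

    def cost_and_gradients(w1, w2, b, x1, x2, y):
        """One pass over the m examples; returns J, dw1, dw2, db."""
        m = len(y)
        J = dw1 = dw2 = db = 0.0
        for i in range(m):
            z = w1 * x1[i] + w2 * x2[i] + b      # linear output for example i
            a = 1.0 / (1.0 + math.exp(-z))       # sigmoid(z)
            J += -(y[i] * math.log(a) + (1 - y[i]) * math.log(1 - a))
            dz = a - y[i]
            dw1 += x1[i] * dz
            dw2 += x2[i] * dz
            db += dz
        # Average the accumulated sums over the m examples.
        return J / m, dw1 / m, dw2 / m, db / m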

w_1:=w_1-\alpha\ dw_1

w_2:=w_2-\alpha\ dw_2

b:=b-\alpha\ db
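The loop and update rules above describe a single gradient descent step; training repeats that step many times. The following sketch puts the pieces together for an arbitrary number of features. The toy dataset, the learning rate `alpha`, the iteration count, and the function names `batch_gradients` and `train` are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def batch_gradients(w, b, X, Y):
    """Average cost J and gradients dw, db over the m examples in X, Y."""
    m, n = len(X), len(w)
    J, dw, db = 0.0, [0.0] * n, 0.0
    for x, y in zip(X, Y):
        a = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
        J += -(y * math.log(a) + (1 - y) * math.log(1 - a))
        dz = a - y
        for j in range(n):
            dw[j] += x[j] * dz
        db += dz
    return J / m, [g / m for g in dw], db / m

def train(X, Y, alpha=0.1, iterations=1000):
    """Repeat the updates w := w - alpha*dw and b := b - alpha*db."""
    w, b = [0.0] * len(X[0]), 0.0
    history = []  # cost before each update, to monitor progress
    for _ in range(iterations):
        J, dw, db = batch_gradients(w, b, X, Y)
        history.append(J)
        w = [wj - alpha * g for wj, g in zip(w, dw)]
        b -= alpha * db
    return w, b, history

# Toy data (hypothetical): the label is 1 when x1 + x2 is large.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [2.0, 3.0], [3.0, 2.0]]
Y = [0, 0, 0, 1, 1, 1]
w, b, history = train(X, Y)
print(history[0], history[-1])
```

With all parameters initialized to zero, every prediction starts at 0.5, so the first recorded cost equals log 2; later entries in `history` should be smaller as the parameters improve.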

