Andrew Ng's Deep Learning, L1W2 Assignment 1

Sparkle^

Last modified 2022-06-26 09:35:52

Included in the column: 知识锦囊

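Learning objectives

Use numpy, including function calls and vector/matrix operations
Understand the concept of "broadcasting"
Vectorize code

Important functions used:

math.exp()

np.exp()

ndarray.reshape(): reshapes an array

np.linalg.norm(x, axis = 1, keepdims = True): computes the norm of each row

np.outer(): computes the outer product

np.dot()

np.multiply()

np.abs()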

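1. Building basic functions with numpy

Key points to remember:

-np.exp(x) works on any np.array x and applies the exponential function to every element

-the sigmoid function and its gradient

-image2vector is commonly used in deep learning

-np.reshape is widely used. Keeping matrix/vector dimensions straight helps eliminate many bugs.

-numpy has efficient built-in functions

-broadcasting is very useful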

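1.1 The sigmoid function and np.exp()

Goal: understand why np.exp() is preferable to math.exp() when implementing the sigmoid function.

Note: the sigmoid function, sometimes called the logistic function, is a nonlinear function used in machine learning (logistic regression) as well as in deep learning.

(1) Building sigmoid with math.exp(); x must be a real number: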

import math

def basic_sigmoid(x):
    """
    Compute sigmoid of x.

    Arguments:
    x -- A scalar

    Return:
    s -- sigmoid(x)
    """
    
    ### START CODE HERE ### (≈ 1 line of code)
    s = 1/(1 + math.exp(-x))
    ### END CODE HERE ###
    
    return s
    
# Pass a real number to the function
basic_sigmoid(3)

Output:

0.9525741268224334

x = [1, 2, 3]
# Pass a list (standing in for a vector or matrix) to the function
basic_sigmoid(x)

Output:

TypeError: bad operand type for unary -: 'list'

(2) Building sigmoid with np.exp(); x can be a real number, a vector, or a matrix:

import numpy as np # this means you can access numpy functions by writing np.function() instead of numpy.function()

def sigmoid(x):
    """
    Compute the sigmoid of x

    Arguments:
    x -- A scalar or numpy array of any size

    Return:
    s -- sigmoid(x)
    """
    
    ### START CODE HERE ### (≈ 1 line of code)
    s = 1 / (1 + np.exp(-x))     # np.exp(-x) applies the exponential to every element of -x; 1 + np.exp(-x) relies on broadcasting
    ### END CODE HERE ###
    
    return s

x = np.array([1, 2, 3])
sigmoid(x)

Output:

array([0.73105858, 0.88079708, 0.95257413])
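
As a quick check of the point above, the following sketch applies the same formula to a scalar, a vector, and a matrix, and confirms that math.exp() rejects a multi-element array. The function is re-declared so the snippet runs on its own, and the sample inputs are arbitrary values chosen for illustration.

```python
import math
import numpy as np

def sigmoid(x):
    """Element-wise sigmoid for a scalar or a numpy array of any shape."""
    return 1 / (1 + np.exp(-x))

scalar_out = sigmoid(0.0)                                   # a single number
vector_out = sigmoid(np.array([1, 2, 3]))                   # shape (3,)
matrix_out = sigmoid(np.array([[0.0, 1.0], [-1.0, 2.0]]))   # shape (2, 2)

# math.exp() only accepts a single real number, so a multi-element array fails.
try:
    math.exp(np.array([1, 2, 3]))
    math_exp_failed = False
except TypeError:
    math_exp_failed = True

print(scalar_out, vector_out.shape, matrix_out.shape, math_exp_failed)
```

The output shape always matches the input shape, which is what lets the same one-line function serve for single values and for whole batches.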

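1.2 The gradient of sigmoid

Exercise: write a function sigmoid_derivative() that computes the gradient of the sigmoid function with respect to its input x. The formula is:

sigmoid_derivative(x) = σ′(x) = σ(x)(1 − σ(x))

This function is usually written in two steps:
1. Set s to the sigmoid of x. The sigmoid(x) function from the previous section is handy here.
2. Compute σ′(x) = s(1 − s)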

def sigmoid_derivative(x):
    """
    Compute the gradient (also called the slope or derivative) of the sigmoid function with respect to its input x.
    You can store the output of the sigmoid function into variables and then use it to calculate the gradient.
    
    Arguments:
    x -- A scalar or numpy array

    Return:
    ds -- Your computed gradient.
    """
    
    ### START CODE HERE ### (≈ 2 lines of code)
    s = sigmoid(x)       
    ds = s * (1 - s)               
    ### END CODE HERE ###
    
    return ds
x = np.array([1, 2, 3])
print ("sigmoid_derivative(x) = " + str(sigmoid_derivative(x)))

Output:

sigmoid_derivative(x) = [0.19661193 0.10499359 0.04517666]
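
One way to gain confidence in the formula σ′(x) = s(1 − s) is to compare it against a numerical derivative. The sketch below does this with a central difference; the step size eps is an arbitrary choice made for this illustration, not something taken from the assignment.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

x = np.array([1.0, 2.0, 3.0])
eps = 1e-6  # assumed step size for the finite difference

# Central difference: (f(x + eps) - f(x - eps)) / (2 * eps)
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
analytic = sigmoid_derivative(x)

max_gap = np.max(np.abs(numeric - analytic))
print("largest difference between numeric and analytic gradient:", max_gap)
```

If the two disagree by more than a tiny amount, the analytic formula or its implementation is likely wrong.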

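1.3 Reshaping arrays

Two numpy functions used frequently in deep learning are np.shape and np.reshape().

-X.shape gives the shape (dimensions) of a matrix/vector X.

-X.reshape(...) reshapes X into a different shape.

For example, in computer science an image is represented by a 3D array of shape (length, height, depth = 3). However, when you read an image as input to an algorithm, you convert it into a vector of shape (length∗height∗3, 1). In other words, you "unroll", or reshape, the 3D array into a 1D vector.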

def image2vector(image):
    """
    Argument:
    image -- a numpy array of shape (length, height, depth)
    
    Returns:
    v -- a vector of shape (length*height*depth, 1)
    """
    
    ### START CODE HERE ### (≈ 1 line of code)
    v = image.reshape(image.shape[0] * image.shape[1] * image.shape[2], 1)
    ### END CODE HERE ###
    
    return v
    
# This is a 3 by 3 by 2 array, typically images will be (num_px_x, num_px_y,3) where 3 represents the RGB values
image = np.array([[[ 0.67826139,  0.29380381],
        [ 0.90714982,  0.52835647],
        [ 0.4215251 ,  0.45017551]],

       [[ 0.92814219,  0.96677647],
        [ 0.85304703,  0.52351845],
        [ 0.19981397,  0.27417313]],

       [[ 0.60659855,  0.00533165],
        [ 0.10820313,  0.49978937],
        [ 0.34144279,  0.94630077]]])

print ("image2vector(image) = " + str(image2vector(image)))

Output:

image2vector(image) = [[0.67826139]
 [0.29380381]
 [0.90714982]
 [0.52835647]
 [0.4215251 ]
 [0.45017551]
 [0.92814219]
 [0.96677647]
 [0.85304703]
 [0.52351845]
 [0.19981397]
 [0.27417313]
 [0.60659855]
 [0.00533165]
 [0.10820313]
 [0.49978937]
 [0.34144279]
 [0.94630077]]
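
The product of the three dimensions does not have to be written out by hand. As a sketch, reshape(-1, 1) asks numpy to infer the number of rows from the array's size. The name image2vector_flat and the stand-in array are hypothetical, introduced only for this illustration.

```python
import numpy as np

def image2vector_flat(image):
    """Hypothetical variant of image2vector: -1 lets numpy infer the row count."""
    return image.reshape(-1, 1)

# Stand-in 3 x 3 x 2 "image" filled with 0.0 .. 17.0
image = np.arange(18, dtype=float).reshape(3, 3, 2)

v = image2vector_flat(image)
explicit = image.reshape(image.shape[0] * image.shape[1] * image.shape[2], 1)
same = np.array_equal(v, explicit)

print(v.shape, same)
```

Both forms produce the same column vector; the -1 form simply avoids repeating the dimension arithmetic.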

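1.4 Normalizing rows

Gradient descent converges faster after the data is normalized, which usually leads to better performance.

Here, normalization means changing x to x / ‖x‖, that is, dividing each row vector of x by its norm.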

def normalizeRows(x):
    """
    Implement a function that normalizes each row of the matrix x (to have unit length).
    
    Argument:
    x -- A numpy matrix of shape (n, m)
    
    Returns:
    x -- The normalized (by row) numpy matrix. You are allowed to modify x.
    """
    
    ### START CODE HERE ### (≈ 2 lines of code)
    # Compute x_norm as the norm 2 of x. Use np.linalg.norm(..., ord = 2, axis = ..., keepdims = True)
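    x_norm = np.linalg.norm(x, axis = 1, keepdims = True)    # norm along the horizontal axis, i.e. one norm per row; shape (2, 1) for the example below
    
    # Divide x by its norm.
    x = x / x_norm      # normalizes every element; shape (2, 3) for the example below, via broadcasting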
    ### END CODE HERE ###

    return x

x = np.array([
    [0, 3, 4],
    [1, 6, 4]])
print("normalizeRows(x) = " + str(normalizeRows(x)))

Output:

normalizeRows(x) = [[0.         0.6        0.8       ]
 [0.13736056 0.82416338 0.54944226]]
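
The sketch below checks two things about the row normalization: every row of the result has length 1, and keepdims = True is what makes the division broadcast. It reuses the sample matrix from above; the variable names are chosen for this illustration.

```python
import numpy as np

x = np.array([[0, 3, 4],
              [1, 6, 4]])

x_norm = np.linalg.norm(x, axis=1, keepdims=True)   # shape (2, 1): one norm per row
normalized = x / x_norm                             # (2, 3) / (2, 1) broadcasts across columns
row_lengths = np.linalg.norm(normalized, axis=1)    # each entry should be 1.0

# Without keepdims the norms have shape (2,), which cannot broadcast against (2, 3).
flat_norm = np.linalg.norm(x, axis=1)
try:
    x / flat_norm
    broadcast_ok = True
except ValueError:
    broadcast_ok = False

print(x_norm.shape, row_lengths, broadcast_ok)
```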

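1.5 Broadcasting and the softmax function

Broadcasting in numpy is very useful for performing mathematical operations between arrays of different shapes.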
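
As a small illustration of what broadcasting does, the sketch below adds a row vector, a column vector, and a scalar to the same matrix. All values are made up for the example.

```python
import numpy as np

matrix = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])     # shape (2, 3)
row = np.array([10.0, 20.0, 30.0])       # shape (3,)
column = np.array([[100.0], [200.0]])    # shape (2, 1)

plus_row = matrix + row        # the row is applied to each of the 2 rows
plus_column = matrix + column  # the column is applied across the 3 columns
plus_scalar = matrix + 1.0     # the scalar is applied to every element

print(plus_row)
print(plus_column)
print(plus_scalar)
```

In each case numpy behaves as if the smaller operand had been copied out to shape (2, 3), without actually making those copies.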

The softmax function can be thought of as a normalizing function used when an algorithm needs to classify two or more classes.

def softmax(x):
    """Calculates the softmax for each row of the input x.

    Your code should work for a row vector and also for matrices of shape (n, m).

    Argument:
    x -- A numpy matrix of shape (n,m)

    Returns:
    s -- A numpy matrix equal to the softmax of x, of shape (n,m)
    """
    
    ### START CODE HERE ### (≈ 3 lines of code)
    # Apply exp() element-wise to x. Use np.exp(...).
    x_exp = np.exp(x)

    # Create a vector x_sum that sums each row of x_exp. Use np.sum(..., axis = 1, keepdims = True).
    x_sum = np.sum(x_exp, axis = 1, keepdims = True)
    
    # Compute softmax(x) by dividing x_exp by x_sum. It should automatically use numpy broadcasting.
    s = x_exp / x_sum

    ### END CODE HERE ###
    
    return s
    
x = np.array([
    [9, 2, 5, 0, 0],
    [7, 5, 0, 0 ,0]])
print("softmax(x) = " + str(softmax(x)))

Output:

softmax(x) = [[9.80897665e-01 8.94462891e-04 1.79657674e-02 1.21052389e-04
  1.21052389e-04]
 [8.78679856e-01 1.18916387e-01 8.01252314e-04 8.01252314e-04
  8.01252314e-04]]
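
Two properties of softmax are worth checking: each row of the output sums to 1, and the result does not change if a constant is subtracted from every entry in a row. The second property is the basis of a common numerically stable variant, sketched below. This variant is not part of the assignment; softmax_stable is a hypothetical name used for illustration.

```python
import numpy as np

def softmax(x):
    x_exp = np.exp(x)
    return x_exp / np.sum(x_exp, axis=1, keepdims=True)

def softmax_stable(x):
    """Hypothetical variant: subtract each row's max before exponentiating."""
    shifted = x - np.max(x, axis=1, keepdims=True)   # largest entry per row becomes 0
    x_exp = np.exp(shifted)
    return x_exp / np.sum(x_exp, axis=1, keepdims=True)

x = np.array([[9, 2, 5, 0, 0],
              [7, 5, 0, 0, 0]])

row_sums = softmax(x).sum(axis=1)                     # each row sums to 1
agree = np.allclose(softmax(x), softmax_stable(x))    # same result on moderate inputs

# np.exp(1000.0) overflows to inf in float64, so only the shifted version is used here.
large = np.array([[1000.0, 1001.0, 1002.0]])
stable_large = softmax_stable(large)

print(row_sums, agree, stable_large)
```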

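2. Vectorization: keeping computation efficient

Key points to remember:

-Vectorization is very important in deep learning; it makes code both more efficient and clearer.

-Understand the L1 and L2 loss functions.

-Be familiar with many numpy functions, such as np.sum, np.dot, np.multiply, np.maximum, etc.

Exercise: compare non-vectorized and vectorized implementations of the dot, outer, and elementwise products.

(1) Non-vectorized implementation: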

import time

x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]

### CLASSIC DOT PRODUCT OF VECTORS IMPLEMENTATION ###
tic = time.process_time()
dot = 0
for i in range(len(x1)):
    dot+= x1[i]*x2[i]
toc = time.process_time()
print ("dot = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### CLASSIC OUTER PRODUCT IMPLEMENTATION ###
tic = time.process_time()
outer = np.zeros((len(x1),len(x2))) # we create a len(x1)*len(x2) matrix with only zeros
for i in range(len(x1)):
    for j in range(len(x2)):
        outer[i,j] = x1[i]*x2[j]
toc = time.process_time()
print ("outer = " + str(outer) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### CLASSIC ELEMENTWISE IMPLEMENTATION ###
tic = time.process_time()
mul = np.zeros(len(x1))
for i in range(len(x1)):
    mul[i] = x1[i]*x2[i]
toc = time.process_time()
print ("elementwise multiplication = " + str(mul) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### CLASSIC GENERAL DOT PRODUCT IMPLEMENTATION ###
W = np.random.rand(3,len(x1)) # Random 3*len(x1) numpy array
tic = time.process_time()
gdot = np.zeros(W.shape[0])
for i in range(W.shape[0]):
    for j in range(len(x1)):
        gdot[i] += W[i,j]*x1[j]
toc = time.process_time()
print ("gdot = " + str(gdot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

Output:

dot = 278
 ----- Computation time = 0.08447300000002933ms
outer = [[81. 18. 18. 81.  0. 81. 18. 45.  0.  0. 81. 18. 45.  0.  0.]
 [18.  4.  4. 18.  0. 18.  4. 10.  0.  0. 18.  4. 10.  0.  0.]
 [45. 10. 10. 45.  0. 45. 10. 25.  0.  0. 45. 10. 25.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [63. 14. 14. 63.  0. 63. 14. 35.  0.  0. 63. 14. 35.  0.  0.]
 [45. 10. 10. 45.  0. 45. 10. 25.  0.  0. 45. 10. 25.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [81. 18. 18. 81.  0. 81. 18. 45.  0.  0. 81. 18. 45.  0.  0.]
 [18.  4.  4. 18.  0. 18.  4. 10.  0.  0. 18.  4. 10.  0.  0.]
 [45. 10. 10. 45.  0. 45. 10. 25.  0.  0. 45. 10. 25.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]]
 ----- Computation time = 0.22404599999992225ms
elementwise multiplication = [81.  4. 10.  0.  0. 63. 10.  0.  0.  0. 81.  4. 25.  0.  0.]
 ----- Computation time = 0.10582699999994727ms
gdot = [26.19713459 12.20793127 23.40980652]
 ----- Computation time = 0.15482099999997168ms

(2) Vectorized implementation:

x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]

### VECTORIZED DOT PRODUCT OF VECTORS ###
tic = time.process_time()
dot = np.dot(x1,x2)
toc = time.process_time()
print ("dot = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### VECTORIZED OUTER PRODUCT ###
tic = time.process_time()
outer = np.outer(x1,x2)
toc = time.process_time()
print ("outer = " + str(outer) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### VECTORIZED ELEMENTWISE MULTIPLICATION ###
tic = time.process_time()
mul = np.multiply(x1,x2)
toc = time.process_time()
print ("elementwise multiplication = " + str(mul) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### VECTORIZED GENERAL DOT PRODUCT ###
tic = time.process_time()
dot = np.dot(W,x1)
toc = time.process_time()
print ("gdot = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

Output:

dot = 278
 ----- Computation time = 0.0ms
outer = [[81 18 18 81  0 81 18 45  0  0 81 18 45  0  0]
 [18  4  4 18  0 18  4 10  0  0 18  4 10  0  0]
 [45 10 10 45  0 45 10 25  0  0 45 10 25  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [63 14 14 63  0 63 14 35  0  0 63 14 35  0  0]
 [45 10 10 45  0 45 10 25  0  0 45 10 25  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [81 18 18 81  0 81 18 45  0  0 81 18 45  0  0]
 [18  4  4 18  0 18  4 10  0  0 18  4 10  0  0]
 [45 10 10 45  0 45 10 25  0  0 45 10 25  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]]
 ----- Computation time = 0.0ms
elementwise multiplication = [81  4 10  0  0 63 10  0  0  0 81  4 25  0  0]
 ----- Computation time = 0.0ms
gdot = [ 21.57937154  22.58814194  13.70092277]
 ----- Computation time = 0.0ms

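Summary: the vectorized implementation is more concise and more efficient, and the gap in running time widens for larger vectors/matrices. (The two gdot results above differ because W is random and the two outputs come from different draws of W.)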
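
With only 15 elements the timings above are too small to compare meaningfully. The sketch below repeats the dot-product comparison on longer vectors. The vector length, the random seed, and the use of time.perf_counter() (a higher-resolution clock intended for short intervals) are choices made for this illustration; the measured times will vary by machine.

```python
import time
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the data is reproducible
n = 200_000                      # assumed vector length for the comparison
a = rng.random(n)
b = rng.random(n)

# Explicit Python loop
tic = time.perf_counter()
loop_dot = 0.0
for i in range(n):
    loop_dot += a[i] * b[i]
loop_ms = 1000 * (time.perf_counter() - tic)

# Vectorized version
tic = time.perf_counter()
vec_dot = np.dot(a, b)
vec_ms = 1000 * (time.perf_counter() - tic)

print("loop:       dot = " + str(loop_dot) + ", time = " + str(loop_ms) + "ms")
print("vectorized: dot = " + str(vec_dot) + ", time = " + str(vec_ms) + "ms")
```

Both versions compute the same value up to floating-point rounding; only the running time differs.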

2.1 Numpy vectorized versions of the L1 and L2 loss functions
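
The docstrings in this section refer to loss functions "defined above", but the definitions are not reproduced in this write-up. Judging from the code, they are presumably the following, where m is the length of the vectors, \hat{y}^{(i)} is the i-th predicted value, and y^{(i)} is the i-th true label:

```latex
L_1(\hat{y}, y) = \sum_{i=1}^{m} \left| y^{(i)} - \hat{y}^{(i)} \right|

L_2(\hat{y}, y) = \sum_{i=1}^{m} \left( y^{(i)} - \hat{y}^{(i)} \right)^2
```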

# GRADED FUNCTION: L1

def L1(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)
    
    Returns:
    loss -- the value of the L1 loss function defined above
    """
    
    ### START CODE HERE ### (≈ 1 line of code)
    loss = np.sum(np.abs(y - yhat))
    ### END CODE HERE ###
    
    return loss

yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("L1 = " + str(L1(yhat,y)))

Output:

L1 = 1.1

# GRADED FUNCTION: L2

def L2(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)
    
    Returns:
    loss -- the value of the L2 loss function defined above
    """
    
    ### START CODE HERE ### (≈ 1 line of code)
    loss = np.dot((y - yhat),(y - yhat).T)   # y and yhat are 1-D arrays here, so .T has no effect and np.dot() returns the inner product, i.e. the sum of squared differences
    ### END CODE HERE ###
    
    return loss

yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("L2 = " + str(L2(yhat,y)))

Output:

L2 = 0.43

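My question: why does the line loss = np.dot((y - yhat),(y - yhat).T) transpose (y - yhat)? Removing the transpose gives the same result. Why?

After looking it up: np.dot() has two main behaviors: 1. vector dot product 2. matrix multiplication

1. Vector dot product (both arguments are 1-D arrays, so np.dot() returns their inner product as a scalar):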

import numpy as np
a = np.array([1,2,3,4,5])
b = np.array([6,7,8,9,10])
print(np.dot(a,b))

Output:

130

In np.dot(a,b), both a and b are 1-D vectors, so the operation is a vector dot product.

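2. Matrix multiplication (when one argument is a 2-D array, np.dot() performs a matrix product; here, a matrix-vector product):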

import numpy as np 
a = np.random.randint(0,10, size = (5,5))
b = np.array([1,2,3,4,5])
print("the shape of a is " + str(a.shape))
print("the shape of b is " + str(b.shape))
print(np.dot(a, b))

Output:
the shape of a is (5, 5)
the shape of b is (5,)
[42 85 50 81 76]

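So loss = np.dot((y - yhat),(y - yhat)) is a vector dot product. Writing loss = np.dot((y - yhat),(y - yhat).T) reads like a matrix multiplication, but y - yhat is a 1-D array and transposing a 1-D array returns it unchanged, so np.dot() still receives two 1-D arrays and computes the same dot product. The two forms differ only in how they read; the result is identical.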
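
The sketch below makes the role of the transpose concrete. For a 1-D array, .T returns an array of the same shape, so it cannot change the result. The transpose only starts to matter once the data is stored as a 2-D row or column vector. The variable names are chosen for this illustration.

```python
import numpy as np

yhat = np.array([0.9, 0.2, 0.1, 0.4, 0.9])
y = np.array([1, 0, 0, 1, 1])
diff = y - yhat                          # 1-D array, shape (5,)

same_shape = diff.T.shape == diff.shape  # .T leaves a 1-D array unchanged
with_T = np.dot(diff, diff.T)            # inner product
without_T = np.dot(diff, diff)           # the same inner product

row = diff.reshape(1, -1)                # a true 2-D row vector, shape (1, 5)
inner = np.dot(row, row.T)               # (1, 5) x (5, 1) -> shape (1, 1)
outer = np.dot(row.T, row)               # (5, 1) x (1, 5) -> shape (5, 5)

try:
    np.dot(row, row)                     # (1, 5) x (1, 5): inner dimensions do not match
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(same_shape, with_T, without_T, inner.shape, outer.shape, mismatch_raised)
```

With 2-D inputs the transpose is required and its position decides between a 1 x 1 inner product and a 5 x 5 outer product, which is likely why the transposed form appears in code written with matrices in mind.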

Reference:

吴恩达《深度学习》L1W2作业1 (Andrew Ng's Deep Learning, L1W2 Assignment 1) - Heywhale.com

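Original-content statement: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.

In case of infringement, contact cloudcommunity@tencent.com for removal.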
