
Stanford CS231n Project Practice (4): Shallow Neural Networks


Copyright notice: This is an original post by the author and may not be reproduced without permission. https://blog.csdn.net/red_stone1/article/details/79310173

My website: 红色石头的机器学习之路. My CSDN: 红色石头的专栏. My Zhihu: 红色石头. My Weibo: RedstoneWill. My GitHub: RedstoneWill. My WeChat public account: 红色石头的机器学习之路 (ID: redstonewill).

A neural network is a nonlinear classifier; unlike the linear classifiers introduced earlier (SVM and Softmax), it is somewhat more complex. The structure of the simplest multi-class shallow neural network is shown below:

A neural network computation consists of two stages: forward propagation and backward propagation. During the forward pass, each hidden-layer neuron acts like a linear classifier followed by a ReLU activation; the hidden-layer outputs then feed into the output layer, where Softmax picks the highest-probability label as the predicted class. During the backward pass, gradient descent is applied: we compute the gradients of the loss with respect to the weights W and biases b of each layer. The general approach is to build a computational graph and compute the gradients with the chain rule, for example:

$$\frac{\partial f}{\partial x}=\frac{\partial f}{\partial q}\cdot \frac{\partial q}{\partial x}$$

Take the function $f(x)=\frac{1}{1+\exp(-(w_0x_0+w_1x_1+w_2))}$ as an example to illustrate the computational graph:

In the computational graph, green marks the forward pass and red marks the backward pass in which the gradients are computed.
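
To make the chain-rule bookkeeping concrete, here is a minimal NumPy sketch of the forward and backward pass through this graph. This snippet is not part of the original post; the input values are arbitrary and chosen only for illustration.

import numpy as np

# Example inputs (arbitrary values, for illustration only)
w0, x0 = 2.0, -1.0
w1, x1 = -3.0, -2.0
w2 = -3.0

# Forward pass: evaluate the graph node by node
q = w0 * x0 + w1 * x1 + w2          # linear combination
f = 1.0 / (1.0 + np.exp(-q))        # sigmoid: f = 1 / (1 + exp(-q))

# Backward pass: apply the chain rule from the output back to each input
df_dq = (1.0 - f) * f               # derivative of the sigmoid with respect to q
dw0 = df_dq * x0                    # df/dw0 = df/dq * dq/dw0
dx0 = df_dq * w0
dw1 = df_dq * x1
dx1 = df_dq * w1
dw2 = df_dq * 1.0

print('f = %.4f' % f)
print('gradients:', dw0, dx0, dw1, dx1, dw2)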

I have already covered shallow neural networks in detail in an earlier set of notes: Coursera Andrew Ng "Neural Networks and Deep Learning" course notes (4) – Shallow Neural Networks,

so I won't repeat the details here.

Below is example code for a shallow neural network; for the complete code for this article, please see my:

1. Download the CIFAR10 dataset and load it

# Load the raw CIFAR-10 data.
cifar10_dir = 'CIFAR10/datasets/cifar-10-batches-py'
X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

# As a sanity check, we print out the size of the training and test data.
print('Training data shape: ', X_train.shape)
print('Training labels shape: ', y_train.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
Output:
Training data shape:  (50000, 32, 32, 3)
Training labels shape:  (50000,)
Test data shape:  (10000, 32, 32, 3)
Test labels shape:  (10000,)

Show some CIFAR10 images

classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = len(classes)
num_each_class = 7

for y, cls in enumerate(classes):
    idxs = np.flatnonzero(y_train == y)
    idxs = np.random.choice(idxs, num_each_class, replace=False)
    for i, idx in enumerate(idxs):
        plt_idx = i * num_classes + (y + 1)
        plt.subplot(num_each_class, num_classes, plt_idx)
        plt.imshow(X_train[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls)
plt.show()

Subsample the data for more efficient code execution

# Split the data into train, val, test sets and dev sets
num_train = 49000
num_val = 1000
num_test = 1000
num_dev = 500

# Validation set
mask = range(num_train, num_train + num_val)
X_val = X_train[mask]
y_val = y_train[mask]

# Train set
mask = range(num_train)
X_train = X_train[mask]
y_train = y_train[mask]

# Test set
mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]
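
# num_dev is defined above but not used in the original snippet. Optionally, a small
# development set can be sampled from the (already re-sliced) training set -- a sketch
# following the usual CS231n setup, not part of the original code:
mask = np.random.choice(num_train, num_dev, replace=False)
X_dev = X_train[mask]
y_dev = y_train[mask]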

print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
Output:
Train data shape:  (49000, 32, 32, 3)
Train labels shape:  (49000,)
Validation data shape:  (1000, 32, 32, 3)
Validation labels shape  (1000,)
Test data shape:  (1000, 32, 32, 3)
Test labels shape:  (1000,)

2. Preprocessing

Reshape the image data into rows

# Preprocessing: reshape the images data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))

print('Train data shape: ', X_train.shape)
print('Validation data shape: ', X_val.shape)
print('Test data shape: ', X_test.shape)
Output:
Train data shape:  (49000, 3072)
Validation data shape:  (1000, 3072)
Test data shape:  (1000, 3072)

Subtract the mean image

# Preprocessing: subtract the mean image
mean_image = np.mean(X_train, axis=0)
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image

3. Define a two-layer neural network

class TwoLayerNet(object):
    def __init__(self, input_size, hidden_size, output_size, std = 1e-4):
        """
        Initialize the model weights
        W1: First layer weights; has shape (D, H)
        b1: First layer biases; has shape (H,)
        W2: Second layer weights; has shape (H, C)
        b2: Second layer biases; has shape (C,)
        Inputs:
        - input_size: The dimension D of the input data.
        - hidden_size: The number of neurons H in the hidden layer.
        - output_size: The number of classes C.
        """
        self.params = {}
        self.params['W1'] = std * np.random.randn(input_size, hidden_size)
        self.params['b1'] = np.zeros(hidden_size)
        self.params['W2'] = std * np.random.randn(hidden_size, output_size)
        self.params['b2'] = np.zeros(output_size)

    def loss(self, X, y, reg = 0.0):
        """
        Two-layer network loss function, vectorized implementation (without loops).
        Inputs:
        - X: A numpy array of shape (num_train, D) containing the training data:
          num_train samples, each of dimension D
        - y: A numpy array of shape (num_train,) containing the training labels,
          where y[i] is the label of X[i]
        - reg: float, regularization strength
        Return:
        - loss: the loss value computed on this batch of data
        - grads: Dictionary mapping parameter names to gradients of those parameters
          with respect to the loss function; has the same keys as self.params
          ('W1', 'b1', 'W2', 'b2')
        """
        N, dim = X.shape
        grads = {}

        # input layer ==> hidden layer ==> ReLU ==> output layer ==> Softmax
        W1 = self.params['W1']
        b1 = self.params['b1']
        W2 = self.params['W2']
        b2 = self.params['b2']

        # input layer==> hidden layer
        Z1 = np.dot(X, W1) + b1
        # hidden layer ==> ReLU
        A1 = np.maximum(0, Z1)    # ReLU function
        # ReLU ==> output layer
        scores = np.dot(A1, W2) + b2
        # output layer ==> Softmax
        scores_shift = scores - np.max(scores, axis=1).reshape(-1, 1)
        Softmax_output = np.exp(scores_shift) / np.sum(np.exp(scores_shift), axis=1).reshape(-1, 1)
        loss = -np.sum(np.log(Softmax_output[range(N), list(y)]))
        loss /= N
        loss += 0.5 * reg * np.sum(W1 * W1) + 0.5 * reg * np.sum(W2 * W2)

        # grads
        dS = Softmax_output.copy()
        dS[range(N), list(y)] += -1
        dS /= N
        dW2 = np.dot(A1.T, dS)
        db2 = np.sum(dS, axis=0)
        dA1 = np.dot(dS, W2.T)
        dZ1 = dA1 * (A1 > 0)
        dW1 = np.dot(X.T, dZ1)
        db1 = np.sum(dZ1, axis=0)
        dW2 += reg * W2
        dW1 += reg * W1

        grads['W1'] = dW1
        grads['b1'] = db1
        grads['W2'] = dW2
        grads['b2'] = db2

        return loss, grads

    def predict(self, X):
        """
        Use the trained weights to predict data labels
        Inputs:
        - X: A numpy array of shape (num_test, D) containing the test data
        Outputs:
        - y_pred: A numpy array, predicted labels for the data in X
        """
        W1 = self.params['W1']
        b1 = self.params['b1']
        W2 = self.params['W2']
        b2 = self.params['b2']

        Z1 = np.dot(X, W1) + b1
        A1 = np.maximum(0, Z1)    # ReLU function
        scores = np.dot(A1, W2) + b2
        y_pred = np.argmax(scores, axis=1)

        return y_pred

    def train(self, X, y, X_val, y_val, learning_rate=1e-3, learning_rate_decay=0.95,
            reg=5e-6, num_iters=100, batch_size=200, print_flag=False):
        """
        Train Two-layer neural network classifier using SGD
        Inputs:
        - X: A numpy array of shape (num_train, D) containing the training data:
          num_train samples, each of dimension D
        - y: A numpy array of shape (num_train,) containing the training labels,
          where y[i] is the label of X[i]; y[i] = c with 0 <= c < C
        - X_val: A numpy array of shape (num_val, D) containing the validation data:
          num_val samples, each of dimension D
        - y_val: A numpy array of shape (num_val,) containing the validation labels,
          where y_val[i] is the label of X_val[i]; y_val[i] = c with 0 <= c < C
        - learning_rate: (float) learning rate for optimization
        - learning_rate_decay: Scalar giving the factor used to decay the learning rate
          after each epoch.
        - reg: (float) regularization strength
        - num_iters: (integer) number of steps to take during optimization
        - batch_size: (integer) number of training examples to use at each step
        - print_flag: (boolean) If true, print progress during optimization
        Outputs:
        - a dictionary containing the loss_history, train_accuracy_history and
          val_accuracy_history
        """
        num_train = X.shape[0]
        iterations_per_epoch = max(num_train / batch_size, 1)
        loss_history = []
        train_accuracy_history = []
        val_accuracy_history = []

        for t in range(num_iters):
            idx_batch = np.random.choice(num_train, batch_size, replace=True)
            X_batch = X[idx_batch]
            y_batch = y[idx_batch]
            loss, grads = self.loss(X_batch, y_batch, reg)
            loss_history.append(loss)
            self.params['W1'] += -learning_rate * grads['W1']
            self.params['b1'] += -learning_rate * grads['b1']
            self.params['W2'] += -learning_rate * grads['W2']
            self.params['b2'] += -learning_rate * grads['b2']

            # Every epoch, check train and val accuracy and decay learning rate.
            if t % iterations_per_epoch == 0:
                train_accuracy = np.mean(self.predict(X_batch) == y_batch)
                val_accuracy = np.mean(self.predict(X_val) == y_val)
                train_accuracy_history.append(train_accuracy)
                val_accuracy_history.append(val_accuracy)

                # Decay learning rate
                learning_rate *= learning_rate_decay

            # print the progress during optimization
            if print_flag and t%100 == 0:
                print('iteration %d / %d: loss %f' % (t, num_iters, loss))

        return {
            'loss_history': loss_history,
            'train_accuracy_history': train_accuracy_history,
            'val_accuracy_history': val_accuracy_history,
        }
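
Before training on the real data, it is worth sanity-checking the analytic gradients returned by loss() against numerical gradients. The snippet below is not part of the original post; it is a minimal sketch using centered finite differences, and the toy sizes and the rel_error helper are assumptions chosen purely for illustration.

import numpy as np

def rel_error(x, y):
    """Relative error, robust to values near zero."""
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))

# Build a tiny network and a tiny random batch (toy sizes, for checking only)
np.random.seed(0)
toy_net = TwoLayerNet(input_size=4, hidden_size=10, output_size=3, std=1e-1)
X_toy = np.random.randn(5, 4)
y_toy = np.array([0, 1, 2, 2, 1])

loss, grads = toy_net.loss(X_toy, y_toy, reg=0.05)

# Numerically estimate the gradient of each parameter with central differences
h = 1e-5
for name in ['W1', 'b1', 'W2', 'b2']:
    param = toy_net.params[name]
    num_grad = np.zeros_like(param)
    it = np.nditer(param, flags=['multi_index'])
    while not it.finished:
        ix = it.multi_index
        old = param[ix]
        param[ix] = old + h
        loss_plus, _ = toy_net.loss(X_toy, y_toy, reg=0.05)
        param[ix] = old - h
        loss_minus, _ = toy_net.loss(X_toy, y_toy, reg=0.05)
        param[ix] = old
        num_grad[ix] = (loss_plus - loss_minus) / (2 * h)
        it.iternext()
    # The relative error should be small (e.g. < 1e-6) if the analytic gradient is right
    print('%s max relative error: %e' % (name, rel_error(grads[name], num_grad)))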

4. Stochastic Gradient Descent

input_size = 32 * 32 * 3
hidden_size = 50
num_classes = 10
net = TwoLayerNet(input_size, hidden_size, num_classes)

# Train the network
stats = net.train(X_train, y_train, X_val, y_val,
            num_iters=1000, batch_size=200,
            learning_rate=1e-4, learning_rate_decay=0.95,
            reg=0.25, print_flag=True)

# Predict on the validation set
val_acc = (net.predict(X_val) == y_val).mean()
print('Validation accuracy: ', val_acc)
Output:
iteration 0 / 1000: loss 2.302782
iteration 100 / 1000: loss 2.302411
iteration 200 / 1000: loss 2.299242
iteration 300 / 1000: loss 2.272836
iteration 400 / 1000: loss 2.207163
iteration 500 / 1000: loss 2.081039
iteration 600 / 1000: loss 2.102869
iteration 700 / 1000: loss 2.099161
iteration 800 / 1000: loss 2.041299
iteration 900 / 1000: loss 2.010855
Validation accuracy:  0.286
# Plot the loss_history
plt.figure(figsize=(10,12))
plt.subplot(2, 1, 1)
plt.plot(stats['loss_history'])
plt.title('loss history')
plt.xlabel('Iteration number')
plt.ylabel('loss value')

# Plot the train_accuracy_history & val_accuracy_history
plt.subplot(2, 1, 2)
plt.plot(stats['train_accuracy_history'], label='train')
plt.plot(stats['val_accuracy_history'], label='val')
plt.title('Classification accuracy history')
plt.xlabel('Epoch')
plt.ylabel('Classification accuracy')
plt.legend()
plt.show()

5. Tune your hyperparameters

# Hyperparameters
learning_rates = [1e-4, 5e-4, 9e-4, 13e-4, 15e-4]
regularization_strengths = [0.25, 0.5, 0.75, 1.0]
num_iters = 3000
batch_size = 200
learning_rate_decay = 0.98

# Net structure
input_size = 32 * 32 * 3
hidden_size = [50, 100, 150]
num_classes = 10

# Initialization
best_net = None
best_hidden_size = None
best_val = -1
best_lr = None
best_reg = None
results = {}

# Train the two layers network
for i in range(len(hidden_size)):
    for lr in learning_rates:
        for reg in regularization_strengths:
            net = TwoLayerNet(input_size, hidden_size[i], num_classes)
            stats = net.train(X_train, y_train, X_val, y_val,
                num_iters=num_iters, batch_size=batch_size,
                learning_rate=lr, learning_rate_decay=learning_rate_decay,
                reg=reg, print_flag=False)
            train_accuracy = stats['train_accuracy_history'][-1]
            val_accuracy = stats['val_accuracy_history'][-1]
            if val_accuracy > best_val:
                best_lr = lr
                best_reg = reg
                best_val = val_accuracy
                best_net = net
                best_hidden_size = hidden_size[i]
            # Key results by (hidden_size, lr, reg) so settings from different
            # hidden sizes do not overwrite each other
            results[(hidden_size[i], lr, reg)] = train_accuracy, val_accuracy
            print('hidden_size: %d lr: %e reg: %e train accuracy: %f val accuracy: %f' %
                  (hidden_size[i], lr, reg, train_accuracy, val_accuracy))
print('Best hidden_size: %d\nBest lr: %e\nBest reg: %e\ntrain accuracy: %f\nval accuracy: %f' %
      (best_hidden_size, best_lr, best_reg,
       results[(best_hidden_size, best_lr, best_reg)][0],
       results[(best_hidden_size, best_lr, best_reg)][1]))
Output:
hidden_size: 50 lr: 1.000000e-04 reg: 2.500000e-01 train accuracy: 0.410000 val accuracy: 0.407000
hidden_size: 50 lr: 1.000000e-04 reg: 5.000000e-01 train accuracy: 0.415000 val accuracy: 0.399000
hidden_size: 50 lr: 1.000000e-04 reg: 7.500000e-01 train accuracy: 0.370000 val accuracy: 0.400000
hidden_size: 50 lr: 1.000000e-04 reg: 1.000000e+00 train accuracy: 0.420000 val accuracy: 0.403000
hidden_size: 50 lr: 5.000000e-04 reg: 2.500000e-01 train accuracy: 0.595000 val accuracy: 0.494000
hidden_size: 50 lr: 5.000000e-04 reg: 5.000000e-01 train accuracy: 0.595000 val accuracy: 0.505000
hidden_size: 50 lr: 5.000000e-04 reg: 7.500000e-01 train accuracy: 0.495000 val accuracy: 0.483000
hidden_size: 50 lr: 5.000000e-04 reg: 1.000000e+00 train accuracy: 0.600000 val accuracy: 0.475000
hidden_size: 50 lr: 9.000000e-04 reg: 2.500000e-01 train accuracy: 0.670000 val accuracy: 0.496000
hidden_size: 50 lr: 9.000000e-04 reg: 5.000000e-01 train accuracy: 0.675000 val accuracy: 0.510000
hidden_size: 50 lr: 9.000000e-04 reg: 7.500000e-01 train accuracy: 0.610000 val accuracy: 0.485000
hidden_size: 50 lr: 9.000000e-04 reg: 1.000000e+00 train accuracy: 0.635000 val accuracy: 0.487000
hidden_size: 50 lr: 1.300000e-03 reg: 2.500000e-01 train accuracy: 0.700000 val accuracy: 0.486000
hidden_size: 50 lr: 1.300000e-03 reg: 5.000000e-01 train accuracy: 0.590000 val accuracy: 0.500000
hidden_size: 50 lr: 1.300000e-03 reg: 7.500000e-01 train accuracy: 0.615000 val accuracy: 0.466000
hidden_size: 50 lr: 1.300000e-03 reg: 1.000000e+00 train accuracy: 0.620000 val accuracy: 0.481000
hidden_size: 50 lr: 1.500000e-03 reg: 2.500000e-01 train accuracy: 0.660000 val accuracy: 0.485000
hidden_size: 50 lr: 1.500000e-03 reg: 5.000000e-01 train accuracy: 0.585000 val accuracy: 0.466000
hidden_size: 50 lr: 1.500000e-03 reg: 7.500000e-01 train accuracy: 0.655000 val accuracy: 0.483000
hidden_size: 50 lr: 1.500000e-03 reg: 1.000000e+00 train accuracy: 0.630000 val accuracy: 0.484000
hidden_size: 100 lr: 1.000000e-04 reg: 2.500000e-01 train accuracy: 0.325000 val accuracy: 0.413000
hidden_size: 100 lr: 1.000000e-04 reg: 5.000000e-01 train accuracy: 0.340000 val accuracy: 0.416000
hidden_size: 100 lr: 1.000000e-04 reg: 7.500000e-01 train accuracy: 0.375000 val accuracy: 0.421000
hidden_size: 100 lr: 1.000000e-04 reg: 1.000000e+00 train accuracy: 0.480000 val accuracy: 0.409000
hidden_size: 100 lr: 5.000000e-04 reg: 2.500000e-01 train accuracy: 0.605000 val accuracy: 0.496000
hidden_size: 100 lr: 5.000000e-04 reg: 5.000000e-01 train accuracy: 0.580000 val accuracy: 0.510000
hidden_size: 100 lr: 5.000000e-04 reg: 7.500000e-01 train accuracy: 0.605000 val accuracy: 0.496000
hidden_size: 100 lr: 5.000000e-04 reg: 1.000000e+00 train accuracy: 0.540000 val accuracy: 0.508000
hidden_size: 100 lr: 9.000000e-04 reg: 2.500000e-01 train accuracy: 0.720000 val accuracy: 0.509000
hidden_size: 100 lr: 9.000000e-04 reg: 5.000000e-01 train accuracy: 0.665000 val accuracy: 0.507000
hidden_size: 100 lr: 9.000000e-04 reg: 7.500000e-01 train accuracy: 0.630000 val accuracy: 0.512000
hidden_size: 100 lr: 9.000000e-04 reg: 1.000000e+00 train accuracy: 0.610000 val accuracy: 0.497000
hidden_size: 100 lr: 1.300000e-03 reg: 2.500000e-01 train accuracy: 0.720000 val accuracy: 0.495000
hidden_size: 100 lr: 1.300000e-03 reg: 5.000000e-01 train accuracy: 0.775000 val accuracy: 0.524000
hidden_size: 100 lr: 1.300000e-03 reg: 7.500000e-01 train accuracy: 0.640000 val accuracy: 0.503000
hidden_size: 100 lr: 1.300000e-03 reg: 1.000000e+00 train accuracy: 0.665000 val accuracy: 0.478000
hidden_size: 100 lr: 1.500000e-03 reg: 2.500000e-01 train accuracy: 0.650000 val accuracy: 0.516000
hidden_size: 100 lr: 1.500000e-03 reg: 5.000000e-01 train accuracy: 0.675000 val accuracy: 0.499000
hidden_size: 100 lr: 1.500000e-03 reg: 7.500000e-01 train accuracy: 0.635000 val accuracy: 0.493000
hidden_size: 100 lr: 1.500000e-03 reg: 1.000000e+00 train accuracy: 0.600000 val accuracy: 0.493000
hidden_size: 150 lr: 1.000000e-04 reg: 2.500000e-01 train accuracy: 0.410000 val accuracy: 0.420000
hidden_size: 150 lr: 1.000000e-04 reg: 5.000000e-01 train accuracy: 0.475000 val accuracy: 0.415000
hidden_size: 150 lr: 1.000000e-04 reg: 7.500000e-01 train accuracy: 0.385000 val accuracy: 0.422000
hidden_size: 150 lr: 1.000000e-04 reg: 1.000000e+00 train accuracy: 0.375000 val accuracy: 0.425000
hidden_size: 150 lr: 5.000000e-04 reg: 2.500000e-01 train accuracy: 0.605000 val accuracy: 0.524000
hidden_size: 150 lr: 5.000000e-04 reg: 5.000000e-01 train accuracy: 0.550000 val accuracy: 0.512000
hidden_size: 150 lr: 5.000000e-04 reg: 7.500000e-01 train accuracy: 0.565000 val accuracy: 0.511000
hidden_size: 150 lr: 5.000000e-04 reg: 1.000000e+00 train accuracy: 0.580000 val accuracy: 0.501000
hidden_size: 150 lr: 9.000000e-04 reg: 2.500000e-01 train accuracy: 0.635000 val accuracy: 0.526000
hidden_size: 150 lr: 9.000000e-04 reg: 5.000000e-01 train accuracy: 0.615000 val accuracy: 0.512000
hidden_size: 150 lr: 9.000000e-04 reg: 7.500000e-01 train accuracy: 0.655000 val accuracy: 0.523000
hidden_size: 150 lr: 9.000000e-04 reg: 1.000000e+00 train accuracy: 0.585000 val accuracy: 0.507000
hidden_size: 150 lr: 1.300000e-03 reg: 2.500000e-01 train accuracy: 0.745000 val accuracy: 0.519000
hidden_size: 150 lr: 1.300000e-03 reg: 5.000000e-01 train accuracy: 0.655000 val accuracy: 0.501000
hidden_size: 150 lr: 1.300000e-03 reg: 7.500000e-01 train accuracy: 0.625000 val accuracy: 0.530000
hidden_size: 150 lr: 1.300000e-03 reg: 1.000000e+00 train accuracy: 0.645000 val accuracy: 0.506000
hidden_size: 150 lr: 1.500000e-03 reg: 2.500000e-01 train accuracy: 0.660000 val accuracy: 0.478000
hidden_size: 150 lr: 1.500000e-03 reg: 5.000000e-01 train accuracy: 0.675000 val accuracy: 0.485000
hidden_size: 150 lr: 1.500000e-03 reg: 7.500000e-01 train accuracy: 0.670000 val accuracy: 0.512000
hidden_size: 150 lr: 1.500000e-03 reg: 1.000000e+00 train accuracy: 0.655000 val accuracy: 0.495000
Best hidden_size: 150
Best lr: 1.300000e-03
Best reg: 7.500000e-01
train accuracy: 0.625000
val accuracy: 0.530000
from vis_utils import visualize_grid

# Visualize the weights of the network
def show_net_weights(net):
    W1 = net.params['W1']
    W1 = W1.reshape(32, 32, 3, -1).transpose(3, 0, 1, 2)
    plt.imshow(visualize_grid(W1, padding=3).astype('uint8'))
    plt.gca().axis('off')
    plt.show()

show_net_weights(best_net)    # visualize the best network found above

Run on the test set

test_acc = (best_net.predict(X_test) == y_test).mean()
print('Test accuracy: ', test_acc)
Output:
Test accuracy:  0.504
