Author | Thomas Ciha
Translator | 刘旭坤
Editor | Jane
Produced by | AI科技大本营
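[Introduction] Generally speaking, there are no shortcuts to optimizing a machine learning model. The choice of architecture, optimization algorithm, and hyperparameters depends partly on how well we understand the dataset and partly on repeated trial and error. The ability to build and test models quickly is therefore critical to moving a project forward. In this article we build a pipeline for producing models that helps you tune hyperparameters quickly.
A deep learning model has the following tunable hyperparameters:
Number of hidden layers
Number of units in each layer
Activation functions
Optimization algorithm
Learning rate
Regularization method
Regularization parameter
We start by writing these parameters into a dictionary, model_info, that stores the model's settings: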
model_info = {}
model_info['Hidden layers'] = [100] * 6  # six hidden layers of 100 units each
model_info['Input size'] = og_one_hot.shape[1] - 1  # og_one_hot: the one-hot encoded data array
model_info['Activations'] = ['relu'] * 6  # one activation per hidden layer
model_info['Optimization'] = 'adadelta'
model_info["Learning rate"] = .005
model_info["Batch size"] = 32
model_info["Preprocessing"] = 'Standard'
model_info["Lambda"] = 0
model_info['Regularization'] = 'l2'
model_info['Reg param'] = 0.0005
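Because every later step reads from this dictionary, a quick sanity check before building a model can catch mismatched lists early. The helper below is a minimal sketch: `validate_model_info` is a hypothetical name that does not appear in the original code, and it checks only the rules implied by the keys used in this article.

```python
VALID_REGULARIZATION = {"l2", "Dropout", "Batch norm"}  # options used in this article


def validate_model_info(model_info):
    """Return a list of problems found in a model_info dict (empty if none)."""
    problems = []
    hidden = model_info.get("Hidden layers", [])
    acts = model_info.get("Activations", [])
    if not hidden:
        problems.append("no hidden layers given")
    if len(hidden) != len(acts):
        problems.append("need one activation per hidden layer")
    reg = model_info.get("Regularization")
    if reg is not None:
        if reg not in VALID_REGULARIZATION:
            problems.append("unknown regularization: %r" % reg)
        elif "Reg param" not in model_info:
            problems.append("regularization set but 'Reg param' missing")
        elif reg == "Batch norm" and model_info["Reg param"] not in ("before", "after"):
            problems.append("batch norm 'Reg param' must be 'before' or 'after'")
    return problems


# Illustrative dict with a deliberate mistake: six layers but only five activations.
example = {"Hidden layers": [100] * 6, "Activations": ["relu"] * 5,
           "Regularization": "l2", "Reg param": 0.0005}
print(validate_model_info(example))
```

Returning a list of messages rather than raising on the first problem makes it easy to report every issue in a batch of candidate dictionaries at once.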
Here we want to perform binary classification on a dataset. You can download the data as a CSV file from the link below.
https://www.kaggle.com/uciml/default-of-credit-card-clients-dataset
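The most intuitive way to get to know a dataset is to visualize it. For dimensionality reduction I used PCA and t-SNE; judging from the figures below, t-SNE separates the data best. (Personally, I find scikit-learn's StandardScaler perfectly adequate for preprocessing the data.)
Next, we can use the parameters in model_info to build a deep learning model. The build_nn function below builds and returns a compiled model according to the parameters in the model_info it receives: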
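To make the preprocessing and projection steps concrete, here is a minimal NumPy-only sketch of column standardization (the default behavior of StandardScaler) followed by a two-dimensional PCA projection computed with an SVD. It is not the author's visualization code: the function names are hypothetical, the data is random placeholder data rather than the credit card dataset, and t-SNE is not shown.

```python
import numpy as np


def standardize(X):
    """Scale each column to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    return (X - mean) / std


def pca_project(X, n_components=2):
    """Project the rows of X onto its first principal components using an SVD."""
    X_centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ vt[:n_components].T


rng = np.random.default_rng(0)
# Placeholder data: three features on very different scales.
X = rng.normal(loc=5.0, scale=[1.0, 10.0, 100.0], size=(200, 3))
X_2d = pca_project(standardize(X))
print(X_2d.shape)
```

Standardizing first matters here because PCA is driven by variance: without it, the feature with the largest scale would dominate the projection.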
from keras import optimizers
from keras.layers import Activation, BatchNormalization, Dense, Dropout, InputLayer
from keras.models import Sequential
from keras.regularizers import l2


def build_nn(model_info):
    """
    This function builds and compiles a NN given a hash table of the model's parameters.
    :param model_info: dict of model parameters ('Hidden layers', 'Activations', 'Optimization', ...)
    :return: a compiled Keras model
    """
    lambda_, batch_norm, keep_prob = 0, False, False  # defaults: no regularization is being used
    reg = model_info.get('Regularization')  # None when the key is absent

    if reg == 'l2':  # if we're using L2 regularization
        lambda_ = model_info['Reg param']  # get lambda parameter
    elif reg == 'Batch norm':  # batch normalization regularization
        batch_norm = model_info['Reg param']  # 'before' or 'after' the activation
        if batch_norm not in ['before', 'after']:  # ensure we have a valid reg param
            raise ValueError("'Reg param' must be 'before' or 'after' for batch norm")
    elif reg == 'Dropout':  # Dropout regularization
        keep_prob = model_info['Reg param']

    hidden, acts = model_info['Hidden layers'], model_info['Activations']
    model = Sequential(name=model_info['Name'])
    model.add(InputLayer((model_info['Input size'],)))  # create input layer

    for lay, act in zip(hidden, acts):  # create all the hidden layers
        # the activation is added as its own layer below, so Dense itself stays linear
        if lambda_ > 0:  # if we're doing L2 regularization
            model.add(Dense(lay, kernel_regularizer=l2(lambda_)))
        else:  # if we're not regularizing
            model.add(Dense(lay))

        if batch_norm == 'before':
            model.add(BatchNormalization())  # add batch normalization layer

        model.add(Activation(act))  # activation layer is part of the hidden layer

        if batch_norm == 'after':
            model.add(BatchNormalization())  # add batch normalization layer

        if keep_prob:
            model.add(Dropout(keep_prob))  # note: Keras treats this value as the fraction of units to drop

    # --------- Adding Output Layer -------------
    model.add(Dense(1))  # add output layer
    if batch_norm == 'before':  # if we're using batch norm regularization
        model.add(BatchNormalization())
    model.add(Activation('sigmoid'))  # apply output layer activation
    if batch_norm == 'after':  # note: normalizing after the sigmoid can push outputs outside [0, 1]
        model.add(BatchNormalization())

    # `learning_rate` is the current Keras argument name; older releases call it `lr`
    lr = model_info["Learning rate"]
    if model_info['Optimization'] == 'adagrad':  # setting an optimization method
        opt = optimizers.Adagrad(learning_rate=lr)
    elif model_info['Optimization'] == 'rmsprop':
        opt = optimizers.RMSprop(learning_rate=lr)
    elif model_info['Optimization'] == 'adadelta':
        opt = optimizers.Adadelta()  # keeps Adadelta's default learning rate
    elif model_info['Optimization'] == 'adam':
        opt = optimizers.Adam(learning_rate=lr)
    elif model_info['Optimization'] == 'adamax':
        opt = optimizers.Adamax(learning_rate=lr)
    else:
        opt = optimizers.Nadam(learning_rate=lr)
    model.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])  # compile model

    return model
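With build_nn in place, we can pass it different model_info dictionaries and create models quickly. Below I use five different numbers of hidden layers to test how different architectures perform at classification.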
def create_five_nns(input_size, hidden_size, act=None):
    """
    Creates 5 neural networks to be used as a baseline in determining the influence model depth & width has on performance.
    :param input_size: input layer size
    :param hidden_size: number of units in each hidden layer
    :param act: activation function to use for each layer
    :return: list of model_info hash tables
    """
    act = ['relu'] if not act else [act]  # default activation = 'relu'
    nns = []  # list of model info hash tables
    model_info = {}  # hash table storing model information
    model_info['Hidden layers'] = [hidden_size]
    model_info['Input size'] = input_size
    model_info['Activations'] = act
    model_info['Optimization'] = 'adadelta'
    model_info["Learning rate"] = .005
    model_info["Batch size"] = 32
    model_info["Preprocessing"] = 'Standard'
    model_info2, model_info3, model_info4, model_info5 = model_info.copy(), model_info.copy(), model_info.copy(), model_info.copy()

    model_info["Name"] = 'Shallow NN'  # build shallow nn
    nns.append(model_info)

    model_info2['Hidden layers'] = [hidden_size] * 3  # build medium nn
    model_info2['Activations'] = act * 3
    model_info2["Name"] = 'Medium NN'
    nns.append(model_info2)

    model_info3['Hidden layers'] = [hidden_size] * 6  # build deep nn
    model_info3['Activations'] = act * 6
    model_info3["Name"] = 'Deep NN 1'
    nns.append(model_info3)

    model_info4['Hidden layers'] = [hidden_size] * 11  # build really deep nn
    model_info4['Activations'] = act * 11
    model_info4["Name"] = 'Deep NN 2'
    nns.append(model_info4)

    model_info5['Hidden layers'] = [hidden_size] * 20  # build realllllly deep nn
    model_info5['Activations'] = act * 20
    model_info5["Name"] = 'Deep NN 3'
    nns.append(model_info5)
    return nns
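The same idea generalizes to any list of depths. The sketch below is one possible way to write it with a loop; `make_depth_variants`, the generated names, and the placeholder input size are assumptions for illustration and are not part of the original code.

```python
def make_depth_variants(base_info, hidden_size, depths, act="relu"):
    """Return one model_info dict per depth, sharing all other settings."""
    variants = []
    for depth in depths:
        info = dict(base_info)  # shallow copy; the list values are replaced below
        info["Hidden layers"] = [hidden_size] * depth
        info["Activations"] = [act] * depth
        info["Name"] = "NN depth %d" % depth
        variants.append(info)
    return variants


base = {"Input size": 23,  # placeholder input size, not taken from the dataset
        "Optimization": "adadelta", "Learning rate": 0.005,
        "Batch size": 32, "Preprocessing": "Standard"}
for info in make_depth_variants(base, hidden_size=100, depths=[1, 3, 6, 11, 20]):
    print(info["Name"], len(info["Hidden layers"]))
```

Copying the base dictionary for each variant keeps the shared settings in one place, so changing the batch size or optimizer for the whole baseline is a one-line edit.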
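Perhaps because our data is fairly nonlinear, I found that test performance improved as the number of hidden layers and units grew: the more hidden layers, the better the result. I used five-fold cross-validation for the model built from each set of parameters. Put simply, five-fold cross-validation splits the dataset into five parts, using four to train the model and one to test it. The test rotates five times so that each part serves as the test data exactly once, and the mean of the five test results is taken as the model's score. Here we measured accuracy and AUC; the results are shown in the figure below:
If cross-validation is too time-consuming and you have enough data, you can instead split the dataset directly into a training subset and a test subset, as in the code below: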
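The rotation described above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the author's cross-validation code: `k_fold_indices` and `cross_validate` are hypothetical names, and `score_fn` stands in for whatever trains a model on the training indices and returns a metric on the test indices.

```python
import numpy as np


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs so that every sample is tested exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx


def cross_validate(score_fn, n_samples, k=5):
    """Average score_fn(train_idx, test_idx) over the k folds."""
    scores = [score_fn(tr, te) for tr, te in k_fold_indices(n_samples, k)]
    return float(np.mean(scores))


# Toy usage: the "score" is just the test-fold fraction, to show the mechanics.
print(cross_validate(lambda tr, te: len(te) / (len(tr) + len(te)), n_samples=100))
```

In a real run, `score_fn` would build a fresh model for each fold so that no weights carry over from one fold to the next.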
import os

from keras.callbacks import EarlyStopping, ModelCheckpoint, TensorBoard
from sklearn.metrics import auc, roc_curve


def quick_nn_test(model_info, data_dict, save_path):
    model = build_nn(model_info)  # use model info to build and compile a nn
    # stop once training accuracy has not improved for 5 consecutive epochs
    # (the metric is named 'accuracy' in current Keras; older releases call it 'acc')
    stop = EarlyStopping(patience=5, monitor='accuracy', verbose=1)
    model_dir = os.path.join(save_path, model_info['Name'])  # per-model folder for logs and checkpoints
    os.makedirs(model_dir, exist_ok=True)
    tensorboard = TensorBoard(log_dir=model_dir, histogram_freq=0, write_graph=True, write_images=True)  # create tensorboard callback
    save_model = ModelCheckpoint(filepath=os.path.join(model_dir, model_info['Name'] + '_saved_.h5'))  # save model after every epoch

    model.fit(data_dict['Training data'], data_dict['Training labels'], epochs=150,  # fit model
              batch_size=model_info['Batch size'], callbacks=[save_model, stop, tensorboard])
    train_acc = model.evaluate(data_dict['Training data'], data_dict['Training labels'],  # evaluate train accuracy
                               batch_size=model_info['Batch size'], verbose=0)
    test_acc = model.evaluate(data_dict['Test data'], data_dict['Test labels'],  # evaluate test accuracy
                              batch_size=model_info['Batch size'], verbose=0)

    # Get Train AUC
    y_pred = model.predict(data_dict['Training data']).ravel()  # predict on training data
    fpr, tpr, thresholds = roc_curve(data_dict['Training labels'], y_pred)  # compute fpr and tpr
    auc_train = auc(fpr, tpr)  # compute AUC metric
    # Get Test AUC
    y_pred = model.predict(data_dict['Test data']).ravel()  # same as above with test data
    fpr, tpr, thresholds = roc_curve(data_dict['Test labels'], y_pred)
    auc_test = auc(fpr, tpr)

    return train_acc, test_acc, auc_train, auc_test  # each *_acc is [loss, accuracy]
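quick_nn_test expects a data_dict with the keys 'Training data', 'Training labels', 'Test data' and 'Test labels'. The article does not show how that dictionary is produced, so the following is only a sketch of one way to build it: `holdout_split` is a hypothetical helper, and it assumes the label sits in the last column of the array.

```python
import numpy as np


def holdout_split(data, train_frac=0.9, seed=0):
    """Shuffle rows, then split into train/test parts; the label is assumed to be the last column."""
    rng = np.random.default_rng(seed)
    shuffled = data[rng.permutation(len(data))]
    n_train = int(round(train_frac * len(data)))
    train, test = shuffled[:n_train], shuffled[n_train:]
    return {
        "Training data": train[:, :-1], "Training labels": train[:, -1],
        "Test data": test[:, :-1], "Test labels": test[:, -1],
    }


data = np.arange(40, dtype=float).reshape(10, 4)  # 10 placeholder rows: 3 features + 1 label
data_dict = holdout_split(data, train_frac=0.9)
print(data_dict["Training data"].shape, data_dict["Test data"].shape)
```

Shuffling before the split guards against any ordering in the source file, such as rows sorted by date or by class.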
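Some books recommend grid search for hyperparameter optimization, but grid search is really just exhaustive enumeration and is rarely practical in real projects. A more common approach is coarse-to-fine search: gradually narrow the range of the best parameters.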
"""This section of code allows us to create and test many neural networks and save the results of a quick
test into a CSV file. Once that CSV file has been created, we will continue to add results onto the existing
file."""
import numpy as np
import pandas as pd

rapid_testing_path = 'YOUR PATH HERE'
data_path = 'YOUR DATA PATH'

column_list = ['Model', 'Train Accuracy', 'Test Accuracy', 'Train AUC', 'Test AUC', 'Preprocessing',
               'Batch size', 'Learn Rate', 'Optimization', 'Activations', 'Hidden layers',
               'Regularization', 'Reg Param']

try:  # try to load existing csv
    rapid_mlp_results = pd.read_csv(rapid_testing_path + 'Results.csv')
    index = rapid_mlp_results.shape[0]  # next row index = number of rows already saved
except FileNotFoundError:  # if no csv exists yet, create a DF
    rapid_mlp_results = pd.DataFrame(columns=column_list)
    index = 0

og_one_hot = np.array(pd.read_csv(data_path))  # load one hot data

model_info = {}  # create model_info dicts for all the models we want to test
model_info['Hidden layers'] = [100] * 6  # specifies the number of hidden units per layer
model_info['Input size'] = og_one_hot.shape[1] - 1  # input data size
model_info['Activations'] = ['relu'] * 6  # activation function for each layer
model_info['Optimization'] = 'adadelta'  # optimization method
model_info["Learning rate"] = .005  # learning rate for optimization method
model_info["Batch size"] = 32
model_info["Preprocessing"] = 'Standard'  # specifies the preprocessing method to be used

model_0 = model_info.copy()  # create model 0
model_0['Name'] = 'Model0'

model_1 = model_info.copy()  # create model 1
model_1['Hidden layers'] = [110] * 3
model_1['Activations'] = ['relu'] * 3  # keep one activation per hidden layer
model_1['Name'] = 'Model1'

model_2 = model_info.copy()  # try best model so far with several regularization parameter values
model_2['Hidden layers'] = [110] * 6
model_2['Name'] = 'Model2'
model_2['Regularization'] = 'l2'
model_2['Reg param'] = 0.0005

model_3 = model_info.copy()
model_3['Hidden layers'] = [110] * 6
model_3['Name'] = 'Model3'
model_3['Regularization'] = 'l2'
model_3['Reg param'] = 0.05

# .... create more models ....

# -------------- REGULARIZATION OPTIONS -------------
# L2 Regularization:   Regularization: 'l2', Reg param: lambda value
# Dropout:             Regularization: 'Dropout', Reg param: keep_prob
# Batch normalization: Regularization: 'Batch norm', Reg param: 'before' or 'after'

models = [model_0, model_1, model_2, model_3]  # make a list of model_info hash tables

for model in models:  # for each model_info in list of models to test, test model and record results
    # preprocess_data and split_data are the author's helper functions; they are not shown in this article
    train_data, labels = preprocess_data(og_one_hot, model['Preprocessing'], True)  # preprocess raw data
    data_dict = split_data(0.9, 0, np.concatenate((train_data, labels.reshape(-1, 1)), axis=1))  # split data
    train_acc, test_acc, auc_train, auc_test = quick_nn_test(model, data_dict, save_path=rapid_testing_path)  # quickly assess model

    reg = model.get('Regularization', 'None')  # regularization settings if given, else placeholders
    reg_param = model.get('Reg param', 'NA')

    val_lis = [model['Name'], train_acc[1], test_acc[1], auc_train, auc_test, model['Preprocessing'],
               model["Batch size"], model["Learning rate"], model["Optimization"], str(model["Activations"]),
               str(model["Hidden layers"]), reg, reg_param]

    df = pd.DataFrame(dict(zip(column_list, val_lis)), index=[index])  # one-row frame for this model
    rapid_mlp_results = pd.concat([rapid_mlp_results, df])  # DataFrame.append was removed in pandas 2.0
    rapid_mlp_results.to_csv(rapid_testing_path + "Results.csv", index=False)
    index += 1  # advance the row index for the next model
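We first need a rough optimization direction and a rough range for each parameter. We can then sample parameters at random within those ranges and narrow the ranges further based on the results. The code below adds randomness to the process of generating models (strictly speaking, the model_info dictionaries used to generate them):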
import numpy as np


def generate_random_model():
    optimization_methods = ['adagrad', 'rmsprop', 'adadelta', 'adam', 'adamax', 'nadam']  # possible optimization methods
    activation_functions = ['sigmoid', 'relu', 'tanh']  # possible activation functions
    batch_sizes = [16, 32, 64, 128, 256, 512]  # possible batch sizes
    range_hidden_units = range(5, 250)  # range of possible hidden units
    model_info = {}  # create hash table
    same_units = np.random.choice([0, 1], p=[1/5, 4/5])  # dictates whether all hidden layers will have the same number of units
    same_act_fun = np.random.choice([0, 1], p=[1/10, 9/10])  # will each hidden layer have the same activation function?
    really_deep = np.random.rand()
    # NOTE: the condition and the alternative branch of the next two statements were cut off in the source.
    # The 0.8 threshold and the range(6, 20) fallback are ASSUMED values, used only to make the code runnable.
    range_layers = range(1, 10) if really_deep < 0.8 else range(6, 20)
    num_layers = np.random.choice(range_layers, p=[.1, .2, .2, .2, .05, .05, .05, .1, .05]) if really_deep < 0.8 else np.random.choice(range_layers)
    num_layers = int(num_layers)  # plain int so that list repetition below behaves as expected
    model_info["Activations"] = [np.random.choice(activation_functions, p=[0.25, 0.5, 0.25])] * num_layers if same_act_fun else [np.random.choice(activation_functions, p=[0.25, 0.5, 0.25]) for _ in range(num_layers)]  # choose activation functions
    model_info["Hidden layers"] = [np.random.choice(range_hidden_units)] * num_layers if same_units else [np.random.choice(range_hidden_units) for _ in range(num_layers)]  # create hidden layers
    model_info["Optimization"] = np.random.choice(optimization_methods)  # choose an optimization method at random
    model_info["Batch size"] = np.random.choice(batch_sizes)  # choose batch size
    model_info["Learning rate"] = 10 ** (-4 * np.random.rand())  # choose a learning rate on a logarithmic scale
    model_info["Training threshold"] = 0.5  # set threshold for training
    return model_info
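The narrowing step itself can be sketched for a single hyperparameter. The example below samples learning rates on a logarithmic scale, keeps the best few, and shrinks the range around them each round. It is a sketch under stated assumptions, not the author's procedure: `sample_log_uniform`, `coarse_to_fine` and `toy_score` are hypothetical names, and `toy_score` is a made-up objective standing in for a real train-and-evaluate call such as quick_nn_test.

```python
import numpy as np


def sample_log_uniform(rng, low_exp, high_exp, n):
    """Draw n values of 10**e with e uniform in [low_exp, high_exp]."""
    return 10.0 ** rng.uniform(low_exp, high_exp, size=n)


def coarse_to_fine(score_fn, low_exp, high_exp, rounds=3, n_samples=20, keep=5, seed=0):
    """Random-search an exponent range, shrinking it around the best samples each round."""
    rng = np.random.default_rng(seed)
    best_value, best_score = None, -np.inf
    for _ in range(rounds):
        values = sample_log_uniform(rng, low_exp, high_exp, n_samples)
        scores = np.array([score_fn(v) for v in values])
        top = values[np.argsort(scores)[-keep:]]  # the `keep` best samples of this round
        if scores.max() > best_score:
            best_value, best_score = float(values[scores.argmax()]), float(scores.max())
        low_exp, high_exp = np.log10(top.min()), np.log10(top.max())  # narrowed range
    return best_value, (low_exp, high_exp)


def toy_score(lr):
    """Made-up objective peaking at lr = 1e-2; stands in for a real validation metric."""
    return -(np.log10(lr) + 2.0) ** 2


best_lr, final_range = coarse_to_fine(toy_score, low_exp=-4, high_exp=0)
print(best_lr, final_range)
```

Sampling the exponent uniformly, rather than the learning rate itself, spreads the trials evenly across orders of magnitude, which is the same reason generate_random_model draws its learning rate as a power of ten.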
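To sum up, our approach to rapid optimization comes down to two phrases: build models automatically, and narrow the range step by step. Automatic model building is handled by the build_nn function; narrowing is done by judging parameter ranges and sampling at random within them. Once you have this approach down, you should be able to tune the hyperparameters of machine learning models, and deep learning models in particular, quickly.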
https://towardsdatascience.com/how-to-rapidly-test-dozens-of-deep-learning-models-in-python-cb839b518531
[End]