# naive grid search implementation
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)
print("Size of training set: %d   size of test set: %d" % (X_train.shape[0], X_test.shape[0]))
best_score = 0
for gamma in [0.001, 0.01, 0.1, 1, 10, 100]:
    for C in [0.001, 0.01, 0.1, 1, 10, 100]:
        # for each combination of parameters
        # train an SVC
        svm = SVC(gamma=gamma, C=C)
        svm.fit(X_train, y_train)
        # evaluate the SVC on the test set
        score = svm.score(X_test, y_test)
        # if we got a better score, store the score and parameters
        if score > best_score:
            best_score = score
            best_parameters = {'C': C, 'gamma': gamma}
print("best score: ", best_score)
print("best parameters: ", best_parameters)

Output:
Size of training set: 112   size of test set: 38
best score:  0.973684210526
best parameters:  {'C': 100, 'gamma': 0.001}
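
The loop above picks the parameters by scoring on the test set, so the reported best score is likely optimistic as an estimate of performance on new data. A minimal sketch of one common alternative, assuming an extra validation split (the names X_trainval, X_valid, final_svm and the random_state=1 of the second split are illustrative choices, not part of the original code):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()
# Hold out the test set first, then split the rest into training and validation parts.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    iris.data, iris.target, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_trainval, y_trainval, random_state=1)  # assumption: any fixed seed works here

candidates = [0.001, 0.01, 0.1, 1, 10, 100]
best_score = -1.0
best_parameters = None
for gamma in candidates:
    for C in candidates:
        svm = SVC(gamma=gamma, C=C)
        svm.fit(X_train, y_train)
        # Select parameters on the validation set, never on the test set.
        score = svm.score(X_valid, y_valid)
        if score > best_score:
            best_score = score
            best_parameters = {'C': C, 'gamma': gamma}

# Refit on training + validation data, then touch the test set exactly once.
final_svm = SVC(**best_parameters)
final_svm.fit(X_trainval, y_trainval)
test_score = final_svm.score(X_test, y_test)
print("best validation score:", best_score)
print("best parameters:", best_parameters)
print("test score:", test_score)
```

The test set is used only once here, after the parameters are fixed, so its score is a fairer estimate than the one printed by the naive loop.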

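2. Brute-force search over a parameter dictionary:

Grid search returns the best-scoring values among the candidates you specify; in the output below, C is 0.1 with a linear kernel.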

from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
# standardize the features, then fit an SVC
pipe_svc = Pipeline([('scl', StandardScaler()),
            ('clf', SVC(random_state=1))])
param_range = [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
# the 'clf__' prefix routes each parameter to the 'clf' step of the pipeline
param_grid = [{'clf__C': param_range,
               'clf__kernel': ['linear']},
              {'clf__C': param_range,
               'clf__gamma': param_range,
               'clf__kernel': ['rbf']}]
gs = GridSearchCV(estimator=pipe_svc,
                  param_grid=param_grid,
                  scoring='accuracy',
                  cv=10,
                  n_jobs=-1)
gs = gs.fit(X_train, y_train)
print(gs.best_score_)
print(gs.best_params_)

Output:
0.978021978022
{'clf__C': 0.1, 'clf__kernel': 'linear'}

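The param_grid argument of GridSearchCV is a list of dictionaries. For the linear SVM we evaluate only the parameter C; for the RBF-kernel SVM we evaluate both C and gamma. Finally, best_params_ gives us the best parameter combination.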
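
To see how a list of dictionaries expands into candidate settings, here is a small sketch using sklearn's ParameterGrid helper on the same grid. The variable names n_candidates and linear_candidates are illustrative, not from the original code.

```python
from sklearn.model_selection import ParameterGrid

param_range = [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
param_grid = [{'clf__C': param_range,
               'clf__kernel': ['linear']},
              {'clf__C': param_range,
               'clf__gamma': param_range,
               'clf__kernel': ['rbf']}]

# Each dictionary is expanded on its own; the results are concatenated.
grid = ParameterGrid(param_grid)
n_candidates = len(grid)
print(n_candidates)  # 8 linear settings + 8 * 8 rbf settings = 72

# Linear candidates carry no gamma entry, because gamma appears only in the rbf dictionary.
linear_candidates = [p for p in grid if p['clf__kernel'] == 'linear']
print(linear_candidates[0])
```

With cv=10, each of the 72 candidates is fitted 10 times, which is why exhaustive search becomes expensive as the grid grows.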

Next, we build the model directly from the best parameters (best_estimator_):

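# best_estimator_ is the pipeline configured with the best parameters
clf = gs.best_estimator_
# with the default refit=True it is already fitted on X_train, so this call is redundant but harmless
clf.fit(X_train, y_train)
print('Test accuracy: %.3f' % clf.score(X_test, y_test))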

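Grid search works well, but exhaustive enumeration is time-consuming. sklearn also implements random search in the RandomizedSearchCV class, which randomly samples parameter combinations instead of trying them all.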
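
A minimal sketch of random search on the same kind of pipeline, assuming the iris data from the first example. The log-uniform sampling ranges, n_iter=20 and cv=5 are illustrative choices, not values from the original article.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

pipe_svc = Pipeline([('scl', StandardScaler()),
                     ('clf', SVC(random_state=1))])

# Distributions replace fixed candidate lists; values are drawn on a log scale.
param_distributions = {
    'clf__C': loguniform(1e-4, 1e3),      # assumption: same span as param_range
    'clf__gamma': loguniform(1e-4, 1e3),  # assumption: same span as param_range
    'clf__kernel': ['rbf'],
}

rs = RandomizedSearchCV(estimator=pipe_svc,
                        param_distributions=param_distributions,
                        n_iter=20,        # number of sampled combinations
                        scoring='accuracy',
                        cv=5,
                        random_state=1)   # fixes the sampled combinations
rs.fit(X_train, y_train)
print(rs.best_score_)
print(rs.best_params_)
print('Test accuracy: %.3f' % rs.score(X_test, y_test))
```

The cost is set by n_iter rather than by the grid size, so the budget stays fixed even when more parameters are added to the search.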

