
Measuring accuracy with Optuna/LightGBM

Stack Overflow user
Asked on 2022-09-29 11:18:53
1 answer · 136 views · 0 followers · 0 votes

I am trying to use Optuna to tune the hyperparameters of my LightGBM model. However, rather than returning the log loss, I want each trial to return the mean accuracy.

If I add print('Testing accuracy {:.4f}'.format(model.score(X_test, y_test))), I can see the accuracy of each CV fold within an iteration, but I would like the model to average those and return that metric instead of the log-loss value. Is this possible?

Here is my current code:

import lightgbm as lg
import numpy as np
import optuna
from optuna.integration import LightGBMPruningCallback
from sklearn.metrics import log_loss
from sklearn.model_selection import StratifiedKFold


def objective(trial, X, y):
    param_grid = {
        # "device_type": trial.suggest_categorical("device_type", ['gpu']),
        "n_estimators": trial.suggest_categorical("n_estimators", [10000]),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.05),
        "num_leaves": trial.suggest_int("num_leaves", 2, 50, step=2),
        "max_depth": trial.suggest_int("max_depth", 1, 5),
        "min_data_in_leaf": trial.suggest_int("min_data_in_leaf", 5, 100, step=5),
        "lambda_l1": trial.suggest_int("lambda_l1", 0, 100, step=5),
        "lambda_l2": trial.suggest_int("lambda_l2", 0, 100, step=5),
        "min_gain_to_split": trial.suggest_float("min_gain_to_split", 0, 15),
        "bagging_fraction": trial.suggest_float(
            "bagging_fraction", 0.2, 0.90, step=0.1
        ),
        "bagging_freq": trial.suggest_categorical("bagging_freq", [1]),
        "feature_fraction": trial.suggest_float(
            "feature_fraction", 0.2, 0.90, step=0.1
        ),
    }

    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1121218)

    cv_scores = np.empty(5)

    for idx, (train_idx, test_idx) in enumerate(cv.split(X, y)):
        X_train, X_test = X[train_idx], X[test_idx]
        y_train, y_test = y[train_idx], y[test_idx]

        model = lg.LGBMClassifier(objective="multiclass", num_classes=3, **param_grid)
        model.fit(
            X_train,
            y_train,
            eval_set=[(X_test, y_test)],
            eval_metric="multi_logloss",
            early_stopping_rounds=100,
            callbacks=[
                LightGBMPruningCallback(trial, "multi_logloss")
            ],  # Add a pruning callback
        )
        preds = model.predict_proba(X_test)
        cv_scores[idx] = log_loss(y_test, preds)
        print('Testing accuracy {:.4f}'.format(model.score(X_test, y_test)))

    return np.mean(cv_scores)

study = optuna.create_study(direction="minimize", study_name="LGBM Classifier")
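# X_train, y_train: the full training data (defined earlier in the script)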
func = lambda trial: objective(trial, X_train, y_train)
study.optimize(func, n_trials=20)

1 Answer

Stack Overflow user

Accepted answer

Answered on 2022-09-29 22:18:02

  1. Replace

cv_scores[idx] = log_loss(y_test, preds)

with

cv_scores[idx] = accuracy_score(y_test, preds)
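
(Note that accuracy_score expects class labels, so preds must come from model.predict rather than model.predict_proba; the full example below makes that switch.)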

  2. Change the direction to direction="maximize", since you want to maximize accuracy rather than minimize it, as you would with log_loss. Alternatively, you can return the negative accuracy (-accuracy) and keep direction="minimize"; a sketch of that variant appears after the full example below.

  3. You need to make sure that the metric of optuna.integration.LightGBMPruningCallback is consistent with the study direction.

Full example:

import lightgbm as lg
import numpy as np
import optuna
from optuna.integration import LightGBMPruningCallback
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold


def objective(trial, X, y):
    param_grid = {
        # "device_type": trial.suggest_categorical("device_type", ['gpu']),
        "n_estimators": trial.suggest_categorical("n_estimators", [3, 10, 20]),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.05),
        "num_leaves": trial.suggest_int("num_leaves", 2, 50, step=2),
        "max_depth": trial.suggest_int("max_depth", 1, 5),
        "min_data_in_leaf": trial.suggest_int("min_data_in_leaf", 5, 100, step=5),
        "lambda_l1": trial.suggest_int("lambda_l1", 0, 100, step=5),
        "lambda_l2": trial.suggest_int("lambda_l2", 0, 100, step=5),
        "min_gain_to_split": trial.suggest_float("min_gain_to_split", 0, 15),
        "bagging_fraction": trial.suggest_float(
            "bagging_fraction", 0.2, 0.90, step=0.1
        ),
        "bagging_freq": trial.suggest_categorical("bagging_freq", [1]),
        "feature_fraction": trial.suggest_float(
            "feature_fraction", 0.2, 0.90, step=0.1
        ),
    }

    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1121218)
    cv_scores = np.empty(5)

    for idx, (train_idx, test_idx) in enumerate(cv.split(X, y)):
        X_train, X_test = X[train_idx], X[test_idx]
        y_train, y_test = y[train_idx], y[test_idx]

        model = lg.LGBMClassifier(objective="multiclass", num_classes=10, **param_grid)
        model.fit(
            X_train,
            y_train,
            eval_set=[(X_test, y_test)],
            eval_metric="auc_mu",
            early_stopping_rounds=100,
            callbacks=[
                LightGBMPruningCallback(trial, "auc_mu")
            ],  # Add a pruning callback
        )
        preds = model.predict(X_test)
        cv_scores[idx] = accuracy_score(y_test, preds)
        print("Testing accuracy {:.4f}".format(cv_scores[idx]))

    return np.mean(cv_scores)


X, y = load_digits(return_X_y=True)
study = optuna.create_study(direction="maximize", study_name="LGBM Classifier")
func = lambda trial: objective(trial, X, y)
study.optimize(func, n_trials=20)
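
Once the study completes, the best mean accuracy and the hyperparameters that produced it can be read back from the study object:

# Best mean CV accuracy found across the trials
print("Best accuracy: {:.4f}".format(study.best_value))
# Hyperparameter values of the best trial
print("Best params:", study.best_params)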

Alternatively, you can replace these settings:

  • LightGBMPruningCallback(metric="auc_mu"); direction="maximize"; return accuracy

with

  • LightGBMPruningCallback(metric="multi_error"); direction="minimize"; return -accuracy
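
For illustration, here is a minimal sketch of that minimize variant on the same load_digits data; the reduced search space is just for brevity, and the full param_grid above applies equally:

import lightgbm as lg
import numpy as np
import optuna
from optuna.integration import LightGBMPruningCallback
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold


def objective(trial, X, y):
    # Deliberately small search space for the sketch
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 10, 50),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.05),
        "num_leaves": trial.suggest_int("num_leaves", 2, 50, step=2),
    }

    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1121218)
    cv_scores = np.empty(5)

    for idx, (train_idx, test_idx) in enumerate(cv.split(X, y)):
        model = lg.LGBMClassifier(objective="multiclass", num_classes=10, **params)
        model.fit(
            X[train_idx],
            y[train_idx],
            eval_set=[(X[test_idx], y[test_idx])],
            # multi_error is lower-is-better, consistent with direction="minimize"
            eval_metric="multi_error",
            callbacks=[LightGBMPruningCallback(trial, "multi_error")],
        )
        cv_scores[idx] = accuracy_score(y[test_idx], model.predict(X[test_idx]))

    # Negate so that minimizing the objective maximizes the mean accuracy
    return -np.mean(cv_scores)


X, y = load_digits(return_X_y=True)
study = optuna.create_study(direction="minimize", study_name="LGBM Classifier")
study.optimize(lambda trial: objective(trial, X, y), n_trials=20)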

You can also find the official example here: https://github.com/optuna/optuna-examples/blob/main/lightgbm/lightgbm_integration.py

Votes: 2
Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/73894656