AutoGluon | Beat 90% of Models with Three Lines of Code

生信菜鸟团
Published 2021-05-24 10:29:49

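In recent years machine learning has made major breakthroughs in many fields, and it is increasingly used in the life sciences and medicine. Actually building a model, however, is still time-consuming and takes a fair amount of study (see my earlier notes on the subject). Even experienced AI experts rely mainly on manual experience when building machine learning models for large-scale data analysis; the process is laborious and lacks systematic methods.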

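Automated machine learning (AutoML) emerged to address this problem: it automates the end-to-end process of applying machine learning to real-world problems. AutoML puts machine learning within reach even of people with no expertise in the field.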

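AutoGluon, the subject of this post, is an AutoML solution that Amazon open-sourced on GitHub last year. It is built on Gluon, the deep learning library Amazon launched together with Microsoft three years earlier. AutoGluon automates modeling decisions by tuning choices within the given constraints, finding the best model it can with the available resources.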

❝GitHub: https://github.com/awslabs/autogluon ❞

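AutoGluon is both easy to use and extensible, and it focuses on deep learning and real-world applications across image, text, and tabular data. It is aimed at machine learning beginners and experts alike, and lets them:

  • quickly prototype deep learning solutions for their data with just a few lines of code;
  • take advantage of automatic hyperparameter tuning, model selection/architecture search, and data processing;
  • use state-of-the-art (SOTA) deep learning methods automatically, without expert knowledge;
  • easily improve existing custom models and data pipelines, or customize AutoGluon for their own use case.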

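AutoGluon currently supports the following applications:

  • Tabular prediction: predict the values of some columns in a data table from the values of other columns;
  • Image classification: identify the main object in an image;
  • Object detection: detect multiple objects in an image using bounding boxes;
  • Text classification: make predictions based on text content.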

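How do you use AutoGluon? The official documentation provides example code for each application; below is an excerpt of the most commonly used one, Tabular Prediction.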

❝Official documentation: https://auto.gluon.ai/stable/index.html ❞

Installation

Linux

  • CPU
python3 -m pip install -U pip
python3 -m pip install -U setuptools wheel
python3 -m pip install -U "mxnet<2.0.0"
python3 -m pip install autogluon
  • GPU
python3 -m pip install -U pip
python3 -m pip install -U setuptools wheel

# Here we assume CUDA 10.1 is installed.  You should change the number
# according to your own CUDA version (e.g. mxnet_cu100 for CUDA 10.0).
python3 -m pip install -U "mxnet_cu101<2.0.0"
python3 -m pip install autogluon

Mac

First install XCode, Homebrew, and LibOMP. If Homebrew is already installed, you can install LibOMP with:

brew install libomp

python3 -m pip install -U pip
python3 -m pip install -U setuptools wheel
python3 -m pip install -U "mxnet<2.0.0"
python3 -m pip install autogluon

Build an excellent model with three lines of code

# Import the package
from autogluon.tabular import TabularDataset, TabularPredictor
# Load the training data
train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
# Train models
predictor = TabularPredictor(label='class').fit(train_data, time_limit=120)  # Fit models for 120s
# Load the test data
test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')
# Rank the trained models by their performance on the test data
leaderboard = predictor.leaderboard(test_data)

Prediction on tabular data

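With a single call to fit(), AutoGluon can produce highly accurate models. This example shows how to use AutoGluon to build a classification model that predicts whether a person's income exceeds $50,000.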

First, import AutoGluon's TabularPredictor and TabularDataset classes:

from autogluon.tabular import TabularDataset, TabularPredictor

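Load the training data from a CSV file into a TabularDataset object, which is essentially a pandas DataFrame. To keep the demo fast, the code below subsamples 500 rows: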

train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
subsample_size = 500  # subsample subset of data for faster demo, try setting this to much larger values
train_data = train_data.sample(n=subsample_size, random_state=0)
train_data.head()

Each row of train_data is one example, and the columns hold the features we will use to predict income. First, specify the 'class' column as the label:

label = 'class'
print("Summary of class variable: \n", train_data[label].describe())
Summary of class variable:
 count        500
unique         2
top        <=50K
freq         365
Name: class, dtype: object
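The summary shows that 365 of the 500 sampled rows belong to the <=50K class. A useful reference point (our addition, not part of the original tutorial) is the majority-class baseline: a "model" that always predicts the most frequent class is right 365/500 = 73% of the time, so anything AutoGluon trains should beat that. Below is a minimal pure-Python sketch. The label list is a hypothetical stand-in that only reproduces the counts above.

```python
from collections import Counter


def majority_baseline(labels):
    """Return the most frequent label and the accuracy of always predicting it."""
    counts = Counter(labels)
    top_label, top_freq = counts.most_common(1)[0]
    return top_label, top_freq / len(labels)


# Hypothetical stand-in for train_data['class']: it only reproduces the counts
# in the summary above (365 x ' <=50K', 135 x ' >50K'). The labels in this
# dataset carry a leading space, as the training log shows.
labels = [' <=50K'] * 365 + [' >50K'] * 135
top_label, baseline_acc = majority_baseline(labels)
print(repr(top_label), baseline_acc)  # ' <=50K' 0.73
```

In the training log that follows, KNeighborsUnif reaches 0.73 validation accuracy, which is essentially no better than this baseline (although it is measured on a different, 100-row validation split). The tree-based models reach 0.82 to 0.85.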

Next, let AutoGluon train models automatically:

# Folder in which to save the trained models
save_path = 'agModels-predictClass'
predictor = TabularPredictor(label=label, path=save_path).fit(train_data)
Beginning AutoGluon training ...
AutoGluon will save models to "agModels-predictClass/"
AutoGluon Version:  0.2.0b20210429
Train Data Rows:    500
Train Data Columns: 14
Preprocessing data ...
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
    2 unique label values:  [' >50K', ' <=50K']
    If 'binary' is not the correct problem_type, please manually specify the problem_type argument in fit() (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Selected class <--> label mapping:  class 1 =  >50K, class 0 =  <=50K
    Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
    To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
    Available Memory:                    26825.14 MB
    Train Data (Original)  Memory Usage: 0.29 MB (0.0% of available memory)
    Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
    Stage 1 Generators:
            Fitting AsTypeFeatureGenerator...
    Stage 2 Generators:
            Fitting FillNaFeatureGenerator...
    Stage 3 Generators:
            Fitting IdentityFeatureGenerator...
            Fitting CategoryFeatureGenerator...
                    Fitting CategoryMemoryMinimizeFeatureGenerator...
    Stage 4 Generators:
            Fitting DropUniqueFeatureGenerator...
    Types of features in original data (raw dtype, special dtypes):
            ('int', [])    : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
            ('object', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
    Types of features in processed data (raw dtype, special dtypes):
            ('category', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
            ('int', [])      : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
    0.1s = Fit runtime
    14 features in original data used to generate 14 features in processed data.
    Train Data (Processed) Memory Usage: 0.03 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.07s ...
AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
    To change this, specify the eval_metric argument of fit()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 400, Val Rows: 100
Fitting model: KNeighborsUnif ...
    0.73     = Validation accuracy score
    0.0s     = Training runtime
    0.1s     = Validation runtime
Fitting model: KNeighborsDist ...
    0.65     = Validation accuracy score
    0.0s     = Training runtime
    0.1s     = Validation runtime
Fitting model: LightGBMXT ...
    0.83     = Validation accuracy score
    0.77s    = Training runtime
    0.01s    = Validation runtime
Fitting model: LightGBM ...
    0.85     = Validation accuracy score
    0.16s    = Training runtime
    0.01s    = Validation runtime
Fitting model: RandomForestGini ...
    0.84     = Validation accuracy score
    0.51s    = Training runtime
    0.11s    = Validation runtime
Fitting model: RandomForestEntr ...
    0.83     = Validation accuracy score
    0.5s     = Training runtime
    0.11s    = Validation runtime
Fitting model: CatBoost ...
    0.84     = Validation accuracy score
    0.39s    = Training runtime
    0.01s    = Validation runtime
Fitting model: ExtraTreesGini ...
    0.82     = Validation accuracy score
    0.5s     = Training runtime
    0.11s    = Validation runtime
Fitting model: ExtraTreesEntr ...
    0.82     = Validation accuracy score
    0.5s     = Training runtime
    0.11s    = Validation runtime
Fitting model: NeuralNetFastAI ...
No improvement since epoch 7: early stopping
    0.83     = Validation accuracy score
    1.15s    = Training runtime
    0.03s    = Validation runtime
Fitting model: XGBoost ...
    0.85     = Validation accuracy score
    0.17s    = Training runtime
    0.01s    = Validation runtime
Fitting model: NeuralNetMXNet ...
    0.84     = Validation accuracy score
    5.57s    = Training runtime
    0.02s    = Validation runtime
Fitting model: LightGBMLarge ...
    0.83     = Validation accuracy score
    0.37s    = Training runtime
    0.01s    = Validation runtime
Fitting model: WeightedEnsemble_L2 ...
    0.85     = Validation accuracy score
    0.35s    = Training runtime
    0.0s     = Validation runtime
AutoGluon training complete, total runtime = 12.68s ...
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("agModels-predictClass/")
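The log line "Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 400, Val Rows: 100" shows where the validation scores come from. AutoGluon holds out 20% of the 500 rows, trains each model on the remaining 400, and measures its "Validation accuracy score" on the 100 held-out rows. The sketch below reproduces only that arithmetic with a seeded random split. It is our illustration, not AutoGluon's code, and the real splitter may differ (for example, by stratifying on the label).

```python
import random


def holdout_split(rows, holdout_frac=0.2, seed=0):
    """Randomly hold out a fraction of rows for validation (illustrative only)."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    n_val = int(round(len(rows) * holdout_frac))
    val_idx = set(indices[:n_val])
    train = [row for i, row in enumerate(rows) if i not in val_idx]
    val = [row for i, row in enumerate(rows) if i in val_idx]
    return train, val


rows = list(range(500))  # stand-in for the 500 subsampled rows of train_data
train, val = holdout_split(rows)
print(len(train), len(val))  # 400 100
```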

Next, load a separate test dataset:

test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')
y_test = test_data[label]  # extract the label column
test_data_nolab = test_data.drop(columns=[label])  # drop the label column
test_data_nolab.head()
Loaded data from: https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv | Columns = 15 / 15 | Rows = 9769 -> 9769

Use the trained model to make predictions and evaluate its performance:

predictor = TabularPredictor.load(save_path)  # optional: reload the previously trained predictor from disk

y_pred = predictor.predict(test_data_nolab)
print("Predictions:  \n", y_pred)
perf = predictor.evaluate_predictions(y_true=y_test, y_pred=y_pred, auxiliary_metrics=True)
Evaluation: accuracy on test data: 0.8397993653393387
Evaluations on test data:
{
    "accuracy": 0.8397993653393387,
    "balanced_accuracy": 0.7437076677780596,
    "mcc": 0.5295565206264157,
    "f1": 0.6242496998799519,
    "precision": 0.7038440714672441,
    "recall": 0.5608283002588438
}
Predictions:
 0        <=50K
1        <=50K
2         >50K
3        <=50K
4        <=50K
         ...
9764     <=50K
9765     <=50K
9766     <=50K
9767     <=50K
9768     <=50K
Name: class, Length: 9769, dtype: object
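The auxiliary metrics above follow the standard binary-classification definitions, with ' >50K' as the positive class (the training log reports class 1 = >50K). The sketch below computes them from a confusion matrix on a tiny hypothetical example. It uses textbook formulas and is not AutoGluon's implementation, which may handle edge cases such as empty classes differently.

```python
import math


def binary_metrics(y_true, y_pred, positive):
    """Accuracy, balanced accuracy, MCC, F1, precision and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # recall on the positive class
    specificity = tn / (tn + fp) if tn + fp else 0.0     # recall on the negative class
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "balanced_accuracy": (recall + specificity) / 2,
        "mcc": mcc,
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }


# Tiny hypothetical example: 1 TP, 1 FN, 2 TN, 1 FP
y_true = [' >50K', ' >50K', ' <=50K', ' <=50K', ' <=50K']
y_pred = [' >50K', ' <=50K', ' <=50K', ' <=50K', ' >50K']
print(binary_metrics(y_true, y_pred, positive=' >50K'))
```

Applied to the test results above, these definitions explain the gap between accuracy (0.84) and balanced accuracy (0.74). Recall on the >50K class is only 0.56, so the model misses many high earners. Plain accuracy hides this because most people in the data earn <=50K.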

We can also show how every model performs on the test set:

predictor.leaderboard(test_data, silent=True)
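The leaderboard lists every trained model, including WeightedEnsemble_L2, the last model fitted in the training log. This model combines the other models' predictions using learned weights. A common way to learn such weights is greedy ensemble selection: repeatedly add (with replacement) whichever model most improves the validation score of the averaged prediction, then use the selection counts as weights. The sketch below is a generic illustration of that technique with hypothetical model names and probabilities. We assume, but have not confirmed here, that AutoGluon's weighted ensemble works along these lines.

```python
def accuracy_from_proba(y_true, proba, threshold=0.5):
    """Accuracy of thresholded positive-class probabilities against 0/1 labels."""
    return sum((p >= threshold) == bool(t) for t, p in zip(y_true, proba)) / len(y_true)


def greedy_ensemble_weights(y_true, model_probas, n_iter=10, score=accuracy_from_proba):
    """Greedy ensemble selection with replacement; returns {model_name: weight}."""
    names = list(model_probas)
    counts = {name: 0 for name in names}
    running_sum = [0.0] * len(y_true)  # sum of the selected models' probabilities
    for step in range(1, n_iter + 1):
        best_name, best_score = None, float("-inf")
        for name in names:
            # Score the average of the current selection plus this candidate
            candidate = [(s + p) / step for s, p in zip(running_sum, model_probas[name])]
            candidate_score = score(y_true, candidate)
            if candidate_score > best_score:  # ties keep the earlier model
                best_name, best_score = name, candidate_score
        counts[best_name] += 1
        running_sum = [s + p for s, p in zip(running_sum, model_probas[best_name])]
    return {name: c / n_iter for name, c in counts.items() if c > 0}


# Hypothetical validation labels and positive-class probabilities from three models
y_val = [1, 1, 0, 0]
probas = {
    "model_a": [0.9, 0.3, 0.2, 0.7],  # right on rows 0 and 2
    "model_b": [0.3, 0.9, 0.7, 0.2],  # right on rows 1 and 3
    "model_c": [0.1, 0.1, 0.9, 0.9],  # wrong everywhere
}
print(greedy_ensemble_weights(y_val, probas, n_iter=2))  # {'model_a': 0.5, 'model_b': 0.5}
```

Averaging model_a and model_b corrects each model's mistakes, which is how a weighted ensemble can match or beat the best single model. In the training log above, WeightedEnsemble_L2 matches the best single models' validation accuracy (0.85).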

In short, to use AutoGluon on your own dataset, training a strong model really takes only two lines of code:

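from autogluon.tabular import TabularPredictor
# Replace the placeholders with the name of your label column and the path to your training data (e.g. a CSV file)
predictor = TabularPredictor(label='<label-column>').fit(train_data='<train-file>.csv')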

❝The example above uses only the default arguments of fit(). The next section shows how to adjust fit()'s arguments to maximize predictive performance. ❞

Maximizing predictive performance

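If you are benchmarking AutoGluon-Tabular or want to maximize model accuracy, you should not rely on fit()'s default arguments. To get the best predictive accuracy from AutoGluon, you should generally use the following lines instead: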

time_limit = 60  # 60 seconds only for a quick demo; in practice, set this to the longest time (in seconds) you are willing to wait
metric = 'roc_auc'  # evaluation metric; here, AUC
predictor = TabularPredictor(label, eval_metric=metric).fit(train_data, time_limit=time_limit, presets='best_quality')
predictor.leaderboard(test_data, silent=True)
No path specified. Models will be saved in: "AutogluonModels/ag-20210429_010727/"
Presets specified: ['best_quality']
Beginning AutoGluon training ... Time limit = 60s
AutoGluon will save models to "AutogluonModels/ag-20210429_010727/"
AutoGluon Version:  0.2.0b20210429
Train Data Rows:    500
Train Data Columns: 14
Preprocessing data ...
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
    2 unique label values:  [' >50K', ' <=50K']
    If 'binary' is not the correct problem_type, please manually specify the problem_type argument in fit() (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Selected class <--> label mapping:  class 1 =  >50K, class 0 =  <=50K
    Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
    To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
    Available Memory:                    26582.18 MB
    Train Data (Original)  Memory Usage: 0.29 MB (0.0% of available memory)
    Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
    Stage 1 Generators:
            Fitting AsTypeFeatureGenerator...
    Stage 2 Generators:
            Fitting FillNaFeatureGenerator...
    Stage 3 Generators:
            Fitting IdentityFeatureGenerator...
            Fitting CategoryFeatureGenerator...
                    Fitting CategoryMemoryMinimizeFeatureGenerator...
    Stage 4 Generators:
            Fitting DropUniqueFeatureGenerator...
    Types of features in original data (raw dtype, special dtypes):
            ('int', [])    : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
            ('object', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
    Types of features in processed data (raw dtype, special dtypes):
            ('category', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
            ('int', [])      : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
    0.1s = Fit runtime
    14 features in original data used to generate 14 features in processed data.
    Train Data (Processed) Memory Usage: 0.03 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.07s ...
AutoGluon will gauge predictive performance using evaluation metric: 'roc_auc'
    This metric expects predicted probabilities rather than predicted class labels, so you'll need to use predict_proba() instead of predict()
    To change this, specify the eval_metric argument of fit()
Fitting model: KNeighborsUnif_BAG_L1 ... Training model for up to 59.93s of the 59.92s of remaining time.
    0.5196   = Validation roc_auc score
    0.0s     = Training runtime
    0.1s     = Validation runtime
Fitting model: KNeighborsDist_BAG_L1 ... Training model for up to 59.82s of the 59.82s of remaining time.
    0.537    = Validation roc_auc score
    0.0s     = Training runtime
    0.1s     = Validation runtime
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 59.71s of the 59.71s of remaining time.
    0.8814   = Validation roc_auc score
    0.9s     = Training runtime
    0.05s    = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 58.74s of the 58.73s of remaining time.
    0.867    = Validation roc_auc score
    0.91s    = Training runtime
    0.05s    = Validation runtime
Fitting model: RandomForestGini_BAG_L1 ... Training model for up to 57.75s of the 57.75s of remaining time.
    0.8847   = Validation roc_auc score
    0.5s     = Training runtime
    0.09s    = Validation runtime
Fitting model: RandomForestEntr_BAG_L1 ... Training model for up to 57.15s of the 57.14s of remaining time.
    0.8863   = Validation roc_auc score
    0.5s     = Training runtime
    0.09s    = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 56.54s of the 56.54s of remaining time.
    0.8875   = Validation roc_auc score
    2.93s    = Training runtime
    0.04s    = Validation runtime
Fitting model: ExtraTreesGini_BAG_L1 ... Training model for up to 53.55s of the 53.55s of remaining time.
    0.8929   = Validation roc_auc score
    0.6s     = Training runtime
    0.09s    = Validation runtime
Fitting model: ExtraTreesEntr_BAG_L1 ... Training model for up to 52.83s of the 52.83s of remaining time.
    0.8939   = Validation roc_auc score
    0.5s     = Training runtime
    0.09s    = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 52.22s of the 52.22s of remaining time.
    0.8653   = Validation roc_auc score
    4.69s    = Training runtime
    0.13s    = Validation runtime
Fitting model: XGBoost_BAG_L1 ... Training model for up to 47.34s of the 47.33s of remaining time.
    0.8666   = Validation roc_auc score
    1.84s    = Training runtime
    0.03s    = Validation runtime
Fitting model: NeuralNetMXNet_BAG_L1 ... Training model for up to 45.41s of the 45.41s of remaining time.
    Ran out of time, stopping training early. (Stopping on epoch 95)
    0.8331   = Validation roc_auc score
    31.16s   = Training runtime
    0.18s    = Validation runtime
Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 14.02s of the 14.02s of remaining time.
    0.8417   = Validation roc_auc score
    1.7s     = Training runtime
    0.05s    = Validation runtime
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L2 ... Training model for up to 59.93s of the 12.24s of remaining time.
    0.9003   = Validation roc_auc score
    1.26s    = Training runtime
    0.0s     = Validation runtime
AutoGluon training complete, total runtime = 49.04s ...
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20210429_010727/")
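Note the log line saying that roc_auc "expects predicted probabilities rather than predicted class labels". AUC measures how well a model ranks positive examples above negative ones, so it needs scores (from predict_proba()) rather than hard labels. The sketch below computes AUC directly from that definition, counting ties as half. It is an O(n_pos * n_neg) illustration on hypothetical scores; production libraries typically use a faster sort-based computation.

```python
def roc_auc(y_true, scores):
    """ROC AUC: probability that a random positive is scored above a random negative.

    y_true holds 0/1 labels; ties between a positive and a negative count as half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and positive-class probabilities: the ranking is perfect
y_true = [1, 0, 1, 0]
proba = [0.9, 0.6, 0.7, 0.2]
print(roc_auc(y_true, proba))  # 1.0

# Thresholding at 0.5 first collapses the ranking into two values
hard = [1 if p >= 0.5 else 0 for p in proba]
print(roc_auc(y_true, hard))  # 0.75
```

With hard labels, the negative example scored 0.6 becomes indistinguishable from the two positives, so the information that it was ranked below both of them is lost.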

These lines use the following strategies to maximize model accuracy:

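  • Specify presets='best_quality'. This lets AutoGluon automatically build powerful stacked/bagged ensembles, which greatly improves predictive performance given enough training time. The default value of this argument is 'medium_quality_faster_train', which produces less accurate models but builds one quickly so you can check classification performance. In short, the presets argument lets you tailor model building to your purpose. For example, if you just want a basic model quickly and do not care much about predictive performance, consider presets=['good_quality_faster_inference_only_refit', 'optimize_for_deployment'].
  • The eval_metric argument sets the evaluation metric, for example 'f1' (binary classification), 'roc_auc' (binary classification), 'log_loss' (classification), 'mean_absolute_error' (regression), or 'median_absolute_error' (regression). You can also define your own metric function; see the examples in the autogluon/core/metrics/ folder.
  • Include all of your data in train_data and do not provide a separate tuning_data (AutoGluon splits the data more intelligently to suit its needs).
  • Do not specify hyperparameter_tune_kwargs. Counterintuitively, hyperparameter tuning is not the best use of a limited training-time budget; ensembling usually works better. Use hyperparameter_tune_kwargs only if you want to deploy a single model rather than an ensemble.
  • Do not specify hyperparameters (AutoGluon adaptively chooses which models and hyperparameters to use).
  • Set time_limit as long as you can afford.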
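The presets='best_quality' setting is also why every model in the log above carries a _BAG_L1 suffix, and why the log reports "Completed 1/20 k-fold bagging repeats". In k-fold bagging, each model type is trained k times, each time holding out a different fold. The out-of-fold predictions give every training row a held-out prediction, and at prediction time the k fold models are averaged. The sketch below shows only that mechanism, using contiguous folds and a trivial mean predictor as a stand-in learner. The helper names are hypothetical, and AutoGluon's actual bagging (shuffling, stratification, repeats, stack layers) is more involved.

```python
def kfold_indices(n, k):
    """Split row indices 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_mean(train_xs, train_ys):
    """Stand-in learner: ignores the features and always predicts the training mean."""
    mean = sum(train_ys) / len(train_ys)
    return lambda x: mean


def bag_with_oof(xs, ys, k=5, fit=fit_mean):
    """Fit one model per held-out fold; return (fold_models, out_of_fold_predictions)."""
    oof = [None] * len(xs)
    models = []
    for fold in kfold_indices(len(xs), k):
        held_out = set(fold)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        models.append(model)
        for i in fold:
            oof[i] = model(xs[i])  # predicted by a model that never saw row i
    return models, oof


def bagged_predict(models, x):
    """Predict a new row by averaging the fold models."""
    return sum(m(x) for m in models) / len(models)


xs = list(range(6))
ys = [1, 2, 3, 4, 5, 6]
models, oof = bag_with_oof(xs, ys, k=3)
print(oof)                         # [4.5, 4.5, 3.5, 3.5, 2.5, 2.5]
print(bagged_predict(models, 99))  # 3.5
```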

Regression (predicting a continuous variable)

To show that fit() also handles regression tasks automatically, we now predict the age variable in the same dataset from the other features:

age_column = 'age'
print("Summary of age variable: \n", train_data[age_column].describe())
Summary of age variable:
 count    500.00000
mean      39.65200
std       13.52393
min       17.00000
25%       29.00000
50%       38.00000
75%       49.00000
max       85.00000
Name: age, dtype: float64
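For a numeric column, describe() reports the count, the mean, the sample standard deviation (dividing by n - 1), min and max, and the 25%/50%/75% quartiles obtained by linear interpolation. As a sketch, the same fields can be reproduced with Python's statistics module on a hypothetical six-value column (not the real age data). To our understanding, statistics.quantiles with method='inclusive' uses the same interpolation rule as pandas' default.

```python
import statistics


def describe_numeric(values):
    """Reproduce the fields of pandas' describe() for a numeric column."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
    }


ages = [17, 25, 29, 38, 49, 85]  # hypothetical ages, not the real column
for name, value in describe_numeric(ages).items():
    print(f"{name:>5}: {value}")
```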

We call fit() again, this time with a time limit as well:

predictor_age = TabularPredictor(label=age_column, path="agModels-predictAge").fit(train_data, time_limit=60)
performance = predictor_age.evaluate(test_data)
Beginning AutoGluon training ... Time limit = 60s
AutoGluon will save models to "agModels-predictAge/"
AutoGluon Version:  0.2.0b20210429
Train Data Rows:    500
Train Data Columns: 14
Preprocessing data ...
AutoGluon infers your prediction problem is: 'regression' (because dtype of label-column == int and many unique label-values observed).
    Label info (max, min, mean, stddev): (85, 17, 39.652, 13.52393)
    If 'regression' is not the correct problem_type, please manually specify the problem_type argument in fit() (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
    Available Memory:                    26519.77 MB
    Train Data (Original)  Memory Usage: 0.32 MB (0.0% of available memory)
    Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
    Stage 1 Generators:
            Fitting AsTypeFeatureGenerator...
    Stage 2 Generators:
            Fitting FillNaFeatureGenerator...
    Stage 3 Generators:
            Fitting IdentityFeatureGenerator...
            Fitting CategoryFeatureGenerator...
                    Fitting CategoryMemoryMinimizeFeatureGenerator...
    Stage 4 Generators:
            Fitting DropUniqueFeatureGenerator...
    Types of features in original data (raw dtype, special dtypes):
            ('int', [])    : 5 | ['fnlwgt', 'education-num', 'capital-gain', 'capital-loss', 'hours-per-week']
            ('object', []) : 9 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
    Types of features in processed data (raw dtype, special dtypes):
            ('category', []) : 9 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
            ('int', [])      : 5 | ['fnlwgt', 'education-num', 'capital-gain', 'capital-loss', 'hours-per-week']
    0.1s = Fit runtime
    14 features in original data used to generate 14 features in processed data.
    Train Data (Processed) Memory Usage: 0.03 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.07s ...
AutoGluon will gauge predictive performance using evaluation metric: 'root_mean_squared_error'
    To change this, specify the eval_metric argument of fit()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 400, Val Rows: 100
Fitting model: KNeighborsUnif ... Training model for up to 59.93s of the 59.93s of remaining time.
    -15.6869         = Validation root_mean_squared_error score
    0.0s     = Training runtime
    0.1s     = Validation runtime
Fitting model: KNeighborsDist ... Training model for up to 59.82s of the 59.82s of remaining time.
    -15.1801         = Validation root_mean_squared_error score
    0.0s     = Training runtime
    0.1s     = Validation runtime
Fitting model: LightGBMXT ... Training model for up to 59.72s of the 59.72s of remaining time.
    -11.8147         = Validation root_mean_squared_error score
    0.21s    = Training runtime
    0.01s    = Validation runtime
Fitting model: LightGBM ... Training model for up to 59.49s of the 59.48s of remaining time.
    -11.9295         = Validation root_mean_squared_error score
    0.19s    = Training runtime
    0.01s    = Validation runtime
Fitting model: RandomForestMSE ... Training model for up to 59.28s of the 59.28s of remaining time.
    -11.6028         = Validation root_mean_squared_error score
    0.5s     = Training runtime
    0.11s    = Validation runtime
Fitting model: CatBoost ... Training model for up to 58.66s of the 58.66s of remaining time.
    -11.7448         = Validation root_mean_squared_error score
    0.34s    = Training runtime
    0.01s    = Validation runtime
Fitting model: ExtraTreesMSE ... Training model for up to 58.31s of the 58.31s of remaining time.
    -11.4808         = Validation root_mean_squared_error score
    0.5s     = Training runtime
    0.11s    = Validation runtime
Fitting model: NeuralNetFastAI ... Training model for up to 57.69s of the 57.68s of remaining time.
    -40.9923         = Validation root_mean_squared_error score
    0.97s    = Training runtime
    0.03s    = Validation runtime
Fitting model: XGBoost ... Training model for up to 56.68s of the 56.68s of remaining time.
    -12.1743         = Validation root_mean_squared_error score
    0.53s    = Training runtime
    0.01s    = Validation runtime
Fitting model: NeuralNetMXNet ... Training model for up to 56.12s of the 56.12s of remaining time.
    -12.8081         = Validation root_mean_squared_error score
    6.81s    = Training runtime
    0.03s    = Validation runtime
Fitting model: LightGBMLarge ... Training model for up to 49.27s of the 49.27s of remaining time.
    -12.1676         = Validation root_mean_squared_error score
    0.51s    = Training runtime
    0.01s    = Validation runtime
Fitting model: WeightedEnsemble_L2 ... Training model for up to 59.93s of the 48.14s of remaining time.
    -11.3341         = Validation root_mean_squared_error score
    0.42s    = Training runtime
    0.0s     = Validation runtime
AutoGluon training complete, total runtime = 12.3s ...
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("agModels-predictAge/")
Evaluation: root_mean_squared_error on test data: -10.459079691277005
    Note: Scores are always higher_is_better. This metric score can be multiplied by -1 to get the metric value.
Evaluations on test data:
{
    "root_mean_squared_error": -10.459079691277005,
    "mean_squared_error": -109.39234798848308,
    "mean_absolute_error": -8.194572974525316,
    "r2": 0.4152766647850106,
    "pearsonr": 0.6454293362505439,
    "median_absolute_error": -6.8994293212890625
}
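The evaluation note says "Scores are always higher_is_better". Error metrics such as RMSE are multiplied by -1 so that a larger number always means a better model, while r2 and pearsonr are reported as-is. A test RMSE of -10.459 therefore means the age predictions are off by about 10.5 years in the root-mean-square sense. The sketch below computes several of these metrics from their standard definitions on hypothetical values and applies the same sign convention. It is our illustration, not AutoGluon's code.

```python
import math
import statistics

# Error metrics (lower is better) that get negated under the higher-is-better convention
ERROR_METRICS = ("root_mean_squared_error", "mean_squared_error",
                 "mean_absolute_error", "median_absolute_error")


def regression_metrics(y_true, y_pred, higher_is_better=True):
    """Standard regression metrics; error metrics are negated if higher_is_better."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    mean_true = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / len(errors)
    metrics = {
        "root_mean_squared_error": math.sqrt(mse),
        "mean_squared_error": mse,
        "mean_absolute_error": sum(abs(e) for e in errors) / len(errors),
        "median_absolute_error": statistics.median([abs(e) for e in errors]),
        "r2": 1 - ss_res / ss_tot,
    }
    if higher_is_better:
        for name in ERROR_METRICS:
            metrics[name] = -metrics[name]
    return metrics


# Hypothetical true and predicted ages
print(regression_metrics([30, 40, 50, 60], [32, 38, 55, 60]))
```

Read against the test output above, the model has an RMSE of about 10.46 years, an MAE of about 8.19 years, and an R² of about 0.42. The other columns explain only part of the variation in age.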

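We did not need to tell AutoGluon that this is a regression problem: it inferred this from the data and reported the appropriate performance metrics (RMSE by default).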
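When AutoGluon infers the problem type, the two training logs show its reasoning: 'binary' "because only two unique label-values observed", and 'regression' "because dtype of label-column == int and many unique label-values observed". The sketch below is a deliberately simplified heuristic in that spirit. The cutoff max_classes=20 for "many" unique values is our hypothetical choice, not AutoGluon's actual rule, which handles more cases.

```python
def infer_problem_type(labels, max_classes=20):
    """Simplified, hypothetical problem-type heuristic (not AutoGluon's actual logic)."""
    unique = set(labels)
    if len(unique) == 2:
        return "binary"
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in labels
    )
    if is_numeric and len(unique) > max_classes:
        return "regression"  # many distinct numeric values: treat as continuous
    return "multiclass"


print(infer_problem_type([' <=50K', ' >50K', ' <=50K']))  # binary
print(infer_problem_type(list(range(17, 86))))            # regression
print(infer_problem_type(['red', 'green', 'blue']))       # multiclass
```

If the inferred type is wrong (for example, a numeric code that actually denotes categories), the log messages above explain how to specify problem_type manually.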

Check how each model performs on the test data:

predictor_age.leaderboard(test_data, silent=True)

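This article is shared from the WeChat official account 生信菜鸟团 as part of the Tencent Cloud self-media syndication program. Originally published: 2021-05-16. For infringement concerns, contact cloudcommunity@tencent.com to have it removed.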