
How can I get the highest accuracy with a low number of selected features using XGBoost?

Stack Overflow user
Asked on 2020-05-25 18:03:41
2 answers · 702 views · 0 followers · score 0

I have been looking into several feature-selection methods and found one with the help of this link (XGBoost, Feature Importance and Feature Selection With XGBoost). I implemented the method for my case, and the results were as follows:

  • Thresh=0.000, n=11, Accuracy: 55.56%
  • Thresh=0.000, n=11, Accuracy: 55.56%
  • Thresh=0.000, n=11, Accuracy: 55.56%
  • Thresh=0.000, n=11, Accuracy: 55.56%
  • Thresh=0.097, n=7, Accuracy: 55.56%
  • Thresh=0.105, n=6, Accuracy: 55.56%
  • Thresh=0.110, n=5, Accuracy: 50.00%
  • Thresh=0.114, n=4, Accuracy: 50.00%
  • Thresh=0.169, n=3, Accuracy: 44.44%
  • Thresh=0.177, n=2, Accuracy: 38.89%
  • Thresh=0.228, n=1, Accuracy: 33.33%

So my question is: in this situation, how do I select the combination of high accuracy and a low number of features n? The code can be found in the link.

Edit 1:

Thanks to @Mihai, I managed to get it working with the code from his answer. I have one more question: suppose I ran the code from the link and got the following:

Feature Importance results = [29.205832   5.0182242  0.         0.         0. 6.7736177 16.704327  18.75632    9.529003  14.012676   0.       ]
Features = [ 0  7  6  9  8  5  1 10  4  3  2]
  • Thresh=0.000, n=11, Accuracy: 38.89%
  • Thresh=0.000, n=11, Accuracy: 38.89%
  • Thresh=0.000, n=11, Accuracy: 38.89%
  • Thresh=0.000, n=11, Accuracy: 38.89%
  • Thresh=0.050, n=7, Accuracy: 38.89%
  • Thresh=0.068, n=6, Accuracy: 38.89%
  • Thresh=0.095, n=5, Accuracy: 33.33%
  • Thresh=0.140, n=4, Accuracy: 38.89%
  • Thresh=0.167, n=3, Accuracy: 33.33%
  • Thresh=0.188, n=2, Accuracy: 38.89%
  • Thresh=0.292, n=1, Accuracy: 38.89%

How can I remove the features that have zero feature importance and keep only the features with a nonzero importance value?
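A concise way to do this (a NumPy sketch of my own, not code from the thread), using the importances array printed above:

```python
import numpy as np

# The feature-importance values printed in the question.
importances = np.array([29.205832, 5.0182242, 0.0, 0.0, 0.0, 6.7736177,
                        16.704327, 18.75632, 9.529003, 14.012676, 0.0])

# np.nonzero returns the indices of all nonzero entries,
# i.e. the column positions of the features worth keeping.
nonzero_idx = np.nonzero(importances)[0]
print(nonzero_idx)  # -> [0 1 5 6 7 8 9]

# For a NumPy matrix:              X_train[:, nonzero_idx]
# For a pandas DataFrame (by position): X_train.iloc[:, nonzero_idx]
```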

Side questions:

  1. I am trying to find the feature-selection method best suited to a specific classification model, i.e. the one that yields the features giving the highest accuracy with that model — for example, using a KNN classifier and wanting to find the features that give it the best accuracy. Which feature-selection methods are suitable for this?
  2. When implementing multiple classification models, is it better to perform feature selection for each classification model separately, or to perform feature selection once and then use the selected features across all the classification models?
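As an illustration of side question 1 (a hedged sketch, not code from the thread): a KNN classifier exposes no feature_importances_, so one common option is a filter method such as sklearn's SelectKBest wrapped in a pipeline, with the number of kept features tuned by cross-validation:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the question's 11-feature dataset.
X, y = make_classification(n_samples=200, n_features=11, n_informative=5,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Feature selection happens inside the pipeline, so each CV fold
# selects features on its own training split (no leakage).
pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),
    ("knn", KNeighborsClassifier()),
])
grid = GridSearchCV(pipe,
                    {"select__k": list(range(1, 12)),
                     "knn__n_neighbors": [3, 5, 7]},
                    cv=5)
grid.fit(X_train, y_train)
print("best k =", grid.best_params_["select__k"],
      "cv accuracy = %.2f" % grid.best_score_)
```

Because the selector is tuned jointly with the classifier, this directly answers "which features give the highest accuracy for this specific model"; repeating the search per model is the safer choice for side question 2.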

2 Answers

Stack Overflow user

Accepted answer

Posted on 2020-05-26 10:24:05

OK, so what the person in your link is doing with

from numpy import sort
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

thresholds = sort(model.feature_importances_)
for thresh in thresholds:
    # select features using threshold
    selection = SelectFromModel(model, threshold=thresh, prefit=True)
    select_X_train = selection.transform(X_train)
    # train model
    selection_model = XGBClassifier()
    selection_model.fit(select_X_train, y_train)
    # eval model
    select_X_test = selection.transform(X_test)
    predictions = selection_model.predict(select_X_test)
    accuracy = accuracy_score(y_test, predictions)
    print("Thresh=%.3f, n=%d, Accuracy: %.2f%%" % (thresh, select_X_train.shape[1], accuracy * 100.0))

is creating a sorted array of thresholds, and then training a model for every element of the thresholds array.

From your question, I think you only want to select the sixth case, the one with the lowest number of features and the highest accuracy. In that case you would want to do something like this:

selection = SelectFromModel(model, threshold=thresholds[5], prefit=True)
select_X_train = selection.transform(X_train)
selection_model = XGBClassifier()
selection_model.fit(select_X_train, y_train)
select_X_test = selection.transform(X_test)
predictions = selection_model.predict(select_X_test)
accuracy = accuracy_score(y_test, predictions)
print("Thresh=%.3f, n=%d, Accuracy: %.2f%%" % (thresholds[5], select_X_train.shape[1], accuracy * 100.0))

If you want the whole thing done automatically, then you need to compute, inside the loop, the minimum n for which accuracy is at its maximum, and it would look more or less like this:

n_min = X_train.shape[1]  # start from your maximum number of used features
acc_max = 0
thresholds = sort(model.feature_importances_)
obj_thresh = thresholds[0]
for thresh in thresholds:
    selection = SelectFromModel(model, threshold=thresh, prefit=True)
    select_X_train = selection.transform(X_train)
    selection_model = XGBClassifier()
    selection_model.fit(select_X_train, y_train)
    select_X_test = selection.transform(X_test)
    predictions = selection_model.predict(select_X_test)
    accuracy = accuracy_score(y_test, predictions)
    if (select_X_train.shape[1] < n_min) and (accuracy > acc_max):
        n_min = select_X_train.shape[1]
        acc_max = accuracy
        obj_thresh = thresh

# retrain with the selected threshold
selection = SelectFromModel(model, threshold=obj_thresh, prefit=True)
select_X_train = selection.transform(X_train)
selection_model = XGBClassifier()
selection_model.fit(select_X_train, y_train)
select_X_test = selection.transform(X_test)
predictions = selection_model.predict(select_X_test)
accuracy = accuracy_score(y_test, predictions)
print("Thresh=%.3f, n=%d, Accuracy: %.2f%%" % (obj_thresh, select_X_train.shape[1], accuracy * 100.0))
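The same selection can also be expressed more compactly (my own variant, not part of the answer): record one (accuracy, n, thresh) tuple per iteration and take the maximum accuracy, breaking ties toward fewer features:

```python
# Hypothetical results collected inside the loop as (accuracy, n, thresh);
# values here are taken from the question's first output for illustration.
results = [(0.5556, 11, 0.000), (0.5556, 7, 0.097), (0.5556, 6, 0.105),
           (0.5000, 5, 0.110), (0.4444, 3, 0.169)]

# max() compares tuples lexicographically: highest accuracy first,
# then -n so that fewer features wins among equal accuracies.
best_acc, best_n, best_thresh = max(results, key=lambda r: (r[0], -r[1]))
print(best_acc, best_n, best_thresh)  # -> 0.5556 6 0.105
```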
Score: 1

Stack Overflow user

Posted on 2020-06-05 00:06:35

I managed to solve it. Please find the code below.

To get the lowest number of features with the highest accuracy:

import numpy as np
import xgboost as xgb
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import accuracy_score

# Fit the model:
f_max = 8
f_min = 2
n_min = f_max
acc_max = 0  # or initialise with the accuracy of the full-feature model
thresholds = np.sort(model_FS.feature_importances_)
obj_thresh = thresholds[0]
accuracy_list = []
for thresh in thresholds:
    # select features using threshold:
    selection = SelectFromModel(model_FS, threshold=thresh, prefit=True)
    select_X_train = selection.transform(X_train)
    # train model:
    selection_model = xgb.XGBClassifier()
    selection_model.fit(select_X_train, y_train)
    # eval model:
    select_X_test = selection.transform(X_test)
    selection_model_pred = selection_model.predict(select_X_test)
    selection_predictions = [round(value) for value in selection_model_pred]
    accuracy = accuracy_score(y_true=y_test, y_pred=selection_predictions)
    accuracy = accuracy * 100
    print('Thresh= %.3f, n= %d, Accuracy: %.2f%%' % (thresh, select_X_train.shape[1], accuracy))
    accuracy_list.append(accuracy)
    if (select_X_train.shape[1] < f_max) and (select_X_train.shape[1] >= f_min) and (accuracy >= acc_max):
        n_min = select_X_train.shape[1]
        acc_max = accuracy
        obj_thresh = thresh

# Retrain with the selected threshold:
selection = SelectFromModel(model_FS, threshold=obj_thresh, prefit=True)
select_X_train = selection.transform(X_train)
selection_model = xgb.XGBClassifier()
selection_model.fit(select_X_train, y_train)
select_X_test = selection.transform(X_test)
selection_model_pred = selection_model.predict(select_X_test)
selection_predictions = [round(value) for value in selection_model_pred]
accuracy = accuracy_score(y_true=y_test, y_pred=selection_predictions)
print("Selected: Thresh=%.3f, n=%d, Accuracy: %.2f%%" % (obj_thresh, select_X_train.shape[1], accuracy * 100.0))
key_list = list(range(X_train.shape[1], 0, -1))
accuracy_dict = dict(zip(key_list, accuracy_list))
optimum_num_feat = n_min
print(optimum_num_feat)

# Printing out the features (keeps the first optimum_num_feat columns;
# this assumes the columns of X_train are ordered by importance):
X_train = X_train.iloc[:, :optimum_num_feat]
X_test = X_test.iloc[:, :optimum_num_feat]

print('X Train FI: ')
print(X_train)
print('X Test FI: ')
print(X_test)

To get the features without zero importance values:

import numpy as np

# Calculate feature importances:
importances = model_FS.feature_importances_
print(importances * 100)

# Organise the feature importances in a dictionary keyed by column index:
key_list = range(len(importances))
feature_importance_dict = dict(zip(key_list, importances))
sort_feature_importance_dict = dict(sorted(feature_importance_dict.items(), key=lambda x: x[1], reverse=True))
print('Feature Importance Dictionary (Sorted): ', sort_feature_importance_dict)

# Remove the features that have zero feature importance:
filtered_feature_importance_dict = {x: y for x, y in sort_feature_importance_dict.items() if y != 0}
print('Filtered Feature Importance Dictionary: ', filtered_feature_importance_dict)
f_indices = np.asarray(list(filtered_feature_importance_dict.keys()))
print(f_indices)

# The keys are positional column indices, so use iloc rather than loc:
X_train = X_train.iloc[:, f_indices]
X_test = X_test.iloc[:, f_indices]

print('X Train FI: ')
print(X_train)
print('X Test FI: ')
print(X_test)
Score: 0
Original content provided by Stack Overflow: https://stackoverflow.com/questions/62008129