Author: 杰少
A Kaggle GM Shares His Self-Developed Feature Importance Toolkit: LOFO
Introduction
LOFO is a feature-importance toolkit developed by a Kaggle Grandmaster. Compared with other feature-importance methods, its distinguishing trait is that it measures each feature's contribution to actual validation performance, rather than reading importances off a fitted model.
LOFO's approach closely mirrors how we normally iterate on models in practice, and its author is a Kaggle GM, so it is well worth studying.
LOFO
01
Basic Idea
LOFO (Leave One Feature Out) computes importance as follows: evaluate the model's validation score with the full feature set to get a baseline, then remove one feature at a time and re-evaluate; a feature's importance is the drop in validation score when that feature is left out.
02
Basic Steps
The basic steps of LOFO are:
1. Wrap the training data, the target, and the feature list into a dataset.
2. Evaluate the model with cross-validation on the full feature set to obtain a baseline validation score.
3. Remove one feature at a time, re-train and re-evaluate, and record the change in score.
4. Report each feature's mean importance and its standard deviation across the folds.
Note that if we do not pass in a model, LOFO runs LightGBM by default.
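The steps above can be sketched with plain scikit-learn. This is a minimal illustration of what LOFO does, not the library's own code; the `cv_score` helper is invented for the example:

```python
# Minimal leave-one-feature-out loop (illustrative sketch, not the lofo library).
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=4, n_informative=2,
                       noise=10.0, random_state=0)
features = list(range(X.shape[1]))

def cv_score(cols):
    # mean cross-validated score using only the given feature columns
    return cross_val_score(LinearRegression(), X[:, cols], y,
                           scoring="neg_mean_absolute_error", cv=4).mean()

baseline = cv_score(features)  # step 2: baseline score with all features
importance = {f: baseline - cv_score([c for c in features if c != f])
              for f in features}  # step 3: drop each feature in turn
# positive importance => removing the feature hurts the validation score
```

The lofo package additionally parallelizes this loop and aggregates the scores per fold, but the core computation is the same subtraction.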
03
FastLOFO
Because LOFO re-trains and re-evaluates the model once per feature, it can be slow. If you want feature importances quickly, you can use Fast LOFO (FLOFO), which permutes feature values on an already-trained model instead of re-training.
The permutations on a feature's values are done within groups, where groups are obtained by grouping the validation set by k=2 features. These k features are chosen at random n=10 times, and the mean and standard deviation of the FLOFO importance are calculated based on these n runs. The reason this grouping makes the measure of importance better is that permuting a feature's value is no longer completely random.
In fact, the permutations are done within groups of similar samples, so they are equivalent to adding noise to the samples rather than replacing the feature with unrelated values. This keeps the permuted data closer to the real distribution and makes the importance estimates more reliable.
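The grouped permutation can be sketched as follows; the `flofo_like_importance` helper and the quantile binning used to form groups are illustrative assumptions, not the library's actual implementation:

```python
# FLOFO-style importance sketch: permute a feature only within groups of
# similar rows, so the permutation acts like noise on the samples.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1000, 3)), columns=["A", "B", "C"])
df["target"] = 2 * df["A"] + df["B"] + rng.normal(scale=0.1, size=1000)

features = ["A", "B", "C"]
model = LinearRegression().fit(df[features], df["target"])  # pre-trained model

def flofo_like_importance(feature, group_cols, n_bins=5):
    """Score drop after permuting `feature` within bins of `group_cols`."""
    base = -mean_absolute_error(df["target"], model.predict(df[features]))
    # form groups by binning the k grouping features into quantiles
    groups = [pd.qcut(df[c], n_bins, labels=False, duplicates="drop")
              for c in group_cols]
    shuffled = df.copy()
    shuffled[feature] = (shuffled.groupby(groups)[feature]
                         .transform(lambda s: s.sample(frac=1.0).to_numpy()))
    permuted = -mean_absolute_error(df["target"],
                                    model.predict(shuffled[features]))
    return base - permuted  # positive => the feature carries signal

imp_A = flofo_like_importance("A", ["B", "C"])  # strong feature
imp_C = flofo_like_importance("C", ["A", "B"])  # noise feature
```

In the real package, the k=2 grouping features are redrawn n=10 times and the mean and standard deviation of the score drops are reported.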
Code
The example below is taken from: https://github.com/aerdem4/lofo-importance
import pandas as pd
from sklearn.linear_model import LinearRegression
from lofo import FLOFOImportance, plot_importance
from data.test_data import generate_test_data  # helper from the lofo-importance repo

# generate a synthetic dataset with features A-D and a "target" column
df = generate_test_data(1000)
df.head()

# FLOFO expects an already-trained model
lr = LinearRegression()
lr.fit(df[["A", "B", "C", "D"]], df["target"])

# compute FLOFO importances and plot them
fi = FLOFOImportance(lr, df, ["A", "B", "C", "D"], "target",
                     scoring="neg_mean_absolute_error")
importances = fi.get_importance()
plot_importance(importances)
Applicability
LOFO currently works with any model, and its logic closely matches our intuitive modeling strategy.