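I am reading the scikit-learn tutorial on column transformers. The given example (https://scikit-learn.org/stable/modules/generated/sklearn.compose.make_column_selector.html#sklearn.compose.make_column_selector) works, but when I try to select only a few columns, it gives an error.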
MWE
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.compose import make_column_transformer
from sklearn.compose import make_column_selector
df = sns.load_dataset('tips')
mycols = ['tip','sex']
ct = make_column_transformer(make_column_selector(pattern=mycols))
ct.fit_transform(df)
Required
I want only the selected columns in the output.
Note
Of course, I know I can do df[mycols]; I am looking for a scikit-learn pipeline example.
Posted on 2020-06-17 03:50:21
If you don't mind using mlxtend, it has a built-in transformer for this.
Using mlxtend
from mlxtend.feature_selection import ColumnSelector
pipe = ColumnSelector(mycols)
pipe.fit_transform(df)
Using sklearn
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
class FeatureSelector(BaseEstimator, TransformerMixin):
    def __init__(self, columns):
        self.columns = columns
    def fit(self, X, y=None):
        # Nothing to learn: column selection is stateless.
        return self
    def transform(self, X, y=None):
        # Return only the requested columns of the DataFrame.
        return X[self.columns]
pipeline = Pipeline([('selector', FeatureSelector(columns=mycols))])
pipeline.fit_transform(df)[:5]
Posted on 2021-02-26 01:19:18
I may be a bit late, but you can also select columns with sklearn's ColumnTransformer() by setting the transformer to "passthrough" and remainder='drop':
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
pipe = Pipeline([
    ("selector", ColumnTransformer([
        ("selector", "passthrough", mycols)
    ], remainder="drop"))
])
https://stackoverflow.com/questions/62416223
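For reference, here is a minimal sketch of the "passthrough" approach from the last answer, run on a small hand-made DataFrame instead of the seaborn 'tips' dataset, so it needs no download. The data values and the variable names (by_name, by_pattern) are made up for illustration. It also shows one way the question's make_column_selector attempt could be adapted: as far as I can tell, `pattern` takes a regex string rather than a list, and the selector goes in the column slot of a (name, transformer, columns) tuple.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector

# Small stand-in for the 'tips' dataset (values are invented for illustration).
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01],
    "tip": [1.01, 1.66, 3.50],
    "sex": ["Female", "Male", "Male"],
    "size": [2, 3, 3],
})

# Explicit column list: the listed columns pass through unchanged,
# and remainder="drop" discards everything else.
by_name = ColumnTransformer(
    [("selector", "passthrough", ["tip", "sex"])],
    remainder="drop",
)
selected_by_name = by_name.fit_transform(df)

# Same selection with make_column_selector: `pattern` is a single regex
# string matched against the column names, not a list of names.
by_pattern = ColumnTransformer(
    [("selector", "passthrough", make_column_selector(pattern=r"^(?:tip|sex)$"))],
    remainder="drop",
)
selected_by_pattern = by_pattern.fit_transform(df)

print(selected_by_name.shape)
print(selected_by_pattern.shape)
```

With default settings the result is a NumPy array rather than a DataFrame. In scikit-learn 1.2 or later, calling set_output(transform="pandas") on the transformer should keep the column labels, if that is needed further down the pipeline.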