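I am very new to using scikit-learn in Python; my scikit-learn version is 0.21.2. I have encoded the categorical variables in my dataset with OneHotEncoder.
Now I am trying to link the encoded columns back to the original variables, using the code given here and here in the following two links: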
import pandas as pd
import numpy as np
results = []
for i in range(enc.active_features_.shape[0]):
    f = enc.active_features_[i]
    index_range = np.extract(enc.feature_indices_ <= f, enc.feature_indices_)
    s = len(index_range) - 1
    f_index = index_range[-1]
    f_label_decoded = f - f_index
    results.append({
        'label_decoded_value': f_label_decoded,
        'coefficient': clf.coef_[0][i]
    })
R = pd.DataFrame.from_records(results)
from sklearn import preprocessing
encoder = preprocessing.OneHotEncoder(categorical_features=[0,1,2])
X_train = encoder.fit_transform(data_train)
print(encoder.feature_indices_)
Unfortunately, it keeps throwing these errors:
'OneHotEncoder' object has no attribute '_active_features_'
'OneHotEncoder' object has no attribute '_feature_indices_'
How can I resolve these errors and get the code to work?
Posted on 2019-10-30 21:48:25
I think the solutions you mention make the logic more complicated than it needs to be. get_feature_names() is sufficient here.
Example:
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LogisticRegression
n_samples = 50
data = pd.DataFrame({'colors': np.random.choice(['red', 'blue', 'green'], n_samples),
                     'shapes': np.random.choice(['circle', 'square'], n_samples)})
y = np.random.choice(['apples', 'oranges'], n_samples)
enc = OneHotEncoder()
X = enc.fit_transform(data)
lr = LogisticRegression().fit(X, y)
pd.DataFrame({'feature_names': enc.get_feature_names(data.columns),
              'coef': np.squeeze(lr.coef_)})
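If separate columns for the original variable and the category value are wanted, rather than a combined name such as colors_red, one possible sketch is to read them from the fitted encoder's categories_ attribute. This is only an illustration under some assumptions: the seeded random data and the name coef_table are made up for the example, and the hasattr check is there because get_feature_names() is available in 0.21.2 but newer scikit-learn releases provide get_feature_names_out() instead.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LogisticRegression

# Illustrative data only; the seed just makes the run repeatable.
rng = np.random.RandomState(0)
n_samples = 50
data = pd.DataFrame({
    'colors': rng.choice(['red', 'blue', 'green'], n_samples),
    'shapes': rng.choice(['circle', 'square'], n_samples),
})
y = rng.choice(['apples', 'oranges'], n_samples)

enc = OneHotEncoder()
X = enc.fit_transform(data)
lr = LogisticRegression().fit(X, y)

# get_feature_names() exists in 0.21.2; newer releases use get_feature_names_out().
if hasattr(enc, 'get_feature_names_out'):
    names = enc.get_feature_names_out(list(data.columns))
else:
    names = enc.get_feature_names(list(data.columns))

# categories_ holds one array of category values per input column, in column
# order, so repeating each column name by its category count lines up with
# the encoded columns.
n_categories = [len(c) for c in enc.categories_]
coef_table = pd.DataFrame({
    'variable': np.repeat(list(data.columns), n_categories),
    'category': np.concatenate(enc.categories_),
    'feature_name': names,
    'coef': np.squeeze(lr.coef_),
})
print(coef_table)
```

Each row of coef_table then pairs one encoded column with the variable it came from, its category value, and the fitted coefficient. This layout assumes a binary target, so that lr.coef_ has a single row.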
https://stackoverflow.com/questions/58625912