Q: In machine learning, is there a way to impute missing values?

Stack Overflow user

Asked 2018-04-16 18:06:07

Answers 1 · Views 2.8K · Followers 0 · Votes 6

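For my own learning, I have been trying imputation methods other than mean/median/mode. So far I have tried KNN, MICE, and median imputation. I was told that imputation can also be done with clustering methods; I searched the internet for a package but found only some research papers.

I am running these imputation methods on the Iris dataset by creating missing values in it (since Iris has no missing values). For the other methods, my approach is as follows: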

import random

import numpy as np
import pandas as pd

data = pd.read_csv("D:/Iris_classification/train.csv")

#Shuffle the data and reset the index
from sklearn.utils import shuffle
data = shuffle(data).reset_index(drop = True)  

#Create Independent and dependent matrices
X = data.iloc[:, [0, 1, 2, 3]].values 
y = data.iloc[:, 4].values

#train_test_split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 50, random_state = 0)

#Standardize the data
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()

X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)

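#Introduce missing values at random
prop = int(X_train.size * 0.5) #Number of cells to draw (up to 50%; repeated draws can hit the same cell)
prop1 = int(X_test.size * 0.5)

a = [random.choice(range(X_train.shape[0])) for _ in range(prop)] #Randomly choose row indices of the numpy array
b = [random.choice(range(X_train.shape[1])) for _ in range(prop)] #Randomly choose column indices
c = [random.choice(range(X_test.shape[0])) for _ in range(prop1)]
d = [random.choice(range(X_test.shape[1])) for _ in range(prop1)]

#Work on copies so the complete arrays stay available for comparison
X1_train = X_train.copy()
X1_test = X_test.copy()
X1_train[a, b] = np.nan
X1_test[c, d] = np.nan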

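Then for KNN imputation, I did:

#KNN is assumed to come from the fancyimpute package; older releases expose complete(), newer ones use fit_transform()
from fancyimpute import KNN
X_train_filled = KNN(3).complete(X1_train)
X_test_filled = KNN(3).complete(X1_test)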

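Is there a way to impute missing values with a clustering method? Also, StandardScaler() does not work when the data contains NaN values. Is there another way to standardize the data?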
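For illustration only, here is a minimal sketch of what both ideas could look like. It is not a packaged or authoritative method. The helper names `nan_standardize` and `kmeans_impute` are hypothetical, as are the choices of 3 clusters and 10 refinement passes, and the sketch assumes no column is entirely NaN. Standardization uses NumPy's NaN-aware statistics so that missing cells are skipped and stay NaN. The imputation step starts from column means, then repeatedly fits k-means and overwrites only the originally missing cells with the matching coordinate of each row's centroid. (Recent scikit-learn releases, 0.20 and later, also let StandardScaler ignore NaNs when fitting and keep them in the transformed output.)

```python
import numpy as np
from sklearn.cluster import KMeans


def nan_standardize(X):
    """Standardize each column using only its observed (non-NaN) entries."""
    mean = np.nanmean(X, axis=0)
    std = np.nanstd(X, axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    return (X - mean) / std  # NaN cells remain NaN


def kmeans_impute(X, n_clusters=3, n_iter=10, random_state=0):
    """Fill NaNs with the corresponding coordinate of each row's cluster centroid."""
    missing = np.isnan(X)
    # Initial guess: column means of the observed values.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
        labels = km.fit_predict(filled)
        # Overwrite only the cells that were missing in the input.
        filled[missing] = km.cluster_centers_[labels][missing]
    return filled


# Synthetic stand-in for the Iris features (150 rows, 4 columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.2] = np.nan  # blank out roughly 20% of cells

X_scaled = nan_standardize(X_missing)
X_filled = kmeans_impute(X_scaled)
```

A fixed number of passes keeps the sketch short; stopping once the filled values stop changing would be a natural refinement. Fitting on the training split and assigning test rows to the learned centroids would avoid leaking test information, which this sketch does not attempt.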

python

machine-learning

imputation

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/49854629
