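I am trying to apply PCA (Principal Component Analysis) to a dataset with 124 rows and 13 features. I want to find out how many features to use (with logistic regression) to get the most accurate prediction. My code is below: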
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df_wine = pd.read_csv('https://archive.ics.uci.edu/ml/'
                      'machine-learning-databases/wine/wine.data', header=None)
from sklearn.model_selection import train_test_split
X, y = df_wine.iloc[:, 1:].values, df_wine.iloc[:, 0].values
X_train, X_test, y_train, y_test = \
    train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
# standardize the features
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)
from sklearn.linear_model import LogisticRegression
from sklearn.decomposition import PCA
# initializing the PCA transformer and
# logistic regression estimator:
pca = PCA()  # prof recommends getting rid of n_components = 3
lr = LogisticRegression()
# dimensionality reduction:
X_train_pca = pca.fit_transform(X_train_std)
X_test_pca = pca.transform(X_test_std)
"""
rows = len(X_train_pca)
columns = len(X_train_pca[0])
print(rows)
print(columns)
"""
# fitting the logistic regression model on the reduced dataset:
for i in range(12):
    lr.fit(X_train_pca[:, :i], y_train)
    y_train_pca = lr.predict(X_train_pca[:, :i])
    print('Training accuracy:', lr.score(X_train_pca[:, :i], y_train))
I get this error message:
raise ValueError("Found array with %d feature(s) (shape=%s) while"
ValueError: Found array with 0 feature(s) (shape=(124, 0)) while a minimum of 1 is required.
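As far as I know, a for loop range of 12 is correct, since it should go through all 13 features (0 to 12). I want the loop to go through all the features: run logistic regression on one feature, then two, then three, and so on up to all 13, and then compare the accuracies to see how many features work best.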
Posted on 2020-11-24 02:40:11
About your error:
When i = 0, X_train_pca[:, :i] gives you an empty array, which is not valid input for .fit().
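One way to avoid the empty slice described above is to start the loop at 1 and stop at the number of components plus one. The following is only a sketch under assumptions: it uses scikit-learn's bundled load_wine() as a stand-in for the UCI CSV in the question, and the names n_components, k, and scores are introduced here for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: load_wine() stands in for the UCI CSV used in the question.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Standardize, then project onto all principal components.
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
pca = PCA()
X_train_pca = pca.fit_transform(X_train_std)

n_components = X_train_pca.shape[1]
scores = []
# Start at 1 so the slice [:, :k] always keeps at least one column,
# and stop at n_components + 1 so the last pass uses every component.
for k in range(1, n_components + 1):
    lr = LogisticRegression()
    lr.fit(X_train_pca[:, :k], y_train)
    scores.append(lr.score(X_train_pca[:, :k], y_train))
    print(k, 'component(s), training accuracy:', scores[-1])
```

Note that range(12) yields 0 through 11, so even without the error it would stop before using all 13 components. Training accuracy alone tends to favor more components, so scoring the held-out split as well is probably a better basis for choosing.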
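How to fix it:
If you want to fit a model with only the intercept, you can explicitly set fit_intercept=False in LogisticRegression() and add an extra column to X (on the far left), filled with 1s, to act as the intercept.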
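The intercept-only idea could look roughly like this. It is a sketch, not a definitive implementation: load_wine() again stands in for the UCI CSV, and ones_train and lr0 are names made up for this example.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumption: load_wine() stands in for the UCI CSV used in the question
# (its class labels are 0, 1, 2 rather than 1, 2, 3).
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# A single column of ones plays the role of the intercept.
ones_train = np.ones((X_train.shape[0], 1))

# fit_intercept=False so the model does not add a second intercept.
lr0 = LogisticRegression(fit_intercept=False)
lr0.fit(ones_train, y_train)

pred = lr0.predict(ones_train)
acc = lr0.score(ones_train, y_train)
print('Intercept-only training accuracy:', acc)
```

With no real features, the model predicts the same class for every row, so this score works as a baseline for the runs that use one or more components. One caveat: the coefficient on the ones column is subject to the default L2 penalty, which a built-in intercept normally is not.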
https://stackoverflow.com/questions/64974268