I am trying to do CV (cross-validation) for my training and testing datasets. I am using LinearRegression. However, when I run the code I get the error below. When I run the same code with a decision tree, I get no errors and the code works. How can I solve this problem? Is my CV code correct? Thank you for your help.
CV code reference: scikit-learn cross_validation over-fitting or under-fitting
data_set = pd.read_excel("NEW Collected Data for Preliminary Results Independant variables ONLY_NO AREA_NO_INFILL_DENSITY_no_printing_temperature.xlsx")
pd.set_option('max_columns', 35)
pd.set_option('max_rows', 300)
data_set.head(300)
X, y = data_set[["Part's Z-Height (mm)","Part's Solid Volume (cm^3)","Layer Height (mm)","Printing/Scanning Speed (mm/s)","Part's Orientation (Support's volume) (cm^3)"]], data_set[["Climate change (kg CO2 eq.)","Climate change, incl biogenic carbon (kg CO2 eq.)","Fine Particulate Matter Formation (kg PM2.5 eq.)","Fossil depletion (kg oil eq.)","Freshwater Consumption (m^3)","Freshwater ecotoxicity (kg 1,4-DB eq.)","Freshwater Eutrophication (kg P eq.)","Human toxicity, cancer (kg 1,4-DB eq.)","Human toxicity, non-cancer (kg 1,4-DB eq.)","Ionizing Radiation (Bq. C-60 eq. to air)","Land use (Annual crop eq. yr)","Marine ecotoxicity (kg 1,4-DB eq.)","Marine Eutrophication (kg N eq.)","Metal depletion (kg Cu eq.)","Photochemical Ozone Formation, Ecosystem (kg NOx eq.)","Photochemical Ozone Formation, Human Health (kg NOx eq.)","Stratospheric Ozone Depletion (kg CFC-11 eq.)","Terrestrial Acidification (kg SO2 eq.)","Terrestrial ecotoxicity (kg 1,4-DB eq.)"]]
scaler = preprocessing.MinMaxScaler()
names = data_set.columns
d = scaler.fit_transform(data_set)
scaled_df = pd.DataFrame(d, columns=names)
X_normalized, y_for_normalized = scaled_df[["Part's Z-Height (mm)","Part's Solid Volume (cm^3)","Layer Height (mm)","Printing/Scanning Speed (mm/s)","Part's Orientation (Support's volume) (cm^3)"]], scaled_df[["Climate change (kg CO2 eq.)","Climate change, incl biogenic carbon (kg CO2 eq.)","Fine Particulate Matter Formation (kg PM2.5 eq.)","Fossil depletion (kg oil eq.)","Freshwater Consumption (m^3)","Freshwater ecotoxicity (kg 1,4-DB eq.)","Freshwater Eutrophication (kg P eq.)","Human toxicity, cancer (kg 1,4-DB eq.)","Human toxicity, non-cancer (kg 1,4-DB eq.)","Ionizing Radiation (Bq. C-60 eq. to air)","Land use (Annual crop eq. yr)","Marine ecotoxicity (kg 1,4-DB eq.)","Marine Eutrophication (kg N eq.)","Metal depletion (kg Cu eq.)","Photochemical Ozone Formation, Ecosystem (kg NOx eq.)","Photochemical Ozone Formation, Human Health (kg NOx eq.)","Stratospheric Ozone Depletion (kg CFC-11 eq.)","Terrestrial Acidification (kg SO2 eq.)","Terrestrial ecotoxicity (kg 1,4-DB eq.)"]]
scaled_df.head(200)
(Output omitted: the first rows of the min-max-scaled DataFrame, with the 5 feature columns and 19 target columns all rescaled to the [0, 1] range.)
lin_regressor = LinearRegression()
# pass the order of your polynomial here
poly = PolynomialFeatures(1)
# convert to be used further to linear regression
X_transform = poly.fit_transform(x_train)
# fit this to Linear Regressor
linear_regg = lin_regressor.fit(X_transform, y_train)
import numpy as np
from sklearn.metrics import SCORERS
from sklearn.model_selection import KFold
scorer = SCORERS['r2']
cv = KFold(n_splits=5, random_state=0,shuffle=True)
train_scores, test_scores = [], []
for train, test in cv.split(X_normalized):
    X_transform2 = poly.fit_transform(X_normalized)
    OL = lin_regressor.fit(X_transform2.iloc[train], y_for_normalized.iloc[train])
    tr_21 = OL.score(X_train, y_train)
    ts_21 = OL.score(X_test, y_test)
    print("Train score:", tr_21)  # from the documentation, .score returns R^2
    print("Test score:", ts_21)  # from the documentation, .score returns R^2
    train_scores.append(tr_21)
    test_scores.append(ts_21)
print ("The Mean for Train scores is:",(np.mean(train_scores)))
print("The Mean for Test scores is:", np.mean(test_scores))

Error message:
--------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/var/folders/mm/r4gnnwl948zclfyx12w803040000gn/T/ipykernel_73165/2276765730.py in <module>
10 for train, test in cv.split(X_normalized):
11 X_transform2 = poly.fit_transform(X_normalized)
---> 12 OL=lin_regressor.fit(X_transform2.iloc[train], y_for_normalized.iloc[train])
13 tr_21 = OL.score(X_train, y_train)
14 ts_21 = OL.score(X_test, y_test)
AttributeError: 'numpy.ndarray' object has no attribute 'iloc'

Decision tree:
new_model = DecisionTreeRegressor(max_depth=9,
                                  min_samples_split=10, random_state=0)
import numpy as np
from sklearn.metrics import SCORERS
from sklearn.model_selection import KFold
scorer = SCORERS['r2']
cv = KFold(n_splits=5, random_state=0,shuffle=True)
train_scores, test_scores = [], []
for train, test in cv.split(X_normalized):
    OO = new_model.fit(X_normalized.iloc[train], y_for_normalized.iloc[train])
    tr_2 = OO.score(X_train, y_train)
    ts_2 = OO.score(X_test, y_test)
    print("Train score:", tr_2)  # from the documentation, .score returns R^2
    print("Test score:", ts_2)  # from the documentation, .score returns R^2
    train_scores.append(tr_2)
    test_scores.append(ts_2)
print ("The Mean for Train scores is:",(np.mean(train_scores)))
print("The Mean for Test scores is:", np.mean(test_scores))

Output:
Train score: 0.8960560474997927
Test score: -0.15521696464773224
Train score: 0.8852795454592853
Test score: 0.17650772852710495
Train score: 0.5825347735306872
Test score: 0.34789159049344665
Train score: 0.8549575808716975
Test score: 0.7615265842042157
Train score: 0.8340261480334055
Test score: 0.14011826401728472
The Mean for Train scores is: 0.8105708190789735
The Mean for Test scores is: 0.2541654405188639

Attempt 1:
import numpy as np
from sklearn.metrics import SCORERS
from sklearn.model_selection import KFold
scorer = SCORERS['r2']
cv = KFold(n_splits=5, random_state=0,shuffle=True)
train_scores, test_scores = [], []
for train, test in cv.split(X_normalized):
    X_transform2 = poly.fit_transform(X_normalized)
    OL = lin_regressor.fit(X_transform2[train], y_for_normalized[train])
    tr_21 = OL.score(X_train, y_train)
    ts_21 = OL.score(X_test, y_test)
    print("Train score:", tr_21)  # from the documentation, .score returns R^2
    print("Test score:", ts_21)  # from the documentation, .score returns R^2
    train_scores.append(tr_21)
    test_scores.append(ts_21)
print ("The Mean for Train scores is:",(np.mean(train_scores)))
print("The Mean for Test scores is:", np.mean(test_scores))

Error message:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/var/folders/mm/r4gnnwl948zclfyx12w803040000gn/T/ipykernel_90924/12176184.py in <module>
10 for train, test in cv.split(X_normalized):
11 X_transform2 = poly.fit_transform(X_normalized)
---> 12 OL=lin_regressor.fit(X_transform2[train], y_for_normalized[train])
13 tr_21 = OL.score(X_train, y_train)
14 ts_21 = OL.score(X_test, y_test)
~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/frame.py in __getitem__(self, key)
3462 if is_iterator(key):
3463 key = list(key)
-> 3464 indexer = self.loc._get_listlike_indexer(key, axis=1)[1]
3465
3466 # take() does not accept boolean indexers
~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexing.py in _get_listlike_indexer(self, key, axis)
1312 keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
1313
-> 1314 self._validate_read_indexer(keyarr, indexer, axis)
1315
1316 if needs_i8_conversion(ax.dtype) or isinstance(
~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis)
1372 if use_interval_msg:
1373 key = list(key)
-> 1374 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
1375
1376 not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())
KeyError: "None of [Int64Index([ 0, 1, 3, 4, 5, 6, 9, 10, 11, 12, 14, 15, 17, 18, 19, 20, 21,\n 23, 25, 27, 28, 29, 31, 32, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,\n 44, 45, 46, 47, 48, 49, 50, 51, 52, 56, 57, 58, 59, 60, 61, 62, 63,\n 64, 65, 66, 67, 68, 69, 70, 71, 72, 74, 76, 77, 79, 80, 81, 82, 83,\n 84, 85, 87, 88, 89, 90, 91, 94, 96, 97, 98, 99],\n dtype='int64')] are in the [columns]"

Posted on 2022-08-17 21:10:29
Understanding
poly.fit_transform returns a numpy.ndarray, so here your X_normalized gets converted from a pandas.core.frame.DataFrame into a numpy.ndarray, while your y_for_normalized is still a pandas.core.frame.DataFrame.
For a numpy.ndarray you pass indices as numpy.ndarray[indexes]; for a pandas.core.frame.DataFrame you pass them as .iloc[indexes].
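The type mismatch is easy to reproduce in a few lines (a minimal sketch using made-up data, not the dataset from the question):

```python
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# Tiny stand-in for X_normalized; the actual column names don't matter here.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})

poly = PolynomialFeatures(1)
arr = poly.fit_transform(df)  # the DataFrame comes back as a plain ndarray

print(type(arr).__name__)  # ndarray -- so arr.iloc[...] raises AttributeError
idx = [0, 2]
print(arr[idx].shape)      # (2, 3): ndarray takes positional rows via plain []
print(df.iloc[idx].shape)  # (2, 2): DataFrame takes positional rows via .iloc[]
```

Indexing `df[idx]` directly would instead look up *columns* named 0 and 2, which is exactly why Attempt 1 below fails with a KeyError on `y_for_normalized[train]`.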
Solution
Use [] to take rows from X_transform2, since it is a numpy.ndarray, and use .iloc[] for y_for_normalized, since it is a DataFrame. Code:
train_scores, test_scores = [], []
for train, test in cv.split(X_normalized):
    X_transform2 = poly.fit_transform(X_normalized)
    # [] for X_transform2, .iloc[] for y_for_normalized
    OL = lin_regressor.fit(X_transform2[train], y_for_normalized.iloc[train])
    tr_21 = OL.score(X_transform2[train], y_for_normalized.iloc[train])
    ts_21 = OL.score(X_transform2[test], y_for_normalized.iloc[test])
    print("Train score:", tr_21)  # from the documentation, .score returns R^2
    print("Test score:", ts_21)  # from the documentation, .score returns R^2
    train_scores.append(tr_21)
    test_scores.append(ts_21)
print("The Mean for Train scores is:", (np.mean(train_scores)))
print("The Mean for Test scores is:", np.mean(test_scores))

PS:
You used X_train, y_train and X_test, y_test in OL.score. It should be the fold data selected by the cv indices train and test, as reflected in the snippet above. If you defined X_train, y_train, X_test, y_test separately for a specific reason, then you are fine to use them.
Why use PolynomialFeatures() at all when you only want all features at degree 1? PolynomialFeatures() with degree 1 makes no difference to the fit.
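That point is easy to verify: with degree=1, PolynomialFeatures only prepends a bias column of ones, and since LinearRegression fits an intercept anyway, the transform changes nothing about the model. A small sketch with toy data:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0, 3.0], [4.0, 5.0]])

# degree=1 keeps the original features and just adds a column of ones
X1 = PolynomialFeatures(degree=1).fit_transform(X)
print(X1)  # [[1. 2. 3.], [1. 4. 5.]]

# with include_bias=False the output is identical to the input
X1_nb = PolynomialFeatures(degree=1, include_bias=False).fit_transform(X)
print(np.array_equal(X1_nb, X))  # True
```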
If you use a newer version of scikit-learn, SCORERS raises a deprecation warning. https://stackoverflow.com/questions/73392074
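If I'm reading the newer scikit-learn releases correctly, the SCORERS dict was deprecated in favor of sklearn.metrics.get_scorer, so the scorer line can be swapped like this (sketch on a toy problem, not the question's data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import get_scorer

scorer = get_scorer("r2")  # replaces the deprecated SCORERS['r2']

# quick sanity check on a perfectly linear toy problem
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 2.0, 3.0])
model = LinearRegression().fit(X, y)
print(scorer(model, X, y))  # a perfect linear fit gives R^2 = 1.0
```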