Here is the data frame:
   CNSSSBDVSN     CNSSSBDVS1                CNMCRGNNM  \
0     5941833      Kluskus 1                  Cariboo
1     5949832        Iskut 6  North Coast / Cote-nord
2     5941016      Cariboo H                  Cariboo
3     5955040  Peace River B     Northeast / Nord-est
4     5941801  Alkali Lake 1                  Cariboo

                         CNSSSBDVS3  instagram_posts  airports  \
0                    Indian Reserve                0         0
1                    Indian Reserve                0         0
2  Regional District Electoral Area                0         0
3  Regional District Electoral Area                1        17
4                    Indian Reserve                0         0

   railway_stations  accommodations  visitor_centers  festivals  \
0                 0               0                0          0
1                 0               0                0          0
2                 0               5                0          0
3                11               0                0          0
4                 0               0                0          0

   ports_and_ferry_terminals  attractions
0                          0            0
1                          0            0
2                          0            0
3                          0            0
4                          0            0
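Here is the code. Before you read it, I want to mention two things:
1. I think something is wrong with the residuals or the index.
2. If needed, CNSSSBDVSN can be used as the index.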
# -*- coding: utf-8 -*-
import pandas as pd
import statsmodels.formula.api as sm
import matplotlib.pyplot as plt
import scipy.stats as stats
from tabulate import tabulate

if __name__ == "__main__":
    # Read data
    census_subdivision_without_lower_mainland_and_van_island = pd.read_csv('../data/augmented/census_subdivision_without_lower_mainland_and_van_island.csv')

    # Select data
    cities = census_subdivision_without_lower_mainland_and_van_island[census_subdivision_without_lower_mainland_and_van_island['CNSSSBDVS3'] == 'City']
    non_cities = census_subdivision_without_lower_mainland_and_van_island[census_subdivision_without_lower_mainland_and_van_island['CNSSSBDVS3'] != 'City']

    # Fit
    fit_cities = sm.ols(formula="instagram_posts ~ airports + railway_stations + ports_and_ferry_terminals + accommodations + visitor_centers + festivals + attractions", data=cities).fit()
    fit_non_cities = sm.ols(formula="instagram_posts ~ airports + railway_stations + ports_and_ferry_terminals + accommodations + visitor_centers + festivals + attractions", data=non_cities).fit()
    print(fit_cities.summary())
    print(fit_non_cities.summary())

    # Residual
    cities['residual'] = fit_cities.resid
    non_cities['residual'] = fit_non_cities.resid
Running it gives this warning:
/Users/Chu/Documents/dssg/done/linear_model_cities.py:27: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
cities['residual'] = fit_cities.resid
/Users/Chu/Documents/dssg/done/linear_model_cities.py:28: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
non_cities['residual'] = fit_non_cities.resid
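The suspicion about the residuals or the index can be checked directly. With the formula API and a pandas DataFrame as input, statsmodels returns `fit.resid` as a pandas Series labelled with the input's index, so assigning it as a column lines up row by row. Below is a minimal sketch of that check, assuming a small made-up DataFrame (the IDs 101 to 106 and all values are hypothetical, not the real census data) and using NumPy least squares as a stand-in for `fit_cities.resid`, since only the index behaviour matters here. CNSSSBDVSN is set as the index, as the question suggests.

```python
import numpy as np
import pandas as pd

# Illustrative data only (not the real census file); the IDs are hypothetical.
df = pd.DataFrame({
    "CNSSSBDVSN": [101, 102, 103, 104, 105, 106],
    "CNSSSBDVS3": ["City", "Indian Reserve", "City", "City", "Indian Reserve", "City"],
    "instagram_posts": [40, 3, 12, 25, 7, 33],
    "airports": [2, 0, 0, 1, 1, 1],
}).set_index("CNSSSBDVSN")  # use CNSSSBDVSN as the index

cities = df[df["CNSSSBDVS3"] == "City"]

# OLS of instagram_posts ~ airports with an intercept, computed with NumPy
# as a stand-in for statsmodels' fit_cities.resid.
X = np.column_stack([np.ones(len(cities)), cities["airports"]])
y = cities["instagram_posts"].to_numpy(dtype=float)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = pd.Series(y - X @ beta, index=cities.index)

# The residuals carry the same CNSSSBDVSN labels as the subset they came
# from, so assigning them to a column of that subset aligns correctly.
print(resid.index.equals(cities.index))  # True
```

If this holds, the index is not what triggers the warning; the warning comes from adding a column to `cities`, which pandas still treats as derived from the larger DataFrame.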
Posted on 2017-06-26 03:42:29
Your problem is that cities is a slice of census_subdivision_without_lower_mainland_and_van_island. If you want to use cities as a DataFrame in its own right, create a copy with:
cities = census_subdivision_without_lower_mainland_and_van_island[census_subdivision_without_lower_mainland_and_van_island['CNSSSBDVS3'] == 'City'].copy()
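A minimal sketch of the `.copy()` fix on a small made-up DataFrame (hypothetical IDs and values), again with NumPy least squares standing in for statsmodels' `fit.resid`; the helper `ols_resid` is hypothetical and only exists for this illustration. It records warnings while adding the column and checks that no `SettingWithCopyWarning` is raised and that the original frame is left unchanged.

```python
import warnings

import numpy as np
import pandas as pd

# Illustrative data only (not the real census file); the IDs are hypothetical.
df = pd.DataFrame({
    "CNSSSBDVSN": [101, 102, 103, 104, 105, 106],
    "CNSSSBDVS3": ["City", "Indian Reserve", "City", "City", "Indian Reserve", "City"],
    "instagram_posts": [40, 3, 12, 25, 7, 33],
    "airports": [2, 0, 0, 1, 1, 1],
}).set_index("CNSSSBDVSN")


def ols_resid(frame):
    """Hypothetical helper: residuals of instagram_posts ~ airports (stand-in for fit.resid)."""
    X = np.column_stack([np.ones(len(frame)), frame["airports"]])
    y = frame["instagram_posts"].to_numpy(dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return pd.Series(y - X @ beta, index=frame.index)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # .copy() makes cities an independent DataFrame, so adding a column is safe.
    cities = df[df["CNSSSBDVS3"] == "City"].copy()
    cities["residual"] = ols_resid(cities)

setting_warnings = [w for w in caught if w.category.__name__ == "SettingWithCopyWarning"]
print(len(setting_warnings))     # 0
print("residual" in df.columns)  # False: the original frame is untouched
```

The trade-off is that `cities` now holds its own copy of the rows, so the residual column lives only there, not in the full DataFrame.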
Alternatively, if you want to modify the original DataFrame, you can insert the results with .loc, as the warning suggests:
census_subdivision_without_lower_mainland_and_van_island.loc[census_subdivision_without_lower_mainland_and_van_island['CNSSSBDVS3'] == 'City','residuals'] = fit_cities.resid
Do the same for the non-cities. As an aside, I would use a shorter DataFrame name to keep the code readable and within the recommended Python line length.
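A minimal sketch of the `.loc` approach for both groups, assuming a small made-up DataFrame under the short name `df` (hypothetical IDs and values) and a hypothetical `ols_resid` helper that uses NumPy least squares in place of statsmodels' `fit.resid`. Each group is fitted separately and its residuals are written back into the one original DataFrame; pandas uses the residuals' index to decide which rows to fill.

```python
import numpy as np
import pandas as pd

# Illustrative data only; a short name (df) keeps the lines readable.
df = pd.DataFrame({
    "CNSSSBDVSN": [101, 102, 103, 104, 105, 106, 107],
    "CNSSSBDVS3": ["City", "Indian Reserve", "City", "City",
                   "Indian Reserve", "City", "Regional District Electoral Area"],
    "instagram_posts": [40, 3, 12, 25, 7, 33, 1],
    "airports": [2, 0, 0, 1, 1, 1, 0],
}).set_index("CNSSSBDVSN")


def ols_resid(frame):
    """Hypothetical helper: residuals of instagram_posts ~ airports (stand-in for fit.resid)."""
    X = np.column_stack([np.ones(len(frame)), frame["airports"]])
    y = frame["instagram_posts"].to_numpy(dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return pd.Series(y - X @ beta, index=frame.index)


is_city = df["CNSSSBDVS3"] == "City"

# Fit each group separately and write its residuals into the original frame.
df.loc[is_city, "residual"] = ols_resid(df[is_city])
df.loc[~is_city, "residual"] = ols_resid(df[~is_city])

print(df["residual"].isna().any())  # False: every row received a residual
```

Reusing one boolean mask (`is_city` and its negation `~is_city`) for both the fit and the assignment guarantees that the two groups cover every row exactly once.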
https://stackoverflow.com/questions/44752581