
Q: Python linear regression model (pandas, statsmodels) - ValueError: endog and exog matrix sizes do not match

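Stack Overflow user

Asked 2018-06-09 17:47:54

Answers 1 · Views 7.2K · Followers 0 · Votes 2

A friend of mine asked me about this linear regression code and I couldn't solve it either, so now it is my question too.

The error we get: ValueError: endog and exog matrices are different sizes.

When I remove "Tech" from ind_names, it works fine. That may not mean anything, but I tried it to rule out a syntax error.

The Tech and Financial industry labels are not evenly distributed in the DataFrame, so maybe that causes the size mismatch? I couldn't debug any further, so I decided to ask you.

I would be glad to get some confirmation of the cause of the error and ideas for a solution. Please find the code below.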

#We have a portfolio constructed of 3 randomly generated factors (fac1, fac2, fac3).
#Python code provides the following message 
#ValueError: The indices for endog and exog are not aligned

import pandas as pd
from numpy.random import rand
import numpy as np
import statsmodels.api as sm

fac1, fac2, fac3 = np.random.rand(3, 1000) #Generate  random factors

#Consider a collection of hypothetical stock portfolios
#Generate randomly 1000 tickers
import random; random.seed(0)
import string
N = 1000
def rands(n):
  choices = string.ascii_uppercase
  return ''.join([random.choice(choices) for _ in range(n)])


tickers = np.array([rands(5) for _ in range(N)])
ticker_subset = tickers.take(np.random.permutation(N)[:1000])

#Weighted sum of factors plus noise

port = pd.Series(0.7 * fac1 - 1.2 * fac2 + 0.3 * fac3 + rand(1000), index=ticker_subset)
factors = pd.DataFrame({'f1': fac1, 'f2': fac2, 'f3': fac3}, index=ticker_subset)

#Correlations between each factor and the portfolio 
#print(factors.corrwith(port))
factors1=sm.add_constant(factors)


#Calculate factor exposures using a regression estimated by OLS
#print(sm.OLS(np.asarray(port), np.asarray(factors1)).fit().params)

#Calculate the exposure on each industry
def beta_exposure(chunk, factors=None):
    return sm.OLS(np.asarray(chunk), np.asarray(factors)).fit().params


#Assume that we have only two industries – financial and tech

ind_names = np.array(['Financial', 'Tech'])
#Create a random industry classification 

sampler = np.random.randint(0, len(ind_names), N)
industries = pd.Series(ind_names[sampler], index=tickers, name='industry')
by_ind = port.groupby(industries)



exposures=by_ind.apply(beta_exposure, factors=factors1)
print(exposures)
#exposures.unstack()

#Determine the exposures on each industry

python

pandas

linear-regression

statsmodels

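1 Answer

Stack Overflow user

Accepted answer

Posted 2018-06-09 23:15:17

Understanding the error message:

ValueError: endog and exog matrices are different sizes

Okay, not too bad. The endogenous matrix and the exogenous matrix have different sizes. The statsmodels documentation has a page explaining that endogenous variables are the factors inside the system and exogenous variables are the factors outside it.

Some debugging

Check the shapes of the arrays we are getting. To do this, break the one-liner apart and print the .shape of each argument, or print the first few elements of each. Also comment out the line that throws the error. Doing that, we find: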

chunk [490]
factor [1000    4]
chunk [510]
factor [1000    4]

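Oh! There it is. We expected factor to be split into chunks as well: it should be 490 by 4 the first time and 510 by 4 the second time. Note: because the categories are assigned randomly, the exact counts differ on every run.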
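The debugging step above can be sketched roughly as follows. This is a minimal illustration rather than the original poster's exact session: the synthetic ticker labels, the `report_shapes` helper, and the use of `np.random.default_rng` are assumptions made to keep the example self-contained.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
index = [f"T{i:04d}" for i in range(n)]  # hypothetical ticker labels

# Stand-ins for port, factors1 and industries from the question.
port = pd.Series(rng.random(n), index=index)
factors1 = pd.DataFrame(rng.random((n, 4)),
                        columns=["const", "f1", "f2", "f3"], index=index)
industries = pd.Series(rng.choice(["Financial", "Tech"], size=n),
                       index=index, name="industry")

def report_shapes(chunk, factors=None):
    # Same signature as beta_exposure, but it only reports array shapes.
    return {"chunk": np.asarray(chunk).shape,
            "factor": np.asarray(factors).shape}

shapes = {name: report_shapes(chunk, factors=factors1)
          for name, chunk in port.groupby(industries)}

for name, shape in shapes.items():
    print(name, shape)
```

Each group's chunk has only that group's rows, while the factor array keeps all 1000 rows every time, which is the mismatch the error reports.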

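Basically, we are passing too much information into this function. We can use the chunk to see which factor rows to select, filter the factors down to those rows, and then everything will work.

Looking at the function definition in the documentation: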

class statsmodels.regression.linear_model.OLS(endog, exog=None, missing='none', hasconst=None, **kwargs)

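We are passing only two arguments; the rest are optional. Let's look at the two we pass.

endog (array-like): 1-d endogenous response variable. The dependent variable.
exog (array-like): A nobs x k array where nobs is the number of observations and k is the number of regressors.

Ah, endog and exog again. endog is a 1-d array-like. So far so good: shape 490 works. exog nobs? Oh, that is the number of observations. So exog is a 2-d array, and in this case we need shape 490 by 4.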

For this specific problem:

beta_exposure should be:

def beta_exposure(chunk, factors=None):
    factors = factors.loc[factors.index.isin(chunk.index)]
    return sm.OLS(np.asarray(chunk), np.asarray(factors)).fit().params

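The problem is that beta_exposure is applied to each group of the series (the split is random, say 490 elements for Financial and 510 for Tech), but factors=factors1 always supplies all 1000 rows, because the groupby code never touches it.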

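For the references I used, see model.OLS.html and exog.html.

Votes 6

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link: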

https://stackoverflow.com/questions/50776985
