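I am a student new to data science, and I have been asked to write linear regression from scratch, including gradient descent, following my teacher's instructions (for example, which functions to implement) and using numpy.
Everything runs fine, but when I call gradient_descent(), I keep getting an overflow error from gradient(), at the point where I sum all the elements of a vector to compute the gradient with respect to one estimator. The odd thing is that gradient() works fine when called on its own, yet overflows inside gradient_descent().
I tried rounding the intermediate results so that nothing would overflow, and I tried isolating each result. I am using Python 3.7.3 with Jupyter on macOS 10.14.6.
Here is my code: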
import numpy as np
import random

def predict(x,th):
    if x.shape[1] != th.shape[0]:
        return "ERROR : The number of covariable columns is not equal to number of lines in parameter matrix !"
    else :
        return (x@th)

def error(x,th,y):
    return (y-predict(x,th))

def gradient(x,th,y):
    grad = np.full(th.shape[0],1)
    for i in range(grad.shape[0]):
        err = error(x,th,y).transpose()
        temp = x[:,i]*err
        grad[i] = temp.sum()
    return grad

def gradient_descent(x,th,y,a = 0.01):
    i = 0
    while i<2000:
        dif = a*gradient(x,th,y)
        th = th - dif
        i += 1
        if dif.all()<0.5:
            break
    return th

th = np.full(13,1).reshape(13,1) #just for testing purposes
predict(x_train, th)
error(x_train, th, y_train).shape
cost_fun(x_train, th, y_train)
gradient_descent(x_train, th, y_train)

And this is the error that follows:
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-414-5e608e709a9e> in <module>
      4 error(x_train, th, y_train).shape
      5 cost_fun(x_train, th, y_train)
----> 6 gradient_descent(x_train, th, y_train)
      7
      8

<ipython-input-413-57df3054d402> in gradient_descent(x, th, y, a)
     25     i = 0
     26     while i<2000:
---> 27         dif = a*gradient(x,th,y)
     28         th = th - dif
     29         i += 1

<ipython-input-413-57df3054d402> in gradient(x, th, y)
     19         err = error(x,th,y)[i]
     20         temp = x[i,:]*err
---> 21         grad[i] = round(temp.sum(), ndigits=10)
     22     return grad
     23
OverflowError: Python int too large to convert to C long

When I run gradient(x_train,th,y_train) on its own, I get this result:
array([ -98761915, -398968695, -1128435471, -1089578372, -7619613, -54698832, -620945173, -6731108064, -378298899, -932523483, -40174412843, -1826831673, 34647602295])
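gradient_descent() should return a vector of optimized parameters. What could be going wrong?!
Posted on 2019-09-21 16:38:57
Trace the error by printing the iteration number and the values of th and dif each time, just before the statement dif = a*gradient(x,th,y) runs; once the error occurs, inspect the last values printed. I don't have x_train and y_train, so I can't run the code. If possible, share a link to part of the data so that I can take a look.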
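A minimal sketch of that kind of trace, under stated assumptions: x_train and y_train are not available, so x_demo, y_demo and th0 below are small synthetic stand-ins, and traced_gradient_descent is a hypothetical helper name. The gradient is written as a vectorized, floating-point equivalent of the loop in the question, and the update rule is left exactly as posted so that the trace shows how it behaves.

```python
import numpy as np

def gradient(x, th, y):
    # Vectorized form of the question's loop: for each column i,
    # sum over rows of x[:, i] * (y - x @ th).
    # Everything stays floating point here (an assumption of this sketch).
    return x.T @ (y - x @ th)

def traced_gradient_descent(x, th, y, a=0.01, max_iter=20):
    """Apply the question's update rule, printing the state at every iteration."""
    history = []  # largest absolute step taken at each iteration
    for i in range(max_iter):
        dif = a * gradient(x, th, y)
        step = float(np.abs(dif).max())
        history.append(step)
        print(f"iter {i:3d}  max|th| = {np.abs(th).max():.3e}  max|dif| = {step:.3e}")
        if not np.isfinite(step):
            print("non-finite step, stopping")
            break
        th = th - dif
    return th, history

# Hypothetical stand-in for x_train / y_train: 50 rows, 3 covariates.
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(50, 3))
y_demo = x_demo @ np.array([[1.0], [2.0], [3.0]])
th0 = np.ones((3, 1))

th_out, history = traced_gradient_descent(x_demo, th0, y_demo)

# Side check: np.full(n, 1) builds an integer array, unlike np.full(n, 1.0).
print(np.full(3, 1).dtype, np.full(3, 1.0).dtype)
```

If the printed max|dif| keeps growing from one iteration to the next, as it does on this toy data, the iterates are diverging rather than converging. Two things may be worth checking in that case, offered as possibilities rather than a confirmed diagnosis for the original data. First, with error defined as y - predict(x, th), the summed x[:, i] * error is the negative of the squared-error gradient, so subtracting it moves th uphill. Second, grad = np.full(th.shape[0], 1) is an integer array, which cannot store values outside its fixed-width integer range.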
https://stackoverflow.com/questions/58035456