Currently, I have code that fits a log-normal distribution.
shape, loc, scale = sm.lognorm.fit(dataToLearn, floc = 0)
for b in bounds:
    toPlot.append((b, currCount + sm.lognorm.ppf(b, s=shape, loc=loc, scale=scale)))
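I would like to be able to pass a vector of weights to the fit. At the moment I have a workaround: I round every weight to 2 decimal places and then repeat each value 100 * w times, so that it receives the appropriate weight.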
for i, d in enumerate(dataToLearn):
    dataToLearn2 += int(w[i] * 100) * [d]
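This runs too slowly on my computer, so I am hoping for a more correct solution.
Please tell me whether scipy or numpy can be used to make my workaround faster and more efficient.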
Posted on 2018-07-19 05:31:39
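SciPy distributions do not implement a weighted fit. However, for the log-normal distribution there are explicit formulas for the (unweighted) maximum likelihood estimation, and these formulas generalize easily to weighted data. The explicit formulas are all (in effect) averages, and the generalization to the weighted case is to use weighted averages in the formulas.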
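For reference, the weighted estimates described above can be written out as follows. The notation (data points x_i, weights w_i, location fixed at 0) is an assumption chosen to match SciPy's shape/scale parameterization of lognorm; it is a sketch of the formulas, not a quotation from any library documentation.

```latex
\hat{\mu} = \frac{\sum_i w_i \ln x_i}{\sum_i w_i}, \qquad
\hat{\sigma}^2 = \frac{\sum_i w_i \left(\ln x_i - \hat{\mu}\right)^2}{\sum_i w_i}, \qquad
\text{shape} = \hat{\sigma}, \quad \text{scale} = e^{\hat{\mu}}, \quad \text{loc} = 0
```

Setting every w_i to 1 recovers the ordinary (unweighted) maximum likelihood estimates.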
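Here is a script that demonstrates the calculation using a small data set with integer weights, so that we know what the exact values of the fitted parameters should be.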
import numpy as np
from scipy.stats import lognorm
# Sample data and weights. To enable an exact comparison with
# the method of generating an array with the values repeated
# according to their weight, I use an array of weights that is
# all integers.
x = np.array([2.5, 8.4, 9.3, 10.8, 6.8, 1.9, 2.0])
w = np.array([ 1, 1, 2, 1, 3, 3, 1])
#-----------------------------------------------------------------------------
# Fit the log-normal distribution by creating an array containing the values
# repeated according to their weight.
xx = np.repeat(x, w)
# Use the explicit formulas for the MLE of the log-normal distribution.
lnxx = np.log(xx)
muhat = np.mean(lnxx)
varhat = np.var(lnxx)
shape = np.sqrt(varhat)
scale = np.exp(muhat)
print("MLE using repeated array: shape=%7.5f scale=%7.5f" % (shape, scale))
#-----------------------------------------------------------------------------
# Use the explicit formulas for the weighted MLE of the log-normal
# distribution.
lnx = np.log(x)
muhat = np.average(lnx, weights=w)
# varhat is the weighted variance of ln(x). There isn't a function in
# numpy for the weighted variance, so we compute it using np.average.
varhat = np.average((lnx - muhat)**2, weights=w)
shape = np.sqrt(varhat)
scale = np.exp(muhat)
print("MLE using weights: shape=%7.5f scale=%7.5f" % (shape, scale))
#-----------------------------------------------------------------------------
# Might as well check that we get the same result from lognorm.fit() using the
# repeated array
shape, loc, scale = lognorm.fit(xx, floc=0)
print("MLE using lognorm.fit: shape=%7.5f scale=%7.5f" % (shape, scale))
The output is
MLE using repeated array: shape=0.70423 scale=4.57740
MLE using weights: shape=0.70423 scale=4.57740
MLE using lognorm.fit: shape=0.70423 scale=4.57740
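The same formulas apply unchanged to fractional weights, which is the situation in the question. The following is a minimal sketch that wraps them in a helper; the name fit_lognorm_weighted, the sample weights, and the bounds list are hypothetical and are not part of SciPy or of the original post.

```python
import numpy as np
from scipy.stats import lognorm


def fit_lognorm_weighted(data, weights):
    """Weighted MLE of a log-normal distribution with loc fixed at 0.

    Hypothetical helper (not part of SciPy). Returns (shape, loc, scale)
    in the same order as lognorm.fit(..., floc=0).
    """
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if data.shape != weights.shape:
        raise ValueError("data and weights must have the same shape")
    if np.any(data <= 0):
        raise ValueError("log-normal data must be strictly positive")
    log_data = np.log(data)
    # Weighted mean and weighted (biased, i.e. MLE) variance of ln(x).
    mu_hat = np.average(log_data, weights=weights)
    var_hat = np.average((log_data - mu_hat) ** 2, weights=weights)
    return np.sqrt(var_hat), 0.0, np.exp(mu_hat)


# Made-up fractional weights: no rounding or repetition is needed.
x = np.array([2.5, 8.4, 9.3, 10.8, 6.8, 1.9, 2.0])
w = np.array([0.25, 0.5, 1.75, 0.4, 2.1, 3.0, 0.33])
shape, loc, scale = fit_lognorm_weighted(x, w)

# Evaluate quantiles the same way the question's plotting loop does.
bounds = [0.05, 0.5, 0.95]  # assumed example probabilities
quantiles = lognorm.ppf(bounds, s=shape, loc=loc, scale=scale)
print(shape, scale, quantiles)
```

Because only weighted averages are involved, multiplying all weights by a constant leaves the result unchanged, and the cost is a single pass over the data instead of growing with the size of the repeated array.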
Posted on 2018-07-19 04:26:18
You can use numpy.repeat to make your workaround more efficient:
import numpy as np
dataToLearn = np.array([1,2,3,4,5])
weights = np.array([1,2,1,1,3])
print(np.repeat(dataToLearn, weights))
# Output: [1 2 2 3 4 5 5 5]
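To apply this to the fractional weights from the question, the weights first have to be turned into integer repeat counts. The sketch below assumes the question's two-decimal scaling (w * 100); the data and weight values are made up for illustration.

```python
import numpy as np

dataToLearn = np.array([1.2, 3.4, 0.7, 5.1])  # made-up sample values
w = np.array([0.29, 1.00, 0.57, 2.25])        # made-up fractional weights

# Round to the nearest integer count rather than truncating with int():
# a floating-point product such as 0.29 * 100 can land just below the
# intended integer, and truncation would then drop one repetition.
counts = np.rint(w * 100).astype(int)

# Each value is repeated according to its integer count.
dataToLearn2 = np.repeat(dataToLearn, counts)
print(counts, dataToLearn2.size)
```

The repeated array still grows with the scale factor, so this only speeds up the workaround; it does not remove the memory cost of repeating the data.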
A very basic performance test of numpy.repeat:
import timeit
code_before = """
weights = np.array([1,2,1,1,3] * 1000)
dataToLearn = np.array([1,2,3,4,5] * 1000)
dataToLearn2 = []
for i, d in enumerate(dataToLearn):
    dataToLearn2 += int(weights[i]) * [d]
"""
code_after = """
weights = np.array([1,2,1,1,3] * 1000)
dataToLearn = np.array([1,2,3,4,5] * 1000)
np.repeat(dataToLearn, weights)
"""
print(timeit.timeit(code_before, setup="import numpy as np", number=1000))
print(timeit.timeit(code_after, setup="import numpy as np", number=1000))
So I get about 3.38 seconds (total for the 1000 runs) with your current approach, and about 0.75 seconds with numpy.repeat.
https://stackoverflow.com/questions/51410155