
Q: `numpy.nanpercentile` is extremely slow

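Stack Overflow user

Asked 2020-02-01 16:31:53

Answers: 3 · Views: 434 · Followers: 0 · Votes: 1

numpy.nanpercentile is extremely slow, so I wanted to use cupy.nanpercentile, but cupy.nanpercentile is not implemented yet. Does anyone have a solution?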

python

numpy

cupy

3 Answers

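Stack Overflow user

Posted 2020-12-21 21:27:57

I also had the problem that np.nanpercentile was too slow for my dataset. I found a workaround that lets you use the standard np.percentile instead. The same idea can be applied with many other libraries.

This should solve your problem, and it also runs much faster than np.nanpercentile: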

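import numpy as np

arr = np.array([[np.nan, 2, 3, 1, 2, 3],
                [np.nan, np.nan, 1, 3, 2, 1],
                [4, 5, 6, 7, np.nan, 9]])

# Count the valid (non-NaN) entries in each row.
valid = ~np.isnan(arr)
count = valid.sum(axis=1)
groups = np.unique(count)
groups = groups[groups > 0]

# Filler that is smaller than every valid value, so it sorts to the front.
fill = np.nanmin(arr) - 1

# Rows with no valid data keep the initial value 0.
p90 = np.zeros(arr.shape[0])
for g in groups:
    # Rows that share the same number of valid entries form one group.
    pos = np.where(count == g)
    values = np.nan_to_num(arr[pos], nan=fill)
    values = np.sort(values, axis=1)
    # After sorting, the last g columns hold only valid data.
    values = values[:, -g:]
    p90[pos] = np.percentile(values, 90, axis=1)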

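So instead of taking the percentile with the NaNs in place, it groups the rows by their number of valid entries and takes the percentile of each group separately. The results are then written back into a single output array. This also works for 3D arrays: index with y_pos and x_pos instead of pos, and pay attention to which axis you are computing over.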
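The grouping idea above can be wrapped in a function and checked against np.nanpercentile. This is only a sketch: the name `grouped_nanpercentile` and the random test data are assumptions made for illustration, and unlike the snippet above it returns NaN (not 0) for rows with no valid data.

```python
import numpy as np

def grouped_nanpercentile(arr, q):
    """Row-wise NaN-aware percentile of a 2D array, using only np.percentile.

    Hypothetical helper name; it follows the grouping idea described above.
    """
    valid = ~np.isnan(arr)
    count = valid.sum(axis=1)
    out = np.full(arr.shape[0], np.nan)  # rows without valid data stay NaN
    if not valid.any():
        return out
    fill = np.nanmin(arr) - 1  # smaller than every valid value
    for n in np.unique(count[count > 0]):
        rows = np.where(count == n)[0]
        # Replace NaNs with the filler, then sort so the fillers come first.
        values = np.sort(np.where(valid[rows], arr[rows], fill), axis=1)
        # The last n columns of each sorted row hold only valid data.
        out[rows] = np.percentile(values[:, -n:], q, axis=1)
    return out

# Made-up test data: 1000 rows, roughly 20% NaN.
rng = np.random.default_rng(0)
data = rng.random((1000, 50))
data[rng.random(data.shape) < 0.2] = np.nan

result = grouped_nanpercentile(data, 90)
expected = np.nanpercentile(data, 90, axis=1)
print(np.allclose(result, expected, equal_nan=True))
```

The loop runs once per distinct count of valid entries rather than once per row, which is where the speedup would come from when many rows share the same count.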

Votes: 2

Stack Overflow user

Posted 2020-09-12 18:56:34

import random
import numpy as np

def testset_gen(num):
    init = []
    for i in range(num):
        a = random.randint(65, 122)  # Dummy name
        b = random.randint(1, 100)   # Dummy value: 11~100, with 10% NaN
        if b < 11:
            b = np.nan  # 10% = NaN
        init.append([a, b])
    return np.array(init)

np_testset = testset_gen(30000000)  # 468,751KB

def f1_np(arr, num):
    # NaN-aware baseline; plain np.percentile would return nan for this column.
    return np.nanpercentile(arr[:, 1], num)
# 55.0, 0.523902416229248 sec

print(f1_np(np_testset, 50))

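import cupy as cp

cp_testset = cp.asarray(np_testset)  # assumed: the original does not show how cp_testset was built

def cupy_nanpercentile(arr, num):
    # Note: this returns the percentage of non-NaN values greater than num,
    # not the num-th percentile itself.
    return len(cp.where(arr > num)[0]) / (len(arr) - cp.sum(cp.isnan(arr))) * 100
    # 55.548758317136446, 0.3640251159667969 sec
    # 43% faster
    # If you need the same result, use int(), but you lose the time saved.

print(cupy_nanpercentile(cp_testset[:, 1], 50))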

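I can't imagine the test taking several days. On my computer, that would seem to require 1 trillion rows of data or more, so I can't reproduce the same problem for lack of resources.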
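If an actual NaN-aware percentile is wanted rather than a share of values above a threshold, one option is to drop the NaNs with a boolean mask and call the ordinary percentile function. The sketch below is written against NumPy; the helper name `masked_percentile` and its `xp` parameter are assumptions for illustration, and passing `cupy` as `xp` relies on `cupy.isnan` and `cupy.percentile` behaving like their NumPy counterparts, which has not been checked here.

```python
import numpy as np

def masked_percentile(arr, q, xp=np):
    """Percentile of the non-NaN entries of a 1D array.

    xp is the array module (numpy by default). Passing cupy instead is an
    assumption of this sketch and is not verified here.
    """
    valid = arr[~xp.isnan(arr)]  # boolean-mask indexing drops the NaNs
    return xp.percentile(valid, q)

# Made-up data shaped like the test set above: values 11..100, about 10% NaN.
rng = np.random.default_rng(0)
values = rng.integers(1, 101, size=100_000).astype(float)
values[values < 11] = np.nan

median = masked_percentile(values, 50)
print(median, np.nanpercentile(values, 50))
```

This only covers the flattened 1D case. A per-row or per-column version needs extra work, because each slice can keep a different number of valid entries.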

Votes: 0

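Stack Overflow user

Posted 2021-09-01 13:28:46

Here is an implementation using numba. Once compiled, it is more than 7x faster than the NumPy version.

Right now it is set up to take the percentiles along the first axis, but that can easily be changed.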

import numba
import numpy as np

@numba.jit(nopython=True, cache=True)
def nan_percentile_axis0(arr, percentiles):
    """Faster implementation of np.nanpercentile
    
    This implementation always takes the percentile along axis 0.
    Uses numba to speed up the calculation by more than 7x.

    Function is equivalent to np.nanpercentile(arr, <percentiles>, axis=0)

    Params:
        arr (np.array): Array to calculate percentiles for
        percentiles (np.array): 1D array of percentiles to calculate

    Returns:
        (np.array) Array with first dimension corresponding to
            values as passed in percentiles

    """
    shape = arr.shape
    arr = arr.reshape((arr.shape[0], -1))
    out = np.empty((len(percentiles), arr.shape[1]))
    for i in range(arr.shape[1]):
        out[:,i] = np.nanpercentile(arr[:,i], percentiles)
    shape = (out.shape[0], *shape[1:])
    return out.reshape(shape)
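One way to reuse an axis-0 routine for another axis is to move that axis to the front first. The sketch below is an assumption-laden illustration: `nan_percentile_along` is a hypothetical wrapper, and `reference_axis0` is a plain NumPy stand-in for the numba function above so that the block runs without numba installed.

```python
import numpy as np

def nan_percentile_along(arr, percentiles, axis, axis0_func):
    """Apply an axis-0 percentile routine along an arbitrary axis.

    Hypothetical wrapper: axis0_func is any callable with the signature
    (arr, percentiles) that reduces over axis 0.
    """
    # Move the target axis to the front; copy so the data is contiguous,
    # which a reshape inside compiled code may require.
    moved = np.ascontiguousarray(np.moveaxis(arr, axis, 0))
    return axis0_func(moved, np.asarray(percentiles, dtype=np.float64))

def reference_axis0(arr, percentiles):
    # Stand-in for the numba function above, using plain NumPy.
    return np.nanpercentile(arr, percentiles, axis=0)

# Made-up 3D data with roughly 10% NaN.
rng = np.random.default_rng(0)
cube = rng.random((4, 5, 6))
cube[rng.random(cube.shape) < 0.1] = np.nan

q = np.array([10.0, 50.0, 90.0])
result = nan_percentile_along(cube, q, axis=2, axis0_func=reference_axis0)
expected = np.nanpercentile(cube, q, axis=2)
print(result.shape, np.allclose(result, expected, equal_nan=True))
```

As with np.nanpercentile, the percentile dimension comes first in the output and the remaining axes keep their original order.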

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/60015245
