Q: Getting the value of a variable quantile per group

Stack Overflow user

Asked 2020-10-12 18:37:42

Answers: 1 | Views: 37 | Followers: 0 | Votes: 0

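I have data classified by group, and each group has a given quantile percentage. I want to create a threshold for each group that splits all values within the group according to that quantile percentage. So if a group has q=0.8, I want the lowest 80% of values to get 1 and the highest 20% to get 0.

So, suppose the data looks like this:

I would like objects 1, 2 and 5 to get result 1, and the other 3 to get result 0. In total my data consists of 7,000,000 rows and 14,000 groups. I tried doing this with groupby.quantile, but that requires a constant quantile, whereas my data has a different quantile for each group.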
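To make the target rule concrete, here is a minimal single-group sketch. The values are made up for illustration (they are not the data from the question), and it assumes "lowest 80%" means values at or below the group's 0.8 quantile as computed by NumPy's default linear interpolation.

```python
import numpy as np

# Hypothetical values for one group (illustration only)
values = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
q = 0.8  # this group's quantile

# Threshold at the q-th quantile of the group's values
threshold = np.quantile(values, q)

# 1 for values at or below the threshold, 0 for the rest
result = (values <= threshold).astype(int)
print(result)
```

With a different q per group, the same comparison has to be repeated with each group's own threshold, which is what a constant-q groupby.quantile call cannot express directly.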

pandas

quantile

1 Answer

Stack Overflow user

Posted 2020-10-20 16:34:49

Setup:

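import numpy as np
import pandas as pd

num = 7_000_000    # total number of rows
grp_num = 14_000   # number of groups

# One random quantile per group, rounded to two decimals
qua = np.around(np.random.uniform(size=grp_num), 2)
df = pd.DataFrame({
    "Group": np.random.randint(low=0, high=grp_num, size=num),
    "Quantile": 0.0,  # placeholder, filled in below
    "Value": np.random.randint(low=100, high=300, size=num)
}).sort_values("Group").reset_index(drop=True)

# Look up each row's quantile from its group id (vectorized, no groupby needed)
df["Quantile"] = qua[df["Group"].to_numpy()]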

Answer: (this is essentially a for loop over the groups, so to improve performance you could try applying numba to it)

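def func2(grp):
    # Flag rows strictly below this group's own quantile threshold with 1, the rest with 0
    q = grp["Quantile"].iloc[0]
    return (grp["Value"] < grp["Value"].quantile(q)).astype(int)

df["result"] = df.groupby("Group")[["Quantile", "Value"]].apply(func2).reset_index(level=0, drop=True)
print(df)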

Output:

         Group  Quantile  Value  result
0            0      0.33    156       1
1            0      0.33    259       0
2            0      0.33    166       1
3            0      0.33    183       0
4            0      0.33    111       1
...        ...       ...    ...     ...
6999995  13999      0.83    194       1
6999996  13999      0.83    227       1
6999997  13999      0.83    215       1
6999998  13999      0.83    103       1
6999999  13999      0.83    115       1

[7000000 rows x 4 columns]
CPU times: user 14.2 s, sys: 362 ms, total: 14.6 s
Wall time: 14.7 s
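
The timing above reflects one Python-level call per group. As a possible alternative, here is a sketch of a different technique (not the answerer's method): compare each row's within-group percentile rank against that row's Quantile column, which avoids the per-group function call. It assumes "the lowest q share of the values" is an acceptable reading of the question. Percentile ranks and interpolated quantiles treat ties and boundary values differently, so results can differ from the quantile-based version near the threshold. The small DataFrame is made up for illustration.

```python
import pandas as pd

# Hypothetical data: two groups, each with its own quantile
df = pd.DataFrame({
    "Group":    [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    "Quantile": [0.7] * 5 + [0.5] * 5,
    "Value":    [10, 20, 30, 40, 50, 5, 4, 3, 2, 1],
})

# Percentile rank of each value within its group, in (0, 1]
pct_rank = df.groupby("Group")["Value"].rank(pct=True)

# 1 if the row lies within the lowest q share of its group, else 0
df["result"] = (pct_rank <= df["Quantile"]).astype(int)
print(df)
```

Whether this is faster on the full 7,000,000 rows would need to be measured; no timing is claimed here.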

Votes: 0

Page content originally provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link: https://stackoverflow.com/questions/64316194
