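I have a Series containing some numbers. I want to replace them with string labels, using the values of a dictionary to decide which label applies, but I don't know how to do it.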
GDP_group['GdpForYearPer$1M'].head(5)
0 46.919625
1 47.515189
2 47.737955
3 54.832578
4 56.338028
5 63.101272
This is the dict I want to use to replace the data.
range_GDP = {'$0 ~ $100M': np.arange(0,100), '$100M ~ $1B': np.arange(100.0000001,1000), '$1B ~ $10B': np.arange(1000.000001, 10000), '$10B ~ $100B': np.arange(10000.000001, 100000),
'$100B ~ $1T': np.arange(100000.000001, 1000000), '$1T ~': np.arange(1000000.000001, 20000000)}
Posted on 2020-09-06 05:43:18
You can use pd.cut to split the data into ranges and apply a label to each range.
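As a quick illustration of how pd.cut assigns labels, here is a minimal sketch on toy data. The values and the label names ("low", "mid", "high") are made up for the example and are not part of the question's data.

```python
import pandas as pd

values = pd.Series([5, 50, 500])

# Three intervals need four edges. By default each interval is
# right-closed: (0, 10], (10, 100], (100, 1000].
binned = pd.cut(values, bins=[0, 10, 100, 1000], labels=["low", "mid", "high"])
print(binned.tolist())
```

Note that the number of edges is always one more than the number of labels.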
(Re)generating dummy data sampled uniformly in log space:
import numpy as np
import pandas as pd
GdpForYearPer1M = pd.Series(10**np.random.randint(0, 8, 100))
"""
0 1
1 1000
2 100
3 10
4 100
...
95 1000000
96 100
97 100000
98 10000
99 10
"""
Solution:
# generate "cuts" (bins) and associated labels from `range_GDP`.
cut_data = [(np.min(v), k) for k, v in range_GDP.items()]
bins, labels = zip(*cut_data)
# bins required to have one more value than labels
bins = list(bins) + [np.inf]
pd.cut(GdpForYearPer1M, bins=bins, labels=labels)
Output:
0 $0 ~ $100M
1 $100M ~ $1B
2 $0 ~ $100M
3 $0 ~ $100M
4 $0 ~ $100M
...
95 $100B ~ $1T
96 $0 ~ $100M
97 $10B ~ $100B
98 $1B ~ $10B
99 $0 ~ $100M
Length: 100, dtype: category
Categories (6, object): [$0 ~ $100M < $100M ~ $1B < $1B ~ $10B < $10B ~ $100B < $100B ~ $1T < $1T ~]
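The same idea can also be sketched without materializing the np.arange arrays, by writing the bin edges out directly. This is a variant under stated assumptions, not the code above: the names gdp, edges and gdp_range are illustrative, the sample values are copied from the question's output, and right=False is an assumption chosen so that each bin is left-closed (a value of exactly 100 falls into '$100M ~ $1B' rather than '$0 ~ $100M').

```python
import numpy as np
import pandas as pd

# Sample values copied from the question's output (GDP in units of $1M).
gdp = pd.Series([46.919625, 47.515189, 47.737955,
                 54.832578, 56.338028, 63.101272])

# Explicit bin edges in units of $1M; np.inf leaves the last bin open-ended.
edges = [0, 100, 1_000, 10_000, 100_000, 1_000_000, np.inf]
labels = ['$0 ~ $100M', '$100M ~ $1B', '$1B ~ $10B',
          '$10B ~ $100B', '$100B ~ $1T', '$1T ~']

# right=False makes each bin left-closed: [0, 100), [100, 1000), ...
gdp_range = pd.cut(gdp, bins=edges, labels=labels, right=False)
print(gdp_range.value_counts())

# Boundary check: values that sit exactly on an edge go to the upper bin.
boundary = pd.cut(pd.Series([0, 100, 1_000_000]),
                  bins=edges, labels=labels, right=False)
print(boundary.tolist())
```

With the default right=True, as in the solution above, edge values go to the lower bin instead and a value of exactly 0 becomes NaN, so which setting is right depends on how the ranges are meant to be read.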
https://stackoverflow.com/questions/63760699