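I have a pandas DataFrame with a date in each row. I want to group by the second column, sort the dates within each group, and then assign the letter A to the first date, B to the second date, and so on.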
import pandas as pd
# 'min' is the minute alias; the older 'T' alias is deprecated in recent pandas
rng = pd.date_range('2015-02-24', periods=5, freq='min')
names = ['MAK', 'MAK', 'OKA', 'OKA', 'MAK']
df = pd.DataFrame({'Date': rng, 'Groups': names})
df
                 Date Groups
0 2015-02-24 00:00:00    MAK
1 2015-02-24 00:01:00    MAK
2 2015-02-24 00:02:00    OKA
3 2015-02-24 00:03:00    OKA
4 2015-02-24 00:04:00    MAK
The result I want is:
                 Date Groups Letter
0 2015-02-24 00:00:00    MAK      A
1 2015-02-24 00:01:00    MAK      B
2 2015-02-24 00:02:00    OKA      A
3 2015-02-24 00:03:00    OKA      B
4 2015-02-24 00:04:00    MAK      C
I think I could define a function like this:
def assignLetters(row):
    return sort == 'A'
df['Letter'] = df.groupby('Groups').Date.apply(assignLetters)
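Any help would be much appreciated! Perhaps just point me in the right direction; I'm not sure how to assign letters in chronological order.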
Answered 2021-01-28 18:09:40
Use DataFrameGroupBy.rank together with a dictionary that maps each rank to an uppercase letter:
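import string
# Lookup table: 1 -> 'A', 2 -> 'B', ..., 26 -> 'Z'
d = dict(enumerate(string.ascii_uppercase, 1))
# Rank the dates within each group, then translate each rank to its letter
df['Letter'] = df.groupby('Groups')['Date'].rank(method='dense').map(d)
print(df)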
                 Date Groups Letter
0 2015-02-24 00:00:00    MAK      A
1 2015-02-24 00:01:00    MAK      B
2 2015-02-24 00:02:00    OKA      A
3 2015-02-24 00:03:00    OKA      B
4 2015-02-24 00:04:00    MAK      C
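To see why the rows do not need to be pre-sorted, it can help to look at the intermediate ranks. The sketch below uses a small, deliberately unsorted frame (`df2` and its data are made up for illustration, not taken from the question) and assumes dense ranking within each group. Because `rank` returns floats, the ranks are cast to integers before the lookup.

```python
import string
import pandas as pd

# Hypothetical data, deliberately NOT in date order
df2 = pd.DataFrame({
    'Date': pd.to_datetime(['2015-02-24 00:04', '2015-02-24 00:00',
                            '2015-02-24 00:03', '2015-02-24 00:01',
                            '2015-02-24 00:02']),
    'Groups': ['MAK', 'MAK', 'OKA', 'MAK', 'OKA'],
})

# Rank dates within each group: the earliest date in a group gets 1,
# the next one 2, and so on, regardless of row order
ranks = df2.groupby('Groups')['Date'].rank(method='dense').astype(int)

# Translate rank 1 -> 'A', 2 -> 'B', ...
letters = dict(enumerate(string.ascii_uppercase, 1))
df2['Letter'] = ranks.map(letters)
print(df2)
```

Here the first row holds the latest MAK date, so it receives C even though it appears first in the frame.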
Edit: if a group can contain more than 26 dates, a custom generator function can produce the values AA, AB, ... after Z:
from string import ascii_uppercase
import itertools
# https://stackoverflow.com/a/29351603/2901002
def iter_all_strings():
    for size in itertools.count(1):
        for s in itertools.product(ascii_uppercase, repeat=size):
            yield "".join(s)
df['Letter'] = df.groupby('Groups')['Date'].rank(method='dense')
d = dict(enumerate(itertools.islice(iter_all_strings(), int(df['Letter'].max())), 1))
print(d)
{1: 'A', 2: 'B', 3: 'C'}
df['Letter'] = df['Letter'].map(d)
print(df)
                 Date Groups Letter
0 2015-02-24 00:00:00    MAK      A
1 2015-02-24 00:01:00    MAK      B
2 2015-02-24 00:02:00    OKA      A
3 2015-02-24 00:03:00    OKA      B
4 2015-02-24 00:04:00    MAK      C
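The sample data only reaches C, so the rollover past Z is never exercised above. As a quick check, the sketch below repeats the generator and builds the first 28 labels; the count 28 and the name `labels` are arbitrary choices for illustration.

```python
import itertools
from string import ascii_uppercase

def iter_all_strings():
    # Yield 'A'..'Z', then 'AA', 'AB', ..., 'ZZ', then 'AAA', ...
    for size in itertools.count(1):
        for s in itertools.product(ascii_uppercase, repeat=size):
            yield "".join(s)

# First 28 labels, keyed by 1-based rank
labels = dict(enumerate(itertools.islice(iter_all_strings(), 28), 1))
print(labels[26], labels[27], labels[28])  # prints: Z AA AB
```

Because the generator is infinite, `itertools.islice` is what bounds it; sizing the slice by the largest rank, as in the answer, keeps the dictionary only as big as needed.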
https://stackoverflow.com/questions/65934691