I have the following data:
import pandas as pd
df = pd.DataFrame({'Category': {0: 'onboarding segment-confirmation-unexpected-input origin',
1: 'onboarding segment-confirmation-unexpected-input view',
2: 'product-availability cpf-request-unexpected-input origin',
3: 'product-availability postalcode-validation-true-unexpected-input origin',
4: 'product-availability postalcode-validation-true-unexpected-input view'},
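'UserId': {0: 9090, 1: 4545, 2: 3266, 3: 2894, 4: 2772}})
What I want to do is build a flag that checks whether the part of the string other than "view" or "origin" is equal to the previous row's value. If it is, keep the current flag; if not, increment the flag value.
Desired result: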
df = pd.DataFrame({'Category': {0: 'onboarding segment-confirmation-unexpected-input origin',
1: 'onboarding segment-confirmation-unexpected-input view',
2: 'product-availability cpf-request-unexpected-input origin',
3: 'product-availability postalcode-validation-true-unexpected-input origin',
4: 'product-availability postalcode-validation-true-unexpected-input view'},
'UserId': {0: 9090, 1: 4545, 2: 3266, 3: 2894, 4: 2772},
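'Flag': {0: 'Flag_1', 1: 'Flag_1', 2: 'Flag_2', 3: 'Flag_3', 4: 'Flag_3'}})
How can I achieve this? I tried slicing the data into groups, but I am having trouble with the increment part.
Answered 2022-09-22 13:03:03
Assuming you want to consider the first two chunks of the string (chunks are separated by whitespace):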
# get substrings, keep first 2 (can be changed)
df2 = df['Category'].str.split(expand=True).iloc[:, :2]
# start new group if any value is different from the previous row
group = df2.ne(df2.shift()).any(axis=1).cumsum()
# add flag
df['Flag'] = 'Flag_' + group.astype(str)
Output:
Category UserId Flag
0 onboarding segment-confirmation-unexpected-inp... 9090 Flag_1
1 onboarding segment-confirmation-unexpected-inp... 4545 Flag_1
2 product-availability cpf-request-unexpected-in... 3266 Flag_2
3 product-availability postalcode-validation-tru... 2894 Flag_3
4 product-availability postalcode-validation-tru... 2772 Flag_3
Answered 2022-09-22 13:06:22
This worked for me:
import pandas as pd
df = pd.DataFrame({'Category': {0: 'onboarding segment-confirmation-unexpected-input origin',
1: 'onboarding segment-confirmation-unexpected-input view',
2: 'product-availability cpf-request-unexpected-input origin',
3: 'product-availability postalcode-validation-true-unexpected-input origin',
4: 'product-availability postalcode-validation-true-unexpected-input view'},
'UserId': {0: 9090, 1: 4545, 2: 3266, 3: 2894, 4: 2772}})
# I chose 40 characters, but you can change this to fit your data
df['temp'] = df['Category'].str[:40]
df['Flag'] = df.groupby(['temp'], sort=False).ngroup() + 1
df['Flag'] = 'Flag_' + df['Flag'].astype(str)
https://stackoverflow.com/questions/73814929
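Both answers depend on a fixed cut (the first two chunks, or the first 40 characters). As a possible variation, here is a minimal sketch that instead strips the trailing "origin" / "view" token and compares what is left with the previous row. It assumes the suffix is always the last whitespace-separated token and is always one of those two words; the names `key` and `run_id` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": [
        "onboarding segment-confirmation-unexpected-input origin",
        "onboarding segment-confirmation-unexpected-input view",
        "product-availability cpf-request-unexpected-input origin",
        "product-availability postalcode-validation-true-unexpected-input origin",
        "product-availability postalcode-validation-true-unexpected-input view",
    ],
    "UserId": [9090, 4545, 3266, 2894, 2772],
})

# Assumption: the suffix is the final token and is either "origin" or "view".
# Removing it leaves the part of the string we want to compare.
key = df["Category"].str.replace(r"\s+(?:origin|view)$", "", regex=True)

# A new run starts wherever the key differs from the previous row.
# The first row is compared with the NaN produced by shift(), so it starts run 1.
run_id = key.ne(key.shift()).cumsum()

df["Flag"] = "Flag_" + run_id.astype(str)
print(df["Flag"].tolist())
# ['Flag_1', 'Flag_1', 'Flag_2', 'Flag_3', 'Flag_3']
```

Note that the shift/cumsum comparison only looks at consecutive rows, so a key that shows up again later, after a different key, gets a new flag. `groupby(..., sort=False).ngroup()` would instead reuse the number assigned at the key's first appearance. Which behaviour is right depends on what the flag is meant to represent.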