How to compute dataframe values in chunks and combine them?

Stack Overflow user
Asked on 2022-03-13 05:49:32
Answers 6  Views 308  Followers 0  Votes -1

I have this data:

Metric  ProcId  TimeStamp               Value
CPU     proce_123   Mar-11-2022 11:00:00    1.4453125
CPU     proce_126   Mar-11-2022 11:00:00    0.058320373
CPU     proce_123   Mar-11-2022 11:00:00    0.095274389
CPU     proce_000   Mar-11-2022 11:00:00    0.019654088
CPU     proce_144   Mar-11-2022 11:00:00    0.019841269
CPU     proce_1     Mar-11-2022 11:00:00    0.234741792
CPU     proce_100   Mar-11-2022 11:00:00    5.32945776
CPU     proce_57777 Mar-11-2022 11:00:00    0.25390625
CPU     proce_0000  Mar-11-2022 11:00:00    0.019349845
CPU     proce_123   Mar-11-2022 11:00:00    0.019500781
CPU     proce_123   Mar-11-2022 11:00:00    2.32421875
CPU     proce_123   Mar-11-2022 11:00:00    68.3903656
CPU     proce_123   Mar-11-2022 11:00:00    0.057781201
CPU     proce_123   Mar-11-2022 11:00:00    0.416666627

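This is just a sample dataframe; the real dataframe has thousands of rows. I need to iterate over the ProcId column in chunks, and for each iteration I need to build a string that joins the ProcIds of that chunk together.

For example, given a chunk size of 3, the strings need to look like this:
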
proce_000%22%2C%2proce_144%22%2C%2proce_1%22%29
proce_100%22%2C%2proce_57777%22%2C%2proce_0000%22%29
proce_123%22%2C%2proce_126%22%2C%2proce_111%22%29

Note that after the third item of a chunk we need to append %22%29. After the first and second items we need to append %22%2C%2.

I can do something like this to get the chunks:

n = 3 #size of chunks
chunks = [] #list of chunks

for i in range(0, len(id), n): 
    chunks.append(id[i:i + n])

I don't know how to combine these 3 items into one string and append the other strings at the end.

6 Answers

Stack Overflow user

Accepted answer

Posted on 2022-03-16 10:22:29

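Avoid iterating over the dataframe in a for loop. You will almost certainly get worse performance than if you use a combination of groupby, merge, shift and other array-oriented numpy or pandas operations.

Derive the chunk ids from the data by integer division of the index (this assumes an incrementing integer index).
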
chunk_size = 3
df['ChunkId'] = df.index // chunk_size
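
To see what the integer division produces, here is a minimal sketch on a toy frame. The frame, the names `toy` and `filtered`, and the positional fallback are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Toy frame invented for illustration; only its default 0..n-1 index matters.
toy = pd.DataFrame({"ProcId": ["a", "b", "c", "d", "e", "f", "g"]})

chunk_size = 3
# Integer division maps index 0,1,2 -> 0; 3,4,5 -> 1; 6 -> 2.
toy["ChunkId"] = toy.index // chunk_size
chunk_ids = toy["ChunkId"].tolist()

# If the index is no longer a clean 0..n-1 range (for example after
# filtering rows), row positions can be used instead of index labels.
filtered = toy[toy["ProcId"] != "b"]
positional_ids = (np.arange(len(filtered)) // chunk_size).tolist()

print(chunk_ids)
print(positional_ids)
```

The positional variant avoids uneven chunks when rows have been dropped and the index has gaps.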

Add a suffix to each ProcId to create a new column ProcEnds, then join these values within each group.

df['ProcEnds'] =  (df.ProcId + '%22%2C%2').where(
  df.index % chunk_size != chunk_size - 1, 
  df.ProcId + '%22%29')
# note: Series.where keeps the value where cond is True and uses `other` where cond is False

df['ChunkString'] = df.groupby('ChunkId').ProcEnds.transform(lambda x: x.str.cat())
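
The `where` call is the part that is easy to misread, so here is a small sketch of it in isolation. The series `ids` and its values are made up for illustration.

```python
import pandas as pd

ids = pd.Series(["p0", "p1", "p2", "p3", "p4"])
chunk_size = 3

# True for every row that is NOT in the last slot of its chunk.
not_last = ids.index % chunk_size != chunk_size - 1

# where(cond, other): keep the left-hand value where cond is True,
# take the value from `other` where cond is False.
ends = (ids + "%22%2C%2").where(not_last, ids + "%22%29")
result = ends.tolist()
print(result)
```

Only "p2" sits in the last slot of a full chunk, so it is the only item that receives the closing suffix; "p4" is the second item of an incomplete chunk and keeps the separator.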

Optionally, drop the ChunkId & ProcEnds columns to get output with only the additional ChunkString column.

df = df.drop(columns=['ChunkId', 'ProcEnds'])

df now outputs:

   Metric       ProcId           TimeStamp      Value                                           ChunkString
0     CPU    proce_123 2022-03-11 11:00:00   1.445312     proce_123%22%2C%2proce_126%22%2C%2proce_123%22%29
1     CPU    proce_126 2022-03-11 11:00:00   0.058320     proce_123%22%2C%2proce_126%22%2C%2proce_123%22%29
2     CPU    proce_123 2022-03-11 11:00:00   0.095274     proce_123%22%2C%2proce_126%22%2C%2proce_123%22%29
3     CPU    proce_000 2022-03-11 11:00:00   0.019654       proce_000%22%2C%2proce_144%22%2C%2proce_1%22%29
4     CPU    proce_144 2022-03-11 11:00:00   0.019841       proce_000%22%2C%2proce_144%22%2C%2proce_1%22%29
5     CPU      proce_1 2022-03-11 11:00:00   0.234742       proce_000%22%2C%2proce_144%22%2C%2proce_1%22%29
6     CPU    proce_100 2022-03-11 11:00:00   5.329458  proce_100%22%2C%2proce_57777%22%2C%2proce_0000%22%29
7     CPU  proce_57777 2022-03-11 11:00:00   0.253906  proce_100%22%2C%2proce_57777%22%2C%2proce_0000%22%29
8     CPU   proce_0000 2022-03-11 11:00:00   0.019350  proce_100%22%2C%2proce_57777%22%2C%2proce_0000%22%29
9     CPU    proce_123 2022-03-11 11:00:00   0.019501     proce_123%22%2C%2proce_123%22%2C%2proce_123%22%29
10    CPU    proce_123 2022-03-11 11:00:00   2.324219     proce_123%22%2C%2proce_123%22%2C%2proce_123%22%29
11    CPU    proce_123 2022-03-11 11:00:00  68.390366     proce_123%22%2C%2proce_123%22%2C%2proce_123%22%29
12    CPU    proce_123 2022-03-11 11:00:00   0.057781                    proce_123%22%2C%2proce_123%22%2C%2
13    CPU    proce_123 2022-03-11 11:00:00   0.416667                    proce_123%22%2C%2proce_123%22%2C%2
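
The output above repeats the chunk string on every row. If only one string per chunk is wanted, as in the question's expected output, a possible sketch is to aggregate instead of transform. The helper `join_chunk` and the shortened sample frame are assumptions for illustration; the short trailing chunk is treated the same way as in the output above.

```python
import pandas as pd

df = pd.DataFrame({"ProcId": ["proce_123", "proce_126", "proce_123",
                              "proce_000", "proce_144", "proce_1",
                              "proce_100"]})
chunk_size = 3
SEP, END = "%22%2C%2", "%22%29"

def join_chunk(ids):
    # Hypothetical helper: full chunks end with END, a short trailing
    # chunk keeps SEP after every id.
    ids = list(ids)
    if len(ids) == chunk_size:
        return SEP.join(ids) + END
    return "".join(i + SEP for i in ids)

# agg() returns one value per group rather than one per row.
chunk_strings = (
    df.groupby(df.index // chunk_size)["ProcId"].agg(join_chunk).tolist()
)
for s in chunk_strings:
    print(s)
```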

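Update

A Google Colab notebook showing the output with the sample data: https://colab.research.google.com/drive/1f9ZHXE2ATZXD2qWsoATxEWABIBt0tMRN?usp=sharing

Update 2

The OP asks:

Quick question. Can we group this by df 'Metric'? For example, it would be CPU, Memory. I need the ChunkString based on CPU or Memory?

To apply this transformation within each Metric group, the simplest way is to wrap the transformation logic in a function and apply it to the grouped data.

Special care is needed to preserve the original index.
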
def transform(frame):
  _df = frame.reset_index(drop=True)
  _df['ChunkId'] = _df.index // chunk_size
  _df['ProcEnds'] =  (_df.ProcId + '%22%2C%2').where(
    _df.index % chunk_size != chunk_size - 1, 
    _df.ProcId + '%22%29')
  _df['ChunkString'] = _df.groupby('ChunkId').ProcEnds.transform(lambda x: x.str.cat())
  return _df.drop(columns=['ChunkId', 'ProcEnds'])
idx = df.index
df.groupby('Metric').apply(transform).set_index(idx)

This produces the same output as before, omitted for brevity.
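
As an alternative sketch (a different technique from the function above, not the answerer's method), the per-group row position can be taken from `groupby(...).cumcount()`. This leaves the original index untouched, so no `reset_index`/`set_index` step is needed even when the Metric values are interleaved. The sample frame is invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Metric": ["CPU", "Memory", "CPU", "CPU", "Memory", "CPU"],
    "ProcId": ["c0", "m0", "c1", "c2", "m1", "c3"],
})
chunk_size = 3
SEP, END = "%22%2C%2", "%22%29"

# Position of each row inside its own Metric group: 0, 1, 2, ...
pos = df.groupby("Metric").cumcount()
chunk_id = pos // chunk_size

# Same suffix rule as before, but based on the per-group position.
ends = (df["ProcId"] + SEP).where(pos % chunk_size != chunk_size - 1,
                                  df["ProcId"] + END)

# Concatenate the suffixed ids within each (Metric, chunk) pair.
df["ChunkString"] = ends.groupby([df["Metric"], chunk_id]).transform(
    lambda s: "".join(s)
)
print(df)
```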

Votes: 2

Stack Overflow user

Posted on 2022-03-17 00:49:13

You can use Python integer division (//) on the index to form groups of N:

N = 3
df['ChunkString'] = df.groupby(df.index//N)['ProcId'].transform(lambda x: '%22%2C%2'.join(x.tolist() + ['']*(N-len(x))) + ('%22%29' if len(x) == N else ''))

Notes:

  • x.tolist() + ['']*(N-len(x)) simply converts x to a list and pads it with empty items until it reaches length N.
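
The padding trick can be tried outside pandas. The sketch below wraps the same expression in a plain function; the name `chunk_string` is hypothetical and the inputs are taken from the sample data.

```python
N = 3
SEP, END = "%22%2C%2", "%22%29"

def chunk_string(ids, n=N):
    # Pad a short chunk with empty strings so that join() still emits
    # a separator after the real ids.
    padded = list(ids) + [""] * (n - len(ids))
    return SEP.join(padded) + (END if len(ids) == n else "")

full = chunk_string(["proce_000", "proce_144", "proce_1"])
short = chunk_string(["proce_123", "proce_123"])
single = chunk_string(["proce_123"])

print(full)
print(short)
print(single)
```

One thing the sketch makes visible: join() puts one separator per padded slot, so a chunk that is short by two items ends with two separators in a row. If that is not wanted, the short chunk needs separate handling.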

Output:

>>> df
   Metric       ProcId           TimeStamp      Value                                           ChunkString
0     CPU    proce_123 2022-03-11 11:00:00   1.445312     proce_123%22%2C%2proce_126%22%2C%2proce_123%22%29
1     CPU    proce_126 2022-03-11 11:00:00   0.058320     proce_123%22%2C%2proce_126%22%2C%2proce_123%22%29
2     CPU    proce_123 2022-03-11 11:00:00   0.095274     proce_123%22%2C%2proce_126%22%2C%2proce_123%22%29
3     CPU    proce_000 2022-03-11 11:00:00   0.019654       proce_000%22%2C%2proce_144%22%2C%2proce_1%22%29
4     CPU    proce_144 2022-03-11 11:00:00   0.019841       proce_000%22%2C%2proce_144%22%2C%2proce_1%22%29
5     CPU      proce_1 2022-03-11 11:00:00   0.234742       proce_000%22%2C%2proce_144%22%2C%2proce_1%22%29
6     CPU    proce_100 2022-03-11 11:00:00   5.329458  proce_100%22%2C%2proce_57777%22%2C%2proce_0000%22%29
7     CPU  proce_57777 2022-03-11 11:00:00   0.253906  proce_100%22%2C%2proce_57777%22%2C%2proce_0000%22%29
8     CPU   proce_0000 2022-03-11 11:00:00   0.019350  proce_100%22%2C%2proce_57777%22%2C%2proce_0000%22%29
9     CPU    proce_123 2022-03-11 11:00:00   0.019501     proce_123%22%2C%2proce_123%22%2C%2proce_123%22%29
10    CPU    proce_123 2022-03-11 11:00:00   2.324219     proce_123%22%2C%2proce_123%22%2C%2proce_123%22%29
11    CPU    proce_123 2022-03-11 11:00:00  68.390366     proce_123%22%2C%2proce_123%22%2C%2proce_123%22%29
12    CPU    proce_123 2022-03-11 11:00:00   0.057781                    proce_123%22%2C%2proce_123%22%2C%2
13    CPU    proce_123 2022-03-11 11:00:00   0.416667                    proce_123%22%2C%2proce_123%22%2C%2

With N = 5:

>>> df
   Metric       ProcId           TimeStamp      Value                                                                           ChunkString
0     CPU    proce_123 2022-03-11 11:00:00   1.445312   proce_123%22%2C%2proce_126%22%2C%2proce_123%22%2C%2proce_000%22%2C%2proce_144%22%29
1     CPU    proce_126 2022-03-11 11:00:00   0.058320   proce_123%22%2C%2proce_126%22%2C%2proce_123%22%2C%2proce_000%22%2C%2proce_144%22%29
2     CPU    proce_123 2022-03-11 11:00:00   0.095274   proce_123%22%2C%2proce_126%22%2C%2proce_123%22%2C%2proce_000%22%2C%2proce_144%22%29
3     CPU    proce_000 2022-03-11 11:00:00   0.019654   proce_123%22%2C%2proce_126%22%2C%2proce_123%22%2C%2proce_000%22%2C%2proce_144%22%29
4     CPU    proce_144 2022-03-11 11:00:00   0.019841   proce_123%22%2C%2proce_126%22%2C%2proce_123%22%2C%2proce_000%22%2C%2proce_144%22%29
5     CPU      proce_1 2022-03-11 11:00:00   0.234742  proce_1%22%2C%2proce_100%22%2C%2proce_57777%22%2C%2proce_0000%22%2C%2proce_123%22%29
6     CPU    proce_100 2022-03-11 11:00:00   5.329458  proce_1%22%2C%2proce_100%22%2C%2proce_57777%22%2C%2proce_0000%22%2C%2proce_123%22%29
7     CPU  proce_57777 2022-03-11 11:00:00   0.253906  proce_1%22%2C%2proce_100%22%2C%2proce_57777%22%2C%2proce_0000%22%2C%2proce_123%22%29
8     CPU   proce_0000 2022-03-11 11:00:00   0.019350  proce_1%22%2C%2proce_100%22%2C%2proce_57777%22%2C%2proce_0000%22%2C%2proce_123%22%29
9     CPU    proce_123 2022-03-11 11:00:00   0.019501  proce_1%22%2C%2proce_100%22%2C%2proce_57777%22%2C%2proce_0000%22%2C%2proce_123%22%29
10    CPU    proce_123 2022-03-11 11:00:00   2.324219                  proce_123%22%2C%2proce_123%22%2C%2proce_123%22%2C%2proce_123%22%2C%2
11    CPU    proce_123 2022-03-11 11:00:00  68.390366                  proce_123%22%2C%2proce_123%22%2C%2proce_123%22%2C%2proce_123%22%2C%2
12    CPU    proce_123 2022-03-11 11:00:00   0.057781                  proce_123%22%2C%2proce_123%22%2C%2proce_123%22%2C%2proce_123%22%2C%2
13    CPU    proce_123 2022-03-11 11:00:00   0.416667                  proce_123%22%2C%2proce_123%22%2C%2proce_123%22%2C%2proce_123%22%2C%2
Votes: 2

Stack Overflow user

Posted on 2022-03-13 06:11:18

chunk_size = 3
# First, generate a list of the ProcIds.
# Assumption: they come from the ProcId column of the question's dataframe `df`.
list_of_proc_ids = df['ProcId'].tolist()

final_str = ''
# Then enumerate through that list, adding a unique ending at every third
for index, obj in enumerate(list_of_proc_ids):
    final_str += str(obj)
    if (index + 1) % chunk_size == 0: # Checks if divisible by 3, accounting for 0 index
        final_str += '%22%29'
    else:
        final_str += '%22%2C%2'
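
The loop above accumulates everything into a single string. If one string per chunk is wanted, as in the question's expected output, a pure-Python sketch could look like the following. The function name `chunk_strings` and the handling of a short last chunk (it keeps a trailing separator) are assumptions.

```python
def chunk_strings(proc_ids, chunk_size=3, sep="%22%2C%2", end="%22%29"):
    """Return one joined string per chunk of `proc_ids`."""
    out = []
    for start in range(0, len(proc_ids), chunk_size):
        chunk = proc_ids[start:start + chunk_size]
        # Full chunks close with `end`; a short last chunk keeps `sep`.
        tail = end if len(chunk) == chunk_size else sep
        out.append(sep.join(chunk) + tail)
    return out

ids = ["proce_000", "proce_144", "proce_1",
       "proce_100", "proce_57777", "proce_0000",
       "proce_123"]
result = chunk_strings(ids)
for line in result:
    print(line)
```
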
Votes: 0
Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/71454195