Q: Vectorized Python code to iterate over and modify each column of a Pandas DataFrame within a window

Stack Overflow user

Asked 2018-09-24 03:05:43

Answers: 1, Views: 523, Followers: 0, Votes: 0

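I have a DataFrame made up of 1s and 0s, and I loop over each column. If I hit a 1 during the iteration, I should keep it in the column. But if there are any 1s in the next n positions after it, I should turn them into 0s. The same operation then repeats until the end of the column, and the whole procedure is repeated for every column.

Is it possible to get rid of the loops and vectorize everything with DataFrame/matrix/array operations in pandas/numpy? If so, how should I go about it? n can be anywhere between 2 and 100.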
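For concreteness, the rule described above can be written as a plain-Python reference for a single column. This is only a sketch of the stated rule, not the asker's actual loop (which is not shown); the name `suppress_after_one` is hypothetical, and the input is assumed to contain only 0s and 1s.

```python
def suppress_after_one(column, n):
    """Keep a 1, zero the next n entries, then resume scanning after them."""
    out = list(column)  # work on a copy so the input is left untouched
    i = 0
    while i < len(out):
        if out[i] == 1:
            # zero the n entries after the kept 1, stopping at the end of the column
            for j in range(i + 1, min(i + 1 + n, len(out))):
                out[j] = 0
            i += n + 1  # skip past the window that was just cleared
        else:
            i += 1
    return out


example = [1, 1, 0, 1, 0, 0, 1]
print(suppress_after_one(example, 2))  # [1, 0, 0, 1, 0, 0, 1]
```

With n = 2, the 1 at position 1 is cleared because it falls in the window after position 0, while the 1s at positions 3 and 6 survive because each is the first 1 found after the preceding window ends.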

I tried the function below, but it fails: it only keeps 1s that have at least n 0s between them, which is clearly not what I need:

import numpy as np
import pandas as pd

def clear_window(df, n):

    # create buffer of size n
    pad = pd.DataFrame(np.zeros([n, df.shape[1]]),
                       columns=df.columns)
    padded_df = pd.concat([pad, df])

    # compute rolling sum and cut off the buffer
    roll = (padded_df
            .rolling(n+1)
            .sum()
            .iloc[n:, :]
           )

    # delete ones where rolling sum is above 1 or below -1
    result = df * ((roll == 1.0) | (roll == -1.0)).astype(int)

    return result
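To see where the rolling-sum attempt diverges from the desired rule, here is a minimal single-column sketch of what it computes, assuming 0/1 data. The helper name `rolling_sum_rule` is hypothetical; it mirrors the zero-padded window of size n + 1 used above, in plain Python rather than pandas.

```python
def rolling_sum_rule(column, n):
    """Keep a 1 only if it is the sole 1 among itself and the n entries before it."""
    out = []
    for i, value in enumerate(column):
        # window of up to n + 1 entries ending at position i; missing entries count as 0
        window = column[max(0, i - n): i + 1]
        out.append(value if sum(window) == 1 else 0)
    return out


print(rolling_sum_rule([1, 1, 0, 1], 2))  # [1, 0, 0, 0]
```

For the column [1, 1, 0, 1] with n = 2, the desired result is [1, 0, 0, 1]: the 1 at position 1 is cleared, so it should no longer block the 1 at position 3. The rolling sum is computed on the original data, so the cleared 1 still counts and position 3 is dropped too.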

python

pandas

numpy

dataframe

vectorization

1 Answer

Stack Overflow user

Answered 2018-09-27 06:09:18

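If you can't find a vectorized approach, Numba will help with sequential loop problems like this one.

This code walks down the rows of a column looking for the target value. When the target value (1) is found, the next n rows (the `span` argument) are set to the fill value (0). The search index is then advanced past the filled rows, and the next search begins.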

from numba import jit

@jit(nopython=True)
def find_and_fill(arr, span, tgt_val=1, fill_val=0):
    start_idx = 0
    end_idx = arr.size
    while start_idx < end_idx:
        if arr[start_idx] == tgt_val:
            arr[start_idx + 1 : start_idx + 1 + span] = fill_val
            start_idx = start_idx + 1 + span
        else:
            start_idx = start_idx + 1
    return arr

df2 = df.copy()
# copy the dataframe values into a writable numpy array
a = df2.to_numpy(copy=True)

# transpose and run the function for each column of the dataframe
for col in a.T:
    # each col is a view into a, so the in-place fill updates a directly
    # fill span is set to 6 in this example
    find_and_fill(col, 6)

# assign the array back to the dataframe
df2[list(df2.columns)] = a

# df2 now contains the result values
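If Numba is not an option, one alternative is to keep the sequential loop over rows but process all columns at once with NumPy. The sketch below is not part of the original answer; `clear_window_by_rows` and its per-column `cooldown` counter are hypothetical names, and the input is assumed to hold only 0s and 1s.

```python
import numpy as np


def clear_window_by_rows(values, n):
    """Keep a 1 only if no kept 1 occurred in the previous n rows of its column."""
    values = np.asarray(values)
    out = np.zeros_like(values)
    # cooldown[j] = number of upcoming rows in column j that must still be zeroed
    cooldown = np.zeros(values.shape[1], dtype=np.int64)
    for i in range(values.shape[0]):
        # a 1 is kept only in columns whose countdown has run out
        keep = (values[i] == 1) & (cooldown == 0)
        out[i, keep] = 1
        # a kept 1 restarts the countdown; elsewhere it ticks down towards 0
        cooldown = np.where(keep, n, np.maximum(cooldown - 1, 0))
    return out


demo = np.array([[1, 0], [1, 1], [0, 1], [1, 1], [0, 1], [0, 0], [1, 0]])
print(clear_window_by_rows(demo, 2))
```

The Python-level loop runs once per row rather than once per element, so this tends to pay off when the frame has many columns. The result can be wrapped back into a frame with `pd.DataFrame(result, index=df.index, columns=df.columns)`.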

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/52469357
