Q: Pandas DataFrame: programmatic row split of a dataframe on multiple-column conditions

Stack Overflow user

Asked 2018-12-11 08:29:42

Answers: 1 | Views: 173 | Followers: 0 | Votes: 0

Context

I am working with a DataFrame df that has many columns filled with numerical values

df
lorem ipsum  |  dolor sic  |  ...  |  (hundreds of cols)
---------------------------------------------------------
0.5          |     -6.2    |  ...  | 79.8
-26.1        |     6200.0  |  ...  | -65.2
150.0        |     3.14    |  ...  | 1.008

Separately, I have a list of columns, list_cols:

list_cols = ['lorem ipsum', 'dolor sic', ... ]  # arbitrary length, of course len(list_cols ) <= len(df.columns), and contains valid columns of my df

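I want to obtain 2 dataframes:

1. one that contains all rows where value < 0 for at least one of the columns in list_cols (corresponding to an OR). Let's call it negative_values_matches
2. one that corresponds to the remainder of the dataframe. Let's call it positive_values_matches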

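Expected result example

For list_cols = ['lorem ipsum', 'dolor sic'], I would obtain the following dataframes, where negative_values_matches holds the rows with at least 1 strictly negative value in list_cols: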

negative_values_matches
lorem ipsum  |  dolor sic  |  ...  |  (hundreds of cols)
---------------------------------------------------------
0.5          |     -6.2    |  ...  | 79.8
-26.1        |     6200.0  |  ...  | -65.2


positive_values_matches
lorem ipsum  |  dolor sic  |  ...  |  (hundreds of cols)
---------------------------------------------------------
150.0        |     3.14    |  ...  | 1.008

I do not want to write code like this by hand:

negative_values_matches = df[ (criterion1 | criterion2 | ... | criterionn)]
positive_values_matches = df[~(criterion1 | criterion2 | ... | criterionn)]

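(where criterionk is a boolean evaluation for column k, for instance (df[col_k] < 0); the parentheses are needed because of the Pandas syntax)

The idea is to have a programmatic approach. I am mainly looking for an array of booleans, so that I can then use boolean indexing (see the Pandas documentation).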

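As far as I can tell, these posts are not exactly what I am talking about:

Filtering DataFrame on multiple conditions in Pandas
Drop rows on multiple conditions in pandas dataframe
Pandas: np.where with multiple conditions on dataframes
Pandas DataFrame : How to select rows on multiple conditions? This one is a bit closer to what I am looking for. However, it relies on generating a string, which may not work with "exotic" column names (spaces), or at least I do not know how to do that.

I cannot figure out how to chain the booleans on my DataFrame all together with an OR operator and obtain the correct row split.

What can I do?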

python

pandas

dataframe

1 Answer

Stack Overflow user

Accepted answer

Posted 2018-12-11 08:29:42

After several attempts, I managed to reach my goal.

Here is the code:

import pandas
import numpy
# assume dataframe exists
df = ...
# initialize a 1-D array of False, one entry per row of df
resulting_bools = numpy.zeros(len(df.index), dtype=bool)

for col in list_cols:
    # boolean array for this column: True where the value is strictly negative
    criterion = (df[col] < 0).to_numpy()  # same condition for each column, different conditions would have been more difficult (for me)

    # accumulate the OR across columns
    resulting_bools |= criterion

# use the array of booleans to build the required dataframes
negative_values_matches = df[resulting_bools].copy()  # use .copy() to avoid further possible warnings from Pandas depending on what you do with your data frame
positive_values_matches = df[~resulting_bools].copy()
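The loop above accumulates an OR across the selected columns. A more compact variant, as a sketch only, compares the whole sub-frame at once and reduces row-wise with any(axis=1). The small frame below, including the column named other, is made up to mirror the question's example.

```python
import pandas as pd

# Illustrative data only (values mirror the question's example)
df = pd.DataFrame({
    "lorem ipsum": [0.5, -26.1, 150.0],
    "dolor sic": [-6.2, 6200.0, 3.14],
    "other": [79.8, -65.2, 1.008],
})
list_cols = ["lorem ipsum", "dolor sic"]

# True for rows where at least one selected column is strictly negative
mask = (df[list_cols] < 0).any(axis=1)

negative_values_matches = df[mask].copy()
positive_values_matches = df[~mask].copy()
```

Because the columns are selected by label, column names containing spaces need no special handling here.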

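This way, I successfully obtain 2 dataframes:

1. one with all rows that have a value < 0 in at least 1 of the columns in list_cols
2. one with all the other rows (value >= 0 for each column in list_cols)

(Initialising the array to False depends on the choice of boolean operation: False is the neutral starting value for OR.)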
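To illustrate the remark about initialisation, here is a minimal sketch of the AND counterpart, assuming the goal were rows where every selected column is strictly negative. The accumulator then starts from True rather than False. The frame and the column names a and b are made up for the illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data only
df = pd.DataFrame({
    "a": [-1.0, -2.0, 3.0],
    "b": [-5.0, 4.0, 6.0],
})
list_cols = ["a", "b"]

# AND accumulation: start from all True, the neutral element of &
all_negative = np.ones(len(df.index), dtype=bool)
for col in list_cols:
    all_negative &= (df[col] < 0).to_numpy()

# Vectorized equivalent of the loop
all_negative_vectorized = (df[list_cols] < 0).all(axis=1).to_numpy()
```

Starting the AND accumulator from False would leave every row False, which is why the initial value has to follow the operator.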

Note: this approach could probably be combined with multiple conditions on dataframes. To be confirmed.

Votes: 1
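One possible way to handle a different condition per column, offered as an untested-in-production sketch rather than a confirmed extension of the code above: build one boolean Series per column and fold them together with functools.reduce and operator.or_. The conditions dictionary and its thresholds are hypothetical.

```python
from functools import reduce
import operator

import pandas as pd

# Illustrative data only
df = pd.DataFrame({
    "lorem ipsum": [0.5, -26.1, 150.0],
    "dolor sic": [-6.2, 6200.0, 3.14],
})

# Hypothetical per-column conditions: column name -> function returning a boolean Series
conditions = {
    "lorem ipsum": lambda s: s < 0,
    "dolor sic": lambda s: s > 1000,
}

# One boolean Series per column, then OR them all together
criteria = [cond(df[col]) for col, cond in conditions.items()]
mask = reduce(operator.or_, criteria)

matches = df[mask].copy()
rest = df[~mask].copy()
```

Swapping operator.or_ for operator.and_ would require every condition to hold instead of at least one.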

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/53715731