I noticed that, starting with version 0.25, pandas.DataFrame.query allows spaces in column names, provided those names are enclosed in backticks. For example:
import pandas as pd
df = pd.DataFrame({'a b': [1, 0, 1, 1, 0, 0],
                   'c d': [1, 0, 1, 1, 0, 0],
                   'e f': [0, 0, 0, 0, 1, 0]})
print(df)
   a b  c d  e f
0    1    1    0
1    0    0    0
2    1    1    0
3    1    1    0
4    0    0    1
5    0    0    0
q = "(`a b` == 1) | (`c d` == 1) | (`e f` == 1)"
df = df.query(q)
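print(df)
   a b  c d  e f
0    1    1    0
2    1    1    0
3    1    1    0
4    0    0    1
This works well, but my column names may contain ampersands, plus signs, or other special characters. Those do not seem to be supported at the moment: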
df2 = pd.DataFrame({'a b+': [1, 0, 1, 1, 0, 0],
                    'c | d': [1, 0, 1, 1, 0, 0],
                    'e & f': [0, 0, 0, 0, 1, 0]})
print(df2)
   a b+  c | d  e & f
0     1      1      0
1     0      0      0
2     1      1      0
3     1      1      0
4     0      0      1
5     0      0      0
q = "(`a b+` == 1) | (`c | d` == 1) | (`e & f` == 1)"
df2 = df2.query(q)
print(df2)
The last step fails with an error:
Traceback (most recent call last):
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\computation\scope.py", line 188, in resolve
    return self.resolvers[key]
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python37\lib\collections\__init__.py", line 914, in __getitem__
    return self.__missing__(key) # support subclasses that define __missing__
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python37\lib\collections\__init__.py", line 906, in __missing__
    raise KeyError(key)
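KeyError: 'a_b_'
Is there a workaround for this, or a different way to build filter conditions for a DataFrame? I would like to define a function that returns a dynamic filter as a string.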
Answered 2019-08-13 20:20:55
pandas itself supports Unicode symbols in column names, so these names are valid. Try boolean indexing instead of query:
set1 = df2['a b+'] == 1
set2 = df2['c | d'] == 1
print(df2[set1 | set2])
This works for me on your test data.
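If the set of conditions has to be built dynamically, the same boolean-indexing idea can be wrapped in a small function. What follows is only a minimal sketch: `any_equal` is a hypothetical helper name, not part of pandas, and it assumes every condition is an equality test combined with OR, as in the question. It returns a boolean mask rather than a query string, so the column names never pass through the query parser.

```python
from functools import reduce
import operator

import pandas as pd

df2 = pd.DataFrame({'a b+': [1, 0, 1, 1, 0, 0],
                    'c | d': [1, 0, 1, 1, 0, 0],
                    'e & f': [0, 0, 0, 0, 1, 0]})


def any_equal(frame, conditions):
    """Hypothetical helper: mask that is True where any column equals its target.

    `conditions` maps a column name to the value it is compared against.
    Assumes at least one condition is given.
    """
    # One boolean Series per column, built with plain indexing,
    # so special characters in the names are not an issue.
    masks = [frame[col] == value for col, value in conditions.items()]
    # Combine the Series element-wise with OR.
    return reduce(operator.or_, masks)


mask = any_equal(df2, {'a b+': 1, 'c | d': 1, 'e & f': 1})
filtered = df2[mask]
print(filtered)
```

Swapping `operator.or_` for `operator.and_` would require all conditions to hold instead of any of them.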
https://stackoverflow.com/questions/57476664