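I work with R, but until now I had never needed to apply a comparison operator to an entire data frame. While comparing a pandas DataFrame with an R data frame, I found that df[df > 0] gives different results in Python and R.
In Python, df[df > 0] returns another DataFrame, whereas in R the result is a vector.
Python code: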
import numpy as np
import pandas as pd
from numpy.random import randn
np.random.seed(101)
df = pd.DataFrame(randn(5,5), ['A', 'B', 'C', 'D', 'E'], ['V', 'W', 'X', 'Y', 'Z'])
df[df > 0]
             V            W            X            Y            Z
A  2.706849839  0.628132709  0.907969446  0.503825754  0.651117948
B          NaN          NaN  0.605965349          NaN  0.740122057
C  0.528813494          NaN  0.188695309          NaN          NaN
D  0.955056509  0.190794322  1.978757324   2.60596728  0.683508886
E  0.302665449  1.693722925          NaN          NaN          NaN
R code:
> set.seed(101)
> df = data.frame(matrix(rnorm(25), 5, 5))
> df
          X1         X2         X3         X4         X5
1 -0.3260365  1.1739663  0.5264481 -0.1933380 -0.1637557
2  0.5524619  0.6187899 -0.7948444 -0.8497547  0.7085221
3 -0.6749438 -0.1127343  1.4277555  0.0584655 -0.2679805
4  0.2143595  0.9170283 -1.4668197 -0.8176704 -1.4639218
5  0.3107692 -0.2232594 -0.2366834 -2.0503078  0.7444358
> df[df > 0]
[1] 0.5524619 0.2143595 0.3107692 1.1739663 0.6187899 0.9170283 0.5264481 1.4277555 0.0584655 0.7085221 0.7444358
>
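Can someone explain the significance of the difference in how R and Python return the result? Also, is there a way in R to get a data frame as the result of df[df > 0]?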
Answered 2020-06-26 00:11:47
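I'm not sure about the "significance" part, but if you want the same output in R as in Python, you can assign NaN to the values that are less than or equal to 0.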
set.seed(101)
df = data.frame(matrix(rnorm(25), 5, 5))
df[df <= 0] <- NaN
df
#         X1        X2        X3        X4        X5
#1       NaN 1.1739663 0.5264481       NaN       NaN
#2 0.5524619 0.6187899       NaN       NaN 0.7085221
#3       NaN       NaN 1.4277555 0.0584655       NaN
#4 0.2143595 0.9170283       NaN       NaN       NaN
#5 0.3107692       NaN       NaN       NaN 0.7444358
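For the pandas side of the comparison, here is a minimal sketch of both behaviours. As far as I understand it, indexing a DataFrame with a boolean DataFrame acts like DataFrame.where: the shape is kept and non-matching cells become NaN. R instead appears to treat the frame as a matrix and return the matching values as a flat vector in column-major order. The small frame and its values below are made up for illustration and are not the data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical 2x2 frame with fixed values so the result is easy to check by eye.
df = pd.DataFrame({"X1": [-1.0, 2.0], "X2": [3.0, -4.0]})

# pandas behaviour: the shape is preserved and cells failing the test become NaN.
masked = df[df > 0]
same_as_where = df.where(df > 0)

# R-like behaviour: flatten column by column (column-major order),
# then keep only the positive values, giving a 1-D array.
flat = df.to_numpy().flatten(order="F")
r_like = flat[flat > 0]

print(masked)
print(r_like)
```

The order="F" argument matters only if you want the values in the same order that R prints them; plain boolean indexing on the NumPy array would return them row by row instead.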
https://stackoverflow.com/questions/62590483