Is there a way to do this? I can't seem to find an easy way to combine a pandas Series with plotting a CDF.
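Posted on 2014-10-16 07:57:58
I believe the functionality you're looking for is in the hist method of a Series object, which wraps the hist() function in matplotlib.
Here's the relevant documentation: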
In [10]: import matplotlib.pyplot as plt
In [11]: plt.hist?
...
Plot a histogram.
Compute and draw the histogram of *x*. The return value is a
tuple (*n*, *bins*, *patches*) or ([*n0*, *n1*, ...], *bins*,
[*patches0*, *patches1*,...]) if the input contains multiple
data.
...
cumulative : boolean, optional, default : False
If `True`, then a histogram is computed where each bin gives the
counts in that bin plus all bins for smaller values. The last bin
gives the total number of datapoints. If `normed` is also `True`
then the histogram is normalized such that the last bin equals 1.
If `cumulative` evaluates to less than 0 (e.g., -1), the direction
of accumulation is reversed. In this case, if `normed` is also
`True`, then the histogram is normalized such that the first bin
equals 1.
...
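For example (recent matplotlib versions take `density` in place of the `normed` parameter described in the docstring above):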
In [12]: import pandas as pd
In [13]: import numpy as np
In [14]: ser = pd.Series(np.random.normal(size=1000))
In [15]: ser.hist(cumulative=True, density=1, bins=100)
Out[15]: <matplotlib.axes.AxesSubplot at 0x11469a590>
In [16]: plt.show()
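To see what the cumulative, normalized histogram represents numerically, here is a minimal sketch that rebuilds the bar heights with NumPy only. It assumes the same 100 equal-width bins as the call above; the seed and the names `counts`, `bin_edges`, and `cum_heights` are just for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Fixed seed chosen only so the sketch is reproducible
rng = np.random.default_rng(0)
ser = pd.Series(rng.normal(size=1000))

# Per-bin counts over 100 equal-width bins spanning the data range
counts, bin_edges = np.histogram(ser, bins=100)

# Running total of the counts, scaled so the last bin equals 1
cum_heights = np.cumsum(counts) / counts.sum()

# Each height is the fraction of samples at or below that bin's right edge
print(cum_heights[:5], cum_heights[-1])
```

The heights are non-decreasing and end at 1, which is the property the docstring describes for `cumulative=True` combined with normalization.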
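Posted on 2015-08-13 00:57:35
A CDF, or cumulative distribution function, plot is basically a graph with the sorted values on the X-axis and the cumulative distribution on the Y-axis. So I would create a new series with the sorted values as the index and the cumulative distribution as the values.
First create an example series: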
import pandas as pd
import numpy as np
ser = pd.Series(np.random.normal(size=100))
Sort the series:
ser = ser.sort_values()
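Now, before proceeding, append the last (and largest) value once more. This step matters especially for small sample sizes in order to get an unbiased CDF: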
ser[len(ser)] = ser.iloc[-1]
Create a new series with the sorted values as the index and the cumulative distribution as the values:
cum_dist = np.linspace(0.,1.,len(ser))
ser_cdf = pd.Series(cum_dist, index=ser)
Finally, plot the function as steps:
ser_cdf.plot(drawstyle='steps')
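The steps above can be collected into one small helper. This is only a sketch: the name `ecdf_series` is hypothetical, and it works on a NumPy array instead of reassigning into the Series, but it follows the same sort, append, and linspace recipe.

```python
import numpy as np
import pandas as pd

def ecdf_series(values):
    """Hypothetical helper: map sorted values to cumulative fractions."""
    sorted_vals = np.sort(np.asarray(values, dtype=float))
    # Repeat the largest value once, as in the append step above
    padded = np.append(sorted_vals, sorted_vals[-1])
    # Evenly spaced cumulative fractions from 0 to 1, one per padded value
    cum_dist = np.linspace(0.0, 1.0, len(padded))
    return pd.Series(cum_dist, index=padded)

cdf = ecdf_series([3.0, 1.0, 2.0, 4.0])
print(cdf)
# With matplotlib installed, cdf.plot(drawstyle='steps') would draw it
```

For the four-value input, the index is the sorted data with the maximum repeated, and the values run from 0 to 1 in steps of 0.25.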
Posted on 2016-09-22 07:52:51
This is the easiest way.
import pandas as pd
df = pd.Series(range(100))
df.hist(cumulative=True)
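Note that without normalization the Y-axis shows cumulative counts, not fractions. As a rough check, the sketch below reproduces those counts with NumPy, assuming the default of 10 equal-width bins; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

values = pd.Series(range(100))

# 10 equal-width bins over the data range (assumed to match the default)
counts, bin_edges = np.histogram(values, bins=10)

# Running total: the last entry is the total number of data points
cum_counts = np.cumsum(counts)
print(cum_counts)
```

Dividing `cum_counts` by `len(values)` gives the fractions that a normalized cumulative histogram would show.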
https://stackoverflow.com/questions/25577352