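Hi everyone, this is Caige.
We previously covered plotting with Pandas in 《『数据可视化』一文掌握Pandas可视化图表》 ("Data Visualization: Mastering Pandas Charts in One Article"). That approach relies on matplotlib, so the resulting charts are not interactive. Since version 0.25.0, however, Pandas has supported alternative plotting backends, and today's demo focuses on one of them: the Bokeh-based backend.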
Starting in 0.25 pandas can be extended with third-party plotting backends. The main idea is letting users select a plotting backend different than the provided one based on Matplotlib.
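0. Environment setup
We will use pandas-bokeh, which provides a Bokeh plotting backend for Pandas, GeoPandas, and Pyspark DataFrames, similar to the plotting functionality Pandas already has. Once the library is imported, DataFrames and Series gain a new plotting method, plot_bokeh().
Install the third-party library with pip: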
pip install pandas-bokeh
or conda:
conda install -c patrikhlobil pandas-bokeh
If you are working in a Jupyter notebook, the following makes the plots render inline:
import pandas as pd
import pandas_bokeh
pandas_bokeh.output_notebook()
Similarly, to write the output to an HTML file instead:
import pandas as pd
import pandas_bokeh
pandas_bokeh.output_file("Interactive Plot.html")
You can also set pandas_bokeh as the Pandas plotting backend, so that the regular df.plot(...) call is routed to Bokeh as well:
import pandas as pd
pd.set_option('plotting.backend', 'pandas_bokeh')
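Before switching backends, it can help to check that pandas can actually load the one you ask for. The following is a minimal sketch, not part of pandas-bokeh: try_backend is a hypothetical helper name, and it relies on pandas validating the plotting.backend option at the moment it is set. Note that the default "matplotlib" is only validated lazily, when a plot is actually drawn.

```python
import pandas as pd


def try_backend(name):
    """Return True if pandas accepts `name` as a plotting backend.

    Hypothetical helper for illustration. The option is set only
    temporarily, so the global setting is left unchanged.
    """
    try:
        with pd.option_context("plotting.backend", name):
            return True
    except Exception:
        # pandas could not find or import the backend module
        return False


print(pd.get_option("plotting.backend"))  # backend currently in use
print(try_backend("pandas_bokeh"))        # True only if pandas-bokeh can be imported
print(try_backend("no_such_backend"))     # a made-up name, expected to fail
```

Because the option is changed inside option_context, a failed attempt leaves the previous backend in place.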
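The chart types currently supported, each covered in a section below, are line plots, bar plots, scatter plots, point plots, step plots, pie plots, histograms, area plots, and map plots.
1. Line plot
Line plots come with interactive elements: panning, zooming, and a hover tool, each of which can be configured or disabled through the parameters shown further down.
Let's start with a simple example: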
import numpy as np
np.random.seed(42)
df = pd.DataFrame({"谷歌": np.random.randn(1000)+0.2,
"苹果": np.random.randn(1000)+0.17},
index=pd.date_range('1/1/2020', periods=1000))
df = df.cumsum()
df = df + 50
df.plot_bokeh(kind="line") # equivalent to df.plot_bokeh.line()
Line plot
When plotting, we can pass many parameters to control how the chart looks and behaves:
df.plot_bokeh.line(
figsize=(800, 450), # width and height of the figure
y="苹果", # y values; here the 苹果 (Apple) column of df
title="苹果", # title
xlabel="Date", # x-axis label
ylabel="Stock price [$]", # y-axis label
yticks=[0, 100, 200, 300, 400], # y-axis tick values
ylim=(0, 400), # y-axis range
toolbar_location=None, # hide the toolbar
colormap=["red", "blue"], # colors
hovertool_string=r"""<img
src='https://dss0.bdstatic.com/-0U0bnSm1A5BphGlnYG/tam-ogel/920152b13571a9a38f7f3c98ec5a6b3f_122_122.jpg'
height="42" alt="@imgs" width="42"
style="float: left; margin: 0px 15px 15px 0px;"
border="2"></img> Apple
<h4> Stock Price: </h4> @{苹果}""", # hover tooltip template (HTML with CSS styling)
panning=False, # disable panning
zooming=False) # disable zooming
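Line plots also accept a few parameters of their own, such as plot_data_points, plot_data_points_size, marker, and rangetool: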
df.plot_bokeh.line(
figsize=(800, 450),
title="苹果 vs 谷歌",
xlabel="Date",
ylabel="价格 [$]",
yticks=[0, 100, 200, 300, 400],
ylim=(0, 100),
xlim=("2020-01-01", "2020-02-01"),
colormap=["red", "blue"],
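plot_data_points=True, # whether to draw the data points on the line
plot_data_points_size=10, # size of the data points
marker="square") # marker type of the data points
A line plot with the range tool slider enabled: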
ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2020', periods=1000))
df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index, columns=list('ABCD'))
df = df.cumsum()
df.plot_bokeh(rangetool=True)
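Line plot with a range tool slider
2. Bar plot (vertical and horizontal)
Bar plots take no special keyword arguments. They come in two forms, side-by-side and stacked, and side-by-side is the default.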
data = {
'fruits':
['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries'],
'2015': [2, 1, 4, 3, 2, 4],
'2016': [5, 3, 3, 2, 4, 6],
'2017': [3, 2, 4, 4, 5, 3]
}
df = pd.DataFrame(data).set_index("fruits")
p_bar = df.plot_bokeh.bar(
ylabel="Price per Unit [€]",
title="Fruit prices per Year",
alpha=0.6)
Bar plot
The stacked parameter produces a stacked bar plot:
p_stacked_bar = df.plot_bokeh.bar(
ylabel="Price per Unit [€]",
title="Fruit prices per Year",
stacked=True, # stacked bar plot
alpha=0.6)
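To make the difference between the two bar variants concrete: in a stacked bar plot each segment starts where the previous one ends, so the segment tops are the running totals across the columns. Here is a small sketch using the fruit data from above. It needs only pandas, and the names df_fruits, tops, and bottoms are chosen for illustration; how pandas-bokeh computes the positions internally is not covered in this article.

```python
import pandas as pd

data = {
    "fruits": ["Apples", "Pears", "Nectarines", "Plums", "Grapes", "Strawberries"],
    "2015": [2, 1, 4, 3, 2, 4],
    "2016": [5, 3, 3, 2, 4, 6],
    "2017": [3, 2, 4, 4, 5, 3],
}
df_fruits = pd.DataFrame(data).set_index("fruits")

# Top edge of each stacked segment: running total across the year columns
tops = df_fruits.cumsum(axis=1)
# Bottom edge: the top of the previous segment (0 for the first column)
bottoms = tops - df_fruits

print(tops.loc["Apples"].tolist())     # [2, 7, 10]
print(bottoms.loc["Apples"].tolist())  # [0, 2, 7]
```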
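By default, the x-axis values come from the DataFrame's index; we can also choose a column with the x parameter. In addition, horizontal bar plots are available via the keyword kind="barh" or the accessor plot_bokeh.barh.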
#Reset index, such that "fruits" is now a column of the DataFrame:
df.reset_index(inplace=True)
#Create horizontal bar (via kind keyword):
p_hbar = df.plot_bokeh(
kind="barh",
x="fruits",
xlabel="Price per Unit [€]",
title="Fruit prices per Year",
alpha=0.6,
legend = "bottom_right",
show_figure=False)
#Create stacked horizontal bar (via barh accessor):
p_stacked_hbar = df.plot_bokeh.barh(
x="fruits",
stacked=True,
xlabel="Price per Unit [€]",
title="Fruit prices per Year",
alpha=0.6,
legend = "bottom_right",
show_figure=False)
#Plot all barplot examples in a grid:
pandas_bokeh.plot_grid([[p_bar, p_stacked_bar],
[p_hbar, p_stacked_hbar]],
plot_width=450)
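3. Scatter plot
A scatter plot requires x and y; the optional parameters used below include category and size.
The following draws a table next to a scatter plot: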
# Load Iris Dataset:
df = pd.read_csv(
r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/iris/iris.csv"
)
df = df.sample(frac=1)
# Create Bokeh-Table with DataFrame:
from bokeh.models.widgets import DataTable, TableColumn
from bokeh.models import ColumnDataSource
data_table = DataTable(
columns=[TableColumn(field=Ci, title=Ci) for Ci in df.columns],
source=ColumnDataSource(df),
height=300,
)
# Create Scatterplot:
p_scatter = df.plot_bokeh.scatter(
x="petal length (cm)",
y="sepal width (cm)",
category="species",
title="Iris DataSet Visualization",
show_figure=False,
)
# Combine Table and Scatterplot via grid layout:
pandas_bokeh.plot_grid([[data_table, p_scatter]], plot_width=400, plot_height=350)
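Table and scatter plot
We can also pass further parameters, for example size, which takes each marker's size from the values of a column: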
#Change one value to clearly see the effect of the size keyword
df.loc[13, "sepal length (cm)"] = 15
#Make scatterplot:
p_scatter = df.plot_bokeh.scatter(
x="petal length (cm)",
y="sepal width (cm)",
category="species",
title="Iris DataSet Visualization with Size Keyword",
size="sepal length (cm)", # marker size taken from this column
)
4. Point plot
Point plots are straightforward: just call plot_bokeh.point.
import numpy as np
x = np.arange(-3, 3, 0.1)
y2 = x**2
y3 = x**3
df = pd.DataFrame({"x": x, "Parabola": y2, "Cube": y3})
df.plot_bokeh.point(
x="x",
xticks=range(-3, 4),
size=5,
colormap=["#009933", "#ff3399"],
title="Pointplot (Parabola vs. Cube)",
marker="x")
Point plot
5. Step plot
The main setting for a step plot is its mode, which can currently be before, after, or center.
import numpy as np
x = np.arange(-3, 3, 1)
y2 = x**2
y3 = x**3
df = pd.DataFrame({"x": x, "Parabola": y2, "Cube": y3})
df.plot_bokeh.step(
x="x",
xticks=range(-1, 1),
colormap=["#009933", "#ff3399"],
title="Stepplot (Parabola vs. Cube)",
figsize=(800,300),
fontsize_title=30,
fontsize_label=25,
fontsize_ticks=15,
fontsize_legend=5,
)
df.plot_bokeh.step(
x="x",
xticks=range(-1, 1),
colormap=["#009933", "#ff3399"],
title="Stepplot (Parabola vs. Cube)",
mode="after",
figsize=(800,300)
)
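The three modes differ in where each horizontal level sits relative to its data point. The sketch below illustrates this with plain NumPy, following Bokeh's description of step modes as I understand it (levels drawn before, after, or centered on each x-coordinate). step_level is a hypothetical helper for illustration, not a pandas-bokeh function.

```python
import numpy as np


def step_level(xs, ys, x, mode="before"):
    """Return the y level a step line shows at position x.

    Illustrative helper (not part of pandas-bokeh). xs must be sorted.
    """
    xs = np.asarray(xs)
    ys = np.asarray(ys)
    if mode == "before":
        # each level is drawn before its point: use the next point at or after x
        i = np.searchsorted(xs, x, side="left")
    elif mode == "after":
        # each level is drawn after its point: use the last point at or before x
        i = np.searchsorted(xs, x, side="right") - 1
    elif mode == "center":
        # each level is centered on its point: use the nearest point
        i = np.argmin(np.abs(xs - x))
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    i = int(np.clip(i, 0, len(xs) - 1))
    return ys[i]


x = np.arange(-3, 3, 1)   # [-3, -2, -1, 0, 1, 2]
y2 = x**2                 # [ 9,  4,  1, 0, 1, 4]

# Query a position between the points x=-3 (y=9) and x=-2 (y=4)
for mode in ("before", "after", "center"):
    print(mode, step_level(x, y2, -2.25, mode))
```

At x = -2.25 the "before" and "center" modes already show the level of the point at -2, while "after" still shows the level of the point at -3.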
6. Pie plot
As an example, we use a publicly available dataset with the results of all German Bundestag elections since 2002:
df_pie = pd.read_csv(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/Bundestagswahl/Bundestagswahl.csv")
df_pie
| | Partei | 2002 | 2005 | 2009 | 2013 | 2017 |
|---|---|---|---|---|---|---|
| 0 | CDU/CSU | 38.5 | 35.2 | 33.8 | 41.5 | 32.9 |
| 1 | SPD | 38.5 | 34.2 | 23.0 | 25.7 | 20.5 |
| 2 | FDP | 7.4 | 9.8 | 14.6 | 4.8 | 10.7 |
| 3 | Grünen | 8.6 | 8.1 | 10.7 | 8.4 | 8.9 |
| 4 | Linke/PDS | 4.0 | 8.7 | 11.9 | 8.6 | 9.2 |
| 5 | AfD | 0.0 | 0.0 | 0.0 | 0.0 | 12.6 |
| 6 | Sonstige | 3.0 | 4.0 | 6.0 | 11.0 | 5.0 |
df_pie.plot_bokeh.pie(
x="Partei",
y="2017",
colormap=["blue", "red", "yellow", "green", "purple", "orange", "grey"],
title="Results of German Bundestag Election 2017",
)
Pie plot
To plot all columns (the chart above shows only the 2017 data), simply leave y unset; the results are then nested inside a single chart:
df_pie.plot_bokeh.pie(
x="Partei",
colormap=["blue", "red", "yellow", "green", "purple", "orange", "grey"],
title="Results of German Bundestag Elections [2002-2017]",
line_color="grey")
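For readers wondering how a column of values becomes a pie, the usual approach is to give each wedge a share of the full circle proportional to its value. The sketch below applies that to the 2017 column from the table above. It is a generic illustration; the start angle and drawing direction that pandas-bokeh actually uses are not covered here, and the variable names are my own.

```python
import math

import pandas as pd

# 2017 column of the election table shown above
shares_2017 = pd.Series(
    [32.9, 20.5, 10.7, 8.9, 9.2, 12.6, 5.0],
    index=["CDU/CSU", "SPD", "FDP", "Grünen", "Linke/PDS", "AfD", "Sonstige"],
)

# Each wedge spans a fraction of the full circle proportional to its value
angles = shares_2017 / shares_2017.sum() * 2 * math.pi
# Wedges are laid out one after another, so the end angles are a running sum
end_angles = angles.cumsum()
start_angles = end_angles - angles

print(angles.round(3))
```

Dividing by the column sum means the wedges always close the circle, even though the 2017 values add up to slightly less than 100.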
7. Histogram
Histograms accept a number of parameters, including bins, histogram_type, normed, cumulative, and show_average:
import numpy as np
df_hist = pd.DataFrame({
'a': np.random.randn(1000) + 1,
'b': np.random.randn(1000),
'c': np.random.randn(1000) - 1
},
columns=['a', 'b', 'c'])
#Top-on-Top Histogram (Default):
df_hist.plot_bokeh.hist(
bins=np.linspace(-5, 5, 41),
vertical_xlabel=True,
hovertool=False,
title="Normal distributions (Top-on-Top)",
line_color="black")
#Side-by-Side Histogram (multiple bars share bin side-by-side) also accessible via
#kind="hist":
df_hist.plot_bokeh(
kind="hist",
bins=np.linspace(-5, 5, 41),
histogram_type="sidebyside",
vertical_xlabel=True,
hovertool=False,
title="Normal distributions (Side-by-Side)",
line_color="black")
#Stacked histogram:
df_hist.plot_bokeh.hist(
bins=np.linspace(-5, 5, 41),
histogram_type="stacked",
vertical_xlabel=True,
hovertool=False,
title="Normal distributions (Stacked)",
line_color="black")
Top-on-Top Histogram (Default)
Side-by-Side Histogram
Stacked histogram
Histograms also support some more advanced parameters:
p_hist = df_hist.plot_bokeh.hist(
y=["a", "b"],
bins=np.arange(-4, 6.5, 0.5),
normed=100,
vertical_xlabel=True,
ylabel="Share[%]",
title="Normal distributions (normed)",
show_average=True,
xlim=(-4, 6),
ylim=(0, 30),
show_figure=False)
p_hist_cum = df_hist.plot_bokeh.hist(
y=["a", "b"],
bins=np.arange(-4, 6.5, 0.5),
normed=100,
cumulative=True,
vertical_xlabel=True,
ylabel="Share[%]",
title="Normal distributions (normed & cumulative)",
show_figure=False)
pandas_bokeh.plot_grid([[p_hist, p_hist_cum]], plot_width=450, plot_height=300) # dashboard-style grid output
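The effect of normed=100 and cumulative=True is easier to see on the raw bin counts. The following sketch reproduces the idea with np.histogram, using the same bin edges as above. It assumes that normed rescales the bar heights to add up to the given value and that cumulative takes a running sum; the exact internals of pandas-bokeh may differ.

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.normal(loc=1.0, size=1000)   # comparable to column 'a' above

bins = np.arange(-4, 6.5, 0.5)       # same bin edges as in the example above
counts, edges = np.histogram(a, bins=bins)

# normed=100: rescale the bar heights so that they add up to 100 (percent)
percent = counts / counts.sum() * 100
# cumulative=True: each bar shows the running total of all bars up to it
cumulative = np.cumsum(percent)

print(len(edges) - 1)                # number of bins: 20
print(percent.round(1))
print(cumulative.round(1))
```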
8. Area plot
Area plots come in two flavors: stacked, or drawn on top of each other.
# reuse the data from the pie plot above
df_energy = df_pie
df_energy.plot_bokeh.area(
x="Partei",
stacked=True,
legend="top_right",
colormap=["brown", "orange", "black", "grey", "blue"],
title="Title",
ylabel="Y axis",
)
Stacked area plot
df_energy.plot_bokeh.area(
x="Partei",
stacked=False,
legend="top_right",
colormap=["brown", "orange", "black", "grey", "blue"],
title="Title",
ylabel="Y axis",
)
Non-stacked area plot
With the normed keyword the areas are normalized, which gives the following effect:
df_energy.plot_bokeh.area(
x="Partei",
stacked=True,
normed=100, # normalize to 100 (shows approximate shares)
legend="top_right",
colormap=["brown", "orange", "black", "grey", "blue"],
title="Title",
ylabel="Y axis",
)
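For a stacked area plot, normalizing to 100 amounts to rescaling the stacked values at each x position so that they add up to 100. Since x is the Partei column here, that means each row is divided by its own total. A short sketch with the first two rows of the election table, assuming this is how normed behaves for area plots (df_small is a name made up for the example):

```python
import pandas as pd

# First two rows of the election table shown above
df_small = pd.DataFrame(
    {
        "Partei": ["CDU/CSU", "SPD"],
        "2002": [38.5, 38.5],
        "2005": [35.2, 34.2],
        "2009": [33.8, 23.0],
        "2013": [41.5, 25.7],
        "2017": [32.9, 20.5],
    }
).set_index("Partei")

# Rescale every row so that its values add up to 100
normed = df_small.div(df_small.sum(axis=1), axis=0) * 100

print(normed.round(1))
print(normed.sum(axis=1).round(6).tolist())  # [100.0, 100.0]
```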
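9. Map plot
Map plotting is a big topic, so we won't go into detail here; a dedicated article will follow.
The plot_bokeh.map function takes x and y as the longitude and latitude coordinates. As a quick demo, we plot all cities in the world with more than one million inhabitants: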
df_mapplot = pd.read_csv(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/populated%20places/populated_places.csv")
df_mapplot.head()
| | name | pop_max | latitude | longitude |
|---|---|---|---|---|
| 0 | Mesa | 1085394 | 33.423915 | -111.736084 |
| 1 | Sharjah | 1103027 | 25.371383 | 55.406478 |
| 2 | Changwon | 1081499 | 35.219102 | 128.583562 |
| 3 | Sheffield | 1292900 | 53.366677 | -1.499997 |
| 4 | Abbottabad | 1183647 | 34.149503 | 73.199501 |
df_mapplot["size"] = df_mapplot["pop_max"] / 1000000
df_mapplot.plot_bokeh.map(
x="longitude",
y="latitude",
hovertool_string="""<h2> @{name} </h2>
<h3> Population: @{pop_max} </h3>""",
tile_provider="STAMEN_TERRAIN_RETINA",
size="size",
figsize=(900, 600),
title="World cities with more than 1.000.000 inhabitants")
Map plot
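Tile-based web maps are typically drawn in Web Mercator coordinates rather than in degrees, which is presumably why the map function asks for longitude and latitude and handles the conversion for us. For reference, here is the standard spherical Web Mercator (EPSG:3857) projection as a small sketch. Whether pandas-bokeh uses exactly this formula internally is an assumption, and lonlat_to_web_mercator is a name made up for the example.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857)


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Sheffield, taken from the table above
x, y = lonlat_to_web_mercator(-1.499997, 53.366677)
print(round(x), round(y))
```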
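10. Miscellaneous
Dashboard output: pandas_bokeh.plot_grid arranges several plots into a dashboard (read through the code below to see how the pieces fit together):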
import pandas as pd
import numpy as np
import pandas_bokeh
pandas_bokeh.output_notebook()
#Barplot:
data = {
'fruits':
['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries'],
'2015': [2, 1, 4, 3, 2, 4],
'2016': [5, 3, 3, 2, 4, 6],
'2017': [3, 2, 4, 4, 5, 3]
}
df = pd.DataFrame(data).set_index("fruits")
p_bar = df.plot_bokeh(
kind="bar",
ylabel="Price per Unit [€]",
title="Fruit prices per Year",
show_figure=False)
#Lineplot:
np.random.seed(42)
df = pd.DataFrame({
"Google": np.random.randn(1000) + 0.2,
"Apple": np.random.randn(1000) + 0.17
},
index=pd.date_range('1/1/2000', periods=1000))
df = df.cumsum()
df = df + 50
p_line = df.plot_bokeh(
kind="line",
title="Apple vs Google",
xlabel="Date",
ylabel="Stock price [$]",
yticks=[0, 100, 200, 300, 400],
ylim=(0, 400),
colormap=["red", "blue"],
show_figure=False)
#Scatterplot:
from sklearn.datasets import load_iris
iris = load_iris()
df = pd.DataFrame(iris["data"])
df.columns = iris["feature_names"]
df["species"] = iris["target"]
df["species"] = df["species"].map(dict(zip(range(3), iris["target_names"])))
p_scatter = df.plot_bokeh(
kind="scatter",
x="petal length (cm)",
y="sepal width (cm)",
category="species",
title="Iris DataSet Visualization",
show_figure=False)
#Histogram:
df_hist = pd.DataFrame({
'a': np.random.randn(1000) + 1,
'b': np.random.randn(1000),
'c': np.random.randn(1000) - 1
},
columns=['a', 'b', 'c'])
p_hist = df_hist.plot_bokeh(
kind="hist",
bins=np.arange(-6, 6.5, 0.5),
vertical_xlabel=True,
normed=100,
hovertool=False,
title="Normal distributions",
show_figure=False)
#Make Dashboard with Grid Layout:
pandas_bokeh.plot_grid([[p_line, p_bar],
[p_scatter, p_hist]], plot_width=450)
Dashboard output
Or alternatively, like this:
p_line.plot_width = 900
p_hist.plot_width = 900
layout = pandas_bokeh.column(p_line,
pandas_bokeh.row(p_scatter, p_bar),
p_hist) # specify what goes in each row
pandas_bokeh.show(layout)
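Alternative dashboard layout
That's all for this article. As we have seen, besides regular plotting with matplotlib, Pandas can use the bokeh plotting backend to draw interactive charts quickly, which is very convenient.
Of course, to dig deeper or customize these charts further, you will need to know more about bokeh itself; the official documentation is the place to look.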