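Today I'm continuing our series of visualization tutorials, and practice data is provided as well. This issue is about the Sankey diagram. Its Chinese name is rendered in several different ways, so we'll stick with the English name here. The post covers what a Sankey diagram is, how to draw one with ggalluvial from wide data and from long data, a worked example on practice data, and a quick look at other R packages.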
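A Sankey diagram, also called a Sankey energy-flow diagram or Sankey energy-balance diagram, is a specific type of flow diagram in which the width of each branch corresponds to the size of the flow it represents. It is commonly used to visualize data in energy, material composition, finance, retail, and similar areas (source: the web). Next, I'll show you how to draw Sankey diagrams in R.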
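Thanks to ggplot2's powerful plotting capabilities, the R graphics ecosystem has a package dedicated to this kind of chart: ggalluvial, a ggplot2 extension that is easy to pick up. As with most charts, the first step is getting the data into the right shape, and ggalluvial provides helper functions for exactly that: is_lodes_form(), is_alluvia_form(), to_lodes_form(), and to_alluvia_form(). The two I use most are is_alluvia_form() and to_lodes_form(). Let's work through them with real data. (For more detail, see the ggalluvial site: http://corybrunson.github.io/ggalluvial/articles/ggalluvial.html)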
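ggalluvial can draw Sankey diagrams from two data layouts. One of them is wide data (which ggalluvial calls alluvia form). Here is the wide-data workflow, using the example from the official site: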
library(ggalluvial)
library(tidyverse)
library(ggtext)
library(hrbrthemes)
library(wesanderson)
library(LaCroixColoR)
# Preview the data
head(as.data.frame(UCBAdmissions), n = 12)
##       Admit Gender Dept Freq
## 1  Admitted   Male    A  512
## 2  Rejected   Male    A  313
## 3  Admitted Female    A   89
## 4  Rejected Female    A   19
## 5  Admitted   Male    B  353
## 6  Rejected   Male    B  207
## 7  Admitted Female    B   17
## 8  Rejected Female    B    8
## 9  Admitted   Male    C  120
## 10 Rejected   Male    C  205
## 11 Admitted Female    C  202
## 12 Rejected Female    C  391
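As you can see, this is wide data. With ggplot2 we would normally convert it to long data before plotting, but here it is enough to check it with the following code and then plot it directly: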
is_alluvia_form(as.data.frame(UCBAdmissions), axes = 1:3, silent = TRUE)
## [1] TRUE
ggplot(as.data.frame(UCBAdmissions),
       aes(y = Freq, axis1 = Gender, axis2 = Dept)) +
  geom_alluvium(aes(fill = Admit), width = 1/12) +
  geom_stratum(width = 1/12, fill = "black", color = "grey") +
  geom_label(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Gender", "Dept"), expand = c(.05, .05)) +
  scale_fill_brewer(type = "qual", palette = "Set1") +
  hrbrthemes::theme_ipsum(base_family = "Roboto Condensed") +
  ggtitle("UC Berkeley admissions and rejections, by sex and department")
geom_alluvium example
Plotting from wide data is fairly easy to understand, but customizing the result is more awkward. In that case we can plot from long data instead.
data(majors)
head(majors,8)
long data
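For comparison, the wide UCBAdmissions table used earlier can be reshaped into this kind of long layout with to_lodes_form(). The following is only a minimal sketch: the object names ucb_wide and ucb_lodes and the id column name "Cohort" are my own choices, not part of the original example.

```r
library(ggalluvial)

# Wide ("alluvia") form: one row per combination, one column per axis.
ucb_wide <- as.data.frame(UCBAdmissions)

# Reshape to long ("lodes") form: one row per original row per axis.
# `id = "Cohort"` names the new column that identifies each original row;
# the key and value columns keep their defaults, "x" and "stratum".
ucb_lodes <- to_lodes_form(ucb_wide, axes = 1:3, id = "Cohort")
head(ucb_lodes, n = 6)

# Check that the result is recognised as lodes form.
is_lodes_form(ucb_lodes, key = x, value = stratum, id = Cohort, silent = TRUE)
```

Each of the 24 wide rows becomes three long rows, one per axis, which is the layout that the stratum and alluvium aesthetics expect.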
From here, the plotting looks much like ordinary ggplot2 code:
data(majors)
majors$curriculum <- as.factor(majors$curriculum)
ggplot(majors,
       aes(x = semester, stratum = curriculum, alluvium = student,
           fill = curriculum, label = curriculum)) +
  #scale_fill_brewer(type = "qual", palette = "Set2") +
  scale_fill_manual(values = lacroix_palette(type = "paired"))+
  ggalluvial::geom_flow(stat = "alluvium", lode.guidance = "frontback",
                        color = "black") +
  geom_stratum() +
  hrbrthemes::theme_ipsum(base_family = "Roboto Condensed")
long data charts example
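Now that we've covered plotting from both long and wide data, let's practice on a concrete dataset. The code for loading and processing the data is as follows: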
# Load the package for reading Excel files
library(readxl)
df2<-read_excel("ggalluvial_test_data.xlsx")
# Check that the data meets the plotting requirements
is_alluvia_form(df2, tidyselect::starts_with("d"))
## TRUE
# Convert to long data for plotting
df2_pro <- to_lodes_form(df2,axes = 4:14,key = "x", value = "stratum",id = "alluvium")
# NA values can appear after the conversion, so remove them
df2_pro_nona <- df2_pro %>% filter(!is.na(stratum))
# Build the plot
flow02 <- ggplot(df2_pro_nona,
                 aes(x = x, stratum = stratum, alluvium = alluvium,
                     fill = stratum, label = stratum)) +
  #scale_fill_brewer(type = "qual", palette = "Set2") +
  scale_fill_manual(values = lacroix_palette(type = "paired"),name="Status")+
  ggalluvial::geom_flow(color = "black") +
  geom_stratum(width = .4) +
  labs(x="",y="",
       title = "Example of <span style='color:#D20F26'>ggalluvial charts makes</span>",
       subtitle = "processed charts with <span style='color:#1A73E8'>geom_flow()</span>",
       caption = "Visualization by <span style='color:#DD6449'>DataCharm</span>") +
  hrbrthemes::theme_ipsum(base_family = "Roboto Condensed") +
  # Add a text label to each stratum
  geom_text(stat = "stratum",size=3,color="black",fontface = "plain") +
  theme(
    plot.title = element_markdown(hjust = 0.5,vjust = .5,color = "black",
                                  size = 25, margin = margin(t = 1, b = 12)),
    plot.subtitle = element_markdown(hjust = 0,vjust = .5,size=20),
    plot.caption = element_markdown(face = 'bold',size = 15),
    # Remove axis text and grid lines, and set the background color
    axis.text.x = element_blank(),
    axis.text.y = element_blank(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.background = element_rect(fill="#FFFFF3"),
    panel.border = element_rect(fill = NA,colour = "#FFFFF3"),
    plot.background = element_rect(fill="#FFFFF3",colour = "#FFFFF3"),
    # Customize the legend
    legend.direction = "vertical",
    legend.position = c(.95,.7),
    legend.key.width = unit(1.1, "lines"),
    legend.key.height = unit(1.1, "lines"),
    legend.spacing.x = unit(0.2, 'cm'),
    legend.background = element_rect(fill = NA,colour=NA),
    legend.title = element_text(size = 13, face = "bold"),
    legend.text = element_text(size = 11)
  )
Worked example
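As you may have noticed, I went through this one step at a time (a habit from data analysis, and a perfectly good approach if you are just starting out). Readers with more experience can chain the steps with the pipe (%>%).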
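To give a rough idea of what a piped version could look like without the Excel file, here is a minimal sketch on a small made-up table. The data frame toy, its columns subject and d1 to d3, and all of its values are hypothetical; only to_lodes_form(), filter(), and the geoms come from the code above.

```r
library(ggalluvial)  # also attaches ggplot2
library(dplyr)       # provides %>% and filter()

# Hypothetical wide table: one row per subject, one status column per time point.
# The NA entries stand for subjects with no recorded status at that time point.
toy <- data.frame(
  subject = 1:4,
  d1 = c("A", "A", "B", "B"),
  d2 = c("A", "B", "B", NA),
  d3 = c("B", "B", NA, NA)
)

# Reshape to long form and drop the NA lodes in a single chain.
toy_long <- toy %>%
  to_lodes_form(axes = 2:4, key = "x", value = "stratum", id = "alluvium") %>%
  filter(!is.na(stratum))

# Feed the piped result straight into ggplot().
toy_plot <- toy_long %>%
  ggplot(aes(x = x, stratum = stratum, alluvium = alluvium,
             fill = stratum, label = stratum)) +
  geom_flow(color = "black") +
  geom_stratum(width = .4) +
  geom_text(stat = "stratum", size = 3)
# Print `toy_plot` to draw the chart.
```

The chain does the same work as the step-by-step version, just without the intermediate objects, so it is harder to inspect each stage but shorter to read.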
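Before settling on ggalluvial, I also looked at other plotting tools, such as the R packages easyalluvial and networkD3. Below are a few of the visualizations from their official sites for reference.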
suppressPackageStartupMessages( require(easyalluvial) )  # provides alluvial_wide() and mtcars2
suppressPackageStartupMessages( require(parcats) )
p = alluvial_wide(mtcars2, max_variables = 5)
parcats(p, marginal_histograms = TRUE, data_input = mtcars2)
R-easyalluvial example (interactive)
library(networkD3)  # provides sankeyNetwork()
URL <- paste0(
"https://cdn.rawgit.com/christophergandrud/networkD3/",
"master/JSONdata/energy.json")
Energy <- jsonlite::fromJSON(URL)
# Plot
sankeyNetwork(Links = Energy$links, Nodes = Energy$nodes, Source = "source",
Target = "target", Value = "value", NodeID = "name",
units = "TWh", fontSize = 12, nodeWidth = 30)
R-networkD3 example (interactive)
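If the remote JSON file is not reachable, the same function can be tried on hand-built data. This is only a sketch: the node names and flow values below are made up for illustration and do not come from the energy dataset.

```r
library(networkD3)

# Made-up nodes; sankeyNetwork() refers to them by zero-based row index.
nodes <- data.frame(name = c("Coal", "Gas", "Electricity", "Homes", "Industry"))

# Made-up links: `source` and `target` are zero-based indices into `nodes`.
links <- data.frame(
  source = c(0, 1, 2, 2),
  target = c(2, 2, 3, 4),
  value  = c(30, 20, 35, 15)
)

toy_sankey <- sankeyNetwork(Links = links, Nodes = nodes, Source = "source",
                            Target = "target", Value = "value", NodeID = "name",
                            units = "TWh", fontSize = 12, nodeWidth = 30)
# Print `toy_sankey` to render the interactive widget.
```

The point to watch is the zero-based indexing: a link with source = 0 starts at the first row of nodes, not at a row numbered 0 in R's usual one-based sense.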
As you can see, both of these packages lean toward interactive, web-based output.
That's it for this visualization tutorial. If you're interested, grab the source data and try it out yourself.
References: http://corybrunson.github.io/ggalluvial/articles/ggalluvial.html
https://erblast.github.io/easyalluvial/
http://christophergandrud.github.io/networkD3/