一篇关于国旗与奥运会奖牌的可视化笔记

作者简介Introduction

简介

本文主要介绍一个R包ggflags,可以用于绘制国旗。安装的话从Github上利用包devtools安装。

devtools::install_github("baptiste/ggflags")

下面看个小例子来介绍一下

数据集

library(ggflags)#load package

set.seed(1111)

#create the dataset

data

country=sample(c("ar", "us", "cn", "fr", "gb", "es"), 50 ,replace = TRUE),

stringsAsFactors=FALSE)

head(data)

x y country

-0.0865801-0.7055274 gb

1.3225244-0.5910791 fr

0.6397020-0.2796410 us

1.1747866-1.3209782 cn

0.11629030.5851085 gb

-2.93084640.0198323 ar

绘图

library(ggplot2)

ggplot(data, aes(x=x, y=y, country=country, size=x))+

geom_flag()+

scale_country()+

scale_size(range = c(0, 10))

国旗的图片是来自于EmojiOne数据集(https://github.com/eosrei/emojione-color-font),有兴趣的可以去看看了解一下。

题目有奥运会奖牌,所以接下来就可视化一下索契冬奥运会各国奖牌,本次用国旗与国家联系起来。

爬取数据

library(dplyr)

library(rvest)

medals %

html_nodes("table")%>%

html_table()

knitr::kable(head(medals))

Country GoldSilverBronzeTotal

Russia 1311933

United States971228

Norway 1151026

Canada 1010525

Netherlands87924

Germany 86519

爬取完数据之后进行清洗

数据清洗

本文重要的一环是将国家与国旗联系起来,因此首先要将国家名缩写弄出来,这就要用到countrycode(https://github.com/vincentarelbundock/countrycode)这个包了。

#install the package

install.packages("countrycode")

数据清洗

library(countrycode)

library(tidyr)

medals %

mutate(code=countrycode(Country, "country.name", "iso2c"))%>%

mutate(code=tolower(code))%>%

gather(medal_color, count, Gold, Silver, Bronze)%>%

mutate(medal_color=factor(medal_color, levels = c("Gold", "Silver", "Bronze")))%>%

drop_na(Country, code)

knitr::kable(head(medals))

Country Totalcodemedal_color count

Russia 33 ru Gold 13

United States 28 us Gold 9

Norway 26 no Gold 11

Canada 25 ca Gold 10

Netherlands 24 nl Gold 8

Germany 19 de Gold 8

绘图

由于国家数量太多,并且好多国家奖牌数基本为零,因此我们筛选一下:只绘制总奖牌数不小于5的国家。

medals%>%filter(Total>=5)%>%

ggplot(aes(x=reorder(Country, Total), y=count))+

geom_bar(stat = "identity", aes(fill=medal_color))+

geom_flag(aes(y=-2,country=code), size=10)+

theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 7, vjust = 0.5))+

scale_fill_manual(values = c(

"Gold"="gold",

"Bronze" = "#cd7f32",

"Silver" = "#C0C0C0"

))+

scale_y_continuous(expand = c(0.1, 1))+

xlab("Country")+

ylab("Number of medals")+

theme_bw()+

theme(panel.grid = element_blank())+

theme(legend.justification = c(1, 0), legend.position = c(1, 0))+

theme(legend.title = element_blank())+

coord_flip()

sessionInfo()

## R version 3.4.1 (2017-06-30)

## Platform: x86_64-pc-linux-gnu (64-bit)

## Running under: Ubuntu 16.04.3 LTS

##

## Matrix products: default

## BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0

## LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0

##

## locale:

## [1] LC_CTYPE=zh_CN.UTF-8 LC_NUMERIC=C

## [3] LC_TIME=zh_CN.UTF-8 LC_COLLATE=zh_CN.UTF-8

## [5] LC_MONETARY=zh_CN.UTF-8 LC_MESSAGES=zh_CN.UTF-8

## [7] LC_PAPER=zh_CN.UTF-8 LC_NAME=C

## [9] LC_ADDRESS=C LC_TELEPHONE=C

## [11] LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C

##

## attached base packages:

## [1] stats graphics grDevices utils datasets methods base

##

## other attached packages:

## [1] bindrcpp_0.2 tidyr_0.7.1 countrycode_0.19 rvest_0.3.2

## [5] xml2_1.1.1 dplyr_0.7.3 ggflags_0.0.1 ggplot2_2.2.1

##

## loaded via a namespace (and not attached):

## [1] Rcpp_0.12.12 compiler_3.4.1 plyr_1.8.4 highr_0.6

## [5] bindr_0.1 tools_3.4.1 digest_0.6.12 evaluate_0.10.1

## [9] tibble_1.3.4 gtable_0.2.0 pkgconfig_2.0.1 rlang_0.1.2

## [13] curl_2.8.1 yaml_2.1.14 stringr_1.2.0 httr_1.3.1

## [17] knitr_1.17 tidyselect_0.2.0 rprojroot_1.2 grid_3.4.1

## [21] glue_1.1.1 R6_2.2.2 XML_3.98-1.9 rmarkdown_1.6

## [25] purrr_0.2.3 selectr_0.3-1 magrittr_1.5 backports_1.1.0

## [29] scales_0.5.0 htmltools_0.3.6 assertthat_0.2.0 colorspace_1.3-2

## [33] labeling_0.3 stringi_1.1.5 lazyeval_0.2.0 munsell_0.4.3

  • 发表于:
  • 原文链接http://kuaibao.qq.com/s/20180316G0KYQ600?refer=cp_1026
  • 腾讯「云+社区」是腾讯内容开放平台帐号(企鹅号)传播渠道之一,根据《腾讯内容开放平台服务协议》转载发布内容。

扫码关注云+社区

领取腾讯云代金券

年度创作总结 领取年终奖励