TidyFriday: How to Write an R Package That Automatically Fetches and Displays Epidemic Data

王诗翔呀

Published 2020-07-06 17:42:48

Included in the column: 优雅R

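Do you assume that anyone who can develop an R package must be an expert? You can do it too, and today we lift the veil on R package development. Building the package described in this article requires only some basic R.

Open RStudio

This is my RStudio window:

Create a Project for R package development

Run: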

usethis::create_package("~/Desktop/ncov")

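This creates an R package project at ~/Desktop/ncov. Once it has been created, the project opens automatically:

Note the top-right corner:

Inspect the created project folder

We can find this folder on the desktop:

The directory structure looks like this: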

$ tree
.
├── DESCRIPTION
├── NAMESPACE
├── R
└── ncov.Rproj

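In the steps that follow we only need to pay attention to two of these: the DESCRIPTION file and the R folder.

The R folder is where the R script files will go. We will start by writing a Hello.R and putting it there.

Before that, create the package-level documentation file: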

usethis::use_package_doc()

Use the following statement to create and open Hello.R inside the R folder:

usethis::edit_file("R/Hello.R")

Contents of Hello.R (the lines starting with #' will later be turned into the help documentation, so do not omit them):

#' Hello World
#' @description Print messages
#' @param message message
#' @examples
#' hello("Hello")
hello <- function(message){
  print(ifelse(message != "", message, "No message!"))
}

Run devtools::document() to generate the function documentation (afterwards you can view it with ?hello):

devtools::document()
#> Updating ncov documentation
#> Updating roxygen version in /Users/czx/Desktop/ncov/DESCRIPTION
#> Writing NAMESPACE
#> Loading ncov
#> Writing hello.Rd
#> Writing ncov-package.Rd

Quickly install the package:

devtools::install(quick = TRUE)
#> Running /Library/Frameworks/R.framework/Resources/bin/R CMD \
#>   INSTALL /Users/czx/Desktop/ncov --install-tests --no-docs \
#>   --no-multiarch --no-demo
#> * installing to library ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library’
#> * installing *source* package ‘ncov’ ...
#> ** using staged installation
#> ** R
#> ** byte-compile and prepare package for lazy loading
#> ** help
#> *** installing help indices
#> ** building package indices
#> ** testing if installed package can be loaded from temporary location
#> ** testing if installed package can be loaded from final location
#> ** testing if installed package keeps a record of temporary installation path
#> * DONE (ncov)

The output shows that the installation succeeded. Now let's use the function in this package:

devtools::load_all(".")
hello("Hello")
#> [1] "Hello"

hello()
#> Error in ifelse(message != "", message, "No message!") :
#> 	argument "message" is missing, with no default

hello("")
#> [1] "No message!"
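
The error from hello() above occurs because the message argument has no default value. As a minimal sketch (hello_default is a hypothetical name used only for illustration, not part of the package in this article), giving the argument a default makes the empty call fall through to the "No message!" branch:

```r
# Hypothetical variant of hello(); not part of the original package.
hello_default <- function(message = "") {
  # nzchar() is TRUE for a non-empty string, so an empty or
  # missing message falls back to the placeholder text.
  out <- if (nzchar(message)) message else "No message!"
  print(out)
}

hello_default()         # no argument: uses the default ""
hello_default("Hello")  # prints the supplied message
```

A plain if/else is used here instead of ifelse() because message is expected to be a single string rather than a vector.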

With that, the simplest possible R package is written. Build the source package:

devtools::build()
#> ✓  checking for file ‘/Users/czx/Desktop/ncov/DESCRIPTION’ ...
#> ─  preparing ‘ncov’:
#> ✓  checking DESCRIPTION meta-information ...
#> ─  checking for LF line-endings in source and make files and shell scripts
#> ─  checking for empty or unneeded directories
#> ─  building ‘ncov_0.0.0.9000.tar.gz’
#>
#> [1] "/Users/czx/Desktop/ncov_0.0.0.9000.tar.gz"

The resulting ncov_0.0.0.9000.tar.gz is the source package. You can share this file with others, who can install it like this:

install.packages('~/Desktop/ncov_0.0.0.9000.tar.gz', repos = NULL, type = "source")

The first step toward putting your R package on GitHub is to place it under Git version control:

usethis::use_git()
#> ✔ Initialising Git repo
#> ✔ Adding '.Rhistory', '.RData' to '.gitignore'
#> There are 8 uncommitted files:
#> * '.DS_Store'
#> * '.gitignore'
#> * '.Rbuildignore'
#> * 'DESCRIPTION'
#> * 'man/'
#> * 'NAMESPACE'
#> * 'ncov.Rproj'
#> * 'R/'
#> Is it ok to commit them?
#>
#> 1: Absolutely
#> 2: Absolutely not
#> 3: Not now
#>
#> Selection: 1
#> ✔ Adding files
#> ✔ Commit with message 'Initial commit'
#> ● A restart of RStudio is required to activate the Git pane
#> Restart now?
#>
#> 1: Nope
#> 2: I agree
#> 3: No way
#>
#> Selection: 2

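Choosing 2 at the last prompt restarts the R session. When RStudio reopens, its top-right panel looks like this:

There is now a Git tab, and the Git operations can be carried out from it. I will not cover that part here, since many readers may not have used GitHub yet, so let's get back to writing the package.

First confirm that the code for fetching the epidemic data still works

We use the API provided by Sina News, which we have found to be fairly stable. I have written about scraping this API before: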

library(jsonlite)
library(tidyverse)
jsondata <- fromJSON('https://interface.sina.cn/news/wap/fymap2020_data.d.json')
# Current timestamp
(times <- jsondata$data$times)
#> [1] "截至2月7日12时01分"

# Number of confirmed cases
(confirm <- jsondata$data$gntotal)
#> [1] "31211"

# Number of deaths
(dead <- jsondata$data$deathtotal)
#> [1] "637"

# Number of suspected cases
(suspect <- jsondata$data$sustotal)
#> [1] "26359"

# Number of recovered cases
(cure <- jsondata$data$curetotal)
#> [1] "1542"

# Distribution by province
prov_distribution <- jsondata$data$list %>%
  as_tibble()
prov_distribution

#> # A tibble: 34 x 7
#>    name  ename     value susNum deathNum cureNum city
#>    <chr> <chr>     <chr> <chr>  <chr>    <chr>   <list>
#>  1 北京  beijing   297   0      1        33      <df[,6] [15 × 6]>
#>  2 湖北  hubei     22112 0      618      819     <df[,6] [17 × 6]>
#>  3 广东  guangdong 1018  170    1        71      <df[,6] [20 × 6]>
#>  4 浙江  zhejiang  1006  0      0        99      <df[,6] [11 × 6]>
#>  5 河南  henan     914   0      3        70      <df[,6] [18 × 6]>
#>  6 湖南  hunan     772   0      0        101     <df[,6] [14 × 6]>
#>  7 重庆  chongqing 411   0      2        24      <df[,6] [39 × 6]>
#>  8 安徽  anhui     665   0      0        35      <df[,6] [16 × 6]>
#>  9 四川  sichuan   344   0      1        38      <df[,6] [21 × 6]>
#> 10 山东  shandong  379   0      0        31      <df[,6] [15 × 6]>
#> # … with 24 more rows

# Distribution by city
city_distribution <- prov_distribution %>%
  select(city) %>%
  unnest(city) %>%
  type_convert()
city_distribution

#> # A tibble: 409 x 6
#>    name     conNum susNum cureNum deathNum mapName
#>    <chr>     <dbl>  <dbl>   <dbl>    <dbl> <chr>
#>  1 海淀区       47      0       0        0 海淀区
#>  2 怀柔区        7      0       0        0 怀柔区
#>  3 延庆区        1      0       0        0 延庆区
#>  4 丰台区       22      0       3        0 丰台区
#>  5 大兴区       34      0       2        0 大兴区
#>  6 东城区        7      0       0        0 东城区
#>  7 昌平区       17      0       0        0 昌平区
#>  8 西城区       36      0       0        0 西城区
#>  9 朝阳区       51      0       0        0 朝阳区
#> 10 石景山区     13      0       0        0 石景山区
#> # … with 399 more rows

# Distribution outside China
othercountry <- jsondata$data$otherlist %>%
  as_tibble()
othercountry

#> # A tibble: 31 x 5
#>    name     value susNum deathNum cureNum
#>    <chr>    <chr> <chr>  <chr>    <chr>
#>  1 日本     86    0      0        1
#>  2 澳大利亚 15    0      0        3
#>  3 韩国     24    0      0        1
#>  4 美国     12    0      0        1
#>  5 马来西亚 14    4      0        1
#>  6 德国     13    3      0        0
#>  7 比利时   1     0      0        0
#>  8 西班牙   1     0      0        0
#>  9 俄罗斯   2     0      0        0
#> 10 柬埔寨   1     0      0        0
#> # … with 21 more rows
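
Note that the totals and the province table above come back as character columns, and that the city column is a list of data frames. The following is a minimal sketch of that shape using a hand-written JSON string (sample_json and all values in it are made up for illustration and only mimic a small part of the real response), with base R as.integer() used as a simple alternative to type_convert():

```r
library(jsonlite)

# Hypothetical sample; it only imitates the nesting of the real API response.
sample_json <- '{
  "data": {
    "gntotal": "31211",
    "list": [
      {"name": "A", "value": "297",
       "city": [{"name": "a1", "conNum": "47"}]},
      {"name": "B", "value": "22112",
       "city": [{"name": "b1", "conNum": "10"},
                {"name": "b2", "conNum": "5"}]}
    ]
  }
}'

parsed <- fromJSON(sample_json)

# Counts are strings in the JSON, so convert them before doing arithmetic.
confirm <- as.integer(parsed$data$gntotal)

# fromJSON() simplifies the array of records into a data frame;
# the nested "city" arrays become a list column of data frames.
prov <- parsed$data$list
prov$value <- as.integer(prov$value)

# Stack the per-province city tables into one data frame.
cities <- do.call(rbind, prov$city)
cities$conNum <- as.integer(cities$conNum)

sum(prov$value)
nrow(cities)
```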

It looks like everything still works. Next, let's wrap these operations into the R package.

First, create an R6.R file, because I want to wrap these operations in an R6 class:

usethis::edit_file('R/R6.R')

Then put the following into this file:

#' Get 2019nCov data
#' @description Get 2019nCov data
#' @import R6
#' @importFrom jsonlite fromJSON
#' @importFrom tibble as_tibble
#' @importFrom readr type_convert
#' @importFrom dplyr select
#' @importFrom tidyr unnest
#' @import magrittr
#' @field jsondata raw data
#' @field times current time
#' @field confirm confirmed num
#' @field dead dead num
#' @field suspect suspected num
#' @field cure cured num
#' @field prov_distribution prov distribution
#' @field city_distribution city distribution
#' @field othercountry other country distribution
#' @export

ncov <- R6::R6Class(
  "ncov",
  public = list(
    jsondata = NULL,
    times = NULL,
    confirm = NULL,
    dead = NULL,
    suspect = NULL,
    cure = NULL,
    prov_distribution = NULL,
    city_distribution = NULL,
    othercountry = NULL,
    #' @details
    #' Initialise a ncov object
    initialize = function(){
      jsondata <- fromJSON('https://interface.sina.cn/news/wap/fymap2020_data.d.json')
      self$jsondata <- jsondata
      self$times = jsondata$data$times
      self$confirm = jsondata$data$gntotal
      self$dead = self$jsondata$data$deathtotal
      self$suspect = self$jsondata$data$sustotal
      self$cure = self$jsondata$data$curetotal
      self$prov_distribution = suppressMessages(
        self$jsondata$data$list %>%
          as_tibble() %>%
          type_convert()
      )
      self$city_distribution = suppressMessages(
        self$jsondata$data$list %>%
          as_tibble() %>%
          select(city) %>%
          unnest(city) %>%
          type_convert()
      )
      self$othercountry = as_tibble(self$jsondata$data$otherlist)
    },
    #' @details
    #' Prov distribution
    #' @param ... params
    plot = function(...){
      hchinamap::hchinamap(
        name = self$prov_distribution$name,
        value = self$prov_distribution$value,
        ...
      )
    }
  )
)

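R6 is very simple to use. Here I declare only public members, and the initialize method assigns values to all of them. At the end of the class I add a plot() method, which calls the hchinamap() function from the hchinamap package to draw the map.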
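
To make the pattern easier to see in isolation, here is a stripped-down sketch of the same idea: public fields declared as NULL, filled in by initialize, plus one method that reads them. The Snapshot class, its summary_line() method, and the input list are all hypothetical and not part of the ncov package. Unlike the class above, this sketch takes the data as an argument instead of downloading it, which is only one possible design and makes the class usable without a network connection:

```r
library(R6)

# Hypothetical class for illustration only.
Snapshot <- R6Class(
  "Snapshot",
  public = list(
    times = NULL,
    confirm = NULL,
    # initialize() runs when Snapshot$new() is called and
    # assigns values to the public fields through self.
    initialize = function(data) {
      self$times <- data$times
      self$confirm <- as.integer(data$gntotal)
    },
    # A method can read the fields through self as well.
    summary_line = function() {
      paste0(self$times, ": ", self$confirm, " confirmed")
    }
  )
)

snap <- Snapshot$new(list(times = "sample time", gntotal = "31211"))
snap$summary_line()
#> [1] "sample time: 31211 confirmed"
```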

Next, run:

devtools::document()

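This generates the package documentation automatically.

Then run the following code to install the package:

devtools::install()

Now let's try the package out: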

library(ncov)
# Create a variable df holding an object of class ncov
df <- ncov$new()

# df now holds all the information we need, for example the distribution by province:
df$prov_distribution
#> # A tibble: 34 x 7
#>    name  ename     value susNum deathNum cureNum city
#>    <chr> <chr>     <dbl>  <dbl>    <dbl>   <dbl> <list>
#>  1 北京  beijing     297      0        1      33 <df[,6] [15 × 6]>
#>  2 湖北  hubei     22112      0      618     819 <df[,6] [17 × 6]>
#>  3 广东  guangdong  1018    170        1      71 <df[,6] [20 × 6]>
#>  4 浙江  zhejiang   1006      0        0     101 <df[,6] [11 × 6]>
#>  5 河南  henan       914      0        3      70 <df[,6] [18 × 6]>
#>  6 湖南  hunan       772      0        0     104 <df[,6] [14 × 6]>
#>  7 重庆  chongqing   411      0        2      24 <df[,6] [39 × 6]>
#>  8 安徽  anhui       665      0        0      40 <df[,6] [16 × 6]>
#>  9 四川  sichuan     344      0        1      40 <df[,6] [21 × 6]>
#> 10 山东  shandong    379      0        0      31 <df[,6] [15 × 6]>
#> # … with 24 more rows

# Call the plot() method on df
plot(df,
	itermName = "确诊人数",
	title = "新型冠状病毒肺炎确诊人数的分布",
	subtitle = "TidyFriday Project",
	theme = "sandsignika")

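Because the plot() method calls hchinamap(), you can pass any of hchinamap()'s arguments through it (except region, since the data I pass in here covers only the distribution by province).

Write the ncov package's `DESCRIPTION` file

Use the following command to write the packages that ncov depends on into the Imports field of the DESCRIPTION file: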

for(pkg in c("R6", "jsonlite", "tibble", "readr", "dplyr", "tidyr", "magrittr", "hchinamap")){
	usethis::use_package(pkg)
}

Then fill in your own details and some information about the package in DESCRIPTION:

Package: ncov
Title: Get and Plot 2019 nCov Data
Version: 0.0.0.9000
Authors@R:
    person(given = "Zhenxing",
           family = "Cheng",
           role = c("aut", "cre"),
           email = "czxjnu@163.com")
Description: Get and Plot 2019 nCov Data.
Encoding: UTF-8
License: MIT + file LICENSE
Date: 2020-02-07
LazyData: true
RoxygenNote: 7.0.2
Imports:
    R6,
    jsonlite,
    tibble,
    readr,
    dplyr,
    tidyr,
    magrittr,
    hchinamap
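
As a quick sanity check, a DESCRIPTION file can be read back with base R's read.dcf(), which parses this "Field: value" format. The sketch below writes a shortened, hypothetical DESCRIPTION to a temporary file (so it does not touch the real project) and extracts the Imports field as a character vector:

```r
# Write a shortened, hypothetical DESCRIPTION to a temporary file.
desc_path <- tempfile()
writeLines(c(
  "Package: ncov",
  "Version: 0.0.0.9000",
  "License: MIT + file LICENSE",
  "Imports:",
  "    R6,",
  "    jsonlite"
), desc_path)

# read.dcf() returns a character matrix with one column per field.
desc <- read.dcf(desc_path)

# Split the comma-separated Imports field and trim surrounding whitespace.
imports <- unname(trimws(strsplit(desc[1, "Imports"], ",")[[1]]))
imports
```

To check the real package instead, point read.dcf() at the DESCRIPTION file in the project root.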

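Note that if you want to publish your R package on CRAN, the Description field should be a paragraph made up of several sentences, and the Title field should be written in title case (capitalize the words that should be capitalized).

The License field declares the package's license. Here I use the MIT license plus a LICENSE file, and the contents of that file are: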

YEAR: 2020
COPYRIGHT HOLDER: Zhenxing Cheng

This form of declaration meets CRAN's requirements.

Finally, let's take another look at the structure of the package project:

$ tree
.
├── DESCRIPTION
├── LICENSE
├── NAMESPACE
├── R
│   ├── Hello.R
│   ├── R6.R
│   └── ncov-package.R
├── man
│   ├── hello.Rd
│   ├── ncov-package.Rd
│   └── ncov.Rd
└── ncov.Rproj

Build the package or upload it to GitHub, and you can share it with others!