TidyFriday: 5 Minutes a Day to Get Started with R (Part 5)

王诗翔呀
Published 2020-07-06 17:48:28
Column: 优雅R

Today we again use the msleep dataset to explore column selection with dplyr, and finish with a few row-selection examples.

Selecting a range of columns

  • Select every column from one column through another
msleep %>%
  select(name:order)
# A tibble: 83 x 4
#   name                       genus       vore  order
#   <chr>                      <chr>       <chr> <chr>
# 1 Cheetah                    Acinonyx    carni Carnivora
# 2 Owl monkey                 Aotus       omni  Primates
# 3 Mountain beaver            Aplodontia  herbi Rodentia
# 4 Greater short-tailed shrew Blarina     omni  Soricomorpha
# 5 Cow                        Bos         herbi Artiodactyla
# 6 Three-toed sloth           Bradypus    herbi Pilosa
# 7 Northern fur seal          Callorhinus carni Carnivora
# 8 Vesper mouse               Calomys     NA    Rodentia
# 9 Dog                        Canis       carni Carnivora
#10 Roe deer                   Capreolus   herbi Artiodactyla
# … with 73 more rows
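
As a small sketch of what the name:order range means, the snippet below compares it with selecting the same columns by position and by listing them one at a time. It assumes msleep is taken from the ggplot2 package; the variable names by_range, by_position and by_name are only illustrative.

```r
library(dplyr)

# msleep ships with ggplot2; pull it in without attaching the whole package.
msleep <- ggplot2::msleep

by_range    <- msleep %>% select(name:order)                # range of names
by_position <- msleep %>% select(1:4)                       # same range by position
by_name     <- msleep %>% select(name, genus, vore, order)  # spelled out

# All three forms should describe the same four columns.
identical(by_range, by_position)
identical(by_range, by_name)
```

The range form is usually the most readable, but it depends on the column order of the data frame, so listing names explicitly is safer when that order may change.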

  • Drop every column from one column through another

Drop the columns from sleep_total through awake:

msleep %>% select(-(sleep_total:awake))
# A tibble: 83 x 7
#   name                  genus     vore  order      conservation  brainwt  bodywt
#   <chr>                 <chr>     <chr> <chr>      <chr>           <dbl>   <dbl>
# 1 Cheetah               Acinonyx  carni Carnivora  lc           NA        50
# 2 Owl monkey            Aotus     omni  Primates   NA            0.0155    0.48
# 3 Mountain beaver       Aplodont… herbi Rodentia   nt           NA         1.35
# 4 Greater short-tailed… Blarina   omni  Soricomor… lc            0.00029   0.019
# 5 Cow                   Bos       herbi Artiodact… domesticated  0.423   600
# 6 Three-toed sloth      Bradypus  herbi Pilosa     NA           NA         3.85
# 7 Northern fur seal     Callorhi… carni Carnivora  vu           NA        20.5
# 8 Vesper mouse          Calomys   NA    Rodentia   NA           NA         0.045
# 9 Dog                   Canis     carni Carnivora  domesticated  0.07     14
#10 Roe deer              Capreolus herbi Artiodact… lc            0.0982   14.8
# … with 73 more rows
  • Drop the columns from sleep_total through awake, but keep sleep_rem. Because sleep_rem is added back after the removal, it ends up as the last column.
msleep %>% select(-(sleep_total:awake), sleep_rem)
# A tibble: 83 x 8
#   name            genus   vore  order    conservation  brainwt  bodywt sleep_rem
#   <chr>           <chr>   <chr> <chr>    <chr>           <dbl>   <dbl>     <dbl>
# 1 Cheetah         Acinon… carni Carnivo… lc           NA        50          NA
# 2 Owl monkey      Aotus   omni  Primates NA            0.0155    0.48        1.8
# 3 Mountain beaver Aplodo… herbi Rodentia nt           NA         1.35        2.4
# 4 Greater short-… Blarina omni  Soricom… lc            0.00029   0.019       2.3
# 5 Cow             Bos     herbi Artioda… domesticated  0.423   600           0.7
# 6 Three-toed slo… Bradyp… herbi Pilosa   NA           NA         3.85        2.2
# 7 Northern fur s… Callor… carni Carnivo… vu           NA        20.5         1.4
# 8 Vesper mouse    Calomys NA    Rodentia NA           NA         0.045      NA
# 9 Dog             Canis   carni Carnivo… domesticated  0.07     14           2.9
#10 Roe deer        Capreo… herbi Artioda… lc            0.0982   14.8        NA
# … with 73 more rows
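
The following sketch makes the column order in that last result explicit, and shows one possible alternative that leaves sleep_rem where it originally sat. It assumes msleep comes from ggplot2; readded and in_place are illustrative names, and the second form is a suggestion, not something the original article uses.

```r
library(dplyr)
msleep <- ggplot2::msleep

# Drop the whole range, then add sleep_rem back: it is appended at the end.
readded <- msleep %>% select(-(sleep_total:awake), sleep_rem)
names(readded)

# Alternative: drop only the unwanted columns, so sleep_rem keeps
# its original position between conservation and brainwt.
in_place <- msleep %>% select(-c(sleep_total, sleep_cycle, awake))
names(in_place)
```

Both results contain the same eight columns; only the position of sleep_rem differs.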

Selecting by pattern matching

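❝select() syntax: select(data, ...), where data is a data frame and ... is one or more variable names or selection helper functions.❞

The examples so far mostly used variable names. Now let's look at a few that use helper functions.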

  • Select the columns whose names start with sleep
msleep %>% select(name, starts_with('sleep'))
# A tibble: 83 x 4
#   name                       sleep_total sleep_rem sleep_cycle
#   <chr>                            <dbl>     <dbl>       <dbl>
# 1 Cheetah                           12.1      NA        NA
# 2 Owl monkey                        17         1.8      NA
# 3 Mountain beaver                   14.4       2.4      NA
# 4 Greater short-tailed shrew        14.9       2.3       0.133
# 5 Cow                                4         0.7       0.667
# 6 Three-toed sloth                  14.4       2.2       0.767
# 7 Northern fur seal                  8.7       1.4       0.383
# 8 Vesper mouse                       7        NA        NA
# 9 Dog                               10.1       2.9       0.333
#10 Roe deer                           3        NA        NA
# … with 73 more rows

Similar helper functions include:

Function         Description
starts_with()    Starts with a prefix
ends_with()      Ends with a suffix
contains()       Contains a literal string
matches()        Matches a regular expression
num_range()      Numerical range like x01, x02, x03
one_of()         Variables in a character vector
everything()     All variables
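
A few of the helpers listed above are not demonstrated elsewhere in this post, so here is a minimal sketch of ends_with(), one_of() and num_range(). It assumes msleep comes from ggplot2. Since msleep has no numbered columns, num_range() is shown on a hypothetical toy tibble; toy, wanted and the other variable names are made up for illustration. Newer dplyr documentation recommends all_of() or any_of() in place of one_of(), which is kept here because it is the helper named in the list.

```r
library(dplyr)
msleep <- ggplot2::msleep

# ends_with(): columns whose names end in "wt" (brainwt and bodywt)
weights <- msleep %>% select(name, ends_with("wt"))
names(weights)

# one_of(): columns named in a character vector
wanted <- c("name", "vore", "awake")
picked <- msleep %>% select(one_of(wanted))
names(picked)

# num_range(): hypothetical toy data with zero-padded numbered columns
toy <- tibble(x01 = 1, x02 = 2, x03 = 3, y = 4)
first_two <- toy %>% select(num_range("x", 1:2, width = 2))
names(first_two)
```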

Let's look at a few more examples.

Select the columns whose names match the regular expression o.+er, where . stands for any character and + means one or more:

msleep %>% select(matches('o.+er'))
# A tibble: 83 x 2
#   order        conservation
#   <chr>        <chr>
# 1 Carnivora    lc
# 2 Primates     NA
# 3 Rodentia     nt
# 4 Soricomorpha lc
# 5 Artiodactyla domesticated
# 6 Pilosa       NA
# 7 Carnivora    vu
# 8 Rodentia     NA
# 9 Carnivora    domesticated
#10 Artiodactyla lc
# … with 73 more rows
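
To see why exactly these two columns are selected, the pattern can be tested against the column names directly with base R. This is only a sketch of the idea, assuming msleep comes from ggplot2; cols, hits and selected are illustrative names.

```r
library(dplyr)
msleep <- ggplot2::msleep

cols <- names(msleep)

# matches() ignores case by default, so mirror that with ignore.case = TRUE.
hits <- cols[grepl("o.+er", cols, ignore.case = TRUE)]
hits

# The helper should pick the same column names.
selected <- msleep %>% select(matches("o.+er"))
identical(names(selected), hits)
```

In "order" the pattern matches o, then "rd", then er; in "conservation" it matches o, then "ns", then er. A name like "vore" does not match because nothing follows the o except "re".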
  • Select the columns whose names contain the string serv
msleep %>% select(contains('serv'))
# A tibble: 83 x 1
#   conservation
#   <chr>
# 1 lc
# 2 NA
# 3 nt
# 4 lc
# 5 domesticated
# 6 NA
# 7 vu
# 8 NA
# 9 domesticated
#10 lc
# … with 73 more rows
  • Select all columns and reorder them

Move the awake column to the first position:

msleep %>% select(awake, everything())
# A tibble: 83 x 11
#   awake name  genus vore  order conservation sleep_total sleep_rem sleep_cycle
#   <dbl> <chr> <chr> <chr> <chr> <chr>              <dbl>     <dbl>       <dbl>
# 1  11.9 Chee… Acin… carni Carn… lc                  12.1      NA        NA
# 2   7   Owl … Aotus omni  Prim… NA                  17         1.8      NA
# 3   9.6 Moun… Aplo… herbi Rode… nt                  14.4       2.4      NA
# 4   9.1 Grea… Blar… omni  Sori… lc                  14.9       2.3       0.133
# 5  20   Cow   Bos   herbi Arti… domesticated         4         0.7       0.667
# 6   9.6 Thre… Brad… herbi Pilo… NA                  14.4       2.2       0.767
# 7  15.3 Nort… Call… carni Carn… vu                   8.7       1.4       0.383
# 8  17   Vesp… Calo… NA    Rode… NA                   7        NA        NA
# 9  13.9 Dog   Canis carni Carn… domesticated        10.1       2.9       0.333
#10  21   Roe … Capr… herbi Arti… lc                   3        NA        NA
# … with 73 more rows, and 2 more variables: brainwt <dbl>, bodywt <dbl>
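
The same pattern extends to moving several columns to the front. The sketch below also compares it with relocate(), which is an assumption on my part: it exists only in dplyr 1.0.0 and later and is not used in the original article. msleep is assumed to come from ggplot2, and the variable names are illustrative.

```r
library(dplyr)
msleep <- ggplot2::msleep

via_select   <- msleep %>% select(awake, everything())
via_relocate <- msleep %>% relocate(awake)   # needs dplyr >= 1.0.0

# Both approaches should give the same column order.
identical(names(via_select), names(via_relocate))

# Move two columns to the front; everything() fills in the rest once each.
front_two <- msleep %>% select(awake, sleep_total, everything())
names(front_two)
```

everything() never duplicates a column that was already named, which is why awake appears only once in the result.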

  • Select the numeric columns
msleep %>%
  select_if(is.numeric) %>%
  glimpse()
Observations: 83
Variables: 6
$ sleep_total <dbl> 12.1, 17.0, 14.4, 14.9, 4.0, 14.4, 8.7, 7.0, 10.1, 3.0, 5.…
$ sleep_rem   <dbl> NA, 1.8, 2.4, 2.3, 0.7, 2.2, 1.4, NA, 2.9, NA, 0.6, 0.8, 0…
$ sleep_cycle <dbl> NA, NA, NA, 0.1333333, 0.6666667, 0.7666667, 0.3833333, NA…
$ awake       <dbl> 11.9, 7.0, 9.6, 9.1, 20.0, 9.6, 15.3, 17.0, 13.9, 21.0, 18…
$ brainwt     <dbl> NA, 0.01550, NA, 0.00029, 0.42300, NA, NA, NA, 0.07000, 0.…
$ bodywt      <dbl> 50.000, 0.480, 1.350, 0.019, 600.000, 3.850, 20.490, 0.045…

Similar predicates include is.character and is.factor.
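
As a brief sketch of the other predicates, the snippet below selects the character columns with is.character. It also shows select(where(...)), which is an assumption about the reader's setup: it requires dplyr 1.0.0 or later and is not used in the original article. msleep is assumed to come from ggplot2.

```r
library(dplyr)
msleep <- ggplot2::msleep

numeric_old <- msleep %>% select_if(is.numeric)
numeric_new <- msleep %>% select(where(is.numeric))   # assumes dplyr >= 1.0.0

# Both forms should pick the same six numeric columns.
identical(names(numeric_old), names(numeric_new))

# Same idea with a different predicate: keep only the character columns.
text_cols <- msleep %>% select_if(is.character)
names(text_cols)
```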

A few row-selection examples

  • Randomly sample 5 rows
msleep %>% sample_n(5)
# A tibble: 5 x 11
#   name  genus vore  order conservation sleep_total sleep_rem sleep_cycle awake
#   <chr> <chr> <chr> <chr> <chr>              <dbl>     <dbl>       <dbl> <dbl>
# 1 Star… Cond… omni  Sori… lc                  10.3       2.2      NA      13.7
# 2 Donk… Equus herbi Peri… domesticated         3.1       0.4      NA      20.9
# 3 Musk… Sunc… NA    Sori… NA                  12.8       2         0.183  11.2
# 4 Pig   Sus   omni  Arti… domesticated         9.1       2.4       0.5    14.9
# 5 Hous… Mus   herbi Rode… nt                  12.5       1.4       0.183  11.5
# … with 2 more variables: brainwt <dbl>, bodywt <dbl>
  • Randomly sample 10% of the rows
msleep %>% sample_frac(0.1)
# A tibble: 8 x 11
#   name  genus vore  order conservation sleep_total sleep_rem sleep_cycle awake
#   <chr> <chr> <chr> <chr> <chr>              <dbl>     <dbl>       <dbl> <dbl>
# 1 Big … Epte… inse… Chir… lc                  19.7       3.9       0.117   4.3
# 2 East… Tami… herbi Rode… NA                  15.8      NA        NA       8.2
# 3 Braz… Tapi… herbi Peri… vu                   4.4       1         0.9    19.6
# 4 Pilo… Glob… carni Ceta… cd                   2.7       0.1      NA      21.4
# 5 Musk… Sunc… NA    Sori… NA                  12.8       2         0.183  11.2
# 6 Chim… Pan   omni  Prim… NA                   9.7       1.4       1.42   14.3
# 7 Slow… Nyct… carni Prim… NA                  11        NA        NA      13
# 8 Red … Vulp… carni Carn… NA                   9.8       2.4       0.35   14.2
# … with 2 more variables: brainwt <dbl>, bodywt <dbl>
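
Because both functions draw rows at random, the rows shown above will differ from run to run. A minimal sketch of making a draw repeatable with set.seed() follows; the seed value 2020 is arbitrary, msleep is assumed to come from ggplot2, and the variable names are illustrative.

```r
library(dplyr)
msleep <- ggplot2::msleep

set.seed(2020)                      # arbitrary seed, chosen for illustration
first_draw <- msleep %>% sample_n(5)

set.seed(2020)                      # same seed, so the same rows are drawn
second_draw <- msleep %>% sample_n(5)
identical(first_draw, second_draw)

# 10% of 83 rows is 8.3, which sample_frac() rounds to 8 rows.
tenth <- msleep %>% sample_frac(0.1)
nrow(tenth)
```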
  • Remove duplicated observations

msleep has no fully duplicated rows, so all 83 rows are returned.

msleep %>% distinct()
# A tibble: 83 x 11
#   name  genus vore  order conservation sleep_total sleep_rem sleep_cycle awake
#   <chr> <chr> <chr> <chr> <chr>              <dbl>     <dbl>       <dbl> <dbl>
# 1 Chee… Acin… carni Carn… lc                  12.1      NA        NA      11.9
# 2 Owl … Aotus omni  Prim… NA                  17         1.8      NA       7
# 3 Moun… Aplo… herbi Rode… nt                  14.4       2.4      NA       9.6
# 4 Grea… Blar… omni  Sori… lc                  14.9       2.3       0.133   9.1
# 5 Cow   Bos   herbi Arti… domesticated         4         0.7       0.667  20
# 6 Thre… Brad… herbi Pilo… NA                  14.4       2.2       0.767   9.6
# 7 Nort… Call… carni Carn… vu                   8.7       1.4       0.383  15.3
# 8 Vesp… Calo… NA    Rode… NA                   7        NA        NA      17
# 9 Dog   Canis carni Carn… domesticated        10.1       2.9       0.333  13.9
#10 Roe … Capr… herbi Arti… lc                   3        NA        NA      21
# … with 73 more rows, and 2 more variables: brainwt <dbl>, bodywt <dbl>

  • Remove observations with a duplicated sleep_total

Setting .keep_all = TRUE keeps all the other variables.

msleep %>% distinct(sleep_total, .keep_all = TRUE)
# A tibble: 65 x 11
#   name  genus vore  order conservation sleep_total sleep_rem sleep_cycle awake
#   <chr> <chr> <chr> <chr> <chr>              <dbl>     <dbl>       <dbl> <dbl>
# 1 Chee… Acin… carni Carn… lc                  12.1      NA        NA      11.9
# 2 Owl … Aotus omni  Prim… NA                  17         1.8      NA       7
# 3 Moun… Aplo… herbi Rode… nt                  14.4       2.4      NA       9.6
# 4 Grea… Blar… omni  Sori… lc                  14.9       2.3       0.133   9.1
# 5 Cow   Bos   herbi Arti… domesticated         4         0.7       0.667  20
# 6 Nort… Call… carni Carn… vu                   8.7       1.4       0.383  15.3
# 7 Vesp… Calo… NA    Rode… NA                   7        NA        NA      17
# 8 Dog   Canis carni Carn… domesticated        10.1       2.9       0.333  13.9
# 9 Roe … Capr… herbi Arti… lc                   3        NA        NA      21
#10 Goat  Capri herbi Arti… lc                   5.3       0.6      NA      18.7
# … with 55 more rows, and 2 more variables: brainwt <dbl>, bodywt <dbl>
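
To illustrate what .keep_all changes, this sketch runs distinct() on sleep_total with and without it, and checks that the row kept for each value is its first occurrence. msleep is assumed to come from ggplot2; only_key, with_all and first_rows are illustrative names.

```r
library(dplyr)
msleep <- ggplot2::msleep

only_key <- msleep %>% distinct(sleep_total)                     # key column only
with_all <- msleep %>% distinct(sleep_total, .keep_all = TRUE)   # all columns

ncol(only_key)
ncol(with_all)

# One row survives per distinct sleep_total value.
nrow(with_all) == n_distinct(msleep$sleep_total)

# The surviving row is the first occurrence of each value.
first_rows <- msleep %>% filter(!duplicated(sleep_total))
identical(with_all$name, first_rows$name)
```

Without .keep_all the result has a single column, which is handy for listing the distinct values but loses the rest of each observation.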

Originally published on 2020-02-18; shared from the 优雅R WeChat official account.
