
R Excel Work

Original
Author: 大发明家
Published 2021-12-06 13:57:49
Included in the column: 技术博客文章
install.packages('tidyverse')

The tidyverse package bundles several packages that are useful in data analysis, such as ggplot2, dplyr, and tidyr. This article uses dplyr to manipulate the data.

Work Flow

# load the tidyverse package
library(tidyverse)

filter(): filtering rows

The filter() function is used to subset a data frame, retaining all rows that satisfy your conditions. To be retained, the row must produce a value of TRUE for all conditions. Note that when a condition evaluates to NA the row will be dropped, unlike base subsetting with [.

# filter(.data, ..., .preserve = FALSE)
# using the iris data
> data(iris)
# display the first six rows of the iris data
> head(iris)
# keep the rows where Sepal.Length equals 5
> filter(iris, Sepal.Length == 5)
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1             5         3.6          1.4         0.2     setosa
2             5         3.4          1.5         0.2     setosa
3             5         3.0          1.6         0.2     setosa
4             5         3.4          1.6         0.4     setosa
5             5         3.2          1.2         0.2     setosa
6             5         3.5          1.3         0.3     setosa
7             5         3.5          1.6         0.6     setosa
8             5         3.3          1.4         0.2     setosa
9             5         2.0          3.5         1.0 versicolor
10            5         2.3          3.3         1.0 versicolor
# keep the rows where Sepal.Length equals 5 and Sepal.Width equals 3
> filter(iris, Sepal.Length == 5 & Sepal.Width == 3)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1            5           3          1.6         0.2  setosa

Useful filter functions

There are many functions and operators that are useful when constructing the expressions used to filter the data:

  • ==, >, >= etc
  • &, |, !, xor()
  • is.na()
  • between(), near()
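
As a rough illustration of the helpers listed above, the sketch below applies between(), near(), xor() and is.na() to the iris data. The object names (mid_sepal, one_only, no_missing) and the thresholds are chosen for this example only and are not part of the original article.

```r
library(dplyr)

# between(x, left, right) is shorthand for x >= left & x <= right
mid_sepal <- filter(iris, between(Sepal.Length, 5, 5.2))

# near() compares floating-point numbers with a small tolerance,
# which is safer than == after arithmetic
sqrt(2)^2 == 2        # FALSE because of rounding error
near(sqrt(2)^2, 2)    # TRUE

# xor() keeps rows where exactly one of the two conditions holds
one_only <- filter(iris, xor(Sepal.Length > 7, Petal.Width > 2.2))

# !is.na() keeps only rows with a non-missing value in the column
no_missing <- filter(iris, !is.na(Sepal.Length))
```

iris has no missing values, so the last call returns all 150 rows; it is included only to show the pattern.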

Note:

filter() drops every row for which the condition evaluates to NA. To keep those rows, add an explicit is.na() condition for the column in question.

# copy iris and set the first Sepal.Length value to NA
> flower <- iris
> flower[1,1] <- NA
# keep rows where Sepal.Length is missing or equals 5
> filter(flower, is.na(Sepal.Length) | Sepal.Length == 5)
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1            NA         3.5          1.4         0.2     setosa
2             5         3.6          1.4         0.2     setosa
3             5         3.4          1.5         0.2     setosa
4             5         3.0          1.6         0.2     setosa
5             5         3.4          1.6         0.4     setosa
6             5         3.2          1.2         0.2     setosa
7             5         3.5          1.3         0.3     setosa
8             5         3.5          1.6         0.6     setosa
9             5         3.3          1.4         0.2     setosa
10            5         2.0          3.5         1.0 versicolor
11            5         2.3          3.3         1.0 versicolor
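
To make the difference from base subsetting concrete, here is a minimal sketch that counts rows under each approach. It reuses the flower data frame built from iris; the result names (base_rows, dplyr_rows, with_na) are illustrative only.

```r
library(dplyr)

flower <- iris
flower[1, 1] <- NA   # make the first Sepal.Length missing

# base `[` turns the NA comparison into a row filled with NA
base_rows <- flower[flower$Sepal.Length == 5, ]
nrow(base_rows)      # 11: ten matches plus one all-NA row

# filter() drops the row whose condition is NA
dplyr_rows <- filter(flower, Sepal.Length == 5)
nrow(dplyr_rows)     # 10

# asking for NA explicitly brings the original row back
with_na <- filter(flower, is.na(Sepal.Length) | Sepal.Length == 5)
nrow(with_na)        # 11
```

Note that the extra row from base subsetting is entirely NA, whereas the row returned by the is.na() condition keeps its other column values.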

arrange(): sorting rows

arrange() orders the rows of a data frame by the values of selected columns. Unlike other dplyr verbs, arrange() largely ignores grouping; you need to explicitly mention grouping variables (or use .by_group = TRUE) in order to group by them, and functions of variables are evaluated once per data frame, not once per group.

# sort by the Petal.Width column and then by the Species column
> arrange(iris, Petal.Width, Species)
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1            4.9         3.1          1.5         0.1     setosa
2            4.8         3.0          1.4         0.1     setosa
3            4.3         3.0          1.1         0.1     setosa
4            5.2         4.1          1.5         0.1     setosa
5            4.9         3.6          1.4         0.1     setosa
...
47           5.4         3.4          1.5         0.4     setosa
48           5.1         3.8          1.9         0.4     setosa
49           5.1         3.3          1.7         0.5     setosa
50           5.0         3.5          1.6         0.6     setosa
51           4.9         2.4          3.3         1.0 versicolor
52           5.0         2.0          3.5         1.0 versicolor
53           6.0         2.2          4.0         1.0 versicolor
...
# Wrap a column in desc() to sort it in descending order.
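
The following sketch shows desc() and the .by_group argument in use, assuming the same iris data. The object names (widest_first, by_species, ignore_groups, within_groups) are made up for this example.

```r
library(dplyr)

# largest Petal.Width first; ties are broken by Species in ascending order
widest_first <- arrange(iris, desc(Petal.Width), Species)
head(widest_first, 3)

# arrange() ignores grouping by default: rows are sorted across all species
by_species <- group_by(iris, Species)
ignore_groups <- arrange(by_species, Sepal.Length)

# with .by_group = TRUE the grouping variable is used as the first sort key
within_groups <- arrange(by_species, Sepal.Length, .by_group = TRUE)
head(within_groups, 3)
```

With .by_group = TRUE the first 50 rows are all setosa, sorted by Sepal.Length within that species.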

select(): selecting columns

select() picks (and optionally renames) variables in a data frame, using a concise mini-language that makes it easy to refer to variables based on their name (e.g. a:f selects all columns from a on the left to f on the right). You can also use predicate functions such as is.numeric, wrapped in where(), to select variables based on their properties.

# select the Petal.Width column and the Species column
> select(iris, Petal.Width, Species)
# select every column from Petal.Width through Species
> select(iris, Petal.Width:Species)
# select every column except Petal.Width through Species
> select(iris, -c(Petal.Width:Species))

Useful selection skills

Overview of selection features

Tidyverse selections implement a dialect of R where operators make it easy to select variables:

  • : for selecting a range of consecutive variables.
  • ! for taking the complement of a set of variables.
  • & and | for selecting the intersection or the union of two sets of variables.
  • c() for combining selections.
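
A short sketch of these operators on iris follows. It also uses starts_with() and ends_with(), which are described in the helper lists further down, and the new column name species is chosen for illustration only.

```r
library(dplyr)

# `:` selects a consecutive range of columns
range_cols <- select(iris, Sepal.Length:Petal.Length)

# `!` takes the complement: everything except Species
not_species <- select(iris, !Species)

# c() combines selections
combined <- select(iris, c(Species, Petal.Width))

# name = column renames while selecting
renamed <- select(iris, species = Species, Petal.Width)

# `&` intersects two selections, `|` takes their union
petal_width <- select(iris, starts_with("Petal") & ends_with("Width"))
sepal_or_species <- select(iris, starts_with("Sepal") | Species)

names(petal_width)       # "Petal.Width"
names(sepal_or_species)  # "Sepal.Length" "Sepal.Width" "Species"
```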

In addition, you can use selection helpers. Some helpers select specific columns:

  • everything(): Matches all variables.
  • last_col(): Select last variable, possibly with an offset.

These helpers select variables by matching patterns in their names:

  • starts_with(): Starts with a prefix.
  • ends_with(): Ends with a suffix.
  • contains(): Contains a literal string.
  • matches(): Matches a regular expression.
  • num_range(): Matches a numerical range like x01, x02, x03.
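
As an illustration, the helpers above might be used like this. The scores data frame is a made-up example, needed only because iris has no numbered column names; the other object names are likewise illustrative.

```r
library(dplyr)

petal_cols <- select(iris, starts_with("Petal"))   # Petal.Length, Petal.Width
width_cols <- select(iris, ends_with("Width"))     # Sepal.Width, Petal.Width
dotted     <- select(iris, contains("."))          # the four measurement columns
length_cols <- select(iris, matches("^(Sepal|Petal)\\.L"))  # regular expression

# last_col() picks the last column; everything() picks all of them,
# which is handy for moving one column to the front
last_one      <- select(iris, last_col())
species_first <- select(iris, Species, everything())

# hypothetical data frame with numbered columns for num_range()
scores <- data.frame(id = 1:2, x01 = c(1, 2), x02 = c(3, 4), x03 = c(5, 6))
first_two <- select(scores, num_range("x", 1:2, width = 2))  # x01, x02
```

The width = 2 argument tells num_range() that the numbers are zero-padded to two digits.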

These helpers select variables from a character vector:

  • all_of(): Matches variable names in a character vector. All names must be present, otherwise an out-of-bounds error is thrown.
  • any_of(): Same as all_of(), except that no error is thrown for names that don't exist.
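
The difference between the two can be sketched as follows. The vector wanted deliberately contains Stem.Length, a hypothetical name that does not exist in iris.

```r
library(dplyr)

# Stem.Length is a made-up column name that iris does not have
wanted <- c("Species", "Petal.Width", "Stem.Length")

# any_of() silently skips names that are missing
picked <- select(iris, any_of(wanted))
names(picked)   # "Species" "Petal.Width"

# all_of() raises an error for the missing name; tryCatch() captures it here
result <- tryCatch(
  select(iris, all_of(wanted)),
  error = function(e) "error"
)
result          # "error"
```

all_of() is the stricter choice when a missing column should stop the script; any_of() suits cases where the column list is only a wish list.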

This helper selects variables with a function:

  • where(): Applies a function to all variables and selects those for which the function returns TRUE.
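
A minimal sketch of where() on iris, with illustrative object names and an arbitrary threshold of 3.5:

```r
library(dplyr)

# keep only the numeric columns
numeric_cols <- select(iris, where(is.numeric))

# keep only the factor columns
factor_cols <- select(iris, where(is.factor))

# any function returning a single TRUE or FALSE per column works,
# here: numeric columns whose mean exceeds 3.5
big_mean <- select(iris, where(function(col) is.numeric(col) && mean(col) > 3.5))

names(factor_cols)  # "Species"
names(big_mean)     # "Sepal.Length" "Petal.Length"
```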

mutate(): creating new variables

mutate() adds new variables and preserves existing ones; transmute() adds new variables and drops existing ones. New variables overwrite existing variables of the same name. Variables can be removed by setting their value to NULL.

iris_part <- mutate(iris, Sepal.Area = Sepal.Length * Sepal.Width)
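
The behaviours described above (adding, overwriting, removing, and transmute()) can be sketched as follows. The object names and the centimetre-to-millimetre conversion are illustrative choices, not part of the original article.

```r
library(dplyr)

# mutate() keeps the five existing columns and appends Sepal.Area
iris_area <- mutate(iris, Sepal.Area = Sepal.Length * Sepal.Width)
ncol(iris_area)        # 6

# a new variable with an existing name overwrites that column
iris_mm <- mutate(iris, Sepal.Length = Sepal.Length * 10)
ncol(iris_mm)          # 5

# setting a variable to NULL removes it
iris_no_species <- mutate(iris, Species = NULL)
names(iris_no_species)

# transmute() keeps only the variables it creates
areas_only <- transmute(iris, Sepal.Area = Sepal.Length * Sepal.Width)
names(areas_only)      # "Sepal.Area"
```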

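Original-content statement: this article is published in the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.

In case of infringement, contact cloudcommunity@tencent.com to request removal.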
