I have a dataset that looks like this:
Code to create the starting dataset:
dataset<-data.frame(Attorney=c("John Doe", "Client #1","274", "296",
"297", "Client #2", "633", "Jane Doe",
"Client #1", "309", "323"),
Date=c(NA, NA, "2019/4/4", "2019/4/4", "2019/4/12",
NA, " 2019/2/3", NA, NA, "2019/12/1", "2019/12/4"),
Code=c(NA, NA, "7NP/7NP", "1UE/1UE", "2C1/2C1",NA,
"7NP/7NP", NA, NA, "7NP/7NP", "7FU/7FU"),
Billed_Amount=c(NA, NA, 1200.00, 4000.00, 2775.00,
NA, 1200.00, NA, NA, 1200.00, 385),
Amount= c(NA, NA, "1200", "4000", "2775", NA, "1200",
NA, NA, "1200", "385"),
Current =c(NA, NA, 0, 0, 0, NA, 0, NA, NA, 0, 0),
X.120=c(NA, NA, "1200", "4000", "2775", NA, "1200",
NA, NA, "1200", "385"))

My goal is to end up with a dataset that looks like this:
Code to create the target dataset:
dataset<-data.frame(Attorney=c("John Doe", "John Doe", "John Doe",
"John Doe", "Jane Doe", "Jane Doe"),
Date=c("2019/4/4", "2019/4/4", "2019/4/12", " 2019/2/3",
"2019/12/1","2019/12/4" ),
Code=c("7NP/7NP", "1UE/1UE","2C1/2C1", "7NP/7NP",
"7NP/7NP", "7FU/7FU"),
Billed_Amount=c(1200.00, 4000.00,2775.00, 1200.00,
1200.00, 385),
Amount= c(1200, 4000, 2775, 1200,1200, 385),
Current= c(0, 0, 0, 0, 0, 0),
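X.120=c(1200, 4000, 2775,1200, 1200, 385))

I want to relabel each row under an attorney with that attorney's name; I don't need to keep the client names. My actual dataset has many attorneys, each with a different number of clients, and each client has a different number of codes, dates, and amounts associated with it.
I tried using if/else statements but ran into an error message.
I'd appreciate any help you can give. Thanks!
Edit: I have edited my question to include hypothetical attorney names.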
Answered 2020-03-04 07:49:27
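One option is to create a grouping variable based on whether the 'Attorney' substring is present in the 'Attorney' column. Then, after grouping by 'grp', mutate the 'Attorney' column to the first element of 'Attorney' in each group and filter out the rows that contain NA elements: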
library(dplyr)
library(stringr)
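dataset %>%
  # start a new group at every row whose Attorney value begins with "Attorney"
  group_by(grp = cumsum(str_detect(Attorney, "^Attorney"))) %>%
  # overwrite every row in the group with the group's first value
  mutate(Attorney = first(Attorney)) %>%
  # keep only rows where all of Date through X.120 are non-NA
  filter_at(vars(Date:X.120), all_vars(!is.na(.))) %>%
  ungroup %>%
  select(-grp)

We can also use na.omit here: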
dataset %>%
  group_by(grp = cumsum(str_detect(Attorney, "^Attorney"))) %>%
  mutate(Attorney = first(Attorney)) %>%
  ungroup %>%
  select(-grp) %>%
  na.omit
# A tibble: 6 x 7
#  Attorney    Date        Code    Billed_Amount Amount Current X.120
#  <fct>       <fct>       <fct>           <dbl> <fct>    <dbl> <fct>
#1 Attorney #1 "2019/4/4"  7NP/7NP          1200 1200         0 1200
#2 Attorney #1 "2019/4/4"  1UE/1UE          4000 4000         0 4000
#3 Attorney #1 "2019/4/12" 2C1/2C1          2775 2775         0 2775
#4 Attorney #1 " 2019/2/3" 7NP/7NP          1200 1200         0 1200
#5 Attorney #2 "2019/12/1" 7NP/7NP          1200 1200         0 1200
#6 Attorney #2 "2019/12/4" 7FU/7FU           385 385          0 385

Alternatively, another option is to replace the elements that do not contain the 'Attorney' substring with NA, use fill on the 'Attorney' column so that each NA is filled with the previous non-NA element, and then apply na.omit:
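library(tidyr)
dataset %>%
  # blank out every value that does not contain "Attorney"
  mutate(Attorney = replace(Attorney, !str_detect(Attorney, "Attorney"), NA)) %>%
  # carry the last attorney label down over the NAs
  fill(Attorney) %>%
  na.omit

https://stackoverflow.com/questions/60517156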
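The pipelines in this answer key on the literal text "Attorney", which matches the labels shown in the printed output ("Attorney #1", "Attorney #2"). For the edited question, where the header rows hold real names such as "John Doe", one possible adaptation is sketched below. It assumes (this is not stated in the original post) that attorney rows are exactly the rows with no Date whose label does not start with "Client". The small data frame and the `is_attorney` helper column are illustrative only.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Small stand-in for the question's data, using the names from the edited question.
dataset <- data.frame(
  Attorney = c("John Doe", "Client #1", "274", "296",
               "Jane Doe", "Client #1", "309"),
  Date     = c(NA, NA, "2019/4/4", "2019/4/4", NA, NA, "2019/12/1"),
  Amount   = c(NA, NA, 1200, 4000, NA, NA, 1200),
  stringsAsFactors = FALSE
)

result <- dataset %>%
  # Assumption: a row with no Date that is not a "Client ..." row is an attorney header.
  mutate(is_attorney = is.na(Date) & !str_detect(Attorney, "^Client"),
         Attorney = if_else(is_attorney, Attorney, NA_character_)) %>%
  # carry each attorney name down to the rows below it
  fill(Attorney) %>%
  # drop the attorney and client header rows, which carry no billing data
  filter(!is.na(Date)) %>%
  select(-is_attorney)

print(result)
```

If the real data marks attorney rows differently (for example by a fixed list of names), only the `is_attorney` condition would need to change; the fill-then-filter steps stay the same.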