我有一个数据框,如下所示:
ID Capital Instal Date1 Date2
2 500 25 a b
2 500 20 a c
2 450 15 a a
2 300 10 a f
2 250 0 a z
4 100 25 b a
4 90 20 b b
4 80 15 b a
4 75 10 b f
4 25 0 b z
我想从这里创建一个新的data.frame
,如果是Date1=Date2
,那么我的新数据框B
将如下所示:
ID Date1 Capital Instal1 Instal2 Instal3 Instal4
2 a 450 15 10 0
4 b 90 20 15 10 0
因此,我希望新的data.frame
只考虑Date1
和Date2
相等后的数据。
发布于 2018-05-31 18:38:10
这是另一个tidyverse
解决方案...
library(dplyr)
library(tidyr)
df2 <- df %>%
group_by(ID) %>% #group by ID
mutate(ind=cumsum(Date1==Date2)) %>% #mark elements after first Date1==Date2
filter(ind!=0) %>% #remove previous elements
summarise(Date1=first(Date1),
Capital=first(Capital),
Instal=list(Instal)) %>% #capture values for table
unnest() %>% #spread Instal, one value per row
group_by(ID) %>%
mutate(Inst=paste0("Instal",row_number())) %>% #mark names of Instal values
spread(key=Inst,value=Instal) #spread into wide format
df2
ID Date1 Capital Instal1 Instal2 Instal3 Instal4
1 2 a 450 15 10 0 NA
2 4 b 90 20 15 10 0
发布于 2018-05-31 18:36:32
有点拐弯抹角的。我相信有一种更快的方法可以做到这一点,但这会得到您期望的输出。
步骤: when date1 == date2选择row_number并在选择后填写。筛选这些记录并仅选择所需的列。创建一列以用作展开中的标题,并展开Instal数据。接下来,连接数据的一个子集以获得正确的Capital值,并将此表与前一个选择连接起来。
library(dplyr)
library(tidyr)
df %>%
group_by(ID) %>%
mutate(rowid = row_number(),
selection = ifelse(Date1 == Date2, rowid, NA)) %>%
fill(selection) %>% # fill rowid over the rows after date1 == date2
filter(!is.na(selection)) %>%
select(ID, Date1, Instal) %>%
mutate(Installation = paste0("Instal", row_number())) %>%
spread(Installation, Instal) %>%
inner_join(df %>% filter(Date1 == Date2) %>% select(ID, Date1, Capital), .)
ID Date1 Capital Instal1 Instal2 Instal3 Instal4
1 2 a 450 15 10 0 NA
2 4 b 90 20 15 10 0
数据:
df <- structure(list(ID = c(2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L),
Capital = c(500L, 500L, 450L, 300L, 250L, 100L, 90L, 80L,
75L, 25L), Instal = c(25L, 20L, 15L, 10L, 0L, 25L, 20L, 15L,
10L, 0L), Date1 = c("a", "a", "a", "a", "a", "b", "b", "b",
"b", "b"), Date2 = c("b", "c", "a", "f", "z", "a", "b", "a",
"f", "z")), class = "data.frame", row.names = c(NA, -10L))
https://stackoverflow.com/questions/50621809
复制相似问题