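I have two data frames. data1 holds the children's ages; the values differ from year to year.
data2 holds the population for each age, and its columns are named by year in the same way.
What I want is to extract the population figures into a new data frame. For 2008/2009/2010 the population should cover ages 7 to 10, but for 2011/2012/2013 it should cover ages 6 to 9.
Does anyone know how to do this?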
data2 <- data.frame('population by age' = seq(5, 11, by = 1),
                    '2008' = c(145391, 140621, 136150, 131944, 198933, 182182, 159103),
                    '2009' = c(148566, 143943, 139367, 135083, 212196, 196398, 155033),
                    '2010' = c(152330, 147261, 142555, 138172, 218701, 161330, 142190),
                    '2011' = c(156630, 151387, 146491, 141905, 119397, 116093, 112666),
                    '2012' = c(133545, 129737, 126124, 122678, 120213, 116826, 113381),
                    '2013' = c(119397, 116093, 112666, 109174, 106871, 103659, 100398))
data1 <- data.frame('2008' = c(7, 8, 9, 10),
                    '2009' = c(7, 8, 9, 10),
                    '2010' = c(7, 8, 9, 10),
                    '2011' = c(6, 7, 8, 9),
                    '2012' = c(6, 7, 8, 9),
                    '2013' = c(6, 7, 8, 9))
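A note on column names, since the answers refer to population.by.age and X2008 rather than the quoted names used above: data.frame() defaults to check.names = TRUE, which passes the names through make.names(). The sketch below uses a small made-up frame (toy and toy_raw are illustrative names, not part of the question) to show the effect.

```r
# Toy frame with the same kind of non-syntactic column names as data2
toy <- data.frame('population by age' = 5:7, '2008' = c(1, 2, 3))
names(toy)
# "population.by.age" "X2008"

# With check.names = FALSE the names are kept verbatim
toy_raw <- data.frame('population by age' = 5:7, '2008' = c(1, 2, 3),
                      check.names = FALSE)
names(toy_raw)
# "population by age" "2008"
```

With the verbatim names, columns would have to be referenced with backticks (for example toy_raw$`2008`), which is presumably why the default names are used throughout.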
Posted on 2018-06-30 05:12:55
It looks to me like you want one of these:
library(tidyverse)
data2 %>%
  gather(year, value, -population.by.age) %>%
  inner_join(gather(data1, year, population.by.age)) %>%
  spread(year, value)
# population.by.age X2008 X2009 X2010 X2011 X2012 X2013
# 1 6 NA NA NA 151387 129737 116093
# 2 7 136150 139367 142555 146491 126124 112666
# 3 8 131944 135083 138172 141905 122678 109174
# 4 9 198933 212196 218701 119397 120213 106871
# 5 10 182182 196398 161330 NA NA NA
data2 %>%
  gather(year, value, -population.by.age) %>%
  inner_join(gather(data1, year, population.by.age)) %>%
  group_by(year) %>%
  mutate(population.by.age = letters[row_number()]) %>%
  spread(year, value)
# # A tibble: 4 x 7
# population.by.age X2008 X2009 X2010 X2011 X2012 X2013
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 a 136150 139367 142555 151387 129737 116093
# 2 b 131944 135083 138172 146491 126124 112666
# 3 c 198933 212196 218701 141905 122678 109174
# 4 d 182182 196398 161330 119397 120213 106871
Here is a base R version of the second case, without reshaping to long and back to wide:
data3 <- data1
data3[] <- Map(function(x,y) y[data2[[1]] %in% x,drop=FALSE],data1,data2[-1])
data3
# X2008 X2009 X2010 X2011 X2012 X2013
# 1 136150 139367 142555 151387 129737 116093
# 2 131944 135083 138172 146491 126124 112666
# 3 198933 212196 218701 141905 122678 109174
# 4 182182 196398 161330 119397 120213 106871
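In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The following is a minimal sketch of the first variant above in that newer style, assuming dplyr >= 1.0.0 and tidyr >= 1.0.0 are installed. The names wanted and result are introduced here for illustration only, and data2 and data1 are re-created compactly with the column names R assigns by default.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, with the names data.frame() produces by default
data2 <- data.frame(
  population.by.age = 5:11,
  X2008 = c(145391, 140621, 136150, 131944, 198933, 182182, 159103),
  X2009 = c(148566, 143943, 139367, 135083, 212196, 196398, 155033),
  X2010 = c(152330, 147261, 142555, 138172, 218701, 161330, 142190),
  X2011 = c(156630, 151387, 146491, 141905, 119397, 116093, 112666),
  X2012 = c(133545, 129737, 126124, 122678, 120213, 116826, 113381),
  X2013 = c(119397, 116093, 112666, 109174, 106871, 103659, 100398)
)
data1 <- data.frame(
  X2008 = 7:10, X2009 = 7:10, X2010 = 7:10,
  X2011 = 6:9,  X2012 = 6:9,  X2013 = 6:9
)

# Long table of the (year, age) pairs that should be kept
wanted <- pivot_longer(data1, everything(),
                       names_to = "year", values_to = "population.by.age")

result <- data2 %>%
  pivot_longer(-population.by.age, names_to = "year", values_to = "value") %>%
  inner_join(wanted, by = c("year", "population.by.age")) %>%
  pivot_wider(names_from = year, values_from = value) %>%
  # pivot_wider() orders columns by first appearance, so restore year order
  select(population.by.age, all_of(names(data1))) %>%
  arrange(population.by.age)

result
```

This should give the same five-row table (ages 6 to 10, with NA where a year does not cover an age) as the first gather()/spread() pipeline. Passing by explicitly to inner_join() also avoids the message dplyr prints when it has to guess the join columns.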
Posted on 2018-06-30 05:11:48
We can gather the second dataset into "long" format, filter based on the condition, and spread it back to "wide" format.
library(tidyverse)
gather(data2, key, val, X2008:X2013) %>%
  filter((population.by.age %in% 7:10 & key %in% paste0("X", 2008:2010)) |
         (population.by.age %in% 6:9 & key %in% paste0("X", 2011:2013))) %>%
  spread(key, val)
#population.by.age X2008 X2009 X2010 X2011 X2012 X2013
#1 6 NA NA NA 151387 129737 116093
#2 7 136150 139367 142555 146491 126124 112666
#3 8 131944 135083 138172 141905 122678 109174
#4 9 198933 212196 218701 119397 120213 106871
#5 10 182182 196398 161330 NA NA NA
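The same condition can also be applied without reshaping at all, by blanking out the cells that fall outside each period's age window. This is only a sketch of an alternative in base R, not part of the answer above; ages, early, late and masked are names made up for the example.

```r
# Same data as in the question, with the names data.frame() produces by default
data2 <- data.frame(
  population.by.age = 5:11,
  X2008 = c(145391, 140621, 136150, 131944, 198933, 182182, 159103),
  X2009 = c(148566, 143943, 139367, 135083, 212196, 196398, 155033),
  X2010 = c(152330, 147261, 142555, 138172, 218701, 161330, 142190),
  X2011 = c(156630, 151387, 146491, 141905, 119397, 116093, 112666),
  X2012 = c(133545, 129737, 126124, 122678, 120213, 116826, 113381),
  X2013 = c(119397, 116093, 112666, 109174, 106871, 103659, 100398)
)

ages  <- data2$population.by.age
early <- paste0("X", 2008:2010)   # years that should cover ages 7 to 10
late  <- paste0("X", 2011:2013)   # years that should cover ages 6 to 9

masked <- data2
masked[!(ages %in% 7:10), early] <- NA   # blank cells outside 7-10
masked[!(ages %in% 6:9),  late]  <- NA   # blank cells outside 6-9

# Drop the ages that are not needed by either period
masked <- masked[ages %in% 6:10, ]
masked
```

The row names keep the original positions (2 to 6); rownames(masked) <- NULL resets them if that matters.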
Posted on 2018-06-30 05:17:02
It's not very elegant, but you can try this:
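library(dplyr)
# 2008-2010: keep ages 7 to 10 (inclusive)
aux <- data2 %>%
  select(population.by.age, X2008, X2009, X2010) %>%
  filter(population.by.age >= 7,
         population.by.age <= 10)
# 2011-2013: keep ages 6 to 9 (inclusive)
aux2 <- data2 %>%
  select(population.by.age, X2011, X2012, X2013) %>%
  filter(population.by.age >= 6,
         population.by.age <= 9)
# Combine on age; an age missing from one period becomes NA there
df <- full_join(aux, aux2, by = "population.by.age") %>%
  arrange(population.by.age)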
Good luck!
https://stackoverflow.com/questions/51109301