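Sorry if this is a duplicate post, but I have tried various things I have seen on Stack Overflow and still can't quite solve this problem or understand why I'm having it.
So I have a dataset like this: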
council_name <- c("Barking and Dagenham","Barking and Dagenham","Barking and Dagenham","Barking and Dagenham","Barking and Dagenham","Barking and Dagenham","Barking and Dagenham","Barking and Dagenham","Barnet","Barnet")
period <- c("1st 2006", "1st 2006", "1st 2006", "1st 2006", "2nd 2006", "2nd 2006", "2nd 2006", "2nd 2006", "1st 2006", "1st 2006")
category <- c ("glass", "fridges", "paper", "glass", "glass", "fridges", "paper", "glass", "glass", "fridges")
data <- c(333, 222, 100, 98, 450, 540, 33, 450, 560, 120)
category2 <- c ("collected", "collected", "collected", "no donors", "collected", "collected", "collected", "no donors", "collected", "collected")
df <- data.frame(council_name, period, category, category2, data)
What I want is something like this:
council_name <- c("Barking and Dagenham","Barking and Dagenham","Barnet")
period <- c("1st 2006", "2nd 2006", "1st 2006")
glass <- c(333, 450, 560)
fridges <- c(222,540,120)
paper <- c(100, 33, NA)
no.donors <- c(98, 450, NA)
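df.desired <- data.frame(council_name, period, glass, fridges, paper, no.donors)
I have been trying to use the pivot functions in several ways, but I need the column names to come from both category and category2, while the cells are filled with values from a single column (data) of df, and this has caused me all sorts of problems.
Thanks a lot for your help!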
Answered 2020-06-08 23:01:49
Here is a data.table approach:
Using your data:
library(data.table)
setDT(df)
df[, sum(data), by = .(council_name, period, category, category2)][, dcast(.SD, council_name + period ~ category + category2, value.var = "V1")]
This produces:
df
council_name period fridges_collected glass_collected glass_no donors paper_collected
1: Barking and Dagenham 1st 2006 222 333 98 100
2: Barking and Dagenham 2nd 2006 540 450 450 33
3: Barnet 1st 2006 120 560 NA NA
This is basically your desired data, apart from differences in the column names and column order. Those can be fixed with data.table's setnames and setcolorder.
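The renaming and reordering step can be sketched as follows. This is a minimal sketch, not the answer's own code: it rebuilds the question's data, does the aggregation and the cast in two separate steps, and uses the illustrative names `long`, `wide` and `total`, which do not appear in the original.

```r
library(data.table)

# Rebuild the example data from the question so the sketch runs on its own.
df <- data.table(
  council_name = c(rep("Barking and Dagenham", 8), rep("Barnet", 2)),
  period = c(rep("1st 2006", 4), rep("2nd 2006", 4), rep("1st 2006", 2)),
  category = c("glass", "fridges", "paper", "glass",
               "glass", "fridges", "paper", "glass",
               "glass", "fridges"),
  category2 = c("collected", "collected", "collected", "no donors",
                "collected", "collected", "collected", "no donors",
                "collected", "collected"),
  data = c(333, 222, 100, 98, 450, 540, 33, 450, 560, 120)
)

# Aggregate, then cast to wide format ("total" is a name chosen for this sketch).
long <- df[, .(total = sum(data)), by = .(council_name, period, category, category2)]
wide <- dcast(long, council_name + period ~ category + category2, value.var = "total")

# Rename the generated columns by reference, then put them in the desired order.
setnames(wide,
         old = c("glass_collected", "fridges_collected", "paper_collected", "glass_no donors"),
         new = c("glass", "fridges", "paper", "no.donors"))
setcolorder(wide, c("council_name", "period", "glass", "fridges", "paper", "no.donors"))
```

Both setnames and setcolorder modify `wide` in place, so no reassignment is needed.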
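What is going on inside:
sum(data), by = .(council_name... sums your data for each unique combination of council, period, category and category2. Consider running just this part to see the output: df[, sum(data), by = .(council_name, period, category, category2)]
[, dcast(.SD, council_name + period ~ category + category2, value.var = "V1")] casts the long table from the previous step into a wide one (the format you want). It puts council name and period in the rows and category and category2 in the columns, with V1 as the value variable (V1 is the default name assigned in the first chained command, because I only wrote sum(data) without naming it).
Hope that helps.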
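To look at the intermediate long table on its own, here is a minimal sketch that rebuilds the question's data and names the aggregated column instead of relying on the default V1. The names `long` and `total` are chosen for illustration only.

```r
library(data.table)

# Rebuild the example data from the question so the sketch runs on its own.
df <- data.table(
  council_name = c(rep("Barking and Dagenham", 8), rep("Barnet", 2)),
  period = c(rep("1st 2006", 4), rep("2nd 2006", 4), rep("1st 2006", 2)),
  category = c("glass", "fridges", "paper", "glass",
               "glass", "fridges", "paper", "glass",
               "glass", "fridges"),
  category2 = c("collected", "collected", "collected", "no donors",
                "collected", "collected", "collected", "no donors",
                "collected", "collected"),
  data = c(333, 222, 100, 98, 450, 540, 33, 450, 560, 120)
)

# Wrapping the expression in .() lets the aggregated column carry an explicit name.
long <- df[, .(total = sum(data)), by = .(council_name, period, category, category2)]
print(long)
```

With a named column, the later dcast call would take value.var = "total" rather than "V1".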
Answered 2020-06-08 23:18:20
Here is a tidyverse solution that uses pivot_wider to convert the data to wide format and then rename to change the column names.
library(tidyverse)
df %>%
  # Pivot from long to wide format using the first two columns as id cols and using both category and category2 columns to get the new column names
  pivot_wider(id_cols = c(council_name, period),
              names_from = c(category, category2),
              values_from = data) %>%
  # Rename the columns
  rename("glass" = "glass_collected",
         "fridges" = "fridges_collected",
         "paper" = "paper_collected",
         "no.donors" = "glass_no donors")
# A tibble: 3 x 6
# council_name period glass fridges paper no.donors
# <fct> <fct> <dbl> <dbl> <dbl> <dbl>
# 1 Barking and Dagenham 1st 2006 333 222 100 98
# 2 Barking and Dagenham 2nd 2006 450 540 33 450
# 3 Barnet 1st 2006 560 120 NA NA
Answered 2020-06-09 00:26:16
Here is another data.table approach that does the aggregation inside dcast:
library(data.table)
setDT(df)
dcast(df, council_name + period ~ category + category2, value.var = "data", fun.aggregate = sum)
# council_name period fridges_collected glass_collected glass_no donors paper_collected
# 1: Barking and Dagenham 1st 2006 222 333 98 100
# 2: Barking and Dagenham 2nd 2006 540 450 450 33
# 3: Barnet 1st 2006 120 560 0 0
https://stackoverflow.com/questions/62264770
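In the last answer's output, the Barnet row shows 0 where the desired result has NA. This happens because dcast, when no fill value is given, applies the aggregate function to an empty vector for missing combinations, and sum of an empty vector is 0. A minimal sketch, assuming NA is preferred for those cells, passes fill = NA to dcast. The name `wide` is illustrative.

```r
library(data.table)

# Rebuild the example data from the question so the sketch runs on its own.
df <- data.table(
  council_name = c(rep("Barking and Dagenham", 8), rep("Barnet", 2)),
  period = c(rep("1st 2006", 4), rep("2nd 2006", 4), rep("1st 2006", 2)),
  category = c("glass", "fridges", "paper", "glass",
               "glass", "fridges", "paper", "glass",
               "glass", "fridges"),
  category2 = c("collected", "collected", "collected", "no donors",
                "collected", "collected", "collected", "no donors",
                "collected", "collected"),
  data = c(333, 222, 100, 98, 450, 540, 33, 450, 560, 120)
)

# fill = NA marks combinations that never occur as missing instead of sum(numeric(0)) = 0.
wide <- dcast(df, council_name + period ~ category + category2,
              value.var = "data", fun.aggregate = sum, fill = NA)
print(wide)
```

Cells for combinations that do occur are still summed as before; only the absent ones change from 0 to NA.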