I have a dataset that looks like this:
Age   Monday  Tuesday  Wednesday
6-9   a       b        a
6-9   b       b        c
6-9           c        a
9-10  c       c        b
9-10  c       a        b
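Using R, I would like to obtain the following dataset/result (where each column gives the total frequency of each unique factor level within a row):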
Age   a  b  c
6-9   2  1  0
6-9   0  2  1
6-9   1  0  1
9-10  0  1  2
9-10  1  1  1
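Note: my data also contains missing values.
Posted on 2020-05-12 18:25:25
A couple of quick-and-dirty tidyverse solutions; there should be a way to do this in fewer steps.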
library(tidyverse) # install.packages("tidyverse")
input <- tribble(
  ~Age,   ~Monday, ~Tuesday, ~Wednesday,
  "6-9",  "a",     "b",      "a",
  "6-9",  "b",     "b",      "c",
  "6-9",  "",      "c",      "a",
  "9-10", "c",     "c",      "b",
  "9-10", "c",     "a",      "b"
)
# pivot solution
input %>%
  rowid_to_column() %>%
  # turn blanks into NA in character columns only; recent dplyr versions
  # reject na_if(x, "") on the integer rowid column
  mutate(across(where(is.character), ~ na_if(.x, ""))) %>%
  pivot_longer(cols = -c(rowid, Age), values_drop_na = TRUE) %>%
  count(rowid, Age, value) %>%
  pivot_wider(id_cols = c(rowid, Age), names_from = value, values_from = n, values_fill = list(n = 0)) %>%
  select(-rowid)
# manual solution (if only a, b, c are expected as options)
# str_count() matches substrings, so this assumes single-character values
input %>%
  unite(col = "combine", Monday, Tuesday, Wednesday, sep = "") %>%
  transmute(
    Age,
    a = str_count(combine, "a"),
    b = str_count(combine, "b"),
    c = str_count(combine, "c")
  )
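On the "fewer steps" point, here is one possible sketch in base R that generalizes the manual solution without hard-coding a, b and c. It is not from the answer above: the `input` data frame is my own reconstruction of the question's data (blank cell stored as ""), and `days`, `lvls`, `counts` and `result` are illustrative names. It compares whole cells rather than substrings, so it should also cope with multi-character values.

```r
# Reconstruction of the question's data (an assumption); the blank cell is "".
input <- data.frame(
  Age       = c("6-9", "6-9", "6-9", "9-10", "9-10"),
  Monday    = c("a", "b", "", "c", "c"),
  Tuesday   = c("b", "b", "c", "c", "a"),
  Wednesday = c("a", "c", "a", "b", "b"),
  stringsAsFactors = FALSE
)

days <- input[-1]                                   # every column except Age
# distinct values, ignoring blanks (sort() already drops NA)
lvls <- setdiff(sort(unique(unlist(days))), "")

# for each level, count the cells in each row that equal it;
# na.rm = TRUE makes real NA cells count as "no match"
counts <- sapply(lvls, function(l) rowSums(days == l, na.rm = TRUE))

result <- cbind(input["Age"], counts)
result
```

Because the levels are discovered from the data, a new value such as "d" would get its own column without touching the code.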
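Posted on 2020-05-13 12:35:55
In base R, we can replace the empty values with NA, collect the unique values in the data frame, then use apply row-wise together with table to count how often each value occurs.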
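# `df` is the data frame from the question, with blanks stored as ''
df[df == ''] <- NA
vals <- unique(na.omit(unlist(df[-1])))  # distinct non-missing values
# tabulate each row against fixed levels so that absent values count as 0
cbind(df[1], t(apply(df, 1, function(x) table(factor(x, levels = vals)))))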
# Age a b c
#1 6-9 2 1 0
#2 6-9 0 2 1
#3 6-9 1 0 1
#4 9-10 0 1 2
#5 9-10 1 1 1
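For anyone who wants to run this end to end, below is a self-contained variant that avoids apply by cross-tabulating row indices against cell values in a single table() call. Treat it as a sketch under assumptions rather than the answer's own method: `df` is my reconstruction of the question's data, and `m`, `tab` and `result` are names I made up for illustration.

```r
# Reconstruction of the question's data (an assumption); the blank cell is "".
df <- data.frame(
  Age       = c("6-9", "6-9", "6-9", "9-10", "9-10"),
  Monday    = c("a", "b", "", "c", "c"),
  Tuesday   = c("b", "b", "c", "c", "a"),
  Wednesday = c("a", "c", "a", "b", "b"),
  stringsAsFactors = FALSE
)

m <- as.matrix(df[-1])      # character matrix of the day columns
m[m %in% ""] <- NA          # %in% never returns NA, so this is safe
                            # even if some cells are already NA

# row(m) gives the row index of every cell; table() drops NA values
# by default but still keeps one row per original row index
tab <- table(row(m), m)

result <- cbind(df["Age"], as.data.frame.matrix(tab))
result
```

as.data.frame.matrix() is used here because plain as.data.frame() on a table would return the long (row, value, Freq) layout instead of one column per value.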
https://stackoverflow.com/questions/61748729