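I know there are similar posts, but I'm a bit confused about how to use pivot_longer to convert my own data from wide to long format. The code below creates a dummy dataset whose structure is similar to my real data.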
library(tidyverse)
## Dummy data.
# ID Variables.
part <- rep(rep(paste0("P", c(1:2)), each = 20, times = 2))
type <- rep(c("pre", "post"), each = 10, times = 4)
sp <- rep(c("slow", "mod"), each = 40)
# Values
var1_site1_L <- rep(c(1, NA), each = 5, times = 8)
var1_site1_R <- rep(c(1, NA), each = 5, times = 8)
var1_site1_ALL <- rep(1, times = 80)
var1_site1_ALL_M <- rep(c(1, rep(NA, times = 9)), times = 8)
var2_site2_L <- rep(c(1, NA), each = 5, times = 8)
var2_site2_R <- rep(c(1, NA), each = 5, times = 8)
var2_site2_ALL <- rep(1, times = 80)
var2_site2_ALL_M <- rep(c(1, rep(NA, times = 9)), times = 8)
dat <- data.frame(part, type, sp, var1_site1_L, var1_site1_R, var1_site1_ALL,
var1_site1_ALL_M, var2_site2_L, var2_site2_R, var2_site2_ALL,
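var2_site2_ALL_M)

I would like to keep part, type, and sp as ID variables, turn the unique pieces of each remaining column name (the parts separated by underscores) into additional ID variables, and put the values in a final column. For example, I would like the result to look something like this (note that this is only a very basic example; there are of course many more observations, including NA values in the value column):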
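part type sp var site side misc value
p1 pre slow var1 site1 L NA 1
p1 pre slow var1 site1 R NA 1
p1 pre slow var1 site1 ALL NA 1
p1 pre slow var1 site1 ALL M 1

I know this is a fairly unusual data structure. I'm particularly concerned with how to handle the fourth column-name piece (M), which appears only in some cases (where there is just one value per combination of ID variables).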
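As an illustration of that irregularity, the following base-R sketch counts how many underscore-separated pieces each value-column name has. The column names are copied from the dummy data above; the helper names `value_cols`, `n_pieces`, and `with_misc` are made up for this example.

```r
# Value-column names copied from the question's dummy data.
value_cols <- c(
  "var1_site1_L", "var1_site1_R", "var1_site1_ALL", "var1_site1_ALL_M",
  "var2_site2_L", "var2_site2_R", "var2_site2_ALL", "var2_site2_ALL_M"
)

# Split each name on "_" and count the resulting pieces.
pieces <- strsplit(value_cols, "_", fixed = TRUE)
n_pieces <- lengths(pieces)

# Columns that carry the optional fourth ("misc") piece.
with_misc <- value_cols[n_pieces == 4]

print(data.frame(column = value_cols, n_pieces = n_pieces))
```

Only the two `_ALL_M` columns have four pieces, so any splitting approach has to tolerate names with three pieces and fill the missing fourth one.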
I've got as far as the code below, which I know needs some work if I'm going to get the result I want.
long <- dat %>%
pivot_longer(cols = c(1:3),
names_to = c("var", "site", "side", "misc"),
names_sep = "_")

Any help would be greatly appreciated!
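Answered 2022-03-04 02:09:46

I experimented with the results from my earlier solution and with pivot_wider (followed by pivot_longer), and worked out how to make it work with pivot_longer alone. Your original approach was very close: the main change is that cols has to select the value columns rather than the three ID columns. Column names with only three pieces get NA in misc (tidyr may warn about the missing pieces).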
dat %>%
pivot_longer(
cols = !c(part, type, sp),
names_to = c("var", "site", "side", "misc"),
names_sep = "_",
values_to = "value"
)
part type sp var site side misc value
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
1 P1 pre slow var1 site1 L NA 1
2 P1 pre slow var1 site1 R NA 1
3 P1 pre slow var1 site1 ALL NA 1
4 P1 pre slow var1 site1 ALL M 1
5 P1 pre slow var2 site2 L NA 1
6 P1 pre slow var2 site2 R NA 1
7 P1 pre slow var2 site2 ALL NA 1
8 P1 pre slow var2 site2 ALL M 1
9 P1 pre slow var1 site1 L NA 1
10 P1 pre slow var1 site1 R NA 1

Answered 2022-03-03 03:30:34

I don't think you can get there with pivot_longer, but try this.
library(stringr)
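results <- data.frame()
# Handle one value column at a time (columns 4 onward) and stack the pieces.
for (x in 4:length(dat)){
  # Names of the three ID columns plus the current value column.
  cols <- names(dat)[c(1:3, x)]
  res <- dat %>%
    mutate(id = 1:nrow(dat)) %>%
    select(id, all_of(cols)) %>%
    # Parse the value column's name into its components.
    mutate(var = str_extract(cols[4], "var\\d"),
           site = str_extract(cols[4], "site\\d"),
           side = str_extract(cols[4], "L|R|ALL"),
           misc = str_extract(cols[4], "[M]"),
           # Note: this stores the string "NA", not a missing value.
           misc = ifelse(is.na(misc), "NA", misc)) %>%
    # The current value column is always in position 5 after select().
    rename("value" = 5) %>%
    select(id, part, type, sp, var, site, side, misc, value)
  results <- rbind(results, res)
}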
head(results %>% arrange(id) %>% select(-id))
part type sp var site side misc value
1 P1 pre slow var1 site1 L NA 1
2 P1 pre slow var1 site1 R NA 1
3 P1 pre slow var1 site1 ALL NA 1
4 P1 pre slow var1 site1 ALL M 1
5 P1 pre slow var2 site2 L NA 1
6 P1 pre slow var2 site2 R NA 1

Answered 2022-03-03 03:49:51
dat %>%
pivot_longer(starts_with('var')) %>%
separate(name, c('var', 'site', 'side', 'misc'), fill = 'right')
# A tibble: 640 x 8
part type sp var site side misc value
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
1 P1 pre slow var1 site1 L NA 1
2 P1 pre slow var1 site1 R NA 1
3 P1 pre slow var1 site1 ALL NA 1
4 P1 pre slow var1 site1 ALL M 1
5 P1 pre slow var2 site2 L NA 1
6 P1 pre slow var2 site2 R NA 1
7 P1 pre slow var2 site2 ALL NA 1
8 P1 pre slow var2 site2 ALL M 1
9 P1 pre slow var1 site1 L NA 1
10 P1 pre slow var1 site1 R NA 1
# ... with 630 more rows

https://stackoverflow.com/questions/71331220
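As a variation on the pivot_longer plus separate approach above, here is a minimal sketch that uses separate_wider_delim instead. This assumes tidyr 1.3.0 or later, where separate() is marked as superseded in favour of the separate_wider_* functions. The small `toy` data frame is a made-up stand-in for the question's `dat`, not the original data.

```r
library(tidyr)

# Made-up stand-in for the question's data: two ID columns, three value columns.
toy <- data.frame(
  part = c("P1", "P2"),
  type = c("pre", "post"),
  var1_site1_L = c(1, NA),
  var1_site1_ALL = c(1, 1),
  var1_site1_ALL_M = c(1, NA)
)

# Step 1: stack all value columns; their names land in a column called "name".
stacked <- pivot_longer(toy, starts_with("var"))

# Step 2: split "name" on "_" into up to four pieces.
long <- separate_wider_delim(
  stacked,
  name,
  delim = "_",
  names = c("var", "site", "side", "misc"),
  too_few = "align_start"  # names with only three pieces get misc = NA
)

print(long)
```

Setting `too_few = "align_start"` plays the role that `fill = 'right'` plays in separate(): short names are padded with NA on the right instead of raising an error.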