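I don't think this is very complicated, but I've gotten lost. I have a dataframe in which I need to create a new column of weeks based on the cycle column. The problem is that the number of cycles in week_0 differs from person to person (ID), while every following week (week_1 through week_6) has the same number of days (7). So in theory my data would end up looking like this: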
| ID | cycle | week |
|----|-------|------|
| 1 | 0 | week_0 |
| 1 | 01 | week_0 |
| 1 | 02 | week_0 |
| 1 | 03 | week_1 |
| 1 | 04 | week_1 |
| 1 | 05 | week_1 |
| 1 | 06 | week_1 |
| 1 | 07 | week_1 |
| 1 | 08 | week_1 |
| 1 | 09 | week_1 |
| 1 | 10 | week_2 |
| ... | ... | ... |
| 2 | 0 | week_0 |
| 2 | 01 | week_1 |
| ... | ... | ... |
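I created a vector (diff_days) that holds the number of week_0 days for each ID, but I don't know how to write a loop that builds the week variable and its values based on cycle[1] + diff_days.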
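Basically, I want this: if x in cycle is between 0 and the associated diff_days value (here, for ID = 1, diff_days = 3), then "week_0" should appear in the week column, and each subsequent block of 7 cycle numbers should get the following week (week_1, week_2, week_3, week_4, week_5, week_6). Here, for ID = 1, that means cycle = 03 to cycle = 09, cycle = 10 to cycle = 16, and so on.
I need something like this: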
```r
for (i in cycle) {
  if (i <= diff_days) {
    i <- "week_0"
  }
  if (i > diff_days & i <= diff_days + 7) {
    i <- "week_1"
  }
  # etc...
}
```
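I know this isn't correct R syntax, but I don't know how to translate it.
Or am I making my life more complicated than it needs to be, and there's a simple solution?
Any help would be greatly appreciated!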
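For reference, a literal translation of the pseudocode into valid R could look like the sketch below. It labels the cycles of one ID at a time, given that ID's diff_days value `d`; `assign_week_loop` is a hypothetical helper name, not something from the question. It uses `cycle < d` for week_0, which matches the example table (for diff_days = 3, cycle 03 is already week_1); the pseudocode's `<=` would put cycle 03 in week_0.

```r
# Sketch: loop-based version of the pseudocode, for the cycles of a single ID.
# assign_week_loop() is a hypothetical helper; d is that ID's diff_days value.
assign_week_loop <- function(cycle, d) {
  week <- character(length(cycle))
  for (j in seq_along(cycle)) {
    if (cycle[j] < d) {
      # cycles before d belong to the irregular week_0
      week[j] <- "week_0"
    } else {
      # afterwards, each run of 7 cycles is one week: d..d+6 -> week_1, etc.
      week[j] <- paste0("week_", (cycle[j] - d) %/% 7 + 1)
    }
  }
  week
}

assign_week_loop(0:10, d = 3)
```

Called as shown, it returns week_0 for cycles 0 to 2, week_1 for cycles 3 to 9 and week_2 for cycle 10. Applying it to the whole dataframe would still need an outer loop over IDs, which is what a vectorized approach avoids.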
EDIT: Yes, sorry about that.
Here is the diff_days vector:
```r
c(4, 7, 4, 5, 4, 5, 4, 6, 6, 5, 6, 3, 4, 5, 3, 0, 6, 10, 0, 4,
3, 4, 6, 3, 3, 12, 4, 5, 4, 6, 4, 4, 5, 5, 4, 5, 5, 3, 4, 4,
-1, 5, 5, 4, 6, 5, 4, 5, 6, 5, 7, 4, 4, 11, 6, 5, 6, 3)
```
And here are the first 60 rows of my data:
```r
structure(list(ID = c("PTP0022", "PTP0022", "PTP0022", "PTP0022",
"PTP0022", "PTP0022", "PTP0022", "PTP0022", "PTP0022", "PTP0022",
"PTP0022", "PTP0022", "PTP0022", "PTP0022", "PTP0022", "PTP0022",
"PTP0022", "PTP0022", "PTP0022", "PTP0022", "PTP0022", "PTP0022",
"PTP0022", "PTP0022", "PTP0022", "PTP0022", "PTP0022", "PTP0022",
"PTP0022", "PTP0022", "PTP0022", "PTP0022", "PTP0022", "PTP0022",
"PTP0022", "PTP0022", "PTP0022", "PTP0022", "PTP0022", "PTP0022",
"PTP0022", "PTP0022", "PTP0022", "PTP0022", "PTP0022", "PTP0022",
"PTP0022", "PTP0040", "PTP0040", "PTP0040", "PTP0040", "PTP0040",
"PTP0040", "PTP0040", "PTP0040", "PTP0040", "PTP0040", "PTP0040",
"PTP0040", "PTP0040"), cycle_number = structure(c(1L, 2L, 3L,
4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L,
18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L,
31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L,
44L, 45L, 46L, 47L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
11L, 12L, 13L), .Label = c("0", "1", "2", "3", "4", "5", "6",
"7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17",
"18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28",
"29", "30", "31", "32", "33", "34", "35", "36", "37", "38", "39",
"40", "41", "42", "43", "44", "45", "46", "47", "48", "49", "50",
"51"), class = "factor")), row.names = c(NA, -60L), class = "data.frame")
```
Answered 2021-06-05 15:30:58
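Although you haven't shared your starting data, from the wording of the question I'll assume you're starting with a data frame with two columns (`ID`, `cycle`) and a separate vector, `diff_days`, that holds the number of difference days for each `ID`. Here is some mock data for 5 IDs in that format as an example: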
```r
library(tidyverse)

cycledata <- tibble(
  ID = unlist(map(1:5, ~ rep(.x, 20))),
  cycle = rep(seq(from = 0, to = 19, by = 1), 5)
)

diff_days <- c(3, 4, 2, 5, 6)
```
You can compute the `week` column with the `case_when` function inside `mutate`, like so:
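```r
result <- cycledata %>%
  mutate(
    week = case_when(
      # cycles before this ID's diff_days value fall in week_0
      cycle < diff_days[ID] ~ "week_0",
      # after that, every block of 7 cycles is one week
      TRUE ~ paste0("week_", floor((cycle - diff_days[ID] + 7) / 7))
    )
  )
```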
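The code above applies an if/else condition to every row of your dataframe. If the value of `cycle` is below the corresponding value of `diff_days`, it assigns "week_0". Otherwise, it uses the formula `floor((cycle - diff_days[ID] + 7) / 7)` to work out which week that cycle value belongs to.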
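To see the formula at work, here is a small base-R sanity check (no tidyverse needed) that applies the same rule to a single ID with diff_days = 3, the value used for ID 1 in the question's example table. The variable names `d`, `cycle` and `week` are just for illustration.

```r
# Same rule as the case_when() above, written with base R's ifelse()
# for one ID whose diff_days value is 3.
d <- 3
cycle <- 0:16
week <- ifelse(cycle < d, "week_0",
               paste0("week_", floor((cycle - d + 7) / 7)))
data.frame(cycle, week)
```

Cycles 0 to 2 come out as week_0, 3 to 9 as week_1 and 10 to 16 as week_2, matching the ranges described in the question.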
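Although my example assumes that each ID has 20 rows associated with it, the solution should also handle an uneven number of rows per ID, because each row looks up its own `diff_days[ID]` value.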
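The data posted in the question's edit differs from the mock data in two ways that matter here: `cycle_number` is a factor (comparisons and arithmetic on a factor return NA with a warning), and `ID` holds strings such as "PTP0022" rather than the positions 1, 2, 3 and so on that `diff_days[ID]` relies on (indexing an unnamed vector by a string returns NA). Below is a hedged sketch of one way to adapt the answer, assuming a named lookup vector. The pairing of IDs with diff_days values (`diff_lookup`) is hypothetical, since the question doesn't say which value belongs to which ID.

```r
library(dplyr)

# Rebuild the structure of the posted data: 47 rows for PTP0022 and
# 13 rows for PTP0040, with cycle_number as a factor labelled "0".."51".
df <- data.frame(
  ID = c(rep("PTP0022", 47), rep("PTP0040", 13)),
  cycle_number = factor(c(0:46, 0:12), levels = as.character(0:51)),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: assumes the first two diff_days values (4 and 7)
# belong to PTP0022 and PTP0040; replace with the real ID-to-value mapping.
diff_lookup <- c(PTP0022 = 4, PTP0040 = 7)

result <- df %>%
  mutate(
    # Convert via the labels; as.numeric(cycle_number) alone would return
    # the internal codes (1, 2, ...), which are off by one here.
    cycle = as.numeric(as.character(cycle_number)),
    # Index the named vector by the ID string.
    d = unname(diff_lookup[ID]),
    week = case_when(
      cycle < d ~ "week_0",
      TRUE ~ paste0("week_", floor((cycle - d + 7) / 7))
    )
  )
```

With this lookup, PTP0022 gets week_0 for cycles 0 to 3 and week_1 from cycle 4, while PTP0040 gets week_0 for cycles 0 to 6 and week_1 from cycle 7.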
https://stackoverflow.com/questions/67850428