I want to identify the rows that contain NA and lie between a 0 and a 1. Consider this data.table:
library(data.table)
DT <- data.table(a = c(0, NA, NA, 0, NA, 1, 1, NA, 0, NA, 1, NA, NA, NA, 0, 1, 1, 0, NA, 0))
# DT
# a
# 1: 0
# 2: NA
# 3: NA
# 4: 0
# 5: NA
# 6: 1
# 7: 1
# 8: NA
# 9: 0
# 10: NA
# 11: 1
# 12: NA
# 13: NA
# 14: NA
# 15: 0
# 16: 1
# 17: 1
# 18: 0
# 19: NA
# 20: 0
How can I identify rows 5, 8, 10 and 12:14?
Answered 2020-09-29 16:05:05
You can try using approx to interpolate linearly across the NAs:
DT[,b := approx((1:.N)[!is.na(a)],na.omit(a),1:.N)$y]
Then apply
DT[, which(is.na(a) & b>0 & b<1)]
or
DT[, which(is.na(a) & between(b, 0, 1, FALSE))]
which gives
[1] 5 8 10 12 13 14
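To see why the interpolation trick works, here is a minimal base R sketch on the plain vector rather than the data.table (the names idx, known and hits are illustrative and not part of the answer). An NA flanked by a 0 and a 1 is interpolated to a value strictly between 0 and 1, while an NA flanked by two equal values is interpolated to exactly that value.

```r
# Same values as column `a` in the question
a <- c(0, NA, NA, 0, NA, 1, 1, NA, 0, NA, 1, NA, NA, NA, 0, 1, 1, 0, NA, 0)

idx   <- seq_along(a)
known <- !is.na(a)

# Linear interpolation over the row index. With approx's default rule = 1,
# positions outside the range of known values would come back as NA.
b <- approx(x = idx[known], y = a[known], xout = idx)$y

# NA rows whose interpolated value lies strictly between 0 and 1
hits <- which(is.na(a) & b > 0 & b < 1)

print(b[hits])  # 0.50 0.50 0.50 0.75 0.50 0.25
print(hits)     # 5 8 10 12 13 14
```

Because which() drops NA comparisons, leading or trailing NAs (which have no neighbour on one side) would simply not be reported.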
Answered 2020-09-29 15:22:10
The starts of the NA runs that lie between a 0 and a 1 can be computed like this:
library("data.table")
DT <- data.table(a = c(0, NA, NA, 0, NA, 1, 1, NA, 0, NA, 1, NA, NA, NA, 0, 1, 1, 0, NA, 0))
r <- DT[, rle(is.na(a))]
R <- data.table(r$values, r$lengths, start=c(1, 1+head(cumsum(r$lengths), -1)))
i <- R[(V1), start]
j <- R[(V1), start+V2-1]
i[(DT[i-1, a] + DT[j+1, a])==1]
# result: [1] 5 8 10 12
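The code above returns only the first row of each qualifying run. If every row of the run is wanted (12, 13 and 14 instead of just 12), the same run-length idea can be extended as in the base R sketch below. The variable names are illustrative, and the check that the neighbours sum to 1 assumes, as in the question, that the column contains only 0, 1 and NA.

```r
a <- c(0, NA, NA, 0, NA, 1, 1, NA, 0, NA, 1, NA, NA, NA, 0, 1, 1, 0, NA, 0)

r     <- rle(is.na(a))
end   <- cumsum(r$lengths)        # last row of every run
start <- end - r$lengths + 1L     # first row of every run

# Keep the NA runs only
na_start <- start[r$values]
na_end   <- end[r$values]

# Drop runs that touch the first or last row: they lack a neighbour on one side
interior <- na_start > 1 & na_end < length(a)
na_start <- na_start[interior]
na_end   <- na_end[interior]

# One neighbour is 0 and the other is 1 exactly when they sum to 1
keep <- a[na_start - 1] + a[na_end + 1] == 1

# Expand each qualifying run into its individual row numbers
rows <- unlist(Map(seq, na_start[keep], na_end[keep]))
print(rows)  # 5 8 10 12 13 14
```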
Answered 2020-09-29 19:07:09
As Dirk Eddelbuettel describes here: Replacing NAs with latest non-NA value, the zoo package and its na.locf() function can help you.
library(data.table)
library(zoo)
DT <- data.table(a = c(0, NA, NA, 0, NA, 1, 1, NA, 0, NA, 1, NA, NA, NA, 0, 1, 1, 0, NA, 0))
non_nas <- DT[!is.na(a), a]
successor <- c(non_nas[-1], 0)
diff <- abs(non_nas - successor)
DT[!is.na(a), diff:=diff]
This gives you a data.table like this:
a diff
1: 0 0
2: NA NA
3: NA NA
4: 0 1
5: NA NA
6: 1 0
7: 1 1
8: NA NA
9: 0 1
10: NA NA
11: 1 1
12: NA NA
13: NA NA
14: NA NA
15: 0 1
16: 1 0
17: 1 1
18: 0 0
19: NA NA
20: 0 0
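The idea here is that each 1 in the 'diff' column marks a non-NA value in 'a' that differs from the next non-NA value, so any NAs directly below it sit between a 0 and a 1.
Now you want to fill the NAs in the 'diff' column by carrying the last non-NA value forward. For clarity, we put the result in a new column 'b'. This is where the zoo package comes in: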
DT[, b:=na.locf(diff)]
This results in
a diff b
1: 0 0 0
2: NA NA 0
3: NA NA 0
4: 0 1 1
5: NA NA 1
6: 1 0 0
7: 1 1 1
8: NA NA 1
9: 0 1 1
10: NA NA 1
11: 1 1 1
12: NA NA 1
13: NA NA 1
14: NA NA 1
15: 0 1 1
16: 1 0 0
17: 1 1 1
18: 0 0 0
19: NA NA 0
20: 0 0 0
Finally,
DT[is.na(a) & b == 1, which = TRUE]
will give you:
[1] 5 8 10 12 13 14
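The same "carry the neighbouring value across the NAs" idea can also be sketched without zoo. This is a variant rather than the answer's method: it swaps in data.table's own nafill() (available in data.table 1.12.4 and later) and fills in both directions, so the helper columns prev_val and next_val are hypothetical names introduced only for this illustration.

```r
library(data.table)

DT <- data.table(a = c(0, NA, NA, 0, NA, 1, 1, NA, 0, NA, 1, NA, NA, NA, 0, 1, 1, 0, NA, 0))

# Last observation carried forward: the non-NA value above each row
DT[, prev_val := nafill(a, type = "locf")]
# Next observation carried backward: the non-NA value below each row
DT[, next_val := nafill(a, type = "nocb")]

# An NA lies between a 0 and a 1 when its two neighbours differ.
# Leading or trailing NAs keep an NA neighbour, and which() drops them.
rows <- DT[, which(is.na(a) & prev_val != next_val)]
print(rows)  # 5 8 10 12 13 14
```

Comparing the two filled columns avoids building the intermediate 'diff' vector, at the cost of two passes over the column.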
https://stackoverflow.com/questions/64114184