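I have a large list with 30,000+ elements. The elements are vectors of different lengths, and I want to convert the list into a data frame in which each vector becomes one row and its values are spread across multiple columns. Here is a mock example of the list: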
lst <- list(a = c(1,2,4,5,6), c = c(7,8,9), c = c(10,11))
The output I want looks like this:
#  [,1] [,2] [,3] [,4] [,5] [,6]
#a    1    2    3    4    5    6
#b    7    8    9   NA   NA   NA
#c   10   11   NA   NA   NA   NA
Posted on 2019-10-16 06:18:48
You can do this:
t(as.data.frame(lapply(lst, "length<-", max(lengths(lst)))))
#    [,1] [,2] [,3] [,4] [,5]
#a      1    2    4    5    6
#c      7    8    9   NA   NA
#c.1   10   11   NA   NA   NA
Alternatively, as Andrew pointed out, you can do this:
t(sapply(lst, "length<-", max(lengths(lst))))
#  [,1] [,2] [,3] [,4] [,5]
#a    1    2    4    5    6
#c    7    8    9   NA   NA
#c   10   11   NA   NA   NA
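Both calls above return a matrix rather than a data frame, and the mock list has two elements named c. The following is a minimal sketch, assuming that unique row names and the default column names (V1, V2, ...) are acceptable, of how `length<-` pads a single vector and how the padded matrix could be wrapped in a data frame. The object names padded_one, padded_mat and padded_df are made up for this illustration.

```r
lst <- list(a = c(1, 2, 4, 5, 6), c = c(7, 8, 9), c = c(10, 11))

# `length<-` called as an ordinary function: extending a vector pads it with NA
padded_one <- `length<-`(c(7, 8, 9), 5)

# Make the duplicated name unique up front ("a", "c", "c.1"),
# since data frame row names must not repeat
names(lst) <- make.unique(names(lst))

# Pad every vector to the longest length; sapply() returns one column per
# list element, so transpose to get one row per element
padded_mat <- t(sapply(lst, `length<-`, max(lengths(lst))))

# Wrap the matrix in a data frame; columns get the default names V1, V2, ...
padded_df <- as.data.frame(padded_mat)
```

Whether the final as.data.frame() step is needed depends on what happens next; for purely numeric data the matrix may be enough.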
Posted on 2019-10-16 06:21:46
Here is a base R option:
# Create a vector for number of times an NA needs to be padded
na_nums <- max(lengths(lst)) - lengths(lst)
# Transpose results after padding with NAs using mapply
t(mapply(c, lst, sapply(na_nums, rep, x = NA)))
  [,1] [,2] [,3] [,4] [,5]
a    1    2    4    5    6
c    7    8    9   NA   NA
c   10   11   NA   NA   NA
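To see what the one-liner is combining, the intermediate values can be inspected for the mock list. This is only a sketch: `padding` and `result` are names introduced here, and `lapply()` is swapped in for `sapply()` so that the padding is explicitly a list.

```r
lst <- list(a = c(1, 2, 4, 5, 6), c = c(7, 8, 9), c = c(10, 11))

# How many NAs each vector is missing relative to the longest one
na_nums <- max(lengths(lst)) - lengths(lst)
# na_nums is a named integer vector: a = 0, c = 2, c = 3

# One run of NAs per element; rep(NA, 0) is an empty logical vector
padding <- lapply(na_nums, function(n) rep(NA, n))

# mapply() pairs each vector with its NA run and c() joins them;
# the equal-length results become columns, hence the transpose
result <- t(mapply(c, lst, padding))
```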
Posted on 2019-10-16 06:42:31
This was my first instinct.
max_len <- max(vapply(lst,
                      FUN = length,
                      FUN.VALUE = numeric(1)))
lst <- lapply(lst,
              function(x, max_len) c(x, rep(NA, max_len - length(x))),
              max_len)
# Form a matrix
do.call("rbind", lst)
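It's a bit verbose, and some of the other solutions are quite elegant. Since you said your list has over 30,000 elements, I was curious how these approaches would perform on a list of length 30,000.
If this is something you need to do often, you may want to adopt andrew's approach.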
lst <- list(a = c(1,2,4,5,6), c = c(7,8,9), c = c(10,11))
# build out a list of 30,000 elements.
lst <- lst[sample(1:3, 30000, replace = TRUE)]
library(microbenchmark)
microbenchmark(
  benjamin = {
    max_len <- max(vapply(lst,
                          FUN = length,
                          FUN.VALUE = numeric(1)))
    lst <- lapply(lst,
                  function(x, max_len) c(x, rep(NA, max_len - length(x))),
                  max_len)
    # Form a matrix
    do.call("rbind", lst)
  },
  slava = {
    Reduce(function(x, y) {
      n <- max(length(x), length(y))
      length(x) <- n
      length(y) <- n
      rbind(x, y, deparse.level = 0)
    },
    lst)
  },
  andrew = {
    na_nums <- max(lengths(lst)) - lengths(lst)
    # Transpose results after padding with NAs using mapply
    t(mapply(c, lst, sapply(na_nums, rep, x = NA)))
  },
  matt = {
    t(as.data.frame(lapply(lst, "length<-", max(lengths(lst)))))
  },
  # five evaluations per expression, matching neval in the output
  times = 5
)
Unit: milliseconds
     expr         min          lq       mean      median          uq        max neval
 benjamin    77.08337    91.42793   117.9376   106.97656   122.53898   191.6612     5
    slava 32383.10840 32962.57589 32976.6662 33071.40314 33180.70634 33285.5372     5
   andrew    60.91803    66.74401    87.1645    71.92043    77.78805   158.4520     5
     matt  1685.09158  1702.19796  1759.2741  1737.01949  1760.86237  1911.1993     5
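Before relying on timings like these, it may be worth confirming that the faster approaches return the same values. Below is a minimal sketch on a smaller list. The seed, the size of 300, and the names res_benjamin, res_andrew, res_sapply and same_values are arbitrary choices for this example, and the padded results are stored under separate names so that lst is not overwritten.

```r
lst <- list(a = c(1, 2, 4, 5, 6), c = c(7, 8, 9), c = c(10, 11))
set.seed(1)
lst <- lst[sample(1:3, 300, replace = TRUE)]

max_len <- max(lengths(lst))

# rbind over explicitly padded vectors
res_benjamin <- do.call("rbind",
                        lapply(lst, function(x) c(x, rep(NA, max_len - length(x)))))

# mapply-based padding
na_nums <- max_len - lengths(lst)
res_andrew <- t(mapply(c, lst, lapply(na_nums, function(n) rep(NA, n))))

# sapply with `length<-`
res_sapply <- t(sapply(lst, `length<-`, max_len))

# Compare values only; dimnames are dropped before comparing
same_values <- isTRUE(all.equal(unname(res_benjamin), unname(res_andrew))) &&
  isTRUE(all.equal(unname(res_benjamin), unname(res_sapply)))
```

If same_values is TRUE, the choice between these approaches comes down to speed and readability.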
https://stackoverflow.com/questions/58415322