Q: How do I replace missing points in a dataset?
Stack Overflow user

Asked 2021-11-07 02:07:05

1 answer · 79 views · 0 followers · 2 votes

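I want to write a function in R that takes any dataset as input, where the dataset contains some missing points (NA). I want to replace those missing points with values computed using the mean function. I have something like this in mind: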

x <- function(data, type = "mean", lag = 2)

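Specifically, it should compute the mean of the two values before and the two values after the missing point (because I set lag to 2 in the function). For example, if the missing point is at position 12, the function should compute the mean of the values at positions 10, 11, 13, and 14 and put the result at position 12. In special cases, for example when the missing point is in the last position and there are not two values after it, the function should instead compute the mean of all the data in that column and use it to replace the missing point. Here is an example to illustrate. Consider the following dataset: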

3  7 8 0  8  12 2
5  8 9 2  8  9  1
1  2 4 5  0  6  7
5  6 0 NA 3  9  10
7  2 3 6  11 14 2
4  8 7 4  5  3  NA

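In the dataset above, the first NA should be replaced with the mean of 2 and 5 (the two values before it) and 6 and 4 (the two values after it), that is, (2+5+6+4)/4 = 17/4. The NA in the last column should be replaced with the mean of the other values in that column, that is, (2+1+7+10+2)/5 = 22/5.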

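My question is how to add code (if, if-else, or a loop) to the function above so that it becomes a complete function that does what is described above. I should emphasize that I would like to use the apply family of functions.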

if-statement

point

1 Answer

Stack Overflow user

Answered 2021-11-07 06:43:09

First, we can define a function that smooths a single vector:

library(dplyr)

smooth = function(vec, n=2){
    # Shift the vector forward (lead) and backward (lag) by 1..n positions
    purrr::map(1:n, function(i){
        cbind(
            lead(vec, i),
            lag(vec, i)
        )
    }) %>%
        # Bind the matrix together
        do.call(cbind, .) %>%
        # Take the mean of each row, ie the smoothed version at each position
        # If there are NAs in the mean, it will itself be NA
        rowMeans() %>%
        # In order, take a) original values b) locally smoothed values
        # c) globally smoothed values (ie the entire mean ignoring NAs)
        coalesce(vec, ., mean(vec, na.rm=TRUE))
}

> smooth(c(0, 2, 5, NA, 6, 4))
[1] 0.00 2.00 5.00 4.25 6.00 4.00
> smooth(c(2, 1, 7, 10, 2, NA))
[1]  2.0  1.0  7.0 10.0  2.0  4.4
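
To see why only position 4 gets a local mean in the first call, it can help to look at the intermediate neighbour matrix that the function builds. The following is a minimal sketch for n = 2; the names `neighbours` and `local_means` are illustrative and do not appear in the answer's code.

```r
library(dplyr)

vec <- c(0, 2, 5, NA, 6, 4)

# One column per shifted copy of the vector: for each position, the row
# holds the values 1 and 2 steps after it (lead) and before it (lag).
neighbours <- cbind(
    lead(vec, 1), lag(vec, 1),
    lead(vec, 2), lag(vec, 2)
)

# rowMeans() yields NA for any row containing an NA, so a local mean only
# exists where all four neighbours are present.
local_means <- rowMeans(neighbours)
print(local_means)
```

Only the fourth row (6, 5, 4, 2) is complete, so `local_means` is NA everywhere except position 4, where it is 4.25. `coalesce()` then takes the original value where one exists, the local mean where that exists, and the overall mean otherwise.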

Then we can apply it to each column:

> c(3, 7, 8, 0, 8, 12, 2, 5, 8, 9, 2, 8, 9, 1, 1, 2, 4, 5, 0, 6, 7, 5, 6, 0, NA, 3, 9, 10, 7, 2, 3, 6, 11, 14, 2, 4, 8, 7, 4, 5, 3, NA) %>% 
    matrix(byrow=TRUE, ncol=7) %>%
    as_tibble(.name_repair="universal") %>%                        
    mutate(across(everything(), smooth))
# A tibble: 6 × 7
   ...1  ...2  ...3  ...4  ...5  ...6  ...7
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     3     7     8  0        8    12   2  
2     5     8     9  2        8     9   1  
3     1     2     4  5        0     6   7  
4     5     6     0  4.25     3     9  10  
5     7     2     3  6       11    14   2  
6     4     8     7  4        5     3   4.4
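
Since the question asks specifically for the apply family, the same rule can also be sketched in base R without dplyr. This is one possible version under stated assumptions, not the answerer's method: the helper name `fill_na_window` is hypothetical, and it assumes that an incomplete window (too close to an edge) or a window that itself contains an NA should fall back to the column mean, as the question describes.

```r
# Hypothetical helper: fill the NAs of one numeric vector.
fill_na_window <- function(vec, lag = 2) {
    vec <- as.numeric(vec)
    n <- length(vec)
    col_mean <- mean(vec, na.rm = TRUE)  # fallback value
    vapply(seq_len(n), function(i) {
        # Keep values that are already present
        if (!is.na(vec[i])) return(vec[i])
        # Positions of the `lag` values before and after position i
        idx <- c(i - seq_len(lag), i + seq_len(lag))
        # Window runs past either end: use the column mean
        if (any(idx < 1 | idx > n)) return(col_mean)
        window_mean <- mean(vec[idx])
        # A neighbour that is itself NA also triggers the fallback
        if (is.na(window_mean)) col_mean else window_mean
    }, numeric(1))
}

m <- matrix(
    c(3, 7, 8, 0, 8, 12, 2,
      5, 8, 9, 2, 8, 9, 1,
      1, 2, 4, 5, 0, 6, 7,
      5, 6, 0, NA, 3, 9, 10,
      7, 2, 3, 6, 11, 14, 2,
      4, 8, 7, 4, 5, 3, NA),
    byrow = TRUE, ncol = 7
)

# MARGIN = 2 applies the helper to each column
filled <- apply(m, 2, fill_na_window, lag = 2)
print(filled)
```

With the example data, `filled[4, 4]` is 4.25 and `filled[6, 7]` is 4.4, matching the tibble output above.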