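This is a typical problem for Congressional budget analysts working with messy data.
The dataframe shows the amounts requested and authorized for each item.
The authorized amount is sometimes more or less than the requested amount. When that happens, the adjustments (explanatory text not included here) are shown in brackets below the total.
For example, in the dataframe below, the authorizers adjusted the amount requested for item "a" (80 requested) by +19 and +1. After these adjustments, the total amount authorized for "a" is 100.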
80 requested + (19 authorized + 1 authorized) = 100 total authorized.
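The same check can be written out in R for all three items. This is only a minimal sketch of the arithmetic; the vector names are made up for illustration, and the values are copied by hand from the example data.

```r
# Requested amounts and the bracketed adjustments for each item
# (names are illustrative; values are copied from the example data).
requested   <- c(a = 80, b = 300, c = 80)
adjustments <- list(a = c(19, 1), b = numeric(0), c = -10)

# Requested amount plus the sum of its adjustments gives the authorized total
authorized_total <- requested + sapply(adjustments, sum)
authorized_total
#>   a   b   c 
#> 100 300  70
```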
Goal: I want to adjust the authorized amounts using the numbers in brackets.
library(tidyverse)
## DATA
df <- tribble(
  ~item, ~requested_amount,  ~authorized_amount,
  "a",           80,               "100",  #< Total
  "a",           NA,               "[19]", #< Adjustment from request
  "a",           NA,               "[1]",  #< Adjustment from request 
  "b",           300,              "300",  #< Total (no adjustment)
  "c",           80,                "70",  #< Total
  "c",           NA,              "[-10]"  #< Adjustment from request
              )
#> # A tibble: 6 x 3
#>   item  requested_amount    authorized_amount
#>   <chr>            <dbl>    <chr>            
#> 1 a                 80      100              
#> 2 a                 NA      [19]             
#> 3 a                 NA      [1]              
#> 4 b                300      300              
#> 5 c                 80       70               
#> 6 c                 NA      [-10]
Expected result, treating the bracketed amounts as actual adjustments:
Authorized amount for item "a" = (80 + 19 + 1) = 100
#>   item  requested_amount authorized_amount
#>   <chr>            <dbl>             <dbl>
#> 1 a                 80               80 #< Together... 
#> 2 a                 NA               19 #< these add...
#> 3 a                 NA                1 #< to 100 for item "a"
#> 4 b                300              300   
#> 5 c                 80               80
#> 6 c                 NA              -10
Created on 2018-06-07 by the reprex package (v0.2.0).
Posted on 2018-06-08 13:26:43
If I understand correctly, you need the sum of authorized_amount for each item. One solution is:
library(tidyverse)
library(readr)
df %>% 
  mutate(authorized_amount = readr::parse_number(authorized_amount)) %>% 
  group_by(item) %>% 
  summarise(requested_amount = requested_amount[!is.na(requested_amount)],
            authorized_amount = sum(authorized_amount))
# A tibble: 3 x 3
  item  requested_amount authorized_amount
  <chr>            <dbl>             <dbl>
1 a                 80.0             120  
2 b                300               300  
3 c                 80.0              60.0
https://stackoverflow.com/questions/50753180
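Note that the summed values above include each item's total row together with its adjustments (for item "a", 100 + 19 + 1 = 120), so they do not match the per-row layout shown in the question. Below is a minimal sketch of one way to get that layout. It assumes an adjustment row can be recognised by a leading `[` and that every item has exactly one total row. The helper column `is_adjustment` and the object name `result` are hypothetical names introduced here.

```r
library(dplyr)
library(readr)
library(tibble)

df <- tribble(
  ~item, ~requested_amount, ~authorized_amount,
  "a",   80,                "100",
  "a",   NA,                "[19]",
  "a",   NA,                "[1]",
  "b",   300,               "300",
  "c",   80,                "70",
  "c",   NA,                "[-10]"
)

result <- df %>%
  mutate(
    # Assumption: bracketed values are adjustments, everything else is a total
    is_adjustment     = startsWith(authorized_amount, "["),
    authorized_amount = parse_number(authorized_amount)
  ) %>%
  group_by(item) %>%
  mutate(
    # Keep adjustments as they are; on the total row, back the adjustments
    # out of the total so that the rows of an item add up to that total
    authorized_amount = if_else(
      is_adjustment,
      authorized_amount,
      authorized_amount - sum(authorized_amount[is_adjustment])
    )
  ) %>%
  ungroup() %>%
  select(-is_adjustment)

result$authorized_amount
#> [1]  80  19   1 300  80 -10
```

With this layout, summing authorized_amount by item gives back the authorized totals of 100, 300 and 70.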