My data looks like this:
country    supporter1   supporter2   supporter3  supporter4    supporter5    
USA           Albania     Germany        USA           NA           NA
France        USA         France         NA            NA           NA
UK            UK          Chile          Peru          NA           NA
Germany       USA         Iran           Mexico        India        Pakistan
USA           China       Spain          NA            NA           NA
Cuba          Cuba        UK             Germany       South Korea  NA
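China         Russia      NA             NA            NA           NA

I want to create a new variable that takes the value 1 when the country column matches any of the supporter columns in the same row (supporter1, supporter2, supporter3, supporter4 or supporter5), and 0 otherwise. For example, in the second row country is France and supporter2 is also France, so the new variable should be 1.
I would like to end up with this: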
country    supporter1   supporter2   supporter3  supporter4    supporter5      new variable  
USA          Albania     Germany        USA           NA           NA               1
France       USA         France         NA            NA           NA               1
UK           UK          Chile          Peru          NA           NA               1
Germany      USA         Iran           Mexico        India        Pakistan         0
USA          China       Spain          NA            NA           NA               0         
Cuba         Cuba        UK             Germany       South Korea  NA               1
China        Russia      NA             NA            NA           NA               0

Answered 2022-01-02 09:03:41
Update: a dplyr solution using only if_any
library(dplyr)
df %>% 
  rowwise() %>% 
  mutate(new_var = as.integer(as.logical(if_any(starts_with("supporter"), ~ . %in% country))))

  country supporter1 supporter2 supporter3 supporter4  supporter5 new_var
  <chr>   <chr>      <chr>      <chr>      <chr>       <chr>        <int>
1 USA     Albania    Germany    USA        NA          NA               1
2 France  USA        France     NA         NA          NA               1
3 UK      UK         Chile      Peru       NA          NA               1
4 Germany USA        Iran       Mexico     India       Pakistan         0
5 USA     China      Spain      NA         NA          NA               0
6 Cuba    Cuba       UK         Germany    South Korea NA               1
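7 China   Russia     NA         NA         NA          NA               0

First answer (also correct). Here is one possible solution:
Working rowwise, check whether country appears in supporter1 to supporter5.
Then unite all the new columns into one and use an ifelse statement to take 1 or 0.

library(dplyr)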
library(stringr)
library(tidyr)
df %>% 
  rowwise() %>% 
  mutate(across(supporter1:supporter5, ~ifelse(. %in% country, 1,0), .names = "new_{col}")) %>% 
  unite(New_Col, starts_with('new'), na.rm = TRUE, sep = ' ') %>% 
  mutate(New_Col = ifelse(str_detect(New_Col,  "1"), 1,0))

  country supporter1 supporter2 supporter3 supporter4  supporter5 New_Col
  <chr>   <chr>      <chr>      <chr>      <chr>       <chr>        <dbl>
1 USA     Albania    Germany    USA        NA          NA               1
2 France  USA        France     NA         NA          NA               1
3 UK      UK         Chile      Peru       NA          NA               1
4 Germany USA        Iran       Mexico     India       Pakistan         0
5 USA     China      Spain      NA         NA          NA               0
6 Cuba    Cuba       UK         Germany    South Korea NA               1
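7 China   Russia     NA         NA         NA          NA               0

Answered 2022-01-02 09:16:36
Here is a base R solution.
First, mapply checks each supporter* column for equality with country. Comparisons involving NA are forced to return FALSE. Then as.integer/rowSums converts rows with at least one TRUE to 1 and all other rows to 0.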
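As a small illustration of why the NA guard matters, the sketch below compares a toy vector against a single value. The names x, y, raw and safe are made up for this example and are not part of the answer's code.

```r
# Toy vectors (hypothetical, for illustration only)
x <- c("USA", NA, "Peru")   # one row's supporters, with a missing value
y <- "USA"                  # that row's country

raw  <- x == y              # the NA comparison yields NA, not FALSE
safe <- x == y & !is.na(x)  # NA & FALSE is FALSE, so the NA is dropped

sum(raw)                    # NA: a single NA poisons the sum
sum(safe)                   # 1: exactly one real match
```

Without the guard, rowSums would return NA for every row that contains a missing supporter, unless na.rm = TRUE were passed instead.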
eq <- mapply(\(x, y){x == y & !is.na(x)}, df1[-1], df1[1])
as.integer(rowSums(eq) != 0)
#[1] 1 1 1 0 0 1 0
df1$new_variable <- as.integer(rowSums(eq) != 0)

Data
df1 <- read.table(text = "
country    supporter1   supporter2   supporter3  supporter4    supporter5    
USA           Albania     Germany        USA           NA           NA
France        USA         France         NA            NA           NA
UK            UK          Chile          Peru          NA           NA
Germany       USA         Iran           Mexico        India        Pakistan
USA           China       Spain          NA            NA           NA
Cuba          Cuba        UK             Germany       'South Korea'  NA
China         Russia      NA             NA            NA           NA
", header = TRUE)

Answered 2022-01-02 09:12:40
Another solution is to check, row by row, whether country is present among the supporters:
df <- data.frame(country=c("USA","France","UK","Germany","USA","Cuba","China"),
supporter1=c("Albania","USA","UK","USA","China","Cuba","Russia"),
supporter2=c("Germany","France","Chile","Iran","Spain","UK","NA"),  
supporter3=c("USA","NA","Peru","Mexico","NA","Germany","NA"),
supporter4=c("NA","NA","NA","India","NA","South Korea","NA"),   
supporter5=c("NA","NA","NA","Pakistan","NA","NA","NA"))

This gives:
df$new <- sapply(seq(1,nrow(df)), function(x) ifelse(df$country[x] %in% df[x,2:6],1,0))
> df$new
[1] 1 1 1 0 0 1 0

https://stackoverflow.com/questions/70554560