R: Splitting a variable from a data frame and finding the unique values

Stack Overflow user
Asked on 2018-06-04 04:40:24
Answers 2, Views 63, Followers 0, Votes 1

I have a tibble with 28 rows:

> al
# A tibble: 28 x 1
   lang_name                                               
   <chr>                                                   
 1 Objective-C,Swift,Other                                 
 2 Ruby,Shell                                              
 3 Ruby,HTML,Shell                                         
 4 Java,HTML,Kotlin,Other                                  
 5 TypeScript,JavaScript,CSS,Inno Setup,Shell,HTML         
 6 Vue,JavaScript,CSS,HTML                                 
 7 HTML,JavaScript,CSS                                     
 8 JavaScript,HTML,CSS,Other                               
 9 NA                                                      
10 Vim script,Ruby,Shell,Python,CoffeeScript,Makefile,Other
# ... with 18 more rows

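I got it by subsetting another data frame with al <- gh[,'lang_name']. I want to extract the data from every row and put it all into a single list so that I can find the unique values.

How can I do that?

I tried splitting with al <- str_split(al, ","), but it returns the following list: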

[[1]]
  [1] "c(\"Objective-C"  "Swift"            "Other\""          " \"Ruby"         
  [5] "Shell\""          " \"Ruby"          "HTML"             "Shell\""         
  [9] " \"Java"          "HTML"             "Kotlin"           "Other\""         
 [13] " \"TypeScript"    "JavaScript"       "CSS"              "Inno Setup"      
 [17] "Shell"            "HTML\""           " \"Vue"           "JavaScript"      
 [21] "CSS"              "HTML\""           " \"HTML"          "JavaScript"      
 [25] "CSS\""            " \"JavaScript"    "HTML"             "CSS"             
 [29] "Other\""          " NA"              " \"Vim script"    "Ruby"            
 [33] "Shell"            "Python"           "CoffeeScript"     "Makefile"        
 [37] "Other\""          " \"PHP\""         " \"JavaScript"    "TypeScript"      
 [41] "Other\""          " \"JavaScript"    "Other\""          " \"JavaScript"   
 [45] "CSS"              "Shell\""          " \"Ruby"          "JavaScript"      
 [49] "HTML"             "Vue"              "CSS"              "Shell\""         
 [53] " \"Go"            "Assembly"         "HTML"             "C"               
 [57] "Shell"            "Perl\""           " \"Go"            "HCL"             
 [61] "Other\""          " \"JavaScript\""  " \"C++"           "JavaScript"      
 [65] "Python"           "Go"               "Shell"            "C\""             
 [69] " \n\"JavaScript"  "CSS"              "HTML"             "Other\""         
 [73] " \"C++"           "Cuda"             "C"                "CMake"           
 [77] "Java"             "Python"           "Other\""          " \"JavaScript"   
 [81] "GLSL\""           " \"JavaScript"    "TypeScript"       "CSS\""           
 [85] " \"Kotlin"        "C"                "Makefile"         "HTML"            
 [89] "C++"              "Java"             "Other\""          " \"Java"         
 [93] "Other\""          " \"Python"        "Jupyter Notebook" "C++"             
 [97] "HTML"             "Shell"            "JavaScript\""     " \"CSS"          
[101] "JavaScript"       "HTML"             "Other\""          " \"HTML"         
[105] "CSS"              "JavaScript\")"   

unique(al) just returns the same strings.
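A likely reason for the output above, offered as an explanation rather than something stated in the original post: al is a one-column tibble, not a character vector, so coercing it to character deparses the whole column into one string that begins with c(". The sketch below reproduces that coercion with base R on a small made-up data frame (df is a hypothetical stand-in for al) and shows that extracting the column first avoids it.

```r
# Hypothetical stand-in for the asker's one-column tibble `al`.
df <- data.frame(lang_name = c("Ruby,Shell", "HTML,CSS", NA),
                 stringsAsFactors = FALSE)

# Coercing the whole data frame to character deparses the column into a
# single string, which is why the split pieces contain c(" and ").
whole <- as.character(df)
length(whole)               # one string for the whole column
startsWith(whole, "c(\"")   # the deparsed call prefix is present

# Extracting the column gives a character vector with one element per row.
col <- df$lang_name
pieces <- strsplit(col, ",")
langs <- unique(unlist(pieces))
langs
```

Splitting df$lang_name rather than df keeps one list element per row, so unlist and unique then operate on the individual language names.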

I also tried putting everything into a single character string:

al <- gh[1,'lang_name']
i = 2
# Note: the loop reads row i + 1, so row 2 is never appended.
while(i < nrow(gh)) {
    al <- paste(al, ",", gh[i+1,'lang_name'])
    i = i + 1
}

This results in the following character string: [1] "Objective-C,Swift,Other , Ruby,HTML,Shell , Java,HTML,Kotlin,Other , TypeScript,JavaScript,CSS,Inno Setup,Shell,HTML , Vue,JavaScript,CSS,HTML , HTML,JavaScript,CSS , JavaScript,HTML,CSS,Other , NA , Vim script,Ruby,Shell,Python,CoffeeScript,Makefile,Other , PHP , JavaScript,TypeScript,Other , JavaScript,Other , JavaScript,CSS,Shell , Ruby,JavaScript,HTML,Vue,CSS,Shell , Go,Assembly,HTML,C,Shell,Perl , Go,HCL,Other , JavaScript , C++,JavaScript,Python,Go,Shell,C , JavaScript,CSS,HTML,Other , C++,Cuda,C,CMake,Java,Python,Other , JavaScript,GLSL , JavaScript,TypeScript,CSS , Kotlin,C,Makefile,HTML,C++,Java,Other , Java,Other , Python,Jupyter Notebook,C++,HTML,Shell,JavaScript , CSS,JavaScript,HTML,Other , HTML,CSS,JavaScript"

I don't know how to convert this into something I can run unique on.
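If the single pasted string is the starting point, one possible way back to unique values is to split on commas and trim the spaces that paste inserted. This is a minimal sketch on a shortened, made-up string s, not the asker's full data; note that a missing row has already become the literal text "NA" at this stage.

```r
# Shortened, hypothetical example of a string built with paste(al, ",", ...).
s <- "Objective-C,Swift,Other , Ruby,HTML,Shell , NA , Vim script,Ruby"

# strsplit() returns a list with one element per input string; take the first.
parts <- strsplit(s, ",", fixed = TRUE)[[1]]

# paste() separates its arguments with spaces, so strip the padding.
parts <- trimws(parts)

langs <- unique(parts)
langs
```

Splitting the original column directly, without pasting the rows together first, avoids both the trimming step and the NA-to-text conversion.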

2 Answers

Stack Overflow user

Accepted answer

Posted on 2018-06-04 09:08:42

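If you like tidyverse/purrr functions, you can do this in a single piped step. stringr::str_split is a convenient wrapper around stringi::stri_split. purrr::reduce lets you apply a function repeatedly (in this case c) until the whole list of vectors returned by str_split has been reduced to a single character vector. For a task like this, unlist from base R would work just as well in place of reduce. I have a very purrr-focused habit, but that doesn't need to be the default for a simple task.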

library(tidyverse)

al$lang_name %>%
  str_split(",") %>%
  reduce(c) %>%
  unique()
#>  [1] "Objective-C"  "Swift"        "Other"        "Ruby"        
#>  [5] "Shell"        "HTML"         "Java"         "Kotlin"      
#>  [9] "TypeScript"   "JavaScript"   "CSS"          "Inno Setup"  
#> [13] "Vue"          NA             "Vim script"   "Python"      
#> [17] "CoffeeScript" "Makefile"

Created on 2018-06-03 by the reprex package (v0.2.0).
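As this answer notes, unlist can stand in for reduce(c). Below is a minimal base R sketch of that variant, using a small made-up vector x in place of al$lang_name. The final NA-dropping step is an optional addition that is not part of the original answer, for cases where the NA entry is unwanted.

```r
# Hypothetical sample standing in for al$lang_name.
x <- c("Ruby,Shell", "Ruby,HTML,Shell", NA, "HTML,CSS")

parts <- strsplit(x, ",", fixed = TRUE)

# unlist() flattens the list of vectors, just as reducing with c() does.
flat <- unlist(parts)
stopifnot(identical(flat, Reduce(c, parts)))

unique(flat)

# Optional: drop the NA before taking the unique values.
langs <- unique(flat[!is.na(flat)])
langs
```

For a flat list of character vectors like this one, unlist avoids the repeated concatenation that a reduce over c performs, which is one reason it is a reasonable default for simple cases.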

Votes: 1

Stack Overflow user

Posted on 2018-06-04 07:07:21

I hope this gives you what you want:

library(tibble)

al <- tibble(lang_name=
c("Objective-C,Swift,Other",                                 
"Ruby,Shell",                                              
"Ruby,HTML,Shell",                                         
"Java,HTML,Kotlin,Other",                          
"TypeScript,JavaScript,CSS,Inno Setup,Shell,HTML",         
"Vue,JavaScript,CSS,HTML",                                 
"HTML,JavaScript,CSS",                                     
"JavaScript,HTML,CSS,Other",                               
NA,                                                      
"Vim script,Ruby,Shell,Python,CoffeeScript,Makefile,Other"))

l1 <- strsplit(al$lang_name,",")
l1

# [[1]]
# [1] "Objective-C" "Swift"       "Other"      
# 
# [[2]]
# [1] "Ruby"  "Shell"
# 
# [[3]]
# [1] "Ruby"  "HTML"  "Shell"
# 
# [[4]]
# [1] "Java"   "HTML"   "Kotlin" "Other" 
# 
# [[5]]
# [1] "TypeScript" "JavaScript" "CSS"        "Inno Setup" "Shell"      "HTML"      
# 
# [[6]]
# [1] "Vue"        "JavaScript" "CSS"        "HTML"      
# 
# [[7]]
# [1] "HTML"       "JavaScript" "CSS"       
# 
# [[8]]
# [1] "JavaScript" "HTML"       "CSS"        "Other"     
# 
# [[9]]
# [1] NA
# 
# [[10]]
# [1] "Vim script"   "Ruby"         "Shell"        "Python"       "CoffeeScript" "Makefile"     "Other"  

l2 <- unlist(l1)
l2
# [1] "Objective-C"  "Swift"        "Other"        "Ruby"         "Shell"        "Ruby"         "HTML"         "Shell"       
# [9] "Java"         "HTML"         "Kotlin"       "Other"        "TypeScript"   "JavaScript"   "CSS"          "Inno Setup"  
# [17] "Shell"        "HTML"         "Vue"          "JavaScript"   "CSS"          "HTML"         "HTML"         "JavaScript"  
# [25] "CSS"          "JavaScript"   "HTML"         "CSS"          "Other"        NA             "Vim script"   "Ruby"        
# [33] "Shell"        "Python"       "CoffeeScript" "Makefile"     "Other" 

l3 <- unique(l2)
l3

# [1] "Objective-C"  "Swift"        "Other"        "Ruby"         "Shell"        "HTML"         "Java"         "Kotlin"      
# [9] "TypeScript"   "JavaScript"   "CSS"          "Inno Setup"   "Vue"          NA             "Vim script"   "Python"      
# [17] "CoffeeScript" "Makefile"
Votes: 3
The original content of this page is provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/50670770
