Q: "'into' is missing" when separating columns (tidyr)

Stack Overflow user

Asked on 2017-08-24 09:39:32

I have built metadata for 10 papers. The output of dput() is shown below:

> dput(itemlist)
structure(list(title = c("钱学森工程科学思想的实践者 [科普文章]", 
"超高周疲劳裂纹萌生与初始扩展的特征尺度 [科普文章]", "Proceedings of International conference on Airworthiness & Fatigue – 7th ICSAELS Series Conference [期刊论文]", 
"一种热机械疲劳实验的装置和方法 [专利]", "IUTAM和ICTAM的起源和历程 [科普文章]", 
"加载频率对金属材料超高周疲劳性能的影响 [会议论文]", "金属材料超高周疲劳行为的Monte-Carlo模拟 [会议论文]", 
"Vibration behavior and response to an accidental collision of SFT prototype in Qiandao Lake (China) [会议论文]", 
"A simulation on microstructure sensitivity to very-high-cycle fatigue behavior of metallic materials [会议论文]", 
"Effect of traveling wave on vortex-induced vibrations of submerged floating tunnel tethers [会议论文]"
), publish = c("2014", "2014", " 2013", "专利类型: 发明专利, 专利号: ZL2009102374751, 申请日期: 2012, 公开日期: 2012-12-27", 
"2012", "第十五届全国疲劳与断裂学术会议摘要及论文集, 中国广东佛山", 
"第十五届全国疲劳与断裂学术会议摘要及论文集, 中国广东佛山", "The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10"
), author = c("丁雁生; 洪友士; 金和", "洪友士; 中国科学院老科技工作者协会工程力学分会", 
"Sih G C; Hong YS(洪友士)", "谢季佳; 赵爱国; 武晓东; 洪友士", 
"陈杰; 刘洋; 汤亚南; 洪友士", "赵爱国; 洪友士; 谢季佳", "雷铮强; 洪友士; 谢季佳; 赵爱国", 
"Zhang SY(张双寅); Wang L(王雷); Hong YS(洪友士)", "Lei ZQ(雷铮强); Xie JJ(谢季佳); Zhao AG(赵爱国); Hong YS(洪友士)", 
"Wu XD(武晓东); Ge F(葛斐); Hong YS(洪友士)")), .Names = c("title", 
"publish", "author"), row.names = c(NA, 10L), class = "data.frame")

I found that tidyr can split a column into separate rows, one per element. In this example, I split "author" into separate rows:

> dput(itemlist_tidy)
structure(list(title = c("钱学森工程科学思想的实践者 [科普文章]", 
"钱学森工程科学思想的实践者 [科普文章]", "钱学森工程科学思想的实践者 [科普文章]", 
"超高周疲劳裂纹萌生与初始扩展的特征尺度 [科普文章]", "超高周疲劳裂纹萌生与初始扩展的特征尺度 [科普文章]", 
"Proceedings of International conference on Airworthiness & Fatigue – 7th ICSAELS Series Conference [期刊论文]", 
"Proceedings of International conference on Airworthiness & Fatigue – 7th ICSAELS Series Conference [期刊论文]", 
"一种热机械疲劳实验的装置和方法 [专利]", "一种热机械疲劳实验的装置和方法 [专利]", 
"一种热机械疲劳实验的装置和方法 [专利]", "一种热机械疲劳实验的装置和方法 [专利]", 
"IUTAM和ICTAM的起源和历程 [科普文章]", "IUTAM和ICTAM的起源和历程 [科普文章]", 
"IUTAM和ICTAM的起源和历程 [科普文章]", "IUTAM和ICTAM的起源和历程 [科普文章]", 
"加载频率对金属材料超高周疲劳性能的影响 [会议论文]", "加载频率对金属材料超高周疲劳性能的影响 [会议论文]", 
"加载频率对金属材料超高周疲劳性能的影响 [会议论文]", "金属材料超高周疲劳行为的Monte-Carlo模拟 [会议论文]", 
"金属材料超高周疲劳行为的Monte-Carlo模拟 [会议论文]", "金属材料超高周疲劳行为的Monte-Carlo模拟 [会议论文]", 
"金属材料超高周疲劳行为的Monte-Carlo模拟 [会议论文]", "Vibration behavior and response to an accidental collision of SFT prototype in Qiandao Lake (China) [会议论文]", 
"Vibration behavior and response to an accidental collision of SFT prototype in Qiandao Lake (China) [会议论文]", 
"Vibration behavior and response to an accidental collision of SFT prototype in Qiandao Lake (China) [会议论文]", 
"A simulation on microstructure sensitivity to very-high-cycle fatigue behavior of metallic materials [会议论文]", 
"A simulation on microstructure sensitivity to very-high-cycle fatigue behavior of metallic materials [会议论文]", 
"A simulation on microstructure sensitivity to very-high-cycle fatigue behavior of metallic materials [会议论文]", 
"A simulation on microstructure sensitivity to very-high-cycle fatigue behavior of metallic materials [会议论文]", 
"Effect of traveling wave on vortex-induced vibrations of submerged floating tunnel tethers [会议论文]", 
"Effect of traveling wave on vortex-induced vibrations of submerged floating tunnel tethers [会议论文]", 
"Effect of traveling wave on vortex-induced vibrations of submerged floating tunnel tethers [会议论文]"
), publish = c("2014", "2014", "2014", "2014", "2014", " 2013", 
" 2013", "专利类型: 发明专利, 专利号: ZL2009102374751, 申请日期: 2012, 公开日期: 2012-12-27", 
"专利类型: 发明专利, 专利号: ZL2009102374751, 申请日期: 2012, 公开日期: 2012-12-27", 
"专利类型: 发明专利, 专利号: ZL2009102374751, 申请日期: 2012, 公开日期: 2012-12-27", 
"专利类型: 发明专利, 专利号: ZL2009102374751, 申请日期: 2012, 公开日期: 2012-12-27", 
"2012", "2012", "2012", "2012", "第十五届全国疲劳与断裂学术会议摘要及论文集, 中国广东佛山", 
"第十五届全国疲劳与断裂学术会议摘要及论文集, 中国广东佛山", "第十五届全国疲劳与断裂学术会议摘要及论文集, 中国广东佛山", 
"第十五届全国疲劳与断裂学术会议摘要及论文集, 中国广东佛山", "第十五届全国疲劳与断裂学术会议摘要及论文集, 中国广东佛山", 
"第十五届全国疲劳与断裂学术会议摘要及论文集, 中国广东佛山", "第十五届全国疲劳与断裂学术会议摘要及论文集, 中国广东佛山", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10"
), author = c("丁雁生", " 洪友士", " 金和", "洪友士", " 中国科学院老科技工作者协会工程力学分会", 
"Sih G C", " Hong YS(洪友士)", "谢季佳", " 赵爱国", " 武晓东", 
" 洪友士", "陈杰", " 刘洋", " 汤亚南", " 洪友士", "赵爱国", " 洪友士", 
" 谢季佳", "雷铮强", " 洪友士", " 谢季佳", " 赵爱国", "Zhang SY(张双寅)", 
" Wang L(王雷)", " Hong YS(洪友士)", "Lei ZQ(雷铮强)", " Xie JJ(谢季佳)", 
" Zhao AG(赵爱国)", " Hong YS(洪友士)", "Wu XD(武晓东)", " Ge F(葛斐)", 
" Hong YS(洪友士)")), row.names = c(NA, -32L), class = "data.frame", .Names = c("title", 
"publish", "author"))

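The long table above has the shape that tidyr's separate_rows() produces. The question does not show the call that was used, so the following is only a minimal sketch on a made-up three-paper data frame (papers, papers_long, and all values in them are hypothetical). Splitting on ";\\s*" instead of ";" would also consume the space after each semicolon, avoiding leading spaces like those visible in the author values of the dput() output above.

```r
library(tidyr)

# Hypothetical stand-in for itemlist; titles and names are made up
papers <- data.frame(
  title  = c("paper A", "paper B", "paper C"),
  author = c("name1; name2; name3", "name1; name2", "name1"),
  stringsAsFactors = FALSE
)

# One row per (title, author) pair.
# The regex ";\\s*" matches the semicolon plus any whitespace after it.
papers_long <- separate_rows(papers, author, sep = ";\\s*")
papers_long
```

Each title is repeated once per author, so papers with different numbers of authors need no padding in this layout.
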
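My focus is the "author" column:

All authors are separated by semicolons (";").
Not all papers have the same number of authors.

Now I want to split the "author" column into separate columns so that I can draw a co-authorship graph with igraph. tidyr seems like the best choice, but it does not work: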

> library(tidyr)
> v_t <- separate(itemlist, col="author", sep = ";", remove = TRUE, convert = FALSE)
Error in simplifyPieces(pieces, n, fill == "left") : 
  argument "into" is missing, with no default

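I don't understand what the error message means. What conditions must be met in order to split "author" into multiple columns? I assumed that, since tidyr provides functions for separating values into rows or columns, it should be able to handle a table like this. Is there something we should be aware of?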

dataframe

tidyr

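1 Answer

Stack Overflow user

Accepted answer

Posted on 2017-08-24 10:05:25

separate() requires the argument into, which should be the names of the variables to create. Your call does not include that argument.

An adapted example from the help file: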

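library(dplyr)
library(tidyr)
df <- data.frame(x = c(NA, "a.b", "a.d", "b.c"))
# `into` names the new columns; the default `sep` splits on non-alphanumeric characters
separate(data = df, col = x, into = c("A", "B"))
#>      A    B
#> 1 <NA> <NA>
#> 2    a    b
#> 3    a    d
#> 4    b    c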

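You can use str_count() from stringr to determine the maximum number of authors in the author column, and then use that number to specify how many columns separate() should create. I took inspiration from this Q&A: Separate a String using Tidyr's "separate" into Multiple Columns and then Create a New Column with Counts

Here is an example using a simplified dataset: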

library(tidyr)
library(stringr)

df <- data.frame(id = c(1, 2, 3),
                 author = c("name1; name2; name3",
                            "name1; name2", "name1"))
df
#>   id              author
#> 1  1 name1; name2; name3
#> 2  2        name1; name2
#> 3  3               name1

# Number of semicolons per row; authors per row is this count plus one
str_count(df$author, ";")
#> [1] 2 1 0
max_n_authors <- max(str_count(df$author, ";")) + 1
max_n_authors
#> [1] 3

# Names for the new columns
paste("author", 1:max_n_authors)
#> [1] "author 1" "author 2" "author 3"

df <- df %>%
    separate(col = author, into = paste("author", 1:max_n_authors))
#> Warning message:
#> Too few values at 2 locations: 2, 3
df
#>   id author 1 author 2 author 3
#> 1  1    name1    name2    name3
#> 2  2    name1    name2     <NA>
#> 3  3    name1     <NA>     <NA>
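
Note that the call above relies on the default sep of separate(), which splits on any run of non-alphanumeric characters. That works for values like "name1; name2", but author strings such as "Sih G C" or "Hong YS(洪友士)" from the question would also be split at spaces and parentheses. The following is a sketch under assumptions, not a tested solution for the full data set: the rows are simplified ASCII stand-ins loosely modelled on the question's data, and the names authors, authors_wide, and the author_N column names are made up for illustration. It passes an explicit sep and uses fill = "right" so that shorter author lists are padded with NA without the "Too few values" warning.

```r
library(tidyr)
library(stringr)

# Simplified stand-in rows (assumption: ASCII-only versions of the author strings)
authors <- data.frame(
  id     = c(1, 2, 3),
  author = c("Sih G C; Hong YS", "Zhang SY; Wang L; Hong YS", "Hong YS"),
  stringsAsFactors = FALSE
)

# Largest number of authors in any row = max semicolon count + 1
max_n_authors <- max(str_count(authors$author, ";")) + 1

authors_wide <- separate(
  authors,
  col  = author,
  into = paste0("author_", seq_len(max_n_authors)),  # author_1, author_2, ...
  sep  = ";\\s*",    # split only at semicolons, dropping trailing whitespace
  fill = "right"     # pad missing pieces with NA on the right, no warning
)
authors_wide
```

Using paste0() with an underscore gives syntactic column names, so the new columns can be referenced without backticks.
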

Original content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/45858133
