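I want to combine several columns into a single semicolon-separated column that holds a list (or something like a Python dictionary).
Basically, I have a data frame like this (blanks are missing values):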
ID  Event Category  Start Time  End Time   Account No.  Dosage    Doctor's_ID
1   Stroke          1/1/2011
1   Admitted        1/6/2011               24287939               5487
1   Diagnosed       1/25/2011
6   Diagnosed       1/1/2011
6   Drug A          1/2/2011    1/10/2011               "high"
6   Drug B          1/7/2011    1/20/2011  35287930     "medium"
10  Drug A          1/3/2011    1/6/2011                "low"
10  Drug B          1/9/2011    1/13/2011               "high"
10  Stroke          1/8/2011
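I want to create an attributes column that merges several of these columns into one, with the entries separated by semicolons.
The output (it can be a text file) should look like this: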
ID  Event Category  Start Time  End Time   attributes
1   Stroke          1/1/2011
1   Admitted        1/6/2011               Account No.="24287939"; Doctor's_ID="5487"
1   Diagnosed       1/25/2011
6   Diagnosed       1/1/2011
6   Drug A          1/2/2011    1/10/2011  Dosage="high"
6   Drug B          1/7/2011    1/20/2011  Account No.="35287930"; Dosage="medium"
10  Drug A          1/3/2011    1/6/2011   Dosage="low"
10  Drug B          1/9/2011    1/13/2011  Dosage="high"
10  Stroke          1/8/2011
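My goal is to write a text file whose columns are separated by tabs ("\t"), with the attribute data in the last column formatted as a list separated by ";".
For more details on the required output, see http://www.cs.umd.edu/hcil/eventflow/manual/chapter_start.html#1.4
How can I do this in R?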
Posted on 2018-06-09 06:07:45
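One option is to use the apply function and pass it the data of the last 3 columns row by row. The benefit of apply is that each row reaches the function as a named vector whose names match the column names.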
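To see what apply actually hands to the function, here is a small sketch on a hypothetical two-column data frame (the names d, Doctor_ID, row_names and row_nas are illustrative, not from the question). apply first turns the data frame into a character matrix, so every row arrives as a character vector named after the columns, with missing values still NA.

```r
# Hypothetical two-row data frame, used only to inspect what apply() passes
d <- data.frame(Dosage = c("high", NA), Doctor_ID = c(NA, "5487"),
                stringsAsFactors = FALSE)

# apply() converts d to a character matrix and calls FUN once per row;
# each x is a character vector whose names are the column names
row_names <- apply(d, 1, function(x) paste(names(x), collapse = ","))
row_nas   <- apply(d, 1, function(x) sum(is.na(x)))

print(row_names)  # "Dosage,Doctor_ID" for both rows
print(row_nas)    # each row contains exactly one NA
```

Because the names travel with the values, the function can build "name=value" text without looking up column names separately.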
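Each row then needs two paste steps: after dropping the NA values, paste with sep = "=" joins each name of the named vector to its value, and then the collapse = ";" argument of paste0 merges those pairs into a single string. The solution is: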
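# Keep columns 1-4 and add an Attribute column built from columns 5-7
cbind(df[1:4], Attribute =
        apply(df[, 5:7], 1, function(x) {   # x: one row as a named character vector
          keep <- !is.na(x)                  # drop the missing values in this row
          paste0(paste(names(x)[keep], x[keep], sep = "="), collapse = ";")
        }))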
# ID Event.Category Start.Time End.Time Attribute
# 1 1 Stroke 1/1/2011 <NA>
# 2 1 Admitted 1/6/2011 <NA> Account.No.=24287939;Doctor.s_ID=5487
# 3 1 Diagnosed 1/25/2011 <NA>
# 4 6 Diagnosed 1/1/2011 <NA>
# 5 6 Drug A 1/2/2011 1/10/2011 Dosage=high
# 6 6 Drug B 1/7/2011 1/20/2011 Account.No.=35287930;Dosage=medium
# 7 10 Drug A 1/3/2011 1/6/2011 Dosage=low
# 8 10 Drug B 1/9/2011 1/13/2011 Dosage=high
# 9 10 Stroke 1/8/2011 <NA>
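The output above uses name=value pairs joined by ";" with no quotes, while the question's sample shows Account No.="24287939"; Doctor's_ID="5487". A minimal sketch of that exact format follows, assuming the attribute columns are stored as character and the original column names are kept with check.names = FALSE (df2 and attr_cols are hypothetical names; the rows are a subset of the question's data). sprintf is used instead of paste0 because sprintf returns a zero-length result when a row has no values, so all-NA rows collapse to "" rather than to a stray =""; keeping the columns as character also avoids the width padding that as.matrix() may apply to numeric columns.

```r
# Hypothetical subset of the question's rows; check.names = FALSE keeps the
# original column names "Account No." and "Doctor's_ID"
df2 <- data.frame(ID = c(1, 6, 10),
                  "Account No." = c("24287939", "35287930", NA),
                  Dosage = c(NA, "medium", NA),
                  "Doctor's_ID" = c("5487", NA, NA),
                  check.names = FALSE, stringsAsFactors = FALSE)

attr_cols <- c("Account No.", "Dosage", "Doctor's_ID")

df2$attributes <- apply(df2[attr_cols], 1, function(x) {
  keep <- !is.na(x)                                # ignore missing values
  pairs <- sprintf('%s="%s"', names(x)[keep], x[keep])  # name="value"
  paste(pairs, collapse = "; ")                    # "" when pairs is empty
})

print(df2$attributes)
```

On the question's full data, passing check.names = FALSE to read.table would likewise preserve the original names for use in the pairs.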
Data:
df <- read.table(text =
'ID "Event Category" "Start Time" "End Time" "Account No." Dosage Doctor\'s_ID
1 Stroke 1/1/2011 NA NA NA NA
1 Admitted 1/6/2011 NA 24287939 NA 5487
1 Diagnosed 1/25/2011 NA NA NA NA
6 Diagnosed 1/1/2011 NA NA NA NA
6 "Drug A" 1/2/2011 1/10/2011 NA "high" NA
6 "Drug B" 1/7/2011 1/20/2011 35287930 "medium" NA
10 "Drug A" 1/3/2011 1/6/2011 NA "low" NA
10 "Drug B" 1/9/2011 1/13/2011 NA "high" NA
10 Stroke 1/8/2011 NA NA NA NA',
stringsAsFactors = FALSE, header = TRUE)
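The question also asks for a tab-separated text file. A sketch using base R's write.table, assuming a result data frame shaped like the one built above (out and path are hypothetical names, and a temporary file stands in for the real output path): sep = "\t" separates the columns with tabs, quote = FALSE writes the embedded double quotes in attributes verbatim, and na = "" leaves missing values as empty fields.

```r
# Hypothetical two-row result in the shape built above
out <- data.frame(ID = c(1, 6),
                  "Event Category" = c("Admitted", "Drug B"),
                  "Start Time" = c("1/6/2011", "1/7/2011"),
                  "End Time" = c(NA, "1/20/2011"),
                  attributes = c('Account No.="24287939"; Doctor\'s_ID="5487"',
                                 'Account No.="35287930"; Dosage="medium"'),
                  check.names = FALSE, stringsAsFactors = FALSE)

path <- tempfile(fileext = ".txt")   # stand-in for the real output file
# Tab-separated, no added quoting, NA written as an empty field, no row names
write.table(out, path, sep = "\t", quote = FALSE, na = "", row.names = FALSE)

lines <- readLines(path)
print(lines)
```

Whether the target tool expects a header line is not stated here; col.names = FALSE drops it if needed.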
https://stackoverflow.com/questions/50768747