I have the following dataset:
df<-data.frame(
identifer=c(1,2,3,4),
DF=c("Tablet","Powder","Suspension","System"),
DF_source1=c("Capsule","Powder,Metered","Tablet",NA),
DF_source2=c(NA,NA,"Tablet",NA),
DF_source3=c("Tablet, Extended Release","Liquid","Tablet",NA),
Route_source1=c("Oral","INHALATION","Oral",NA),
Route_source2=c(NA,"TOPICAL","Oral",NA),
Route_source3=c("Oral","IRRIGATION","oral",NA))
I'd like to find out which DF_source matches DF, and I'd also like to know which associated route I should take.
I'd like the output to look like this:
df_out<-data.frame(
identifer=c(1,2,3,4),
DF=c("Tablet","Powder","Suspension","System"),
DF_match=c("Tablet, Extended Release","Powder,Metered;Powder",NA,NA),
Route_match=c("Oral","INHALATION;TOPICAL",NA,NA),
DF_match_count=c(1,2,0,0),
DF_route_count=c(1,2,0,0))
I tried the following, but I'm not sure how to pull out the DF_match and Route_match values:
df%>%mutate_at(vars(matches("(DF_source)")),
list(string_detect = ~str_detect(tolower(DF),tolower(str_replace_all(.,"/|,(\\s)?|(?<!,)\\s","|")))))
Any help would be greatly appreciated, thanks!
Posted on 2020-06-11 04:02:19
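I'm not entirely sure this is what you had in mind, but hopefully it helps.
Your desired output doesn't seem to match your example data (e.g., DF_source2 is NA for identifer 2, so no source is paired with TOPICAL).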
With pivot_longer, this may be easier to do once the data is in a tidier (long) form.
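For readers who want to see the long layout without tidyr, the same reshaping idea can be sketched in base R with `stats::reshape()`. This swaps in a different function from the answer's tidyr code, and it assumes each DF_sourceN column pairs with the Route_sourceN column that has the same N. The names `long` and `hits` are just illustrative.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps text as character
df <- data.frame(
  identifer = c(1, 2, 3, 4),
  DF = c("Tablet", "Powder", "Suspension", "System"),
  DF_source1 = c("Capsule", "Powder,Metered", "Tablet", NA),
  DF_source2 = c(NA, NA, "Tablet", NA),
  DF_source3 = c("Tablet, Extended Release", "Liquid", "Tablet", NA),
  Route_source1 = c("Oral", "INHALATION", "Oral", NA),
  Route_source2 = c(NA, "TOPICAL", "Oral", NA),
  Route_source3 = c("Oral", "IRRIGATION", "oral", NA),
  stringsAsFactors = FALSE
)

# Wide -> long: one row per identifer and source number (1, 2, 3),
# with the paired DF_source and Route_source values side by side
long <- reshape(
  df,
  direction = "long",
  varying = list(paste0("DF_source", 1:3), paste0("Route_source", 1:3)),
  v.names = c("DF_source", "Route_source"),
  timevar = "number",
  idvar = "identifer"
)

# Keep rows whose DF_source contains the DF text as a fixed substring;
# grepl() returns FALSE for NA sources, so those rows drop out
hits <- long[mapply(grepl, long$DF, long$DF_source,
                    MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE), ]
hits[, c("identifer", "DF", "DF_source", "Route_source")]
```

On the data as posted, `long` has 12 rows (4 identifers × 3 sources) and `hits` keeps two of them, one for identifer 1 and one for identifer 2.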
Edit: if the columns are factors, convert them to character before using str_detect in filter.
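This mainly matters on R versions before 4.0.0, where `data.frame()` defaulted to `stringsAsFactors = TRUE`. A base R equivalent of the answer's `mutate_if(is.factor, as.character)` step might look like the sketch below; the two-column data frame is a made-up example, not the question's full data.

```r
# Hypothetical small example built with factors, as data.frame() did by default before R 4.0.0
df <- data.frame(
  DF = c("Tablet", "Powder"),
  DF_source1 = c("Capsule", "Powder,Metered"),
  stringsAsFactors = TRUE
)

# Convert every factor column to character and leave other columns untouched;
# assigning into df[] keeps the data.frame structure and column names
df[] <- lapply(df, function(x) if (is.factor(x)) as.character(x) else x)

str(df)
```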
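library(tidyverse)
library(stringr)

df %>%
  # on R < 4.0, data.frame() creates factors by default; str_detect needs character
  mutate_if(is.factor, as.character) %>%
  # wide -> long: DF_source1..3 / Route_source1..3 become DF_source / Route_source,
  # with the trailing digit stored in "number"
  pivot_longer(cols = -c(identifer, DF), names_to = c(".value", "number"), names_pattern = "(\\w+)(\\d+)") %>%
  # keep rows whose source contains the DF text (rows where str_detect gives NA are dropped)
  filter(str_detect(DF_source, DF)) %>%
  group_by(identifer) %>%
  # collapse the matching sources and their routes into one string per identifer
  summarise(DF_match = paste(DF_source, collapse = ';'),
            Route_match = paste(Route_source, collapse = ';'),
            match_count = n()) %>%
  # bring back identifers with no match (their new columns are NA)
  right_join(df[,c("identifer", "DF")], by = "identifer") %>%
  select(c(identifer, DF, DF_match, Route_match, match_count))
Output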
# A tibble: 4 x 5
identifer DF DF_match Route_match match_count
<dbl> <chr> <chr> <chr> <int>
1 1 Tablet Tablet, Extended Release Oral 1
2 2 Powder Powder,Metered;Powder INHALATION;TOPICAL 2
3 3 Suspension NA NA NA
4 4 System NA NA NA
https://stackoverflow.com/questions/62309510
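A different, base-R-only sketch works directly on the wide columns, which is closer to the question's `mutate_at` attempt. It assumes each DF_sourceN column is paired with the Route_sourceN column of the same number, matches DF as a case-insensitive fixed substring (mirroring the question's use of `tolower()`), and reports a count of 0 instead of NA when nothing matches, as in the desired output. The helper `collapse_hits` is hypothetical, introduced only for this illustration.

```r
df <- data.frame(
  identifer = c(1, 2, 3, 4),
  DF = c("Tablet", "Powder", "Suspension", "System"),
  DF_source1 = c("Capsule", "Powder,Metered", "Tablet", NA),
  DF_source2 = c(NA, NA, "Tablet", NA),
  DF_source3 = c("Tablet, Extended Release", "Liquid", "Tablet", NA),
  Route_source1 = c("Oral", "INHALATION", "Oral", NA),
  Route_source2 = c(NA, "TOPICAL", "Oral", NA),
  Route_source3 = c("Oral", "IRRIGATION", "oral", NA),
  stringsAsFactors = FALSE
)

# Source columns and their paired route columns (DF_source1 -> Route_source1, ...)
src_cols   <- grep("^DF_source\\d+$", names(df), value = TRUE)
route_cols <- sub("^DF_", "Route_", src_cols)

# Logical matrix (rows x sources): TRUE where the source contains DF,
# ignoring case; NA sources give FALSE
hit <- vapply(src_cols, function(col)
  mapply(function(pat, x) grepl(tolower(pat), tolower(x), fixed = TRUE),
         df$DF, df[[col]], USE.NAMES = FALSE),
  logical(nrow(df)))

# Hypothetical helper: for each row, paste the values of the matching columns
# with ";" and return NA when the row has no match
collapse_hits <- function(cols) {
  vapply(seq_len(nrow(df)), function(i) {
    vals <- unlist(df[i, cols[hit[i, ]]], use.names = FALSE)
    if (length(vals) == 0) NA_character_ else paste(vals, collapse = ";")
  }, character(1))
}

out <- data.frame(
  identifer      = df$identifer,
  DF             = df$DF,
  DF_match       = collapse_hits(src_cols),
  Route_match    = collapse_hits(route_cols),
  DF_match_count = rowSums(hit),
  stringsAsFactors = FALSE
)
out
```

On the data exactly as posted, the counts come out as 1, 1, 0, 0: identifer 2 gets a single match because its DF_source2 is NA.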