Extend a function that takes a data.table as argument to use the full table (instead of a subset)

Stack Overflow user
Asked 2016-02-18 13:27:43
Answers: 2 · Views: 96 · Followers: 0 · Votes: 5

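I have a function that works on a one-row data.table (data.frame) but not on a full data.table. I would like to extend the function so that it handles every row of the input data.table.

The gist of the problem is as follows:

A data.table (tryshort3) has a character field in which a string needs to be replaced with another string taken from a second data.table (mapping), as follows: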

#this is the original data.table
tryshort3 <- structure(list(country = c("AT", "AT", "MT", "DE", "CH", "XK"
), name = c("ASDF AG", "ASDF GMBH", "ASDF DF", "ASDF KG", "ASDF SA", 
"ASDF DAF"), address = c("ACDSTR. 3", "ACDSTR. 4", "ACDSTR. 5", 
"ACDSTR. 6", "ACDSTR. 7", "ACDSTR. 8")), .Names = c("country", 
"name", "address"), row.names = c(NA, -6L), class = c("data.table", 
"data.frame"))



#this is the "mapping" table
mapping <- structure(list(country = c("AT", "AT", "DE", "DE", "HU"), short.form = c("AG", 
"GMBH", "GMBH", "EV", "EV"), long.form = c("AKTIENGESELLSCHAFT", 
"GESELLSCHAFT MIT BESCHRANKTER HAFTUNG", "GESELLSCHAFT MIT BESCHRANKTER HAFTUNG", 
"EINGETRAGENE VEREIN", "EGYENI VALLALKOZO")), .Names = c("country", 
"short.form", "long.form"), row.names = c(NA, -5L), class = c("data.table", 
"data.frame"), sorted = "country")


#this is the function that I am using (please note that both data.tables are keyed, but that currently has no effect on the output; it just avoids throwing an error):

substituting_short_form <- function(input) {
  #supply one data.frame of 1 row, the other data.frame is external to the function
  #get country from input
  setkey(input,country)
  setkey(mapping,country)
  matched_country <- input$country
  #subset of mapping to only the country from the input
  matched_map <- mapping[country == matched_country]
  #get list of short.forms from matched 
  list_of_relevant_short_forms <- matched_map[,short.form]
  #which one matches will return true if there is any match, THIS IS A NUMBER THAT WILL HAVE TO BE MATCHED TO mapping again to retrieve the correct form
  #error catching for when there is no short form found, or no country found if there is no long form it does not matter!
  indextrue <- tryCatch(which(unlist(lapply(list_of_relevant_short_forms, function(y) grepl(y, input$name)))), error = function(e) return(input))
  #substitute
  pattern_to_substitute <- paste0("(\\s|^)", matched_map[indextrue,short.form], "(\\s|$)")
  pattern_to_replace <- paste0("\\1", matched_map[indextrue,long.form], "\\2")
  input$name[1] <- gsub(pattern = pattern_to_substitute, replacement = pattern_to_replace, input$name, perl = TRUE)
  return(input)
}

In short, the function takes tryshort3 as input (currently only tryshort3[1,]) and, in the name field, substitutes the value found in the mapping table, like so:

> tryshort3[1,]
   country    name   address
1:      AT ASDF AG ACDSTR. 3
> substituting_short_form(tryshort3[1,])
   country                    name   address
1:      AT ASDF AKTIENGESELLSCHAFT ACDSTR. 3

What I want is to supply the full data.table as input and get the same kind of output (a data.table of the same length). This is my expected output:

   country                                       name   address
1:      AT                    ASDF AKTIENGESELLSCHAFT ACDSTR. 3
2:      AT ASDF GESELLSCHAFT MIT BESCHRANKTER HAFTUNG ACDSTR. 4
3:      CH                                    ASDF SA ACDSTR. 7
4:      DE                                    ASDF KG ACDSTR. 6
5:      MT                                    ASDF DF ACDSTR. 5
6:      XK                                   ASDF DAF ACDSTR. 8

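The kind of solution I have in mind is something along the lines of apply(tryshort3, 1, function(x) substituting_short_form(x)), perhaps making use of the indexing features of the two data.tables, or using gapply from the nlme package.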
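
For illustration, one way the row-by-row idea could be sketched: apply() coerces the table to a character matrix and hands each row over as a character vector rather than a one-row data.table, so indexing the rows explicitly and recombining them with rbindlist() may be closer to what is needed. The names below (firms, forms, expand_one_row) are hypothetical stand-ins, not the function from the question, and the sketch assumes the short forms contain no regex metacharacters.

```r
library(data.table)

# Small illustrative tables (a subset of the data in the question)
firms <- data.table(country = c("AT", "AT", "MT"),
                    name    = c("ASDF AG", "ASDF GMBH", "ASDF DF"))
forms <- data.table(country    = c("AT", "AT"),
                    short.form = c("AG", "GMBH"),
                    long.form  = c("AKTIENGESELLSCHAFT",
                                   "GESELLSCHAFT MIT BESCHRANKTER HAFTUNG"))

# Hypothetical stand-in for a function that only handles a one-row table
expand_one_row <- function(one_row, map) {
  stopifnot(nrow(one_row) == 1L)
  one_row <- copy(one_row)  # do not modify the caller's table by reference
  # mapping rows for this row's country (zero rows if the country is unknown)
  candidates <- map[map$country == one_row$country, ]
  for (k in seq_len(nrow(candidates))) {
    pattern     <- paste0("(\\s|^)", candidates$short.form[k], "(\\s|$)")
    replacement <- paste0("\\1", candidates$long.form[k], "\\2")
    one_row[, name := gsub(pattern, replacement, name, perl = TRUE)]
  }
  one_row
}

# Call the one-row function once per row, then stack the results
rows   <- lapply(seq_len(nrow(firms)), function(i) expand_one_row(firms[i], forms))
result <- rbindlist(rows)
print(result)
```

This keeps the original row order, but it calls the function once per row, so it is likely to be slow on large tables compared with a join-based approach.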

Stack Overflow user

Accepted answer

Answered 2016-02-18 14:14:00

Maybe you can try it in a few steps:

# create the shortform variable in tryshort3
tryshort3[, short.form := sub(".+\\s(\\S+)$", "\\1", name)]

# add the info from mapping
tryshort3long <- merge(tryshort3, mapping, all.x=TRUE, by=c("country", "short.form"))

# replace the short form by long form in the name and suppress the variables you don't need 
# (thanks to @DavidArenburg for the simplification of the "replace" part!)
tryshort3long[!is.na(long.form), 
              name := paste(sub(" .*", "", name), long.form)
              ][, c("long.form", "short.form") := NULL]

tryshort3long
#    country                                       name   address
# 1:      AT                    ASDF AKTIENGESELLSCHAFT ACDSTR. 3
# 2:      AT ASDF GESELLSCHAFT MIT BESCHRANKTER HAFTUNG ACDSTR. 4
# 3:      CH                                    ASDF SA ACDSTR. 7
# 4:      DE                                    ASDF KG ACDSTR. 6
# 5:      MT                                    ASDF DF ACDSTR. 5
# 6:      XK                                   ASDF DAF ACDSTR. 8

Note: sorry, I only applied it to your example data.table rather than writing it as a function.
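
The steps above could be wrapped into a function along these lines. This is a sketch, not part of the original answer: the name expand_short_forms, the helper column row_id, and the sample tables firms and forms are illustrative assumptions. It also assumes that the short form is always the last whitespace-separated token of name and that each (country, short.form) pair appears only once in the mapping.

```r
library(data.table)

# Hypothetical wrapper: expands a trailing short form in `name`
# using a country-specific lookup table.
expand_short_forms <- function(dt, map) {
  out <- copy(dt)          # avoid modifying the caller's table by reference
  out[, row_id := .I]      # remember the original row order
  # last whitespace-separated token of the name, e.g. "AG" in "ASDF AG"
  out[, short.form := sub(".*\\s(\\S+)$", "\\1", name, perl = TRUE)]
  # look up the long form per (country, short.form); unmatched rows get NA
  out <- merge(out, map, by = c("country", "short.form"), all.x = TRUE)
  # swap the trailing short form for the long form where a match exists
  out[!is.na(long.form),
      name := paste(sub("\\s+\\S+$", "", name, perl = TRUE), long.form)]
  setorder(out, row_id)    # merge() sorts by the join columns; restore input order
  out[, c("row_id", "short.form", "long.form") := NULL]
  out[]
}

# Sample data modelled on the tables in the question
firms <- data.table(
  country = c("AT", "AT", "MT", "DE", "CH", "XK"),
  name    = c("ASDF AG", "ASDF GMBH", "ASDF DF", "ASDF KG", "ASDF SA", "ASDF DAF"),
  address = paste("ACDSTR.", 3:8)
)
forms <- data.table(
  country    = c("AT", "AT", "DE", "DE", "HU"),
  short.form = c("AG", "GMBH", "GMBH", "EV", "EV"),
  long.form  = c("AKTIENGESELLSCHAFT",
                 "GESELLSCHAFT MIT BESCHRANKTER HAFTUNG",
                 "GESELLSCHAFT MIT BESCHRANKTER HAFTUNG",
                 "EINGETRAGENE VEREIN", "EGYENI VALLALKOZO")
)

res <- expand_short_forms(firms, forms)
print(res)
```

Unlike the step-by-step version, this sketch works on a copy and restores the input row order, so the table passed in is left untouched.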

Votes: 4
Original page content provided by Stack Overflow; translation support provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link: https://stackoverflow.com/questions/35482781
