文章/答案/技术大牛

发布

社区首页 >问答首页 >将类别代码值合并到R中的数据集

问将类别代码值合并到R中的数据集
EN

Stack Overflow用户

提问于 2014-12-02 02:46:00

回答 3查看 243关注 0票数 0

我有一个政治捐赠数据集，其中包含行业类别的字母数字代码。单独的文本文档列出了这些字母数字代码如何转换为行业名称、部门名称和行业类别名称。

例如，"A1200“是农业企业中的作物生产行业，属于甘蔗产业类别。我想知道如何在单独的列中将字母数字代码与它们各自的扇区、行业和类别值配对。

现在，代码值数据集位于

    Catcode Catname     Catorder    Industry             Sector      
    A1200   Sugar cane  A01         Crop Production  Agribusiness

而这个行业捐赠数据集：

Business name    Amount donated    Year   Category
Sarah Farms      1000              2010   A1200

类别数据集约为444行，捐赠集约为100万行。我如何感觉捐赠数据集，所以它看起来像这样。类别将是常见的名称。

    Catcode Catname     Catorder    Industry             Sector          Business name    Amount donated    Year   Category
    A1200   Sugar cane  A01         Crop Production  Agribusiness     Sarah Farms      1000              2010   A1200

我对这些论坛有点陌生，所以如果有更好的方式来问这个问题，请告诉我。谢谢你的帮助!

merge

回答 3

Stack Overflow用户

回答已采纳

发布于 2014-12-02 03:52:58

如果速度是问题，您可能需要使用data.table或dplyr。在这里，我对您的示例数据进行了一些修改，以提供一些想法。

df1 <- data.frame(Catcode = c("A1200", "B1500", "C1800"),
                  Catname = c("Sugar", "Salty", "Butter"),
                  Catorder = c("cane A01", "cane A01", "cane A01"),
                  Industry = c("Crop Production", "Crop Production", "Crop Production"),
                  Sector = c("Agribusiness", "Agribusiness", "Agribusiness"),
                  stringsAsFactors = FALSE)

#  Catcode Catname Catorder        Industry       Sector
#1   A1200   Sugar cane A01 Crop Production Agribusiness
#2   B1500   Salty cane A01 Crop Production Agribusiness
#3   C1800  Butter cane A01 Crop Production Agribusiness

df2 <- data.frame(BusinessName = c("Sarah Farms", "Ben Farms"),
                  AmountDonated = c(100, 200),
                  Year = c(2010, 2010),
                  Category = c("A1200", "B1500"),
                  stringsAsFactors = FALSE)

#  BusinessName AmountDonated Year Category
#1  Sarah Farms           100 2010    A1200
#2    Ben Farms           200 2010    B1500

library(dplyr)
library(data.table)

# 1) dplyr option
# Catcode C1800 will be dropped since it does not exist in both data frames.
inner_join(df1, df2, by = c("Catcode" = "Category"))

#      Catcode Catname Catorder        Industry       Sector BusinessName AmountDonated Year
#1   A1200   Sugar cane A01 Crop Production Agribusiness  Sarah Farms           100 2010
#2   B1500   Salty cane A01 Crop Production Agribusiness    Ben Farms           200 2010

# Catcide C1800 remains
left_join(df1, df2, by = c("Catcode" = "Category"))

#      Catcode Catname Catorder        Industry       Sector BusinessName AmountDonated Year
#1   A1200   Sugar cane A01 Crop Production Agribusiness  Sarah Farms           100 2010
#2   B1500   Salty cane A01 Crop Production Agribusiness    Ben Farms           200 2010
#3   C1800  Butter cane A01 Crop Production Agribusiness         <NA>            NA   NA

# 2) data.table option
# Convert data.frame to data.table
setDT(df1)
setDT(df2)

#Set columns for merge
setkey(df1, "Catcode")
setkey(df2, "Category")

df1[df2]

#   Catcode Catname Catorder        Industry       Sector BusinessName AmountDonated Year
#1:   A1200   Sugar cane A01 Crop Production Agribusiness  Sarah Farms           100 2010
#2:   B1500   Salty cane A01 Crop Production Agribusiness    Ben Farms           200 2010

df2[df1]
#   BusinessName AmountDonated Year Category Catname Catorder        Industry       Sector
#1:  Sarah Farms           100 2010    A1200   Sugar cane A01 Crop Production Agribusiness
#2:    Ben Farms           200 2010    B1500   Salty cane A01 Crop Production Agribusiness
#3:           NA            NA   NA    C1800  Butter cane A01 Crop Production Agribusiness

票数 2

Stack Overflow用户

发布于 2014-12-02 02:57:59

我想你是在问如何质疑..。不是你吗?

SELECT * 
FROM
code values dataset(your table for this) a
LEFT JOIN industry donation dataset(your table for this) b 
ON a.CatCode = b.Category

票数 0

Stack Overflow用户

发布于 2014-12-02 02:59:11

正如krlmlr所建议的：

> merge(df1, df2, by.x = "Catcode", by.y = "Category", all = T)
  Catcode    Catname Catorder        Industry       Sector Business_name Amount_donated Year
1   A1200 Sugar_cane      A01 Crop_Production Agribusiness   Sarah_Farms           1000 2010

但是，您应该避免列名和值中的空格。我用_代替了它们

票数 0

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/27241043

复制

相似问题

问将类别代码值合并到R中的数据集
EN

回答 3

Stack Overflow用户

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问将类别代码值合并到R中的数据集EN

回答 3

Stack Overflow用户

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问将类别代码值合并到R中的数据集
EN