首页
学习
活动
专区
圈层
工具
发布
首页
学习
活动
专区
圈层
工具
MCP广场
社区首页 >问答首页 >将类别代码值合并到R中的数据集

将类别代码值合并到R中的数据集
EN

Stack Overflow用户
提问于 2014-12-02 02:46:00
回答 3查看 243关注 0票数 0

我有一个政治捐赠数据集,其中包含行业类别的字母数字代码。单独的文本文档列出了这些字母数字代码如何转换为行业名称、部门名称和行业类别名称。

例如,"A1200“是农业企业中的作物生产行业,属于甘蔗产业类别。我想知道如何在单独的列中将字母数字代码与它们各自的扇区、行业和类别值配对。

现在,代码值数据集位于

代码语言:javascript
运行
复制
    Catcode Catname     Catorder    Industry             Sector      
    A1200   Sugar cane  A01         Crop Production  Agribusiness

而这个行业捐赠数据集:

代码语言:javascript
运行
复制
Business name    Amount donated    Year   Category
Sarah Farms      1000              2010   A1200

类别数据集约为444行,捐赠集约为100万行。我如何感觉捐赠数据集,所以它看起来像这样。类别将是常见的名称。

代码语言:javascript
运行
复制
    Catcode Catname     Catorder    Industry             Sector          Business name    Amount donated    Year   Category
    A1200   Sugar cane  A01         Crop Production  Agribusiness     Sarah Farms      1000              2010   A1200

我对这些论坛有点陌生,所以如果有更好的方式来问这个问题,请告诉我。谢谢你的帮助!

EN

回答 3

Stack Overflow用户

回答已采纳

发布于 2014-12-02 03:52:58

如果速度是问题,您可能需要使用data.tabledplyr。在这里,我对您的示例数据进行了一些修改,以提供一些想法。

代码语言:javascript
运行
复制
df1 <- data.frame(Catcode = c("A1200", "B1500", "C1800"),
                  Catname = c("Sugar", "Salty", "Butter"),
                  Catorder = c("cane A01", "cane A01", "cane A01"),
                  Industry = c("Crop Production", "Crop Production", "Crop Production"),
                  Sector = c("Agribusiness", "Agribusiness", "Agribusiness"),
                  stringsAsFactors = FALSE)

#  Catcode Catname Catorder        Industry       Sector
#1   A1200   Sugar cane A01 Crop Production Agribusiness
#2   B1500   Salty cane A01 Crop Production Agribusiness
#3   C1800  Butter cane A01 Crop Production Agribusiness

df2 <- data.frame(BusinessName = c("Sarah Farms", "Ben Farms"),
                  AmountDonated = c(100, 200),
                  Year = c(2010, 2010),
                  Category = c("A1200", "B1500"),
                  stringsAsFactors = FALSE)

#  BusinessName AmountDonated Year Category
#1  Sarah Farms           100 2010    A1200
#2    Ben Farms           200 2010    B1500

library(dplyr)
library(data.table)

# 1) dplyr option
# Catcode C1800 will be dropped since it does not exist in both data frames.
inner_join(df1, df2, by = c("Catcode" = "Category"))

#      Catcode Catname Catorder        Industry       Sector BusinessName AmountDonated Year
#1   A1200   Sugar cane A01 Crop Production Agribusiness  Sarah Farms           100 2010
#2   B1500   Salty cane A01 Crop Production Agribusiness    Ben Farms           200 2010

# Catcide C1800 remains
left_join(df1, df2, by = c("Catcode" = "Category"))

#      Catcode Catname Catorder        Industry       Sector BusinessName AmountDonated Year
#1   A1200   Sugar cane A01 Crop Production Agribusiness  Sarah Farms           100 2010
#2   B1500   Salty cane A01 Crop Production Agribusiness    Ben Farms           200 2010
#3   C1800  Butter cane A01 Crop Production Agribusiness         <NA>            NA   NA

# 2) data.table option
# Convert data.frame to data.table
setDT(df1)
setDT(df2)

#Set columns for merge
setkey(df1, "Catcode")
setkey(df2, "Category")

df1[df2]

#   Catcode Catname Catorder        Industry       Sector BusinessName AmountDonated Year
#1:   A1200   Sugar cane A01 Crop Production Agribusiness  Sarah Farms           100 2010
#2:   B1500   Salty cane A01 Crop Production Agribusiness    Ben Farms           200 2010

df2[df1]
#   BusinessName AmountDonated Year Category Catname Catorder        Industry       Sector
#1:  Sarah Farms           100 2010    A1200   Sugar cane A01 Crop Production Agribusiness
#2:    Ben Farms           200 2010    B1500   Salty cane A01 Crop Production Agribusiness
#3:           NA            NA   NA    C1800  Butter cane A01 Crop Production Agribusiness
票数 2
EN

Stack Overflow用户

发布于 2014-12-02 02:57:59

我想你是在问如何质疑..。不是你吗?

代码语言:javascript
运行
复制
SELECT * 
FROM
code values dataset(your table for this) a
LEFT JOIN industry donation dataset(your table for this) b 
ON a.CatCode = b.Category
票数 0
EN

Stack Overflow用户

发布于 2014-12-02 02:59:11

正如krlmlr所建议的:

代码语言:javascript
运行
复制
> merge(df1, df2, by.x = "Catcode", by.y = "Category", all = T)
  Catcode    Catname Catorder        Industry       Sector Business_name Amount_donated Year
1   A1200 Sugar_cane      A01 Crop_Production Agribusiness   Sarah_Farms           1000 2010

但是,您应该避免列名和值中的空格。我用_代替了它们

票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/27241043

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档