我有一个政治捐赠数据集,其中包含行业类别的字母数字代码。单独的文本文档列出了这些字母数字代码如何转换为行业名称、部门名称和行业类别名称。
例如,"A1200“是农业企业中的作物生产行业,属于甘蔗产业类别。我想知道如何在单独的列中将字母数字代码与它们各自的扇区、行业和类别值配对。
现在,代码值数据集位于
Catcode Catname Catorder Industry Sector
A1200 Sugar cane A01 Crop Production Agribusiness而这个行业捐赠数据集:
Business name Amount donated Year Category
Sarah Farms 1000 2010 A1200类别数据集约为444行,捐赠集约为100万行。我如何感觉捐赠数据集,所以它看起来像这样。类别将是常见的名称。
Catcode Catname Catorder Industry Sector Business name Amount donated Year Category
A1200 Sugar cane A01 Crop Production Agribusiness Sarah Farms 1000 2010 A1200我对这些论坛有点陌生,所以如果有更好的方式来问这个问题,请告诉我。谢谢你的帮助!
发布于 2014-12-02 03:52:58
如果速度是问题,您可能需要使用data.table或dplyr。在这里,我对您的示例数据进行了一些修改,以提供一些想法。
df1 <- data.frame(Catcode = c("A1200", "B1500", "C1800"),
Catname = c("Sugar", "Salty", "Butter"),
Catorder = c("cane A01", "cane A01", "cane A01"),
Industry = c("Crop Production", "Crop Production", "Crop Production"),
Sector = c("Agribusiness", "Agribusiness", "Agribusiness"),
stringsAsFactors = FALSE)
# Catcode Catname Catorder Industry Sector
#1 A1200 Sugar cane A01 Crop Production Agribusiness
#2 B1500 Salty cane A01 Crop Production Agribusiness
#3 C1800 Butter cane A01 Crop Production Agribusiness
df2 <- data.frame(BusinessName = c("Sarah Farms", "Ben Farms"),
AmountDonated = c(100, 200),
Year = c(2010, 2010),
Category = c("A1200", "B1500"),
stringsAsFactors = FALSE)
# BusinessName AmountDonated Year Category
#1 Sarah Farms 100 2010 A1200
#2 Ben Farms 200 2010 B1500
library(dplyr)
library(data.table)
# 1) dplyr option
# Catcode C1800 will be dropped since it does not exist in both data frames.
inner_join(df1, df2, by = c("Catcode" = "Category"))
# Catcode Catname Catorder Industry Sector BusinessName AmountDonated Year
#1 A1200 Sugar cane A01 Crop Production Agribusiness Sarah Farms 100 2010
#2 B1500 Salty cane A01 Crop Production Agribusiness Ben Farms 200 2010
# Catcide C1800 remains
left_join(df1, df2, by = c("Catcode" = "Category"))
# Catcode Catname Catorder Industry Sector BusinessName AmountDonated Year
#1 A1200 Sugar cane A01 Crop Production Agribusiness Sarah Farms 100 2010
#2 B1500 Salty cane A01 Crop Production Agribusiness Ben Farms 200 2010
#3 C1800 Butter cane A01 Crop Production Agribusiness <NA> NA NA
# 2) data.table option
# Convert data.frame to data.table
setDT(df1)
setDT(df2)
#Set columns for merge
setkey(df1, "Catcode")
setkey(df2, "Category")
df1[df2]
# Catcode Catname Catorder Industry Sector BusinessName AmountDonated Year
#1: A1200 Sugar cane A01 Crop Production Agribusiness Sarah Farms 100 2010
#2: B1500 Salty cane A01 Crop Production Agribusiness Ben Farms 200 2010
df2[df1]
# BusinessName AmountDonated Year Category Catname Catorder Industry Sector
#1: Sarah Farms 100 2010 A1200 Sugar cane A01 Crop Production Agribusiness
#2: Ben Farms 200 2010 B1500 Salty cane A01 Crop Production Agribusiness
#3: NA NA NA C1800 Butter cane A01 Crop Production Agribusiness发布于 2014-12-02 02:57:59
我想你是在问如何质疑..。不是你吗?
SELECT *
FROM
code values dataset(your table for this) a
LEFT JOIN industry donation dataset(your table for this) b
ON a.CatCode = b.Category发布于 2014-12-02 02:59:11
正如krlmlr所建议的:
> merge(df1, df2, by.x = "Catcode", by.y = "Category", all = T)
Catcode Catname Catorder Industry Sector Business_name Amount_donated Year
1 A1200 Sugar_cane A01 Crop_Production Agribusiness Sarah_Farms 1000 2010但是,您应该避免列名和值中的空格。我用_代替了它们
https://stackoverflow.com/questions/27241043
复制相似问题