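I have one dataset, which is the base, and 7 other datasets covering 7 different years for 3 different regions. These datasets contain the amount, region, and year, which are the columns they share with the base data.
However, I need to merge the 7 datasets into the base dataset one by one. How can I do this?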
Base dataset:
company_region  raised_amount_usd  Year
SF Bay Area     1000050            2011
SF Bay Area     2520000            2011
SF Bay Area     15000              2010
Singapore       615000             2011
2007 data:
raised_amount_usd  z  e  Year  company_region
1.00E+06           5  0  2007  Singapore
8.00E+06           6  1  2007  Singapore
50000              3  0  2007  Singapore
35000              3  0  2007  Singapore
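Similarly, I have data for the other years, 2008-2012. I need the columns z and e in my base dataset. Instead of writing 7 merge statements, how can I do this with a single function?
It would be great if someone could help out. Thanks in advance!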
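Posted on 2016-08-08 20:42:02
If you want to keep the z and e columns, bind_rows() from the dplyr package seems to be the answer (see also "Combine two data frames by rows (rbind) when they have different sets of columns"). It matches columns by name and fills columns that are missing from a data frame with NA.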
# Create example base data (same rows as in the question)
base <- data.frame(
  company_region    = c(rep("SF Bay Area", 3), "Singapore"),
  raised_amount_usd = c(1000050, 2520000, 15000, 615000),
  Year              = c(2011, 2011, 2010, 2011),
  stringsAsFactors  = FALSE
)
# Example yearly data with the extra columns e and z
data_germany <- data.frame(
  company_region    = rep("Germany", 4),
  raised_amount_usd = c(100055, 2524400, 150020, 68880),
  Year              = rep(2007, 4),
  e                 = rep(1, 4),
  z                 = rep(1, 4),
  stringsAsFactors  = FALSE
)
data_italy <- data.frame(
  company_region    = rep("Italy", 4),
  raised_amount_usd = c(100055, 2524400, 150020, 68880),
  Year              = rep(2007, 4),
  e                 = rep(1, 4),
  z                 = rep(1, 4),
  stringsAsFactors  = FALSE
)
# Bind the German and Italian data in one go with dplyr
library(dplyr)
base <- base %>%
  bind_rows(data_germany) %>%
  bind_rows(data_italy)
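With seven yearly data frames, chaining seven bind_rows() calls gets repetitive. bind_rows() also accepts a list of data frames, so everything can be combined in a single call. Below is a minimal sketch, assuming the yearly data frames are collected in a list; the names `data_2007`, `data_2008`, `yearly`, and `combined` are hypothetical and the values are only illustrative.

```r
library(dplyr)

# Base data: has no z or e columns
base <- data.frame(
  company_region    = c("SF Bay Area", "Singapore"),
  raised_amount_usd = c(1000050, 615000),
  Year              = c(2011, 2011),
  stringsAsFactors  = FALSE
)

# Hypothetical yearly data frames (column order differs from base on purpose)
data_2007 <- data.frame(
  raised_amount_usd = c(1e6, 8e6),
  z                 = c(5, 6),
  e                 = c(0, 1),
  Year              = c(2007, 2007),
  company_region    = c("Singapore", "Singapore"),
  stringsAsFactors  = FALSE
)
data_2008 <- data.frame(
  raised_amount_usd = 50000,
  z                 = 3,
  e                 = 0,
  Year              = 2008,
  company_region    = "Singapore",
  stringsAsFactors  = FALSE
)

# Collect the yearly data frames in a list (seven of them in the real case)
yearly <- list(data_2007, data_2008)

# One call: columns are matched by name, missing z/e in base become NA
combined <- bind_rows(c(list(base), yearly))
```

The rows that come from `base` end up with NA in z and e, while the yearly rows keep their values.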
If you do not want to keep z and e, you can do this:
# Function to extend the base data frame
# base_df = base data frame to extend (assumed to have exactly the three columns below)
# add_df  = data frame whose rows should be appended to the base data frame
library(dplyr)
fun_extent_data <- function(base_df, add_df) {
  # Keep only the columns shared with the base data frame
  add_df_light <- add_df %>%
    select(company_region, raised_amount_usd, Year)
  # Append the rows to the base data frame and return the result
  rbind.data.frame(base_df, add_df_light, stringsAsFactors = FALSE)
}
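# Use the function for a single data frame
base <- fun_extent_data(base, data_germany)
# Alternatively (instead of the call above, so the German rows are not added twice),
# add the German and Italian data in one pipeline with dplyr
library(dplyr)
base <- base %>%
  fun_extent_data(data_germany) %>%
  fun_extent_data(data_italy)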
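The same chaining can be written once for any number of data frames with Reduce(). This is a minimal base R sketch, assuming every yearly data frame contains at least the columns of the base data frame; `append_common`, `data_2007`, `data_2008`, `yearly`, and `combined` are hypothetical names that are not part of the answer above.

```r
# Base data with the three shared columns
base <- data.frame(
  company_region    = c("SF Bay Area", "Singapore"),
  raised_amount_usd = c(1000050, 615000),
  Year              = c(2011, 2011),
  stringsAsFactors  = FALSE
)

# Hypothetical yearly data frames with the extra columns z and e
data_2007 <- data.frame(
  raised_amount_usd = c(1e6, 8e6),
  z                 = c(5, 6),
  e                 = c(0, 1),
  Year              = c(2007, 2007),
  company_region    = c("Singapore", "Singapore"),
  stringsAsFactors  = FALSE
)
data_2008 <- data.frame(
  raised_amount_usd = 50000,
  z                 = 3,
  e                 = 0,
  Year              = 2008,
  company_region    = "Singapore",
  stringsAsFactors  = FALSE
)

# Append add_df to base_df, keeping only the columns base_df has (in its order)
append_common <- function(base_df, add_df) {
  rbind(base_df, add_df[, names(base_df), drop = FALSE])
}

yearly <- list(data_2007, data_2008)

# Reduce() calls append_common(base, data_2007), then feeds that result
# together with data_2008 into the next call, and so on
combined <- Reduce(append_common, yearly, base)
```

Here z and e are dropped, so `combined` has the same three columns as `base`.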
https://stackoverflow.com/questions/38828768