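Purpose: create dummy variables quickly so that the data can be used with a wider range of algorithms.
Many packages do something similar, but fastDummies is comfortable and simple to use. Its main function, dummy_cols, takes the following arguments: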
dummy_cols(
  .data,
  select_columns = NULL,
  remove_first_dummy = FALSE,
  remove_most_frequent_dummy = FALSE,
  ignore_na = FALSE,
  split = NULL,
  remove_selected_columns = FALSE
)
library(fastDummies)

crime <- data.frame(city = c("SF", "SF", "NYC"),
                    year = c(1990, 2000, 1990),
                    crime = 1:3)
# By default only character and factor columns (here: city) are converted
dummy_cols(crime)
# Include year column
dummy_cols(crime, select_columns = c("city", "year"))
# Remove first dummy for each pair of dummy columns made
dummy_cols(crime, select_columns = c("city", "year"),
           remove_first_dummy = TRUE)
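The examples above cover select_columns and remove_first_dummy. As a minimal sketch of two of the other arguments, ignore_na and remove_selected_columns, the block below uses a hypothetical data frame crime_na (not from the original post) that adds one missing city. The resulting column order may differ between fastDummies versions, so only the column names are inspected.

```r
library(fastDummies)

# Hypothetical data: same shape as the crime example, plus one missing city.
crime_na <- data.frame(city = c("SF", "SF", "NYC", NA),
                       year = c(1990, 2000, 1990, 2000),
                       crime = 1:4)

# Default (ignore_na = FALSE): the missing value gets its own column, city_NA.
with_na <- dummy_cols(crime_na, select_columns = "city")

# ignore_na = TRUE: no city_NA column is created.
without_na <- dummy_cols(crime_na, select_columns = "city", ignore_na = TRUE)

# remove_selected_columns = TRUE: the original city column is dropped,
# leaving only its dummy columns next to year and crime.
only_dummies <- dummy_cols(crime_na, select_columns = "city",
                           ignore_na = TRUE,
                           remove_selected_columns = TRUE)

names(with_na)
names(without_na)
names(only_dummies)
```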
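For batch processing, the call can be wrapped in a custom function and combined with other steps, so the same settings are applied to many columns or data sets at once.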
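# Wrap dummy_cols so that the same settings can be reused.
# data: a data frame; to_dumvar: character vector of columns to encode.
to_dummy <- function(data, to_dumvar) {
  # Call through the namespace instead of attaching the package inside the function
  data_dum <- fastDummies::dummy_cols(data,
                                      select_columns = to_dumvar,
                                      remove_most_frequent_dummy = TRUE,  # most common level becomes the baseline
                                      ignore_na = TRUE,                   # no separate dummy column for NA
                                      split = "_",                        # a cell such as "a_b" counts as both a and b
                                      remove_selected_columns = TRUE)     # drop the original columns
  return(data_dum)
}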
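One way the wrapper might be used in batch is sketched below. The two small tables crime_a and crime_b are hypothetical, and to_dummy is restated in compact form only so that the block runs on its own.

```r
library(fastDummies)

# Restated from the text above so that this block is self-contained.
to_dummy <- function(data, to_dumvar) {
  dummy_cols(data,
             select_columns = to_dumvar,
             remove_most_frequent_dummy = TRUE,
             ignore_na = TRUE,
             split = "_",
             remove_selected_columns = TRUE)
}

# Hypothetical example data: two tables that share a city column.
crime_a <- data.frame(city = c("SF", "SF", "NYC"), crime = 1:3)
crime_b <- data.frame(city = c("LA", "NYC", "NYC"), crime = 4:6)

# Apply the same encoding to every table in the list.
encoded <- lapply(list(a = crime_a, b = crime_b), to_dummy, to_dumvar = "city")

names(encoded$a)
names(encoded$b)
```

Because remove_most_frequent_dummy = TRUE is evaluated per table, each table drops its own most common city (SF in the first, NYC in the second), so the resulting dummy columns can differ between tables. If identical columns are needed, binding the tables together before encoding is probably the safer route.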
love & peace