R Package: plyr

Author: 用户1147754
Published 2019-05-26 12:08:26
Part of the column: YoungGy
  • Basics
  • R functions and plyr
  • Some useful functions in the plyr package
  • R programs
  • References

plyr: The split-apply-combine strategy for R

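Instead of writing explicit loops, plyr works with vectorized operations; the package aims to simplify the apply family of functions. In effect, it folds the split and apply steps, plus recombining the results, into a single function call.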
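As a rough illustration of what "simplifying the apply family" means, the sketch below computes the same group means once with base R's `tapply` and once with `ddply`. It assumes plyr is installed and uses the built-in `mtcars` data set, which is not part of the original examples.

```r
library(plyr)

# Base R: tapply returns a named vector keyed by group
base_means <- tapply(mtcars$mpg, mtcars$cyl, mean)

# plyr: ddply returns a data frame that keeps the grouping column
plyr_means <- ddply(mtcars, "cyl", summarise, mpg = mean(mpg))

print(base_means)
print(plyr_means)
```

Both calls compute the same numbers; the plyr version returns them as a data frame, which is usually easier to feed into later steps.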

Basics

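plyr function names encode the input and output types: the first letter is the input (a = array, d = data frame, l = list) and the second letter is the output (the same letters, or _ to discard the result). A minimal sketch of that convention, assuming plyr is installed; the toy data frame `scores` and the small lists are hypothetical and only serve the illustration:

```r
library(plyr)

# Hypothetical toy data: two groups with three values each
scores <- data.frame(group = rep(c("a", "b"), each = 3),
                     value = c(1, 2, 3, 10, 20, 30))

# d -> d: data frame in, data frame out
dd <- ddply(scores, "group", summarise, mean_value = mean(value))

# d -> l: data frame in, list out (one element per group)
dl <- dlply(scores, "group", function(df) mean(df$value))

# l -> d: list in, data frame out
ld <- ldply(list(a = 1:3, b = 4:6), function(x) data.frame(total = sum(x)))

# l -> l: list in, list out (comparable to lapply)
ll <- llply(list(a = 1:3, b = 4:6), sum)

print(dd)
print(ld)
```

The same pattern explains the other names used in this article, such as `d_ply` (data frame in, nothing returned, used for side effects like plotting).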

R functions and plyr

library(plyr)  # provides ddply() and the baseball data set

# split: columns 6 to 9 of baseball, grouped by year
pieces = split(baseball[,6:9],baseball$year)
# apply: column means for each piece
results = vector("list",length(pieces))
for (i in seq_along(pieces)) {
    piece = pieces[[i]]
    results[[i]] = colMeans(piece)
}
# combine
result = do.call("rbind",results)
result = as.data.frame(result)
result$name = names(pieces)

# an easy way
result2 = ddply(baseball,"year",function(df) colMeans(df[,6:9]))

# contrast
head(result2)
head(result)
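One way to confirm that the manual split-apply-combine and the `ddply` call above agree is to compare their numeric columns directly. The sketch below is an assumption-laden variant, not the original code: it replaces the for loop with `sapply`, and the names `manual`, `auto`, and `same` are introduced here for illustration.

```r
library(plyr)

# Manual split-apply-combine, written with sapply instead of a for loop
pieces <- split(baseball[, 6:9], baseball$year)
manual <- as.data.frame(t(sapply(pieces, colMeans)))

# The same computation with ddply
auto <- ddply(baseball, "year", function(df) colMeans(df[, 6:9]))

# Compare the numeric columns, ignoring row names and the extra year column
same <- all.equal(unname(as.matrix(manual)), unname(as.matrix(auto[, -1])))
print(same)
```

Both results are ordered by year, so a column-by-column comparison is enough here.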

Some useful functions in the plyr package

# each: combine several functions into one that returns all their results
each(min, max)(1:10)
each(length, mean, var)(rnorm(100))
each("min", "max")(1:10)
each(c("min", "max"))(1:10)
each(c(min, max))(1:10)
# colwise: turn a function that works on a vector into one that works on the columns of a data frame
nmissing <- function(x) sum(is.na(x))
colwise(nmissing)(baseball)
ddply(baseball, .(year), colwise(nmissing))
ddply(baseball, .(year), colwise(nmissing, c("sb", "cs", "so")))
ddply(baseball, .(year), colwise(nmissing, ~ sb + cs + so))
ddply(baseball, .(year), colwise(nmissing, is.character))
ddply(baseball, .(year), colwise(nmissing, is.numeric))
ddply(baseball, .(year), colwise(nmissing, is.discrete))
ddply(baseball, .(year), numcolwise(nmissing))
ddply(baseball, .(year), catcolwise(nmissing))
numcolwise(mean)(baseball, na.rm = TRUE)
numcolwise(mean, na.rm = TRUE)(baseball)
# arrange: sort a data frame quickly, without the verbose order() idiom
mtcars[with(mtcars, order(cyl, disp)), ]
arrange(mtcars, cyl, disp)
myCars = cbind(vehicle=row.names(mtcars), mtcars)
arrange(myCars, cyl, disp)
arrange(myCars, cyl, desc(disp))
# rename: rename by variable name rather than by position
x <- c("a" = 1, "b" = 2, d = 3, 4)
x <- rename(x, replace = c("d" = "c"))
rename(mtcars, c("disp" = "displacement"))
# count: equivalent to as.data.frame(table(x))
count(baseball[1:100,], vars = "id")
count(baseball[1:100,], vars = "id", wt_var = "g")
count(baseball[1:100,], c("id", "year"))
# match_df: used together with count to select the rows that meet a condition
longterm <- subset(count(baseball, "id"), freq > 25)
bb_longterm <- match_df(baseball, longterm, on="id")
# join: similar to a SQL join; often faster than merge
first <- ddply(baseball, "id", summarise, first = min(year))
system.time(b2 <- merge(baseball, first, by = "id", all.x = TRUE))
system.time(b3 <- join(baseball, first, by = "id"))
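The helpers above are easier to follow on a data set small enough to check by eye. The following sketch assumes plyr is installed; the data frame `toy` and its columns are hypothetical and not part of the original examples.

```r
library(plyr)

# Hypothetical toy data with a few missing values
toy <- data.frame(id = c("x", "x", "y", "z", "z", "z"),
                  a  = c(1, NA, 3, 4, NA, NA),
                  b  = c(NA, 2, 3, 4, 5, 6))

# each(): apply several functions to the same input
rng <- each(min, max)(toy$b[!is.na(toy$b)])

# colwise(): lift a vector function so it runs on every column
nmissing <- function(x) sum(is.na(x))
miss <- colwise(nmissing)(toy)

# count(): frequency table as a data frame with a freq column
freq <- count(toy, "id")

# match_df(): keep only the rows whose id appears more than once
frequent <- match_df(toy, subset(freq, freq > 1), on = "id")

print(rng)
print(miss)
print(freq)
print(frequent)
```

Here `match_df` drops the single row for id "y", because only "x" and "z" occur more than once.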

R programs

# a simple example
set.seed(1)
d = data.frame(year=rep(2000:2002,each=3), count=round(runif(9,0,20)))
d
ddply(d,"year",function(x) {
    mean.count = mean(x$count)
    sd.count = sd(x$count)
    cv = sd.count/mean.count
    data.frame(cv.count=cv)
})

# summarise, transform, mutate (mutate is like transform, but can use columns created earlier in the same call)
ddply(d,"year",summarise,mu=mean(count),sigma=sd(count),cv=sigma/mu)
ddply(d,"year",transform,mu=mean(count),sigma=sd(count))
ddply(d,"year",mutate,mu=mean(count),sigma=sd(count),cv=sigma/mu)

# build separate models
library(ggplot2)  # provides the mpg data set
model = function(df) {
    lm(hwy~year,data=df)
}
models = dlply(mpg,.(cyl),model)
coefs = ldply(models,function(x) coef(x))

# plot
opar = par(no.readonly = TRUE)  # save the current graphics settings
par(mfrow=c(1,3), mar=c(2,2,1,1), oma=c(3,3,0,0))
d_ply(d,"year",summarise,plot(count,main=unique(year),type="o"))
mtext("count",side=1,outer=TRUE,line=1)
mtext("frequency",side=2,outer=TRUE,line=1)
par(opar)  # restore the saved settings

library(ggplot2)
ggplot(d,aes(x=year,y=count)) + geom_line() + facet_grid(year~.)

# nested chunking of the data
baseball.dat = subset(baseball,year>2000)
head(baseball.dat)
x = ddply(baseball.dat,c("year","team"),summarize,homeruns=sum(hr))
head(x)

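# deal with errors
f = function(x) if(x==1) stop("error!") else 1
safe.f = failwith(NA,f,quiet = TRUE)
try(llply(1:2,f))   # fails at x == 1; try() lets the script continue
llply(1:2,safe.f)   # returns NA for the failing element instead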

# parallel processing
x = c(1:10)
wait = function(i) Sys.sleep(0.1)
system.time(llply(x,wait))
system.time(sapply(x,wait))
# install.packages("doMC")  # run once; doMC is not available on Windows
library(doMC)
registerDoMC(2)  # register two worker cores
system.time(llply(x,wait,.parallel=TRUE))

# plyr drawback: slower than the built-in functions
system.time(ddply(baseball,"id",summarize,length(year)))
system.time(tapply(baseball$year,baseball$id,function(x) length(x)))
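The difference between `summarise` and `mutate` in the `ddply` calls above comes down to the shape of the result: `summarise` collapses each group to one row, while `mutate` (like `transform`) keeps every row and adds columns. A minimal sketch, assuming plyr is installed; it rebuilds the same `d` as above, and the `dev` column is introduced here purely for illustration.

```r
library(plyr)

# Same toy data as in the simple example: three years, three counts each
set.seed(1)
d <- data.frame(year = rep(2000:2002, each = 3),
                count = round(runif(9, 0, 20)))

# summarise: one row per year
s <- ddply(d, "year", summarise, mu = mean(count))

# mutate: all rows kept; dev can refer to mu, which was created just before it
m <- ddply(d, "year", mutate, mu = mean(count), dev = count - mu)

print(dim(s))
print(dim(m))
```

Because `mutate` evaluates its arguments one after another, `dev` can be defined in terms of `mu` within a single call.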

References

Sean Anderson's R tutorial

Originally published: 2015-09-01
