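After installing RHadoop, the natural next step is to test it with an example. There are quite a few wordcount examples online, but some of them are rather unclear, so here I record the code I downloaded and the result of running it: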
library(rmr2)
library(rhdfs)
hdfs.init()
# Run on the local backend (no Hadoop cluster) so the example is easy to test
rmr.options(backend = "local")

# Word count --------------------------------------------------------------
ebookLocation <- "/home/ndscbigdata/wofile.txt"

m <- mapreduce(
  input = ebookLocation,
  input.format = "text",
  map = function(k, v) {
    # v is a character vector of input lines: split on whitespace and punctuation
    words <- unlist(strsplit(v, split = "[[:space:][:punct:]]"))
    words <- tolower(words)
    words <- gsub("[0-9]", "", words)   # strip digits
    words <- words[words != ""]         # drop empty tokens
    wordcount <- table(words)           # per-chunk counts
    keyval(
      key = names(wordcount),
      val = as.numeric(wordcount)
    )
  },
  reduce = function(k, counts) {
    # sum the partial counts emitted for each word
    keyval(key = k, val = sum(counts))
  }
)

# Retrieve results and prepare to plot ------------------------------------
x <- from.dfs(m)
dat <- data.frame(
  word = keys(x),
  count = values(x)
)
dat <- dat[order(dat$count, decreasing = TRUE), ]
head(dat, 50)
# Bar chart of the 25 most frequent words. barplot() takes its labels via
# names.arg; plot() has no names argument and only warns about it.
with(head(dat, 25), barplot(count, names.arg = word, las = 2))
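To see what the map and reduce functions compute without involving rmr2 or Hadoop at all, the same logic can be sketched in base R. This is only an illustration under stated assumptions: the sample lines, the helper name map_chunk, and the idea of treating each line as its own mapper chunk are all made up here, and tapply() stands in for the shuffle and reduce that rmr2 would perform.

```r
# Base-R sketch of the word-count map and reduce steps (no rmr2 required).
# The sample lines below are hypothetical and only serve as test input.
lines <- c("The quick brown fox",
           "jumps over the lazy dog.",
           "The dog barks 2 times!")

# Map step (hypothetical helper): tokenize one chunk and count its words.
map_chunk <- function(v) {
  words <- unlist(strsplit(v, split = "[[:space:][:punct:]]"))
  words <- tolower(words)
  words <- gsub("[0-9]", "", words)   # strip digits
  words <- words[words != ""]         # drop empty tokens
  wordcount <- table(words)
  data.frame(word = names(wordcount),
             count = as.numeric(wordcount),
             stringsAsFactors = FALSE)
}

# Assumption: each line is handed to a separate mapper.
mapped <- do.call(rbind, lapply(lines, map_chunk))

# Shuffle and reduce: group the partial counts by word and sum them.
totals <- tapply(mapped$count, mapped$word, sum)

dat <- data.frame(word = names(totals),
                  count = as.numeric(totals),
                  stringsAsFactors = FALSE)
dat <- dat[order(dat$count, decreasing = TRUE), ]
print(head(dat, 3))
```

Because each mapper only sees its own chunk, a word such as "the" is emitted several times with partial counts, and only the reduce step produces the final total.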
Running the rmr2 script above in RStudio gives the following result: