Original source: https://www.kaggle.com/jonathanbouchet/nuclear-power-plant-geo-data (Nuclear Power Plant Locations data)
skimr: skimr is designed to provide summary statistics about variables. It is opinionated in its defaults, but easy to modify. In base R, the most similar functions are summary() for vectors and data frames and fivenum() for numeric vectors. Put simply, skim() is an upgraded version of summary().
Run help(package="skimr") to see the small examples provided in the help documentation.
> summary(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100 setosa :50
1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300 versicolor:50
Median :5.800 Median :3.000 Median :4.350 Median :1.300 virginica :50
Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
>fivenum(iris$Sepal.Length)
[1] 4.3 5.1 5.8 6.4 7.9
>skim(iris)
Skim summary statistics
n obs: 150
n variables: 5
-- Variable type:factor --------------------------------------------------------
variable missing complete n n_unique top_counts ordered
Species 0 150 150 3 set: 50, ver: 50, vir: 50, NA: 0 FALSE
-- Variable type:numeric -------------------------------------------------------
variable missing complete n mean sd p0 p25 p50 p75 p100 hist
Petal.Length 0 150 150 3.76 1.77 1 1.6 4.35 5.1 6.9 ▇▁▁▂▅▅▃▁
Petal.Width 0 150 150 1.2 0.76 0.1 0.3 1.3 1.8 2.5 ▇▁▁▅▃▃▂▂
Sepal.Length 0 150 150 5.84 0.83 4.3 5.1 5.8 6.4 7.9 ▂▇▅▇▆▅▂▂
Sepal.Width 0 150 150 3.06 0.44 2 2.8 3 3.3 4.4 ▁▂▅▇▃▂▁▁
>
lubridate: Functions to work with date-times and time-spans: fast and user friendly parsing of date-time data, extraction and updating of components of a date-time. Put simply, it provides functions for handling date and time formats.
> ymd("20110604")
[1] "2011-06-04"
> mdy("06-04-2011")
[1] "2011-06-04"
> dmy("04/06/2011")
[1] "2011-06-04"
>
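The description above also mentions extracting and updating components of a date-time. A minimal sketch of that, assuming lubridate is installed; the variable name d is only for illustration:

```r
library(lubridate)

# Parse a date, then pull out individual components
d <- ymd("20110604")
year(d)    # year component
month(d)   # month component
day(d)     # day-of-month component

# Update a single component: move the date to July
month(d) <- 7
d
```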
viridis: color palettes. The viridis color palettes: Use the color scales in this package to make plots that are pretty, better represent your data, easier to read by those with colorblindness, and print well in grey scale.
ggplot(mtcars, aes(wt, mpg)) +
  geom_point(size = 4, aes(colour = factor(cyl))) +
  scale_color_viridis_d() + theme_bw()
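The palette can also be inspected directly. A small sketch, assuming the viridis package is installed: viridis(n) returns n hex color strings, and rev() reverses their order, which is the form the map code at the end of these notes passes to scale_fill_gradientn(). The variable name pal is only for illustration.

```r
library(viridis)

# Five colors sampled evenly along the default viridis palette
pal <- viridis(5)
pal

# Reversed order, so the bright (yellow) end of the palette comes first
rev(pal)
```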
broom: Convert Statistical Analysis Objects into Tidy Tibbles. It converts the results of statistical computations into data frame format.
> lmfit<-lm(mpg~wt,mtcars)
> lmfit
Call:
lm(formula = mpg ~ wt, data = mtcars)
Coefficients:
(Intercept) wt
37.285 -5.344
> summary(lmfit)
Call:
lm(formula = mpg ~ wt, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-4.5432 -2.3647 -0.1252 1.4096 6.8727
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.2851 1.8776 19.858 < 2e-16 ***
wt -5.3445 0.5591 -9.559 1.29e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-squared: 0.7528, Adjusted R-squared: 0.7446
F-statistic: 91.38 on 1 and 30 DF, p-value: 1.294e-10
> broom::tidy(lmfit)
# A tibble: 2 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 37.3 1.88 19.9 8.24e-19
2 wt -5.34 0.559 -9.56 1.29e-10
> broom::glance(lmfit)
# A tibble: 1 x 11
r.squared adj.r.squared sigma statistic p.value df logLik AIC
* <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl>
1 0.753 0.745 3.05 91.4 1.29e-10 2 -80.0 166.
# ... with 3 more variables: BIC <dbl>, deviance <dbl>,
# df.residual <int>
> broom::augment(lmfit)
# A tibble: 32 x 10
.rownames mpg wt .fitted .se.fit .resid .hat .sigma .cooksd
* <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Mazda RX4 21 2.62 23.3 0.634 -2.28 0.0433 3.07 1.33e-2
2 Mazda RX~ 21 2.88 21.9 0.571 -0.920 0.0352 3.09 1.72e-3
3 Datsun 7~ 22.8 2.32 24.9 0.736 -2.09 0.0584 3.07 1.54e-2
4 Hornet 4~ 21.4 3.22 20.1 0.538 1.30 0.0313 3.09 3.02e-3
5 Hornet S~ 18.7 3.44 18.9 0.553 -0.200 0.0329 3.10 7.60e-5
6 Valiant 18.1 3.46 18.8 0.555 -0.693 0.0332 3.10 9.21e-4
7 Duster 3~ 14.3 3.57 18.2 0.573 -3.91 0.0354 3.01 3.13e-2
8 Merc 240D 24.4 3.19 20.2 0.539 4.16 0.0313 3.00 3.11e-2
9 Merc 230 22.8 3.15 20.5 0.540 2.35 0.0314 3.07 9.96e-3
10 Merc 280 19.2 3.44 18.9 0.553 0.300 0.0329 3.10 1.71e-4
# ... with 22 more rows, and 1 more variable: .std.resid <dbl>
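Because these results are ordinary data frames, they can be subset like any other table. A minimal sketch, assuming broom is installed; the variable names coefs and slope are only for illustration:

```r
library(broom)

lmfit <- lm(mpg ~ wt, mtcars)

# tidy() gives one row per model term, so base subsetting works on it
coefs <- tidy(lmfit)
slope <- coefs$estimate[coefs$term == "wt"]

# The value agrees with what coef() reports for the same term
all.equal(slope, unname(coef(lmfit)["wt"]))

# augment() adds per-observation columns such as .fitted and .resid
head(augment(lmfit)[, c("mpg", "wt", ".fitted", ".resid")])
```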
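left_join: put simply, it merges two data frames by the columns they share.
Using dplyr::rename raised this error: Error: `petal_length` = Petal.Length must be a symbol or a string, not a formula
Searching for the error turned up one solution: https://stackoverflow.com/questions/47755534/dplyr-rename-error-new-name-old-name-must-be-a-symbol-or-a-string-not-fo
After I switched R from R-3.4.2 to R-3.5.1, the error no longer appeared.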
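A minimal sketch of both functions on toy data, assuming dplyr is installed. The data frames regions and totals, the name joined, and all values in them are made up for illustration and are not taken from the dataset:

```r
library(dplyr)

# Hypothetical toy tables (values are placeholders, not real data)
regions <- data.frame(region = c("A", "B", "C"),
                      long   = c(10, 20, 30))
totals  <- data.frame(region = c("A", "B"),
                      totMWe = c(100, 200))

# Keep every row of `regions`; rows without a match get NA in totMWe
joined <- left_join(regions, totals, by = "region")
joined

# rename() takes new_name = old_name
rename(iris, petal_length = Petal.Length) %>% head(3)
```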
fortify(): I have not yet figured out what this function does. The help documentation says it may be dropped: fortify may be deprecated in the future. I now recommend using the broom package
library(rworldmap)
library(ggplot2)
library(ggthemes)   # provides theme_fivethirtyeight()
worldMap <- fortify(map_data("world"), region = "region")
ggplot() +
  geom_map(data = worldMap,
           map = worldMap,
           aes(x = long, y = lat,
               map_id = region,
               group = group),
           fill = "white", color = "black", size = 0.1) +
  theme_fivethirtyeight(10)
library(ggplot2)
library(ggthemes)   # provides theme_fivethirtyeight()
library(rworldmap)
# `res` is not defined in these notes; the code expects map polygon data with a totMWe column
ggplot(res) +
  geom_polygon(aes(x = long, y = lat, group = group, fill = totMWe),
               color = 'white', size = .1) +
  theme_fivethirtyeight() +
  theme(panel.grid.major = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank()) +
  scale_fill_gradientn(name = "",
                       colors = rev(viridis::viridis(50))) +
  guides(fill = guide_colorbar(barwidth = 20, barheight = .5)) +
  labs(title = "Nuclear power plant landscape in 2019",
       subtitle = 'energy produced(MWe) by nuclear source from active powerplant')
Conclusions that can be drawn from the figure above: Top 3 producers: United States; France; China. North Korea: no production.