Three types of wine; 13 measured variables; 178 samples in total.
Dataset download link: https://acadgildsite.s3.amazonaws.com/wordpress_images/r/wineDataset_Kmeans/Wine.csv
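# Load the data; columns 1 to 13 hold the measurements
df <- read.csv("Wine.csv", header = TRUE)
head(df)
# Customer_Segment is the class label, so convert it to a factor
df$Customer_Segment <- as.factor(df$Customer_Segment)
summary(df)
dim(df)
# PCA on the 13 standardized measurements
winepca <- prcomp(df[, 1:13], scale. = TRUE)
library(factoextra)
# Scree plot: percentage of variance explained by each component
fviz_eig(winepca, addlabels = TRUE)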
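The percentages shown on a scree plot can also be computed by hand from the `sdev` component of a `prcomp` result. The following is a minimal sketch that assumes the built-in `iris` measurements as stand-in data (not the wine dataset), so it runs without downloading anything; `pca` and `var_explained` are names introduced here for illustration.

```r
# Stand-in data: the four numeric iris columns (NOT the wine dataset)
pca <- prcomp(iris[, 1:4], scale. = TRUE)

# Each component's variance is its squared standard deviation;
# dividing by the total gives the proportion of variance explained
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Percentage per component and the running total
round(100 * var_explained, 1)
round(100 * cumsum(var_explained), 1)
```

Replacing `iris[, 1:4]` with `df[, 1:13]` should give the same quantities for the wine data.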
# Plot the samples on the first two principal components, colored by Customer_Segment
fviz_pca_ind(winepca, col.ind = df$Customer_Segment,
             addEllipses = TRUE, geom = "point", legend.title = "")
Original article: Analyzing Wine dataset using K-means Clustering
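K-means clustering is one of the simplest and most commonly used clustering algorithms. It tries to find cluster centers that are representative of particular regions of the data. The algorithm alternates between two steps: assigning each data point to its nearest cluster center, and then setting each cluster center to the mean of the data points assigned to it. The algorithm stops when the cluster assignments no longer change.
-- Introduction to Machine Learning with Python (《Python机器学习基础教程》)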
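The two alternating steps described in the quote can be sketched in a few lines of base R. This is an illustrative toy version, not the algorithm that `stats::kmeans` uses by default; the function `simple_kmeans` and the synthetic `toy` data are hypothetical and introduced only for this example.

```r
# Minimal K-means sketch (illustrative only; use stats::kmeans in practice)
simple_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  # Start from k distinct rows chosen at random as the initial centers
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  assignment <- rep(0L, nrow(x))
  for (iter in seq_len(max_iter)) {
    # Step 1: squared Euclidean distance from every point to every center
    d <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    new_assignment <- max.col(-d, ties.method = "first")
    # Stop when no point changes cluster
    if (all(new_assignment == assignment)) break
    assignment <- new_assignment
    # Step 2: move each center to the mean of its assigned points
    for (j in seq_len(k)) {
      members <- x[assignment == j, , drop = FALSE]
      if (nrow(members) > 0) centers[j, ] <- colMeans(members)
    }
  }
  list(cluster = assignment, centers = centers)
}

set.seed(1)
# Two well-separated synthetic blobs (hypothetical data, not the wine dataset)
toy <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2))
fit <- simple_kmeans(toy, k = 2)
table(fit$cluster)
```

The sketch keeps a center unchanged if its cluster becomes empty, which is one simple way to handle that case.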
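library(factoextra)
df <- read.csv("Wine.csv", header = TRUE)
# Standardize the 13 measurements so that no single variable dominates the distances
winescale <- scale(df[, 1:13])
head(winescale)
# Elbow plot of the within-cluster sum of squares; the vertical line marks k = 3
fviz_nbclust(winescale, kmeans, method = "wss") +
  geom_vline(xintercept = 3, linetype = 5, col = "darkred")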
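The curve in an elbow plot is the total within-cluster sum of squares for each candidate number of clusters. The sketch below computes roughly that quantity with base R only, assuming the standardized built-in `iris` measurements as stand-in data (not the wine dataset); `x` and `wss` are names introduced here for illustration.

```r
# Stand-in data: standardized iris measurements (NOT the wine dataset)
x <- scale(iris[, 1:4])

set.seed(123)
# Total within-cluster sum of squares for k = 1..10
wss <- sapply(1:10, function(k) kmeans(x, centers = k, nstart = 25)$tot.withinss)

# The "elbow" is where adding another cluster stops reducing wss by much
plot(1:10, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

Substituting `winescale` for `x` should give the values behind the wine elbow plot, up to differences from random starts.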
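# Fix the seed so the random starts are reproducible
set.seed(123)
# K-means with 3 clusters, keeping the best of 25 random starts
winekmeans <- kmeans(winescale, 3, nstart = 25)
winekmeans
winekmeans$centers
winekmeans$size
# Plot the clusters (projected onto the first two principal components)
fviz_cluster(object = winekmeans, data = winescale, ellipse.type = "norm",
             geom = "point", palette = "jco", main = "",
             ggtheme = theme_minimal())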