# A Machine Learning Guide for Developers (Part 6)

Practical examples

• Labeling ISPs based on their down/upload speed (K-NN)

• Classifying email as spam or ham (Naive Bayes)

• Ranking emails based on their content (Recommendation system)

• Predicting weight based on height (Linear Regression OLS)

• An attempt at rank prediction for top selling books using text regression

• Using unsupervised learning to merge features (PCA)

• Using Support Vector Machines (SVMs)

Labeling ISPs based on their down/upload speed (K-NN using Smile in Scala)

```scala
import java.io.File

object KNNExample {

  def main(args: Array[String]): Unit = {
    val basePath = "/.../KNN_Example_1.csv"
    val testData = getDataFromCSV(new File(basePath))
  }

  def getDataFromCSV(file: File): (Array[Array[Double]], Array[Int]) = {
    val source = scala.io.Source.fromFile(file)
    // Skip the first line (assumed to be a header row) and parse the rest
    val data = source
      .getLines()
      .drop(1)
      .map(x => getDataFromString(x))
      .toArray
    source.close()
    val dataPoints = data.map(x => x._1)
    val classifierArray = data.map(x => x._2)
    (dataPoints, classifierArray)
  }

  def getDataFromString(dataString: String): (Array[Double], Int) = {
    // Split the comma-separated value string into an array of strings
    val dataArray: Array[String] = dataString.split(',')
    // Extract the values from the strings
    val xCoordinate: Double = dataArray(0).toDouble
    val yCoordinate: Double = dataArray(1).toDouble
    val classifier: Int = dataArray(2).toInt
    // And return the result in a format that can later
    // easily be used to feed to Smile
    (Array(xCoordinate, yCoordinate), classifier)
  }
}
```

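```scala
import java.awt.{Color, Dimension}
import java.io.File
import scala.swing.{MainFrame, SimpleSwingApplication}
import smile.plot.ScatterPlot

object KNNExample extends SimpleSwingApplication {

  def top = new MainFrame {
    title = "KNN Example"
    val basePath = "/.../KNN_Example_1.csv"
    val testData = getDataFromCSV(new File(basePath))
    // Plot every data point, coloured by its class label
    val plot = ScatterPlot.plot(testData._1,
      testData._2,
      '@',
      Array(Color.red, Color.blue)
    )
    peer.setContentPane(plot)
    size = new Dimension(400, 400)
  }
```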

...

```scala
// Imports needed by this listing (class names as used in this example;
// newer Smile releases may use different names)
import java.io.File
import smile.classification.KNN
import smile.validation.CrossValidation

def main(args: Array[String]): Unit = {
  val basePath = "/.../KNN_Example_1.csv"
  val testData = getDataFromCSV(new File(basePath))

  // Define the amount of rounds, in our case 2 and
  // initialise the cross validation
  val validationRounds = 2
  val cv = new CrossValidation(testData._2.length, validationRounds)

  val testDataWithIndices = (testData
    ._1
    .zipWithIndex,
    testData
      ._2
      .zipWithIndex)

  // Look up the data points and labels behind each index set
  val trainingDPSets = cv.train
    .map(indexList => indexList
      .map(index => testDataWithIndices
        ._1.collectFirst { case (dp, `index`) => dp }.get))
  val trainingClassifierSets = cv.train
    .map(indexList => indexList
      .map(index => testDataWithIndices
        ._2.collectFirst { case (dp, `index`) => dp }.get))
  val testingDPSets = cv.test
    .map(indexList => indexList
      .map(index => testDataWithIndices
        ._1.collectFirst { case (dp, `index`) => dp }.get))
  val testingClassifierSets = cv.test
    .map(indexList => indexList
      .map(index => testDataWithIndices
        ._2.collectFirst { case (dp, `index`) => dp }.get))

  // Group the four sets that belong to the same validation round
  val validationRoundRecords = trainingDPSets
    .zipWithIndex.map(x => (x._1,
      trainingClassifierSets(x._2),
      testingDPSets(x._2),
      testingClassifierSets(x._2)
    )
    )

  validationRoundRecords
    .foreach { record =>
      // Train a K-NN model with k = 3 on this round's training set
      val knn = KNN.learn(record._1, record._2, 3)
      // And for each test data point make a prediction with the model
      val predictions = record
        ._3
        .map(x => knn.predict(x))
        .zipWithIndex
      // Finally evaluate the predictions as correct or incorrect
      // and count the amount of wrongly classified data points.
      val error = predictions
        .map(x => if (x._1 != record._4(x._2)) 1 else 0)
        .sum
      // Convert to Double first: integer division would truncate the rate to 0
      println("False prediction rate: " + error.toDouble / predictions.length * 100 + "%")
    }
}
```
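To make the role of the `cv.train` and `cv.test` index sets more concrete, the following is a minimal plain-Scala sketch of a k-fold split. The names `KFoldSketch` and `split`, the fixed seed, and the printed format are hypothetical and introduced only for illustration. This is not Smile's implementation, which may shuffle and partition the indices differently.

```scala
object KFoldSketch {
  // Hypothetical helper: splits the indices 0 until n into k folds and
  // returns, for each round, a (trainIndices, testIndices) pair.
  def split(n: Int, k: Int, seed: Long = 42L): Seq[(Array[Int], Array[Int])] = {
    require(k >= 2 && k <= n, "k must be between 2 and n")
    // Shuffle once so that each fold is a random sample of the data
    val shuffled = new scala.util.Random(seed).shuffle((0 until n).toVector)
    (0 until k).map { round =>
      // Positions congruent to `round` modulo k are held out for testing
      val (test, train) = shuffled.zipWithIndex.partition { case (_, pos) => pos % k == round }
      (train.map(_._1).toArray, test.map(_._1).toArray)
    }
  }

  def main(args: Array[String]): Unit = {
    split(n = 10, k = 2).zipWithIndex.foreach { case ((train, test), round) =>
      println(s"Round $round: train=${train.sorted.mkString(",")} test=${test.sorted.mkString(",")}")
    }
  }
}
```

Every index appears in the test set of exactly one round and in the training set of all other rounds, which is why each data point is used for validation once.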

```scala
val knn = KNN.learn(record._1, record._2, 3)
val unknownDataPoint = Array(5.3, 4.3)
val result = knn.predict(unknownDataPoint)
if (result == 0) {
  println("Internet Service Provider Alpha")
} else if (result == 1) {
  println("Internet Service Provider Beta")
} else {
  println("Unexpected prediction")
}
```
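For readers who want to see what a K-NN prediction amounts to, here is a minimal sketch in plain Scala without the Smile dependency. The object `SimpleKnn`, its `distance` and `predict` helpers, and the six sample points are all made up for illustration. Smile's own implementation is likely to differ, for example in how it searches for neighbours.

```scala
object SimpleKnn {
  // Euclidean distance between two points of equal dimension
  def distance(a: Array[Double], b: Array[Double]): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  // Hypothetical helper: predicts the label of `point` by majority vote
  // among its k nearest training points (ties are broken arbitrarily).
  def predict(points: Array[Array[Double]], labels: Array[Int], k: Int)(point: Array[Double]): Int = {
    require(points.length == labels.length && k >= 1 && k <= points.length)
    points.zip(labels)
      .sortBy { case (p, _) => distance(p, point) }
      .take(k)
      .groupBy { case (_, label) => label }
      .maxBy { case (_, neighbours) => neighbours.length }
      ._1
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample data: class 0 clusters near the origin, class 1 near (5, 5)
    val points = Array(
      Array(0.0, 0.0), Array(1.0, 0.5), Array(0.5, 1.0),
      Array(5.0, 5.0), Array(5.5, 4.5), Array(4.5, 5.5))
    val labels = Array(0, 0, 0, 1, 1, 1)
    println(predict(points, labels, 3)(Array(5.3, 4.3)))
  }
}
```

Sorting all points by distance is the simplest approach and fine for small data sets. With an odd `k` and two classes, as in the ISP example, the majority vote cannot tie.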
