A Guide to Machine Learning for Developers (Part 12)

哒呵呵

Published 2018-08-06 17:39:39


This article is included in the column: 鸿的学习笔记 (Hong's study notes)

To see how well our PCA-based feature holds up, we now add the Dow Jones Index (DJI) to the plot by adjusting the code.

First, add the following to the top method (def top):

    // Verification against the DJI. Assumes java.io.File, java.awt.Color and
    // smile.plot.Line are imported, and that basePath and canvas are defined in top
    val verificationDataPath = basePath + "PCA_Example_2.csv"
    // Load the DJI data once; ._1 holds the dates, ._2 the close values
    val DJIIndex = getDJIFromFile(new File(verificationDataPath))
    canvas.line("Dow Jones Index", DJIIndex._2, Line.Style.DOT_DASH, Color.BLUE)

Next, we introduce the following two methods:

  import java.io.File
  import java.text.SimpleDateFormat
  import java.util.Date

  def getDJIRecordFromString(dataString: String): (Date, Double) = {
    // Split the comma-separated value string into an array of strings
    val dataArray: Array[String] = dataString.split(',')
    val format = new SimpleDateFormat("yyyy-MM-dd")
    // Extract the values from the strings: column 0 is the date and
    // column 4 is assumed to be the close (Date,Open,High,Low,Close,...)
    val date = format.parse(dataArray(0))
    val close: Double = dataArray(4).toDouble
    // Return the result in a format that can later easily be fed to Smile
    (date, close)
  }

  def getDJIFromFile(file: File): (Array[Date], Array[Double]) = {
    val source = scala.io.Source.fromFile(file)
    // Get all the records (minus the header)
    val data = source
        .getLines()
        .drop(1)
        .map(x => getDJIRecordFromString(x))
        .toArray
    source.close()
    // Sort by date, then turn the tuples into two separate arrays for easier use later on
    val sortedData = data.sortBy(x => x._1)
    val dates = sortedData.map(x => x._1)
    val doubles = sortedData.map(x => x._2)
    (dates, doubles)
  }
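To try the parsing logic without PCA_Example_2.csv on disk, here is a minimal sketch that runs the same steps (skip the header, parse each row, sort by date, split into two arrays) over in-memory lines instead of a File. The object name DJIParseSketch and the sample rows are hypothetical, and the column order Date,Open,High,Low,Close is an assumption inferred from the code reading the close from dataArray(4).

```scala
import java.text.SimpleDateFormat
import java.util.Date

// Hypothetical sketch: same parse-sort-split steps as getDJIFromFile,
// but reading from a Seq[String] so it can run without a data file.
// Assumed column layout: Date,Open,High,Low,Close,...
object DJIParseSketch {
  private val format = new SimpleDateFormat("yyyy-MM-dd")

  // Parse one CSV row into (date, close)
  def parseRow(row: String): (Date, Double) = {
    val cols = row.split(',')
    (format.parse(cols(0)), cols(4).toDouble)
  }

  // Drop the header, parse every row, sort by date, split into two arrays
  def parseLines(lines: Seq[String]): (Array[Date], Array[Double]) = {
    val sorted = lines.drop(1).map(parseRow).sortBy(_._1).toArray
    (sorted.map(_._1), sorted.map(_._2))
  }

  def main(args: Array[String]): Unit = {
    // Made-up rows for illustration only, deliberately out of date order
    val lines = Seq(
      "Date,Open,High,Low,Close",
      "2016-01-05,10.0,11.0,9.0,10.5",
      "2016-01-04,9.0,10.0,8.5,9.5"
    )
    val (dates, closes) = parseLines(lines)
    println(dates.length)           // prints 2
    println(closes.mkString(","))   // prints 9.5,10.5
  }
}
```

Sorting by the parsed Date rather than by the raw string keeps the result correct even if the file is not in chronological order.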

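This code loads the DJI data and adds it to the plot that already contains our own stock market index. However, when we run this code, the result looks like this:

As you can see, the DJI values lie in a completely different range from our computed feature. This is why we now need to normalize the data. The idea is to scale each dataset according to its own range, so that both datasets end up on the same scale.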
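A minimal sketch of the scaling idea, with hypothetical names and illustrative numbers (not real DJI data): scaleByRange divides by max - min, as the replacement getDJIFromFile that follows does, while minMaxScale additionally subtracts the minimum. Dividing by the range alone makes the spread of the data exactly 1 but keeps its offset, which is consistent with the DJI later landing between 0.8 and 1.8 rather than between 0 and 1.

```scala
// Hypothetical sketch of two range-based scalings.
// Both assume a non-empty, non-constant series (a zero range would divide by zero).
object RangeScalingSketch {
  // Divide every value by the range (max - min): the spread becomes exactly 1,
  // but any constant offset in the data survives.
  def scaleByRange(xs: Array[Double]): Array[Double] = {
    val range = xs.max - xs.min
    xs.map(_ / range)
  }

  // Min-max normalization: subtract the minimum first, mapping the data onto [0, 1].
  def minMaxScale(xs: Array[Double]): Array[Double] = {
    val (lo, hi) = (xs.min, xs.max)
    xs.map(x => (x - lo) / (hi - lo))
  }

  def main(args: Array[String]): Unit = {
    val prices = Array(16000.0, 18000.0, 17000.0) // illustrative values only
    println(scaleByRange(prices).mkString(", "))  // prints 8.0, 9.0, 8.5
    println(minMaxScale(prices).mkString(", "))   // prints 0.0, 1.0, 0.5
  }
}
```

Either variant puts both series on a comparable scale; the article keeps the offset, which does not matter when the goal is to compare the shape of the two curves.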

Replace the getDJIFromFile method with the following:

  def getDJIFromFile(file: File): (Array[Date], Array[Double]) = {
    val source = scala.io.Source.fromFile(file)
    // Get all the records (minus the header)
    val data = source
        .getLines()
        .drop(1)
        .map(x => getDJIRecordFromString(x))
        .toArray
    source.close()
    // Sort by date, then turn the tuples into two separate arrays for easier use later on
    val sortedData = data.sortBy(x => x._1)
    val dates = sortedData.map(x => x._1)
    // Scale the close values by their range (max - min); the minimum is
    // not subtracted, so the values keep their offset
    val maxDouble = sortedData.maxBy(x => x._2)._2
    val minDouble = sortedData.minBy(x => x._2)._2
    val rangeValue = maxDouble - minDouble
    val doubles = sortedData.map(x => x._2 / rangeValue)
    (dates, doubles)
  }
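Then replace the plotData definition in the top method with:

// Scale the first PCA component by its range (max - min)
val maxDataValue = points.maxBy(x => x(0))
val minDataValue = points.minBy(x => x(0))
val rangeValue = maxDataValue(0) - minDataValue(0)
// Pair each value with its index (x axis); the minus sign flips the feature (y axis)
val plotData = points
    .zipWithIndex
    .map(x => Array(x._2.toDouble, -x._1(0) / rangeValue))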
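The plotData snippet depends on points from the surrounding method. Here is a standalone sketch of the same step, assuming points is an Array[Array[Double]] whose first column holds the PCA projection; that shape, the object name and the input values are assumptions for illustration. The article does not explain the minus sign; a plausible reason is that the sign of a principal component is arbitrary (v and -v describe the same direction), so the projection can come out mirrored relative to the index.

```scala
// Hypothetical standalone version of the plotData step.
// Assumes column 0 of each row holds the PCA projection and the column is not constant.
object PlotDataSketch {
  def toPlotData(points: Array[Array[Double]]): Array[Array[Double]] = {
    val firstColumn = points.map(_(0))
    val rangeValue = firstColumn.max - firstColumn.min
    points.zipWithIndex.map { case (row, index) =>
      // x = position in the series, y = negated projection scaled by its range
      Array(index.toDouble, -row(0) / rangeValue)
    }
  }

  def main(args: Array[String]): Unit = {
    val projected = Array(Array(1.0), Array(-1.0), Array(3.0)) // made-up values
    // prints (0.0, -0.25), (1.0, 0.25), (2.0, -0.75) on separate lines
    toPlotData(projected).foreach(p => println(p.mkString("(", ", ", ")")))
  }
}
```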

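We can see that even though the DJI data now lies between 0.8 and 1.8 and our new feature ranges between -0.5 and 0.5, the two lines correspond quite well. With this example and the explanation of PCA in the general section, you should now be able to use PCA and apply it to your own data.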
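Judging the match by eye works, but one way to put a number on it, which the article itself does not do, is the Pearson correlation coefficient. It is unaffected by positive scaling and constant offsets, so range scaling does not change it, while negating one series flips its sign. A minimal sketch with a hypothetical object name and made-up inputs:

```scala
// Hypothetical sketch: Pearson correlation of two equally long series,
// e.g. the scaled DJI closes and the plotted PCA feature.
// Assumes both series have the same length and nonzero variance.
object CorrelationSketch {
  def pearson(xs: Array[Double], ys: Array[Double]): Double = {
    require(xs.length == ys.length && xs.length > 1, "series must have equal length > 1")
    val mx = xs.sum / xs.length
    val my = ys.sum / ys.length
    val cov = xs.zip(ys).map { case (x, y) => (x - mx) * (y - my) }.sum
    val vx = xs.map(x => (x - mx) * (x - mx)).sum
    val vy = ys.map(y => (y - my) * (y - my)).sum
    cov / math.sqrt(vx * vy)
  }

  def main(args: Array[String]): Unit = {
    val a = Array(1.0, 2.0, 3.0, 4.0)
    val b = a.map(x => 2.0 * x + 10.0)     // scaled and shifted copy of a
    println(pearson(a, b))                 // prints 1.0
    println(pearson(a, b.map(x => -x)))    // prints -1.0
  }
}
```

A value close to 1 between the two plotted series would back up the visual impression; a value close to -1 would indicate the feature is mirrored.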

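This article is part of the Tencent Cloud self-media syndication program and was shared from the WeChat official account 鸿的学习笔记.

Originally published: 2016-11-15. For copyright concerns, contact cloudcommunity@tencent.com to have it removed.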
