A Machine Learning Guide for Developers (Part 11)

哒呵呵

Published 2018-08-06 17:38:53

Included in the column: 鸿的学习笔记

Using unsupervised learning to merge features (PCA)

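The basic idea of PCA is to reduce the dimensionality of a problem. This can help avoid the curse of dimensionality, or merge features so that you can see the trend in the data without the noise of correlated data.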

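In this example we use PCA to merge the prices of 24 stocks from 2002 to 2012 into a single value. Tracked over time, this single value represents a stock market index based on the data of these 24 stocks. Merging the 24 stock prices into one greatly reduces both the amount of data to process and the dimensionality of our data, which is a big advantage if we later apply other machine learning algorithms, such as regression for prediction. To see how well the reduction from 24 features to 1 performs, we will compare our result with the Dow Jones Index (DJI) over the same period.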
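To make the idea of merging several correlated columns into one concrete, here is a minimal sketch in plain Scala. It is not the Smile implementation used in this article: it centers the data, builds the covariance matrix, and approximates the first principal component by power iteration. The object name `PcaSketch`, the function `firstComponentScores`, and the sample prices are all made up for illustration, and the sketch assumes the data is not constant and that the starting vector is not orthogonal to the dominant component.

```scala
object PcaSketch {
  // Dot product of two equally sized vectors.
  private def dot(a: Array[Double], b: Array[Double]): Double =
    a.indices.map(i => a(i) * b(i)).sum

  /** Projects every row of `data` onto an approximation of its first
    * principal component, giving one value per row (here: per day).
    * Assumes at least two rows and a non-constant data set.
    */
  def firstComponentScores(data: Array[Array[Double]],
                           iterations: Int = 200): Array[Double] = {
    val n = data.length
    val d = data.head.length
    // Center every column on its mean.
    val means = Array.tabulate(d)(j => data.map(_(j)).sum / n)
    val centered = data.map(row => Array.tabulate(d)(j => row(j) - means(j)))
    // Sample covariance matrix (d x d).
    val cov = Array.tabulate(d, d)((i, j) =>
      centered.map(r => r(i) * r(j)).sum / (n - 1))
    // Power iteration: repeatedly multiply by the covariance matrix and
    // renormalize, which converges towards the dominant eigenvector.
    var v = Array.tabulate(d)(j => j + 1.0)
    for (_ <- 0 until iterations) {
      val w = cov.map(row => dot(row, v))
      val norm = math.sqrt(dot(w, w))
      v = w.map(_ / norm)
    }
    centered.map(row => dot(row, v))
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical closing prices: 4 days (rows) x 3 stocks (columns).
    val prices = Array(
      Array(10.0, 20.0, 30.0),
      Array(10.5, 20.8, 31.0),
      Array(10.2, 20.5, 30.4),
      Array(11.0, 21.5, 32.1)
    )
    // One merged value per day.
    firstComponentScores(prices).foreach(println)
  }
}
```

Each row of three prices becomes a single number, which is the same reduction the article performs from 24 features to 1. The sign of the result depends on the starting vector, since the direction of a principal component is only defined up to sign.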

The next step is to load the data. For this we provide two files: data file 1 and data file 2.

import java.io.File
import java.text.SimpleDateFormat
import java.util.Date
import scala.swing.{MainFrame, SimpleSwingApplication}

object PCA extends SimpleSwingApplication {
  def top = new MainFrame {
    title = "PCA Example"
    // Get the example data
    val basePath = "/users/.../Example Data/"
    val exampleDataPath = basePath + "PCA_Example_1.csv"
    val trainData = getStockDataFromCSV(new File(exampleDataPath))
  }

  def getStockDataFromCSV(file: File): (Array[Date], Array[Array[Double]]) = {
    val source = scala.io.Source.fromFile(file)
    // Get all the records (minus the header)
    val data = source
      .getLines()
      .drop(1)
      .map(x => getStockDataFromString(x))
      .toArray
    source.close()
    // Group all records by date, and sort the groups on date ascending
    val groupedByDate = data.groupBy(x => x._1).toArray.sortBy(x => x._1)
    // Extract the values from the 3-tuple and turn them into
    // an array of tuples: Array[(Date, Array[Double])].
    // Within each date the records are sorted by stock name, so every
    // row lists the closing prices in the same stock order.
    val dateArrayTuples = groupedByDate
      .map { case (date, records) =>
        (date, records.sortBy(r => r._2).map(r => r._3))
      }
    // Turn the tuples into two separate arrays for easier use later on
    val dateArray = dateArrayTuples.map(x => x._1)
    val doubleArray = dateArrayTuples.map(x => x._2)
    (dateArray, doubleArray)
  }

  def getStockDataFromString(dataString: String): (Date, String, Double) = {
    // Split the comma separated value string into an array of strings
    val dataArray: Array[String] = dataString.split(',')
    val format = new SimpleDateFormat("yyyy-MM-dd")
    // Extract the values from the strings
    val date = format.parse(dataArray(0))
    val stock: String = dataArray(1)
    val close: Double = dataArray(2).toDouble
    // And return the result in a format that can later
    // easily be used to feed to Smile
    (date, stock, close)
  }
}
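The grouping step is the part of the loader that is easiest to misread, so here is a small sketch of it on in-memory rows. It assumes the same `date,stock,close` column layout that `getStockDataFromString` parses. The object `PivotSketch`, the function `pivot`, the stock names, and the prices are all hypothetical, and dates are kept as strings because ISO `yyyy-MM-dd` strings already sort chronologically.

```scala
object PivotSketch {
  // Hypothetical rows in the assumed "date,stock,close" layout; values are made up.
  val rows: Seq[String] = Seq(
    "2002-01-03,BBB,20.0",
    "2002-01-02,BBB,19.5",
    "2002-01-02,AAA,10.0",
    "2002-01-03,AAA,10.5"
  )

  /** Turns one-record-per-line data into one row per date,
    * with one column per stock (columns ordered by stock name).
    */
  def pivot(lines: Seq[String]): (Array[String], Array[Array[Double]]) = {
    val records = lines.map { line =>
      val parts = line.split(',')
      (parts(0), parts(1), parts(2).toDouble)
    }
    // Group by date, then sort the groups by date ascending.
    val byDate = records.groupBy(_._1).toArray.sortBy(_._1)
    val dates = byDate.map(_._1)
    // Inside each date, sort by stock name so columns line up across rows.
    val matrix = byDate.map { case (_, rs) => rs.sortBy(_._2).map(_._3).toArray }
    (dates, matrix)
  }

  def main(args: Array[String]): Unit = {
    val (dates, matrix) = pivot(rows)
    dates.zip(matrix).foreach { case (d, r) => println(s"$d -> ${r.mkString(", ")}") }
  }
}
```

Note that this layout only produces a rectangular matrix if every date has a record for every stock. If a stock is missing on some day, that row is shorter and the columns no longer line up, so real data may need a completeness check before it is handed to PCA.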
With this training data, and the fact that we already know that we want to merge the 24 features into 1 single feature, we can do the PCA and retrieve the values for the data points as follows.
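// Add to `def top`
// Assumes Smile 1.x is on the classpath, with smile.projection.PCA,
// smile.plot.{Line, LinePlot, PlotCanvas} and java.awt.{Color, Dimension} imported.
val pca = new PCA(trainData._2)
// Keep only the first principal component
pca.setProjection(1)
val points = pca.project(trainData._2)
// x = day index, y = negated value of the single merged feature
val plotData = points
    .zipWithIndex
    .map(x => Array(x._2.toDouble, -x._1(0)))
val canvas: PlotCanvas = LinePlot.plot("Merged Features Index",
                                       plotData,
                                       Line.Style.DASH,
                                       Color.RED)
peer.setContentPane(canvas)
size = new Dimension(400, 400)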

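This code not only performs the PCA but also plots the result, with the value of the merged feature on the y-axis and the individual days on the x-axis.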
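The mapping from projected points to plot coordinates can be tried without Smile or Swing. The sketch below reproduces only the `zipWithIndex` and negation step on made-up values; `PlotDataSketch` and `toPlotData` are hypothetical names. The article does not say why the value is negated. One plausible reading is that the sign of a principal component is arbitrary, so the negation just mirrors the curve vertically.

```scala
object PlotDataSketch {
  /** Converts projected points (one single-element array per day)
    * into (x, y) pairs: x is the day index, y is the negated value.
    */
  def toPlotData(points: Array[Array[Double]]): Array[Array[Double]] =
    points.zipWithIndex.map { case (p, day) => Array(day.toDouble, -p(0)) }

  def main(args: Array[String]): Unit = {
    // Made-up projected values for three consecutive days.
    val points = Array(Array(-2.0), Array(0.5), Array(1.5))
    toPlotData(points).foreach(xy => println(xy.mkString("(", ", ", ")")))
  }
}
```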

Originally published on 2016-11-14; shared from the WeChat official account 鸿的学习笔记.
