To compare our PCA feature with the Dow Jones Index (DJI), we now adjust the code to add the DJI to the plot.
First, add the following to the top method:
// Verification against the DJI
val verificationDataPath = basePath + "PCA_Example_2.csv"
val DJIIndex = getDJIFromFile(new File(verificationDataPath))
canvas.line("Dow Jones Index", DJIIndex._2, Line.Style.DOT_DASH, Color.BLUE)
Next, we introduce the following two methods:
import java.io.File
import java.text.SimpleDateFormat
import java.util.Date

def getDJIRecordFromString(dataString: String): (Date, Double) = {
  // Split the comma-separated value string into an array of strings
  val dataArray: Array[String] = dataString.split(',')
  val format = new SimpleDateFormat("yyyy-MM-dd")
  // Extract the date (column 0) and the closing price (column 4)
  val date = format.parse(dataArray(0))
  val close: Double = dataArray(4).toDouble
  // And return the result in a format that can later
  // easily be used to feed to Smile
  (date, close)
}
def getDJIFromFile(file: File): (Array[Date], Array[Double]) = {
  val source = scala.io.Source.fromFile(file)
  // Get all the records (minus the header)
  val data = source
    .getLines()
    .drop(1)
    .map(x => getDJIRecordFromString(x))
    .toArray
  source.close()
  // Turn the tuples into two separate arrays for easier use later on
  val sortedData = data.sortBy(x => x._1)
  val dates = sortedData.map(x => x._1)
  val doubles = sortedData.map(x => x._2)
  (dates, doubles)
}
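To try the parsing and sorting logic without PCA_Example_2.csv, here is a minimal sketch that runs the same steps on an in-memory list of lines instead of a file. The helper names parseRecord and parseDJILines and the sample rows are hypothetical; like getDJIRecordFromString, the sketch assumes the date sits in column 0 and the closing price in column 4.

```scala
import java.text.SimpleDateFormat
import java.util.Date

// Hypothetical helper: same parsing as getDJIRecordFromString,
// assuming the date in column 0 and the closing price in column 4.
def parseRecord(line: String): (Date, Double) = {
  val fields = line.split(',')
  val format = new SimpleDateFormat("yyyy-MM-dd")
  (format.parse(fields(0)), fields(4).toDouble)
}

// Hypothetical in-memory variant of getDJIFromFile: skip the header,
// parse each row, sort by date and split into two arrays.
def parseDJILines(lines: Seq[String]): (Array[Date], Array[Double]) = {
  val sorted = lines.drop(1).map(parseRecord).toArray.sortBy(_._1)
  (sorted.map(_._1), sorted.map(_._2))
}

// Made-up sample rows (not real DJI data), deliberately out of date order.
val sample = Seq(
  "Date,Open,High,Low,Close,Volume",
  "2015-01-05,100.0,101.0,99.0,100.5,1000",
  "2015-01-02,98.0,99.5,97.5,99.0,1200"
)

val (dates, closes) = parseDJILines(sample)
println(closes.mkString(", ")) // 99.0, 100.5
```

The sort step matters because the plot uses array positions as its x-axis, so the closing prices must be in chronological order regardless of how the CSV file is ordered.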
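This code loads the DJI data and adds it to the plot that already contains our own stock market index. However, executing this code gives the following result.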
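As you can see, the values of the DJI lie far away from the range of our computed feature. This is why we now need to normalize the data. The idea is to scale each dataset by its range, that is, to divide every value by the difference between the series' maximum and minimum, so that both datasets end up on the same scale (each spanning an interval of width 1).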
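The following sketch contrasts dividing by the range, which is what the revised getDJIFromFile does, with full min-max normalization, which also subtracts the minimum. The function names scaleByRange and minMax and the price values are made up for illustration.

```scala
// Divide every value by the range (max - min) of the series.
// The result spans an interval of width 1 but keeps its offset from zero.
// Assumes the series is not constant (otherwise the range is zero).
def scaleByRange(xs: Array[Double]): Array[Double] = {
  val range = xs.max - xs.min
  xs.map(_ / range)
}

// For comparison: min-max normalization additionally subtracts the
// minimum, mapping the series onto [0, 1].
def minMax(xs: Array[Double]): Array[Double] = {
  val (lo, hi) = (xs.min, xs.max)
  xs.map(x => (x - lo) / (hi - lo))
}

// Made-up closing prices, not real DJI values.
val closes = Array(16000.0, 17000.0, 18000.0)
println(scaleByRange(closes).mkString(", ")) // 8.0, 8.5, 9.0
println(minMax(closes).mkString(", "))       // 0.0, 0.5, 1.0
```

Dividing by the range alone fixes the width of the interval but not its position, which is why the scaled DJI and the PCA feature can still sit at different heights on the plot.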
Replace the getDJIFromFile method with the following:
def getDJIFromFile(file: File): (Array[Date], Array[Double]) = {
  val source = scala.io.Source.fromFile(file)
  // Get all the records (minus the header)
  val data = source
    .getLines()
    .drop(1)
    .map(x => getDJIRecordFromString(x))
    .toArray
  source.close()
  // Turn the tuples into two separate arrays for easier use later on
  val sortedData = data.sortBy(x => x._1)
  val dates = sortedData.map(x => x._1)
  // Scale the closing prices by their range (max - min)
  val maxDouble = sortedData.maxBy(x => x._2)._2
  val minDouble = sortedData.minBy(x => x._2)._2
  val rangeValue = maxDouble - minDouble
  val doubles = sortedData.map(x => x._2 / rangeValue)
  (dates, doubles)
}
and replace the plotData definition in the top method with:
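val maxDataValue = points.maxBy(x => x(0))
val minDataValue = points.minBy(x => x(0))
// Range of the first principal component
val rangeValue = maxDataValue(0) - minDataValue(0)
// x = day index, y = sign-flipped component value divided by its range
val plotData = points
  .zipWithIndex
  .map(x => Array(x._2.toDouble, -x._1(0) / rangeValue))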
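To see what the new plotData produces, here is the same transformation applied to a few made-up component scores. The points array below is hypothetical; in the top method it holds the PCA projection. The minus sign flips the component: the sign of a principal component is arbitrary (v and -v are equally valid directions), so presumably it is negated here so that the feature moves in the same direction as the DJI.

```scala
// Hypothetical projected data: each row holds the first principal
// component score of one trading day (made-up numbers).
val points: Array[Array[Double]] = Array(Array(2.0), Array(-1.0), Array(-2.0))

val maxDataValue = points.maxBy(x => x(0))
val minDataValue = points.minBy(x => x(0))
val rangeValue = maxDataValue(0) - minDataValue(0) // 4.0

// x = day index, y = sign-flipped score divided by the range.
val plotData = points
  .zipWithIndex
  .map(x => Array(x._2.toDouble, -x._1(0) / rangeValue))

plotData.foreach(p => println(p.mkString("(", ", ", ")")))
// (0.0, -0.5)
// (1.0, 0.25)
// (2.0, 0.5)
```

As with the DJI, the scaled y-values span an interval of width 1, here from -0.5 to 0.5.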
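We see that even though the DJI values lie between 0.8 and 1.8 while our new feature ranges between -0.5 and 0.5, the two trend lines correspond quite well. With this example, together with the explanation of PCA in the general section, you should now be able to use PCA and apply it to your own data.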
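The match above is judged by eye. One way to put a number on it, sketched here as an assumption rather than as part of the original example, is the Pearson correlation between the two series. It is unaffected by dividing by a positive constant or adding an offset, so the range scaling only matters for the plot; a sign flip, however, turns +1 into -1. The function pearson and the sample arrays are hypothetical, and the sketch assumes both series are aligned day by day.

```scala
// Pearson correlation coefficient of two aligned series of equal length.
def pearson(a: Array[Double], b: Array[Double]): Double = {
  require(a.length == b.length && a.length > 1, "series must be aligned and non-trivial")
  val meanA = a.sum / a.length
  val meanB = b.sum / b.length
  val cov = a.indices.map(i => (a(i) - meanA) * (b(i) - meanB)).sum
  val varA = a.map(x => (x - meanA) * (x - meanA)).sum
  val varB = b.map(x => (x - meanB) * (x - meanB)).sum
  cov / math.sqrt(varA * varB)
}

// Made-up series: b is a scaled and shifted copy of a.
val a = Array(1.0, 2.0, 4.0, 3.0)
val b = a.map(x => x / 3.0 + 0.8)
println(pearson(a, b))                // close to 1.0
println(pearson(a, a.map(x => -x)))   // close to -1.0
```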