[First Release] How Machine Learning Can Be Applied to Quantitative Investing, Series (Part 2)

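People with a pure computer science, mathematics, or physics background often ask how exactly ML is applied in quantitative investing, and which parts of the work play to their strengths. Many platforms and self-media outlets have touched on the question, but none cover it fully. Starting today, the 量化投资与机器学习 public account is launching a series titled "How Machine Learning Can Be Applied to ...". Today's post is our editorial team's translation of an overseas blog survey on applications of deep learning in quantitative investing. We hope it shows readers how researchers abroad are applying DL.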

Preface

A Survey of Deep Learning Techniques Applied to Trading

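Deep learning has received a lot of attention recently, particularly in image classification and speech recognition. Its application to trading, however, does not seem to be widespread. This survey covers what the author (Greg Harris) has found so far that is relevant to systematic trading.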

Some terms:

DBN = Deep Belief Network

LSTM = Long Short-Term Memory, a type of recurrent neural network

MLP = Multi-layer Perceptron

RBM = Restricted Boltzmann Machine

ReLU = Rectified Linear Units, an activation function

CNN = Convolutional Neural Network

Limit Order Book Modeling

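Sirignano (2016) predicts changes in limit order books. He designs a "spatial neural network" that exploits local spatial structure; it can serve as a classifier and is more computationally efficient than a standard neural network. He models the joint distribution of the best bid and ask prices at the next state change. The model can also capture how a change in one of the two (bid or ask) affects the other.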

Architecture – Each neural network has 4 layers. The standard neural network has 250 neurons per hidden layer, and the spatial neural network has 50. He uses the tanh activation function on the hidden layer neurons.

Training – He trained and tested on order books from 489 stocks from 2014 to 2015 (a separate model for each stock). He uses Level III limit order book data from the NASDAQ with event times having nanosecond decimal precision. Training involved 50TB of data and used a cluster with 50 GPUs. He includes 200 features: the price and size of the limit order book across the first 50 non-zero bid and ask levels. He uses dropout to prevent overfitting. He uses batch normalization between each hidden layer to prevent internal covariate shift. Training is done with the RMSProp algorithm. RMSProp is similar to stochastic gradient descent with momentum, but it normalizes the gradient by a running average of the magnitudes of past gradients. He uses an adaptive learning rate where the learning rate is decreased by a constant factor whenever the training error increases over a training epoch. He uses early stopping imposed via a validation set to reduce overfitting. He also includes an L2 penalty when training in order to reduce overfitting.
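
The RMSProp update and the learning-rate rule mentioned above can be sketched roughly as follows. This is a minimal illustration rather than the paper's code: the function names, the decay of 0.9, the halving factor, and the toy objective are all assumptions.

```python
import numpy as np

def rmsprop_step(params, grad, avg_sq, lr=1e-3, decay=0.9, eps=1e-8):
    """One RMSProp update; returns the new parameters and running average."""
    # Running average of squared gradients, kept per parameter.
    avg_sq = decay * avg_sq + (1.0 - decay) * grad ** 2
    # Scale each gradient component by the root of that average.
    params = params - lr * grad / (np.sqrt(avg_sq) + eps)
    return params, avg_sq

def decay_lr_on_worse_epoch(lr, prev_error, curr_error, factor=0.5):
    """Cut the learning rate by a constant factor when training error rises.

    The factor of 0.5 is an assumed value, not taken from the paper.
    """
    return lr * factor if curr_error > prev_error else lr

# Toy usage: minimize f(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -2.0])
avg_sq = np.zeros_like(w)
for _ in range(500):
    w, avg_sq = rmsprop_step(w, 2.0 * w, avg_sq, lr=0.01)
```

Because each component is divided by its own gradient scale, the step size stays near `lr` regardless of how large the raw gradient is.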

Results – He shows that limit order books exhibit some degree of local spatial structure. He predicts the order book 1 second ahead and also at the time of the next bid/ask change. The spatial neural network outperforms the standard neural network and logistic regression with non-linear features. Both neural networks have 10% lower error than logistic regression.

Price-based Classification Models

Dixon et al. (2016) use a deep neural network to predict the sign (direction) of the price change over the next 5 minutes for 43 commodity and forex futures.

Architecture – Their input layer has 9,896 neurons for input features made up of lagged price differences and co-movements between contracts. There are 5 learned fully-connected layers. The first of the four hidden layers contains 1,000 neurons, and each subsequent layer tapers by 100 neurons. The output layer has 135 neurons (3 for each class {-1, 0, 1} times 43 contracts).
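
To get a feel for the size of this network, the layer widths quoted above can be turned into a parameter count. This is a back-of-the-envelope sketch; it assumes each fully-connected layer has one bias per output neuron, which the survey does not state.

```python
# Layer widths as described: input, four tapering hidden layers, output.
layer_sizes = [9896, 1000, 900, 800, 700, 135]

def count_parameters(sizes):
    """Weights plus biases for a stack of fully-connected layers.

    One bias per output neuron is an assumption, not a detail from the paper.
    """
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

n_layers = len(layer_sizes) - 1           # 5 learned layers
n_params = count_parameters(layer_sizes)  # roughly 12.2 million
```

Most of the parameters sit in the first layer, since the input is almost ten times wider than any hidden layer.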

Training – They used the standard back-propagation with stochastic gradient descent. They speed up training by using mini-batching (computing the gradient on several training examples at once rather than individual examples). Rather than an nVidia GPU, they used an Intel Xeon Phi co-processor.

Results – They report 42% accuracy, overall, for three-class classification. They do some walk-forward training instead of a traditional backtest. Their boxplot shows some generally positive Sharpe ratios from the mini-backtests for each contract. They did not include transaction costs or crossing the bid-ask spread. All their predictions and features were based on the mid-price at the end of each 5-minute time period.
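
Walk-forward training, as mentioned above, can be sketched as a generator of rolling train/test index ranges. The window sizes and the function name here are hypothetical; the survey does not give the actual windows used.

```python
def walk_forward_splits(n_samples, train_size, test_size):
    """Yield (train_indices, test_indices) pairs that roll forward in time."""
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # slide the window forward by one test block

# Toy usage with assumed sizes: 10 observations, train on 4, test on 2.
splits = list(walk_forward_splits(n_samples=10, train_size=4, test_size=2))
```

Each test block lies strictly after its training block, so the model is never evaluated on data that precedes what it was fit on.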

Takeuchi and Lee (2013) study how the momentum effect can be used to predict monthly stock returns.

Architecture – They use an auto-encoder composed of stacked RBMs to extract features from stock prices which they then pass to a feed-forward neural network classifier. Each RBM consists of one layer of visible units and one layer of hidden units connected by symmetric links. The first layer has 33 units for input features from one stock at a time. For every month t, the features include the 12 monthly returns for month t-2 through t-13 and the 20 daily returns approximately corresponding to month t. They normalize each of the return features by calculating the z-score relative to the cross-section of all stocks for each month or day. The number of hidden units in the final layer of the encoder is sharply reduced, forcing dimensionality reduction. The output layer has 2 units, corresponding to whether the stock ended up above or below the median return for the month. Final layer sizes are 33-40-4-50-2.
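
The cross-sectional z-score and the above/below-median label described above might look like this in NumPy. It is a sketch on synthetic data: the array shapes, the random inputs, and the function names are assumptions made for illustration.

```python
import numpy as np

def cross_sectional_zscore(returns):
    """Z-score each column (one period) across rows (stocks)."""
    mean = returns.mean(axis=0, keepdims=True)
    std = returns.std(axis=0, keepdims=True)
    return (returns - mean) / std

def above_median_label(next_month_returns):
    """1 if a stock's return beats the cross-sectional median, else 0."""
    return (next_month_returns > np.median(next_month_returns)).astype(int)

rng = np.random.default_rng(0)
# 100 hypothetical stocks; 12 monthly plus 20 daily return features each.
returns = rng.normal(size=(100, 32))
features = cross_sectional_zscore(returns)
labels = above_median_label(rng.normal(size=100))
```

Normalizing across stocks rather than across time means every feature describes how a stock ranks against its peers in the same period, and the median-based label keeps the two classes balanced.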

Training – During pre-training, they split the dataset into smaller, non-overlapping mini-batches. Afterwards, they un-roll the RBMs to form an encoder-decoder, which is fine-tuned using back-propagation. They consider all stocks trading on the NYSE, AMEX, or NASDAQ with a price greater than $5. They train on data from 1965 to 1989 (848,000 stock-month samples) and test on data from 1990 to 2009 (924,300 stock-month samples). Some training data is held out for validation, to choose the number of layers and the number of units per layer.

Results – Their overall accuracy is around 53%. When they consider the difference between the top decile and the bottom decile predictions, they get 3.35% per month, or 45.93% annualized return.
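
The top-decile minus bottom-decile comparison can be sketched as below. This is only an illustration on made-up numbers; equal weighting within each decile is an assumption, and the function name is hypothetical.

```python
import numpy as np

def decile_spread(scores, realized_returns):
    """Mean return of the top-scored decile minus that of the bottom decile."""
    order = np.argsort(scores)
    k = max(1, len(scores) // 10)       # number of stocks per decile
    bottom, top = order[:k], order[-k:]
    # Equal weighting within each decile is assumed here.
    return realized_returns[top].mean() - realized_returns[bottom].mean()

# Toy data: 20 stocks whose realized returns rise with the model score.
scores = np.arange(20, dtype=float)
realized = np.linspace(-0.05, 0.05, 20)
spread = decile_spread(scores, realized)
```

A positive spread means the stocks the model ranks highest did, on average, outperform the ones it ranks lowest.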

Batres-Estrada (2015) predicts which S&P 500 stocks will have above-median returns on a given trading day. His work appears to be influenced by Takeuchi and Lee (2013).

Architecture – He uses a 3-layer DBN coupled to an MLP. He uses 400 neurons in each hidden layer, and he uses a sigmoid activation function. The output layer is a softmax layer with two output neurons for binary classification (above median or below). The DBN is composed of stacked RBMs, each trained sequentially.

Training – He first pre-trains the DBN module, then fine-tunes the entire DBN-MLP using back-propagation. The input includes 33 features: monthly log-returns for months t-2 to t-13, 20 daily log-returns for each stock at month t, and an indicator variable for the January effect. The features are normalized using the Z-score for each time period. He uses S&P 500 constituent data from 1985 to 2006 with a 70-15-15 split for training-validation-test. He uses the validation data to choose the number of layers, the number of neurons, and the regularization parameters. He uses early stopping to prevent overfitting.

Results – His model has 53% accuracy, which outperforms regularized logistic regression and a few MLP baselines.

Sharang and Rao (2015) use a DBN (deep belief network) trained on technical indicators to classify the direction of a portfolio.

Architecture – They use a DBN consisting of 2 stacked RBMs. The first RBM is Gaussian-Bernoulli (15 nodes), and the second RBM is Bernoulli (20 nodes). The DBN produces latent features which they try feeding into three different classifiers: regularized logistic regression, support vector machines, and a neural network with 2 hidden layers. They predict 1 if the portfolio goes up over 5 days, and -1 otherwise.

Training – They train the DBN using a contrastive divergence algorithm. They calculate signals based on open, high, low, close, open interest, and volume data, beginning in 1985, with some points removed during the 2008 financial crisis. They use 20 features: the “daily trend” calculated over different time frames, and then normalized. All parameters are chosen using a validation dataset. When training the neural net classifier, they mention using a momentum parameter during mini-batch gradient descent training to shrink the coefficients by half during every update.
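
For readers unfamiliar with contrastive divergence, a single CD-1 update for one RBM can be sketched as follows. Note the simplifications: this is a Bernoulli-Bernoulli RBM (the paper's first RBM is Gaussian-Bernoulli), the data are random bits, and the learning rate, batch size, and function names are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b_vis, b_hid, rng, lr=0.1):
    """One CD-1 step for a Bernoulli-Bernoulli RBM on a mini-batch v0."""
    # Positive phase: hidden probabilities and samples given the data.
    h0_prob = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    # Negative phase: one Gibbs step back to the visibles and up again.
    v1_prob = sigmoid(h0 @ W.T + b_vis)
    h1_prob = sigmoid(v1_prob @ W + b_hid)
    # Approximate gradient: data statistics minus reconstruction statistics.
    n = v0.shape[0]
    W = W + lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / n
    b_vis = b_vis + lr * (v0 - v1_prob).mean(axis=0)
    b_hid = b_hid + lr * (h0_prob - h1_prob).mean(axis=0)
    return W, b_vis, b_hid

rng = np.random.default_rng(0)
v = (rng.random((64, 15)) < 0.5).astype(float)   # toy binary batch, 15 visible units
W = 0.01 * rng.standard_normal((15, 20))         # 20 hidden units
b_vis, b_hid = np.zeros(15), np.zeros(20)
for _ in range(10):
    W, b_vis, b_hid = cd1_update(v, W, b_vis, b_hid, rng)
```

In a DBN, the hidden activations of one trained RBM become the visible input of the next, which is what "stacked" refers to.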

Results – The portfolio is constructed using PCA to be neutral to the first principal component. The portfolio is an artificial spread of instruments, so actually trading it is done with a spread between the ZF and ZN contracts. All input prices are mid-prices, meaning the bid-ask spread is ignored. The results look profitable, with all three classification models performing 5-10% more accurately than a random predictor.

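Zhu et al. (2016) make trading decisions using oscillation box theory with a deep belief network. Oscillation box theory holds that a stock's price oscillates within a certain range (the box), and that once the price leaves this range it moves into a new box. Their strategy buys when the price breaks through the top of the box and sells when it falls through the bottom.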
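
The buy/sell rule itself is simple enough to sketch without any neural network. In the sketch below the box is taken to be the high/low range of the previous `window` prices, which is an assumption made for illustration; the paper uses a DBN rather than a fixed lookback.

```python
import numpy as np

def box_breakout_signals(prices, window=20):
    """+1 when price moves above the prior box top, -1 below the box bottom."""
    prices = np.asarray(prices, dtype=float)
    signals = np.zeros(len(prices), dtype=int)
    for t in range(window, len(prices)):
        box = prices[t - window:t]          # the box uses only past prices
        if prices[t] > box.max():
            signals[t] = 1                  # breakout above the box: buy
        elif prices[t] < box.min():
            signals[t] = -1                 # break below the box: sell
    return signals

# Toy usage with an assumed 4-period box.
prices = [10, 11, 10, 11, 10, 12, 11, 9]
signals = box_breakout_signals(prices, window=4)
```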

Architecture – They use a DBN made up of stacked RBMs and a final back-propagation layer.

Training – They used block Gibbs sampling to greedily train each layer from lowest to highest in an unsupervised way. They then train the back-propagation layer in a supervised way, which fine-tunes the whole model. They chose 400 stocks out of the S&P 500 for testing, and the test set covers 400 days from 2004 to 2005. They use open, high, low, close prices as well as technical analysis indicators, for a total of 14 model inputs. Some indicators are given more influence in the prediction through the use of “gray relation analysis” or “gray correlation degree.”

Results – In their trading strategy, they charge 0.5% transaction costs per trade and add a couple of parameters for stop-loss and “transaction rate.” I don’t fully understand the result tables, but they seem to be reporting significant profits.

Volatility Forecasting

Xiong et al. (2015) predict the daily volatility of the S&P 500, as estimated from open, high, low, and close prices.

Architecture – They use a single LSTM hidden layer consisting of one LSTM block. For inputs they use daily S&P 500 returns and volatilities. They also include 25 domestic Google trends, covering sectors and major areas of the economy.

Training – They used the “Adam” method with 32 samples per batch and mean absolute percent error (MAPE) as the objective loss function. They set the maximum lag of the LSTM to include 10 successive observations.
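
The two data-handling pieces above, a 10-observation lag window and the MAPE objective, can be sketched like this. The volatility series is synthetic and the function names are hypothetical; no LSTM is trained here.

```python
import numpy as np

def make_lagged_windows(series, max_lag=10):
    """Stack each run of `max_lag` observations with the value that follows it."""
    series = np.asarray(series, dtype=float)
    X = np.stack([series[i:i + max_lag] for i in range(len(series) - max_lag)])
    y = series[max_lag:]
    return X, y

def mape(y_true, y_pred):
    """Mean absolute percent error, in percent; y_true must be non-zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

vol = np.linspace(0.10, 0.30, 50)      # hypothetical daily volatility series
X, y = make_lagged_windows(vol, max_lag=10)
naive_error = mape(y, X[:, -1])        # baseline: predict the previous value
```

MAPE penalizes errors relative to the size of the target, so a miss of 0.01 counts for more on a quiet day than on a volatile one.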

Results – They show their LSTM method outperforms GARCH, Ridge, and LASSO techniques.

Text-based Classification Models

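Rönnqvist and Sarlin (2016) use news articles to predict bank distress. Specifically, they build a classifier that judges whether a sentence describes a period of distress or of tranquility.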

Architecture – They use two neural networks in this paper. The first is for semantic pre-training to reduce dimensionality. For this, they run a sliding window over text, taking a sequence of 5 words and learning to predict the next word. They use a feed-forward topology where a projection layer in the middle provides the semantic vectors once the connection weights have been learned. They also include the sentence ID as an input to the model, to provide context and inform the prediction of the next word. They use binary Huffman coding to map sentence IDs and words to activation patterns in the input layer, which organizes the words roughly by frequency. They say feed-forward topologies with fixed context sizes are more efficient than recurrent neural networks for modeling text sequences. The second neural network is for classification. Instead of a million inputs (one for each word), they use 600 inputs from the learned semantic model. The first layer has 600 nodes, the middle layer has 50 rectified linear hidden nodes, and the output layer has 2 nodes (distress/tranquil).
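
The sliding-window step of the pre-training network can be illustrated with a few lines of plain Python. The example sentence and the function name are made up; only the "5 words in, next word out" shape comes from the description above.

```python
def context_target_pairs(tokens, context_size=5):
    """Slide over the tokens: each run of 5 words is paired with the next word."""
    return [
        (tuple(tokens[i:i + context_size]), tokens[i + context_size])
        for i in range(len(tokens) - context_size)
    ]

# Hypothetical sentence, for illustration only.
sentence = "the bank said it would seek state aid soon".split()
pairs = context_target_pairs(sentence)
```

Each pair is one training example for the next-word predictor; the fixed context size is what lets a feed-forward network be used in place of a recurrent one.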

Training – They train it with 243 distress events over 101 banks observed during the financial crisis of 2007-2009. They use 716k sentences mentioning the banks, taken from 6.6m Reuters news articles published during and after the crisis.

Results – They evaluate their classification model using a custom “Usefulness” measure. The evaluation is done using cross-validation, leaving N banks out in each fold. They aggregate the distress counts into various timeseries but don’t go so far as to consider creating a trading strategy.

Fehrer and Feuerriegel (2015) train a model on news headlines to predict German stock returns.

Architecture – They use a recursive autoencoder with an additional softmax layer in each autoencoder for estimating probabilities. They perform three-class prediction {-1, 0, 1} for the following day’s return of the stock associated with the headline.

Training – They initialize the weights with Gaussian noise, and then update through back-propagation. They use an English ad-hoc news announcement dataset (8,359 headlines) for the German market covering 2004 to 2011.

Results – Their recursive autoencoder has 56% accuracy, which is an improvement over a more traditional random forest modeling approach with 53% accuracy. They do not develop a trading strategy. They have made a Java implementation of their code publicly available.

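Ding et al. (2015) use structured information extracted from news headlines to predict movements in the S&P 500. They process headlines with Open IE (Open Information Extraction, not "open Internet Explorer") to obtain the structured content of each news event (who, what, to whom, when). Unlike ordinary networks, they use a neural tensor network to learn semantic composition.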

Architecture – They combine short-term and long-term effects of events, using a CNN to perform semantic composition over the input event sequence. They use a max pooling layer on top of the convolutional layer, which makes the network retain only the most useful features produced by the convolutional layer. They have separate convolutional layers for long-term events and mid-term events. Both of these layers, along with an input layer for short-term events, feed into a hidden layer which then feeds into two output nodes.

Training – They extracted 10 million events from Reuters and Bloomberg news. For training, they corrupt events by replacing one event argument with a random argument. During training, they assume that the actual event should be given a higher score than the corrupted event. When it isn’t, model parameters get updated.
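
The "true event should outscore the corrupted event" idea is commonly written as a margin ranking loss, sketched below. The margin of 1.0, the three-slot event tuple, the example names, and both function names are assumptions for illustration, not details confirmed by the survey.

```python
import random

def corrupt_event(event, vocabulary, rng):
    """Replace one argument of an (actor, action, object) tuple at random."""
    slot = rng.randrange(len(event))
    corrupted = list(event)
    # Pick a replacement that differs from the original argument.
    corrupted[slot] = rng.choice([w for w in vocabulary if w != event[slot]])
    return tuple(corrupted)

def margin_ranking_loss(score_true, score_corrupt, margin=1.0):
    """Zero when the true event outscores the corrupted one by the margin."""
    return max(0.0, margin - score_true + score_corrupt)

rng = random.Random(0)
event = ("Apple", "sues", "Samsung")               # hypothetical event tuple
fake = corrupt_event(event, ["Apple", "Samsung", "Google", "Nokia"], rng)
```

When the loss is zero there is nothing to learn from the pair, which matches the description that parameters are updated only when the ordering is wrong.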

Results – They find that structured events are better features than words for stock market prediction. Their approach outperforms baseline methods by 6%. They make predictions for the S&P 500 index and 15 individual stocks, and a table appears to show that they can predict the S&P 500 with 65% accuracy.

Portfolio Models

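Heaton et al. (2016) try to build a portfolio that outperforms the biotech index IBB. They aim to track the index with a small number of stocks, and try to beat the index during large drawdowns. Rather than modeling the covariance matrix directly, they fit a model that supports non-linear structure.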

Architecture – They use auto-encoding with regularization and ReLUs. Their auto-encoder has one hidden layer with 5 neurons.

Training – They use weekly return data for the component stocks of IBB from 2012 to 2016. They auto-encode all stocks in the index and evaluate the difference between each stock and its auto-encoded version. They keep the 10 most “communal” stocks that are most similar to the auto-encoded version. They also keep a varying number of other stocks, where the number is chosen with cross-validation.
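
The "communal stock" ranking can be sketched by measuring how far each stock sits from its reconstruction. To stay short, the sketch below swaps the paper's regularized ReLU auto-encoder for a linear rank-5 reconstruction via SVD, and uses synthetic returns; both substitutions, and the function name, are assumptions.

```python
import numpy as np

def most_communal(returns, n_components=5, n_keep=10):
    """Rank stocks by how closely a low-rank reconstruction matches them."""
    centered = returns - returns.mean(axis=0)
    # Linear stand-in for the auto-encoder: project onto the top components.
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    recon = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]
    # Distance between each stock (column) and its reconstruction.
    errors = np.linalg.norm(centered - recon, axis=0)
    return np.argsort(errors)[:n_keep], errors

rng = np.random.default_rng(0)
weekly_returns = rng.normal(scale=0.03, size=(200, 40))  # weeks x stocks, synthetic
communal_idx, errors = most_communal(weekly_returns)
```

Stocks with small reconstruction error are the ones best explained by the shared 5-dimensional code, which is the sense in which they are "communal".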

Results – They show the tracking error as a function of the number of stocks included in the portfolio, but don’t seem to compare against traditional methods. They also replace index drawdowns with positive returns and find portfolios that track this modified index.

Originally published on the WeChat public account 量化投资与机器学习 (ZXL_LHTZ_JQXX)

Original publication date: 2016-10-16
