Stock Price Prediction With Big Data and Machine Learning(如何用大数据和机器学习预测股票市场)
Synopsis
This post is based on Modeling high-frequency limit order book dynamics with support vector machines paper. Roughly speaking I’m implementing ideas introduced in this paper in scala with Spark and Spark MLLib. Authors are using sampling, I’m going to use full order log from NYSE (sample data is available from NYSE FTP), just because I can easily do it with Spark. Instead of using SVM, I’m going to use Decision Tree algorithm for classification, because in Spark MLLib it supports multiclass classification out of the box.
If you want to get deep understanding of the problem and proposed solution, you need to read the paper. I’m going to give high level overview of the problem in less academic language, in one or two paragraphs.
Predictive modelling is the process by which a model is created or chosen to try to best predict the probability of an outcome.