This paper explores the use of machine learning techniques to classify financial news for the purpose of predicting stock price movements. The current body of literature on the subject is small, and the reported results are mixed. In this paper we attempt to identify some causes of the divergent results and devise experiments that address weaknesses in existing research. A corpus of Thomson Reuters newswires was collected from Dow Jones' Factiva for seven large stocks. Each article was then linked with the price gap of the trading day following the article's publication date. Predictions, in the form of long and short signals, were made using a support vector machine trained with sequential minimal optimization (SMO) on a WordNet-transformed bag-of-words representation. A second variant of the system, in which Latent Semantic Analysis (LSA) was applied to the input data, was also evaluated. The signals were conditioned on a set of thresholds: a trade signal was generated only when the predicted value exceeded the chosen threshold. Higher thresholds were associated with higher accuracy but fewer trading signals. Overall, the results were promising.
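The threshold conditioning described above can be illustrated with a minimal sketch. It assumes (this is not stated in the abstract) that the classifier emits a signed real-valued score per article, that a positive score maps to a long signal and a negative score to a short signal, and that one symmetric threshold is applied to the score's magnitude. The function names `threshold_signals` and `signal_accuracy` are hypothetical, and the scores and price gaps in the usage note are toy numbers, not data from this study.

```python
from typing import List, Optional, Sequence, Tuple


def threshold_signals(scores: Sequence[float], threshold: float) -> List[Optional[str]]:
    """Map predicted scores to 'long'/'short' signals (hypothetical helper).

    Assumes a symmetric threshold: a signal is emitted only when
    |score| > threshold; otherwise the article yields no trade (None).
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    signals: List[Optional[str]] = []
    for s in scores:
        if s > threshold:
            signals.append("long")
        elif s < -threshold:
            signals.append("short")
        else:
            signals.append(None)  # prediction not confident enough to trade
    return signals


def signal_accuracy(
    signals: Sequence[Optional[str]], actual_gaps: Sequence[float]
) -> Tuple[float, int]:
    """Return (fraction of emitted signals matching the gap's sign, number of signals)."""
    hits = 0
    total = 0
    for sig, gap in zip(signals, actual_gaps):
        if sig is None:
            continue  # suppressed signals count toward neither hits nor total
        total += 1
        if (sig == "long" and gap > 0) or (sig == "short" and gap < 0):
            hits += 1
    return (hits / total if total else float("nan")), total
```

On the toy scores `[0.9, -0.2, 0.1, -1.3, 0.4]` with gaps `[0.5, 0.3, -0.1, -0.8, -0.2]`, a threshold of 0.0 emits five signals with accuracy 0.4, while a threshold of 0.5 emits only two signals, both correct. This mirrors the qualitative trade-off between accuracy and signal count described above, but it is purely illustrative.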