Currency Movement Prediction with Macroeconomic Indicators and Gene Expression Programming .NET Framework

by

Boian V Petkantchin

September 2, 2008

Goal of Experiment

The goal of this experiment is to predict changes in currency exchange rates from historical macroeconomic indicator data. The currency pair chosen is USD/GBP.
This paper is informal, but it offers some useful insights into the problem.

Data Used

The learning data for the model covers Jan 1, 1982 through Dec 31, 2004. The testing set covers Jan 1, 2005 through Dec 31, 2006.
The finest sampling interval is 1 month; some time series are quarterly. When the output for a case is evaluated, the preceding 3 years of data from all indicators are fed to the model.
The following abbreviations are used in the list of time series:
sa - seasonally adjusted
nsa - not seasonally adjusted
saar - seasonally adjusted annual rate

Extra care is taken not to use data that had not yet been released at the time of a case; appropriate release delays are applied to each series to prevent this.
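The windowing and release-delay rule described above can be sketched roughly as follows. This is only an illustration, not the code from the gepnet project: the function names, the dictionary layout of a series and the `release_delay` parameter are assumptions made for the example.

```python
def month_index(year, month):
    """Map a calendar month to a running integer so months can be subtracted."""
    return year * 12 + (month - 1)


def lookback_window(series, case_month, release_delay, lookback=36):
    """Return the `lookback` most recent monthly values already published
    at `case_month`.

    series        -- dict mapping month index -> observed value (hypothetical layout)
    case_month    -- month index of the case being evaluated
    release_delay -- assumed number of months between an observation month
                     and its publication
    Months with no observation (e.g. in a quarterly series) come back as None.
    """
    last_available = case_month - release_delay
    first = last_available - lookback + 1
    return [series.get(m) for m in range(first, last_available + 1)]


# Usage: 5 years of dummy monthly data, a case in Jan 2004, 2 months of delay.
base = month_index(2000, 1)
series = {base + i: float(i) for i in range(60)}
window = lookback_window(series, month_index(2004, 1), release_delay=2)
print(len(window), window[0], window[-1])
```

Shifting the whole window back by the delay keeps the input size constant at 36 values per series. Whether the original experiment shifts the window or pads the unreleased months is not stated in this paper.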

Expected Output

Various cases were tested: prediction of USD/GBP
2 months in the future,
3 months in the future,
6 months in the future and
12 months in the future.

Fitness Function

Two types of fitness functions were tested. The first is the mean absolute percentage error between the predicted exchange rate and the actual rate. The second checks only whether the predicted rate and the actual rate lie on the same side of the horizontal line passing through the current rate. That is, if the genotype guesses the direction of the trend correctly, the error is 0. The target output is 0 if the price does not change, -1/2 for a downtrend and +1/2 for an uptrend. Results are reported as the percentage of successful cases out of all cases.
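A minimal sketch of the two measures, assuming plain Python lists of rates, is given here for illustration. The function names are hypothetical and the framework's own implementation may differ, for example in how the model output is thresholded.

```python
def mape(predicted, actual):
    """Mean absolute percentage error between predicted and actual rates."""
    total = sum(abs(p - a) / abs(a) for p, a in zip(predicted, actual))
    return 100.0 * total / len(actual)


def trend_code(rate, current):
    """Encode a rate relative to the current rate: +1/2 up, -1/2 down, 0 flat."""
    if rate > current:
        return 0.5
    if rate < current:
        return -0.5
    return 0.0


def direction_hit_rate(predicted, actual, current):
    """Percentage of cases where prediction and outcome fall on the same
    side of the current rate."""
    hits = sum(
        trend_code(p, c) == trend_code(a, c)
        for p, a, c in zip(predicted, actual, current)
    )
    return 100.0 * hits / len(actual)


# Usage with made-up numbers: three of the four directions are guessed correctly.
current = [1.0, 1.0, 1.0, 1.0]
actual = [1.1, 0.9, 1.0, 1.2]
predicted = [1.05, 0.95, 1.0, 0.8]
print(mape(predicted, actual))
print(direction_hit_rate(predicted, actual, current))
```

The second measure discards the size of the move entirely, which is the loss of information that comes with mapping a real-valued rate onto three classes.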

Simulation Software

The software used to carry out the simulations in this experiment is Gene Expression Programming .NET Framework 1.0, which was created for this experiment and can be found at http://sourceforge.net/projects/gepnet/. The current experiment is one of the examples in the source code file available for download there. The indicator data is also available there.

Parameters of the Gene Expression Program

Results

Although not every combination of parameter values was tried, many were, and all led to the same results.

Mean Absolute Percentage Error Fitness Function

In the first test, a function was about as likely to be generated in the head as a variable or a constant. In this case individuals evolve into constants: the constant solution is too attractive, and sooner or later all individuals are drawn to it.
Because of this, a test was made in which a function was much more likely to appear (probability up to 0.9999) than a variable or a constant. Across all the parameter tweaking and testing, the results were not satisfactory: the best errors were in the thousands of percent.
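To illustrate how the function probability shapes an individual, here is a rough sketch of random gene creation in the standard gene expression programming layout (a head that may hold functions or terminals, followed by a terminals-only tail). The symbol sets and names are made up for the example and are not the ones used in the experiment.

```python
import random

# Hypothetical symbol sets: function symbol -> arity, plus a few terminals.
FUNCTIONS = {"+": 2, "-": 2, "*": 2, "/": 2}
TERMINALS = ["x0", "x1", "c"]


def random_gene(head_len, p_function, rng):
    """Build a gene: head symbols are functions with probability p_function,
    tail symbols are always terminals."""
    max_arity = max(FUNCTIONS.values())
    tail_len = head_len * (max_arity - 1) + 1  # standard GEP tail length
    function_symbols = list(FUNCTIONS)
    head = [
        rng.choice(function_symbols) if rng.random() < p_function
        else rng.choice(TERMINALS)
        for _ in range(head_len)
    ]
    tail = [rng.choice(TERMINALS) for _ in range(tail_len)]
    return head + tail


def expressed_length(gene):
    """Count the leading symbols that actually form the expression tree."""
    open_slots = 1  # the root still has to be filled
    used = 0
    for symbol in gene:
        used += 1
        open_slots += FUNCTIONS.get(symbol, 0) - 1
        if open_slots == 0:
            break
    return used


rng = random.Random(0)
for p in (0.0, 0.5, 0.9999):
    gene = random_gene(head_len=8, p_function=p, rng=rng)
    print(p, expressed_length(gene), "of", len(gene), "symbols expressed")
```

When the first head symbol is a terminal, the whole individual reduces to a single variable or constant, no matter how long the gene is. Raising the function probability makes that much rarer in the initial population, although it does not by itself stop evolution from drifting back toward constants.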

Threshold Trend Recognition Fitness Function

On the learning data the best individuals usually produced 75% to 80% correct results. Despite this good performance, the results on the testing data bore no relation to these values whatsoever.
When trying to guess just the direction of the trend, with only 265 learning cases and genotypes of high complexity, the model twists itself to fit those particular cases but cannot recognize the general pattern.

Conclusion

With the threshold fitness function, failure is likely because of the small number of learning cases and the information lost in mapping the real-valued currency rate to a ternary space. If the number of cases is increased to thousands or even tens of thousands, a better result may be reached. But it seems likely to end up in the same situation as the mean absolute percentage error.
With the mean absolute percentage error as the fitness function, no correlation was found between past macroeconomic indicator data and future currency exchange rates. This does not mean that no such relation exists. Perhaps, if distributed computing is used and computational power is increased dramatically, better results can be produced. What suggests this is the large number of variables: 2500 variables create a huge space, and a few thousand initial individuals cannot cast a dense enough web over it.
This experiment touched on a delicate subject concerning search algorithms. Although genetic algorithms have an advantage thanks to crossover, it is still a challenge to keep the population from clustering in local optima and to progress gradually toward the global one.
Thinking about this, it seems that the problem comes from the mating selection algorithm. With roulette wheel selection, all individuals start to look alike as evolution progresses. What if a real-valued space were introduced, with every individual corresponding to a point in it? The distance between individuals in that space would be taken into account during mating, so that individuals closer to each other have a higher probability of mating. These positions would mutate during evolution. This resembles reality, where biological organisms tend to mate with those geographically closer to them. More complex kinds of clustering and movement of individuals could be implemented, but only up to a point, since the algorithm needs to be fast.
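One possible reading of this idea is sketched below. It was not implemented or tested in the experiment; the exponential weighting, the `scale` and `sigma` parameters and all function names are assumptions chosen only to make the idea concrete.

```python
import math
import random


def mate_weights(positions, i, scale=1.0):
    """Weight every candidate mate of individual i by exp(-distance / scale).
    The individual itself gets weight 0 so it is never chosen."""
    weights = []
    for j, pos in enumerate(positions):
        if j == i:
            weights.append(0.0)
        else:
            weights.append(math.exp(-math.dist(positions[i], pos) / scale))
    return weights


def choose_mate(positions, i, rng, scale=1.0):
    """Pick a mate for individual i; nearby individuals are more likely."""
    weights = mate_weights(positions, i, scale)
    return rng.choices(range(len(positions)), weights=weights, k=1)[0]


def mutate_position(position, rng, sigma=0.1):
    """Let an individual drift through the space with Gaussian noise."""
    return tuple(x + rng.gauss(0.0, sigma) for x in position)


# Usage: individual 0 has a close neighbour (1) and a distant one (2).
rng = random.Random(42)
positions = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
picks = [choose_mate(positions, 0, rng) for _ in range(1000)]
print(picks.count(1), picks.count(2))
positions[0] = mutate_position(positions[0], rng)
```

In a full algorithm these distance weights would presumably be combined with fitness, for instance by multiplying each weight by the candidate's roulette wheel share. The per-individual loop costs time proportional to the population size, which matters given the need for the algorithm to stay fast.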
