Currency Movement Prediction with Macroeconomic Indicators and Gene Expression Programming .NET Framework

by

Boian V Petkantchin

September 2, 2008

Goal of Experiment

The goal of this experiment is to predict changes in currency rates from historical data on macroeconomic indicators. The chosen currency pair is USD/GBP.
This paper is informal, but it offers some useful insights into the problem.

Used Data

The learning (training) data for the model covers Jan 1, 1982 until Dec 31, 2004. The testing set covers Jan 1, 2005 until Dec 31, 2006.
The finest sampling interval is 1 month; some time series are quarterly. When the output for a case is evaluated, the preceding 3 years of data for all indicators are fed to the model.
The time series used are listed below:

United States

Building Permits sa
New Private Housing Units Authorized by Building Permit saar
Real Estate Loans at All Commercial Banks sa
Capacity Utilization Total Industry sa
Manufacturing PMI Composite Index sa
University of Michigan Consumer Sentiment nsa
Consumer Price Index for All Urban Consumers All Items Less Food & Energy sa
Consumer Price Index For All Urban Consumers All Items sa
Producer Price Index All Commodities nsa
Producer Price Index Crude Energy Materials nsa
Producer Price Index Intermediate Materials Supplies & Components sa
Average Hourly Earnings of Production Workers
Average Weekly Hours of Production Workers
Civilian Labor Force - Employment Level
Government Employment
Private Employment
Total Population All Ages including Armed Forces Overseas
Unemployment Level
Commercial and Industrial Loans at All Commercial Banks sa
Consumer (Individual) Loans at All Commercial Banks sa
M1 Money Stock nsa
M2 Money Stock nsa
Monthly Effective Federal Funds Rate
Required Reserves, Not Adjusted for Changes in Reserve Requirements nsa
Total Borrowings of Depository Institutions from the Federal Reserve nsa
Total Consumer Credit Outstanding nsa
Total Investments at All Commercial Banks sa
Total Loans and Leases at Commercial Banks sa
U.S. Government Securities at All Commercial Banks sa
Gross Domestic Product saar
Business Sector Real Compensation Per Hour sa
Business Sector Unit Labor Cost sa
Nonfarm Business Sector Real Compensation Per Hour sa
Nonfarm Business Sector Unit Labor Cost sa
Federal Government Debt Total Public Debt nsa
Balance on Current Account sa
Balance on Goods and Services sa
Balance on Investment Income sa
Balance on Merchandise Trade sa
Balance on Services sa
Exports of Goods, Services and Income sa
Exports of Merchandise Adjusted, Excluding Military sa
Exports of Services sa
Foreign Assets in the U.S. Net, Capital Inflow (+) sa
Foreign Official Assets in the U.S. Net sa
Imports of Goods, Services, and Income sa
Imports of Merchandise Adjusted, Excluding Military sa
Imports of Services sa
U.S. Assets Abroad, Net Outflow (-) sa
U.S. Official Reserve Assets Abroad, Net sa
U.S. Private Assets Abroad, Net sa
U.S. Reserves of Foreign Currencies sa

United Kingdom

Claimant Count sa
Composite Price Index and annual change
In Employment sa
Number of Property Transactions in England and Wales sa
Output of All Production Industries sa
Output of Electricity, Gas and Water Supply
Output of Extraction of Oil & Gas
Output of Manufacturing
Output of Mining & Quarrying
Producer Price Index Output of Manufactured Products
Retail Price Index All Items
Unemployed sa
Balance of Payments Investment Income Balance sa
Balance of Payments Services Total Export sa
BoP Current Account Balance sa
BoP Goods & Services Exports sa
BoP Goods & Services Imports sa
BoP Services Exports sa
BoP Services Imports sa
Changes in Inventories Including Alignment Adjustment sa
Construction Orders Received
Consumer Credit Outstanding
Employee Jobs sa
GDP by category of income
Household Final Consumption Expenditure National Concept nsa
International Investment Position Inward nsa
International Investment Position Outward nsa
Money Stock M4 sa
Preliminary GDP All Production Industries
Preliminary GDP Chained Volume Measures sa
Productivity Jobs Whole Economy Index sa
Public Sector Taxes on Income and Wealth
Public Sector Total Current Expenditure
Velocity of Circulation Ratios Money Stock M0
Internal Purchasing Power of the Pound (based on RPI)
Population
Bank of England Interest Rate

Common

Spot Oil Price West Texas Intermediate
USDGBP Exchange Rate

sa - seasonally adjusted
nsa - not seasonally adjusted
saar - seasonally adjusted annual rate

Care is taken not to use data that had not yet been released at the time of a case; appropriate publication delays are applied to the series to prevent this.
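
The windowing and delay rules described above can be illustrated with a minimal Python sketch. It assumes monthly series stored as plain lists indexed by month, a 36-month look-back, and a per-series release lag measured in months. The helper name build_case, the data layout and the lag values are illustrative assumptions and are not taken from GEP .NET Framework. Quarterly series are not handled here.

```python
def build_case(series, release_lag, case_month, window=36):
    """Collect the model inputs for one case (hypothetical helper).

    series      -- dict: indicator name -> list of monthly values
    release_lag -- dict: indicator name -> months between a reference
                   month and the publication of its value (assumed)
    case_month  -- index of the month at which the prediction is made
    window      -- number of months of history fed to the model
    """
    inputs = []
    for name in sorted(series):
        # Newest value that has already been published at case_month.
        last = case_month - release_lag[name]
        first = last - window + 1
        if first < 0:
            raise ValueError("not enough history for " + name)
        inputs.extend(series[name][first:last + 1])
    return inputs


# Two made-up indicators whose value equals the month index,
# so it is easy to see which months end up in the case.
demo_series = {"cpi": list(range(100)), "gdp": list(range(100))}
demo_lag = {"cpi": 1, "gdp": 3}
demo_case = build_case(demo_series, demo_lag, case_month=40)
```

With these made-up lags, the case at month 40 sees months 4 to 39 of the first series and months 2 to 37 of the second, so no value from month 40 or later leaks into the inputs.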

Expected output

Various cases were tested: prediction of the USDGBP rate
2 months in the future,
3 months in the future,
6 months in the future and
12 months in the future.

Fitness Function

Two types of fitness function were tested. The first is the mean absolute percentage error between the predicted exchange rate and the actual rate. The second checks only whether the predicted rate and the actual rate lie on the same side of the horizontal line passing through the current rate; that is, if the genotype guesses the direction of the trend, the error is 0. The expected output is 0 if the price does not change, -1/2 for a downtrend and +1/2 for an uptrend. Results are presented as the percentage of successful cases among all cases.
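
The two measures can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are made up, the rates in the example are invented numbers, and the actual implementation in GEP .NET Framework may differ (for instance in how the model output is thresholded).

```python
def mape(predicted, actual):
    """Mean absolute percentage error, in percent."""
    errors = [abs((p - a) / a) for p, a in zip(predicted, actual)]
    return 100.0 * sum(errors) / len(errors)


def trend(rate, current):
    """Map a rate to -1/2, 0 or +1/2 relative to the current rate."""
    if rate > current:
        return 0.5
    if rate < current:
        return -0.5
    return 0.0


def trend_hit_rate(predicted, actual, current):
    """Percentage of cases where predicted and actual trends agree."""
    hits = sum(
        trend(p, c) == trend(a, c)
        for p, a, c in zip(predicted, actual, current)
    )
    return 100.0 * hits / len(actual)


# Invented rates, for illustration only.
demo_mape = mape([110.0, 90.0], [100.0, 100.0])
demo_hits = trend_hit_rate(
    predicted=[1.6, 1.4, 1.5],
    actual=[1.7, 1.6, 1.5],
    current=[1.5, 1.5, 1.5],
)
```

In the example, the first and third cases agree on the trend (up and unchanged) while the second does not, so two of three cases count as successful.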

Simulation Software

The software used to run the simulations in this experiment is Gene Expression Programming .NET Framework 1.0, which was created for this experiment and can be found at http://sourceforge.net/projects/gepnet/. The current experiment is one of the examples in the source code file available for download there. The indicators' data is also available there.

Parameters of the Gene Expression Program

Population size: from 100 to 2000 individuals
Number of genes: from 10 to 200
Head length of a gene: from 5 to 20
Number of generations to evolve: usually a few thousand, depending on the size of the population and of the genotypes. Evolution stops if there is no improvement for many generations
Number of constants: between a few hundred and 4000
Constants lower bound: -10
Constants upper bound: 10
Constants set mutation probability: 0.04 and 0.1
Genes linking function: addition
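
The stopping rule mentioned in the list above (stop when there has been no improvement for many generations) can be sketched as follows. The text does not say how many generations count as "many", so the patience parameter, the function name evolve and the error values in the example are assumptions made for illustration.

```python
def evolve(run_generation, max_generations, patience):
    """Run generations until the budget is spent or progress stalls.

    run_generation(g) is assumed to return the best error found in
    generation g. patience is the number of consecutive generations
    without improvement that triggers the stop (an assumed parameter).
    """
    best_error = float("inf")
    stale = 0
    generations_run = 0
    for g in range(max_generations):
        error = run_generation(g)
        generations_run += 1
        if error < best_error:
            best_error, stale = error, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_error, generations_run


# Invented error sequence standing in for a real evolution run.
demo_errors = [5.0, 4.0, 4.0, 4.0, 4.0, 3.0]
demo_result = evolve(lambda g: demo_errors[g],
                     max_generations=len(demo_errors), patience=3)
```

Here the run stops after the fifth generation, before the later improvement is seen, which shows the usual trade-off of such a rule: a small patience saves time but can cut off a slow search.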

Genetic Operators' Parameters

Inversion probability: 0.06 and 0.1
Partial (intragenic) transposition probability: 0.06 and 0.1
Mutation probability in head: 0.04 and 0.1
Mutation probability in tail: 0.04 and 0.1
Mutation probability in constants: 0.04 and 0.1
Two-point to one-point crossover probability ratio: 0.5
Elitism of 1 to 2 individuals or no elitism at all
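
As a rough illustration of how head and tail mutation and inversion act on a gene, here is a minimal Python sketch. It relies on the standard gene expression programming rule that the tail length is t = h(n - 1) + 1, where h is the head length and n the largest function arity. The tiny symbol sets, the function names and the choice of maximum arity 2 are assumptions for the example and do not reproduce the GEP .NET Framework code.

```python
import random

FUNCTIONS = ["+", "-", "*", "/"]     # toy set, all of arity 2
TERMINALS = ["x0", "x1", "c0"]       # toy variables and one constant
MAX_ARITY = 2                        # assumption for this sketch


def tail_length(head_length, max_arity=MAX_ARITY):
    """Standard GEP rule: t = h * (n - 1) + 1."""
    return head_length * (max_arity - 1) + 1


def random_gene(head_length, rng):
    """Head may hold any symbol; tail holds terminals only."""
    head = [rng.choice(FUNCTIONS + TERMINALS) for _ in range(head_length)]
    tail = [rng.choice(TERMINALS) for _ in range(tail_length(head_length))]
    return head + tail


def mutate(gene, head_length, p_head, p_tail, rng):
    """Point mutation with separate head and tail probabilities."""
    out = list(gene)
    for i in range(len(out)):
        if i < head_length:
            if rng.random() < p_head:
                out[i] = rng.choice(FUNCTIONS + TERMINALS)
        elif rng.random() < p_tail:
            out[i] = rng.choice(TERMINALS)  # keeps the tail valid
    return out


def invert_head(gene, head_length, rng):
    """Reverse a random segment inside the head (needs head_length >= 2)."""
    i, j = sorted(rng.sample(range(head_length), 2))
    return gene[:i] + gene[i:j + 1][::-1] + gene[j + 1:]


demo_rng = random.Random(0)
demo_gene = random_gene(10, demo_rng)
demo_mutated = mutate(demo_gene, 10, p_head=0.1, p_tail=0.1, rng=demo_rng)
demo_inverted = invert_head(demo_gene, 10, demo_rng)
```

Because the tail only ever receives terminals and inversion stays inside the head, every gene produced this way still decodes to a complete expression.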

Functions' set

Addition
Division
Multiplication
Subtraction
Cos
Sin
Invert sign
Sqrt
Square
Threshold
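
To show how a gene built from such a function set is turned into a value, the following Python sketch decodes a gene breadth-first (Karva notation, the usual GEP encoding) and links several genes by addition. The symbol names, the protected division and square root, and the example gene are assumptions. Threshold is left out because the text does not define it.

```python
import math

# Arity of each function symbol used in this sketch.
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2,
         "cos": 1, "sin": 1, "neg": 1, "sqrt": 1, "sq": 1}


def apply_function(symbol, args):
    """Apply one function symbol; the protected cases are assumptions."""
    if symbol == "+":
        return args[0] + args[1]
    if symbol == "-":
        return args[0] - args[1]
    if symbol == "*":
        return args[0] * args[1]
    if symbol == "/":
        return args[0] / args[1] if args[1] != 0 else 1.0  # protected
    if symbol == "cos":
        return math.cos(args[0])
    if symbol == "sin":
        return math.sin(args[0])
    if symbol == "neg":
        return -args[0]
    if symbol == "sqrt":
        return math.sqrt(abs(args[0]))  # protected
    return args[0] * args[0]  # "sq"


def evaluate_gene(gene, values):
    """Decode a gene breadth-first and evaluate the expression."""
    children = [[] for _ in gene]
    next_free = 1  # index of the next symbol not yet used as a child
    for i, symbol in enumerate(gene):
        if i >= next_free:
            break  # the remaining symbols are not expressed
        for _ in range(ARITY.get(symbol, 0)):
            children[i].append(next_free)
            next_free += 1

    def value_of(i):
        symbol = gene[i]
        if symbol not in ARITY:
            return values[symbol]  # terminal: variable or constant
        return apply_function(symbol, [value_of(c) for c in children[i]])

    return value_of(0)


def evaluate_chromosome(genes, values):
    """Link the genes by addition."""
    return sum(evaluate_gene(g, values) for g in genes)


demo_values = {"a": 1.0, "b": 2.0, "c": 5.0, "d": 2.0}
gene_1 = ["sqrt", "*", "+", "-", "a", "b", "c", "d"]  # sqrt((a+b)*(c-d))
gene_2 = ["neg", "a", "b"]                            # -a, "b" unused
demo_output = evaluate_chromosome([gene_1, gene_2], demo_values)
```

The second gene shows a typical property of this encoding: symbols after the end of the expression stay in the gene but do not affect the output until a mutation brings them into use.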

Results

Although not every combination of parameters was tried, many of them were, and all led to the same results.

Mean Absolute Percentage Error Fitness Function

In the first test, a function was about as likely to be generated in the head as a variable or a constant, but in this setting individuals evolve into constants. A constant output is simply too attractive, and sooner or later all individuals are drawn to it.
Because of this, a further test was made in which a function was much more likely to appear (probability up to 0.9999) than a variable or a constant. After all the parameter tweaking and testing, the results were not satisfactory: the best errors were in the thousands of percent.

Threshold Trend Recognition Fitness Function

On the learning data the best individuals usually produced 75% to 80% correct results. Despite this good performance on the learning data, the results on the testing data bore no relation to these values whatsoever.
When trying to guess just the direction of the trend, with only 265 learning cases and genotypes of high complexity, the model twists itself to catch those particular cases but cannot recognize the general pattern.

Conclusion

With the threshold fitness function, things are bound to fail because of the small number of learning cases and the information lost when the real-valued currency rate is mapped to a ternary space. If the number of cases were increased to thousands or even tens of thousands, a better result might be reached, but it seems likely to end up in the same situation as the mean absolute percentage error.
With the mean absolute percentage error as a fitness function, no correlation was found between past macroeconomic indicator data and future currency exchange rates. This does not mean that no such relation exists. Perhaps, if distributed computing were used and computational power increased dramatically, better results could be produced. What leads to this conclusion is the large number of variables: 2500 variables create a huge search space, and a few thousand initial individuals cannot cast a dense enough web over it.
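
A back-of-the-envelope count gives a feeling for the size of this space. The Python sketch below counts the distinct symbol strings a single gene can take, assuming that every head position can hold any of the 10 functions, 2500 variables or 4000 constants, that every tail position can hold any variable or constant, and that the largest function arity is 2. These encoding details are assumptions; GEP .NET Framework may represent constants differently.

```python
import math

FUNCTIONS = 10      # size of the function set listed in the text
VARIABLES = 2500    # number of input variables stated in the text
CONSTANTS = 4000    # upper end of the constants range in the text
MAX_ARITY = 2       # assumption


def log10_gene_count(head_length):
    """log10 of the number of distinct symbol strings for one gene."""
    tail_length = head_length * (MAX_ARITY - 1) + 1
    terminals = VARIABLES + CONSTANTS
    head_symbols = FUNCTIONS + terminals
    return (head_length * math.log10(head_symbols)
            + tail_length * math.log10(terminals))


smallest = log10_gene_count(5)    # shortest head used in the experiment
largest = log10_gene_count(20)    # longest head used in the experiment
```

Under these assumptions even the shortest head gives more than 10^41 possible strings per gene, while the largest initial population tested holds at most 2000 x 200 = 400,000 genes.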
This experiment touched on a delicate subject concerning search algorithms. Although genetic algorithms have an advantage thanks to crossover, it is still a challenge to keep the population from clustering in local optima and to progress gradually toward the global one.
On reflection, the problem seems to come from the mating selection algorithm. With roulette wheel selection all individuals start to look alike as evolution progresses. What if a real-valued space were introduced and every individual corresponded to a point in it? The distance between individuals in that space would be taken into account when mating, so that individuals closer to each other have a higher probability of mating. These positions would mutate during evolution. This would resemble nature, where biological organisms mate with those that are geographically closer. Even more complex kinds of clustering and movement of individuals could be implemented, but only up to a point, because the algorithm needs to stay fast.
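
One possible reading of this idea is sketched below in Python. It keeps the roulette wheel but multiplies each candidate's fitness by a factor that decays with its distance from the individual looking for a mate. The exponential decay, the scale parameter, the Gaussian position drift and all names are assumptions chosen for illustration; the idea has not been implemented or tested in this experiment.

```python
import math
import random


def spatial_weights(index, fitness, positions, scale=1.0):
    """Roulette weights damped by distance to individual `index`."""
    weights = []
    for j, (f, pos) in enumerate(zip(fitness, positions)):
        if j == index:
            weights.append(0.0)  # an individual does not mate with itself
            continue
        distance = math.dist(positions[index], pos)
        weights.append(f * math.exp(-distance / scale))
    return weights


def pick_mate(index, fitness, positions, rng, scale=1.0):
    """Draw a mate for `index` from the distance-weighted wheel."""
    weights = spatial_weights(index, fitness, positions, scale)
    return rng.choices(range(len(fitness)), weights=weights, k=1)[0]


def drift(position, rng, sigma=0.1):
    """Mutate a position with small Gaussian noise."""
    return tuple(x + rng.gauss(0.0, sigma) for x in position)


# Three individuals of equal fitness: one near individual 0, one far away.
demo_positions = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0)]
demo_fitness = [1.0, 1.0, 1.0]
demo_rng = random.Random(1)
demo_counts = [0, 0, 0]
for _ in range(2000):
    demo_counts[pick_mate(0, demo_fitness, demo_positions, demo_rng)] += 1
demo_moved = drift(demo_positions[0], demo_rng)
```

With equal fitness the nearby individual is chosen far more often than the distant one, so separate regions of the space can keep distinct genotypes for longer. The cost per selection grows with the population size, which matters given the need for speed noted above.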