Currency Movement Prediction with Macroeconomic Indicators and Gene Expression Programming .NET Framework
by
Boian V Petkantchin
September 2, 2008
Goal of Experiment
The goal of this experiment is to predict changes in currency rates, based on historical data of macroeconomic
indicators. The choice of currency pair is USD/GBP.
This paper is informal, but it offers some useful insights into the problem.
Used Data
The learning data for the model covers Jan 1, 1982 to Dec 31, 2004.
The testing set covers Jan 1, 2005 to Dec 31, 2006.
The finest time resolution of the data is 1 month; some time series are quarterly. When evaluating the output for a
case, the preceding 3 years of data from all indicators are fed to the model.
Below is a list of all used time series:
United States
- Building Permits sa
- New Private Housing Units Authorized by Building Permit saar
- Real Estate Loans at All Commercial Banks sa
- Capacity Utilization Total Industry sa
- Manufacturing PMI Composite Index sa
- University of Michigan Consumer Sentiment nsa
- Consumer Price Index for All Urban Consumers All Items Less Food & Energy sa
- Consumer Price Index For All Urban Consumers All Items sa
- Producer Price Index All Commodities nsa
- Producer Price Index Crude Energy Materials nsa
- Producer Price Index Intermediate Materials Supplies & Components sa
- Average Hourly Earnings of Production Workers
- Average Weekly Hours of Production Workers
- Civilian Labor Force - Employment Level
- Government Employment
- Private Employment
- Total Population All Ages including Armed Forces Overseas
- Unemployment Level
- Commercial and Industrial Loans at All Commercial Banks sa
- Consumer (Individual) Loans at All Commercial Banks sa
- M1 Money Stock nsa
- M2 Money Stock nsa
- Monthly Effective Federal Funds Rate
- Required Reserves, Not Adjusted for Changes in Reserve Requirements nsa
- Total Borrowings of Depository Institutions from the Federal Reserve nsa
- Total Consumer Credit Outstanding nsa
- Total Investments at All Commercial Banks sa
- Total Loans and Leases at Commercial Banks sa
- U.S. Government Securities at All Commercial Banks sa
- Gross Domestic Product saar
- Business Sector Real Compensation Per Hour sa
- Business Sector Unit Labor Cost sa
- Nonfarm Business Sector Real Compensation Per Hour sa
- Nonfarm Business Sector Unit Labor Cost sa
- Federal Government Debt Total Public Debt nsa
- Balance on Current Account sa
- Balance on Goods and Services sa
- Balance on Investment Income sa
- Balance on Merchandise Trade sa
- Balance on Services sa
- Exports of Goods, Services and Income sa
- Exports of Merchandise Adjusted, Excluding Military sa
- Exports of Services sa
- Foreign Assets in the U.S. Net, Capital Inflow {+} sa
- Foreign Official Assets in the U.S. Net sa
- Imports of Goods, Services, and Income sa
- Imports of Merchandise Adjusted, Excluding Military sa
- Imports of Services sa
- U.S. Assets Abroad, Net Outflow (-) sa
- U.S. Official Reserve Assets Abroad, Net sa
- U.S. Private Assets Abroad, Net sa
- U.S. Reserves of Foreign Currencies sa
United Kingdom
- Claimant Count sa
- Composite Price Index and annual change
- In Employment sa
- Number of Property Transactions in England and Wales sa
- Output of All Production Industries sa
- Output of Electricity, Gas and Water Supply
- Output of Extraction of Oil & Gas
- Output of Manufacturing
- Output of Mining & Quarrying
- Producer Price Index Output of Manufactured Products
- Retail Price Index All Items
- Unemployed sa
- Balance of Payments Investment Income Balance sa
- Balance of Payments Services Total Export sa
- BoP Current Account Balance sa
- BoP Goods & Services Exports sa
- BoP Goods & Services Imports sa
- BoP Services Exports sa
- BoP Services Imports sa
- Changes in Inventories Including Alignment Adjustment sa
- Construction Orders Received
- Consumer Credit Outstanding
- Employee Jobs sa
- GDP by category of income
- Household Final Consumption Expenditure National Concept nsa
- International Investment Position Inward nsa
- International Investment Position Outward nsa
- Money Stock M4 sa
- Preliminary GDP All Production Industries
- Preliminary GDP Chained Volume Measures sa
- Productivity Jobs Whole Economy Index sa
- Public Sector Taxes on Income and Wealth
- Public Sector Total Current Expenditure
- Velocity of Circulation Ratios Money Stock M0
- Internal Purchasing Power of the Pound (based on RPI)
- Population
- Bank of England Interest Rate
Common
- Spot Oil Price West Texas Intermediate
- USDGBP Exchange Rate
sa - seasonally adjusted
nsa - not seasonally adjusted
saar - seasonally adjusted annual rate
Care is taken not to use data that had not yet been released at the time of a case; each series is delayed
accordingly to avoid that.
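The windowing and release delays described above can be sketched as follows. This is only an illustration in Python, not code from the experiment: the function name, the `release_lag` table and the assumption that every series is already aligned to a monthly index are all hypothetical.

```python
from typing import Dict, List


def build_case_inputs(
    series: Dict[str, List[float]],
    release_lag: Dict[str, int],
    case_index: int,
    window: int = 36,
) -> List[float]:
    """Collect the inputs for one case without using unreleased data.

    series      -- monthly observations per indicator, index 0 = first month
    release_lag -- assumed publication delay of each indicator, in months
    case_index  -- index of the month at which the prediction is made
    window      -- months of history per indicator (3 years = 36)
    """
    inputs: List[float] = []
    for name in sorted(series):
        # Newest observation that was already published at the case date.
        last_visible = case_index - release_lag[name]
        first = last_visible - window + 1
        if first < 0:
            raise ValueError(f"not enough history for {name}")
        inputs.extend(series[name][first:last_visible + 1])
    return inputs
```

With this layout an indicator published with a 3 month delay contributes the 36 values ending 3 months before the case date, so no value enters the model before it could have been known.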
Expected output
Several cases were tested: prediction of the USDGBP rate
- 2 months in the future,
- 3 months in the future,
- 6 months in the future and
- 12 months in the future.
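As a small illustration of these horizons, the target values for a case could be looked up as below. The sketch is in Python; `HORIZONS` and `future_rates` are hypothetical names, and the rate series is assumed to be a plain monthly list.

```python
from typing import Dict, List, Optional

# Prediction horizons in months, as listed above.
HORIZONS = (2, 3, 6, 12)


def future_rates(rates: List[float], case_index: int) -> Dict[int, Optional[float]]:
    """Return the actual rate at each horizon after the case month.

    A horizon that runs past the end of the series maps to None.
    """
    return {
        h: (rates[case_index + h] if case_index + h < len(rates) else None)
        for h in HORIZONS
    }
```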
Fitness Function
Two types of fitness function were tested. The first is the mean absolute percentage error between the predicted
exchange rate and the actual rate. The second checks only whether the predicted rate and the actual rate are on
the same side of the horizontal line passing through the current rate: if the genotype guesses the direction of
the trend correctly, the error is 0. The output is coded as 0 if there is no change in the price, -1/2 for a
down trend and +1/2 for an up trend. Results are presented as the percentage of successful cases out of all cases.
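A minimal Python sketch of the two fitness measures follows. It is an illustration under assumptions, not the framework's code: the function names are invented here, and the direction is coded as -1, 0, +1 rather than -1/2, 0, +1/2, which gives the same comparison result.

```python
from typing import Sequence


def mape(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - a) / abs(a) for p, a in zip(predicted, actual))
    return 100.0 * total / len(actual)


def direction(rate: float, current: float) -> int:
    """-1 if rate is below the current rate, 0 if equal, +1 if above."""
    return (rate > current) - (rate < current)


def direction_hit_rate(
    predicted: Sequence[float],
    actual: Sequence[float],
    current: Sequence[float],
) -> float:
    """Percentage of cases where predicted and actual direction agree."""
    hits = sum(
        direction(p, c) == direction(a, c)
        for p, a, c in zip(predicted, actual, current)
    )
    return 100.0 * hits / len(actual)
```

Note that the second measure discards the size of the move: a prediction that is on the correct side of the current rate counts as a success however far it is from the actual rate.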
Simulation Software
The software used to carry out the simulations in this experiment is Gene Expression Programming .NET Framework 1.0, which
was created for this experiment and can be found at
http://sourceforge.net/projects/gepnet/. One of the examples in the source code available for download there
is the current experiment. The indicators' data is also available there.
Parameters of the Gene Expression Program
- Population size: from 100 to 2000 individuals
- Number of genes: from 10 to 200
- Head length of a gene: from 5 to 20
- Number of generations to evolve: usually a few thousand, depending on the size of the population and
genotypes. Evolution stops if there is no improvement for many generations
- Number of constants: between a few hundred and 4000
- Constants lower bound: -10
- Constants upper bound: 10
- Constants set mutation probability: 0.04 and 0.1
- Genes linking function: addition
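To give a feel for the genotype sizes these parameters imply, the Python sketch below uses the standard rule from the gene expression programming literature, tail length = head length x (maximum arity - 1) + 1. Whether the framework used here sizes its genes exactly this way is an assumption, the constants domain is not modelled, and the function names are hypothetical.

```python
from typing import Sequence


def tail_length(head_length: int, max_arity: int) -> int:
    """Tail length that guarantees every gene decodes to a valid expression."""
    return head_length * (max_arity - 1) + 1


def gene_length(head_length: int, max_arity: int) -> int:
    """Head plus tail; any constants domain is left out of this sketch."""
    return head_length + tail_length(head_length, max_arity)


def link_by_addition(gene_outputs: Sequence[float]) -> float:
    """Combine the values of all genes with the addition linking function."""
    return sum(gene_outputs)
```

Under this rule, with binary functions a head of 20 gives a gene of 41 symbols, so 200 such genes would make a genotype of 8200 symbols before constants are counted.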
Genetic Operators' Parameters
- Inversion probability: 0.06 and 0.1
- Partial (intragenic) transposition probability: 0.06 and 0.1
- Mutation probability in head: 0.04 and 0.1
- Mutation probability in tail: 0.04 and 0.1
- Mutation probability in constants: 0.04 and 0.1
- Two point and one point crossovers probability ratio: 0.5
- Elitism of 1 to 2 individuals or no elitism at all
Functions' set
- Addition
- Division
- Multiplication
- Subtraction
- Cos
- Sin
- Invert sign
- Sqrt
- Square
- Threshold
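The function set can be pictured as a table of arities and callables, as in the Python sketch below. Several details are assumptions made only for illustration, since the text does not specify them: how division by zero and the square root of a negative number are handled, and the exact meaning and arity of Threshold.

```python
import math
from typing import Callable, Dict, Tuple


def protected_div(a: float, b: float) -> float:
    # Assumption: return 1.0 instead of failing on division by zero.
    return a / b if b != 0.0 else 1.0


def protected_sqrt(a: float) -> float:
    # Assumption: take the root of the absolute value for negative inputs.
    return math.sqrt(abs(a))


def threshold(a: float, b: float) -> float:
    # Hypothetical definition: 1.0 when a reaches b, otherwise 0.0.
    return 1.0 if a >= b else 0.0


# name -> (arity, implementation)
FUNCTION_SET: Dict[str, Tuple[int, Callable[..., float]]] = {
    "add": (2, lambda a, b: a + b),
    "div": (2, protected_div),
    "mul": (2, lambda a, b: a * b),
    "sub": (2, lambda a, b: a - b),
    "cos": (1, math.cos),
    "sin": (1, math.sin),
    "neg": (1, lambda a: -a),
    "sqrt": (1, protected_sqrt),
    "square": (1, lambda a: a * a),
    "threshold": (2, threshold),
}

# Largest arity in the set (2 under the assumptions above).
MAX_ARITY = max(arity for arity, _ in FUNCTION_SET.values())
```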
Results
Although not all parameter combinations were tried, many were, and all led to the same results.
Mean Absolute Percentage Error Fitness Function
In the first test, the probability of generating a function in the head was close to that of generating a variable or a constant.
In this case individuals evolve into constants: a constant output is just too appealing, and all individuals are drawn to it
sooner or later.
Because of this, a test was made with the probability of a function appearing much higher (up to 0.9999) than that of
a variable or a constant. After all the parameter tweaking and testing, the results, with best errors of
thousands of percent, were not satisfactory.
Threshold Trend Recognition Fitness Function
On the learning data the best individuals usually produced 75% to 80% correct results. Despite this good performance,
the results on the testing data bore no relation to these values whatsoever.
When trying to guess just the trend's direction, with only 265 learning cases and genotypes of high complexity,
the model twists itself to fit those particular cases but cannot recognize the general pattern.
Conclusion
With the threshold fitness function the approach is bound to fail because of the small number of learning cases and the
loss of information when mapping the real-valued currency rate to a ternary space. If the number of cases were increased
to thousands or even tens of thousands, a better result might be reached. But it seems this would end up in the same
situation as the mean absolute percentage error.
With the mean absolute percentage error as a fitness function, no correlation was found between the past macroeconomic
indicators' data and future currency exchange rates. This does not mean that no such relation exists. Perhaps,
if distributed computing were used and computational power increased dramatically, better results could be produced.
What leads to this conclusion is the large number of variables: 2500 variables create a huge search space, and a few
thousand initial individuals cannot cast a dense enough web over it.
This experiment touched on a delicate subject concerning search algorithms. Although genetic algorithms have an advantage
due to crossover, it is still a challenge to keep the population from clustering in local optima and
to progress gradually to the global one.
On reflection, the problem seems to come from the mating selection algorithm. With roulette wheel
selection all individuals start to look alike as evolution progresses. What if a real space were introduced and every
individual corresponded to a point in it? An individual's distance to the others in that space would be considered during
mating, so that individuals closer to each other have a higher probability of mating with each other. These positions would
mutate during evolution. It would be as in nature, where biological organisms mate with those closer to them geographically.
Even more complex types of clustering and movement of individuals could be implemented, but not too complex: the
algorithm needs to be fast.
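The idea can be sketched in Python as below. This is one possible reading of the proposal, not something that was implemented in the experiment: the two-dimensional space, the exponential distance weighting, the `scale` and `step` parameters and all function names are assumptions, and fitness is taken as "higher is better".

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def roulette_pick(weights: Sequence[float], rng: random.Random) -> int:
    """Roulette wheel: pick an index with probability proportional to its weight."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("at least one weight must be positive")
    r = rng.uniform(0.0, total)
    running = 0.0
    last_positive = 0
    for i, w in enumerate(weights):
        if w <= 0.0:
            continue  # zero-weight entries can never be picked
        running += w
        last_positive = i
        if r <= running:
            return i
    return last_positive  # guards against floating-point rounding


def pick_mate(
    first: int,
    positions: List[Point],
    fitness: Sequence[float],
    rng: random.Random,
    scale: float = 1.0,
) -> int:
    """Pick a mate for `first`, favouring fit individuals that are close by."""
    weights = []
    for i, pos in enumerate(positions):
        if i == first:
            weights.append(0.0)  # an individual does not mate with itself
        else:
            d = math.dist(positions[first], pos)
            weights.append(fitness[i] * math.exp(-d / scale))
    return roulette_pick(weights, rng)


def mutate_position(pos: Point, rng: random.Random, step: float = 0.1) -> Point:
    """Let an individual drift a little in the space between generations."""
    return (pos[0] + rng.gauss(0.0, step), pos[1] + rng.gauss(0.0, step))
```

The cost of choosing one mate is linear in the population size, which keeps the extra work modest, in line with the requirement that the algorithm stay fast.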