I’ve received a few questions from readers on the topic of curve fitting, so I thought I’d talk about that a bit here.
What’s curve fitting? Well, it’s bad. When we back-test a trading system, we are trying to see if the trading rules we have written in our program successfully make a profit on historical data. One of the things we do in back testing is optimize parameters: we find the best possible value for a parameter that controls something in our program. But sometimes we find the perfect value for the historical data, and then that value doesn’t work in a real trading situation. Why not? It is possible we have fit our parameter so closely to the exact historical data that it isn’t flexible enough to work on new, unseen data. That’s curve fitting. And it’s bad because it creates a trading system that seems good, but really isn’t.
One way to look at it is to think of the signal and the noise. Our trading system is looking for the signal (our rules for entries and exits based on indicators or patterns) and trying to ignore the noise (spikes, gaps, volatility, general market randomness). If we tune our parameters to listen to everything, both the signal and the noise, then when we trade in a live account, it won’t work. In the new market data, the signal is still there, but the noise is different.
How do we prevent curve fitting? There are a number of things we can do to prevent curve fitting when we are back testing. One of the best techniques is to split up your testing data into two sections: an in-sample set, and an out-of-sample set. You will spend time tuning parameters, optimizing your software, and getting everything perfect on the in-sample data set. You will run your program dozens of times with this data and get it profitable, and working very well.
Then, once you have your perfect parameters, you run your program on the out-of-sample data set, and you see how it does. The key here is that you may run the program one time on the out-of-sample set, and you may not use the out-of-sample results to tweak your program’s parameters. If you do, that data set now becomes part of the in-sample set!
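The split described above can be sketched in a few lines of Python. Everything here is illustrative: `backtest()` and `ma_period` are hypothetical stand-ins for whatever your platform’s tester and parameters actually are, and the 70/30 split is just one common choice.

```python
# Sketch of an in-sample / out-of-sample split. backtest() is a toy
# stand-in that "scores" a moving-average period against a price list;
# replace it with your platform's real tester.

def backtest(bars, ma_period):
    # Toy scoring function, NOT a real trading rule.
    return sum(bars[ma_period:]) - sum(bars[:-ma_period])

def optimize(bars, candidates):
    # Try each candidate parameter and keep the most profitable one.
    return max(candidates, key=lambda p: backtest(bars, p))

prices = [100 + (i % 7) for i in range(300)]     # toy price series

split = int(len(prices) * 0.7)                   # e.g. 70% in-sample
in_sample, out_of_sample = prices[:split], prices[split:]

# Tune as many times as you like here -- but only on in_sample.
best = optimize(in_sample, candidates=range(5, 50, 5))

# Run ONCE on the held-out data; never tune against this result.
final_profit = backtest(out_of_sample, best)
```

The important part is the discipline, not the code: `optimize()` only ever sees `in_sample`, and `out_of_sample` is touched exactly once.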
If the results of your out-of-sample set look good, then congratulations! You may move on to the next phase of testing. But if the results of the out-of-sample test do not look good, you have probably been guilty of curve fitting during your back testing, and you need to go back and find more general parameters. We’ll go over that in a future blog post.
Let’s assume you were successful in your out-of-sample testing. What’s the next step? That would be walk forward optimization. Walk Forward Optimization is neat. The concept is this: if you take January to February of 2011 as your in-sample set, and then use March of 2011 as your out-of-sample set, you’ll find some good parameters to use. Save them off somewhere. Now run the back-test again, but use February to March as the in-sample set, and use April of 2011 as the out-of-sample set. Find your best parameters and save them off. Keep moving the “window” of sample sets forward, and keep saving off your results. At the end of all of the back-test runs, compare all the results you have saved off. If they are similar, you have a great set of parameters for your system. If not, then you have some work to do (more on that in a future post).
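The sliding-window idea above can be written as a simple loop. Again, this is a sketch under stated assumptions: the data is faked as twelve “months” of prices, and `backtest()`/`optimize()` are hypothetical stand-ins, not any platform’s API.

```python
# A minimal walk-forward loop: two months in-sample, the next month
# out-of-sample, sliding forward one month at a time.

def backtest(bars, period):
    return sum(bars) * period % 1000      # toy scoring stand-in

def optimize(bars, candidates):
    return max(candidates, key=lambda p: backtest(bars, p))

# Twelve "months" of toy price data, each a list of 20 bars.
months = [[100 + m + d for d in range(20)] for m in range(12)]

results = []
for start in range(len(months) - 2):              # slide the window
    in_sample = months[start] + months[start + 1]  # two months in-sample
    out_sample = months[start + 2]                 # next month out-of-sample
    best = optimize(in_sample, candidates=range(1, 10))
    # "Save off" the window, the chosen parameter, and its result.
    results.append((start, best, backtest(out_sample, best)))

best_params = [best for _, best, _ in results]
```

At the end, `results` is the list of saved-off runs; if the parameters in `best_params` cluster around similar values, that is the stability the post is describing.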
If this walk-forward test is automated, then this process is somewhat enjoyable. If you have to manually run all of these tests, then it’s a bit tedious, and you have to keep really accurate notes on each run. Unfortunately for us, the walk-forward analysis in Metatrader is a manual process, and it’s tedious. (There is an MT4 add-on that makes it easier, but it’s not perfect.)
The AmiBroker trading platform that I have mentioned in the past does do walk forward analysis, and that is one of its features that makes it attractive to trading system developers.
But, as long as you can keep a spreadsheet of dates, parameters and profitability, you can run the walk-forward analysis manually, and your system will be much better in the long run.
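If you are keeping that spreadsheet by hand, the final comparison step is mechanical enough to sketch. The rows and the threshold below are made-up illustrations; “similar parameters plus positive profit in every window” is just one simple way to judge robustness, not a standard rule.

```python
# A toy "spreadsheet" of walk-forward runs: the window dates, the
# parameter each run picked, and its out-of-sample profit.
runs = [
    {"window": "Jan-Feb 2011", "param": 20, "profit": 340.0},
    {"window": "Feb-Mar 2011", "param": 22, "profit": 295.0},
    {"window": "Mar-Apr 2011", "param": 21, "profit": 310.0},
]

params = [r["param"] for r in runs]
spread = max(params) - min(params)                     # parameter stability
avg_profit = sum(r["profit"] for r in runs) / len(runs)

# A small spread in chosen parameters, with every window profitable,
# suggests a general system rather than a curve-fit one. The threshold
# of 5 is arbitrary -- pick one that makes sense for your parameter.
robust = spread <= 5 and all(r["profit"] > 0 for r in runs)
```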
So, curve fitting is bad; we want our optimized parameters to perform well on historical data, but stay general enough to work on new, unseen data.
I mentioned a book a few weeks ago called “Trading Systems” by Emilio Tomasini; he spends a lot of time discussing curve fitting, how to avoid it, and how to properly test a trading system. If you are working on back testing and you haven’t picked up that book yet, you’d do well to get a copy.