Trader Tech Talk 022: Dave Walton’s 4 Tips for Back Testing

For this first podcast of 2014, I would like to introduce you to Dave Walton. I found out about Dave by reading one of my favorite trading newsletters, the one by Van Tharp. Mr. Tharp is a really interesting guy. He has many books and courses out, and he really helps traders get through all of their issues and become the amazing traders that they were meant to be.
Dave is one of the guys who participated in Van Tharp’s Super Trader program for the last two years to focus on trading psychology and his trading business development. In this episode, Dave is going to tell us a bit about the Super Trader program, how it changed his trading, and how it actually changed his life. It’s a very inspiring story.
Dave is co-founder and partner of StatisTrade, a trading system evaluation consultancy for money managers and funds. The mission of StatisTrade is to provide clients with unique and critical insights into their trading systems to improve their performance and meet their specific goals using proprietary, statistically-sound tools and processes.
Anyway, back to the newsletter.  I was reading the November issue of Van Tharp’s newsletter, and one of the articles was by Dave.  The article was on back testing and bias, and it was one of the most interesting articles I’d ever read, and I think you are really going to enjoy our conversation about back testing your strategies.  Dave gives us some outstanding tips on how to avoid data mining bias and do better with back testing.
Here are some links to the newsletter and to Van Tharp’s web site:




  1. Bill says

    Hello John,

    A few comments. I know you are trying hard but I cannot resist:

    1. QIM performance for 2014 is negative when S&P 500 finished up 30%. That should tell you something about the methods used.
    2. Specifically about the interview: a) Dave never mentioned the concept of independent and identically distributed samples (i.i.d.) that are required for Monte Carlo to make sense and the important issue of auto-correlation of trade returns. b) Highly curve-fitted systems show very good robustness behavior and this method, promoted by the Builder developer to solve the data-mining bias that is inherent in his program, is not useful. Usually, the best performers in Builder are those that are highly fitted to the data by virtue of a curve-fitting genetic algorithm. I am surprised that Dave did not like the curve-fitting term. If a backtester tests billions of combinations of rules some will clear all cross-validation and robustness analysis tests, possibly many of them due to luck. c) Aronson’s methods, because of some of the reasons I mentioned above, are highly outdated and even useless. His book could be digested in a 10 page paper and most of it deals with logic rules.
    3. The issue of future performance is much simpler than Dave purported it to be. If future data are drawn from the same distribution as the data used during development then future performance will closely match the backtest. If they are not, then performance will not match. Because markets constantly change it is highly unlikely that the future will be the same as the past and due to this whether Aronson style or QIM style or Builder style the systems will not work because they are actually fitted, something that Dave dismissed in advance maybe to promote his agenda of analysis. Curve-fitting is the issue and Dave dismissed it. For example the Builder program all it does is curve-fitting and so is Aronson’s NN program. No matter what you do to minimize data-mining bias if the future is not the same as the past the system will fail. QIM should by now have learned this the hard way.
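
    The claim in point 2 b), that some rules clear validation purely by luck when enough of them are tested, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the returns are synthetic noise, each "rule" is a random long/short position series with no edge by construction, and `THRESHOLD` is an arbitrary pass mark. It is not how Builder or any other product evaluates rules.

```python
import random

rng = random.Random(7)

N_DAYS = 400        # hypothetical history length
N_RULES = 1000      # hypothetical number of rules tested
THRESHOLD = 0.10    # arbitrary pass mark on summed returns

# Synthetic zero-drift daily returns: there is nothing real to discover.
returns = [rng.gauss(0.0, 0.01) for _ in range(N_DAYS)]
split = N_DAYS // 2  # first half = in-sample, second half = out-of-sample


def rule_pnl(positions, rets):
    """Summed return of holding +1 (long) or -1 (short) each day."""
    return sum(p * r for p, r in zip(positions, rets))


passed_in = 0    # rules that look good in-sample
passed_both = 0  # rules that also look good out-of-sample

for _ in range(N_RULES):
    # A rule with no edge: positions are independent of the returns.
    positions = [rng.choice((-1, 1)) for _ in range(N_DAYS)]
    in_pnl = rule_pnl(positions[:split], returns[:split])
    out_pnl = rule_pnl(positions[split:], returns[split:])
    if in_pnl > THRESHOLD:
        passed_in += 1
        if out_pnl > THRESHOLD:
            passed_both += 1

print(f"rules tested:            {N_RULES}")
print(f"passed in-sample:        {passed_in}")
print(f"passed in- and out-of-sample: {passed_both}")
```

    Because every rule is random, any rule that passes both halves did so by chance alone. The more rules are tested, the more such survivors appear.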

    Thanks and good luck.

    • says

      Hi Bill, thanks for your comment. I thought I’d give Dave Walton a chance to reply, so here is Dave’s reply.
      Hi Bill, thanks for your feedback. I find I always have something to learn from other traders so thanks for another opportunity to learn. I did want to respond to some of the points you raised and offer some clarifications and/or an alternate viewpoint.

      Yes, QIM performance for 2013 is negative (I assume that is what you meant instead of 2014, but if you do know 2014 performance, please let us know so we can capitalize on it :) ). Although true, I don’t find that comparison very useful. One of the issues I have with mainstream financial punditry and sales techniques is that periods of performance comparison are cherry picked to prove the point someone wants to make. Unfortunately this plays on the human tendency to fall prey to recency bias and jump on the best performing investment in the most recent past, usually right before the tide turns.

      It is much more instructive to make a comparison over the entire history of the QIM fund with respect to the S&P500 TR. That paints a much different picture. Since 2003, QIM has returned 10.59% compounded annual return after fees compared to 8.24% for the S&P500 TR. Further, the worst drawdown for QIM was -11.74% vs. -50.95% for the S&P500 TR. Any type of trading system has periods of underperformance relative to a benchmark when achieving absolute returns. Lately many hedge funds have been underperforming the S&P500, and I’ve even seen articles lately say hedge fund investing is dead, long live buy-and-hold. Of course articles such as these were not seen in 2008/2009.

      The interview did not actually address the areas of IID, Monte Carlo Permutation (MCP), and autocorrelation so allow me to address these now. First though I want to clarify the feature in Builder that was mentioned in the interview is not MCP. The Builder feature that allows injection of random market noise, random variation of parameter values, and randomized start date is actually much more novel than MCP but let me address MCP first.

      There are four major problems with MCP: 1) The assumption that the original back-test results represent the mean of the life-time distribution of trade results, 2) open equity drawdown is not modeled, 3) market data is assumed to be normally distributed and thus real-world characteristics such as autocorrelation and changing inter-symbol correlations are not modeled, and 4) the real world portfolio combined with position sizing are not modeled. And as you mention, IID is required to use MCP. For these reasons, I think applying MCP to back tested results in an attempt to determine confidence bands is a dangerous practice.
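
      A minimal sketch of the IID point, assuming MCP here means reshuffling the order of back-tested trades to build a distribution of outcomes such as drawdown. The trade list is synthetic and deliberately streaky, and the helper names `max_drawdown` and `lag1_autocorr` are hypothetical. The sketch only shows that shuffling keeps the total profit but discards the serial dependence in the original sequence.

```python
import random
import statistics


def max_drawdown(trades):
    """Largest peak-to-trough drop of the cumulative (additive) equity curve."""
    equity = peak = worst = 0.0
    for t in trades:
        equity += t
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst


def lag1_autocorr(xs):
    """Lag-1 autocorrelation: how strongly each trade resembles the previous one."""
    m = statistics.fmean(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((x - m) ** 2 for x in xs)
    return num / den


# Synthetic streaky trade results: runs of 10 wins, then 10 losses (assumption).
trades = ([1.0] * 10 + [-1.0] * 10) * 5

rng = random.Random(42)
shuffled_drawdowns = []
shuffled_autocorrs = []
for _ in range(1000):
    sample = trades[:]
    rng.shuffle(sample)  # permutation treats trades as exchangeable (IID-like)
    shuffled_drawdowns.append(max_drawdown(sample))
    shuffled_autocorrs.append(lag1_autocorr(sample))

print(f"original lag-1 autocorrelation: {lag1_autocorr(trades):.2f}")
print(f"mean shuffled autocorrelation:  {statistics.fmean(shuffled_autocorrs):.2f}")
print(f"original max drawdown:          {max_drawdown(trades):.1f}")
print(f"median shuffled max drawdown:   {statistics.median(shuffled_drawdowns):.1f}")
```

      Whatever confidence band is read off `shuffled_drawdowns`, it describes sequences with essentially no autocorrelation. That is a different process from the streaky one that produced the original trades.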

      What Builder does is different. Let me explain using the example of randomized parameter variation. When simulated, each unique combination of parameter values generates different entry and exit signals based on the specific ways the market data and input parameter based rules interact. Some combinations capitalize on luck through this interaction whereas others are penalized. Over a large number of combinations, the average level of performance approaches the expected performance of the generalized system concept. The average level of performance across combinations is much lower than what was found through the parameter search algorithm.
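
      The gap between the best parameter combination and the average across combinations can be sketched as follows. This is not Builder's algorithm. It is a toy moving-average crossover run over a small grid of `fast` and `slow` lengths on synthetic zero-drift returns, and the function names and parameter ranges are assumptions chosen for illustration.

```python
import random
import statistics


def make_returns(n, seed):
    """Zero-drift Gaussian daily returns: a market with no edge to find."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 0.01) for _ in range(n)]


def ma_cross_pnl(returns, fast, slow):
    """Summed return of a long/flat moving-average crossover rule."""
    # prices[t] is the price before returns[t] is realized.
    prices = [1.0]
    for r in returns:
        prices.append(prices[-1] * (1.0 + r))
    pnl = 0.0
    for t in range(slow, len(returns)):
        # The signal uses prices up to index t only, so there is no look-ahead.
        fast_ma = sum(prices[t - fast + 1 : t + 1]) / fast
        slow_ma = sum(prices[t - slow + 1 : t + 1]) / slow
        if fast_ma > slow_ma:
            pnl += returns[t]  # long for the next bar, otherwise flat
    return pnl


returns = make_returns(500, seed=1)

# Evaluate every combination on the same data, as a parameter search would.
results = {
    (fast, slow): ma_cross_pnl(returns, fast, slow)
    for fast in range(2, 12, 2)
    for slow in range(20, 60, 5)
}

best_params = max(results, key=results.get)
best_pnl = results[best_params]
avg_pnl = statistics.fmean(results.values())

print(f"combinations tested: {len(results)}")
print(f"best combination:    fast={best_params[0]}, slow={best_params[1]}")
print(f"best summed return:  {best_pnl:+.4f}")
print(f"average over grid:   {avg_pnl:+.4f}")
```

      The data contain no edge, so the best combination's result is luck picked out by the search. The average over the grid is the more honest estimate of what the general crossover idea delivers on this data.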

      No, I do not like the term curve-fitting because that term has to do with finding a best fit equation for a set of points. My criticism is just semantics. I call the problem over-fitting rather than curve-fitting. Regardless of what it is called, it is a large problem when back-testing trading systems.

      David Aronson brought largely academic DM bias compensation methods to the mainstream. I’m sure the latest and greatest academic research has improved since the publication of his book in 2007. If you are up to date on the latest research, perhaps you wouldn’t mind sharing. I personally would be very interested. From what I’ve read, the latest trading system development literature remains far behind what Aronson presented.

      Also, my only agenda in this interview was to shed some light on the problem of data mining bias and offer some solutions. In my discussions with traders, I’ve found that many are largely unaware of DM bias and the impact it has. What I do find is that people are largely frustrated by live trading performance that does not live up to estimates made through back-testing.

      Finally, your point on the future not being the same as the past is of course true. But let’s not throw the baby out with the bathwater. If you only test your system on a bull market and live trading occurs in a bear market, it should be expected that the system will not “work.” For this reason it is important to test the system over as much market data as possible such that the system has been exposed to varying market conditions. Classifying market conditions is another large topic and I don’t have the space here to discuss it. However, if the system has been back tested on a broad range of market conditions, even under similar conditions in the future, performance will be lower than expected due to DM bias. DM bias has nothing to do with how a system might perform under different market conditions.


      Dave Walton

  2. Janie Guill says

    Thank you for bringing Dave to your show!

    I am a Super Trader and recently sat through a one day class by Dave on back testing and data mining bias. He is a great teacher–able to explain complex ideas in simple terms. I like his strategies and his auspicious language. This interview was a great review for me.

    Thank you, again!

  3. Bill says

    Hello again,

    Thanks for the reply. I am surprised by the long defense of QIM, a fund that claims advanced quantitative analysis and methods and could not even finish in positive territory in 2013, a year when the market returned close to 30%. It appears that their performance was worse than random, a term used by Aronson’s co-author, Masters. As a matter of fact the last 4 years QIM has under-performed the SP500 TR by a substantial factor. Here are the numbers:

    In my opinion it is just ludicrous to claim that some method has merit when it cannot generate alpha in a market that has been going straight up. I cannot think of any defense.

    Now, as far as the Builder randomization, the claim that: “…Over a large number of combinations, the average level of performance approaches the expected performance of the generalized system concept. ” sounds peculiar to me because nobody trades the average of many systems but one system. Someone may trade the system that fails and someone else the one that wins although on the average the method is sound. Changing markets make the randomization useless for rejecting the null hypothesis.

    Now, I also disagree with this statement: “However, if the system has been back tested on a broad range of market conditions, even under similar conditions in the future, performance will be lower than expected due to DM bias. DM bias has nothing to do with how a system might perform under different market conditions.”

    Of course data-mining bias has to do with how a system will perform in the future. If the process that develops a system is plagued by data-mining bias, then the system will fail in the future. Therefore, data-mining bias implies bad future performance. The opposite is not true, i.e. bad performance does not imply data-mining bias, i.e. it may just be a bad unbiased hypothesis.

    Also I disagree with your definition of data-mining. Data-mining and randomness are loosely related. Data-mining bias is not the result of randomness, but the result of curve-fitting spurious correlations after repeated use of the same data. For example, Builder is a curve-fitting tool and develops systems by repeatedly back-testing billions of combinations of rules over the same data.

    Finally Aronson did not introduce data-mining to traders. This was done many years ago by several researchers. Only uninformed traders were educated by a popular book. I will mention the papers by Hsu and Kuan in 2005, the master thesis by Griffioen in 2003, the paper by Sullivan and White in 1997 amongst many others. If some traders waited for Aronson to write a book it is certainly their problem.

    “From what I’ve read, the latest trading system development literature remains far behind what Aronson presented.”

    Yes, because the latest academic literature considers the case closed. Anyone looking for gold in backtesting without additional and unpublished restrictions (unpublished edges) pays a high price. There is nothing to find with backtesting. Clusters of supercomputers have already found and exploited everything that could be found in any available time series. Backtesting is a thing of the past.

    Good luck.

      • Bill says

        Hi John,

        I appreciate the fact that you listen to different opinions. I would just like to add to the point Dave made about Aronson introducing data-mining bias to traders. It is a fact that by the time some information becomes public it contains no useful value. Otherwise its source should use it to maximize its own utility. If the source claims that the objective is to educate traders and not to maximize utility, then according to economics this source is not rational. If the source is not rational, then the information is probably useless. The straightforward explanation is then that the source maximizes its utility by publishing a book of outdated information to uninformed recipients.

        • Dave Walton says

          Hi again Bill. From your comments it seems you believe in efficient markets where no long lasting edge is possible. If this is so, why get involved in trading? With this belief, it follows that it is not possible to trade effectively unless you work at a large institution with the edges of computational power and access to private research.

          I’ll say a few things about that. First is that beliefs control all aspects of human behavior and they tend to be self-fulfilling and self-reinforcing. Thus it is very critical to control one’s beliefs. One very enlightening question to ask oneself is whether a belief is useful. In my own journey, I’ve found that many of the most non-useful beliefs tended to be the most strongly held.

          Second, if you do believe that all edges are removed when information becomes public, you might want to investigate some of the well-known anomalies such as momentum. There is tons of research that shows relative and absolute momentum are persistent edges much to the chagrin of EMH proponents.

          Last, the statistical techniques that Aronson made understandable for mainstream traders are not market edges subject to degraded efficacy. They are simply techniques to evaluate the statistical significance of trading rule performance in light of data mining.

