Can you trust your MMM?
When presented with results from an MMM, one of your first questions will be "can I trust the model?" You may have been shown some positive-looking diagnostics: a high R2, perhaps a low mean absolute error (MAE) or mean absolute percentage error (MAPE), all of which look good. So everything's OK then? Not quite. These tests alone are nowhere near enough to validate a model – why? Because they don't eliminate the danger of overfitting – where the analyst selects and piles in variables to make the model fit the data well and push up R2.
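If you want to see why, here is a minimal Python sketch – entirely synthetic data and a plain linear regression standing in for a real MMM, nothing from any actual model – showing how a model stuffed with meaningless variables can still post a high R2 and a low MAPE on the data it was fitted to.

```python
# Illustrative only: synthetic data, plain linear regression as a stand-in for an MMM.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_percentage_error

rng = np.random.default_rng(42)
n_weeks = 156                                   # three years of weekly data

# Sales here are just a base level plus noise -- nothing below actually drives them
sales = 1000 + rng.normal(0, 50, n_weeks)

# 120 "explanatory" variables that are pure noise
X_noise = rng.normal(0, 1, (n_weeks, 120))

model = LinearRegression().fit(X_noise, sales)
fitted = model.predict(X_noise)

print("In-sample R2:  ", round(r2_score(sales, fitted), 2))
print("In-sample MAPE:", round(100 * mean_absolute_percentage_error(sales, fitted), 1), "%")
# Both numbers look healthy, yet none of the variables has any real relationship
# with sales -- exactly the overfitting risk the tests below are designed to expose.
```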
You need to probe further.
Here are two 'acid' tests you can ask for when being presented with MMM results – two of the most robust ways to validate a marketing mix model so you can have confidence in the results. If your model passes at least one of them, you can rely on it more to support media investment decisions and channel mix changes. If it passes both, you have a well-validated marketing mix model. Unfortunately, these tests are not always shared by MMM providers – I'll leave you to figure out why.
MMM validation test 1: The holdout test
How does the test work? The model is tested on a subset of the data, usually towards the end of the sample. Let's assume you have a model built on three years' worth of weekly data – that's 156 observations. You can split the data, 'slice off' the last 26 weeks, and ask the model to predict what would have happened in that period using only the explanatory variables in the model, e.g. media adspend, digital spend, and perhaps other data like price and sales promotions.
What should you see? You will have the actual sales data for the 26-week holdout period. The key 'acid test' question for you is how well the model's prediction for those 26 weeks matches actual sales. If your model is predicting within +/- 20%, that's a strong result. If it's predicting within +/- 10%, that is a very strong result.
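For the more hands-on reader, here is a minimal sketch of the mechanics in Python. The data, the column names and the plain linear model are all illustrative stand-ins for a real MMM, not anyone's actual model.

```python
# A minimal holdout-test sketch, assuming synthetic weekly data and a plain
# linear model as a stand-in for a real MMM. Column names are illustrative.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
n_weeks, holdout = 156, 26

# Illustrative explanatory variables: media spend, digital spend, price, promo flag
df = pd.DataFrame({
    "tv_spend":      rng.gamma(2.0, 50_000, n_weeks),
    "digital_spend": rng.gamma(2.0, 20_000, n_weeks),
    "price":         rng.normal(10, 0.5, n_weeks),
    "promo":         rng.integers(0, 2, n_weeks),
})
# Synthetic sales generated from those drivers plus noise
df["sales"] = (0.004 * df["tv_spend"] + 0.006 * df["digital_spend"]
               - 300 * df["price"] + 1_500 * df["promo"]
               + 8_000 + rng.normal(0, 500, n_weeks))

# Fit on the first 130 weeks, hold out the last 26
train, test = df.iloc[:-holdout], df.iloc[-holdout:]
features = ["tv_spend", "digital_spend", "price", "promo"]
model = LinearRegression().fit(train[features], train["sales"])

pred = model.predict(test[features])
error_pct = (pred.sum() - test["sales"].sum()) / test["sales"].sum() * 100
print(f"Holdout prediction error over the last {holdout} weeks: {error_pct:+.1f}%")
# Within roughly +/- 10% is a very strong result; within +/- 20% is still strong.
```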
MMM validation test 2: Cross-fold validation
How does this test work? In this test we slice your 156 weeks of data into 10 consecutive time 'blocks' of 15-16 weeks. In a similar way to the holdout test, we give the model the explanatory variables for each block and ask it to predict sales in that block from the model coefficients. We then examine the prediction against the actual sales and the size of the error in each block.
What should you see? You will have the actual sales data for each block. Again, the key 'acid test' question for you is how well the model's prediction matches actual sales in each of those blocks. If your model is predicting within +/- 20%, that's a strong result. If it's predicting within +/- 10%, that is a very strong result.
It's worth noting that in some categories +/- 20% might be too tight; you could relax it to, say, 20-30%, but you need to be close to that range to have a validated model.
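Again for the hands-on reader, here is one way the blocked test can be sketched in Python, reusing the synthetic `df` and `features` built in the holdout sketch above. A common way to run it is to refit the model with each block held out in turn, though the exact mechanics vary by provider.

```python
# A minimal blocked cross-validation sketch, continuing the holdout example:
# hold out each consecutive block in turn, refit on the remaining weeks, and
# compare the block's predicted total to its actual total.
import numpy as np
from sklearn.linear_model import LinearRegression

n_folds = 10
blocks = np.array_split(np.arange(len(df)), n_folds)     # 10 consecutive blocks of 15-16 weeks

for i, block in enumerate(blocks, start=1):
    train_idx = np.setdiff1d(np.arange(len(df)), block)  # everything except this block
    model = LinearRegression().fit(df.iloc[train_idx][features],
                                   df.iloc[train_idx]["sales"])
    pred = model.predict(df.iloc[block][features])
    actual = df.iloc[block]["sales"].to_numpy()
    error_pct = (pred.sum() - actual.sum()) / actual.sum() * 100
    print(f"Block {i:2d} ({len(block)} weeks): prediction error {error_pct:+.1f}%")
# A model whose block errors mostly sit within +/- 20% (ideally +/- 10%) is
# unlikely to be badly overfitted.
```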
What do these results mean? If you see a model prediction within 20% of the actual in the holdout period, or across the 'folds', then you have a model that is working well and is unlikely to be 'overfitted'. Overfitting makes the model look good within the sample, but such a model does not perform well outside the sample and would therefore fail these tests.

