This article is an overview of screening; in subsequent articles I will be doing deep dives into some classic screens such as those by Benjamin Graham, William O’Neil, Joel Greenblatt, Joseph Piotroski, and James and Patrick O’Shaughnessy.
Hundreds of thousands—perhaps millions—of investors use screens as a first step to picking stocks. Sometimes the screens have very simple and straightforward rules; sometimes they are very complicated, using multifactor ranking systems (my screens are like that) or head-scratching technical indicators.
How can you tell if a screen “works”? The conventional way is to backtest it. (There are a number of websites, brokers, data providers, and subscription services that allow you to do this; I use Portfolio123.) Typically, if a screen beats the market, it works; if it doesn’t, it doesn’t.
But what do you do with periods like the last three years, when “the market” (the S&P 500) has beaten everything in sight? In order to write this article and the articles that will follow it, I tested dozens of established screens designed by or based on the ideas of investment experts with a wide variety of approaches, and fewer than 15% of them have beaten the S&P 500 over the last three years. Screens are usually at least partly based on value ratios, and value stocks have been hugely disadvantaged lately; screens are often designed to hunt up smaller and little-known stocks, and large caps have trounced small caps recently.
How, then, should we approach stock screening? Is it a technique that has outlived its purpose? Or are there other ways to think about it?
Personally, I use stock screens every night in order to place my trades for the next day. My own returns over the last three years have been pretty good: a CAGR of 21%, compared to 9% for the S&P 500, and a lower drawdown during the market crash this year. So it’s not impossible to use screens to beat the market. But it’s getting hard.
I’m going to make a number of suggestions in this article. In the following articles, I’ll be digging deep into some established screens in order to put these suggestions into practice.
Part One: What Is the Benchmark?
Comparing the results of a screen to those of the S&P 500 has one very common-sense rationale: it has become the gold standard of index funds. Ideally, if you’re going to be investing in individual stocks—which always involves a good amount of research and trading—your portfolio should outperform a readily available ETF.
But the S&P 500 is not necessarily the most logical choice for a benchmark to which to compare your screen results. Basically, you want your screen to outperform a dart-throwing monkey, which means you want your screen to outperform an equally weighted portfolio of every investable stock. That should be your benchmark.
Moreover, the performance of the S&P 500, while stellar over the last few years, has not outpaced the long-run performance of an equal-weight portfolio of the entire Russell 3000. If you start in January 1999, that R3000 portfolio (rebalanced yearly) would have earned you a total return of 548%, compared to 262% for the S&P 500 (9.1% annualized versus 6.2%). It would have beaten the S&P 500 in 12 out of the last 21 calendar years. Yes, if you look at the last three years alone, that portfolio would have returned 0.4% compared to 30.5% for the S&P 500 (0.1% annualized versus 9.3%). But that’s a historical fluke. There’s no good reason why an equal-weighted index shouldn’t outperform a cap-weighted one.
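For readers who want to check the conversion between total and annualized returns, here is a minimal Python sketch. The function name annualized_return is my own, and the only assumption is the standard compound-growth formula; it is not taken from any particular backtesting tool.

```python
def annualized_return(total_return_pct, years):
    """Convert a cumulative total return (in percent) into a
    compound annual growth rate (in percent)."""
    if years <= 0:
        raise ValueError("years must be positive")
    growth = 1.0 + total_return_pct / 100.0    # e.g. 30.5% -> 1.305
    return (growth ** (1.0 / years) - 1.0) * 100.0


# Three-year figures quoted in the text.
print(round(annualized_return(30.5, 3), 1))    # S&P 500 -> 9.3
print(round(annualized_return(0.4, 3), 1))     # equal-weight R3000 -> 0.1
```

The same function roughly reproduces the longer-run figures if the period from January 1999 is taken to be about 21.5 years, which is an assumption on my part, since the exact end date is not stated.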
So if we use an equal-weighted benchmark—one that’s defined by your sector and liquidity limitations—rather than the S&P 500, it’s not so hard any more to find a screen that has outperformed over the last three years.
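To make the difference between the two kinds of benchmark concrete, here is a small Python sketch that computes a one-period return both ways. The three stocks, their returns, and their market caps are invented purely for illustration, and the function names are hypothetical.

```python
def equal_weight_return(returns):
    """One-period return of a portfolio holding every stock in equal size."""
    return sum(returns) / len(returns)


def cap_weight_return(returns, market_caps):
    """One-period return of a portfolio weighted by market capitalization."""
    if len(returns) != len(market_caps):
        raise ValueError("returns and market_caps must have the same length")
    total_cap = sum(market_caps)
    return sum(r * cap for r, cap in zip(returns, market_caps)) / total_cap


# Made-up universe: one large cap that did well, two small caps that did not.
returns = [0.30, 0.05, -0.10]
market_caps = [1000.0, 100.0, 50.0]    # arbitrary units

print(round(equal_weight_return(returns), 4))             # 0.0833
print(round(cap_weight_return(returns, market_caps), 4))  # 0.2609
```

When a handful of very large companies lead the market, the cap-weighted figure runs far ahead of the equal-weighted one, which is the pattern described above for the last three years.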
Part Two: The Screen’s Purpose
Not every screen is supposed to beat the benchmark. Some screens are designed for steady dividend income. Others are for defensive positions—stocks that are safe and sound. Others are low-volatility screens, designed to evade the massive shocks of sudden upturns and downturns. Still others are for buy-and-hold-forever companies.
How should we judge the backtested performance of those screens? Clearly, they need to be held to very different standards.
Once again, it’s not too hard to find ETFs that are designed for similar purposes. Does your low-volatility screen outperform USMV? And even if it doesn’t, perhaps it has a better Sharpe ratio or Sortino ratio? In judging backtests, we need to look not only at the right benchmark, but at the right performance measure.
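For reference, here is one common way to compute the two ratios from a series of periodic returns, sketched in Python. It assumes monthly returns, a constant per-period risk-free rate, and annualization by the square root of the number of periods per year. Conventions vary between data providers, so treat this as one reasonable definition, not the one any particular site uses. The sample returns are invented.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=12):
    """Annualized Sharpe ratio: mean excess return over its standard deviation."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)


def sortino_ratio(returns, target=0.0, periods_per_year=12):
    """Annualized Sortino ratio: mean excess return over downside deviation,
    where only returns below the target count as risk."""
    excess = [r - target for r in returns]
    downside = math.sqrt(sum(min(0.0, e) ** 2 for e in excess) / len(excess))
    if downside == 0.0:
        return math.inf    # no period fell below the target
    return statistics.mean(excess) / downside * math.sqrt(periods_per_year)


# Hypothetical monthly returns for a low-volatility screen.
monthly = [0.02, -0.01, 0.03, -0.02, 0.01, 0.015]
print(round(sharpe_ratio(monthly), 2))
print(round(sortino_ratio(monthly), 2))
```

Running the same two functions on the returns of a comparable ETF gives a like-for-like comparison on risk-adjusted terms, which may favor a screen even when its raw return lags.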
Part Three: Backtesting Illusions
First, there are some screens that are very hard to backtest. A buy-and-hold-forever strategy, for example, requires a periodic fresh influx of cash in order to buy new stocks; reinvesting one’s dividends may not be enough, and even if it is, it’s difficult to simulate. If a backtest of a strategy that invests in two dozen microcaps at a time grows to an AUM of $10 million after a certain number of years, the market impact of its transactions is going to balloon, and how do you put that into a backtest? There are very few backtesting programs that will allow the user to make regular deposits or withdrawals, or to calculate tax burdens accurately.
Second, backtesting often gives too rosy a picture. One can tweak one’s inputs infinitely in order to get better and better results, but that won’t necessarily improve out-of-sample performance.
The solution in both cases is to simplify, generalize, and stress-test.
Simplify. The more complicated your rules are, or the more complicated your situation is, the more likely your backtest is going to be misleading. Try to simplify your rules, factors, and approaches as much as you can without over-diversifying or turning your rules into mush. This doesn’t mean that your final screen or investing system should be simple—on the contrary. But when backtesting it, err on the side of simplicity rather than complexity.
Generalize. Rather than trying to calculate the effect of adding or subtracting a certain amount of cash to or from a portfolio every month, try doing so once a year instead. Rather than trying to simulate complicated portfolio weighting, backtest your portfolio at equal weight. And so on.
Stress-Test. This is perhaps the most important of all. Test your strategy using various tools. On Portfolio123, the source of my screens, I use rank performance tests, screen backtests, rolling screen backtests, and simulations. Vary the number of stocks your portfolio might hold. Test your screen on subsets of your universe and on altogether different stocks. Vary factor weights if you use ranking systems. Vary the time period tested.
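One simple way to vary the time period tested is to slide a fixed-length window across the return history and count how often the strategy beats its benchmark. The Python sketch below does that with invented return series. It is a generic illustration and does not reproduce how Portfolio123 or any other tool implements its rolling tests; the function names are hypothetical.

```python
def compound(returns):
    """Cumulative return from a sequence of periodic returns."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def rolling_win_rate(strategy, benchmark, window):
    """Fraction of rolling windows in which the strategy's compounded
    return exceeds the benchmark's."""
    if len(strategy) != len(benchmark):
        raise ValueError("series must have the same length")
    if window < 1 or window > len(strategy):
        raise ValueError("window must be between 1 and the series length")
    starts = range(len(strategy) - window + 1)
    wins = sum(
        1
        for s in starts
        if compound(strategy[s:s + window]) > compound(benchmark[s:s + window])
    )
    return wins / len(starts)


# Hypothetical yearly returns for a screen and its benchmark.
screen = [0.10, -0.05, 0.08, 0.02, -0.03, 0.12]
bench = [0.07, -0.02, 0.05, 0.04, -0.06, 0.09]
print(rolling_win_rate(screen, bench, window=3))    # 0.75
```

A screen that wins in most windows, and not just over the full period, is less likely to owe its backtested results to one lucky stretch.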
Part Four: Backtesting Periods
Some investors believe that a screen should perform well in a variety of different market circumstances. Others believe that one should design a screen for the kind of market regime that is about to come into place. And still others believe that the screen that performs best in the near future will be the one that has performed best in the recent past.
All my research tells me that a look-back window of ten to twelve years is ideal. Three to five years is too short a time period—there are plenty of factors that have performed well for three to five years and have then never outperformed again. Thirty or sixty years, on the other hand, will bring up a lot of factors that used to work but have been arbitraged away, or investing conditions (call your broker and place your trade) that no longer apply in this age of high-frequency trading and near-zero interest rates. Ten to twelve years strikes me as a good middle ground.
Part Five: A Preview
In my next few articles I’m going to look at some classic screens such as those named at the beginning of the article. I’m also going to take a look at the very first multifactor ranking system I set up when I started using Portfolio123 to screen stocks in 2015, and how it has held up over the years. In each case I’m going to backtest the screen, tinker with its workings, and deem it either worthy of resuscitation or hopelessly outdated. So stay tuned . . .