Can Screening for Stocks Still Generate Alpha?

This article is an overview of screening; in subsequent articles I will be doing deep dives into some classic screens such as those by Benjamin Graham, William O’Neil, Joel Greenblatt, Joseph Piotroski, and James and Patrick O’Shaughnessy.

Hundreds of thousands—perhaps millions—of investors use screens as a first step to picking stocks. Sometimes the screens have very simple and straightforward rules; sometimes they are very complicated, using multifactor ranking systems (my screens are like that) or head-scratching technical indicators.

How can you tell if a screen “works”? The conventional way is to backtest it. (There are a number of websites, brokers, data providers, and subscription services that allow you to do this; I use Portfolio123.) Typically, if a screen beats the market, it works; if it doesn’t, it doesn’t.

But what do you do with periods like the last three years, when “the market” (the S&P 500) has beaten everything in sight? In order to write this article and the articles that will follow it, I tested dozens of established screens designed by or based on the ideas of investment experts with a wide variety of approaches. Fewer than 15% of them have beaten the S&P 500 over the last three years. Screens are usually at least partly based on value ratios, and value stocks have been hugely disadvantaged lately; screens are often designed to hunt up smaller and little-known stocks, and large caps have trounced small caps recently.

How, then, should we approach stock screening? Is it a technique that has outlived its purpose? Or are there other ways to think about it?

Personally, I use stock screens every night in order to place my trades for the next day. My own returns over the last three years have been pretty good: a CAGR of 21%, compared to 9% for the S&P 500, and a lower drawdown during the market crash this year. So it’s not impossible to use screens to beat the market. But it’s getting hard.

I’m going to make a number of suggestions in this article. In the following articles, I’ll be digging deep into some established screens in order to put these suggestions into practice.

Part One: What Is the Benchmark?

Comparing the results of a screen to those of the S&P 500 has one very common-sense rationale: it has become the gold standard of index funds. Ideally, if you’re going to be investing in individual stocks—which always involves a good amount of research and trading—your portfolio should outperform a readily available ETF.

But the S&P 500 is not necessarily the most logical choice for a benchmark to which to compare your screen results. Basically, you want your screen to outperform a dart-throwing monkey, which means you want your screen to outperform an equally weighted portfolio of every investable stock. That should be your benchmark.
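To make the difference between the two kinds of benchmark concrete, here is a minimal sketch of how an equal-weighted and a cap-weighted return are computed for a single period. The four stocks, their returns, and their market caps are invented purely for illustration; they are not data from any real index.

```python
def equal_weight_return(returns):
    """Simple average of the stock returns: every stock gets the same weight."""
    return sum(returns) / len(returns)


def cap_weight_return(returns, market_caps):
    """Return of a portfolio weighted by each stock's starting market cap."""
    total_cap = sum(market_caps)
    return sum(r * cap / total_cap for r, cap in zip(returns, market_caps))


# Hypothetical one-year returns and starting market caps (in $ billions).
returns = [0.30, 0.05, -0.10, 0.12]
market_caps = [1000.0, 50.0, 5.0, 1.0]

ew = equal_weight_return(returns)
cw = cap_weight_return(returns, market_caps)
print(f"equal-weight: {ew:.1%}, cap-weight: {cw:.1%}")
```

In this made-up case the largest stock has the best return, so the cap-weighted figure comes out well ahead of the equal-weighted one. That is the same mechanism that favors a cap-weighted index when large caps lead the market.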

Moreover, the performance of the S&P 500, while stellar over the last few years, has not outpaced the long-run performance of an equal-weight portfolio of the entire Russell 3000. If you start in January 1999, that R3000 portfolio (rebalanced yearly) would have earned you a total return of 548%, compared to 262% for the S&P 500 (9.1% annualized versus 6.2%). It would have beaten the S&P 500 in 12 out of the last 21 calendar years. Yes, if you look at the last three years alone, that portfolio would have returned 0.4% compared to 30.5% for the S&P 500 (0.1% annualized versus 9.3%). But that’s a historical fluke. There’s no good reason why an equal-weighted index shouldn’t outperform a cap-weighted one.
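The annualized figures in parentheses follow from the total returns by ordinary compounding. The sketch below shows the conversion. The 21.5-year horizon is an assumption (January 1999 to roughly mid-2020); the exact end date is not stated above, so treat the output as a rounding-level check rather than a reproduction of the original calculation.

```python
def annualized_return(total_return, years):
    """Convert a cumulative return (0.30 means 30%) into a compound annual rate."""
    if years <= 0:
        raise ValueError("years must be positive")
    return (1.0 + total_return) ** (1.0 / years) - 1.0


# Assumption: January 1999 through mid-2020 is treated as 21.5 years.
LONG_HORIZON_YEARS = 21.5

cases = [
    ("Equal-weight R3000 since 1999", 5.48, LONG_HORIZON_YEARS),
    ("S&P 500 since 1999", 2.62, LONG_HORIZON_YEARS),
    ("Equal-weight R3000, last 3 years", 0.004, 3.0),
    ("S&P 500, last 3 years", 0.305, 3.0),
]

for label, total, years in cases:
    print(f"{label}: {annualized_return(total, years):.1%} annualized")
```

Under that assumed horizon, the results land within rounding of the 9.1%, 6.2%, 0.1%, and 9.3% quoted above.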

So if we use an equal-weighted benchmark—one that’s defined by your sector and liquidity limitations—rather than the S&P 500, it’s not so hard any more to find a screen that has outperformed over the last three years.

Part Two: The Screen’s Purpose

Not every screen is supposed to beat the benchmark. Some screens are designed for steady dividend income. Others are for defensive positions—stocks that are safe and sound. Others are low-volatility screens, designed to evade the massive shocks of sudden upturns and downturns. Still others are for buy-and-hold-forever companies.

How should we judge the backtested performance of those screens? Clearly, they need to be held to very different standards.

Once again, it’s not too hard to find ETFs that are designed for similar purposes. Does your low-volatility screen outperform USMV? And even if it doesn’t, perhaps it has a better Sharpe ratio or Sortino ratio? In judging backtests, we need to look not only at the right benchmark, but at the right performance measure.
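As a rough illustration of those two measures, the sketch below computes annualized Sharpe and Sortino ratios from monthly returns. The two return series are invented for the example and do not represent any real screen or ETF. The downside deviation here is taken over all periods relative to a target of zero, which is one common convention among several.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=12):
    """Mean excess return divided by its standard deviation, annualized."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)


def sortino_ratio(returns, target=0.0, periods_per_year=12):
    """Like Sharpe, but only returns below the target count as risk."""
    excess = [r - target for r in returns]
    # Downside deviation: root mean square of the shortfalls below the target.
    # This is zero (and the ratio undefined) if no return falls below the target.
    downside = math.sqrt(sum(min(e, 0.0) ** 2 for e in excess) / len(excess))
    return statistics.mean(excess) / downside * math.sqrt(periods_per_year)


# Hypothetical monthly returns: a calm screen versus a more volatile ETF.
screen = [0.02, -0.01, 0.015, 0.01, -0.005, 0.02]
etf = [0.04, -0.05, 0.06, -0.03, 0.05, -0.01]

for name, series in [("screen", screen), ("ETF", etf)]:
    print(f"{name}: mean {statistics.mean(series):.2%}, "
          f"Sharpe {sharpe_ratio(series):.2f}, Sortino {sortino_ratio(series):.2f}")
```

In this made-up data the ETF has the higher average monthly return, yet the screen scores better on both ratios because its losses are smaller and rarer.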

Part Three: Backtesting Illusions

First, there are some screens that are very hard to backtest. A buy-and-hold-forever strategy, for example, requires a periodic fresh influx of cash in order to buy new stocks; reinvesting one’s dividends may not be enough, and even if it is, it’s difficult to simulate. If a backtest of a strategy that invests in two dozen microcaps at a time grows to an AUM of $10 million after a certain number of years, the market impact of its transactions is going to balloon, and how do you put that into a backtest? There are very few backtesting programs that will allow the user to make regular deposits or withdrawals, or to calculate tax burdens accurately.

Second, backtesting often gives too rosy a picture. One can tweak one’s inputs infinitely in order to get better and better results, but that won’t necessarily improve out-of-sample performance.

The solution in both cases is to simplify, generalize, and stress-test.

Simplify. The more complicated your rules are, or the more complicated your situation is, the more likely your backtest is going to be misleading. Try to simplify your rules, factors, and approaches as much as you can without over-diversifying or turning your rules into mush. This doesn’t mean that your final screen or investing system should be simple—on the contrary. But when backtesting it, err on the side of simplicity rather than complexity.

Generalize. Rather than trying to calculate the effect of adding or subtracting a certain amount of cash to or from a portfolio every month, try doing so once a year instead. Rather than trying to simulate complicated portfolio weighting, backtest your portfolio at equal weight. And so on.

Stress-test. This is perhaps the most important step of all. Test your strategy using various tools. On Portfolio123, the source of my screens, I use rank performance tests, screen backtests, rolling screen backtests, and simulations. Vary the number of stocks your portfolio might hold. Test your screen on subsets of your universe and on altogether different stocks. Vary factor weights if you use ranking systems. Vary the time period tested.
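One simple way to vary the time period tested is to compare a strategy with its benchmark over every rolling window of a given length, rather than over a single start and end date. The sketch below is a generic illustration of that idea and is not a description of how Portfolio123's rolling tests work; the function names and the return series are hypothetical.

```python
def cumulative_return(returns):
    """Compound a list of periodic returns into one cumulative return."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def rolling_win_rate(strategy, benchmark, window):
    """Fraction of rolling windows in which the strategy beats the benchmark."""
    if len(strategy) != len(benchmark):
        raise ValueError("series must have the same length")
    n_windows = len(strategy) - window + 1
    if window < 1 or n_windows < 1:
        raise ValueError("window must be between 1 and the series length")
    wins = 0
    for start in range(n_windows):
        end = start + window
        if cumulative_return(strategy[start:end]) > cumulative_return(benchmark[start:end]):
            wins += 1
    return wins / n_windows


# Hypothetical monthly returns for a screen and its benchmark.
strategy = [0.03, -0.02, 0.04, 0.01, -0.01, 0.02, 0.00, 0.03]
benchmark = [0.01] * 8

for window in (3, 6):
    rate = rolling_win_rate(strategy, benchmark, window)
    print(f"{window}-month windows won: {rate:.0%}")
```

A screen that wins in most windows of several different lengths is less likely to owe its backtested result to one lucky starting date.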

Part Four: Backtesting Periods

Some investors believe that a screen should perform well in a variety of different market circumstances. Others believe that one should design a screen for the kind of market regime that is about to come into place. And still others believe that the screen that performs best in the near future will be the one that has performed best in the recent past.

All my research tells me that a look-back window of ten to twelve years is ideal. Three to five years is too short a time period—there are plenty of factors that have performed well for three to five years and have then never outperformed again. Thirty or sixty years, on the other hand, will bring up a lot of factors that used to work but have been arbitraged away, or investing conditions (call your broker and place your trade) that no longer apply in this age of high-frequency trading and near-zero interest rates. Ten to twelve years strikes me as a good middle ground.

Part Five: A Preview

In my next few articles I’m going to look at some classic screens such as those named at the beginning of the article. I’m also going to take a look at the very first multifactor ranking system I set up when I started using Portfolio123 to screen stocks in 2015, and how it has held up over the years. In each case I’m going to backtest the screen, tinker with its workings, and deem it either worthy of resuscitation or hopelessly outdated. So stay tuned . . .

8 Replies to “Can Screening for Stocks Still Generate Alpha?”

Marc Gerstein says:
July 8, 2020 at 9:54 am
First things first . . . When did you suddenly get into screening? Welcome to the club!
You missed the point about the SP500 benchmark. Yes, there is something to be said for measuring against an equally-weighted basket of R3000 stocks. But, if I can beat the Equal-wt R3000 but not the S&P 500, we still have to answer the question of why I don’t just toss benchmarking theory into the trash and put my money in SPY, which can be done with a few simple mouse clicks, zero commission, and zero effort devoted to the process of screening and zero subscription fees paid for a good screening platform.
There are, actually, good reasons but they are matters of opinion, including my own opinion (and I actually swapped core funds out of SPY).
For more on the dysfunction of SPY and many other widely-recognized benchmarks . . .
seekingalpha.com/…
seekingalpha.com/…
1. Yuval Taylor says:
  July 8, 2020 at 10:14 am
  Marc, the links you put in your comment have disappeared for some reason . . .
  I’ve been screening since I started using Portfolio123 in 2015. But all my rules are in the universe rules and all my factors are in ranking systems. I think the difference between using hard-and-fast rules and relative rules (as in rank) is a major one, but they both qualify as screening, and if I ever said otherwise I was wrong.
  Your question–“if I can beat the Equal-wt R3000 but not the S&P 500, we still have to answer the question of why I don’t just toss benchmarking theory into the trash and put my money in SPY, which can be done with a few simple mouse clicks, zero commission, and zero effort devoted to the process of screening and zero subscription fees paid for a good screening platform”–is one I attempted to address when I wrote “Ideally, if you’re going to be investing in individual stocks—which always involves a good amount of research and trading—your portfolio should outperform a readily available ETF.” The answer is, as you pointed out, that the future of SPY is uncertain–as is the future of any index fund (or any screen, for that matter).
  So you look at things probabilistically. If you conclude that an equal-weight index fund (if only one existed for the R3000!) has a greater than 50% chance of beating SPY down the road, and if you conclude that your screen has a greater than 50% chance of beating an equal-weight benchmark, then you conclude that your screen is a better option than SPY, even if a backtest shows otherwise.
Chaim Gewirtz says:
July 9, 2020 at 10:19 pm
Excellent article.
Do you have any plans to add EW current universe or R3000 EW as an index?
1. Yuval Taylor says:
  July 10, 2020 at 9:11 am
  Both are good suggestions. We’ll put them on our to-do list.
Hannes says:
October 11, 2020 at 6:44 am
Thank you for your well-written article, and for the others about Benjamin Graham, Joel Greenblatt, and William O’Neil.
Do you have any idea when the series is going to continue (especially the one about the O’Shaughnessys, which I fancy most)?
Thanks a lot!
1. Yuval Taylor says:
  October 11, 2020 at 1:46 pm
  I’m currently working on a rather long article about intrinsic value, after which I plan to get back to writing about screens. I plan to cover Ken Fisher’s screen (from Super Stocks) next, followed by Piotroski’s and those of James and Patrick O’Shaughnessy. So I think it’ll probably be several weeks before I get to those. Sorry for the wait!