One of the most intuitive ideas in 13F tracking: if several smart-money funds independently open a new position in the same stock during the same quarter, that consensus should mean something. It's the premise behind half the "whale tracking" products on the internet — including parts of this site.
So we tested it. Properly.
The rules (no cheating)
Backtests of 13F strategies usually fail in one of two ways: they enter at quarter-end prices nobody could have traded (13F filings arrive up to 45 days later), or they quietly drop trades with missing price data. We did neither:
- Signal: ≥K tracked funds open a brand-new position in the same stock in the same quarter. Options excluded; a fund's first-ever filing excluded (everything looks "new" in a debut filing).
- Entry: first close after the filing deadline — quarter-end + 46 days. You only trade what you could have known.
- Exit: fixed holding period (30/60/180 days), equal weight, compared against SPY over the exact same windows.
- Coverage: trades without price data are reported as missing, not silently dropped.
- Window: 2021-Q1 through 2025-Q4 signals.
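The signal and entry rules above can be sketched in a few lines. This is a minimal illustration, not the pipeline script. The `holdings` shape, the function names, and the reading of "new" as "absent from the fund's previous filing" are all assumptions made for the example.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import date, timedelta


def consensus_signals(holdings, k=3):
    """Return {(quarter_end, ticker): funds} where >= k funds opened the ticker.

    holdings: {fund: {quarter_end: set of tickers}} (hypothetical shape,
    with options positions already filtered out).
    """
    openers = defaultdict(set)
    for fund, by_quarter in holdings.items():
        quarters = sorted(by_quarter)
        # Pairing each filing with its predecessor skips the debut filing,
        # where every position would otherwise look new.
        for prev_q, q in zip(quarters, quarters[1:]):
            # Assumption: "new" means absent from the previous filing.
            for ticker in by_quarter[q] - by_quarter[prev_q]:
                openers[(q, ticker)].add(fund)
    return {key: funds for key, funds in openers.items() if len(funds) >= k}


def entry_date(quarter_end, trading_days):
    """First trading day on or after quarter_end + 46 days, else None.

    trading_days must be a sorted list of dates that have a closing price.
    """
    earliest = quarter_end + timedelta(days=46)
    i = bisect_left(trading_days, earliest)
    return trading_days[i] if i < len(trading_days) else None
```

Returning `None` when no later trading day exists keeps such a trade visible as "missing" instead of dropping it.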
Results
| Strategy | Trades | Win rate | Avg return | Avg excess vs SPY | Beat SPY |
|---|---|---|---|---|---|
| ≥3 funds, hold 30d | 2,246 | 49.2% | +0.5% | +0.0% | 46.5% |
| ≥3 funds, hold 60d | 2,246 | 50.6% | +0.9% | −0.3% | 43.5% |
| ≥3 funds, top-10 by $, hold 60d | 136 | 42.6% | −1.8% | −2.1% | 40.4% |
| ≥5 funds, hold 60d | 359 | 49.3% | +0.2% | −0.9% | 43.7% |
| ≥3 funds, hold 180d | 2,089 | 55.2% | +7.4% | +0.4% | 42.9% |
| Berkshire-only follow, hold 180d | 30 | 63.3% | +2.9% | −1.6% | 50.0% |
SPY buy-and-hold over the same period: +95.5%.
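For readers who want to reproduce the columns, each one is a simple aggregate over per-trade returns. The sketch below assumes every trade is a `(stock_return, spy_return)` pair measured over the same holding window, with `None` standing in for a trade that lacks price data. The function name, the input shape, and the choice to compute statistics over priced trades only are assumptions for illustration.

```python
def summarize(trades):
    """Aggregate per-trade results into the table's columns.

    trades: list of (stock_return, spy_return) tuples, or None where
    price data was missing (hypothetical input shape).
    """
    priced = [t for t in trades if t is not None]
    summary = {
        "signals": len(trades),
        "priced": len(priced),
        "missing": len(trades) - len(priced),  # reported, not silently dropped
    }
    if not priced:
        return summary
    n = len(priced)
    excess = [stock - spy for stock, spy in priced]
    summary.update(
        win_rate=sum(stock > 0 for stock, _ in priced) / n,  # trade made money
        avg_return=sum(stock for stock, _ in priced) / n,
        avg_excess=sum(excess) / n,
        beat_spy=sum(e > 0 for e in excess) / n,  # trade outperformed SPY
    )
    return summary
```

Win rate and "Beat SPY" are separate counts: a trade can make money and still trail the index, which is how a configuration can win more than half the time while beating SPY less than half the time.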
Three things stand out:
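- The win rate is a coin flip. 49–55% across the four broad consensus configurations (the top-10 and Berkshire-only variants fall outside that range, at 42.6% and 63.3%), and the average excess return over SPY stays within one point of zero. Whatever information is in a consensus new-position signal, the market has priced it in by the time you can legally see it.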
- Stronger consensus is worse. Requiring ≥5 funds instead of ≥3, or taking only the ten largest consensus buys, lowered the average excess return to −0.9pt and −2.1pt respectively, versus −0.3pt for the ≥3-fund baseline at the same 60-day hold. The most crowded, most visible ideas underperformed the obscure ones, which is consistent with paying a popularity premium.
- Even copying Buffett doesn't survive the disclosure lag. Following only Berkshire's new positions, entered after the filing as a real copier would, trailed SPY by an average of 1.6pt per trade across 30 trades.
Why this doesn't kill 13F tracking
The signal that fails here is a timing signal: buy X because funds just bought X. That's the use case that doesn't survive a 45-day lag.
What does survive, in our other tests, is portfolio-level replication: cloning a great manager's whole book, rebalanced quarterly at filing dates, tracks their performance closely — because it doesn't depend on catching any single trade early. That's exactly what our clone-performance rankings measure, and why the spread there (from +43.8pt to −29.3pt annual alpha across 59 funds) is a more honest answer to "who is worth copying" than any consensus-buy list.
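As a rough illustration of what portfolio-level replication means mechanically, the sketch below rebalances to the manager's reported book at each filing date and compounds the resulting period returns. It assumes weights proportional to reported position values and per-ticker returns measured from one rebalance to the next. Both the function names and those choices are assumptions for the example, not a description of how the clone-performance rankings are computed.

```python
def clone_weights(reported_values):
    """Weights proportional to reported position values (an assumption)."""
    total = sum(reported_values.values())
    return {ticker: value / total for ticker, value in reported_values.items()}


def clone_return(filings, period_returns):
    """Compound the clone's return across successive rebalance periods.

    filings: list of {ticker: reported value}, one per rebalance date.
    period_returns: list of {ticker: return} for the period after each
    rebalance, aligned with filings (hypothetical input shape).
    """
    growth = 1.0
    for values, returns in zip(filings, period_returns):
        weights = clone_weights(values)
        # Portfolio return for the period is the weighted sum of ticker returns.
        growth *= 1.0 + sum(w * returns[t] for t, w in weights.items())
    return growth - 1.0
```

Because every position is held at its reported weight, the result depends on the whole book rather than on the timing of any single new buy.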
Use 13F data to find managers worth studying and positions worth understanding. Don't use it as a buy-signal feed.
Methodology and code: `scripts/backtest_consensus_verified.py` in our pipeline. 48–68% of trades had usable price history depending on configuration; missing-price trades are excluded from returns but reported. Nothing here is investment advice.