Overfitting and Survivorship Bias

Overfitting and survivorship bias are two of the most important traps in beginner quantitative research. They are dangerous because they can make a model appear scientific while quietly hiding the uncertainty that matters most.

Overfitting in plain language

Overfitting means your model learned the quirks of one dataset instead of a durable pattern. Imagine a rule that works perfectly on last year's data only because you adjusted its parameters again and again until it did. With enough attempts, some setting will fit almost any dataset by chance. Such a rule may not have discovered structure. It may have discovered coincidence.
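A minimal sketch of this effect, under stated assumptions: the "market" is pure random noise, and the trading rule (`strategy_return` with a `lookback` parameter) is a hypothetical toy invented for illustration. Because the data has no pattern by construction, any in-sample edge found by the parameter search is coincidence.

```python
import random

def strategy_return(returns, lookback):
    """Total return of a toy rule: hold the asset for one step whenever
    the sum of the previous `lookback` returns is positive."""
    total = 0.0
    for t in range(lookback, len(returns)):
        if sum(returns[t - lookback:t]) > 0:
            total += returns[t]
    return total

rng = random.Random(0)  # fixed seed so the run is repeatable

# Pure noise: there is no real pattern to discover here.
returns = [rng.gauss(0.0, 0.01) for _ in range(500)]
in_sample, out_of_sample = returns[:250], returns[250:]

# "Changing the parameters again and again": try 40 lookbacks, keep the best.
candidates = range(1, 41)
best_lookback = max(candidates, key=lambda k: strategy_return(in_sample, k))

print("best lookback found in-sample:", best_lookback)
print("in-sample return:    ", strategy_return(in_sample, best_lookback))
print("out-of-sample return:", strategy_return(out_of_sample, best_lookback))
```

The in-sample number is the best of 40 tries, so it is biased upward by the search itself. The out-of-sample number is a single untuned measurement, which is why it is the more honest of the two.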

Survivorship bias in plain language

Survivorship bias means studying only the examples that survived. If a stock index database excludes companies that failed, the past can look safer than it really was. If a crypto dataset excludes collapsed tokens, the market can look more profitable and less dangerous than reality.
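The arithmetic can be sketched with a toy universe. The tickers and returns below are made up for illustration and do not describe any real asset. The point is only how much the average changes once failed assets are dropped.

```python
from statistics import mean

# Hypothetical universe: (name, total return over the period, still listed?)
universe = [
    ("AAA", 0.40, True),
    ("BBB", 0.15, True),
    ("CCC", -1.00, False),  # went to zero and was delisted
    ("DDD", 0.25, True),
    ("EEE", -0.90, False),  # collapsed and was delisted
]

# What a survivors-only database would show.
survivors_only = mean(r for _, r, alive in universe if alive)

# What an investor picking at the start of the period actually faced.
full_universe = mean(r for _, r, _ in universe)

print(f"survivors only: {survivors_only:.1%}")
print(f"full universe:  {full_universe:.1%}")
```

In this made-up example the survivors-only average is about +27%, while the full universe averages -22%. The data is the same. Only the filter differs.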

How to reduce the damage

Keep a holdout period that you do not touch while tuning, disclose every parameter choice you tried (not only the final one), include failed or inactive examples where possible, and explain what data is missing. The goal is not perfection. The goal is honest uncertainty.
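One way to set up a holdout period is a chronological split, sketched below. The function name `chronological_split` and the 20% default are assumptions chosen for illustration, not a standard. The key design choice is that the rows are never shuffled, so the holdout is always the most recent stretch of data.

```python
def chronological_split(rows, holdout_fraction=0.2):
    """Split time-ordered rows into (development, holdout) without shuffling.

    The holdout is the final part of the series. Tune only on the
    development part and evaluate on the holdout once, at the end.
    """
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be between 0 and 1")
    n_holdout = round(len(rows) * holdout_fraction)
    cut = len(rows) - n_holdout
    return rows[:cut], rows[cut:]

# Stand-in for 100 time-ordered observations, oldest first.
rows = list(range(100))
development, holdout = chronological_split(rows, holdout_fraction=0.2)

print(len(development), "development rows,", len(holdout), "holdout rows")
```

A random split would let information from later dates leak into tuning, which is why this sketch cuts by position in time instead.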

Strong research does not hide its weaknesses. It makes them visible enough that another reader can judge the result fairly.