📊 Berkson's Paradox Simulator

What is Berkson's Paradox?

Two variables that are completely independent in the full population can appear negatively correlated once you condition on a biased sample — one that tends to select people who are high in at least one of the two traits. The correlation is a statistical artifact of the sampling process, not a real relationship.

Scenario

In the general public, intelligence and attractiveness are unrelated. But Hollywood selects people who excel in at least one dimension. Among stars, the two traits appear negatively correlated — the smarter, the less attractive, and vice versa.

Selection Threshold

A person is selected if their score in at least one trait exceeds this value.
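The selection rule above can be written as a one-line predicate. This is only a sketch: the function name `is_selected` and the example scores are hypothetical and are not taken from the simulator's own code.

```python
def is_selected(x, y, threshold):
    # Strict ">" follows the wording "exceeds this value";
    # a score exactly equal to the threshold does not qualify.
    return x > threshold or y > threshold

print(is_selected(1.2, -0.8, 0.5))  # True: the first trait exceeds 0.5
print(is_selected(0.3, 0.4, 0.5))   # False: neither trait exceeds 0.5
```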

Slider range: −2 (more inclusive) to +2 (more selective); current value: 0.50

Statistics

Full population: 600 people
Selected: 324 people (54%)
Correlation, full population: r = 0.038 (expected ≈ 0, since the traits are independent)
Correlation, selected sample: r = -0.462 (spurious negative correlation from selection bias)
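
As a rough sanity check on the selected share, the sketch below computes the expected fraction of people selected at a given threshold. It assumes both scores are independent standard normal variables. That is an assumption made here for illustration; the page does not state the distribution it samples from. The helper names are hypothetical.

```python
import math

def normal_cdf(z):
    # Standard normal CDF, written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_selected_fraction(threshold):
    # P(X > t or Y > t) = 1 - P(X <= t) * P(Y <= t) for independent X and Y.
    p_below = normal_cdf(threshold)
    return 1.0 - p_below * p_below

print(round(expected_selected_fraction(0.5), 3))  # → 0.522
```

Under this assumption about 52% of people, roughly 313 of 600, would be selected on average at a threshold of 0.50, so the 324 reported above is within ordinary sampling variation.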

Scatter Plot

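[Scatter plot of Intelligence against Attractiveness, with both axes running from -3 to 3. Legend: selected points, non-selected points, the threshold, and the regression line fitted to the selected sample.]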

Why does this happen?

The two traits in this simulation are generated completely independently — knowing one value tells you nothing about the other. The full-population correlation is essentially zero.

When we restrict our attention to people selected because they exceed the threshold in at least one trait, we introduce a dependency: anyone whose X score is below the threshold must have a Y score above it to have made the cut. By contrast, someone whose X score is above the threshold qualifies regardless of Y, so their Y could be anything. Among the selected, then, lower X values go with systematically higher Y values, which creates the illusion of a negative relationship.
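
The mechanism can be reproduced in a few lines. The sketch below is not the simulator's actual code: it assumes both traits are drawn as independent standard normal scores, and the names `pearson_r` and `simulate` are hypothetical. It generates a population, applies the "at least one trait exceeds the threshold" rule, and compares the Pearson correlation before and after selection.

```python
import random

def pearson_r(pairs):
    # Pearson correlation coefficient for a list of (x, y) pairs.
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / (var_x * var_y) ** 0.5

def simulate(n=600, threshold=0.5, seed=0):
    rng = random.Random(seed)
    # Assumption: each trait is an independent standard normal draw.
    people = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    # Keep anyone whose score in at least one trait exceeds the threshold.
    selected = [(x, y) for x, y in people if x > threshold or y > threshold]
    return pearson_r(people), pearson_r(selected), len(selected)

r_full, r_selected, n_selected = simulate()
print(f"full population: r = {r_full:.3f}")
print(f"selected ({n_selected} people): r = {r_selected:.3f}")
```

The exact numbers depend on the seed and sample size, but the full-population correlation should sit near zero while the selected-sample correlation is clearly negative.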

Try it: Drag the threshold slider to the right (more selective), and the apparent negative correlation in the selected sample grows stronger. Drag it to the far left (very inclusive), and nearly everyone is selected, so the spurious correlation disappears.
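
The slider experiment can also be approximated offline by sweeping the threshold over a fixed population. As before, this is a sketch under stated assumptions: independent standard normal scores, hypothetical function names, and a larger population (20,000 rather than 600) chosen here only to reduce sampling noise at the most selective settings.

```python
import random

def pearson_r(pairs):
    # Pearson correlation coefficient for a list of (x, y) pairs.
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / (var_x * var_y) ** 0.5

def sweep(thresholds, n=20000, seed=1):
    rng = random.Random(seed)
    # One fixed population, re-filtered at every threshold.
    people = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    results = {}
    for t in thresholds:
        selected = [(x, y) for x, y in people if x > t or y > t]
        # Too few points gives no meaningful correlation; record None instead.
        results[t] = pearson_r(selected) if len(selected) > 2 else None
    return results

results = sweep([-2.0, -1.0, 0.0, 1.0, 2.0])
for t, r in results.items():
    label = "n/a" if r is None else f"{r:.3f}"
    print(f"threshold {t:+.1f}: r among selected = {label}")
```

At −2 almost nobody is excluded, so the correlation should stay close to zero; it should become steadily more negative as the threshold moves toward +2.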

The paradox is named after biostatistician Joseph Berkson, who noticed in 1946 that hospital-based studies of disease co-occurrence were systematically biased: patients are admitted for having at least one condition, which can make unrelated diseases appear negatively associated.