Two variables that are completely independent in the full population can appear negatively correlated once you condition on a biased sample — one that tends to select people who are high in at least one of the two traits. The correlation is a statistical artifact of the sampling process, not a real relationship.
A person is selected if their score in at least one trait exceeds the selection threshold.
The two traits in this simulation are generated completely independently — knowing one value tells you nothing about the other. The full-population correlation is essentially zero.
When we restrict our attention to people selected because they exceed the threshold in at least one trait, we introduce a dependency: anyone whose X score falls below the threshold must have a Y score above it to have made the cut. Among the selected, an X score above the threshold leaves Y unconstrained, so Y is usually unremarkable, whereas an X score below the threshold guarantees a high Y. Pooling these two groups creates the illusion of a negative relationship.
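The selection effect described above can be reproduced in a few lines. The following is a minimal sketch, not the code behind this page: it assumes both traits are independent standard normal scores, and the names `pearson` and `simulate`, the threshold of 1.0, the sample size, and the seed are all illustrative choices.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simulate(n=20_000, threshold=1.0, seed=0):
    """Draw independent traits, then keep people who exceed the threshold in X or Y."""
    rng = random.Random(seed)
    # Assumption: each trait is an independent standard normal score.
    population = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    # Selection rule from the text: at least one trait above the threshold.
    selected = [(x, y) for x, y in population if x > threshold or y > threshold]
    full_r = pearson(*zip(*population))
    selected_r = pearson(*zip(*selected))
    return full_r, selected_r, len(selected)


full_r, selected_r, n_selected = simulate()
print(f"full population: r = {full_r:+.3f}")
print(f"selected sample: r = {selected_r:+.3f} ({n_selected} people)")
```

Under these assumptions the full-population correlation should come out close to zero while the selected-sample correlation is clearly negative, even though the two columns were generated without reference to each other.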
Try it: Drag the threshold slider to the right (more selective). The apparent negative correlation in the selected sample grows stronger. With a very inclusive threshold (far left), nearly everyone is selected and the spurious correlation disappears.
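The slider behaviour can also be checked without sampling. The sketch below assumes both traits are independent standard normal scores (the page does not state the distribution it uses), in which case the correlation among the selected has a closed form in the threshold t. The function names are illustrative.

```python
from math import erf, exp, pi, sqrt


def normal_pdf(t):
    """Standard normal density."""
    return exp(-t * t / 2) / sqrt(2 * pi)


def normal_cdf(t):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(t / sqrt(2)))


def selected_correlation(t):
    """Correlation of X and Y given X > t or Y > t, for independent N(0, 1) traits."""
    pdf, cdf = normal_pdf(t), normal_cdf(t)
    p_selected = 1 - cdf ** 2                      # P(X > t or Y > t)
    mean_x = pdf * cdf / p_selected                # E[X | selected], same for Y
    mean_xy = -(pdf ** 2) / p_selected             # E[XY | selected]
    mean_x2 = (1 - (cdf - t * pdf) * cdf) / p_selected  # E[X^2 | selected]
    # X and Y are exchangeable, so both variances equal mean_x2 - mean_x**2.
    return (mean_xy - mean_x ** 2) / (mean_x2 - mean_x ** 2)


for t in (-3.0, -1.0, 0.0, 1.0, 2.0):
    print(f"threshold {t:+.1f}: r = {selected_correlation(t):+.3f}")
```

Each moment comes from subtracting the contribution of the rejected region (X ≤ t and Y ≤ t) from the unconditional moment. Under the normality assumption, the printed correlations start near zero for a very inclusive threshold and become steadily more negative as the threshold rises, matching what the slider shows.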
The effect is named after biostatistician Joseph Berkson, who noticed in 1946 that hospital-based studies of disease co-occurrence were systematically biased: patients are admitted for having at least one condition, which makes unrelated diseases appear negatively associated.