The history of intelligence testing spans more than a century, crossing from the clinics of early 20th-century Paris to the digital platforms of the present day. It is a history marked by genuine scientific achievement, profound ethical failures, and continuous methodological refinement. Understanding where IQ testing came from — and the purposes it was originally designed to serve — is essential context for using and interpreting it responsibly today.
Alfred Binet and the Original Purpose
The story begins not with a theory of intelligence, but with a practical problem. In 1904, the French government commissioned psychologist Alfred Binet and physician Théodore Simon to develop a method for identifying schoolchildren who needed supplementary educational support. The government's concern was humanitarian: without an objective tool, teachers might mistake inattentiveness or poor schooling for intellectual incapacity, or vice versa.
Binet and Simon's 1905 scale — the first practical intelligence test — consisted of 30 tasks of increasing difficulty, ranging from following a moving light to defining abstract concepts. Crucially, Binet had no theory of what intelligence was; he was agnostic about whether it was a single faculty or many. His scale was empirical: any task that discriminated between children of different ages was useful, regardless of what it "meant." Binet also strongly warned against using his scale to rank normal children or to classify individuals as permanently limited. "Some recent philosophers," he wrote, "appear to have given their moral support to the deplorable verdict that the intelligence of an individual is a fixed quantity." He explicitly rejected this view.
The American Expansion and the Army Tests
The Binet-Simon scale was translated and adapted for American use by Henry Goddard in 1908 and later revised by Lewis Terman at Stanford — producing the Stanford-Binet test in 1916, which popularised the now-famous "intelligence quotient" (William Stern's ratio of mental age to chronological age, multiplied by 100). Terman and his contemporaries were considerably less cautious than Binet about what the tests measured and what should be done with the results. The eugenics movement, then scientifically respectable, embraced IQ testing as evidence for the heritability and immutability of intelligence.
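In modern notation, the ratio IQ used by the early Stanford-Binet can be written as follows (the worked figures are illustrative):

```latex
\[
  \mathrm{IQ} \;=\; \frac{\mathrm{MA}}{\mathrm{CA}} \times 100
\]
% MA = mental age, CA = chronological age.
% Example: a 10-year-old who performs at the level of a typical
% 12-year-old has MA/CA = 12/10, giving a ratio IQ of 120.
```

The ratio definition only makes sense while mental age grows with chronological age, which is why it broke down for adults; the deviation IQ described in the next section was designed to solve exactly this problem.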
World War I provided the first large-scale application of group intelligence testing. The U.S. Army, facing the challenge of classifying millions of recruits, commissioned psychologists Robert Yerkes, Henry Goddard, and Lewis Terman to develop group-administered tests. The resulting Army Alpha (for literate recruits) and Army Beta (non-verbal, for illiterate or non-English-speaking recruits) were administered to roughly 1.75 million men. The data were later used — controversially, and on methodologically dubious grounds — to support claims about racial and national group differences in intelligence, contributing to the immigration restriction legislation of the 1920s.
Wechsler and the Modern Era
David Wechsler's contribution was pivotal in shifting intelligence testing toward its modern form. Working as a clinical psychologist at Bellevue Hospital in New York, Wechsler was dissatisfied with the Stanford-Binet's single composite score and its heavy reliance on verbal tasks. In 1939 he published the Wechsler-Bellevue Intelligence Scale, which separated intelligence into verbal and performance (non-verbal) components and used deviation IQ — comparing performance to same-age peers rather than to mental age ratios — as its scoring metric.
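Concretely, a deviation IQ rescales a standard score computed within the test-taker's own age group; Wechsler scales fix the mean at 100 and the standard deviation at 15:

```latex
\[
  \mathrm{IQ} \;=\; 100 + 15z,
  \qquad
  z \;=\; \frac{x - \mu_{\mathrm{age}}}{\sigma_{\mathrm{age}}}
\]
% x is the test-taker's score; mu and sigma are the mean and standard
% deviation for that age group. Example: a score one standard deviation
% above the age-group mean gives z = 1, i.e. a deviation IQ of 115.
```

Because the reference distribution is age-specific, the metric works identically for children and adults, with no reliance on mental age continuing to rise.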
Wechsler's approach became the dominant paradigm in clinical intelligence assessment. The successive Wechsler scales — WISC for children, WAIS for adults, WPPSI for preschoolers — went through multiple revisions, with each edition reflecting advances in psychometric theory, factor analysis, and understanding of cognitive architecture. The WAIS-IV (2008) organises cognitive ability into four index scores: Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed — a multidimensional structure consistent with contemporary cognitive neuroscience.
From Paper Tests to Computerised Adaptive Assessment
The most recent era of intelligence testing is characterised by computerised adaptive testing (CAT), which uses algorithms to tailor item difficulty in real time to the test-taker's demonstrated ability level. Rather than everyone answering the same questions, CAT systems select harder items following correct answers and easier items following errors — converging efficiently on a precise ability estimate with fewer total questions. This approach shortens testing time, improves precision at extreme score ranges, and mitigates the floor and ceiling effects that plague fixed-form tests.
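As a concrete illustration of that loop, here is a minimal, hypothetical sketch of an adaptive test under a one-parameter (Rasch) item response model. Items are picked to maximise information at the current ability estimate, and the estimate is nudged after each response; production CAT engines use calibrated item banks and full maximum-likelihood or Bayesian ability estimation, so the fixed step size and the function names below are illustrative assumptions only.

```python
import math
import random

def p_correct(theta, b):
    """Rasch (1PL) model: probability of a correct answer given
    ability theta and item difficulty b (both in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of an item at ability theta; maximal when
    the item's difficulty matches the current ability estimate."""
    p = p_correct(theta, b)
    return p * (1.0 - p)

def run_cat(item_bank, true_theta, n_items=15, step=0.5):
    """Administer a short adaptive test against a simulated test-taker
    and return the final ability estimate."""
    theta_hat = 0.0                     # start at the population mean
    remaining = list(item_bank)
    for _ in range(n_items):
        # Select the unused item most informative at the current estimate.
        item = max(remaining, key=lambda b: item_information(theta_hat, b))
        remaining.remove(item)
        # Simulate the response from the true ability.
        correct = random.random() < p_correct(true_theta, item)
        # Gradient step on the Bernoulli log-likelihood: the estimate
        # moves up after a correct answer, down after an error.
        theta_hat += step * ((1.0 if correct else 0.0)
                             - p_correct(theta_hat, item))
    return theta_hat

if __name__ == "__main__":
    random.seed(42)
    bank = [random.uniform(-3, 3) for _ in range(200)]
    print(f"estimated ability: {run_cat(bank, true_theta=1.2):+.2f}")
```

The update term is the score of the Bernoulli log-likelihood, so the estimate converges toward the difficulty region where the test-taker answers about half the items correctly, which is exactly where each item carries the most information.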
Online IQ assessments — including those modelled on validated clinical instruments — have made psychometric testing accessible to the general public on an unprecedented scale. This democratisation comes with both benefits (wider access, immediate feedback, large normative databases) and risks (potential for misinterpretation without clinical context, variable quality of instruments). The challenge for the field is maintaining the scientific rigour that distinguishes validated psychometric tools from entertainment quizzes.
Key Takeaway
IQ testing began as a pragmatic tool for identifying children who needed educational support — a humanitarian purpose that Binet would have recognised and approved. It was subsequently co-opted for purposes its inventor rejected, contributing to some of the most harmful episodes in early 20th-century social policy. The century since has seen genuine scientific maturation: greater methodological rigour, more nuanced multidimensional models, and a clearer-eyed understanding of what tests measure and what they don't. The history of IQ testing is inseparable from the history of how society has chosen to respond to human cognitive differences — and remains a live ethical question as assessment technology continues to evolve.