IQ in the Age of AI: Are We Measuring the Right Things?

When OpenAI evaluated GPT-4 on a simulated bar exam, it reported a score around the 90th percentile. On the SAT reading and mathematics sections, the model's reported scores were higher than those of most human test-takers. DeepMind's AlphaFold achieved a breakthrough in protein structure prediction, a problem that had challenged biochemists for roughly 50 years. AI systems now match or outperform human experts on a growing number of benchmarks in pattern recognition, medical diagnosis, legal analysis, and complex reasoning. In this context, it is worth asking a serious question: what does human intelligence actually mean in 2025, and are our measurement tools still asking the right questions?

What AI Does Well — and Why It Challenges Our Metrics

Modern large language models and specialist AI systems perform extraordinarily well on many of the cognitive tasks that conventional IQ tests were designed to measure: pattern recognition, verbal analogies, logical inference, numerical sequences, and working memory tasks that can be expressed in symbolic form. A well-prompted LLM can score very highly on the verbal and symbolic items of a standard IQ test. This does not mean the AI is "intelligent" in the full human sense, since it may be doing something fundamentally different from what human cognition does when solving the same problems. It does, however, raise the question of whether these specific tasks still demarcate human cognitive capacity in a meaningful way.

The Abstraction and Reasoning Corpus (ARC), developed by François Chollet, was designed specifically to resist AI pattern matching by requiring genuinely novel reasoning from very few examples. This is the kind of generalisation that humans perform effortlessly but that has proven challenging for even the most capable AI systems. The ARC benchmark captures something closer to what Chollet calls "fluid intelligence", the ability to reason from first principles about genuinely novel problems, and for years humans dramatically outperformed AI on it. Recent models have narrowed the gap but not closed it, suggesting that human generalisation from minimal data remains a frontier that current architectures have not fully captured.

What Remains Distinctly Human

The cognitive attributes that AI systems handle least naturally point toward what may be most distinctively valuable in human intelligence. These include: goal-directed agency under genuine uncertainty about the future; embodied, situated understanding that arises from living in a physical and social world; emotional and motivational states that shape cognition in adaptive ways; common-sense causal reasoning grounded in physical and social experience; and the capacity for what philosophers call "thick" understanding — knowing not just the answer to a question but why it matters and to whom.

AI systems also lack what might be called "wisdom" — the integration of domain knowledge with contextual judgment, ethical sensitivity, and a grounded sense of what a situation actually calls for. A large language model can produce a paragraph that sounds wise without having any stake in the outcome or any understanding of the human context that makes the advice meaningful or misguided.

Implications for How We Measure Intelligence

The AI challenge creates both a crisis and an opportunity for psychometrics. The crisis: if AI can score 130 on a conventional IQ test, the test may no longer capture what is distinctively valuable about human cognition. The opportunity: it forces us to ask more precisely what we mean by "intelligence," and to develop measures that capture dimensions of human thought that remain genuinely beyond current AI.
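
To put the figure of 130 in perspective: most modern IQ tests are normed to a mean of 100 and a standard deviation of 15, so 130 sits two standard deviations above the mean. The short Python sketch below converts a score to a percentile under that norming, and under the further assumption that scores are normally distributed. The function name iq_to_percentile is purely illustrative and does not come from any psychometric package.

```python
from statistics import NormalDist

# Assumed norming: mean 100, standard deviation 15 (used by most modern IQ tests).
IQ_DISTRIBUTION = NormalDist(mu=100, sigma=15)


def iq_to_percentile(score: float) -> float:
    """Return the percentage of the norming population scoring at or below `score`,
    assuming scores follow a normal distribution."""
    return IQ_DISTRIBUTION.cdf(score) * 100


if __name__ == "__main__":
    for score in (100, 115, 130):
        print(f"IQ {score}: percentile {iq_to_percentile(score):.1f}")
```

Under these assumptions a score of 130 falls at roughly the 98th percentile, which is why an AI reaching it on a conventional test would be a meaningful challenge to what the test is taken to show.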

Researchers are increasingly interested in measuring metacognitive accuracy (knowing what you know and don't know), cognitive flexibility under adversarial conditions, creative problem-framing (identifying the right question, not just the answer), and interpersonal and collaborative cognition. These are harder to quantify but may prove to be the dimensions of human intelligence most worth cultivating and assessing as AI takes over an expanding domain of well-defined cognitive tasks.
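
As one illustration of how metacognitive accuracy might be quantified, the sketch below scores a set of answers for which the test-taker also stated a confidence between 0 and 1. It uses the Brier score and a simple overconfidence gap. These are common calibration measures, but choosing them here is an assumption, not a description of any particular research programme. The function names and the sample data are hypothetical.

```python
from statistics import mean


def brier_score(confidences, outcomes):
    """Mean squared gap between stated confidence (0 to 1) and the result
    (1 = correct, 0 = incorrect). Lower is better; 0.0 is perfect."""
    if not confidences or len(confidences) != len(outcomes):
        raise ValueError("need two non-empty sequences of equal length")
    return mean([(c - o) ** 2 for c, o in zip(confidences, outcomes)])


def overconfidence(confidences, outcomes):
    """Average stated confidence minus actual accuracy.
    Positive values indicate overconfidence, negative values underconfidence."""
    if not confidences or len(confidences) != len(outcomes):
        raise ValueError("need two non-empty sequences of equal length")
    return mean(confidences) - mean(outcomes)


if __name__ == "__main__":
    # Hypothetical test-taker: four answers with self-reported confidence.
    stated = [0.9, 0.6, 0.8, 0.3]
    correct = [1, 1, 0, 0]
    print(f"Brier score:    {brier_score(stated, correct):.3f}")
    print(f"Overconfidence: {overconfidence(stated, correct):.3f}")
```

In this made-up example the test-taker is right half the time but reports 65% confidence on average, so the overconfidence gap is about 0.15. A measure of this kind rewards knowing what you do not know, which a raw accuracy score cannot capture.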

Key Takeaway

The rise of AI does not make human intelligence irrelevant — it reshapes what kinds of human intelligence matter most. The cognitive attributes that AI captures most easily (pattern recognition, recall, logical inference on well-defined problems) may become less economically scarce, while those it handles least well — genuine novelty, embodied judgment, ethical wisdom, collaborative creativity — may become more valuable than ever. IQ tests will likely need to evolve, emphasising not just what you can do, but how you reason under genuine uncertainty, adapt to unexpected contexts, and integrate knowledge with purpose. The age of AI may ultimately push psychometrics toward a richer, more complete account of human cognitive potential.