History of the Big Five

The Big Five did not arrive in the personality literature as a finished discovery. It was assembled, dropped, rediscovered, and consolidated across roughly a century of empirical work, and the path matters: where the framework comes from is part of why it carries the weight it does. This page traces the research program from its first articulation in 1884 to the current consolidation around five-factor and six-factor lexical models.

The through-line is a single substantive claim, the lexical hypothesis, articulated and reformulated repeatedly across the period. The hypothesis is that the most socially and biologically consequential differences between people accumulate single-word labels in the languages those people speak. Its corollary is that careful analysis of trait-descriptive vocabulary should recover the structure of the personality differences a culture has been attending to for as long as it has been speaking. Every step in the history below is a partial test of that claim, and the durability of the five-factor structure across languages is its strongest evidence.

Galton's first formulation

Francis Galton was the first scientist to articulate something like the lexical hypothesis. In an 1884 essay titled "Measurement of Character," published in the Fortnightly Review, Galton proposed that the conspicuous aspects of human character could be inferred from how often and how variously they were named in the lexicon. He counted character-relevant words in Roget's Thesaurus and estimated roughly a thousand terms, each with its own shade of meaning but many overlapping substantially with others. He went no further with the project; his interests pulled toward heredity, anthropometry, and the development of correlation as a statistical method.

Galton's lead drew only scattered follow-up over the next half-century. A 1910 paper by George Partridge listed about 750 mental-state adjectives. In 1933, Franziska Baumgarten in Switzerland produced a German psycholexical classification of just over a thousand terms. The lexical approach existed; it had simply not yet been carried out with full coverage and methodological seriousness.

Allport and Odbert: the foundational catalog

That work fell to Gordon Allport and Henry Odbert at Harvard in 1936. Their Trait-Names: A Psycho-Lexical Study, published as Psychological Monographs No. 211, examined Webster's New International Dictionary and produced an inventory of roughly seventeen thousand nine hundred terms relevant to personality and behavior, sorted into four categories: stable traits proper; temporary states and moods; evaluative and character-laden judgments; and miscellaneous physical, capacity, and metaphorical descriptors. The first category, about four thousand five hundred entries, became the working corpus from which subsequent lexical analyses would draw.

Allport himself was wary of where the catalog might lead. He was a strong proponent of idiographic personality study, which seeks to understand the individual case in its uniqueness, and was skeptical of nomothetic factor-analytic approaches that compressed personality into a small number of group-level dimensions. The Trait-Names study made the Big Five possible without endorsing it; had Allport, who died in 1967, lived to read Goldberg or McCrae, he would likely have considered the resulting framework reductive in the way he had warned against.

Cattell and the path to sixteen

Raymond Cattell took up the Allport-Odbert inventory in the 1940s, using it as the raw material for the first major factor-analytic attack on the lexical structure of personality. He clustered the trait terms by semantic similarity, reduced them to a working set of approximately thirty-five variables, and then factor-analyzed correlation matrices generated from ratings on those variables. The result, after several iterations across the late 1940s, was the Sixteen Personality Factor Questionnaire, first published in 1949.

Cattell's commitment to sixteen primary factors rather than five higher-order ones reflected several methodological choices that would later be questioned. He used oblique rather than orthogonal factor rotations, allowing the factors to correlate. He relied on subjective clustering at the variable-construction stage, which preserved distinctions that more aggressive reduction would have collapsed. And he resisted the higher-order factor solutions that emerged when his own data were re-analyzed by others. Cattell remained a critic of the Big Five throughout his career. The irony is that the Big Five emerged most directly from re-analyses of Cattell's own data; the predecessor framework supplied the empirical material that displaced it.

The buried discovery: 1949 to 1968

Three independent analyses published between 1949 and 1963, each built on variables drawn from Cattell's work, found that five broad factors accounted for trait-rating data more parsimoniously than sixteen.

Donald Fiske, working at the University of Chicago in 1949, factor-analyzed self-, peer-, and clinician ratings on a subset of Cattell's variables and recovered five factors closely matching what would later be named the Big Five. Ernest Tupes and Raymond Christal, at Lackland Air Force Base in the late 1950s, ran the same kind of analysis on peer ratings of Air Force officers across multiple samples; they identified five factors, which they labeled Surgency, Agreeableness, Dependability, Emotional Stability, and Culture, and reported the result in a 1961 USAF technical report that almost no one in academic personality psychology read. Warren Norman, at the University of Michigan, replicated the structure in 1963 using peer ratings of college students, published it in the Journal of Abnormal and Social Psychology, and supplied labels that, with minor adjustments, would later become canonical.

The structure was robust, replicable, and present in the published literature for the better part of a decade. The field nevertheless ignored it, for several reasons. The trait-psychology mainstream was dominated by Cattell's 16PF and by Hans Eysenck's program built on Extraversion and Neuroticism, which added Psychoticism as a third dimension only in the late 1960s and 1970s. Tupes and Christal's report sat in a military technical-document series with limited reach. Clinical and applied personality work was oriented toward projective tests and depth-psychological assessment, not factor-analytic taxonomy. And in 1968, the entire trait enterprise was challenged on a far more fundamental basis.

The situationist wilderness: 1968 to the late 1970s

Walter Mischel published Personality and Assessment in 1968 and produced what is often described as a paradigm crisis in personality psychology. Mischel's central argument was empirical: when behavior is measured across multiple situations, the cross-situational consistency assumed by trait theory is rarely observed, and personality test scores predict behavior at correlations, seldom above about .30, too low to support strong claims about stable dispositions. Mischel concluded, in the strongest reading of the book and the one that propagated most widely, that broad personality dispositions could not be the right level of analysis at all.

Mischel later objected that his argument had been more interactionist than anti-trait, and that he had not intended to dismantle the field. The book had landed regardless. Trait psychology stalled for over a decade. Funding shifted to social-cognitive accounts of behavior; graduate students were directed away from trait dissertations; the existing five-factor literature, never well known, slipped further into the background. Brent Roberts later characterized the field's response in three modes: ignoring the critique, attempting to refute it head-on, and proceeding as though it had never happened. The five-factor findings of 1949 to 1963 were not carried forward through this period; they would have to be rediscovered rather than continued.

The revival: Goldberg, Digman, and Honolulu

The revival came from two directions. Lewis Goldberg, at the Oregon Research Institute, began a lexical research program in the 1970s that built on Norman's taxonomy, sampling trait adjectives from English-language sources and running factor analyses on self- and peer-rating data. He arrived at the same five factors that Tupes, Christal, and Norman had recovered. In parallel, John Digman at the University of Hawaii, preparing a graduate seminar in factor analysis in 1978, re-examined Cattell's correlation matrices and found that five-factor solutions cohered across studies in a way that six- or seven-factor solutions did not.

In 1980, a symposium in Honolulu chaired by Goldberg brought together Digman, Naomi Takemoto-Chock, and Andrew Comrey to review the personality assessment landscape. The participants concluded that the most promising existing instruments converged on five common factors. Goldberg published the symposium argument in a 1981 chapter that introduced the term "Big Five" to the literature. By 1990 his consolidated lexical analysis of English personality adjectives, in the Journal of Personality and Social Psychology, had established the structure as a working reference standard. Through the late 1980s and 1990s, parallel lexical replications by Boele de Raad, Gerard Saucier, Fritz Ostendorf, and others extended the result into Dutch, German, Italian, Hungarian, Czech, and a dozen other languages, with five-factor solutions emerging in most of them.

Costa and McCrae's parallel track

The other line of descent ran through questionnaire psychometrics rather than lexical analysis. Paul Costa and Robert McCrae, at the National Institute on Aging, part of the National Institutes of Health, developed the NEO Personality Inventory from a different starting point, beginning in 1978. They had been working in the tradition of Hans Eysenck, who proposed that personality was organized by three superfactors: Extraversion, Neuroticism, and Psychoticism. Costa and McCrae accepted Eysenck's E and N but not Psychoticism as the third dimension; their cluster analyses of Cattell's 16PF scales instead yielded a broad dimension they called Openness to Experience. The first NEO model, published in a 1978 chapter, was thus a three-factor framework of N, E, and O, hence the acronym.

Into the early 1980s, Costa and McCrae continued to work within the three-factor structure. The accumulating evidence, including Goldberg's lexical work and the consistency with which two further factors emerged from any sufficiently broad item pool, convinced them by the middle of the decade. Their 1985 and 1987 papers treated Agreeableness and Conscientiousness as full domains, and the original 1985 NEO-PI added brief global scales for A and C alongside the N, E, and O domains, each of which had six facets. The 1992 NEO-PI-R was the consolidated five-factor instrument: two hundred and forty items, five domains, six facets per domain, eight items per facet. It became, and remains, one of the most widely cited measurement instruments in personality research.

The transition was contested. Eysenck (1991, 1992) maintained that A and C were lower-order facets of his Psychoticism dimension rather than independent superfactors, and exchanges between Eysenck and Costa & McCrae in Personality and Individual Differences in 1992 set out the case for and against. The five-factor side won the empirical argument by accumulating replications; the three-factor program persisted in clinical and biologically oriented research but ceded the taxonomy to the Big Five.

The IPIP and the public-domain turn

In 1996, at the eighth European Conference on Personality, Goldberg announced what would become the International Personality Item Pool: a public-domain repository of personality items, freely available for research and modification, intended to provide an alternative to the proprietary commercial inventories that had dominated the field. The project was housed at the Oregon Research Institute and grew steadily through the late 1990s and 2000s. By the mid-2010s it contained over three thousand items and more than two hundred and fifty scales, with public-domain analogs of most major commercial instruments.

The fifty-item IPIP Big-Five Factor Markers, derived from Goldberg's 1992 lexical-marker scale, became one of the most widely used brief Big Five measures. The longer IPIP-NEO inventories, first in three-hundred-item and later in one-hundred-twenty-item forms, provided public-domain substitutes for the NEO-PI-R that researchers without budget for licensed instruments could use without permission. The Big Five test on this site uses the fifty-item IPIP markers.

HEXACO and continued lexical work

The lexical research program did not stop with the consolidation of the Big Five in the 1990s. Around the turn of the 2000s, Michael Ashton, Kibeom Lee, and collaborators compared a series of cross-language lexical studies in which a six-factor solution proved more robust than five: Italian, Korean, Polish, Hungarian, Dutch, German, French, and English samples all yielded a recurring sixth factor that did not fit cleanly within any Big Five domain. They named the factor Honesty-Humility and proposed the HEXACO model in a 2004 paper, elaborated theoretically in a 2007 Personality and Social Psychology Review article and at book length in The H Factor of Personality (2013).

HEXACO is not simply the Big Five with a sixth factor added. The reorganization shifts content. Traits that load on Agreeableness in the Big Five, particularly the Straightforwardness and Modesty facets of the NEO-PI-R, load on Honesty-Humility in HEXACO instead, and HEXACO's Emotionality differs from Big Five Neuroticism by including some content (sentimentality, dependence on others) that the Big Five places elsewhere. The frameworks are correlated but not interchangeable. As of the mid-2020s, HEXACO is the leading alternative to the Big Five in research where moral or ethical traits are central, and the empirical question of whether five or six factors better capture the lexical structure of personality remains genuinely open.

The current landscape

The instruments most actively used in contemporary research span a range of lengths and design choices. The NEO-PI-R (1992) and its modest update NEO-PI-3 (2005) remain the standard for full-bandwidth assessment. The BFI-2 (Soto and John, 2017, in the Journal of Personality and Social Psychology) provides a sixty-item revision of the original Big Five Inventory with fifteen replicable facets, three per domain. The IPIP-50, IPIP-NEO-120, and IPIP-NEO-300 anchor the public-domain end of the spectrum. The HEXACO-PI-R is the standard six-factor instrument. Brief ten-item screeners, the BFI-10 and the TIPI, exist for survey contexts where only a handful of items can be afforded.

Newer methods have begun to apply natural-language processing to the lexical hypothesis directly, using language models to recover trait correlations from large text corpora rather than from human ratings. Early results suggest that the broad five-factor structure does emerge from text-only analysis, with some interesting deviations at the boundaries of Openness and Neuroticism. Whether this constitutes another consolidation of the framework or the beginning of a successor program is still being worked out.

The Big Five is, in any case, no longer in dispute as a useful description. The question is whether it is the best description, and on that point the field is closer to consensus on the lower-bound claim — five factors recover most of the structure most of the time — than on the upper-bound claim — five factors are sufficient. The evidence from a hundred and forty years of lexical work is that something close to this structure is real. Whether it is exactly five is still being measured.

For the framework's substantive content rather than its history, see the main Big Five page.