Comment (Donald F. Klein)

By Donald F. Klein 

Per Bech's remarkable book has been outlined by its author, commented on by Martin Katz and replied to by Bech, who emphasizes the value of continuing critical dialog. These remarks continue this thread.

"Clinical Psychometrics" floods this reviewer with many contextual memories. When, in the 1950s, the paradigm-destroying antipsychotic effects of chlorpromazine were first noted, they incited a storm of disbelief. There were many independent replications of antipsychotic benefit; however, to verify scientifically that these observations were not clinical fabrications, the then quite recent technology of the randomized, double-blind clinical trial was employed.

However, massive criticism, mostly from objectivity-averse psychoanalysts, demanded objective diagnostic and clinical change measures, probably in the dim hope that objectivity was impossible. At that time, discerning objective manifestations of psychiatric disease was impossible. In current psychiatry, objective measures are still ambivalently regarded, as shown by their absence from DSM-5, despite NIMH’s fevered search for biomarkers.

However, if independent raters agreed with each other, then there had to be something observable out there that allowed more than chance agreement. Rater agreement (reliability) then served as a surrogate for objective description. However, ill-defined accusations of lack of validity, without specification of the multiplicity of validity criteria, served to derogate systematic observation.

Bech, using pithy summaries, explains the foundational observational and analytical work of Wundt, Kraepelin, Spearman, Galton, Pearson, Fisher, Eysenck, Hamilton, and Pichot among others.

Strikingly, Bech argues that the ubiquitous factor analysis does not provide appropriate measures of change or a foundation for diagnosis. This critically challenges much current work, as well as the NIMH-sponsored Research Domain Criteria (RDoC) manifesto for dimensional primacy via multivariate analysis.

The more “modern” (since the 1970s!!) psychometric developments sparked by Rasch, Guttman and others are generally labeled Item Response Theory (IRT). Bech holds that these produce the only appropriate severity measures.

Guttman defined a hierarchy aimed at producing a unidimensional severity scale, based on the proportion of subjects endorsing each item. Since items endorsed by most subjects are easy (less pathological), whereas rarely endorsed items are difficult (very pathological), if an item of specified severity is endorsed, then all easier items should also be endorsed. Each potentially useful item is mathematically evaluated to see if it consistently takes its place in such a hierarchy. Items that are endorsed by the few, but not by the many, just don’t fit, although they may be useful for other purposes. Change is determined by differences in Guttman-defined severity. This exposition seems quite clear, even if the mathematics is well beyond me.
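The hierarchy described above can be made concrete with a small sketch. It is only an illustration, not Bech's or Guttman's own procedure: the response data are invented, the function name is hypothetical, and a "Guttman error" is counted here as any pair of items where the harder item is endorsed while the easier one is not.

```python
def guttman_summary(responses):
    """Order items by endorsement rate and count Guttman errors.

    responses: one row per subject, one 0/1 entry per item
    (1 = item endorsed). Returns (order, proportions, errors).
    """
    n_subjects = len(responses)
    n_items = len(responses[0])
    # Proportion of subjects endorsing each item.
    proportions = [sum(row[j] for row in responses) / n_subjects
                   for j in range(n_items)]
    # Easiest (most endorsed) item first, hardest last.
    order = sorted(range(n_items), key=lambda j: -proportions[j])
    errors = 0
    for row in responses:
        ordered = [row[j] for j in order]
        # An error: an easier item is not endorsed but a harder one is.
        for easy in range(n_items):
            for hard in range(easy + 1, n_items):
                if ordered[easy] == 0 and ordered[hard] == 1:
                    errors += 1
    return order, proportions, errors


# Invented data: three items, seven subjects. The last subject endorses
# only the rarest item, which violates the hierarchy.
data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 0],
    [0, 0, 0],
    [0, 0, 1],
]
order, proportions, errors = guttman_summary(data)
print(order, errors)
```

In this toy data set the only violations come from the last subject, so a low error count relative to the number of subject-by-item-pair comparisons would suggest the items fall into a consistent severity order.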

This fundamental Rasch analysis is unique in that its item pool is initially selected by expert psychiatrists, as reflective of a particular syndrome. Rasch analysis produces a severity scale, not a diagnostic scale. Bech holds that such a scale sufficiently describes an individual’s degree of severity by its total. This is not the case for familiar, but multi-dimensional, indices such as the 17-item Hamilton scale.

Factor analyses depend upon a rule-of-thumb selection of the number of factors, which are then rotated (by various methods) to differing definitions of simple structure. Bech holds that these procedures do not flow from a logical basis that allows firm deductions or sampling inferences. This defect is affirmed by the lack of factor replication across various samples.

Bech also argues that the use of factor analysis differs between American and British traditions. The mathematics of factor and principal component analyses yields a principal factor, marked by consistently positive loadings, and a second orthogonal factor with both positive and negative loadings. The British tradition uses only the contrast evident in the second factor. ["In contrast, an American approach rapidly emerged in which factor analysis was used to identify as many factors as possible."] Bech argues that these factors, even if “rotated to simplicity”, cannot be represented by a simple total, since they contain items that are heterogeneous with regard to both severity and group discrimination. This impairs their use as both change and diagnostic measures.
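The pattern of a general first factor and a bipolar second factor can be seen in a small numerical sketch. The correlation matrix below is invented for illustration (four items, all positively correlated, forming two tighter pairs), and the components are extracted by plain power iteration rather than by any method described in the book.

```python
def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def normalize(vector):
    """Scale a vector to unit length."""
    length = sum(x * x for x in vector) ** 0.5
    return [x / length for x in vector]


def leading_component(matrix, start, iterations=500):
    """Power iteration: returns (eigenvalue, unit eigenvector)."""
    v = normalize(start)
    for _ in range(iterations):
        v = normalize(mat_vec(matrix, v))
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return eigenvalue, v


# Hypothetical correlation matrix: items 1-2 and items 3-4 form two
# clusters, but every correlation is positive.
corr = [
    [1.00, 0.65, 0.33, 0.33],
    [0.65, 1.00, 0.33, 0.33],
    [0.33, 0.33, 1.00, 0.65],
    [0.33, 0.33, 0.65, 1.00],
]

# First principal component.
value1, first = leading_component(corr, [1.0, 2.0, 3.0, 4.0])

# Remove the first component, then extract the second from the remainder.
deflated = [[corr[i][j] - value1 * first[i] * first[j] for j in range(4)]
            for i in range(4)]
value2, second = leading_component(deflated, [1.0, 0.0, 0.0, 0.0])

print([round(x, 2) for x in first])   # all loadings share one sign
print([round(x, 2) for x in second])  # loadings of mixed sign
```

When every correlation is positive, the first component weights all items in the same direction, and any component orthogonal to it must therefore contrast some items against others. Here the second component sets the first pair of items against the second pair.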

In a clinical trial, some of the items loading on a supposedly simple factor may significantly contrast drug with placebo, whereas other items from the same scale do not come close. Therefore, a factorial scale score that sums its items attenuates the distinction between drug and placebo. This had been noted in a widely unnoticed 1963 paper (Klein DF & Fink M. Multiple item factors as change measures in psychopharmacology. Psychopharmacologia 1963; 4: 43-52).
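The attenuation argument can be illustrated with simple arithmetic. The sketch below is not taken from the 1963 paper. It assumes, purely for illustration, that every item has unit variance within groups and that every pair of items correlates to the same degree; the item effects are hypothetical.

```python
import math


def standardized_difference(item_deltas, r):
    """Drug-placebo difference of a summed score, in SD units.

    Assumptions (illustrative only): each item has unit within-group
    variance, and each pair of items has correlation r.
    item_deltas: drug-minus-placebo mean difference for each item.
    """
    k = len(item_deltas)
    # Variance of a sum of k unit-variance items with common correlation r.
    variance_of_sum = k + k * (k - 1) * r
    return sum(item_deltas) / math.sqrt(variance_of_sum)


# Hypothetical six-item factor: two items separate drug from placebo by
# half a standard deviation; the other four do not separate them at all.
all_items = [0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
responsive_items = [0.5, 0.5]

full_scale = standardized_difference(all_items, r=0.3)
subscale = standardized_difference(responsive_items, r=0.3)
print(f"six-item total: {full_scale:.2f}, two-item total: {subscale:.2f}")
```

Under these assumptions the six-item total shows a standardized difference of about 0.26, against about 0.62 for the two responsive items alone. The unresponsive items add variance without adding any drug-placebo difference.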

Katz has reasonably suggested a “multi-vantaged” approach to patient evaluation. In particular, evaluations are amplified by video recordings that can be “blindly” assessed by multiple experts, without knowledge of treatment or time of observation. In addition to the methodological gains, such recordings allow a more fine-grained evaluation of the patient’s physical appearance, verbal flow, affective manifestations, change over time, etc.

Where Katz seems to part company with Bech is in his reliance on scales produced by multiple factor analysis, as well as on multiple statistical analyses without correction for multiplicity. Katz argues (and I agree) that specific tests of antecedently supported and hypothesized effects do not require a “family wise” significance level correction. However, such specifically stated antecedent hypotheses are not apparent (to me) for many of the claimed findings.

At one time, long past, NIMH supported methodological advances in psychopharmacology that often benefited from designs using concurrent placebo control groups. Such clinical trials sufficed both for demonstrating that specific drug activity existed and for gaining FDA approval for marketing. However, this group average outcome difference does not determine which patients actually require medication for a positive response exceeding their counterfactual response while on placebo. This parallel group design obscures understanding of this critical issue.

Both Bech and Katz have addressed this problem. It was recently suggested that the intensive clinical trial design, promulgated by Chassan, may be necessary to solve this problem [Klein DF (2011): Causal Thinking for Objective Psychiatric Diagnostic Criteria. In: Shrout PE, Keyes K, Ornstein K (Eds.) Causality and Psychopathology, New York City: Oxford University Press, pp 321-337].

A discussion of this specific issue, in the dynamic framework for controversy provided by INHN, would be most worthwhile.

Donald F. Klein
October 31, 2013