Tuesday, 18.05.2021

Amy S. F. Lutz: The Rise and Fall of the Dexamethasone Suppression Test: Stability, Consensus, Closure


       In 1981, psychiatrists Bernard Carroll, Michael Feinberg, and their colleagues at the University of Michigan published an explosive paper in the Archives of General Psychiatry: “A Specific Laboratory Test for the Diagnosis of Melancholia.” This test, the Dexamethasone Suppression Test (DST), had been known to researchers since the late 1960s. But the 1981 report marked the first time the DST was explicitly proposed as an “objective laboratory diagnostic test” (Carroll, Feinberg, Greden et al. 1981) for a particular psychiatric disorder, melancholic depression. It was a big claim: “the holy grail of biological psychiatry” (Noll 2006), a biomarker for disease, had finally been identified.

       Excitement over the DST skyrocketed. In 1986, the American Journal of Psychiatry marked the 8,000th citation on the test (Friedenthal and Swartz 1986); Carroll and Feinberg’s article alone would go on to be cited more than 2,200 times. Such enthusiasm is unsurprising: the American Psychiatric Association (APA) had just released a drastic revision of its Diagnostic and Statistical Manual, DSM III, in 1980. This version – which Anne Harrington calls “a remarkable turning point in the history of psychiatry” – erased all references to psychoanalytical theories of causation or treatment. Columbia psychiatry professor Robert Spitzer and his co-authors claimed they were “concerned only with descriptive diagnostics and not with etiology,” a justification Harrington rejects: “Of course, they were being disingenuous. They believed that biological – not psychoanalytic – markers and causes would eventually be discovered for all the true mental disorders” (Harrington 2019). Just one year later, Carroll and Feinberg seemed to have fulfilled that prophecy.

       Yet, by the end of the 1980s, interest had faded. Edward Shorter and Max Fink – who wrote the only book-length history of the DST – attribute its fall to several factors, including unreasonably high expectations, shifting diagnostic criteria for depression and political infighting at the APA. All of these are important. But I’d like to tell a different story, drawing on retrospective interviews with prominent participants and on an examination of more than 80 letters to the editor published between 1981 and 1989 in two of the most respected psychiatric journals of the period, the American Journal of Psychiatry and the Archives of General Psychiatry (now published as JAMA Psychiatry). This narrative privileges the social dynamics that determined the fate of the DST, examining the community (or lack thereof) of biological psychiatrists who fought over the research and clinical applications of the test. Following other social histories of experimentation by Ludwig Fleck and Peter Galison, I argue that the DST was doomed by failures of stability, consensus and closure.


Instability of the DST

       Originally developed in 1960 as a test for Cushing’s disease, the DST measures the degree to which dexamethasone, a synthetic hormone, suppresses the production of cortisol, otherwise known as the “stress hormone.” By 1968, having noted that Cushing’s patients often experienced depressive and psychotic symptoms, Carroll began using the DST in severely depressed patients. This logic – that psychopathology is caused by dysregulation of the hypothalamic-pituitary-adrenal (HPA) axis – originated in the early 20th century, when researchers focused on a malfunctioning endocrine system as the source of “physical illness, mental pathology and moral deviance” (Pettit 2013). Carroll found that his depressed patients were, in fact, typically “nonsuppressors”: they maintained high levels of cortisol even after administration of dexamethasone. According to Carroll and Feinberg, by 1981 the DST was already “being used widely for the diagnosis of melancholia.” The stated purpose of their paper, in other words, was not to introduce the DST to the field, but to address various methodological inconsistencies: “Differing procedures have been adopted in various centers, and the test needs to be standardized for general use” (Carroll, Feinberg, Greden et al. 1981).

        Carroll and Feinberg attempted to stabilize the DST by strictly defining several of these variables, including the dexamethasone dose, the timing and number of blood samples, and the response criteria. As Feinberg told me in a recent interview, “If you did [the DST] exactly the way it was described for exactly the patient group it was designed for, it worked.” And many subsequent researchers did follow their protocol. But a striking theme in the letters to the editor – a fascinating archive that documents the informal experiments, the negative results and the intradisciplinary conversations of a field – is resistance to that boundary work.

       Carroll and Feinberg, for example, announced, “These results indicate clearly that the 1-mg dose of dexamethasone should be used” (Carroll, Feinberg, Greden et al. 1981). Feinberg explained to me that this conclusion was reached after testing 1 mg, 2 mg and the standard endocrine dose of 4 mg, and determining that the 1-mg dose represented the best compromise between sensitivity (or “true positive rate,” the percentage of people with a given disease who test positive for it) and specificity (or “true negative rate,” the percentage of people without the disease who test negative for it). Yet several letters reported experiments using different doses of dexamethasone, such as 1.5 or 2 mg. A team of German researchers, after reporting their contradictory findings, declared that “a dose of 1mg of dexamethasone is too low to provide reliable DST results” and “cannot be regarded as a valid measure of cortisol hypersecretion” (Berger, Pirke, Krieg et al. 1985).
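To make the two definitions concrete, the short Python sketch below computes both rates from a two-by-two tally of results, treating nonsuppression as a “positive” test. The counts are hypothetical, chosen only for illustration; they are not drawn from Carroll and Feinberg’s data or from any of the letters.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """True positive rate: share of people WITH the disorder whom the test flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """True negative rate: share of people WITHOUT the disorder whom the test clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts (illustration only); nonsuppression is counted as "positive".
melancholic_nonsuppressors, melancholic_suppressors = 40, 60  # 100 people with melancholia
control_suppressors, control_nonsuppressors = 90, 10          # 100 people without it

print(f"sensitivity = {sensitivity(melancholic_nonsuppressors, melancholic_suppressors):.2f}")  # 0.40
print(f"specificity = {specificity(control_suppressors, control_nonsuppressors):.2f}")          # 0.90
```

Roughly speaking, a dose that leaves more melancholic patients unsuppressed raises sensitivity, but if it also leaves more people without melancholia unsuppressed, it lowers specificity; a “best compromise” dose is one that balances the two.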

       Other letters announced confounding factors that interfered with DST results. Carroll and Feinberg were aware of some of these interactions; they excluded almost 10% of their outpatient subjects and 20% of their inpatients because of drugs they were taking or co-morbid medical conditions that could produce either false positives (pregnancy, alcoholism, Cushing’s syndrome, anorexia, epilepsy, and others) or false negatives (such as Addison’s disease). But the list of additional confounders reported in letters to the editor is truly staggering: stress (Grimm 1983), age (Chabrol, Claverie et al. 1983; Klee, Garfinkel et al. 1983; Branyon 1983; Robbins and Alessi 1984; Zis 1986), sleep disturbance (Mendlewicz 1984), ethnic background (Escobar 1985; Fuente and Amor 1986), repeated testing (Rihmer, Arato and Greden 1985), sodium levels (Bond and Price 1989) and season, with DST nonsuppression apparently less frequent during the winter (Brewerton and Gwirtsman 1987; Arato, Rihmer and Szádóczky 1986). Some seemingly random factors, like caffeine (Uhde, Bierer and Post 1985) or B12 deficiency (Fahs 1985), were each reported in only one letter. But the overwhelming impression was that virtually anything might interfere with DST results.

       An even bigger problem was the ongoing debate over other disorders that might trigger a positive DST. Carroll and Feinberg couldn’t have been clearer in the title of their paper: the DST was a test for one disorder only, and that was melancholia. Emphasized Feinberg, “It is good at, to my mind, one thing.” Yet letters were published throughout the decade reporting that the DST was sensitive to bulimia (Wolf 1982), brain tumors (Romanik 1983), borderline personality (Dilsaver and Greden 1983), and dementia (Castro, Lemaire et al. 1983; Moffatt 1984; Mahendra 1985; Davous 1987). Other researchers announced positive results for schizophrenia (Bacher and Lewis 1983; Castro, Lemaire, Toscano-Aguilar and Herchuelz 1983; Harris 1985; Tandon, Mazzara et al. 1989), agoraphobia (Schneider, Whiteford et al. 1983), panic disorder (Bueno, Sabanes, Gascon et al. 1984) and mania (Poirier, Loo, Gay and de Luca 1985; Schaffer 1982). Many of these letters then provoked responses confirming or refuting these findings. Jay Amsterdam, former director of the Depression Research Unit at the University of Pennsylvania and an early researcher on the DST, blames Carroll himself for this phenomenon; in an interview, Amsterdam told me that Carroll “was so enthusiastic, he made the DST out to be the perfect test.” Agreed Feinberg, “I think that Barney started to oversell it.”

       These questions all concerned the utility of the DST as a diagnostic tool. But biomarkers can have other applications besides diagnosing disease. Psychiatrists Anissa Abi-Dargham and Guillermo Horga point out that biomarkers can also “reflect a process associated with the therapeutic response and [be] used in clinical stratification” (Abi-Dargham and Horga 2016). In other words, they can serve an important predictive function. If it’s not always clear where Shorter and Fink stand on the diagnostic value of the DST – on the one hand, they call Carroll and Feinberg’s decision to proclaim the DST “A Specific Laboratory Test for the Diagnosis of Melancholia” “a blunder” (Shorter and Fink 2010), yet they also defend the DST’s “use to define melancholia” (Shorter and Fink 2010) – they are unequivocal in their support of its utility in evaluating suicide risk, assessing response to treatment and confirming remission. Normalization of DST results, Shorter and Fink summarize, “was predictive of a good outcome” (Shorter and Fink 2010).

       Yet that, too, was contested. In 1982, Amsterdam and his colleagues published a paper in The American Journal of Psychiatry announcing that they had found abnormal DST responses – in their controls. “Fifteen percent of my normal subjects were also non-suppressors,” he told me. “And my results were not statistically significant from other papers that found up to 25% response in controls.” This finding led to the discovery that Carroll would later identify as the death blow to the DST: the tremendous amount of individual variation in dexamethasone metabolism – “a 30-fold variability,” according to Amsterdam. This could be controlled by measuring dexamethasone levels, as well as cortisol levels, in the blood of each subject. But the additional test increased the overall cost to the point that insurance companies were unwilling to cover it.

       Which raises the question: who, exactly, was meant to administer the DST? Carroll and Feinberg highlight the “internists” and “family practitioners” who, they hoped, could use the DST to accurately distinguish their melancholic patients from those with milder, non-endogenous dysthymias – a project that had become increasingly urgent, since DSM III merged these two clinical entities into one diagnostic category: major depression. This new disorder ran the gamut from patients struggling to get over a divorce or unhappy in their jobs to those whose profound debility left them, as Amsterdam described them, “too sick even to commit suicide.” Ross Baldessarini, Professor of Psychiatry at Harvard and Director of the International Consortium for Bipolar and Psychotic Disorders Research at McLean Hospital, condemned this lumping move as “a nightmare… It was a fundamentally major blunder that left us with a heterogenic collection of illnesses. How they’re to be categorized and treated is still fuzzy and debated.” Ideally, the DST would have done that work of sorting patients, ensuring that those with melancholic/endogenous depression received the somatic therapies to which they were most likely to respond (tricyclic antidepressants, MAO inhibitors and ECT).

       However, as Baldessarini explained, “It’s a complicated test, with lots of risk of artifacts. It requires a certain degree of expertise to execute properly.” Several of the letters addressed these technical issues. A particular concern was the variation between different methods of measuring cortisol in the blood. Reported a team from the University of Chicago, “Research laboratories, e.g., those of Carroll, Sachar, and ourselves, use the competitive protein-binding assay (CPBA) method to measure plasma cortisol levels, whereas most clinical laboratories use commercial radioimmunoassay (RIA) kits. Compared with CPBA, RIA is more rapid and requires less effort. Many such kits are now available, but their performance varies due to the characteristics of the antiserum.” When the researchers compared different RIA kits to CPBA, they “obtained values that were 20% to 40% higher than the actual concentrations we had prepared” (Fang, Warenica and Meltzer 1982), which meant that some patients who actually exhibited normal suppression would be classified as nonsuppressors. Seven months later, a group of researchers from West Park Hospital in England wrote to report that they, too, had found significant variation between assays, and suggested that “an international collaborative study should be set up to determine the variability of plasma cortisol estimations” (Wood, Harwood and Coppen 1983). And the following year, a California team argued that, precisely because of this variability, the DST as originally designed was “impractical” for “the psychiatric clinician, who would need to become sophisticated in assay theory, especially when interpreting split-sample variability (e.g., whether the split samples were run in the same or different assays)” (Rubin and Poland 1984).
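Why a systematic overestimate matters can be sketched in a few lines of Python. The sketch assumes a single fixed cutoff for post-dexamethasone cortisol (the 5 µg/dL figure used here is an illustrative assumption, not a value stated in the letters) and a kit that reads uniformly 20% or 40% high, the range Fang, Warenica and Meltzer reported. The patient values are hypothetical, and `is_nonsuppressor` and `misclassified` are illustrative helpers, not part of any published protocol.

```python
CUTOFF_UG_DL = 5.0  # assumed cutoff (illustrative): readings above it count as nonsuppression


def is_nonsuppressor(cortisol_ug_dl: float, cutoff: float = CUTOFF_UG_DL) -> bool:
    """Classify one post-dexamethasone cortisol reading against the cutoff."""
    return cortisol_ug_dl > cutoff


def misclassified(true_values: list[float], overestimate: float) -> int:
    """Count true suppressors whose inflated reading crosses the cutoff.

    `overestimate` is a fractional bias, e.g. 0.2 for a kit reading 20% high.
    """
    return sum(
        1
        for v in true_values
        if not is_nonsuppressor(v) and is_nonsuppressor(v * (1 + overestimate))
    )


# Hypothetical true concentrations (µg/dL) for four patients who all suppress normally.
true_values = [3.0, 3.8, 4.0, 4.6]

for bias in (0.0, 0.2, 0.4):
    count = misclassified(true_values, bias)
    print(f"kit reads {bias:.0%} high: {count} of {len(true_values)} misread as nonsuppressors")
```

With these made-up numbers, the count of misread patients rises from 0 to 1 to 3 as the bias grows. Patients whose true values sit just under the cutoff are misread first, which is why a modest but systematic bias can matter more than its size suggests.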

       Although Carroll defended the DST, he never pursued these questions in his research. “I don’t think Barney was interested in that level of nitty-gritty,” Michael Feinberg told me. “He would have considered that derivative science and beneath him. That is the kind of work that somebody had to do, and nobody was willing to do it.” The result, according to Amsterdam, was that the level of expertise needed to run and interpret the DST accurately was beyond that of the 1980s’ “strip mall psychiatrists” who, he said, “were still doing psychotherapy and asking patients about their mothers.” And psychiatrists with that expertise, Amsterdam further suggested, wouldn’t need a diagnostic blood test for depression in the first place. He rejects Shorter and Fink’s claim that “severity [of depression] is not readily apparent” (Shorter and Fink 2010). For him, the symptoms characteristic of the melancholic depression that the DST supposedly identified should be obvious to any competent clinician. These include not just sadness, but physiological symptoms like weight loss, insomnia and loss of libido. Amsterdam described the patients he saw in his clinic – typically those who had already failed a litany of treatments, including electroconvulsive therapy – as “train wrecks” who were easily distinguishable from run-of-the-mill patients dissatisfied with their life circumstances. “Diagnosis of melancholic depression is made from observation and by eliciting information from the patient,” he said. “I don’t need a DST to confirm it, why bother? Would you not treat if the test came back negative?”

       Amsterdam’s comments raise the question of how (or whether) information circulates between the small, elite “esoteric circle” and the larger “exoteric circle” that, according to microbiologist and philosopher of science Ludwig Fleck, make up the structure of all thought collectives, which he defines as “a community of persons mutually exchanging ideas or maintaining intellectual interaction” (Fleck 1979). For Amsterdam, the DST debate was restricted almost exclusively to academic researchers in psychiatry. Where the test did make the transition to clinical use, he told me, was in private hospitals, where former researchers grew rich by unscrupulously and indiscriminately selling “the depression test” to wealthy patients.

       The historiography – or rather, the lack thereof – supports Amsterdam’s characterization of a controversy that was highly contained. Although Shorter and Fink report an overwhelmingly enthusiastic reaction to the DST – one unidentified source they interviewed summed up the response this way: “People said, ‘finally psychiatry is a part of medicine!’” (Shorter and Fink 2010) – prominent histories of psychiatry don’t mention it (Scull 2015; Shorter 1997), including Shorter’s own, published 13 years before Endocrine Psychiatry. The test is even absent from Harrington’s recent book on the ascendance of biological psychiatry in the 1980s, the exact period when the DST disputes were most intense. But even though she doesn’t address the DST directly, her description of a sharply stratified field is remarkably similar to Amsterdam’s, if somewhat more tactful. Harrington concludes a section on eugenics with this observation: “Of course, most rank-and-file hospital psychiatrists were not involved in debates like these” (Harrington 2019).

       Are these “rank-and-file” psychiatrists part of the same thought collective as the academic research psychiatrists? Fleck leaves these membership details largely uninterrogated. At times, the thought collective seems synonymous with what we might call a discipline: Fleck reports that the Wassermann reaction test “created and developed a discipline of its own: serology as a science in its own right” (Fleck 1979). But regardless of where the boundaries are drawn, Fleck never questions the coherence and common purpose of the thought collective. He describes a system in which knowledge circulates freely and necessarily between the esoteric and exoteric circles, resulting in the critical co-production that is at the core of scientific discovery: “This network in continuous fluctuation is called reality or truth” (Fleck 1979).

       But even if Fleck’s account of esoteric and exoteric circles is somewhat undertheorized, his emphasis on the necessary sociality of experimentation is still generative for this analysis – which is not that surprising, since he developed his theory to explain how the Wassermann reaction test for syphilis, often cited as a model for biomarkers in psychiatry, came to be accepted in the first decade of the 20th century. Particularly resonant for my analysis of the DST is Fleck’s celebration of uncertainty, error and even failure across many experiments by different researchers as “the building materials for a scientific fact” (Fleck 1979). As he reports, “[Wassermann’s] basic assumptions were untenable, and his initial experiments irreproducible, yet both were of enormous heuristic value. This is the case with all really valuable experiments” (Fleck 1979).

       Interestingly, the heuristic value of the DST was raised by both Amsterdam and Baldessarini. Although he rejected its clinical use, Amsterdam appreciated the test as a research tool:


[The DST] got me thinking about the hypothalamus, and developing the ACTH stimulating test. It got me doing adrenal CT scans and molecular imaging of the adrenal cortex. It got me thinking about endocrinology, and from there to immunology and viruses. It’s a heuristic construct – a way to understand the disorder and to dissect and study it…It was a means to an end, not the end itself.


       Similarly, Baldessarini celebrated the DST as “historically an important idea, an effort that led to a lot of critical thinking about things like sensitivity, specificity, predictive power – things many people hadn’t thought about before.” What Amsterdam and Baldessarini were saying, in effect, was that the DST was good – to paraphrase Claude Lévi-Strauss – to think with. But not, apparently, to diagnose depression with.
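Of the ideas Baldessarini lists, predictive power is the least intuitive, because it depends not only on the test but on how common the disorder is among the people being tested. The Python sketch below reads “predictive power” as positive predictive value (an assumption) and uses hypothetical sensitivity, specificity and prevalence figures to show how the same test behaves in two settings; none of these numbers comes from the DST literature.

```python
def positive_predictive_value(sens: float, spec: float, prevalence: float) -> float:
    """Probability that a positive result is a true positive (Bayes' rule)."""
    true_pos = sens * prevalence                # diseased and flagged
    false_pos = (1 - spec) * (1 - prevalence)   # not diseased but flagged anyway
    return true_pos / (true_pos + false_pos)


# Hypothetical test characteristics, held fixed across settings.
SENS, SPEC = 0.80, 0.90

# Hypothetical prevalence of melancholia among the people tested in each setting.
settings = {"specialist research unit": 0.50, "general practice": 0.05}

for name, prevalence in settings.items():
    ppv = positive_predictive_value(SENS, SPEC, prevalence)
    print(f"{name}: PPV = {ppv:.2f}")
```

With these figures, about 89% of positive results are true positives where half the patients have the disorder, but only about 30% where one in twenty does. The same arithmetic bears on the hope, raised earlier, that internists and family practitioners would use the test.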

       The big difference between the Wassermann reaction and the DST was, obviously, that the former was accepted – in other words, it became a scientific fact – and the latter was not. But Fleck would not have blamed Carroll and Feinberg for their failure to fix the DST as a portable object that could be reliably used across different research and clinical settings, as they set out to do in their 1981 paper. For Fleck, this responsibility belonged to the entire thought collective, and that’s who deserved the credit for the success of the Wassermann reaction: “The thought collective made the Wassermann reaction usable and…even practical,” he wrote. “The findings were stabilized and depersonalized” (Fleck 1979).

       In other words, the confusion attached to the DST – who should administer it, how it should be administered, and how the results should be interpreted – did not necessarily doom the test. The problem was the failure of those different experiments to build upon one another, to cohere, to inform some kind of consensus. Concluded Feinberg, “For whatever reason, the test never came down from Sinai on tablets of stone – and even if it had, people would still have argued about it. It would have taken something else to make it a standard test – for example, drug companies using the DST as part of clinical trials during drug development. Then, all of a sudden, it would have wider validity outside research settings. And that never happened.”


Biomarkers, Scientificity and Thought Styles

       Psychiatrists have been searching for biomarkers as long as there have been psychiatrists. Historian of psychiatry Richard Noll traces this quest back to 1854, when Scottish asylum physician W. Launder Lindsay first examined the blood of his patients under a microscope, searching for differences that might explain their bizarre thoughts and behavior. Noll’s history of the quest for diagnostic blood tests in psychiatry – which, he suggests, would cause a “potential revolution in psychiatry” (Noll 2006) – was provoked by the 2005 announcement, from a team of American, Canadian and Taiwanese researchers, of a new blood test that used RNA profiles to distinguish schizophrenic patients from those with bipolar disorder, as well as normal controls. Noll describes this paper as “actually the second time in the history of psychiatry that such a claim has caught the world’s attention (and held it for any significant length of time).”

       The first? Noll (like Shorter, like Scull, like Harrington) does not cite the DST. Nor, more surprisingly, does he count the Wassermann reaction test, even though he identifies its 1906 discovery as “a turning point for biological psychiatry” (Noll 2006), a milestone also identified by Elizabeth Lunbeck (2003). Instead, he cites the notorious German biochemist Emil Abderhalden, who announced in 1912 that he had identified “defence enzymes” that could be used to diagnose several conditions, including pregnancy, cancer, schizophrenia and other psychiatric disorders. Despite the fact that, as Israeli historian of science Ute Deichmann and German biologist Benno Müller-Hill remind us in a 1998 account accusing Abderhalden of deliberate fraud, “defence enzymes do not exist!” (Deichmann and Müller-Hill 1998), the Abderhalden reaction persisted in the literature until the 1960s, including a particularly ignominious stretch as a marker of racial differences during the Nazi era. In the United States, the Abderhalden reaction test was used to justify experimental thyroidectomies, oophorectomies and cecostomies, as well as complete extraction of patients’ teeth.

       These examples of biomarkers are important not because of their ultimate utility, but because they demonstrate a long-standing interest in biomarkers within psychiatry that has informed research since the middle of the 19th century. Fleck calls this “entirety of intellectual preparedness or readiness for one particular way of seeing and acting and no other” a thought style. In other words, scientific thought and practice in any field are not random or free; the social forces he identifies represent “a definite constraint on thought” (Fleck 1979). Psychiatrists research biomarkers because the academics who trained them, the labs where they did their research and the publications they read reinforce this common currency, which is now used to describe not only blood tests like the DST and genetic variants like the monoamine oxidase A gene (which has been correlated with antisocial behavior), but even a “digital phenotype” that former NIMH director Thomas Insel hopes will be revealed by how subjects swipe their smartphones (Rogers 2017).

       Typically, discourse about biomarkers is located within a larger insecurity: as Shorter and Fink put it at the opening of their introduction to Endocrine Psychiatry, “more than any other medical field, psychiatry has been guided by cultural preferences and political persuasions” (Shorter and Fink 2010). This concern with scientificity is also part of the thought style and is easily recoverable from any text that even references the history of the discipline. Noll observes that in psychiatry particularly, “almost all the tales from the bench are about failed lines of research that explored hypotheses promising for their own historical era but which are obsolete in the present one” (Noll 2006). And Lunbeck notes that “psychiatry’s status became even more marginal” in the late 19th century, “when medicine [unlike, she implies, psychiatry] began to embrace science” (Lunbeck 2003). One of the first things Ross Baldessarini described in our interview was the “tremendous push in academic psychiatry to make mental illness part of general medicine,” which, he said, has been “driven by a wish for medical respectability. The field has always been a little bit fringy, not really mainstream medicine in many parts of the world. I suspect the quest for lab tests, etc. were driven by this need to be part of regular medicine.”

       While concerns about the scientificity of the profession did emerge in the debates over the DST, they took a somewhat surprising form. The researchers and clinicians who wrote letters to the editor regarding the DST neither expressed hope that the test would prove to be the biomarker that would finally rehabilitate psychiatry, nor betrayed any anxiety that it would fail to be – as Michael Feinberg summed up Carroll’s attitude – “the greatest thing since sliced bread.” There was no evidence in the letters, or in the interviews I conducted with the researchers, that anything larger was on the line than the validity of the DST itself. Feinberg himself holds a relatively narrow view of the utility of biomarkers in psychiatry. “I think it would help in the very narrow field of drug studies,” he said, suggesting that psychotropic medications might one day be targeted to people with particular physiological indicators.

       Instead, researchers were concerned with adhering to the scientific method – a strategy, they believed, that distinguished them from the psychoanalytic predecessors that had “virtually abandoned [it] for half a century” (Shorter 1997). As discussed above, many letters focused on the technical details of the test. One of the most entertaining letters raised concerns about spurious correlations, criticizing an earlier study that claimed to find an association between depression ratings and post dexamethasone plasma cortisol concentrations. The authors of the letter included a similar graph comparing “the median salaries of Protestant ministers and the price of beer,” to show that “mean values of two time series tell us little, if anything, about longitudinal relationships” because “these two variables are both correlated with a third variable, i.e., inflation. A third variable may also explain the apparent relationship that is seen in the dexamethasone-clinical response example” (Gibbons and Davis 1984).

       But the most common theme in the letters was replication, or the lack thereof – of Carroll and Feinberg’s original protocol, or of variations and departures reported in earlier letters. For Feinberg, this represents the foundation of science. “To me, something is scientific when it comes from two completely independent sets of studies,” he told me. “One guy and his disciples, no. But two completely separate places, then I’ll believe it.” Amsterdam holds a similar definition: “Science isn’t the discovery – that’s hubris. People get money and fame (and lots of amygdala activation) from the discovery. But science is the replication.” He distinguished this rational process – which he called “science from the head” – from Carroll’s DST research, which Amsterdam considered “science from the heart.” The latter is, in his opinion, “uninformed science” that impedes the advancement of the field. When he discovered the significant rate of non-suppression in the normal population, Amsterdam “never mentioned it to Barney because I knew he was so invested in the DST and was so disappointed that the field didn’t embrace it,” he told me. “It was like walking on eggshells with him.”

       Amsterdam wasn’t the only one to suggest that Carroll’s emotional investment in the DST was the largest threat to its scientific bona fides. Feinberg, his frequent collaborator, described Carroll as a “very difficult person” who would get angry with researchers who published studies on the limitations of the DST. Baldessarini remembered, “I knew Barney very, very well and over many of the years we were pretty good friends. Other times, we got into bitter conversations, often about the DST. I basically didn’t agree with his very, very high enthusiasm. He had a hard time listening to anything he took as criticism.” Even Shorter and Fink – unabashed fans of the DST who explicitly argue in their book that the test deserves reconsideration – report that his colleagues found Carroll “irascible” and “prickly.” One told Shorter and Fink that Carroll was “so personally identified that he couldn’t be objective” (Shorter and Fink 2010).

       Not that institutional efforts weren’t made to smooth things over. In March 1982, Carroll submitted a letter to The American Journal of Psychiatry in response to a study published that month by Amsterdam and his colleagues. Amsterdam has an original copy of that letter, which he was given by the journal along with an invitation to respond. But the version that appeared in the November issue was markedly different. Particularly notable are the many changes the editorial staff made to correct Carroll’s contemptuous tone and preserve a more civil discourse. Consider the first line of the letter, for example. In the original draft, Carroll begins in an accusatory manner: “There are several errors of fact and of interpretation in the article by Amsterdam et al.” The published letter opens much more cordially: “Several aspects of ‘The Dexamethasone Suppression Test in Outpatients with Primary Affective Disorder and Healthy Control Subjects’ by Jay D. Amsterdam, M.D. and associates (March 1982 issue) merit discussion” (Carroll 1982).

       Throughout the letter, the editors preserved Carroll’s objections to Amsterdam’s paper while softening his obvious anger. The passage, “Amsterdam et al know perfectly well that there is a marked dose effect on the sensitivity of the DST. By not pointing it out in the discussion of their results they have misled many readers,” became simply, “There is a marked dose effect on the sensitivity of the DST.” The dig, “In the 1980s, however, well informed endocrinologists are aware of these issues,” was re-phrased: “In the 1980s, however, we should be aware of these issues.” Also stripped were repeated dismissals of Amsterdam’s claims as “naïve.” In contrast, Amsterdam’s reply was reproduced almost verbatim. Only the slightest editorial changes were made: adding a citation; replacing “Dr. Carroll” with “he” in one instance; changing “DST” to “the DST” (Amsterdam, Winokur and Caroff 1982).

       The question remains: was Carroll a lightning rod because of his identification with the DST, or was the DST a lightning rod because of its very tight association with Carroll? Fleck argues that “every scientist has the obligation to remain in the background… All, in the service of the common ideal, must equally withdraw their own individuality into the shadows” because “the true creator of a new idea is not an individual but the thought collective” (Fleck 1979). Clearly, Carroll felt differently: he became the public face of the DST, its most visible champion. But it’s not hard to see how a man “with few friends” (Feinberg 2019) who stormed out of workshops (Shorter and Fink 2010), attacked conference presenters during Q&A (“What we all heard him say,” Amsterdam remembered about this incident, “was ‘I really need to be right – this is my baby, my child’”) (Amsterdam 2019), and “slam[med] his critics as ‘credulous’ and ‘confused’” (Shorter and Fink 2010) would disrupt the intrinsically social process that, for Fleck, is science.



       In the standard story, the DST ends not with a bang, but with a whimper. Shorter and Fink conclude, “the whole endocrine project has trickled away” (Shorter and Fink 2010), echoing Harrington’s indictment of the entire decade: “The science that was necessary to support [biological psychiatry’s] grandiose ambitions was not on hand. The biomarkers were not found” (Harrington 2019).

       But what does it mean to call something a failure? While its popularity has certainly waned, a quick PubMed search reveals that researchers are still using the DST in studies comparing unipolar to bipolar patients (Monreal, Duval, Mokrani et al. 2019), examining the role of the hypothalamic-pituitary-adrenal (HPA) axis in bipolar disorder (Tournikioti, Dikeos, Alevizaki et al. 2018), and testing its correlation with childhood trauma (Kellner, Muhtz, Weinås et al. 2018), back pain (Nees, Löffler, Usai and Flor 2019), and obesity (Maripuu, Wikgren, Karling et al. 2016), just to name a few. Physicist and historian of science Peter Galison argues that evaluating the end of an experiment is tricky: “There is no strictly logical termination point inherent in the experimental process,” he writes. “Between first suspicion and final argument there is a many-layered process through which belief is progressively reinforced.” He concludes that “any account of science that glosses over the difficulty of the process misses the real content of laboratory life” (Galison 1987).

       Galison’s case studies in high-energy experimental physics suggest that confidence in experimental findings can be mapped along two axes: “the increasing directness of measurement and the growing stability of the results. By directness I mean all those laboratory moves that bring experimental reasoning another rung up the causal ladder… By ‘stability’ I have in mind all those procedures that vary some feature of the experimental conditions” (Galison 1987). While I can’t speak to technological changes that might have affected the “directness” of the DST since Carroll and Feinberg published their paper almost 40 years ago, it seems clear that the condition of stability was never satisfactorily achieved. The test’s utility to psychiatry has never been proven, but it has never been disproven, either.

       Baldessarini agrees. Although Shorter and Fink count him “among the country’s most articulate and determined opponents of the [DST]” (Shorter and Fink 2010), he acknowledged that “we didn’t stick with [the DST] long enough to find the deeper truths about the DST positives, and how they were different from DST negatives, and whether that has some message in terms of subtyping [depression].” Why not? “People get to the point of frustration with what hasn’t worked,” Baldessarini said. “We try to remain hopeful by using new tools. It’s the human tendency to keep moving forward.”

       That frustration speaks to the very human element in this story. Like Fleck, Galison emphasizes the social core of science. He reports that “the social interaction among experimenters was pivotal in determining how the experiments ended” (Galison 1987), because, ultimately, it is consensus among researchers – within the thought collective, in other words – that drives science. And that consensus never emerges from one experiment by one researcher. Much like Fleck, Galison argues, “No single argument drove the experiment to completion… It was a community that ultimately assembled the full argument” (Galison 1987). The DST is a perfect example of what happens when a community remains fractured, unable to reach consensus.

       And that lack of consensus – at least in this particular case – is self-perpetuating. Although researchers are still using the DST in experiments, clinical use of the test requires time-consuming and expensive lab work that is hard to justify for an individual patient. Michael Feinberg explained to me exactly what is involved:


       When you’re doing a laboratory measurement test – on that day, you run a bunch of standards with that day’s samples, and you get a curve of the standards. The results that you send back are accurate within the limit of the standards. You can interpolate, but extrapolation is extremely risky. 

       The levels of cortisol from a patient that is considered an escaper are lower than what most labs consider their standard curve. Because the labs are either looking for zilch – because the patient has Addison’s – or off the chart, because the patient has Cushing’s. They’re not looking in the low range. That’s hard to get done unless you coordinate with the lab. And if I send in a sample for a DST with a note to standardize in the low range, that is going to require money, time, effort. 


       “It’s one of those things,” he concluded. “Nobody does it because nobody does it.”

       Perhaps the ambiguity shrouding the DST is closer to the norm than positivist views of science care to admit. Certainly, Fleck and Galison would agree. Fleck’s claim that “there is no such thing as complete error or complete truth” (Fleck 1979) resonates with the advice Professor Emeritus of Psychiatry John C. Whitehorn offered in a 1961 lecture at Massachusetts General Hospital: “Expressed in terms of punctuation marks, it is not right to symbolize science by the period, which closes a statement with an appearance of utter finality. Science is better symbolized by the question mark, signaling a doubt and a further look” (Whitehorn 1963). Or, as Ross Baldessarini – who shared Whitehorn’s speech with me and was clearly influenced by it – concluded: “I don’t know that anything is ever closed.”



Amsterdam J, February 21, 2019. 

Baldessarini R, March 6, 2019.

Feinberg M, February 28, 2019.



Abi-Dargham A, Horga G. The Search for Imaging Biomarkers in Psychiatric Disorders. Nat Med, 2016;22(11):1248-1255. 

Amsterdam J, Winokur A, Caroff S. Letter to the Editor. Dr. Amsterdam and Associates Reply. American Journal of Psychiatry, 1982;139(11):1523. 

Arato M, Rihmer Z, Szádóczky E. Letter to the Editor. Seasonal Influence on the Dexamethasone Suppression Test Results in Unipolar Depression. Arch Gen Psychiatry, 1986;43(8):813.

Bacher NM, Lewis HA. Letter to the Editor. Abnormal dexamethasone suppression test results in schizophrenia. American Journal of Psychiatry, 1983;140(8):1100b-1101.  

Berger M, Pirke K-M, Krieg J-C, von Zerssen D. Letter to the Editor. Limited Utility of the 1-mg Dexamethasone Suppression Test as a Measure of Hypercortisolism. Arch Gen Psychiatry, 1985;42(2):201-2. 

Bond WS, Price TRP. Letter to the Editor. DST response and pre-DST sodium levels. American Journal of Psychiatry, 1989;146(1):123-124. 

Branyon DW. Letter to the Editor. Dexamethasone suppression test in children. American Journal of Psychiatry, 1983;140(10):1385. 

Brewerton TD, Gwirtsman HE. Letter to the Editor. Seasonal Dexamethasone Suppression Test Results. Arch Gen Psychiatry, 1987;44(10):920.

Bueno JA, Sabanes F, Gascon J, Gasto C, Salamero M. Letter to the Editor. Dexamethasone Suppression Test in Patients With Panic Disorder and Secondary Depression. Arch Gen Psychiatry, 1984;41(7):723-4.

Carroll BJ. Letter to the Editor. Comments on Dexamethasone Suppression Test Results. American Journal of Psychiatry, 1982;139(11):1522c-3.

Carroll BJ, Feinberg M, Greden JF, Tarika J, Albala AA, Haskett RF, James NM, Kronfol Z, Lohr N, Steiner M, de Vigne JP, Young E. A specific laboratory test for the diagnosis of melancholia. Standardization, validation, and clinical utility. Arch Gen Psychiatry, 1981;38(1):15-22. 

Castro P, Lemaire M, Toscano-Aguilar M, Herchuelz A. Letter to the Editor. Abnormal DST Results in Patients With Chronic Schizophrenia. American Journal of Psychiatry, 1983;140(9):1261.

Castro P, Lemaire M, et al. Letter to the Editor. Depression, dementia, and the dexamethasone suppression test. American Journal of Psychiatry, 1983;140(10):1386.  

Chabrol H, Claverie J, et al. Letter to the Editor. DST, TRH test, and adolescent suicide attempts. American Journal of Psychiatry, 1983;140(2):265.

Davous P. Letter to the Editor. Cortisol and Alzheimer's Disease. American Journal of Psychiatry, 1987;144(4):533c-4.

Deichmann U, Müller-Hill B. The fraud of Abderhalden's enzymes. Nature, 1998;393:109-11. 

de la Fuente JR, Amor JS. Letter to the Editor. Does Ethnicity Affect DST Results? American Journal of Psychiatry, 1986;143(2):275-276.

Dilsaver SC, Greden JF. Letter to the Editor. The DST in Borderline Patients. American Journal of Psychiatry, 1983;140(11):1540b-1.

Escobar JI. Letter to the Editor. ACTH and cortisol levels in healthy probands and psychiatric patients following the dexamethasone suppression test. American Journal of Psychiatry, 1985;142(2):268-9.

Fahs JJ. Letter to the Editor. The DST and Organic Affective Disorder. American Journal of Psychiatry, 1985;142(8):991b-2.

Fang VS, Warenica B, Meltzer HY. Letter to the Editor. Dexamethasone Suppression Test: Technique and Accuracy. Arch Gen Psychiatry, 1982;39(10):1217.

Fleck L. Genesis and Development of a Scientific Fact. Chicago: University of Chicago Press; 1979. 

Friedenthal SB, Swartz CM. Letter to the Editor. A Milestone for the Dexamethasone Suppression Test. American Journal of Psychiatry, 1986;143(9):1198.

Galison P. How Experiments End. Chicago: University of Chicago Press; 1987. 

Gibbons RD, Davis JM. Letter to the Editor. The Price of Beer and the Salaries of Priests: Analysis and Display of Longitudinal Psychiatric Data. Arch Gen Psychiatry, 1984;41(12):1183-4.

Grimm RH. Letter to the Editor. Possible correlation of stress and DST performance. American Journal of Psychiatry, 1983;140(9):1258. 

Harrington A. Mind Fixers: Psychiatry’s Troubled Search for the Biology of Mental Illness. New York: W.W. Norton & Co.; 2019.

Harris VJ. Letter to the Editor. The dexamethasone suppression test and residual schizophrenia. American Journal of Psychiatry, 1985;142(5):659b-60.

Kellner M, Muhtz C, Weinås Å, Ćurić S, Yassouridis A, Wiedemann K. Impact of Physical or Sexual Childhood Abuse on Plasma DHEA, DHEA-S and Cortisol in a Low-Dose Dexamethasone Suppression Test and on Cardiovascular Risk Parameters in Adult Patients with Major Depression or Anxiety Disorders. Psychiatry Res, 2018;270:744-8. 

Klee SH, Garfinkel BD, et al. Letter to the Editor. Use of the cortisol suppression index for adolescents. American Journal of Psychiatry, 1983;140(7):951b-952. 

Lunbeck E. Psychiatry. Cambridge History of Science 7. Cambridge: Cambridge University Press; 2003. 

Mahendra B. Letter to the Editor. The Dexamethasone Suppression Test in Dementia. American Journal of Psychiatry, 1985;142(4):520b-1.  

Maripuu M, Wikgren M, Karling P, Adolfsson R, Norrback K-F. Relative Hypocortisolism is Associated with Obesity and the Metabolic Syndrome in Recurrent Affective Disorders. J Affect Disord, 2016;204:187-96. 

Mendlewicz J. Letter to the Editor. REM latency and DST results. American Journal of Psychiatry, 1984;141(3):473-4.

Moffatt J. Letter to the Editor. The Dexamethasone Suppression Test and Dementia. American Journal of Psychiatry, 1984;141(8):1019.  

Monreal JA, Duval F, Mokrani M-C, Fattah S, Palao D. Differences in Multihormonal Responses to the Dopamine Agonist Apomorphine Between Unipolar and Bipolar Depressed Patients. J Psychiatr Res, 2019;112:18-22.

Nees F, Löffler M, Usai K, Flor H. Hypothalamic-Pituitary-Adrenal Axis Feedback Sensitivity in Different States of Back Pain. Psychoneuroendocrinology, 2019;101:60-6.

Noll R. The blood of the insane. Hist Psychiatry, 2006;17(68 Pt 4):395-418. 

Pettit M. Becoming Glandular: Endocrinology, Mass Culture, and Experimental Lives in the Interwar Age. The American Historical Review, 2013;118(4):1052-76.

Poirier MF, Loo H, Gay C, de Luca S. Letter to the Editor. Confirmation of Abnormal DST Results in Manic Patients. American Journal of Psychiatry, 1985;142(7):888.  

Rihmer Z, Arato M, Greden JF. Letter to the Editor. Possible Dexamethasone Influences on Subsequent Serial DST Results. American Journal of Psychiatry, 1985;142(4):519.

Robbins DR, Alessi NE. Letter to the Editor. Questions Unanswered in Article on DST in Adolescents. American Journal of Psychiatry, 1984;141(11):1492-3.

Rogers A. Star Neuroscientist Tom Insel Leaves the Google-Spawned Verily for… A Startup?, May 11, 2017. 

Romanik RL. Letter to the Editor. Use of DST to Indicate Brain Disorders. American Journal of Psychiatry, 1983;140(1):135.

Rubin RT, Poland RE. Letter to the Editor. Variability in Cortisol Level Assay Methods. Arch Gen Psychiatry, 1984;41(7):724-725. 

Schaffer CB. Letter to the Editor. DST during drug-induced switch from depression to mania. American Journal of Psychiatry, 1982;139(8):1081.

Schneider P, Whiteford H, et al. Letter to the Editor. Dexamethasone suppression test in agoraphobia. American Journal of Psychiatry, 1983;140(9):1259b-60. 

Scull A. Madness in Civilization: A Cultural History of Insanity from the Bible to Freud, from the Madhouse to Modern Medicine. Princeton: Princeton University Press; 2015.  

Shorter E. A History of Psychiatry: From the Era of the Asylum to the Age of Prozac. New York: John Wiley & Sons, Inc; 1997. 

Shorter E, Fink M. Endocrine Psychiatry: Solving the Riddle of Melancholia. Oxford: Oxford University Press; 2010. 

Tandon R, Mazzara C, et al. Letter to the Editor. The DST and outcome in schizophrenia. American Journal of Psychiatry, 1989;146(12):1648a-9. 

Tournikioti K, Dikeos D, Alevizaki M, Michopoulos I, Ferentinos P, Porichi E, Soldatos CR, Douzenis A. Hypothalamus-Pituitary-Adrenal (HPA) Axis parameters and Neurocognitive Evaluation in Patients with Bipolar Disorder. Psychiatriki, 2018;29(3):199-208. 

Uhde TW, Bierer LM, Post RM. Letter to the Editor. Caffeine-Induced Escape From Dexamethasone Suppression. Arch Gen Psychiatry, 1985;42(7):737-8.

Wolf JM. Letter to the Editor. Bulimia and the Dexamethasone Suppression Test. American Journal of Psychiatry, 1982;139(11):1523a-4.

Wood K, Harwood J, Coppen A. Letter to the Editor. Technique and Accuracy of the Dexamethasone Suppression Test. Arch Gen Psychiatry, 1983;40(5):585.

Whitehorn JC. Education for Uncertainty. Perspect Biol Med, 1963;7:118-23.  

Zis AP. Letter to the Editor. The Dexamethasone Suppression Test in Depressed Children. American Journal of Psychiatry, 1986;143(1):128-9.


April 15, 2021