## Response to Martin M. Katz’s response

**Onset of Antidepressant Effect**

**Donald F. Klein’s response to Martin M. Katz’s reply**

Katz points out limitations of his 2011 study. "… an attempt to demonstrate that prediction at two weeks is possible and that adopting the approach we outlined in that paper would be useful in any effort to shorten the length of clinical trials. This small study, as noted in earlier papers, should be followed by a prospective study with a large and representative patient sample to establish the validity….. that a two-week trial would be sufficient to determine whether a new, putative antidepressant will be efficacious…..mainly refers to findings from independent and large sample multisite studies (e.g., Stassen et al 1996; Szegedi et al 2009; Katz et al 2004) that are relatively definitive in establishing that actions of efficacious drugs begin as early as 1 week".

This does not follow. What is the purpose of a small study that requires a large follow-up if the point has already been established by large studies?

Are these large studies? Unfortunately, only the insufficient abstracts of Szegedi could be retrieved. Katz (2004) had an N of 82; 12 dropped out after randomization. "… it was decided *a priori* that patients who did not complete at least 3 weeks of treatment would not generate useful data". The remaining 70 were distributed into 3 groups: paroxetine, desmethylimipramine (DMI) and placebo. Of the 29 in the DMI group, 3 dropped out by 2 weeks; of the 28 in the paroxetine group, 4 dropped out by 3 weeks. In any case, this is not a large study.

There is no simple listing of behavioral measures. I count 26, but this may be a substantial undercount. Similarly, the number, timing, and evaluator of the behavioral measures are not simply tabulated, although the assessments were very frequent. This affords ample opportunity for chance findings. The number of biochemical assessments is not stipulated. The analyses use Last Observation Carried Forward, a questionable practice. There is no mention of correction for multiple analyses. Therefore, the findings are not clearly distinguished from mere sampling variation.
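To illustrate why uncorrected multiple analyses matter, here is a minimal sketch. It assumes, purely for illustration, that each of the 26 counted measures is tested once at a conventional 0.05 level and that the tests are independent. Neither assumption comes from the paper, and the function names are hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive among n_tests
    independent tests, each run at level alpha (illustrative assumption)."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test level that keeps the familywise rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    n_measures = 26  # the count given in the text; possibly an undercount
    print(f"Familywise error rate: {familywise_error_rate(n_measures):.2f}")
    print(f"Bonferroni per-test alpha: {bonferroni_alpha(n_measures):.4f}")
```

Under these assumptions the chance of at least one spurious "significant" result is roughly 0.74. Correlated measures and repeated time points would change the exact figure, but not the point that some correction is needed.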

The point of the study is detection of therapeutic onset. It was defined here as, "The 'median time of onset' was defined as the earliest time point at which 50% of patients changed a minimum of 20% on a given behavioral construct, a change that was then sustained throughout the course of treatment".

This arbitrary measure does not derive from any statistical model of onset. Note that it yields a single index for each group, since it depends on 50% of the group reaching the arbitrary 20% decrement and sustaining it. The time of onset is likely to vary among subjects. A definition of onset that can be applied to individuals would give some idea of the spread of onset times. Stassen (below) arbitrarily develops such a measure. I could not follow the analysis described for "Analysis of onset of 'therapeutic' action within each treatment group". It did not seem to yield an onset time, at least to me. Clarification may be helpful.
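The contrast between a group-level index and an individually applied definition can be sketched as follows. This is one possible reading of the quoted definition, not the procedure used in the paper: a patient's onset is taken as the earliest visit at which the score is at least 20% below baseline and stays so at every later visit, and the group index is the earliest visit by which at least half the patients have reached that state. The function names and the patient scores are hypothetical.

```python
def individual_onset(scores, times, threshold=0.20):
    """Earliest time at which the reduction from baseline (scores[0]) is
    at least `threshold` and is sustained at all later visits; None if never."""
    baseline = scores[0]
    if baseline <= 0:
        return None
    improved = [(baseline - s) / baseline >= threshold for s in scores]
    for i in range(1, len(scores)):
        if all(improved[i:]):
            return times[i]
    return None


def group_onset(patients, times, threshold=0.20, fraction=0.5):
    """Earliest time by which at least `fraction` of patients have
    reached sustained improvement; None if that never happens."""
    onsets = [individual_onset(s, times, threshold) for s in patients]
    for t in times[1:]:
        reached = sum(1 for o in onsets if o is not None and o <= t)
        if reached >= fraction * len(patients):
            return t
    return None


if __name__ == "__main__":
    days = [0, 7, 14, 21, 28]      # hypothetical visit schedule
    patients = [                   # hypothetical rating-scale scores
        [20, 15, 14, 12, 10],
        [20, 18, 15, 17, 14],
        [20, 19, 19, 18, 18],
        [24, 22, 18, 17, 16],
    ]
    print([individual_onset(p, days) for p in patients])
    print(group_onset(patients, days))
```

With these invented scores the group index is a single value (day 14), while the individual onsets range from day 7 to day 28, with one patient never reaching the criterion. The group index conceals that spread.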

The analyses in Katz's latest INHN submission used within-drug comparisons of binary status measures. These are not demonstrations of drug effect, since they lack a placebo comparison.

In the section "Prediction of outcome": "Logistic regression (Hosmer and Lemeshow, 1989) was used to develop an algorithm for estimating the probability that a patient would recover by 6 weeks of treatment based on values on the behavioral constructs after 1 or 2 weeks of treatment. Different models of individual prediction were tested for each drug independently".

"We did not test models including variables that did not discriminate between recovered and non-recovered subjects at any of the early time points.....The model and threshold that provided the best combination of sensitivity and specificity was then selected as the prediction model for recovery." This is exploratory work. That is justified but it should not be presented as definitive.

All of these shifting procedures, plus the lack of correction for the multiplicity of analyses of the same data set, lead to an analogy with the Texas Sharpshooter, who carefully draws a target around each scattered bullet hole.

No doubt Katz was attempting to solve difficult problems by exploratory work.

Stassen (1996) states, "The sample consisted of moderately depressed male (n = 154) and female (n = 275) patients (aged 17-73), diagnosed according to DSM-III criteria for major depression. Of these, 120 were treated with oxaprotiline, 120 with amitriptyline and 189 with placebo. Efficacy criteria were Hamilton Depression (HAMD) and Anxiety (HAMA) and Zung Self-Rating scales. Up to eight ratings over a period of 40 days were available for analysis... the appropriate determination (is) of the time points at which the medication begins to clearly show a therapeutic effect in each individual patient..... A solution to this problem is to define onset of improvement in each individual case on the basis of significantly reduced psychopathology scores relative to the corresponding baseline, that is, a reduction of - *d%* of baseline..…Lacking appropriate a priori knowledge we unified as a tentative step all 429 cases (minus 17 cases due to insufficient data) to one single sample in order to get an estimate of the 'natural' variability of HAMD and HAMA scores over time …It turned out that a relative change of 15-25% with respect to the corresponding initial values represents a suitable threshold for a reasonable definition of onset of improvement. … we decided in favor of the 20% threshold."

Clearly the problem of therapeutic onset within individuals has not been solved. I could not find the demonstration of drug effects at 1 or 2 weeks that Katz refers to. The relevance of this paper to Marty's hypotheses is unclear.

Now to address the algebra! But why should we? I attempted to reconstruct the basic 2X2 tables relating early response to later response from the proffered indices referring to true positives, true negatives, overall correctness and sample size. Perhaps it was more complicated than I realized, since the definitions given for the indices were complex and easily misunderstood. But Marty has those data. Surely the most easily understood, trenchant refutation would be a direct comparison of the actual 2X2 predictions with my reconstructions.
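For readers who want to see what such a reconstruction involves, here is a minimal sketch of one possible approach. It assumes the reported indices are the conventional sensitivity, specificity and overall accuracy, which may not match the definitions in the paper. The function name and the numbers are hypothetical and are not taken from any of the studies discussed. Writing P for the number of patients who later responded, accuracy × N = sensitivity × P + specificity × (N − P), which can be solved for P whenever sensitivity and specificity differ.

```python
def reconstruct_2x2(sensitivity, specificity, accuracy, n):
    """Recover the cells of a 2X2 prediction table from summary indices.

    Assumes the conventional definitions (an assumption, not the paper's):
      sensitivity = TP / (TP + FN), specificity = TN / (TN + FP),
      accuracy = (TP + TN) / n.
    """
    if abs(sensitivity - specificity) < 1e-12:
        raise ValueError("table is not identifiable when sensitivity equals specificity")
    # Solve accuracy * n = sensitivity * P + specificity * (n - P) for P.
    positives = n * (accuracy - specificity) / (sensitivity - specificity)
    negatives = n - positives
    tp = sensitivity * positives
    tn = specificity * negatives
    return {"TP": tp, "FN": positives - tp, "FP": negatives - tn, "TN": tn}


if __name__ == "__main__":
    # Hypothetical indices for illustration only.
    table = reconstruct_2x2(sensitivity=0.80, specificity=0.75, accuracy=0.775, n=40)
    print({cell: round(count) for cell, count in table.items()})
```

Published indices are usually rounded, so the recovered cell counts are approximate and may not be whole numbers. That is one more reason a direct tabulation of the actual counts would be more convincing than any reconstruction.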

I also regret that these data were not presented, for they test the validity of Marty's hypotheses. He believes that important practical implications follow, such as shortening the length of clinical trials. Certainly, presenting these 2X2s has more important implications than just refuting my analysis. Fortunately, he still has the opportunity to present them.

Donald F. Klein

July 16, 2015