Martin M. Katz: Onset of Clinical Action of Antidepressants

Martin M. Katz vs. Donald F. Klein

Collated

Charles M. Beasley Jr’s Commentary

The requirements for demonstrating, with sufficient empirical robustness to gain regulatory approval by the US Food and Drug Administration (FDA), a claim that one antidepressant is superior in efficacy to other antidepressants and/or has a more rapid onset of action than other antidepressants have been a topic of considerable commercial interest within pharmaceutical companies. During my almost 28 years (1987-2015) with Eli Lilly and Company, working with fluoxetine, tomoxetine (atomoxetine) and duloxetine, I had substantial opportunity to consider the design of studies that would support such claims. My thinking about some design elements was strongly influenced by the opinions of Dr. Paul Leber regarding the requirements to demonstrate superior efficacy, which are also relevant to determining the onset of action. Dr. Leber served as the Director of the Neuro-Pharmacological Drug Products Division (1981-1999) and reviewed potential drugs with psychiatric and neurological indications.

Dr. Tom Ban was aware of my professional history and a number of my interests when he asked whether I would be interested in writing a concluding commentary on the interaction between Drs. Katz and Klein, with additional contributions by Drs. Morey, Morra and Szabadi, regarding the time of onset of antidepressant activity. The following Comment is the result of Tom’s invitation and encouragement.

My process for developing this Comment was to first write my list of design elements and their alternatives without reading the collated writings of Drs. Katz, Klein and others. I then reviewed the collated writings to determine if they suggested additional design elements. Next, I reviewed three of the four manuscripts cited by Dr. Katz (Stassen, Delini-Stula and Angst 1992; Katz, Koslow and Frazer 1996; Leon 2001a) that discuss the design of studies assessing the onset of antidepressant action. The fourth manuscript (Laska and Siegel 1995) was not available to me. Finally, I reviewed one additional manuscript (Leon, Blier, Culpepper et al. 2001b).

This Comment progresses in five sections:

1. Introductory thoughts

2. Design elements and their alternative options that I have previously considered without advocating for any specific set of alternative options

3. Additional design elements and/or options suggested by the exchange between Drs. Katz and Klein as well as the other contributors

4. Additional design elements and/or options suggested by the literature reviewed

5. A summary of the design that I would consider optimal to address the question. Even in this section, some design elements retain more than one option, either because I have no strong belief about the optimal option or because I lack sufficient technical training to recommend one.

Introductory Thoughts

My top-line thought is that the answer to the question “What is the time of onset of the antidepressant action of a drug approved for the treatment of depression through 2018, and generally recognized as an effective antidepressant?” is highly dependent on the design elements of the study or set of studies intended to answer it. Furthermore, I hypothesize that there would not be a robust consensus regarding many of the numerous design elements that would need to be considered in designing a program to address the question. I enumerate below both the design elements that I believe are important considerations and alternatives for those design elements. I acknowledge that neither the list of elements nor the list of alternatives for these elements is likely to be complete.

Furthermore, the alternatives that I list might not be the best choices for one or more of the design elements. I would not view many of the alternatives below as optimal choices, nor do I think that the majority of scientists working in this area would view them as optimal. Some scientists might view these non-optimal alternatives as reasonable, and because I have considered them, I have listed them below. The purpose of this Comment is to illustrate the complexity of establishing the time of onset of antidepressant action and why I believe that this time is difficult to estimate.

At the highest level of design, there are two interacting elements. The first of these elements is whether the estimation of onset of action will be based on the data for all subjects treated with an antidepressant or data for only a subset of subjects treated with an antidepressant. The second of these elements is whether the analysis used to estimate the time of onset of action will be based on some difference between the two treatment groups or simply on data for the antidepressant-treatment group. While these design elements and alternatives associated with each will be further detailed below, they are of sufficient importance to be briefly described at the outset.

If the analysis that estimates time of onset of action will be based on data for only a subset of subjects treated with an antidepressant, this subset would logically be a subset showing some substantial improvement at a time after the onset of improvement.

If the analysis that estimates time of onset of action will be based on a difference between two groups, then the groups could be either two subsets treated with antidepressant (a subset showing greater improvement and a subset showing less improvement) or the group (or subset) treated with antidepressant and a group (or subset) treated with placebo.

Note that if only data for antidepressant-treated subjects will be used to estimate time to onset of action, either in a comparative or non-comparative analysis, a comparison of antidepressant to placebo might still be conducted. This antidepressant-placebo comparison would be a preliminary analysis to determine that sufficient evidence of efficacy within the study exists to use the antidepressant-treated subjects’ data for estimation of time of onset of action.

As illustrated above with the two design elements (total data or subset data use and noncomparative or comparative analysis use), there are complex and sometimes conditional dependencies between the selected alternatives described below, i.e., some alternatives for some elements make logical sense only if certain alternatives for other elements are selected. I have hopefully organized the elements and alternatives such that these will be reasonably understandable. In some cases, the alternatives for a design element are mutually exclusive, but in some cases, they are not. Mutually exclusive alternatives are hopefully obvious as the alternatives are listed.

The design elements and their alternatives are organized into four high-level categories: 1) non-numerical elements directly pertinent to the primary study objective of determining time to onset of action; 2) numerical and statistical elements; 3) elements pertinent to the antidepressant(s) studied; and 4) elements pertinent to the selection of the population studied.

For these design elements, there is a tension between reducing “noise” (improvement due to factors other than a pharmacological effect) and the generalizability of the result to the clinical use of antidepressants.

Design Elements and their alternative options that I have previously considered without advocating for any specific set of alternative options

1) The following are non-numerical elements directly pertinent to the primary study objective of determining the time to onset of action. Elements a) through d) below, while non-numerical in that they do not involve the a priori definition of a value important to the study and its analysis, do relate to statistical analysis design matters.

a) The analysis to estimate time to onset of action will be based on either noncomparative analysis of data for antidepressant-treated subjects or comparison of data for two groups.

(1) Noncomparative analyses

(a) If a noncomparative analysis is used, then the antidepressant-treated subjects’ data that are used are the data for

(i) All subjects

(ii) Only a subset of subjects showing substantial improvement

(2) Comparative analyses - see b) below for alternatives of groups to be compared if the analysis is based on a comparison.

b) If the primary analysis of time of onset of action is a comparative analysis, what groups are used for comparison? Alternatives (1) through (4) below suggest comparative pairs. Alternative (5) addresses a different consideration: how to use data for subjects who were treated with an antidepressant but were highly likely to be non-compliant with treatment, i.e., whether to further subset the antidepressant-treated subjects that would be selected in (1) through (4) below.

(1) All antidepressant-treated subjects compared to all placebo-treated subjects

(2) Subjects who improved (by some criteria to be specified) with antidepressant-treatment compared to all placebo-treated subjects

(3) Subjects who improved (by some criteria to be specified) with antidepressant-treatment compared to placebo-treated subjects who did not improve (by some criteria to be specified)

(4) Subjects who improved (by some criteria to be specified) with antidepressant-treatment compared to antidepressant-treated subjects who did not improve (by some criteria to be specified)

(5) Some subjects in studies are non-compliant with medication taking, and their data add ‘noise’ to defining improvement versus non-improvement with medication. Should plasma concentrations of the antidepressant be assayed (trough values after reaching steady state), and patients identified as non-compliant (by some criteria to be specified) be excluded from the analyses?

(a) If “YES” to (5), immediately above, should the criteria be:

(i) Only subjects with plasma concentrations ‘below quantifiable’ (BLQ)

(ii) Subjects with plasma concentrations considered exceptionally low by some criteria, such as below the 10th or 5th percentile of values observed within the antidepressant-treatment group or values observed in the development program for the antidepressant(s)
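A minimal Python sketch of the two exclusion criteria in (5)(a) follows. It assumes trough concentrations are stored per subject, that values below a lower limit of quantification (LLOQ) count as BLQ, and that option (ii) uses the quantifiable values within the antidepressant-treatment group as the reference distribution. The function name, data layout, units and LLOQ are hypothetical illustrations, not part of any protocol.

```python
from statistics import quantiles


def flag_noncompliant(troughs, lloq, percentile=None):
    """Return the set of subject IDs flagged as likely non-compliant.

    troughs: dict mapping subject ID -> steady-state trough concentration.
    lloq: hypothetical lower limit of quantification; values below it are BLQ.
    percentile: optional integer 1-99 (e.g., 5 or 10) for option (ii).
    """
    # Option (i): plasma concentration below the quantifiable limit (BLQ).
    flagged = {sid for sid, conc in troughs.items() if conc < lloq}
    if percentile is not None:
        if not 1 <= percentile <= 99:
            raise ValueError("percentile must be between 1 and 99")
        # Option (ii): exceptionally low relative to the quantifiable values in
        # the antidepressant-treatment group (an assumption; a development-
        # program reference distribution could be substituted).
        quantifiable = [conc for conc in troughs.values() if conc >= lloq]
        cutoff = quantiles(quantifiable, n=100, method="inclusive")[percentile - 1]
        flagged |= {sid for sid, conc in troughs.items() if conc < cutoff}
    return flagged


# Hypothetical data: 20 subjects with troughs of 1.0 ... 20.0 ng/mL, LLOQ 1.5 ng/mL.
troughs = {f"S{i:02d}": float(i) for i in range(1, 21)}
print(sorted(flag_noncompliant(troughs, lloq=1.5)))                 # ['S01']
print(sorted(flag_noncompliant(troughs, lloq=1.5, percentile=10)))  # ['S01', 'S02', 'S03']
```

The percentile option necessarily excludes at least as many subjects as the BLQ option alone, which is one way to see the trade-off between reducing noise and discarding data.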

c) Inclusion of a placebo-treatment arm that would allow for an assessment of the efficacy of the antidepressant(s) used in the study. If the primary analysis were noncomparative or compared two antidepressant-treatment subgroups (i.e., improved and not improved), it would not be logically necessary to include a placebo-treatment group. However, given the magnitude of improvement often observed with placebo in randomized, placebo-controlled studies of antidepressants, without a placebo-treatment group no magnitude of improvement in the total antidepressant-treatment group (or subset) could be conclusively attributed to the antidepressant-treatment. It could be important to have evidence that a subset of subjects showing improvement with antidepressant-treatment was likely showing such improvement due to the antidepressant-treatment and not due to other factors (e.g., placebo response). Some might strongly suggest that, before a study could be used to determine the time of onset of antidepressant action, it would first be necessary to demonstrate that the antidepressant studied had its intended therapeutic action within the population studied (e.g., that the antidepressant proved superior to placebo within the study). Even if the primary analysis does not involve placebo treatment, should a placebo-treatment group be included to establish efficacy within the study population as a preliminary requirement before the study data can be used to determine the onset of action of the antidepressant? (The question is moot if the primary analysis compares antidepressant to placebo.)

(1) Inclusion of an additional treatment arm that could help assess a potential “lack of assay sensitivity” if the study includes placebo treatment and the antidepressant fails to demonstrate superior efficacy. Such an outcome could result from excess placebo improvement, lack of efficacy of the antidepressant, or a combination of both. Including an arm treated with an alternative antidepressant of a different class than the antidepressant primarily studied to determine the onset of action might help address the possibility that the primary antidepressant lacks efficacy. If the antidepressant of primary interest failed to separate from placebo but the second did separate, the data for the second could be used in the intended primary analysis. If both antidepressants failed to separate, this result would strongly suggest that the study lacked assay sensitivity due to unexpectedly high improvement in the placebo-treated group. Should two distinct antidepressants (antidepressant classes) be used in the study?

(2) Use of a simple or more complex study design to reduce placebo improvement and other potential confounders of the outcome. The classical randomized clinical study of an antidepressant might begin with a one-week, open-label placebo lead-in period, during which potential subjects who passed cross-sectional screening but improved by more than an a priori criterion on a symptom severity scale are discontinued before randomization. The criteria for discontinuation due to placebo improvement would be known to investigators. An even simpler design would omit the open-label placebo lead-in. This traditional study design would include a blinded treatment period of 6-8 weeks (the length is fixed for all subjects at some specific number of weeks). Several alternatives to these simple designs have been employed; the following alternatives do not exhaust the potential options.

(a) Simple design – No lead-in or an open-label placebo-lead-in with the exclusion for improvement criterion known to the investigator

(b) Stealth design – A single-blind, 1-week placebo lead-in is followed by a double-blind placebo lead-in period, with randomizations occurring after 1, 2, or 3 weeks of double-blind placebo. While some magnitude of improvement after the week-1 open-label placebo lead-in leads to subject discontinuation, a lesser magnitude of improvement at the time of randomization results in the subject being randomized but the subject’s data being excluded from analysis. The study runs long enough to allow subjects randomized at the last randomization point (week 4 of the study) to complete 6-8 weeks of treatment, but only 6-8 weeks of treatment data are used for all subjects.

(c) Two-stage randomization, referred to as the Sequential Parallel Comparison Design (SPCD) (Fava, Mischoulon, Iosifescu et al. 2012) – A small portion of subjects is randomized immediately to antidepressant-treatment (no placebo lead-in or a short placebo lead-in), while the bulk of subjects is randomized immediately to placebo. After 4-6 weeks, the subjects randomized to placebo are either continued on placebo (if they showed a specified magnitude of improvement with placebo) or undergo a second randomization to either continued placebo-treatment or antidepressant-treatment. Investigators could be made aware of the design or kept blinded to it. While statistical methods have been developed that allow efficacy to be assessed by combining the data from the two stages of randomization (Doros, Pencina, Rybin et al. 2013), for purposes of studying time to onset of antidepressant action, one might consider using only the data from the second randomization.
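To make the SPCD flow in (c) concrete, here is a minimal allocation sketch. It assumes a hypothetical stage-1 allocation in which a fixed number of subjects receive antidepressant, a hypothetical 50% improvement cutoff for classifying placebo responders, and 1:1 re-randomization of placebo non-responders. The returned stage-2 group is the subset that might be used alone to study onset of action. This illustrates allocation only; it is not the published SPCD analysis.

```python
import random


def stage1_randomize(subject_ids, n_antidepressant, seed=0):
    """Stage 1: a small portion to antidepressant, the bulk to placebo."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    return {sid: ("antidepressant" if i < n_antidepressant else "placebo")
            for i, sid in enumerate(shuffled)}


def stage2_randomize(stage1, pct_improvement, responder_cutoff=50.0, seed=1):
    """Stage 2: re-randomize stage-1 placebo non-responders 1:1 (hypothetical ratio).

    Placebo responders simply continue on placebo and are not part of the
    second randomization group, so they are not returned.
    """
    non_responders = sorted(sid for sid, arm in stage1.items()
                            if arm == "placebo" and pct_improvement[sid] < responder_cutoff)
    rng = random.Random(seed)
    rng.shuffle(non_responders)
    half = len(non_responders) // 2
    return {sid: ("antidepressant" if i < half else "placebo")
            for i, sid in enumerate(non_responders)}


# Hypothetical usage: 12 subjects, 3 to antidepressant in stage 1.
ids = [f"S{i:02d}" for i in range(1, 13)]
first = stage1_randomize(ids, n_antidepressant=3)
pct = {sid: 8.0 * i for i, sid in enumerate(ids, start=1)}  # hypothetical % improvement
onset_group = stage2_randomize(first, pct)
print(onset_group)
```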

d) Scale(s) to be used to define improvement. Multiple scales are available, but several important, broad conceptual differences are reflected in the choices below.

(1) The scale should only assess the diagnostic signs and symptoms in the latest edition of DSM

(2) The scale should assess signs and symptoms generally recognized as potentially present as a component of depression (e.g., sense of hopelessness)

(3) The scale should assess reverse neurovegetative / atypical depression signs and symptoms (i.e., overeating/weight gain, hypersomnia, reverse diurnal variation, leaden paralysis, rejection sensitivity)

(4) Clinician-administered, self-report or both

e) Site vs. central diagnostics and rating scale completion. Services are available that will complete severity rating instruments via a remote audio-visual (AV) interview; in this case, the interviewers are experts in administering the interviews and scoring the scales, are completely blinded to subject status, and have no vested interest in site performance. However, AV interviews might result in different levels of interviewer-subject rapport than interviews by site staff, and the more structured questioning used in AV interviews might fail to resolve ambiguous subject responses where resolution would be important to the validity of scoring.

(1) For non-self-report instruments, use centralized raters

(2) For non-self-report instruments, use site staff raters

f) Length of study. The minimum length of the study will depend primarily on two factors. The first is whether the subjects whose data are used to estimate the time of onset of action are required to achieve some minimum magnitude of improvement (e.g., 50% improvement from baseline, or an absolute severity rating suggesting complete symptom resolution) to contribute to the analysis, and if so, the magnitude of that required improvement. The second is whether subjects who show the requisite magnitude of improvement, if this is required, must also demonstrate that this improvement is stable and sustained, and if so, how such stability is defined. Potential magnitudes of required improvement and the demonstration of the stability of improvement are discussed below in 2)f) and 2)g). If the group of subjects used in the primary analysis is required to achieve the complete absence of symptoms (e.g., a HAMD-17 score ≤6), then 12 weeks of treatment would be a reasonable length of time to achieve this magnitude of improvement. If subjects were also required to demonstrate the stability of such a robust response to contribute data, then an additional 3-4 weeks (or even longer) of observation might be required. Shorter periods of observation would suffice for less robust improvement and for some primary analyses; for example, if the onset of action were defined simply as the first statistically significant difference between antidepressant and placebo, onset might be observed within two weeks or less. If the primary analysis estimating onset of action compares two groups and statistical significance of the observed difference is a criterion used in the estimation of onset of action, then sample size might influence the required length of the study.
If the variance in the parameters being inferentially compared to estimate onset of action can be held reasonably constant, or increases only minimally, as the sample size increases, and the expected difference is observed, then a larger sample size is likely to yield a statistically significant difference at an earlier visit.

(1) <6 weeks

(2) 6 – 10 weeks

(3) >10 – 15 weeks

(4) >15 – 20 weeks

g) The frequency of visits. The frequency of early visits should depend on the hypothesized time of onset of antidepressant action and on estimates of the variability in that time across subjects, with visits as frequent as logistically possible around the hypothesized time. After those visits, the purpose of visits would be to determine which subjects reach the required magnitude of improvement, if such a magnitude is required, and to demonstrate the stability of that improvement, if required; these later visits could therefore be spaced further apart.

(1) 3/week for 3 weeks, then weekly for the length of study

(2) 2/week for 3 weeks, then weekly for the length of study

(3) 3/week for 2 weeks, then weekly for the length of study

(4) 2/week for 2 weeks, then weekly for the length of study

(5) 3/week for 3 weeks, then every 2 weeks for the length of study

(6) 2/week for 3 weeks, then every 2 weeks for the length of study

(7) 3/week for 2 weeks, then every 2 weeks for the length of study

(8) 2/week for 2 weeks, then every 2 weeks for the length of study
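The schedules in (1) through (8) can be expressed as lists of study days. The sketch below assumes, hypothetically, that day 0 is randomization, that early visits are spread evenly within each week (days 2, 4 and 7 for three visits per week; days 3 and 7 for two), and that later visits fall every one or two weeks through the final week. The function name and day conventions are illustrative only.

```python
def visit_days(per_week, early_weeks, later_every_weeks, total_weeks):
    """Post-baseline visit days for an 'n/week, then every k weeks' schedule."""
    days = []
    # Early, frequent visits clustered around the hypothesized onset time.
    for week in range(early_weeks):
        for k in range(1, per_week + 1):
            days.append(7 * week + (7 * k) // per_week)
    # Later, sparser visits to confirm improvement and its stability.
    day = 7 * early_weeks + 7 * later_every_weeks
    while day <= 7 * total_weeks:
        days.append(day)
        day += 7 * later_every_weeks
    return days


# Option (1) and option (8) over a hypothetical 8-week study.
print(visit_days(3, 3, 1, 8))  # [2, 4, 7, 9, 11, 14, 16, 18, 21, 28, 35, 42, 49, 56]
print(visit_days(2, 2, 2, 8))  # [3, 7, 10, 14, 28, 42, 56]
```

Under these assumptions, option (1) yields 14 post-baseline visits over 8 weeks and option (8) yields 7; if the study length is not a multiple of the later interval, the last visit will not coincide with the final week.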

2) Numerical and statistical elements

a) Use of data from a single group or comparison of two groups to estimate the onset of action. The basic concepts relevant to this element were discussed above.

(1) The time at which the two groups being compared separate, based on some criteria to be defined.

(2) Data from a single group of antidepressant-treated subjects

b) Analytical method. First, there might or might not be two preliminary analyses. The first of these possible preliminary analyses would determine if the study resulted in data that could be used for assessment of onset of action and this would be a comparison of antidepressant to placebo. The second of these possible preliminary analyses would be an analysis to select a subset of the antidepressant-treated subjects that would be used to estimate onset of action and, if this was a comparative analysis, also select a subset of subjects (either placebo-treated or antidepressant-treated) for the comparative analysis used to estimate onset of action. Finally, there would be the primary analysis of the onset of action.

c) Analysis to determine if the study outcome was valid such that data could be used to assess the onset of action. This analysis would be a comparison of antidepressant-treatment to placebo-treatment, and the study would be deemed valid if antidepressant-treatment was superior to placebo-treatment in an overall improvement in depression. Multiple other alternatives are possible, especially an analysis of a categorical outcome such as “response” or “remission” because a subset of subject data based on such a categorical outcome might be selected for assessment of onset of action in either a comparative or noncomparative analysis. However, the conventional analysis of the validity of the study would be an analysis based on a comparison of central tendency change from baseline to endpoint between antidepressant-treatment and placebo-treatment.

(1) Last observation carried forward ANOVA (or ANCOVA)

(2) Observed case at endpoint (early discontinuations excluded) ANOVA (or ANCOVA)

(3) Repeated measures, mixed model ANOVA (or ANCOVA)
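The difference between options (1) and (2) lies in which endpoint value each subject contributes. A minimal sketch follows, assuming each subject's scores are stored as a list with the baseline first and None for visits missed after early discontinuation; the data are hypothetical HAMD-17 scores. The resulting changes would then be compared between groups by ANOVA or ANCOVA; the mixed model in (3) is not sketched here.

```python
def endpoint_changes(subjects, method):
    """Change from baseline to endpoint for each analyzable subject.

    subjects: list of per-subject score lists; index 0 is baseline (assumed
    always observed) and None marks visits missed after early discontinuation.
    method: "LOCF" carries the last observed score forward to endpoint;
    "OC" keeps only subjects observed at the final visit.
    """
    changes = []
    for scores in subjects:
        baseline = scores[0]
        if method == "LOCF":
            last_observed = [s for s in scores if s is not None][-1]
            changes.append(last_observed - baseline)
        elif method == "OC":
            if scores[-1] is not None:
                changes.append(scores[-1] - baseline)
        else:
            raise ValueError("method must be 'LOCF' or 'OC'")
    return changes


# Hypothetical antidepressant-treated subjects; the second discontinues after visit 1.
drug = [[24, 20, 15, 10], [26, 22, None, None], [22, 18, 14, 12]]
print(endpoint_changes(drug, "LOCF"))  # [-14, -4, -10]
print(endpoint_changes(drug, "OC"))    # [-14, -10]
```

In these hypothetical data, the early dropout contributes a 4-point change under LOCF but nothing under observed case, which is how the two methods can yield different group means from the same study.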

d) Some analyses of onset of action, comparative or noncomparative, would use data for only a subset of antidepressant-treated subjects and, if the analysis was comparative, possibly data for only a subset of the comparator-treated (placebo or antidepressant) group. If only subsets were to be used for estimating onset of action, the analyses to identify them would likely be simply the identification of subjects by an a priori definition. The matters of a comparative or noncomparative primary analysis of time to onset of action and the potential use of only subsets of subjects are discussed as design elements for consideration in 1)a) and 1)b) above.

e) Analysis to estimate the onset of action

(1) Based on a comparison between drug-treatment and placebo-treatment or between two antidepressant-treatment subgroups. This analysis could be noninferential and based on an a priori definition of a meaningful difference in achieving a defined improvement. The definition could be based on a continuous change (e.g., a difference of 3.5 points in HAMD-17 change) or a categorical change (e.g., a difference of X% in the proportion of subjects showing a 3.5-point decrease in HAMD-17). Most would likely favor a comparative analysis that was inferential. Because this analysis would be sequential across observation points, it would be conducted by visit. If the analysis was based on differences in central tendency change and was inferential, there would be no need to define a meaningful magnitude of difference unless such a magnitude were included in the requirement to classify the groups being compared as different.

(2) The analysis attempts to determine when patterns of change over time, where change is measured at discrete points, show initial separation. Such a task might be best addressed by a time series method, and I lack the technical expertise to list or discuss methods.

(3) Observed case at sequential visits ANOVA (or ANCOVA) for a comparison of mean changes

(4) Repeated measures, mixed model ANOVA (or ANCOVA) for comparison of mean changes

(5) If the outcome that was the basis for comparison was a categorical outcome, then a survival analysis could be employed (Kaplan-Meier or a Cox Proportional Hazards Model that would allow for adjustments based on factors considered relevant to the comparison).

f) Definition (magnitude) of endpoint improvement (or difference in improvement if comparing groups) used to select a subset showing improvement such that their data can be used to assess the onset of improvement. This design element is relevant only if data for only a subset of drug-treated subjects (those with substantial improvement) will be used to estimate onset of action. Presumably, this would be some magnitude of change on some symptom severity scale(s). Although the specific choices below implicitly assume that a single severity scale is used, multiple scales could be used in combination, and multiple criteria based on a single scale could also be used. If a scale (or scales) does not have “0” as the lowest possible score for each item and the required magnitude of improvement is based on a percentage change, scores would need to be adjusted so that the lowest possible score of each item is “0”; otherwise, 100% improvement cannot be achieved. Choices below include multiple criteria from a single scale and multiple criteria from multiple scales. Clearly, as more criteria from whatever source are employed, the definition becomes more complex. The definition of required improvement (or difference in improvement at endpoint) requires consideration only under certain circumstances: some analyses that would estimate onset of action (e.g., a time when two groups separate by some magnitude) would not necessarily require subjects or groups of subjects to achieve some absolute magnitude of improvement but only a relative difference.

(1) Percent improvement (cannot be an absolute value because some potential absolute values that could be selected could require achieving negative scores on the symptom severity scale which is logically impossible) in the symptom severity scale score not defining any clinically recognized category of improvement. For example, this could be a 63% improvement in the HAMD-17 score

(2) Percent improvement in the symptom severity scale score that would be generally recognized as indicative of clinically robust improvement with or without necessarily achieving full symptom remission (e.g., “response [50% improvement from baseline]”)

(3) Improvement to an absolute scale score below a specified value that would be generally recognized as clinically meaningful (e.g., “remission [absolute HAMD-17 score ≤6]”)

(4) Multiple criteria from a single scale should be used (e.g., percent improvement and absolute score below a specified value)

(5) Multiple criteria from multiple scales (clinician-administered, self-report, both) should be used
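The floor adjustment described in f), together with options (2) through (4), can be sketched as follows. The defaults assume the HAMD-17 conventions used in the text (lowest possible total of 0, response as at least 50% improvement, remission as a total of 6 or less); the scale with a floor of 10 in the usage lines is hypothetical (e.g., a 10-item scale scored 1 to 7). The function names are illustrative.

```python
def percent_improvement(baseline, current, scale_floor=0):
    """Percent improvement after shifting scores so the lowest possible total is 0."""
    denom = baseline - scale_floor
    if denom <= 0:
        raise ValueError("baseline must exceed the scale floor")
    return 100.0 * (baseline - current) / denom


def classify(baseline, current, scale_floor=0, response_pct=50.0, remission_max=6):
    """Categorical improvement under options (2), (3) and (4) for a single scale."""
    pct = percent_improvement(baseline, current, scale_floor)
    return {
        "response": pct >= response_pct,                           # option (2)
        "remission": current <= remission_max,                     # option (3)
        "both": pct >= response_pct and current <= remission_max,  # option (4)
    }


print(percent_improvement(40, 25, scale_floor=10))  # 50.0
print(percent_improvement(40, 25))                  # 37.5 (unadjusted)
print(classify(24, 10))  # {'response': True, 'remission': False, 'both': False}
```

Without the adjustment, the hypothetical change from 40 to 25 would count as 37.5% improvement rather than 50%, and a subject reaching the scale's lowest possible score could never reach 100%.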

g) Persistence of difference and improvement. Persistence of difference and improvement is an optional factor and is relevant to both the use of differences between groups and the use of individual subject data. If the assessment of time of onset of action is based on a group comparison, the required magnitude of difference between groups may first occur at time x but no longer be present at time x+n1, may not become a consistent difference until time x+n2, and/or may not be maintained at endpoint. If individual subject data are used, a given subject can achieve the magnitude of improvement required for inclusion in the subset of subjects demonstrating required improvement at time x, fail to demonstrate that magnitude at time x+n1, and only at later times show a consistent pattern of improvement, or never show one. When considering group data, the persistence of difference is influenced by which subjects are included in each time point (visitwise) analysis, which depends on the method: last observation carried forward; observed case; or repeated measures mixed model analysis of variance/covariance. If there are substantial incidences of early discontinuation from one or both groups being compared, observed case analysis is especially prone to reduce the magnitude of the difference between groups. Several questions arise from this matter even if group data are compared and the two groups do separate at endpoint (i.e., the study is valid). This matter, and the questions it raises regarding analyses, underscores the need for a study of sufficient length to avoid ambiguity in the results.

(1) If the group data used to estimate onset of action show separation at time x, but that separation is lost at time x+n1 and becomes persistent to the endpoint at time x+n2, what time should be used?:

(a) Time x

(b) Time x+n2

(2) If individual subject data will be used to estimate onset of action and an individual shows the required magnitude of improvement at time x, but that magnitude is lost at time x+n1 and becomes persistent to the endpoint at time x+n2, what time should be used?:

(a) Time x

(b) Time x+n2

(3) Items (1) and (2) above consider the situation where sufficient improvement is achieved and is demonstrated to be persistent, but only at a time later than when it was initially achieved. If group data are used, it is possible that the groups used to estimate the onset of action will separate only at a late time point in the study (say, the next-to-last visit); if individual subject data are used, an individual might achieve the magnitude of improvement sufficient for inclusion in the subset of sufficiently improved subjects only at a late time point. In such cases, the stability of separation (group data) or of improvement (individual data) cannot be assessed. For group data, should some minimum duration of persistent separation be required? (If yes, and separation does not occur early enough to allow the required persistence to be observed, the study will be a failed study.) For individual data, should some minimum duration of persistent improvement be required? (If yes, and improvement does not occur early enough to allow the required persistence to be observed, the status of the individual subject will be indeterminate and the subject’s data will be excluded.)

(a) Yes, require some minimum time of persistent separation or persistent improvement

(b) No, do not require persistence
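Options (a) and (b) in (1) and (2), and the persistence requirement in (3), can be expressed as two small functions over a per-visit sequence of flags. The flags might mark either a group separation at each visit or an individual subject meeting the improvement criterion; the visit weeks, flags and minimum persistence window below are hypothetical, and the function names are illustrative.

```python
def first_onset(weeks, met):
    """Option (a): the first visit at which the criterion is met (time x)."""
    for week, ok in zip(weeks, met):
        if ok:
            return week
    return None


def persistent_onset(weeks, met, min_persist_weeks=0):
    """Option (b): start of the run of visits meeting the criterion through
    endpoint (time x+n2). Returns None if no such run exists or if the run spans
    fewer than min_persist_weeks (the failed/indeterminate case in (3))."""
    start = None
    for week, ok in zip(weeks, met):
        if ok and start is None:
            start = week      # a new run of qualifying visits begins
        elif not ok:
            start = None      # the run is broken; persistence must restart
    if start is None or weeks[-1] - start < min_persist_weeks:
        return None
    return start


weeks = [1, 2, 3, 4, 6, 8]
met = [False, True, False, True, True, True]
print(first_onset(weeks, met))                            # 2
print(persistent_onset(weeks, met))                       # 4
print(persistent_onset(weeks, met, min_persist_weeks=5))  # None
```

With these flags, time x is week 2 and time x+n2 is week 4; requiring at least 5 weeks of persistence makes the result indeterminate because the run from week 4 to week 8 spans only 4 weeks.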

h) Definition (magnitude) of improvement (or difference in improvement if comparing groups) used to estimate the onset of improvement. Some magnitude of improvement (with or without stability) is required to define the onset of action. If the data for a subset of antidepressant-treated subjects showing substantial improvement are used in a noncomparative analysis, or data for two groups are compared where one is required to show substantial improvement, then the magnitude of improvement used to estimate onset of action will presumably be less than the magnitude defining substantial improvement. If a noncomparative analysis is used, this magnitude will be some predefined magnitude (a percentage change, an absolute value of change, or a change to below an absolute value). If a comparative analysis is used, it could be based on a predefined difference between groups in the types of changes listed in the previous sentence, simply on the statistical significance of the difference between groups, or on some combination. However, the statistical significance of any magnitude of difference depends on sample size: if the variance in the two groups does not change appreciably with sample size, a smaller difference between groups becomes statistically significant as the sample size increases. Therefore, if a statistically significant difference is the only requirement to estimate the onset of action, the sample size might influence what the study estimates as the onset of action. Options exist that allow statistical significance to play a part in estimating the onset of action while constraining the influence of sample size on this estimate. Such constraints could include requiring a minimum difference in the absolute value of change, the percentage change, or a change to below an absolute value; requiring some minimum effect size would be another such constraint.
I find it difficult to suggest a reasonably limited set of options for defining the onset of action (improvement) whether basing this on a group comparison (with or without requiring a statistically significant difference between groups) or using a noncomparative analysis. Three broad questions can be asked:

(1) If the onset of action is identified by the separation between groups, should that magnitude be required to be statistically significant (the alternative is some a priori defined magnitude)?

(2) If statistical separation is a criterion in a group comparison analysis, should additional criteria be required that would constrain the potential influence of sample size?

(3) If a group difference without the requirement of a significant difference is used or a noncomparative analysis is used, the magnitude of improvement could be based on an absolute value of change, or a percentage change, a change to below an absolute value or a more complex combination of these changes. The onset of action should be defined based on:

(a) Absolute value improvement

(b) Percentage improvement
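The following Python sketch makes the sample-size concern in h) concrete. It is an illustration, not a recommended analysis, and it assumes a simple two-sample z-test with a common, known standard deviation. The names `two_sided_p` and `separates`, and the example numbers (a 1.5-point mean difference with SD 7), are hypothetical. As the per-group sample size grows, the same fixed difference moves from non-significant to significant. A minimum effect-size floor prevents it from qualifying at any sample size.

```python
from statistics import NormalDist
from typing import Optional


def two_sided_p(mean_diff: float, sd: float, n_per_group: int) -> float:
    """Two-sided p-value for a difference in group means, using a z-test with
    a common, known SD in both groups (a large-sample approximation)."""
    se = sd * (2.0 / n_per_group) ** 0.5  # standard error of the difference
    z = abs(mean_diff) / se
    return 2.0 * (1.0 - NormalDist().cdf(z))


def separates(mean_diff: float, sd: float, n_per_group: int,
              alpha: float = 0.05,
              min_effect_size: Optional[float] = None) -> bool:
    """True if the difference is significant at `alpha` and, when a floor is
    given, the standardized effect size (mean_diff / sd) also reaches it."""
    significant = two_sided_p(mean_diff, sd, n_per_group) < alpha
    if min_effect_size is None:
        return significant
    return significant and abs(mean_diff) / sd >= min_effect_size


if __name__ == "__main__":
    # Hypothetical: 1.5-point mean difference, SD 7 (effect size about 0.21).
    for n in (50, 150, 500):
        print(n, round(two_sided_p(1.5, 7.0, n), 4),
              separates(1.5, 7.0, n),
              separates(1.5, 7.0, n, min_effect_size=0.3))
```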

i) The potential importance of data transformation of baseline and/or post-baseline scale scores. Baseline scale scores on the scale(s) used for determining ultimate improvement status would show a skewed distribution relative to the range of potential scores. In addition, the magnitude of change over time could show distinct and nonlinear patterns in several groups: among all study subjects, and among drug- or placebo-treated subjects showing either substantial improvement or less than the substantial improvement of primary interest. Both the baseline distribution and the potential patterns of change over time raise the question of whether the data would require any transformation to optimize their use, even if no inferential comparison is made between groups to define the onset of action. The range of possibilities in the data is substantial, and optimizing the use of the data under any of this large number of possibilities would require sophisticated statistical planning. The simple question that can be asked is whether the data should be subjected to statistical review before any subsequent analyses (inferential or not) to determine whether any data transformations should be applied.
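One simple form of the statistical review raised in i) is to check the shape of the baseline distribution before any analysis. The Python sketch below computes the moment-based sample skewness and flags a distribution for possible transformation. The cutoff of 1.0 and the function names are hypothetical choices made for illustration. The check does not address nonlinear patterns of change over time.

```python
from statistics import fmean
from typing import Sequence


def sample_skewness(values: Sequence[float]) -> float:
    """Moment-based sample skewness g1 = m3 / m2**1.5 (undefined if all
    values are identical, because m2 is then zero)."""
    m = fmean(values)
    m2 = fmean([(x - m) ** 2 for x in values])  # second central moment
    m3 = fmean([(x - m) ** 3 for x in values])  # third central moment
    return m3 / m2 ** 1.5


def flag_for_transformation(values: Sequence[float], threshold: float = 1.0) -> bool:
    """Hypothetical screening rule: flag if |skewness| exceeds `threshold`."""
    return abs(sample_skewness(values)) > threshold
```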

j) Sample size. Sample size is particularly important in three situations. The first is when a comparative analysis is used to estimate the onset of action and that analysis requires a statistically significant difference. The second is when the antidepressant must show statistically superior efficacy to placebo, in a comparative analysis of antidepressant vs. placebo, before the data are considered valid for estimating the time of onset of action. The third is when the study design requires selecting a subset of antidepressant-treated subjects that show a robust response. All three of these design considerations can affect the choice of sample size. The likelihood that an observed difference, at the endpoint or at earlier time points, will be statistically significant is easily increased by increasing sample size. This assumes, as noted above in 1)f), 2)h) and 2)j), that variance in the treatment groups being compared would not increase substantially as sample sizes increased. Several general choices can be made regarding sample size.

(1) If statistical significance and therefore sample size are not relevant in determining study validity (the drug is effective in the study) or time to onset of action:

(a) Constrain sample size to that which would generally result in a significantly positive outcome in a placebo-controlled, randomized study at the current time, e.g., ~100-150 subjects per treatment group

(b) Use a larger sample size

(2) If the statistical significance of some difference is required in determining study validity (the drug is effective in the study) or time to onset of action:

(a) Constrain sample size to that which would provide reasonable power for detecting the significant differences required as study elements, based on objective difference and variance assumptions

(b) Use a larger sample size
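The option of constraining sample size to what provides reasonable power can be sketched with the usual normal-approximation formula for comparing two means: n per group ≈ 2 × (z(1 - α/2) + z(power))² / d², where d is the standardized difference. The Python function below is a hypothetical helper built on that approximation, with equal group sizes and a common SD. Actual planning would use the study's own difference and variance assumptions.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate subjects per group for a two-sided comparison of two means
    with standardized difference `effect_size` (normal approximation,
    equal group sizes, common SD)."""
    z = NormalDist().inv_cdf
    n = 2.0 * ((z(1.0 - alpha / 2.0) + z(power)) / effect_size) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    for d in (0.5, 0.4, 0.3):
        print(d, n_per_group(d))
```

Smaller assumed differences drive the required sample size up quickly. This is one reason why constraining sample size and choosing the detectable difference are really the same decision.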

k) Subjects included in the analysis, considering data loss due to early discontinuations. Suppose the onset of antidepressant action requires several weeks, and the criterion is robust (e.g., remission), whether that criterion defines a sufficient group difference, assignment to a subset showing sufficient improvement, or assignment to a subset lacking sufficient improvement. Then some proportion of subjects will discontinue the study before they could contribute to group separation or meet the criteria defining sufficient improvement, and in either case before demonstrating stability and persistence of improvement. In a classical efficacy study, these subjects' data would be included in a last-observation-carried-forward analysis, a repeated-measures mixed-model analysis of variance/covariance, or both. For this study, however, these subjects' status is uncertain, and they contribute only noise. The minimum time in study needed to contribute meaningful data will depend on whether group or individual subject data are being used, and on the criteria defining sufficient separation and/or sufficient improvement. Independent of these specific times, should data for subjects who discontinue before the prospectively established minimum time for data to be meaningful be included in or excluded from analyses?

(1) Included

(2) Excluded
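If option (2) is chosen, the exclusion step itself is simple once the minimum time in study has been fixed prospectively. In this Python sketch, the `Subject` record, its fields, and the 28-day minimum used in the test are hypothetical. The sketch also assumes that completers are always retained.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Subject:
    subject_id: str
    days_in_study: int  # days from randomization to last visit
    completed: bool


def analysis_set(subjects: Sequence[Subject], min_days: int,
                 exclude_early: bool = True) -> List[Subject]:
    """Option (1): keep everyone. Option (2): drop subjects who discontinued
    before the prospectively set minimum time in study."""
    if not exclude_early:
        return list(subjects)
    return [s for s in subjects if s.completed or s.days_in_study >= min_days]
```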

3) Elements pertinent to the antidepressant(s) studied.

a) The antidepressant studied. The approved antidepressants have a host of acute pharmacological actions. Some of these actions are shared by multiple antidepressants (e.g., inhibition of serotonin uptake without inhibition of uptake of other neurotransmitters or substantial affinity for neurotransmitter receptors), while others belong to a single antidepressant. Experts would not necessarily agree on the extent to which two agents do or do not share the same acute actions. A further complication in judging the clinical comparability of two agents with at least somewhat similar acute pharmacological actions is the pharmacokinetics of the agents being grouped. Fluoxetine, for example, has an active metabolite, norfluoxetine, that possibly contributes to therapeutic efficacy and that has a long elimination half-life (roughly 4 to 16 days). Therefore, it takes patients a considerable length of time to reach steady-state concentrations of the active antidepressant(s). A second factor compounds this long half-life: fluoxetine is a potent inhibitor of cytochrome P450 2D6 (CYP2D6), the primary enzyme responsible for its own metabolism. Therefore, except in persons who are genetically slow 2D6 metabolizers, fluoxetine progressively converts persons to slow metabolizers and progressively lengthens its own effective half-life. Differing pharmacokinetics might influence time to onset of antidepressant action. For some molecular entities with reasonable evidence of antidepressant activity, there is also reasonable evidence of a more rapid onset of action (e.g., esketamine). This argues against studying multiple antidepressants, even if they share pharmacological activity, and against generalizing conclusions about a specific antidepressant to other antidepressants not studied.
As noted above, even among agents with comparable acute pharmacological actions, differences in pharmacokinetic profiles might differentially influence time to onset of antidepressant action.

(1) Restrict the test antidepressant to a single agent

(2) Restrict the test antidepressant(s) to members of a class with comparable acute pharmacological actions and similar pharmacokinetic profiles

(3) Restrict the test antidepressant(s) to members of a class with comparable acute pharmacological actions, regardless of pharmacokinetic profiles

(4) Allow inclusion of all orally administered agents
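The pharmacokinetic point in a) can be illustrated with the standard result for continuous fixed dosing under linear, first-order elimination: the fraction of steady state reached after time t is 1 - 2^(-t/t_half). This Python sketch assumes linear kinetics. Fluoxetine itself does not strictly follow linear kinetics because of its CYP2D6 inhibition. The half-lives in the example are hypothetical values, chosen only to contrast a short-lived and a long-lived active species.

```python
import math


def fraction_of_steady_state(days_on_drug: float, half_life_days: float) -> float:
    """Fraction of eventual steady-state concentration reached after continuous
    fixed dosing, assuming linear first-order elimination."""
    return 1.0 - 0.5 ** (days_on_drug / half_life_days)


def days_to_fraction(fraction: float, half_life_days: float) -> float:
    """Days of continuous dosing needed to reach `fraction` of steady state."""
    return half_life_days * math.log2(1.0 / (1.0 - fraction))


if __name__ == "__main__":
    # Hypothetical half-lives, chosen only to contrast short and long.
    for t_half in (1.0, 9.0):
        print(f"t1/2 = {t_half:g} d: 90% of steady state after "
              f"{days_to_fraction(0.9, t_half):.1f} days")
```

Under these assumptions, the time to approach steady state scales directly with the half-life. This is why agents with similar acute actions but different kinetics might differ in apparent onset.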

b) Optimization of dose. Several approved antidepressants have what might be thought of as a dose that would be effective in most of the population (e.g., fluoxetine 20 mg/day). Even so, the optimally effective dose might vary in any given sample drawn from the entire population of subjects with depression. For example, consider the one fluoxetine study that directly compared placebo, 5 mg/day and 20 mg/day. Both 5 mg and 20 mg separated from placebo based on mean change on the primary rating scale, and 20 mg was not significantly superior to 5 mg. However, response and remission rates in the 6-week study were numerically greater with 20 mg (suggesting the possibility of greater efficacy with 20 mg). Finally, the rate of discontinuation for adverse events was numerically greater with 5 mg (suggesting no greater safety/tolerability with 5 mg), leading to the conclusion that the 20 mg/day dose is effective and safe (Wernicke, Dunlop, Dornseif et al. 1988). However, the study could be interpreted as suggesting that the optimally effective dose on a large population basis is in the 5-15 mg/day range rather than 20 mg. Furthermore, as noted above, one subset of all persons with depression might demonstrate a greater magnitude of improvement with a given dose of fluoxetine, say 10 mg/day, in a study that evaluated multiple doses. A separate subset in a different randomized study, by contrast, might demonstrate a greater magnitude of improvement with a different dose, say 40 mg/day. This difference in improvement with different doses across subsets of the entire population raises a question. Should the optimal study to assess the onset of action include multiple doses, so that the onset of action (however defined) can be estimated for the empirically established optimal dose within the study population (the dose producing the greatest magnitude of improvement)?
The alternative to multiple doses would be the use of a single dose, generally believed to be the minimally effective (or probably better, the maximally effective) dose of the antidepressant(s) studied to estimate onset of action. For some antidepressants, dose requires individual titration, and this would complicate the use of these antidepressants in the study.

(1) Include multiple doses, even for a drug considered to have a well-established single widely effective dose

(2) Include only a single dose generally recognized as effective

c) Optimization of titration. As with the design element of dose, it might be important to consider titrating to the single dose or multiple doses to be studied. Some antidepressants are considered easy to use because they can be initiated at effective doses without titration for tolerability reasons. Even with these, the possibility exists that, for well-tolerated drugs, a more rapid onset of action could be achieved by starting with an initial loading dose that is then titrated down to the target dose. Should the study design use >1 initiation dose with a titration process for each of the one or more target doses of the antidepressant(s) studied? Even if a single antidepressant is studied, studying multiple titration schemes for each of several target doses would require many treatment arms and many subjects.

(1) Use multiple titration schedules for each dose of each drug

(2) No titration
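The combinatorial cost noted in c) is easy to see with simple arithmetic: crossing drugs, target doses, and titration schemes multiplies the number of active arms. The Python helpers below are hypothetical and assume a single placebo arm with equal allocation across arms. The figure of 150 subjects per arm in the example is only an illustration.

```python
def n_treatment_arms(n_drugs: int, n_doses: int, n_titration_schemes: int,
                     placebo: bool = True) -> int:
    """Active arms = drugs x target doses x titration schemes, plus placebo."""
    return n_drugs * n_doses * n_titration_schemes + (1 if placebo else 0)


def total_subjects(n_drugs: int, n_doses: int, n_titration_schemes: int,
                   subjects_per_arm: int, placebo: bool = True) -> int:
    """Total enrollment assuming equal allocation across all arms."""
    arms = n_treatment_arms(n_drugs, n_doses, n_titration_schemes, placebo)
    return arms * subjects_per_arm


if __name__ == "__main__":
    # One drug, 3 target doses, 2 titration schemes, 150 subjects per arm.
    print(n_treatment_arms(1, 3, 2), total_subjects(1, 3, 2, 150))
```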

4) Elements pertinent to the selection of the population studied.

a) Entry criteria focused on the nature of the depressive episode

(1) Restrict to the first episode

(2) If not restricted to the first episode, restrict to no prior drug treatment

(3) If not restricted to the first episode, restrict to some number of prior episodes

(4) Require prior episodes

(a) With a minimum magnitude of improvement for each prior episode, to be defined

(i) With a specified length of that improvement between episodes, to be defined

(ii) Without a specified length of that improvement

(b) Without the requirement for some minimum magnitude of improvement between symptomatic exacerbations (episodes)

(5) Restrict to some age range for onset of the first depressive episode (e.g., 17 < age of onset of 1st episode < 60)

(6) Restrict the length of the current episode to less than some number of months/years (e.g., <2 years)

(a) Restrict the length of all prior episodes (if allowed) to less than some number of months/years (e.g., <2 years)

(7) Use current edition DSM criteria as diagnosis entry criteria (alternatives are possible)

(8) Exclude certain conceptual subtypes (not necessarily recognized by the latest edition of DSM)

(a) Atypical

(b) Melancholic/endogenous

(c) Seasonal

(d) Psychotic

(e) All depression not melancholic/endogenous

(9) Set a minimum score for the primary symptom severity assessment instrument(s)

(10) Set a maximum score for the primary symptom severity rating instrument(s)

(11) Exclude a broad set of psychiatric comorbidities (e.g., all diagnoses other than Generalized Anxiety Disorder would be exclusionary, current or past)

(12) Exclude only a narrow set of psychiatric comorbidities (e.g., only Substance Use Disorders of at least moderate severity and Psychotic Disorders [other than Psychotic Depression], current or past)

b) Use of biomarkers to exclude subjects who might show greater improvement with placebo. It is debatable whether such biomarkers exist (shortened REM latency is perhaps the one with some potential). If there were credible evidence for the utility of a biomarker(s), should it/they be used to screen subjects?

c) Genetic testing to exclude subjects with a greater likelihood of being intolerant of the antidepressant(s) and/or of metabolizing the antidepressant(s) differently from much of the population. As with biomarkers to exclude subjects prone to placebo response, the use of genetic testing to identify subjects more likely to respond and those more likely to experience intolerable side effects can be debated, perhaps more so for efficacy prediction. However, if there were some degree of confidence in such genetic testing for either or both purposes, should it be used to screen subjects?

d) Age criteria

(1) Restrict to adults (>17 y/o)

(2) Restrict to adults less than some maximum age

(3) No restriction on age

e) Exclusionary non-psychiatric medical comorbidities. There is a range of possibilities from any to none. For many studies evaluating the efficacy of an investigational psychotropic medication, the only such comorbidities that would be exclusionary would be those that were deemed so unstable or life-threatening that a potential subject would be at substantial risk of not being able to complete the study. However, for this study, any condition that might adversely impact either speed of improvement or achieving improvement would add confounding noise to the study. If exclusionary criteria were intended to exclude all comorbidities that might adversely impact improvement and/or speed of improvement, there could be great debate about what medical conditions should be included in the exclusionary list. Three broad categories of excluded comorbidities are described below.

(1) Only medical comorbidities that in the clinical opinion of the investigator would more likely than not prevent the potential subject from completing the study, either due to death, hospitalization, inability to continue receiving study medication or need for treatment with an exclusionary medication

(2) In addition to (1) above, only medical comorbidities that in the clinical opinion of the investigator and based on consensus in medicine could result in depressive symptoms, prevent improvement or alter time to onset of improvement when treated with an otherwise effective antidepressant (e.g., overt hypothyroidism, history of a stroke)

(3) In addition to (1) above, any medical comorbidities that in the clinical opinion of the investigator and based on some medical opinion could result in depressive symptoms or prevent improvement when treated with an otherwise effective antidepressant (e.g., subclinical hypothyroidism, hypogonadism in potential subjects of either sex)

f) Exclusionary concomitant psychotropic medications (including OTCs and supplements). Any psychotropic medication might alter the actual response to an antidepressant or might influence a subject’s responses during the interview process used to complete the symptom severity scale(s). Any such psychotropic substance where the subjective experience of the substance for a substantial proportion of subjects would be positive/euphoric or conversely negative/dysphoric might be particularly likely to influence a subject’s response to the antidepressant or responses during an interview. While avoiding any psychotropic substance might be desirable, the study might be impractical without allowing some use of an anxiolytic/sedative-hypnotic. It is assumed that it would be necessary to exclude any psychotropic medication that, due to either pharmacodynamic or pharmacokinetic interaction with the antidepressant, would put a subject at an increased safety risk.

(1) Allow no other substance with CNS penetrance

(2) Allow no other substance with known CNS activity

(3) Allow only limited use of a single, relatively short-acting anxiolytic/sedative-hypnotic with no active metabolites that also does not influence the pharmacokinetics of the antidepressant(s) used

g) Exclusionary non-psychotropic medications (including OTCs and supplements). If a non-psychotropic medication has CNS activity (e.g., lipophilic beta-blockers), the same considerations might apply as in f) above. It is assumed that it would be necessary to exclude any non-psychotropic medication that due to either pharmacodynamic or pharmacokinetic interaction with the antidepressant would put a subject at an increased safety risk.

(1) Allow no other substance with CNS penetrance

(2) Allow no other substance with known CNS activity

(3) Allow all other substances except those generally considered to have a modest to high likelihood of influencing mood state (e.g., lipophilic beta-blockers, reserpine, high-dose systemic corticosteroids) or of influencing the pharmacokinetics of the antidepressant(s)

(4) Allow all other substances except those generally considered to have a high likelihood of influencing mood state (e.g., reserpine, high-dose systemic corticosteroids) or of influencing the pharmacokinetics of the antidepressant(s).

Additional design elements and/or options suggested by the exchange between Drs. Katz and Klein as well as the other contributors

1) Inferential analytical method for estimating the onset of action: A determination of time in the study at which some magnitude of difference in some parameter of improvement distinguishes between antidepressant-treated subjects that meet criteria for improvement at endpoint and those subjects receiving antidepressant treatment that are in the comparative subset (all other antidepressant-treated subjects or only those showing substantially less improvement than those meeting criteria).

Additional design elements and/or options suggested by the literature reviewed

The literature reviewed suggested no additional elements, or alternatives within elements, beyond those covered above.

Summary of a suggested optimal design

The following design is rather comparable to what was suggested by Leon (2001). That design, however, was primarily intended to determine whether one antidepressant has a faster onset of action than another, whereas this design is intended primarily to determine the onset of action of a single antidepressant. For some design and analysis elements, I suggest more detail. As will be noted, I am uncertain what to recommend for some very important design elements, including which groups are compared to define the onset of action. I remain torn between two comparisons. One compares subjects demonstrating remission on the antidepressant with subjects showing little improvement on the antidepressant. The other compares the same remitters with subjects showing little improvement on placebo.

1) The study would be for a single drug, and each drug of interest would require a study of the same design. If the study were intended to compare the time of onset of action between two drugs, then each drug would need its own treatment arm, as described below for a single antidepressant.

2) Placebo control is required.

3) Some antidepressants lack good evidence that a single dose has reasonable efficacy for a substantial proportion of the general population with depression. If such an antidepressant is studied, any period of allowed titration would be added to the predetermined duration of study at a fixed dose.

4) Not necessary to study multiple, fixed doses (although ideal to do so).

5) Not necessary to study multiple titration schemes for initiating treatment.

6) A positive control would not be used.

7) The number of sites should be kept to the practical minimum.

8) To reduce placebo response, the study would employ one of two designs. The first is a sequential parallel comparison design (SPCD; Fava, Mischoulon, Iosifescu, et al. 2012), with investigative site staff kept blind to the use of the SPCD design and only the data from Stage 2 used. The second is a stealth design.

9) The study would be 16 weeks in length, measured from the Stage 2 randomization in an SPCD design or from the third blinded randomization in a stealth design. This length is suggested because the analysis of time of onset of action will be based on a comparison involving subjects treated with the antidepressant who achieve full remission (as defined below). Three studies (Karp, Scott, Houck et al. 2005; Frank, Kupfer, Perel et al. 1990; Reimherr, Amsterdam, Quitkin et al. 1998) suggest that it can take 14 weeks or more for most of the subjects who will achieve this magnitude of improvement with antidepressant treatment to do so.

10) Visits would occur on days 3, 5 and 7 following the previous visit, beginning with the first visit. In an SPCD design, this pattern would continue through 3 weeks after the Stage 2 randomization. In a stealth design, it would run from the first blinded randomization through 3 weeks after the third blinded randomization. In the SPCD design, there would be 3 visits per week from week 1 throughout Stage 1 (generally 4 to 6 weeks in length) and then for the first 3 weeks of Stage 2. In the stealth design, there would be 3 visits per week from week 2 through week 7. In both designs, the period of 3 visits per week would be followed by 3 weeks of 2 visits per week. There would then be 1 visit per week (every 7 days) for 2 weeks, giving a cumulative total of at least 8 weeks of antidepressant treatment for subjects randomized to an antidepressant in Stage 2 of the SPCD design or at the third randomization in the stealth design. At that point, visit frequency would decrease to 1 visit every 2 weeks until 10 weeks of antidepressant treatment, counted from Stage 2 in the SPCD design or from the third randomization in the stealth design. Visits would then return to 1 visit per week until the end of the study. The increased visit frequency at the end of the study is intended to allow observation of the week-to-week stability of remission. This observation would run from the point at which most subjects are likely to have achieved remission through the end of the study.
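Because the visit schedule in 10) is intricate, it may help to enumerate it. The Python sketch below lists visit days for the 16 weeks following the Stage 2 (or third blinded) randomization, under one reading of the text:

- visits on days 3, 5 and 7 of each of weeks 1-3;
- two visits per week in weeks 4-6;
- weekly visits in weeks 7-8;
- a single visit at week 10;
- weekly visits in weeks 11-16.

The text does not specify the days of the twice-weekly visits, so the choice of days 4 and 7 of each week is an assumption. The function name is hypothetical.

```python
from typing import List, Sequence


def stage2_visit_days(twice_weekly_days: Sequence[int] = (4, 7)) -> List[int]:
    """Visit days counted from the Stage 2 (or third blinded) randomization,
    under one reading of item 10); `twice_weekly_days` is an assumption."""
    days: List[int] = []
    for week in range(1, 4):                      # weeks 1-3: three visits/week
        days += [7 * (week - 1) + d for d in (3, 5, 7)]
    for week in range(4, 7):                      # weeks 4-6: two visits/week
        days += [7 * (week - 1) + d for d in twice_weekly_days]
    days += [7 * 7, 7 * 8]                        # weeks 7-8: weekly
    days += [7 * 10]                              # one visit two weeks later
    days += [7 * week for week in range(11, 17)]  # weeks 11-16: weekly
    return days


if __name__ == "__main__":
    schedule = stage2_visit_days()
    print(len(schedule), schedule)
```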

11) Symptom severity rating will be conducted by a third-party, expert rating service completely blinded to study design and subject visit number.

12) A single severity rating instrument will be used. It will rate all current DSM diagnostic symptoms of Major Depressive Disorder and also include items generally considered relevant to depression severity, as the HAMD-28 does (Reimherr, Amsterdam, Quitkin et al. 1998). I would suggest the use of the HAMD-17*. This is the traditional 17-item HAMD, scored at all visits with either all the positive or all the reverse neurovegetative items, depending on which set yielded the higher score at baseline; if the scores were equal at baseline, the positive items would be used. As an alternative, all 28 items could be used.
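The HAMD-17* item-set rule in 12) can be expressed as a small Python function. The split of the 17 items into "other" items and a neurovegetative subset is hypothetical, as are the function names. The sketch only encodes the rule as stated: choose the neurovegetative set (positive or reverse) with the higher baseline score, prefer the positive set on a tie, and use the chosen set at every visit.

```python
from typing import Sequence


def choose_item_set(baseline_positive: Sequence[int],
                    baseline_reverse: Sequence[int]) -> str:
    """Pick the neurovegetative item set with the higher baseline score;
    ties go to the positive (classic HAMD-17) items."""
    return "reverse" if sum(baseline_reverse) > sum(baseline_positive) else "positive"


def hamd17_star(other_items: Sequence[int], positive_items: Sequence[int],
                reverse_items: Sequence[int], item_set: str) -> int:
    """Total score using the item set fixed at baseline for every visit."""
    chosen = positive_items if item_set == "positive" else reverse_items
    return sum(other_items) + sum(chosen)
```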

13) Plasma concentrations of the antidepressant and all active metabolites would be measured from samples collected in a blinded fashion. Sampling would occur at all visits after the latest visit by which all subjects would be expected to have reached steady-state plasma concentrations. All data would be excluded from any use for an antidepressant-treated subject who met two conditions. First, the subject's plasma concentration of the antidepressant was below the limit of quantification (BLQ) at any visit. Second, the subject's combined concentration of antidepressant and active metabolite(s) was below the 10th percentile (based on all values observed in the study) at >20% of the visits at which values were measured.
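The exclusion rule in 13) can be sketched in Python. The sketch reads the rule literally and requires both conditions: any BLQ parent-drug value, and a combined concentration below the study-wide 10th percentile at more than 20% of measured visits. The `require_both` flag is a hypothetical switch for the alternative reading, in which either condition suffices. The 10th percentile is computed with `statistics.quantiles` using its default method. That choice is an assumption, since other percentile definitions give slightly different cut points.

```python
from statistics import quantiles
from typing import Sequence, Tuple

# One entry per visit with a measurement:
# (parent drug below limit of quantification?, parent + active metabolite conc.)
VisitLevel = Tuple[bool, float]


def tenth_percentile(all_combined: Sequence[float]) -> float:
    """10th percentile of all combined concentrations observed in the study."""
    return quantiles(all_combined, n=10)[0]


def exclude_subject(levels: Sequence[VisitLevel], p10: float,
                    max_fraction_low: float = 0.20,
                    require_both: bool = True) -> bool:
    """Apply the exposure-based exclusion rule to one subject's visits."""
    any_blq = any(blq for blq, _ in levels)
    fraction_low = sum(1 for _, conc in levels if conc < p10) / len(levels)
    often_low = fraction_low > max_fraction_low
    return (any_blq and often_low) if require_both else (any_blq or often_low)
```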

14) For the study to be considered for its primary intent of determining the onset of antidepressant action (for the specific antidepressant studied), the antidepressant would be required to be associated with statistically significantly greater improvement (mean change from baseline) than placebo. The antidepressant-to-placebo comparison would use an MMRM ANCOVA model with baseline score as a covariate, subject as a random effect, and treatment, visit, and site as fixed effects. I would not require a statistically significant difference between the proportion of subjects achieving remission during antidepressant treatment and the proportion achieving remission during placebo treatment, but the lack of a numerically substantial difference would give me concern. If the study were intended to compare the onset of action between two drugs, then each drug would be required to be superior to placebo, with appropriate adjustment for the multiple comparisons to placebo.

15) I am uncertain what comparison should be made to estimate the time to onset of action. I believe it should be a comparison involving subjects who achieved remission on the antidepressant. I am uncertain, however, whether the comparison group should be the full set of placebo-treated subjects, a subset of them, or a subset of the antidepressant-treated subjects. I am inclined to favor an alternative subset of antidepressant-treated subjects, but not the entire subset remaining after selecting those achieving remission. I would favor, as the comparison group for the subjects showing remission, the subset of antidepressant-treated subjects who did not improve by ≥50% (the traditional definition of a Responder).

16) I would define remission as a HAMD-17* ≤6 maintained during weeks 14-16 (last 3 weeks) of the study.
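Items 15) and 16) together define the two groups to be compared. The Python sketch below classifies an antidepressant-treated subject as a remitter (HAMD-17* ≤6 at a visit in each of weeks 14, 15 and 16) or a non-responder (less than 50% improvement from baseline). The text does not state when non-response is evaluated or how many visits are needed in the last three weeks. Evaluating non-response at the final visit, and requiring one visit in each of those weeks, are therefore assumptions. The function names are hypothetical.

```python
from typing import Sequence, Tuple

Visit = Tuple[int, int]  # (study week, HAMD-17* score)


def is_remitter(visits: Sequence[Visit], cutoff: int = 6,
                weeks: Tuple[int, ...] = (14, 15, 16)) -> bool:
    """HAMD-17* <= cutoff at every visit in the final weeks, with at least
    one visit in each of those weeks."""
    window = [(w, s) for w, s in visits if w in weeks]
    return ({w for w, _ in window} == set(weeks)
            and all(s <= cutoff for _, s in window))


def classify(baseline: int, visits: Sequence[Visit]) -> str:
    """Return "remitter", "non-responder" (<50% improvement) or "other"."""
    if is_remitter(visits):
        return "remitter"
    endpoint = visits[-1][1]  # assumes visits are in time order
    improvement = (baseline - endpoint) / baseline
    return "non-responder" if improvement < 0.50 else "other"
```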

17) Concerning the statistical method used to compare the two groups to estimate the time of onset of action, I would require an inferential analysis. The between-group difference that estimates the onset of action would have to be statistically significant by the conventional definition of α=0.05. The choice of the optimal specific statistical method for that inferential analysis is beyond my level of statistical training to recommend. The analysis would determine the time at which the two groups being compared separated significantly in the magnitude of improvement. The separation could be based on the difference in mean improvement, the difference in the proportions of subjects reaching a predefined magnitude of improvement, or the ratio of those proportions. If the analysis were based on proportions (either their difference or their ratio), how to define that magnitude of improvement is another design element for which I have no specific recommendation. This magnitude could be a specific value defined prospectively, or a method for deriving it from the observed study data could be developed prospectively.

18) In estimating the onset of antidepressant action, it would be desirable to include not only the criterion of statistical separation but also the criterion of a minimal effect size. A minimal effect size serves to prevent large sample sizes from having an undesirable influence on the estimation of onset of action. The comparison of groups could be based on the difference in mean change, the difference in proportions, or the ratio of proportions. Determining this effect size, and its method of computation for whichever comparison is used, would require consensus building, and I have no a priori suggestions. However, if the dual criterion of significant difference and minimum effect size were required to consider the study capable of defining the onset of action, a very expensive study could fail due to inadequate effect size. I do not suggest requiring a minimum effect size for the difference in efficacy between antidepressant and placebo when testing whether the antidepressant has demonstrated efficacy within the total randomized study population, but this could be considered as a requirement.
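The following Python sketch illustrates one possible approach to 17) and 18). It is explicitly not a recommendation of the optimal method. At each visit, it compares the proportions of remitters and non-responders who have reached some predefined magnitude of improvement, using a pooled two-proportion z-test. Onset is taken as the first visit at which the remitters are ahead with p < 0.05 and, optionally, Cohen's h reaches a minimum. The visit counts in the test are invented, and the function names are hypothetical. The sketch makes no adjustment for testing at many visits, and it does not require the separation to persist.

```python
import math
from statistics import NormalDist
from typing import Optional, Sequence, Tuple

# (visit label, remitters meeting criterion, remitters assessed,
#  non-responders meeting criterion, non-responders assessed)
VisitCounts = Tuple[str, int, int, int, int]


def two_proportion_p(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided p-value from the pooled two-proportion z-test."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        return 1.0
    z = abs(x1 / n1 - x2 / n2) / se
    return 2.0 * (1.0 - NormalDist().cdf(z))


def cohens_h(p1: float, p2: float) -> float:
    """Effect size for a difference between two proportions."""
    return 2.0 * math.asin(math.sqrt(p1)) - 2.0 * math.asin(math.sqrt(p2))


def onset_visit(visits: Sequence[VisitCounts], alpha: float = 0.05,
                min_h: Optional[float] = None) -> Optional[str]:
    """First visit where remitters are ahead and the difference is significant
    (and, if given, h >= min_h). No multiplicity adjustment is made."""
    for label, x1, n1, x2, n2 in visits:
        p1, p2 = x1 / n1, x2 / n2
        if p1 <= p2 or two_proportion_p(x1, n1, x2, n2) >= alpha:
            continue
        if min_h is not None and cohens_h(p1, p2) < min_h:
            continue
        return label
    return None
```

Adding `min_h` can only delay or remove the estimated onset. It never brings the onset earlier, which is the constraining behavior item 18) is after.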

19) If the primary comparison is the one I suggested above, the sample sizes of interest are the number of subjects who remit on the antidepressant and the number who fail to respond to it. Based on the data of Karp, Scott, Houck et al. (2005), Frank, Kupfer, Perel et al. (1990) and Reimherr, Amsterdam, Quitkin et al. (1998), it is probably reasonable to expect a remission proportion of ~50% with an antidepressant, with perhaps ~35% being non-responders. As discussed above, there is a substantial probability that, as the sample sizes for the primary analysis increased, the analysis would become more sensitive to smaller differences in mean change, differences in proportions, or ratios of proportions. This would have a major impact on the estimated onset of action. I would suggest that the number of subjects achieving remission be in the range of 125 to 150, no fewer and no more. This number of remitters would require 250 to 300 subjects meeting criteria for inclusion in the analysis. These would be all subjects assigned to an antidepressant in Stage 2 of the SPCD design, or all subjects not improving by ≥25% from baseline to randomization in the stealth design. I have no robust empirical basis for suggesting the number of subjects who should begin Stage 1 of the SPCD design or be randomized at the first randomization point in the stealth design. I believe, however, that at least twice as many subjects would need to be assigned to antidepressant treatment. Therefore, the total would be in the range of 500 to 600 subjects each for antidepressant and placebo treatment (1000 to 1200 total subjects randomized at the first randomization in either design). The sample sizes for the suggested primary comparison (antidepressant remitters vs. antidepressant non-responders) will not be balanced, as either design is likely to yield ~50% remission and ~35% non-response with the antidepressant.
The proportion of antidepressant-treated subjects estimated to achieve remission might be overly optimistic, based on treatment Level 1 results (Karp, Scott, Houck et al. 2005). If one believed that the proportion of remissions in this hypothetical study would be more in line with STAR*D results (Sinyor, Schaffer and Levitt 2010), the initial sample sizes would need to be increased accordingly.
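The arithmetic in 19) can be reproduced with the Python sketch below, which can also be re-run under a lower assumed remission rate. Two inputs follow the text: the factor of 2 between the analyzable antidepressant group and the antidepressant assignments at the first randomization, and a placebo arm of equal size. The function name is hypothetical. Any lower remission rate used (for example, to reflect STAR*D-like results) would be the user's own assumption.

```python
import math


def planned_enrollment(target_remitters: int, remission_rate: float = 0.50,
                       nonresponse_rate: float = 0.35,
                       multiplier_to_first_randomization: float = 2.0) -> dict:
    """Back-calculate enrollment from a target number of remitters."""
    analyzable = math.ceil(target_remitters / remission_rate)
    per_arm = math.ceil(analyzable * multiplier_to_first_randomization)
    return {
        "analyzable_antidepressant": analyzable,
        "expected_nonresponders": analyzable * nonresponse_rate,
        "antidepressant_arm_at_first_randomization": per_arm,
        "total_randomized_with_equal_placebo_arm": 2 * per_arm,
    }


if __name__ == "__main__":
    for target in (125, 150):
        print(target, planned_enrollment(target))
```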

20) Entry inclusion criteria:

a. Recurrent Major Depressive Episode, with complete remission of all prior episodes and no prior episode meeting criteria for a chronic episode. All prior remissions could have occurred with or without treatment. The requirements of recurrence, remission, and no history of a chronic episode are intended to increase confidence in the diagnosis of MDD and in the absence of potentially confounding psychopathology or psychosocial circumstances. A Bipolar Disorder II diagnosis would not be exclusionary.

b. HAMD-17* ≥20 at screening visit and the visit of the first randomization.

c. Age: 20-62
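The inclusion criteria above can be combined into a single screening predicate. This is a minimal sketch only. `ScreeningRecord`, its field names and `meets_inclusion_criteria` are hypothetical, and the 20-62 age range is assumed to be inclusive. Clinical judgments such as remission and chronicity are reduced to booleans that a rater would have to supply. The Bipolar Disorder II provision is not modeled because it does not change eligibility under these criteria.

```python
from dataclasses import dataclass

@dataclass
class ScreeningRecord:
    """Hypothetical fields summarizing a candidate's history and ratings."""
    age: int
    prior_episodes: int                 # MDD episodes before the current one
    all_prior_episodes_remitted: bool   # complete remission, with or without treatment
    any_prior_chronic_episode: bool
    hamd17_screening: int
    hamd17_first_randomization: int

def meets_inclusion_criteria(r: ScreeningRecord) -> bool:
    # a. Recurrent episode, all prior episodes fully remitted, none chronic
    recurrent = r.prior_episodes >= 1
    history_ok = r.all_prior_episodes_remitted and not r.any_prior_chronic_episode
    # b. HAMD-17 >= 20 at BOTH the screening visit and the first randomization
    severity_ok = r.hamd17_screening >= 20 and r.hamd17_first_randomization >= 20
    # c. Age range, assumed inclusive at both ends
    age_ok = 20 <= r.age <= 62
    return recurrent and history_ok and severity_ok and age_ok

candidate = ScreeningRecord(age=45, prior_episodes=2, all_prior_episodes_remitted=True,
                            any_prior_chronic_episode=False, hamd17_screening=22,
                            hamd17_first_randomization=19)
print(meets_inclusion_criteria(candidate))  # False: HAMD-17 fell below 20 before randomization
```

Requiring the severity threshold at both visits, rather than at screening alone, excludes subjects who improve before the first randomization.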

21) Entry exclusion criteria:

a. Current MDD episode with psychotic features

b. Suicidal ideation and intent to commit suicide of such magnitude that urgent treatment with ECT (or ketamine) would be the treatment of choice. The need for in-patient treatment alone would not be exclusionary.

c. Any past or current Axis I psychiatric diagnosis other than MDD. A Bipolar Disorder II diagnosis would not be exclusionary.

d. Any current non-psychiatric medical condition that would likely necessitate hospitalization during the study, likely make it difficult for a subject to complete the study, or require treatments that would be exclusionary, or for which there is a reasonable consensus of medical opinion that the condition could contribute to depressive symptoms or alter the effects of an antidepressant on the depressive symptoms.

e. Ongoing treatment with, or potential need for treatment with, a medication that would affect the pharmacokinetics of the antidepressant, or for which there is a reasonable consensus of medical opinion that the medication could contribute to depressive symptoms or alter the effects of an antidepressant on the depressive symptoms.

What I hope that I have done is to build a convincing case that: 1) any estimation of onset of action would be highly dependent on the experimental means used to make that estimation; and 2) there are many, many things to consider, and on which to reach agreement, before a study could be undertaken that would convince the substantial majority of the interested scientific community. Some might consider my lengthy list of design elements, and the lists of alternatives for each design element, to stretch credibility and to inflate unrealistically the complexity of designing a quality study to estimate the onset of action of an antidepressant. I acknowledge that some of the alternatives for various design elements would likely make for a poor-quality study, but there are reasons to give some consideration to these alternatives. For example, basing the estimation of the onset of action on non-comparative data from a subset of antidepressant-treated subjects would, I think, make for poor-quality work. However, I included this alternative because I believe that an estimate relying on a simplistic statistically significant separation between two comparative groups could be manipulated by altering sample sizes. With regard to the design elements themselves, I believe that they would all merit careful consideration and that robust consensus on the best alternative for some of these elements might be difficult to achieve.
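A simple calculation illustrates the concern that statistically significant separation can be moved by sample size alone. The following is a sketch under stated assumptions. The two proportions (0.30 vs. 0.25 improved at some early visit) are hypothetical. The pooled two-proportion z-test is chosen only as a familiar example and is not a test proposed in this commentary. `two_proportion_p_value` is an illustrative helper.

```python
from statistics import NormalDist

def two_proportion_p_value(p1, p2, n_per_group):
    """Two-sided p-value for a pooled two-proportion z-test with equal group sizes."""
    pooled = (p1 + p2) / 2  # with equal n, the pooled proportion is the simple mean
    se = (pooled * (1 - pooled) * (2 / n_per_group)) ** 0.5
    z = (p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical early-visit proportions, held fixed while only n changes
P_DRUG, P_COMPARATOR = 0.30, 0.25
for n in (50, 200, 800, 1600):
    p = two_proportion_p_value(P_DRUG, P_COMPARATOR, n)
    print(f"n per group = {n:4d}: p = {p:.4f}, 'significant' at 0.05: {p < 0.05}")
```

With the underlying difference held constant, the two-sided p-value falls from roughly 0.58 at 50 subjects per group to roughly 0.025 at 800. The visit at which "onset" is declared could therefore shift simply because more subjects were enrolled.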

References:

Doros G, Pencina M, Rybin A, Meisner A, Fava M. A repeated measures model for analysis of continuous outcome in sequential parallel comparison design studies. Stat Med 2013; 2767-2789.

Fava M, Mischoulon D, Iosifescu D, Witte J, Pencina M, Flynn M, Harper L, Levy M, Rickels K, Pollack M. A double-blind, placebo-controlled study of aripiprazole adjunctive to antidepressant therapy among depressed outpatients with inadequate response to prior antidepressant therapy (ADAPT-A Study). Psychother Psychosom 2012; 81:87-97.

Frank E, Kupfer DJ, Perel JM, Cornes C, Jarrett DB, Mallinger AG, Thase ME, McEachran AB, Grochocinski VJ. Three-year outcomes for maintenance therapies in recurrent depression. Arch Gen Psychiatry 1990; 47:1093-1099.

Karp JF, Scott J, Houck P, Reynolds CF III, Kupfer DJ, Frank E. Pain predicts longer time to remission during treatment of recurrent depression. J Clin Psychiatry 2005; 66:591-597.

Katz MM, Koslow SH, Frazer A. Onset of antidepressant activity: reexamining the structure of depression and multiple action of drugs. Depression and Anxiety 1996/1997; 4:257-267.

Laska EM, Siegel C. Characterizing onset in psychopharmacological clinical trials. Psychopharmacol Bull 1995; 31:29-35.

Leon AC. Measuring onset of antidepressant action in clinical trials: an overview of definitions and methodology. J Clin Psychiatry 2001a; 62(suppl 4):12-16.

Leon AC, Blier P, Culpepper L, Gorman JM, Hirschfeld RMA, Nierenberg AA, Roose SP, Rosenbaum JF, Stahl SM, Trivedi MH. An ideal trial to test the differential onset of an antidepressant effect. J Clin Psychiatry 2001b; 62(suppl 4):34-36.

Reimherr F, Amsterdam J, Quitkin F, Rosenbaum J, Fava M, Zajecka J, Beasley C, Michelson D, Roback P, Sundell K. Optimal length of continuation therapy in depression: a prospective assessment during long-term fluoxetine treatment. Am J Psychiatry 1998; 155:1247-1253.

Sinyor M, Schaffer A, Levitt A. The sequenced treatment alternatives to relieve depression (STAR*D): a review. Can J Psychiatry 2010; 55:126-135.

Stassen HH, Delini-Stula A, Angst J. Time course of improvement under antidepressant treatment: a survival-analytical approach. Eur Neuropsychopharmacol 1993; 3:127-135.

Wernicke JF, Dunlop SR, Dornseif BE, Bosomworth JC, Humbert M. Low-dose fluoxetine therapy for depression. Psychopharmacol Bull 1988; 24:183-188.

July 18, 2019