Charles M. Beasley, Jr and Roy Tamura: What We Know and Do Not Know by Conventional Statistical Standards About Whether a Drug Does or Does Not Cause a Specific Side Effect (Adverse Drug Reaction)
A Postscript
In considering various comments in response to our work and thinking a bit more about its contents, we concluded that our position regarding the labeling of AEs might not be sufficiently clear. This postscript is an effort to remedy this potential fault.
The position we want to clarify concerns whether an adverse event (AE) observed in temporal association with administration of a medication, which might or might not be an adverse drug reaction (ADR), should ever be included in the adverse safety findings section of product labeling when the “proof” that the AE is an ADR meets a lesser standard than the “proof” required to receive an efficacy claim. (“Adverse safety findings section” is not a formal section title but a conceptual description applicable to product labeling formats and section titles in all regulatory venues.) In US Food and Drug Administration Full Prescribing Information, such sections would include Warnings and Precautions and Adverse Reactions, and potentially Use in Specific Populations, Clinical Pharmacology, and Patient Counseling Information.
Much of our work was intended to make clear that many AEs observed in temporal association with administration of a medication, and that might be included in product labeling, lack “proof” of being ADRs as robust as the “proof” required to obtain regulatory approval for an efficacy claim. We went to considerable lengths to demonstrate this from a statistical perspective. A reader might therefore easily conclude that our position is that an AE should not be included in prescribing information (the US Food and Drug Administration term for product labeling) unless the “proof” that it is an ADR is comparable to that required for an efficacy claim.
Our primary intent was to broaden understanding of the statistical realities that pertain to the infrequent-to-rare AEs found in product labels and to the judgment of whether they are ADRs. Our concern is that many people who read a product label might believe that any AE included in it has been conclusively proven to be an ADR. Robust proof that AEs included in product labeling are ADRs is sometimes lacking, and everyone who reads product labeling (or derivatives of product labeling, which can be found on many internet sites, including subscription medical services) and uses it for any purpose should clearly understand what has and has not been robustly proven.
We unequivocally believe that some AEs observed in temporal association with a medication are of such clinical significance, with respect to their actual or potential outcome, that they should be included in product labeling even when only a modest (in some cases very modest) amount of evidence suggests that they are ADRs for a given drug. We also believe that, when such an AE appears in a medication’s product labeling, the labeling should offer the reader guidance regarding the magnitude and quality of the evidence supporting the hypothesis that the AE is an ADR. Such information is particularly important when the magnitude of the evidence is minimal and its quality is marginal, and in such cases the rationale for including the AE in labeling should be briefly explained.
The following example, intended to illustrate our position, deals with a medication marketed in several international regulatory venues for a non-psychiatric indication.
During the medication’s development, AEs were observed that could be grouped clinically along a spectrum of clinical severity and severity of outcome (analogous to, though not necessarily the same as, the spectrum of erythema multiforme, Stevens-Johnson syndrome, and toxic epidermal necrolysis). In at least one international regulatory venue, these AEs are described over several paragraphs in the sections of the label intended for the more clinically serious AEs that are possible ADRs.
Multiple placebo-controlled trials in several indications had been completed and analyzed before the regulatory submission for review and potential approval. These trials extended well beyond the standard length of 6-8 weeks for placebo-controlled Phase 3 studies in psychiatric disorders (such as Major Depressive Disorder; Schizophrenia; Generalized Anxiety Disorder; and Bipolar I Disorder, Manic Episode). More than 3,400 subjects were included in the placebo-controlled phases of these studies, and the trials also included open-label extension phases in which only the active medication was given. By the time of the most recent analyses of this database, one additional trial had been completed beyond the set reviewed by regulators for potential approval. For thoroughness, the analyses compared incidence differences, incidence ratios, and odds ratios for active medication versus placebo. Multiple non-exact (e.g., chi-square) and exact plus bootstrap inferential methods were used to provide sensitivity analyses of the observations.
The incidence of the combined events was approximately 1.25% with the medication and approximately 0.75% with placebo. In one set of analyses, all AEs in the spectrum were combined, and all studies across all indications were pooled. Depending on the inferential method, the exact plus bootstrap methods that provided p-values yielded p-values ranging from 0.2022 to 0.2264. The methods that provided only confidence intervals (CIs) yielded 95% CIs ranging from (-0.0117, 0.0017) to (-0.0117, 0.0030) for comparisons of differences and from (0.30, 1.29) to (0.32, 1.70) for comparisons of ratios.
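To make the quantities named above concrete, the following minimal Python sketch computes, for a single pooled 2x2 table, the incidence difference, incidence ratio, and odds ratio with Wald-type 95% CIs, a Pearson chi-square p-value, a two-sided Fisher exact p-value, and a percentile bootstrap CI for the difference. The counts (21 of 1,700 on medication, 13 of 1,700 on placebo) are hypothetical, chosen only to approximate the reported incidences of about 1.25% and 0.75%; they are not the trial data, and the function names are illustrative. The exact and bootstrap procedures actually used are not specified, so these are common textbook choices rather than a reproduction. The sketch compares medication relative to placebo; reversing the direction flips the sign of the difference and inverts the ratios.

```python
import math
import random


def two_by_two_summary(a, n1, c, n0, z=1.959964):
    """Compare a/n1 events on medication with c/n0 events on placebo.

    Returns (estimate, lower, upper) Wald 95% CIs for the incidence
    difference (medication minus placebo), incidence ratio and odds ratio.
    Assumes every cell of the 2x2 table is non-zero.
    """
    b, d = n1 - a, n0 - c
    p1, p0 = a / n1, c / n0
    rd = p1 - p0
    se_rd = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    rr = p1 / p0
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)  # log scale
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # log scale
    return {
        "difference": (rd, rd - z * se_rd, rd + z * se_rd),
        "ratio": (rr, rr * math.exp(-z * se_log_rr), rr * math.exp(z * se_log_rr)),
        "odds_ratio": (odds_ratio, odds_ratio * math.exp(-z * se_log_or),
                       odds_ratio * math.exp(z * se_log_or)),
    }


def chi_square_p(a, n1, c, n0):
    """Pearson chi-square (1 df, no continuity correction) p-value."""
    b, d, n = n1 - a, n0 - c, n1 + n0
    x2 = n * (a * d - b * c) ** 2 / (n1 * n0 * (a + c) * (b + d))
    return math.erfc(math.sqrt(x2 / 2))  # upper tail of chi-square with 1 df


def fisher_exact_p(a, n1, c, n0):
    """Two-sided Fisher exact p-value: sum over tables no more likely than observed."""
    k, n = a + c, n1 + n0  # total events and subjects (margins held fixed)

    def prob(x):  # hypergeometric probability of x events in the medication arm
        return math.comb(n1, x) * math.comb(n0, k - x) / math.comb(n, k)

    p_obs = prob(a)
    probs = [prob(x) for x in range(max(0, k - n0), min(k, n1) + 1)]
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def bootstrap_difference_ci(a, n1, c, n0, reps=1000, seed=1):
    """Percentile bootstrap 95% CI for the incidence difference.

    Resampling subjects with replacement within an arm is equivalent to
    drawing the event count from Binomial(n, observed incidence).
    """
    rng = random.Random(seed)
    p1, p0 = a / n1, c / n0
    diffs = []
    for _ in range(reps):
        a_star = sum(rng.random() < p1 for _ in range(n1))
        c_star = sum(rng.random() < p0 for _ in range(n0))
        diffs.append(a_star / n1 - c_star / n0)
    diffs.sort()
    return diffs[int(0.025 * reps)], diffs[int(0.975 * reps) - 1]


if __name__ == "__main__":
    # HYPOTHETICAL counts approximating ~1.25% vs ~0.75%; NOT the trial data.
    counts = (21, 1700, 13, 1700)
    for name, (est, lo, hi) in two_by_two_summary(*counts).items():
        print(f"{name}: {est:.4f} (95% CI {lo:.4f} to {hi:.4f})")
    print(f"chi-square p = {chi_square_p(*counts):.4f}")
    print(f"Fisher exact p = {fisher_exact_p(*counts):.4f}")
    lo, hi = bootstrap_difference_ci(*counts)
    print(f"bootstrap difference CI: {lo:.4f} to {hi:.4f}")
```

Different exact and resampling procedures give somewhat different p-values and limits for the same table, which is why several methods were used as sensitivity analyses.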
In one of the indications, the incidence difference and ratios suggested a slightly larger disparity between drug and placebo in observations of these AEs. Within this indication, the exact methods that provided p-values yielded p-values ranging from 0.3063 to 0.4450. The methods that provided only CIs yielded 95% CIs ranging from (-0.0174, 0.0042) to (-0.0181, 0.0072) for comparisons of differences and from (0.20, 1.62) to (0.23, 3.65) for comparisons of ratios. Although the imbalance in incidence was greater in this indication than in the combined indications, its p-values were larger and its 95% CIs wider, because the sample size for the single indication was smaller than for the combined indications and the disparity between incidences remained quite modest.
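The explanation above, that a larger imbalance in a smaller subset can still yield larger p-values and wider intervals, can be illustrated with a minimal Python sketch. It compares two hypothetical 2x2 tables (not the trial data): a pooled table of 21/1,700 versus 13/1,700 and a single-indication table of 7/400 versus 3/400. The function name is illustrative, and the Wald interval and uncorrected chi-square test are simple textbook choices rather than the methods used in the analyses; with expected counts this small, an exact method would ordinarily be preferred.

```python
import math


def rd_with_ci_and_p(a, n1, c, n0, z=1.959964):
    """Incidence difference (medication minus placebo), Wald 95% CI, chi-square p."""
    p1, p0 = a / n1, c / n0
    rd = p1 - p0
    se = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    b, d, n = n1 - a, n0 - c, n1 + n0
    x2 = n * (a * d - b * c) ** 2 / (n1 * n0 * (a + c) * (b + d))
    return rd, rd - z * se, rd + z * se, math.erfc(math.sqrt(x2 / 2))


# HYPOTHETICAL tables (events, subjects) per arm; NOT the trial data.
tables = {
    "all indications pooled": (21, 1700, 13, 1700),
    "single smaller indication": (7, 400, 3, 400),
}
for label, counts in tables.items():
    rd, lo, hi, p = rd_with_ci_and_p(*counts)
    print(f"{label}: difference {rd:.4f}, 95% CI ({lo:.4f}, {hi:.4f}), "
          f"width {hi - lo:.4f}, p {p:.3f}")
```

With these counts the smaller table shows roughly twice the incidence difference of the pooled table, yet its interval is more than twice as wide and its p-value is larger, because the standard error shrinks only with the square root of the sample size.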
Most notably, however, for the AE of greatest clinical severity in this continuum, which is also the most easily confirmed as an AE, all of the exceedingly few cases occurred during placebo treatment within the indication with the greatest disparity between medication and placebo.
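The observation that all of the most severe cases occurred with placebo is itself subject to the same small-number limitation. The following minimal Python sketch gives the chance that all k cases land on placebo if the drug has no effect, assuming 1:1 allocation and treating each case's arm as an independent draw. Both are assumptions: the actual allocation is not stated, and the number of cases is described only as “exceedingly small,” so k is left as a parameter. The function names are hypothetical.

```python
from fractions import Fraction


def prob_all_on_placebo(k, placebo_share=Fraction(1, 2)):
    """Chance that all k cases fall in the placebo arm if the drug has no effect.

    Treats each case's arm as an independent draw with P(placebo) equal to
    the placebo share of exposure (a binomial approximation).
    """
    return placebo_share ** k


def two_sided_p_all_in_one_arm(k):
    """Two-sided exact (sign-test) p-value for a k-to-0 split under 1:1 allocation."""
    return min(Fraction(1), 2 * Fraction(1, 2) ** k)


for k in range(1, 8):
    print(f"k={k}: one-sided {float(prob_all_on_placebo(k)):.4f}, "
          f"two-sided {float(two_sided_p_all_in_one_arm(k)):.4f}")
```

Under these assumptions, a k-to-0 split has a two-sided p-value of 1 for k = 1, 0.5 for k = 2, and 0.25 for k = 3, and it first falls below 0.05 at k = 6, so an all-placebo split of a very few cases is not conventionally significant in either direction.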
The one trial whose data were not available at submission did not appear to alter the incidence with medication compared to placebo. These analyses were based on a simple pooling of all available studies. Across the several indications (all of which were approved), the difference in incidence of the analyzed AEs between medication and placebo varied; however, the conventional interpretation of the inferential results would be the same across indications.
By conventional statistical standards, the observed differences would be interpreted as most likely due to chance rather than to a drug effect. The most severe outcome in the spectrum was associated exclusively with placebo, whereas there was a slight excess incidence of the least severe AE with medication. During the open-label, medication-only extension phases, additional AEs in this continuum were reported with medication.
A non-inferiority analysis (which could potentially “prove” the absence of a drug effect) was not conducted with these data for two reasons: there is no well-agreed-upon margin of excess incidence with a drug, for this AE spectrum or for its least clinically significant specific AE, that would still allow a conclusion of non-inferiority; and a slight excess incidence with medication was observed.
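Why the missing margin matters can be seen in a minimal Python sketch of the usual non-inferiority logic for a safety endpoint: the drug is declared non-inferior (no excess beyond the margin) if the upper limit of the 95% CI for the incidence difference lies below an assumed margin. The counts (21/1,700 versus 13/1,700) and the margins tried are hypothetical, the Wald interval is a simple textbook choice, and the function names are illustrative.

```python
import math


def rd_upper_bound(a, n1, c, n0, z=1.959964):
    """Upper limit of the two-sided 95% Wald CI for medication minus placebo incidence."""
    p1, p0 = a / n1, c / n0
    se = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    return (p1 - p0) + z * se


def non_inferior(a, n1, c, n0, margin):
    """True if the excess incidence with medication is ruled out beyond `margin`."""
    return rd_upper_bound(a, n1, c, n0) < margin


counts = (21, 1700, 13, 1700)  # HYPOTHETICAL counts; NOT the trial data
for margin in (0.005, 0.010, 0.015, 0.020):  # HYPOTHETICAL margins
    print(f"margin {margin:.3f}: non-inferior = {non_inferior(*counts, margin)}")
```

With these hypothetical counts the upper limit is about 0.011, so the same data would fail a 0.010 margin yet pass a 0.015 margin; the conclusion is driven by the margin chosen.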
We believe it would be appropriate for product labeling to describe this spectrum of AEs as events to which the prescriber should be alert, and to state that there was a numerical excess with drug of the least serious manifestation (still an important and potentially serious AE) and that all of the extremely few cases of the most ominous manifestation occurred with placebo. Labeling should also include some quantification of the likelihood of the observed data being due to chance or to the drug, because these observations would customarily be viewed as due to chance.
In the product labeling for this medication in the regulatory venue we are discussing, it was acknowledged that all occurrences of the AE with the most severe outcome in this spectrum were during placebo treatment. In this regulatory venue, all AEs (with or without robust “proof” of being ADRs) are described under the English-language label “Adverse Drug Reactions,” and this label for events included in product labeling (be they AEs or AEs with robust proof of being ADRs) is found across multiple regulatory venues. Describing an AE or set of AEs with only the limited degree of proof of being ADRs seen in the example above as an “Adverse Drug Reaction” potentially conveys implications that the data do not support. Some AEs are of such clinical significance that they should be included in product labeling if there is even the slightest excess with a drug compared to placebo, or other, softer evidence (evidence from sources other than controlled clinical trials) that these AEs may be ADRs. However, when the evidence is weak and the AE is described out of an abundance of caution, the labeling should clearly state that, by conventional interpretive standards, it cannot be concluded that the AE is an ADR.
Product labeling needs to serve the intent of “first do no harm” and also needs to help the prescriber accurately understand the degree of evidence supporting the potential for doing harm by prescribing a given medication. Failing to treat a patient with a medication because of a potentially inaccurate understanding of the probability that the medication will do harm is itself potentially harmful. The evidence bearing on that probability can also grow over time.
October 24, 2019