Why Are So Many Epidemiology Associations Inflated or Wrong? Does Poorly Conducted Animal Research Suggest Implausible Hypotheses?
There is growing concern among epidemiologists that most discovered associations are either inflated or false. The reasons for this concern have focused on methodological issues in the conduct and publication of epidemiologic research. This commentary suggests that another reason for discrepant findings may be that animal research is producing implausible hypotheses. Many animal studies are methodologically weak, and the animal literature is not systematically reviewed and synthesized. Moreover, most bodies of animal literature may be so heterogeneous that they can be used selectively to support the plausibility of almost any epidemiology study result. Epidemiologists themselves also do not consistently conduct systematic reviews of bodies of biological evidence which might point to sources of bias in an evidence base. Animal research will likely continue to provide the biological basis for epidemiological investigation, but substantial improvement is needed in how it is conducted and synthesized to improve the predictability of animal studies for the human condition. Ann Epidemiol 2009;19:220–224.
KEY WORDS: Animal Studies, Bias, Epidemiology Methods, Randomized Trials, Research Synthesis, Systematic Reviews.
From the Schools of Public Health and Medicine, Yale University, New
Received October 7, 2008; accepted November 29, 2008.
360 Park Avenue South, New York, NY 10010
“Experiments should be carried out on the human body ... the quality of the medicine might mean that it would affect the human body differently from the [...]”
Why epidemiology has so much difficulty documenting valid and replicable associations has been widely discussed for several years. Recent observations from large randomized controlled trials (RCTs) have impressively refuted some of epidemiology’s most long-standing conclusions, often made from very large and highly publicized observational studies, and the question persists as to why so much observational epidemiology is not replicated by [...]
In a recent commentary and discussion, Ioannidis suggested several reasons why most discovered true associations are inflated, including the use of thresholds of statistical significance, especially in underpowered studies; the many data manipulations used in variable construction and statistical analysis; and biases in the publication process. Elsewhere, Ioannidis has used a similar argument to suggest that most published research findings are false. These observations are important and provocative and challenge many of the assumptions underlying the validity of the epidemiologic literature. In a thoughtful response, Willett suggests that “those who practice epidemiology understand that the primary research mode is still the development of testable hypotheses based on sound biological reasoning” (p. 655). This raises the question: How sound is the biological evidence from which hypotheses tested in epidemiology are derived? Are the vulnerabilities observed in epidemiologic investigations also found in the biology research base? If epidemiologists are testing implausible hypotheses derived from a poorly validated body of animal research, which is further amplified by publication bias, this may be another reason why discovered associations [...]
The concept that animal research, particularly that relating to diet, pharmaceuticals, and environmental agents, may be a poor predictor of human experience is not new. A thousand years ago, Ibn Sina commented on the need to study humans rather than animals, and Alexander Pope’s dictum “The proper study of mankind is man” is well known and widely cited. Pharmacologists in particular have long recognized the difficulties inherent in extrapolating drug data from animals to man. Given the large number of animal studies conducted, it would be expected that some animal experiments do predict some human reactions; for example, penicillin was observed to protect both mice and humans from Staphylococcus infections, and Accutane (isotretinoin) causes birth defects in rabbits,
monkeys, and humans (but not in mice and rats). However, corticosteroids are widely teratogenic in animals but not in humans, whereas thalidomide is not a teratogen in many animal species but it is in humans. Recent experience in a phase 1 study of the monoclonal antibody TGN 1412 resulted in life-threatening morbidity in all six healthy volunteers, reflecting inadequate prediction, even in non-human primates, of the human [...]
It has been known for some time that many animal experiments are poorly designed, conducted, and analyzed and that this may be one reason why they often do not translate into replication in human therapeutic trials or into cancer chemoprevention. Some human carcinogens were predicted in animal studies (aflatoxins, benzene, diethylstilbestrol, vinyl chloride), but other agents were positive in animal studies but not in human studies (acrylamide, alar, cyclamate, red dye #2, saccharin). It has only recently been observed that most of the animal literature is also inadequately reviewed and summarized, and this too may contribute to failure to replicate animal research in humans. In one survey, only 1 in 10,000 MEDLINE records of animal studies were tagged as being meta-analyses versus 1 in 1,000 for human research. However, this research often provides the rationale for hypotheses studied by epidemiologists. In recent reports, the poor quality of research synthesis was documented by a comprehensive search of MEDLINE, which found only 25 systematic reviews of animal research despite there being several million individual studies in citation databases. Other recent studies similarly found only 30 and 57 systematic reviews of any type of animal research. One recent study of the health effects associated with low-dose Bisphenol A in human urine conflicts with the systematic review of the rodent studies that found little evidence for any health [...]
Systematic review of animal studies is well advanced in the field of stroke research, an area where almost no new human therapies have been developed despite decades of experimental and human study. In one systematic review of FK506 used for experimental stroke, in which 29 separate studies were found in the literature, only one study blinded investigators to the intervention and two blinded them for the outcome assessment; none met all 10 quality criteria established by the reviewers (one study met no criteria and the highest score was 7). Meta-analysis of the animal FK506 studies demonstrated a strong trend for the methodologically weakest studies to show the strongest protective effects and the methodologically strongest studies to show no (or [...]
The limited number of systematic reviews of the animal literature that have been done point to the poor quality of animal research and the difficulty of extrapolating from it to humans, a concern increasingly being made in other fields of drug discovery. Some key problems are:
- Disparate animal species and strains, with a variety of metabolic pathways and drug metabolites, leading to variation [...]
- Different models for inducing illness or injury with varying similarity to the human condition
- Variations in drug dosing schedules and regimen of uncertain [...]
- Variability in animals for study, methods of randomization, choice of comparison therapy (none, placebo, vehicle)
- Small experimental groups with inadequate power, simple statistical analysis that does not account for confounding, and failure to follow intention to treat principles
- Nuances in laboratory technique that may influence results (e.g., methods for blinding investigators) may be [...]
- Selection of outcome measures, may be disease surrogates or precursors, of uncertain relevance to the human clinical [...]
- Length of follow up varies and may not correspond to [...]
The quality of in-vitro research and review, much of which is closely tied to animal experimentation, has been even less formally studied. In one rare study of how in-vitro research is reviewed, a total of only 45 systematic reviews of [...]
The poor quality of much animal and in-vitro research poses substantial difficulty for epidemiologists who use “biologic plausibility” as one of their guidelines for inferring causality. A discussion of biological mechanisms, usually relying on animal research, is quite common in reports of epidemiological association. However, it seems that animal research on almost any topic of epidemiologic interest is so heterogeneous and inadequately synthesized that it is possible to selectively assemble a body of evidence from the animal and in-vitro studies that support almost any [...]
In contrast to large epidemiological projects, the smaller scale of animal experiments, often from individual laboratories, would suggest greater opportunity for publication bias. Publication bias has been well documented in the randomized trial literature and has been attributed to a range of biases: authors being more likely to write up positive results and to send their manuscripts reporting positive results to higher profile journals, to journal editors being more likely to accept positive results and to publish them early.
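The mechanism described above (many small experiments, with positive results more likely to be written up and accepted) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not an analysis from the commentary: the function name `simulate_literature`, the group size of 10, the normal outcomes with standard deviation 1, and the rule that only experiments with a z statistic above 1.96 are published are all hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_literature(true_effect=0.0, n_studies=2000, n_per_group=10, seed=1):
    """Simulate small two-group experiments and the subset that gets 'published'.

    Assumptions (illustrative only): outcomes are normal with SD 1, every
    experiment has the same size, and an experiment is published only when
    its z statistic exceeds 1.96 in the 'protective' direction.
    """
    rng = random.Random(seed)
    se = (2.0 / n_per_group) ** 0.5  # SE of a difference in means when SD = 1
    all_effects, published = [], []
    for _ in range(n_studies):
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        diff = statistics.mean(treated) - statistics.mean(control)
        all_effects.append(diff)
        if diff / se > 1.96:  # hypothetical publication filter
            published.append(diff)
    return all_effects, published


if __name__ == "__main__":
    all_effects, published = simulate_literature()
    print("experiments run:       ", len(all_effects))
    print("experiments published: ", len(published))
    print("mean effect, all:       %.3f" % statistics.mean(all_effects))
    if published:
        print("mean effect, published: %.3f" % statistics.mean(published))
```

With a true effect of zero, the average over all simulated experiments stays close to zero, while every experiment that passes the filter necessarily reports a difference larger than 1.96 standard errors. A reader who sees only the published subset would therefore infer a sizeable effect where none exists, which is the kind of distortion an unsystematic reading of the animal literature could carry into epidemiological hypotheses.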
There has been little formal study of publication bias in observational epidemiology, or in animal and in-vitro research, and the few studies that have been done have not found evidence of publication bias. Nonetheless, its documentation in the more transparent circumstances of RCTs suggests that publication bias must be a common phenomenon in observational epidemiology, animal, and in-vitro studies. Failure to systematically review these bodies of evidence together with publication bias in the literature base provide the opportunity for substantial bias and misleading results in the animal literature used to create hypotheses for testing in epidemiological studies.
Publications in genetic epidemiology, where up to a million single nucleotide polymorphism associations are examined, have provided an opportunity to observe how publication bias operates in this area of observational epidemiology. Ioannidis et al have documented the early publication of extreme genetic associations (those suggesting both higher risk and protective genes), whereas later studies, often of higher quality and on larger samples, report smaller effects or do not show any association with the same genotype. In another comparative study, 20 candidate genes previously significantly associated with atorvastatin could not be replicated in a genome-wide association study. Only one was found to be statistically significantly associated and eight showed opposite directions of effect. Not only do candidate genes represent a very small fraction of the genome, they are often based on animal models which, while they may represent genes conserved in humans, have different RNA, proteins, gene interactions, and other epigenetic characteristics. Moreover, the animal phenotype is not always analogous to the human phenotype, all of which may make animal genetic studies uncertain predictors of human genetic associations. Systematic review of the murine models for amyotrophic lateral sclerosis and other neurodegenerative diseases has recently identified major design flaws, including genetic heterogeneity even in inbred littermates so that the designed phenotype may be lost.
Bias in reporting the primary outcome is a recently documented phenomenon in randomized trials. Chan et al. showed major discrepancies between the primary outcomes declared in randomized trial protocols and what was published as the primary outcome in the same study. Overall, 62% of trials were discrepant between the protocol and the published primary outcome, with trials changing the proposed primary to secondary, completely ignoring (and not mentioning) the proposed primary outcome in the publication, introducing a primary outcome that was a protocol secondary outcome, or reporting a primary outcome not mentioned in the protocol. Outcome reporting has not been systematically studied in observational epidemiology or in animal and in-vitro experimentation, but, given the absence of specificity often found in observational epidemiology or animal protocols and the lack of registration of protocols (compared to what is now expected of randomized trial protocols), it seems highly likely that outcome bias is a problem in these areas of research. The influence of choice of referent group (or “comparator”) has also been studied most formally in the RCT literature, raising concern that similar sources of bias occur in observational epidemiology and in animal research.
Animal research will likely continue to be an important component of the biological underpinnings of hypothesis development in epidemiology; therefore epidemiologists have a vested interest in ensuring that the research they rely on is as valid as possible and that it has been systematically reviewed. Given the likelihood that some epidemiology studies may be testing implausible hypotheses, what measures can be taken to improve this aspect of our science?
More rigorous animal experiments and their systematic review should lead to more valid hypotheses for epidemiological investigation. While one would hope that bench scientists would learn to do systematic reviews themselves, they are likely to need the help of epidemiologists trained in systematic reviewing.
Epidemiologists who depend on animal research may themselves need to conduct systematic reviews of the animal research they rely on both for hypothesis development and when using animal research to understand the biological plausibility of their research findings. All too often, animal research may be selectively reported to support epidemiological observations rather than by reference to a systematic review of the totality of animal evidence.
Ensure that systematic reviewing methodology is part of the education of epidemiologists and that it is routinely practiced. This will lead to more valid and unbiased summaries of the state of biological and epidemiological [...]
I am grateful to Iain Chalmers for his comments on an early draft of this paper. Arienne Hoey provided technical assistance with the manuscript.
1. Taubes G. Epidemiology faces its limits. Science. 1995;269:164–169.
2. Bracken MB. Alarums false, alarums real: challenges and threats to the future of epidemiology. Ann Epidemiol. 1998;8:79–82.
3. Davey Smith G, Ebrahim S. Epidemiology - is it time to call it a day? Int J
4. von Elm E, Egger M. The scandal of poor epidemiological research. BMJ.
5. Shapiro S. Looking to the 21st century: have we learned from our mistakes, or are we doomed to compound them? Pharmacoepidemiol Drug Saf.
6. Bonovas S, Filioussi K, Sitaras NM. Statin use and the risk of prostate cancer: a metaanalysis of 6 randomized clinical trials and 13 observational studies. Int J Cancer. 2008;123:899–904.
7. Furlan AD, Tomlinson G, Jadad AA, Bombardier C. Methodological quality and homogeneity influenced agreement between randomized trials and nonrandomized studies of the same intervention for back pain. J Clin
8. Martinez ME, Marshall JR, Giovannucci E. Diet and cancer prevention: the roles of observation and experimentation. Nat Rev Cancer. 2008
9. Rosano GM, Vitale C, Lello S. Postmenopausal hormone therapy: lessons from observational and randomized studies. Endocrine. 2004;24:251–254.
10. Wolfe F, Michaud K, Dewitt EM. Why results of clinical trials and observational studies of antitumour necrosis factor (anti-TNF) therapy differ: methodological and interpretive issues. Ann Rheum Dis. 2004;63(Suppl 2):ii13–ii17.
11. Ioannidis JP. Why most discovered true associations are inflated. Epidemiology.
12. [...] - winner’s and otherwise - in genetic epidemiology. Epidemiology. 2008;19:649–651; discussion 657–668.
13. Senn S. Transposed conditionals, shrinkage, and direct and indirect unbiasedness. Epidemiology. 2008;19:652–654; discussion 657–658.
14. Willett WC. The search for truth must go beyond statistics. Epidemiology. 2008;19:655–656; discussion 657–658.
15. Ioannidis JP. Why most published research findings are false. PLoS Med.
16. Ibn Sina from the James Lind Library. Available at: http://www.jameslindlibrary.org/. Accessed August 2008.
17. Gold H. The proper study of mankind is the man. Am J Med.
18. Lasagna L. The diseases drugs cause. Perspect Biol Med. 1964;7:457–470.
19. Brodie BB. Symposium on clinical drug evaluation and human pharmacology. VI. Difficulties in extrapolating data on metabolism of drugs from animal to man. Clin Pharmacol Ther. 1962;3:374–380.
20. Florey HW, Abraham EP. The work on penicillin at Oxford. J Hist Med
21. Nau H. Teratogenicity of isotretinoin revisited: species variation and the role of all-trans-retinoic acid. J Am Acad Dermatol. 2001;45:S183–187.
22. Needs CJ, Brooks PM. Antirheumatic medication in pregnancy. Br J Rheumatol.
23. Lepper ER, Smith NF, Cox MC, Scripture CD, Figg WD. Thalidomide metabolism and hydrolysis: mechanisms and implications. Curr Drug
24. Stebbings R, Findlay L, Edwards C, Eastwood D, Bird C, North D, et al. “Cytokine Storm” in the Phase I trial of monoclonal antibody TGN1412: better understanding the causes to improve preclinical testing of immunotherapeutics. J Immunol. 2007;179:3325–3331.
25. Hackam DG, Redelmeier DA. Translation of research evidence from animals to humans. JAMA. 2006;296:1731–1732.
26. Perel P, Roberts I, Sena E, Wheble P, Briscoe C, Sandercock P, et al. Comparison of treatment effects between animal experiments and clinical trials: systematic review. BMJ. 2007;334:197.
27. Roberts I, Kwan I, Evans P, Haig S. Does animal experimentation inform human healthcare? Observations from a systematic review of
28. Corpet DE, Pierre F. How good are rodent models of carcinogenesis in predicting efficacy in humans? A systematic review and meta-analysis of colon chemoprevention in rats, mice and men. Eur J Cancer. 2005;41:1911–
29. Corpet DE, Pierre F. Point: From animal models to prevention of colon cancer. Systematic review of chemoprevention in min mice and choice of the model system. Cancer Epidemiol Biomarkers Prev. 2003;12:391–400.
30. America’s War on Carcinogens: reassessing the use of animal tests to predict human cancer risk. American Council for Science and Health;
31. Sandercock P, Roberts I. Systematic reviews of animal experiments.
32. Pound P, Ebrahim S, Sandercock P, Bracken MB, Roberts I. Where is the evidence that animal research benefits humans? BMJ. 2004;328:514–517.
33. Mignini LE, Khan KS. Methodological quality of systematic reviews of animal studies: a survey of reviews of basic research. BMC Med Res Methodol.
34. Peters JL, Sutton AJ, Jones DR, Rushton L, Abrams KR. A systematic review of systematic reviews and meta-analyses of animal experiments with guidelines for reporting. J Environ Sci Health B. 2006;41:1245–1258.
35. Lang IA, Galloway TS, Scarlett A, Henley WE, Depledge M, Wallace RB, et al. Association of urinary bisphenol A concentration with medical disorders and laboratory abnormalities in adults. JAMA. 2008;300:1303–1310.
36. Goodman JE, McConnell EE, Sipes IG, Witorsch RJ, Slayton TM, Yu CJ, et al. An updated weight of the evidence evaluation of reproductive and developmental effects of low doses of bisphenol A. Crit Rev Toxicol. 2006;36:387–457.
37. Sena E, van der Worp HB, Howells D, Macleod M. How can we improve the pre-clinical development of drugs for stroke? Trends Neurosci.
38. Macleod MR, O’Collins T, Horky LL, Howells DW, Donnan GA. Systematic review and metaanalysis of the efficacy of FK506 in experimental stroke. J Cereb Blood Flow Metab. 2005;25:713–721.
39. Bebarta V, Luyten D, Heard K. Emergency medicine animal research: does use of randomization and blinding affect the results? Acad Emerg Med.
40. Kenter MJ, Cohen AF. Establishing risk of human experimentation with drugs: lessons from TGN1412. Lancet. 2006;368:1387–1391.
41. Sundstrom L. Thinking inside the box. To cope with an increasing disease burden, drug discovery needs biologically relevant and predictive testing systems. EMBO Rep. 2007;8 Spec No:S40–43.
42. Hill AB. The environment and disease: association or causation? Proc R
43. Stern JM, Simes RJ. Publication bias: evidence of delayed publication in a cohort study of clinical research projects. BMJ. 1997;315:640–645.
44. Ioannidis JP, Contopoulos-Ioannidis DG. Reporting of safety data from randomised trials. Lancet. 1998;352:1752–1753.
45. Neitzke U, Harder T, Schellong K, Melchior K, Ziska T, Rodekamp E, et al. Intrauterine growth restriction in a rodent model and developmental programming of the metabolic syndrome: a critical appraisal of the experimental evidence. Placenta. 2008;29:246–254.
46. Macleod MR, O’Collins T, Howells DW, Donnan GA. Pooling of animal experimental data reveals influence of study design and publication bias.
47. Juutilainen J, Kumlin T, Naarala J. Do extremely low frequency magnetic fields enhance the effects of environmental carcinogens? A meta-analysis of experimental studies. Int J Radiat Biol. 2006;82:1–12.
48. Dirx MJ, Zeegers MP, Dagnelie PC, van den Bogaard T, van den Brandt PA. Energy restriction and the risk of spontaneous mammary tumors in mice: a meta-analysis. Int J Cancer. 2003;106:766–770.
49. Bracken MB, DeWan A, Hoh J. Genome wide association studies. In: Rebbeck TR, Ambrosome CB, Shields PG, eds. Fundamentals of molecular epidemiology. New York: Taylor & Francis; 2008 pp. 225–238.
50. Ioannidis JP, Ntzani EE, Trikalinos TA, Contopoulos-Ioannidis DG. Replication validity of genetic association studies. Nat Genet. 2001;29:306–309.
51. Thompson JF, Man M, Johnson KJ, Wood LS, Lira ME, Lloyd DB, et al. An association study of 43 SNPs in 16 candidate genes with atorvastatin response. Pharmacogenomics J. 2005;5:352–358.
52. Williams SM, Haines JL, Moore JH. The use of animal models in the study of complex disease: all else is never equal or why do so many human studies fail to replicate animal findings? Bioessays. 2004;26:170–179.
53. Wojczynski MK, Tiwari HK. Definition of phenotype. Adv Genet.
54. Scott S, Kranz JE, Cole J, Lincecum JM, Thompson K, Kelly N, et al. Design, power, and interpretation of studies in the standard murine model of ALS. Amyotroph Lateral Scler. 2008;9:4–15.
55. Schnabel J. Neuroscience: standard model. Nature. 2008;454:682–685.
56. Chan AW, Hrobjartsson A, Haahr MT, Gotzsche PC, Altman DG. Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles. JAMA. 2004;291:2457–
57. Krleza-Jeric K, Chan AW, Dickersin K, Sim I, Grimshaw J, Guud C. Principles for international registration of protocol information and results from human trials of health related interventions: Ottawa statement
58. Montori VM, Jaeschke R, Schunemann HJ, Bhandara M, Brozek JL, Devereaux PJ, et al. Users’ guide to detecting misleading claims in clinical research reports. BMJ. 2004;329:1093–1096.
59. Mann H, Djulbegovic B. Why comparisons must address genuine uncertainties. James Lind Library. Available at: Accessed Jan 7, 2009.