

The 5 min meta-analysis: understanding how to read and interpret a forest plot

  • Yaping Chang (ORCID: orcid.org/0000-0002-0549-5087)1,2,
  • Mark R. Phillips (ORCID: orcid.org/0000-0003-0923-261X)1,3,
  • Robyn H. Guymer (ORCID: orcid.org/0000-0002-9441-4356)4,5,
  • Lehana Thabane (ORCID: orcid.org/0000-0003-0355-9734)1,6,
  • Mohit Bhandari (ORCID: orcid.org/0000-0001-9608-4808)1,2,3 &
  • Varun Chaudhary (ORCID: orcid.org/0000-0002-9988-4146)1,3

on behalf of the R.E.T.I.N.A. study group

Eye volume 36, pages 673–675 (2022)


A Correction to this article was published on 08 May 2023

This article has been updated

Introduction

In the evidence-based practice of ophthalmology, we often read systematic reviews. Why do we bother with systematic reviews? In science, new findings are built cumulatively on multiple, repeatable experiments [1]. In clinical research, rarely is a single study definitive. By taking a comprehensive and cumulative approach, well-conducted and current systematic reviews synthesize the results of individual studies to address a focused question and can guide important decisions [2, 3, 4, 5].

A systematic review may or may not include a meta-analysis, a statistical approach that quantitatively combines the results of the studies eligible for a systematic review question [2, 3, 4, 5]. Such pooling also improves precision [2, 4, 5]. A “forest plot” is a graphical presentation of the results [2, 4]. In this editorial, we begin by introducing the anatomy of a forest plot and then present five tips for understanding the results of a meta-analysis.

Anatomy of a forest plot

We demonstrate the components of a typical forest plot in Fig. 1, using a topic from a recently published systematic review [6] but with mock numbers substituted in the analysis. In this example, four randomized trials (Studies #1 to #4) compare a new surgical approach with conventional surgery for patients with pseudoexfoliation glaucoma. The outcomes, intraocular pressure (IOP) and the incidence of minor zonulolysis, are evaluated at 1-year follow-up after surgery.

Figure 1. A: Example of a continuous outcome measure (intraocular pressure, assessed with the mean difference). B: Example of a dichotomous outcome measure (incidence of minor zonulolysis) at 1 year after surgery. Tau: the estimated standard deviation of the underlying effects across studies (Tau² is displayed only in the random-effects model). Chi²: the Chi-square test statistic for heterogeneity. Random: random-effects model (an analysis model in meta-analysis).

In a forest plot, the box in the middle of each horizontal line (confidence interval, CI) represents the point estimate of the effect for a single study. The size of the box is proportional to the weight of the study in relation to the pooled estimate. The diamond represents the overall effect estimate of the meta-analysis. The placement of the center of the diamond on the x-axis represents the point estimate, and the width of the diamond represents the 95% CI around the point estimate of the pooled effect.
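For readers who like to see the anatomy in code, the following minimal Python sketch (using matplotlib) draws the elements just described: weight-proportional boxes, horizontal CI lines, a diamond spanning the pooled 95% CI, and the vertical line of no effect. The point estimates and pooled result echo the mock example discussed below; the individual confidence intervals and the first three weights are invented for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Study point estimates and the pooled result follow the mock example in the
# text; the individual 95% CIs and the first three weights are invented.
studies = ["Study #1", "Study #2", "Study #3", "Study #4"]
md     = np.array([2.60, 0.20, 0.60, 0.90])    # mean differences (mmHg)
lo     = np.array([0.90, -0.80, -0.40, 0.20])  # lower 95% CI limits
hi     = np.array([4.30, 1.20, 1.60, 1.60])    # upper 95% CI limits
weight = np.array([0.14, 0.21, 0.23, 0.42])    # share of the total weight

pooled, p_lo, p_hi = 0.92, 0.21, 1.63          # pooled MD and its 95% CI

fig, ax = plt.subplots(figsize=(6, 3))
y = np.arange(len(studies), 0, -1)             # one row per study, top to bottom

ax.hlines(y, lo, hi, color="black")            # confidence-interval lines
ax.scatter(md, y, s=800 * weight, marker="s",  # boxes sized by study weight
           color="black", zorder=3)

# Diamond for the pooled estimate: its width spans the pooled 95% CI
ax.fill([p_lo, pooled, p_hi, pooled], [0, 0.2, 0, -0.2], color="black")

ax.axvline(0, linestyle="--", color="grey")    # line of no effect
ax.set_yticks(list(y) + [0])
ax.set_yticklabels(studies + ["Pooled (95% CI)"])
ax.set_xlabel("Mean difference in IOP (mmHg)")
plt.tight_layout()
plt.show()
```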

Tip 1: Know the type of outcome

A forest plot differs depending on the type of outcome. For a continuous outcome, the mean, standard deviation and number of patients in each group are provided in Columns 2 and 3. The mean difference (MD, the absolute difference between the mean scores of the two groups) with its 95% CI is presented in Column 5 (Fig. 1A). Examples of continuous outcomes include IOP (mmHg), visual acuity in rank values, subfoveal choroidal thickness (μm) and cost.

For a dichotomous outcome, the number of events and the number of patients are presented in Columns 2 and 3, and a risk ratio (RR, also called relative risk) with its 95% CI is presented in Column 5 (Fig. 1B). Examples of dichotomous outcomes include the incidence of any adverse event, zonulolysis, capsulotomy and patients needing medication (yes or no).

Tip 2: Understand the weight in a forest plot

Weights (Column 4) are assigned to individual studies according to their contribution to the pooled estimate, calculated as the inverse of the variance of the treatment effect, i.e., one over the square of the standard error. The weight is closely related to a study’s sample size [2]. In our example, Study #4, with the largest sample size of 114 patients (57 in each group), has the greatest weight: 42.2% in the IOP result (Fig. 1A) and 49.9% in the zonulolysis result (Fig. 1B).
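As a quick numerical illustration of inverse-variance weighting (a generic Python sketch with invented standard errors, not the exact numbers behind Fig. 1), each study's standard error is converted to a weight and the weights are expressed as percentages of the total:

```python
import numpy as np

# Hypothetical standard errors of the mean difference for four studies
se = np.array([0.75, 0.55, 0.50, 0.35])

w = 1.0 / se**2              # inverse-variance weight: 1 / SE^2
w_pct = 100 * w / w.sum()    # weight as a percentage of the total

for i, (wi, pct) in enumerate(zip(w, w_pct), start=1):
    print(f"Study #{i}: weight = {wi:.2f}, {pct:.1f}% of total")
# The study with the smallest standard error receives the largest share.
```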

Tip 3: Pay attention to heterogeneity

Heterogeneity represents variation in results that might relate to the population, intervention, comparator, outcome measure, risk of bias, study methods, healthcare systems and other characteristics of the individual studies in a meta-analysis [2, 7]. If no important heterogeneity is observed, we can trust the pooled estimate more, because most or all of the individual studies are giving the same answer [7].

We can identify heterogeneity by visually inspecting the similarity of the point estimates and the overlap of the confidence intervals, and by looking at the statistical tests of heterogeneity reported near the bottom of a forest plot [2, 7]. More similar point estimates and more overlap of confidence intervals mean less heterogeneity [2, 7]. The P value generated by the Chi-squared test addresses the null hypothesis that there is no heterogeneity between studies. When P < 0.10, we reject this null hypothesis and consider that there is heterogeneity across the studies [2]. A threshold of 0.10 rather than 0.05 is typically used because the heterogeneity test lacks power [2]. The I² statistic, ranging from 0% to 100%, indicates the magnitude of heterogeneity: the greater the I², the more the heterogeneity. An I² below 40% may suggest unimportant heterogeneity, while an I² over 75% may suggest considerable heterogeneity [2].

For example, in Fig. 1A, the point estimate of Study #1 (i.e., the between-group difference in mean IOP, 2.60 mmHg) differs from the point estimates of Studies #2 to #4 (0.20, 0.60 and 0.90 mmHg, respectively). On visual inspection of the 95% CIs (the horizontal lines), the 95% CI of Study #1 only partly overlaps with those of the other studies. The P value for heterogeneity of 0.12 is relatively small but still above the 0.10 threshold. The I² of 49% indicates that moderate heterogeneity may be present [2]. In Fig. 1B, the 95% CIs of all four studies overlap substantially. The large P value for heterogeneity of 0.74 and the I² of 0% both indicate that no important heterogeneity is detected.
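The heterogeneity statistics shown at the bottom of a forest plot can be reproduced from the study estimates and their standard errors. The following minimal Python sketch uses the four mock point estimates quoted above with invented standard errors (so its output will not exactly match Fig. 1A); it computes Cochran's Q, the Chi-square P value for heterogeneity, and I² = max(0, (Q − df)/Q).

```python
import numpy as np
from scipy.stats import chi2

# Mock mean differences from the text; the standard errors are invented
md = np.array([2.60, 0.20, 0.60, 0.90])
se = np.array([0.75, 0.55, 0.50, 0.35])

w = 1.0 / se**2                      # fixed-effect (inverse-variance) weights
pooled = np.sum(w * md) / np.sum(w)  # fixed-effect pooled estimate

Q = np.sum(w * (md - pooled) ** 2)   # Cochran's Q statistic
df = len(md) - 1
p_het = chi2.sf(Q, df)               # P value of the heterogeneity test
I2 = max(0.0, (Q - df) / Q) * 100    # I² as a percentage

print(f"Q = {Q:.2f}, P = {p_het:.2f}, I² = {I2:.0f}%")
```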

Tip 4: Understand subgroups

When heterogeneity is detected, indicating unexplained differences between study estimates, subgroup analysis is one approach to explaining it [2]. In our example, Study #3 only studied patients aged 65 years and below; Studies #1, #2 and #4 also reported IOP separately for the two age groups (Fig. 2). We can find the pooled effects of the two subgroups in the forest plot: in subgroup 1.1.1 (over 65 years), the overall effect favours the new surgery (Section A in Fig. 2; the subtotal MD and its 95% CI do not include the line of no effect, the P value for the overall effect is <0.00001, and I² = 0%); in subgroup 1.1.2 (65 years and below), there is no difference between the conventional and new surgeries (Section B in Fig. 2; the subtotal MD and its 95% CI include the line of no effect, the P value for the overall effect is 0.10, and I² = 0%).

Figure 2. Subgroup results of IOP by age group.

There is a subgroup effect by age group. The result of the test for subgroup differences appears in the last row of the forest plot (Section C in Fig. 2): a P value of 0.001 and an I² of 90.1% indicate a significant difference in treatment effect between the older and younger age subgroups.

Tip 5: Interpret the results in plain language

In our example, lower IOP and fewer zonulolysis events are the favoured outcomes. The statistical significance of a pooled estimate can be assessed by visual inspection of the diamond (if the diamond crosses the line of no effect, there is no statistically significant difference between the two groups) or by checking the P value in the last row of a forest plot, “Test for overall effect” (P < 0.05 indicates a significant difference).

In plain language, for patients with pseudoexfoliation glaucoma, the overall effect for IOP is in favour of the new surgery. More specifically, the new surgery is associated with lower IOP compared with conventional surgery at 1 year after surgery (mean difference, 0.92 mmHg; 95% CI, 0.21 to 1.63 mmHg), with some concerns about heterogeneity and risk of bias. There is no difference in the incidence of minor zonulolysis between the new and conventional surgeries.

In summary, knowing the structure of a forest plot, the types of outcome measures, heterogeneity, and risk-of-bias assessments will help us understand the results of a systematic review. With more practice, readers will gain confidence in interpreting forest plots and applying the results of systematic reviews in their clinical practice.

Change history

08 May 2023

A Correction to this paper has been published: https://doi.org/10.1038/s41433-023-02493-0

References

1. Zeigler D. Evolution and the cumulative nature of science. Evolution: Education and Outreach. 2012;5:585–8. https://doi.org/10.1007/s12052-012-0454-6

2. Higgins JP, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, et al. Cochrane handbook for systematic reviews of interventions. John Wiley & Sons; 2019.

3. Haynes RB. Clinical epidemiology: how to do clinical practice research. Lippincott Williams & Wilkins; 2012.

4. Murad MH, Montori VM, Ioannidis JP, Neumann I, Hatala R, Meade MO, et al. Understanding and applying the results of a systematic review and meta-analysis. In: Users’ guides to the medical literature: a manual for evidence-based clinical practice. 3rd edn. New York: JAMA/McGraw-Hill Global; 2015.

5. Guyatt GH, Oxman AD, Kunz R, Brozek J, Alonso-Coello P, Rind D, et al. GRADE guidelines 6. Rating the quality of evidence—imprecision. J Clin Epidemiol. 2011;64:1283–93. https://doi.org/10.1016/j.jclinepi.2011.01.012

6. Pose-Bazarra S, López-Valladares MJ, López-de-Ullibarri I, Azuara-Blanco A. Surgical and laser interventions for pseudoexfoliation glaucoma: systematic review of randomized controlled trials. Eye. 2021;35:1551–61. https://doi.org/10.1038/s41433-021-01424-1

7. Guyatt GH, Oxman AD, Kunz R, Woodcock J, Brozek J, Helfand M, et al. GRADE guidelines: 7. Rating the quality of evidence—inconsistency. J Clin Epidemiol. 2011;64:1294–302. https://doi.org/10.1016/j.jclinepi.2011.03.017


Author information

Authors and affiliations

Department of Health Research Methods, Evidence & Impact, McMaster University, Hamilton, ON, Canada

Yaping Chang, Mark R. Phillips, Lehana Thabane, Mohit Bhandari & Varun Chaudhary

OrthoEvidence Inc., Burlington, ON, Canada

Yaping Chang & Mohit Bhandari

Department of Surgery, McMaster University, Hamilton, ON, Canada

Mark R. Phillips, Mohit Bhandari & Varun Chaudhary

Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, Australia

Robyn H. Guymer

Department of Surgery (Ophthalmology), The University of Melbourne, Melbourne, Australia

Biostatistics Unit, St. Joseph’s Healthcare Hamilton, Hamilton, ON, Canada

Lehana Thabane

Retina Consultants of Texas (Retina Consultants of America), Houston, TX, USA

Charles C. Wykoff

Blanton Eye Institute, Houston Methodist Hospital, Houston, TX, USA

NIHR Moorfields Biomedical Research Centre, Moorfields Eye Hospital, London, UK

Sobha Sivaprasad

Cole Eye Institute, Cleveland Clinic, Cleveland, OH, USA

Peter Kaiser

Retinal Disorders and Ophthalmic Genetics, Stein Eye Institute, University of California, Los Angeles, CA, USA

David Sarraf

Department of Ophthalmology, Mayo Clinic, Rochester, MN, USA

Sophie Bakri

The Retina Service at Wills Eye Hospital, Philadelphia, PA, USA

Sunir J. Garg

Center for Ophthalmic Bioinformatics, Cole Eye Institute, Cleveland Clinic, Cleveland, OH, USA

Rishi P. Singh

Cleveland Clinic Lerner College of Medicine, Cleveland, OH, USA

Department of Ophthalmology, University of Bonn, Bonn, Germany

Frank G. Holz

Singapore Eye Research Institute, Singapore, Singapore

Tien Y. Wong

Singapore National Eye Centre, Duke-NUS Medical School, Singapore, Singapore


R.E.T.I.N.A. study group

  • Varun Chaudhary
  • Mohit Bhandari
  • Charles C. Wykoff
  • Sobha Sivaprasad
  • Lehana Thabane
  • Peter Kaiser
  • David Sarraf
  • Sophie Bakri
  • Sunir J. Garg
  • Rishi P. Singh
  • Frank G. Holz
  • Tien Y. Wong
  • Robyn H. Guymer

Contributions

YC was responsible for the conception of idea, writing of manuscript and review of manuscript. MRP was responsible for the conception of idea, and review of the manuscript. VC was responsible for conception of idea, and review of manuscript. MB was responsible for conception of idea, and review of manuscript. RHG was responsible for critical review and feedback on manuscript. LT was responsible for critical review and feedback on manuscript.

Corresponding author

Correspondence to Varun Chaudhary .

Ethics declarations

Competing interests

YC: Nothing to disclose. MRP: Nothing to disclose. RHG: Advisory boards: Bayer, Novartis, Apellis, Roche, Genentech Inc. LT: Nothing to disclose. MB: Research funds: Pendopharm, Bioventus, Acumed – unrelated to this study. VC: Advisory Board Member: Alcon, Roche, Bayer, Novartis; Grants: Bayer, Novartis – unrelated to this study.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised: in part 'TIP 4: UNDERSTAND SUBGROUPS', the phrase "In our example, Study #3 only studied patients over 65 years" was corrected to read "In our example, Study #3 only studied patients who were equal and below 65 years".


About this article

Cite this article

Chang, Y., Phillips, M.R., Guymer, R.H. et al. The 5 min meta-analysis: understanding how to read and interpret a forest plot. Eye 36 , 673–675 (2022). https://doi.org/10.1038/s41433-021-01867-6


Received : 11 November 2021

Revised : 12 November 2021

Accepted : 16 November 2021

Published : 05 January 2022

Issue Date : April 2022

DOI : https://doi.org/10.1038/s41433-021-01867-6


Doing a Meta-Analysis: A Practical, Step-by-Step Guide

Saul McLeod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul McLeod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.


Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.


What is a Meta-Analysis?

Meta-analysis is a statistical procedure used to combine and synthesize findings from multiple independent studies to estimate the average effect size for a particular research question.

Meta-analysis goes beyond traditional narrative reviews by using statistical methods to integrate the results of several studies, leading to a more objective appraisal of the evidence.

This method addresses limitations like small sample sizes in individual studies, providing a more precise estimate of a treatment effect or relationship strength.

Meta-analyses are particularly valuable when individual study results are inconclusive or contradictory, as seen in the example of vitamin D supplementation and the prevention of fractures.

For instance, a meta-analysis published in JAMA in 2017 by Zhao et al. examined 81 randomized controlled trials involving 53,537 participants.

The results of this meta-analysis suggested that vitamin D supplementation was not associated with a lower risk of fractures among community-dwelling adults. This finding contradicted some earlier beliefs and individual study results that had suggested a protective effect.

What’s the difference between a meta-analysis, systematic review, and literature review?

Literature reviews can be conducted without defined procedures for gathering information. Systematic reviews use strict protocols to minimize bias when gathering and evaluating studies, making them more transparent and reproducible.

While a systematic review thoroughly maps out a field of research, it cannot provide unbiased information on the magnitude of an effect. Meta-analysis statistically combines effect sizes of similar studies, going a step further than a systematic review by weighting each study by its precision.

What is Effect Size?

Statistical significance is a poor metric in meta-analysis because it only indicates whether an effect is likely to have occurred by chance. It does not provide information about the magnitude or practical importance of the effect.

While a statistically significant result may indicate an effect different from zero, this effect might be too small to hold practical value. Effect size, on the other hand, offers a standardized measure of the magnitude of the effect, allowing for a more meaningful interpretation of the findings.

Meta-analysis goes beyond simply synthesizing effect sizes; it uses these statistics to provide a weighted average effect size from studies addressing similar research questions. The larger the effect size, the stronger the relationship between the two variables.

If effect sizes are consistent, the analysis demonstrates that the findings are robust across the included studies. When there is variation in effect sizes, researchers should focus on understanding the reasons for this dispersion rather than just reporting a summary effect.

Meta-regression is one method for exploring this variation by examining the relationship between effect sizes and study characteristics.
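As a rough illustration of the idea, a meta-regression can be sketched as a weighted regression of effect sizes on a study-level characteristic, using inverse-variance weights. The minimal Python sketch below uses statsmodels' weighted least squares on invented data (the moderator "intervention length in weeks" is hypothetical); note that WLS treats the weights only up to a scale factor, so dedicated meta-analysis software (for example, metafor in R) handles the variance components more rigorously.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical effect sizes (Hedges' g), their variances, and a study-level moderator
g        = np.array([0.10, 0.25, 0.40, 0.15, 0.35])
variance = np.array([0.02, 0.03, 0.01, 0.04, 0.02])
weeks    = np.array([2, 4, 8, 3, 6])   # intervention length in weeks (invented)

X = sm.add_constant(weeks)              # intercept + moderator
model = sm.WLS(g, X, weights=1.0 / variance).fit()

print(model.params)    # slope: estimated change in effect size per extra week
print(model.pvalues)   # rough test of whether the moderator explains variation
```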

There are three primary families of effect sizes used in most meta-analyses:

  • Mean difference effect sizes : Used to show the magnitude of the difference between means of groups or conditions, commonly used when comparing a treatment and control group.
  • Correlation effect sizes : Represent the degree of association between two continuous measures, indicating the strength and direction of their relationship.
  • Odds ratio effect sizes : Used with binary outcomes to compare the odds of an event occurring between two groups, like whether a patient recovers from an illness or not.

The most appropriate effect-size family is determined by the nature of the research question and the dependent variable. All common effect sizes can be converted from one form to another.

Real-Life Example

Brewin, C. R., Andrews, B., & Valentine, J. D. (2000). Meta-analysis of risk factors for posttraumatic stress disorder in trauma-exposed adults.  Journal of Consulting and Clinical Psychology ,  68 (5), 748.

This meta-analysis of 77 articles examined risk factors for posttraumatic stress disorder (PTSD) in trauma-exposed adults, with sample sizes ranging from 1,149 to over 11,000. Several factors consistently predicted PTSD with small effect sizes (r = 0.10 to 0.19), including female gender, lower education, lower intelligence, previous trauma, childhood adversity, and psychiatric history. Factors occurring during or after trauma showed somewhat stronger effects (r = 0.23 to 0.40), including trauma severity, lack of social support, and additional life stress. Most risk factors did not predict PTSD uniformly across populations and study types, with only psychiatric history, childhood abuse, and family psychiatric history showing homogeneous effects. Notable differences emerged between military and civilian samples, and methodological factors influenced some risk factor effects. The authors concluded that identifying a universal set of pretrauma predictors is premature and called for more research to understand how vulnerability to PTSD varies across populations and contexts.

How to Conduct a Meta-Analysis

Researchers should develop a comprehensive research protocol that outlines the objectives and hypotheses of their meta-analysis.

This document should provide specific details about every stage of the research process, including the methodology for identifying, selecting, and analyzing relevant studies.

For example, the protocol should specify search strategies for relevant studies, including whether the search will encompass unpublished works.

The protocol should be created before beginning the research process to ensure transparency and reproducibility.

Research Protocol

Objectives:

  • To estimate the overall effect of growth mindset interventions on the academic achievement of students in primary and secondary school.
  • To investigate if the effect of growth mindset interventions on academic achievement differs for students of different ages (e.g., elementary school students vs. high school students).
  • To examine if the duration of the growth mindset intervention impacts its effectiveness.

Hypotheses:

  • Growth mindset interventions will have a small, but statistically significant, positive effect on student academic achievement.
  • Growth mindset interventions will be more effective for younger students than for older students.
  • Longer growth mindset interventions will be more effective than shorter interventions.

Eligibility Criteria

  • Published studies in English-language journals.
  • Studies must include a quantitative measure of academic achievement (e.g., GPA, course grades, exam scores, or standardized test scores).
  • Studies must involve a growth mindset intervention as the primary focus (including control vs treatment group comparison).
  • Studies that combine growth mindset training with other interventions (e.g., study skills training, other types of psychological interventions) should be excluded.

Search Strategy

The researchers will search the following databases:

Keywords Combined with Boolean Operators:

  • (“growth mindset” OR “implicit theories of intelligence” OR “mindset theory”) AND (“intervention” OR “training” OR “program”) AND (“academic achievement” OR “educational outcomes”) AND (“student*” OR “pupil*” OR “learner*”)

Additional Search Strategies:

  • Citation Chaining: Examining the reference lists of included studies can uncover additional relevant articles.
  • Contacting Experts: Reaching out to researchers in the field of growth mindset can reveal unpublished studies or ongoing research.

Coding of Studies

The researchers will code each study for the following information:

  • Sample size
  • Age of participants
  • Duration of intervention
  • Type of academic outcome measured
  • Study design (e.g., randomized controlled trial, quasi-experiment)

Statistical Analysis

  • The researchers will calculate an effect size (e.g., standardized mean difference) for each study.
  • The researchers will use a random-effects model to account for variation in effect sizes across studies.
  • The researchers will use meta-regression to test the hypotheses about moderators of the effect of growth mindset interventions.


PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) is a reporting guideline designed to improve the transparency and completeness of systematic review reporting.

PRISMA was created to tackle the issue of inadequate reporting often found in systematic reviews.

  • Checklist : PRISMA features a 27-item checklist covering all aspects of a meta-analysis, from the rationale and objectives to the synthesis of findings and discussion of limitations. Each checklist item is accompanied by detailed reporting recommendations in an Explanation and Elaboration document .
  • Flow Diagram : PRISMA also includes a flow diagram to visually represent the study selection process, offering a clear, standardized way to illustrate how researchers arrived at the final set of included studies

Step 1: Defining a Research Question

A well-defined research question is a fundamental starting point for any research synthesis. The research question should guide decisions about which studies to include in the meta-analysis, and which statistical model is most appropriate.

For example:

  • How do dysfunctional attitudes and negative automatic thinking directly and indirectly impact depression?
  • Do growth mindset interventions generally improve students’ academic achievement?
  • What is the association between child-parent attachment and prosociality in children?
  • What is the relation of various risk factors to Post Traumatic Stress Disorder (PTSD)?

Step 2: Search Strategy

Present the full search strategies for all databases, registers and websites, including any filters and limits used. PRISMA 2020 Checklist

A search strategy is a comprehensive and reproducible plan for identifying all relevant research studies that address a specific research question.

This systematic approach to searching helps minimize bias.

It’s important to be transparent about the search strategy and document all decisions for auditability. The goal is to identify all potentially relevant studies for consideration.

PRISMA  (Preferred Reporting Items for Systematic reviews and Meta-Analyses) provide appropriate guidance for reporting quantitative literature searches.

Information Sources

The primary goal is to find all published and unpublished studies that meet the predefined criteria of the research question. This includes considering various sources beyond typical databases.

Information sources for a meta-analysis can include a wide range of resources like scholarly databases, unpublished literature, conference papers, books, and even expert consultations.

Specify all databases, registers, websites, organisations, reference lists and other sources searched or consulted to identify studies. Specify the date when each source was last searched or consulted. PRISMA 2020 Checklist

An exhaustive, systematic search strategy is developed with the assistance of an expert librarian.

  • Databases:  Searches should include seven key databases: CINAHL, Medline, APA PsycArticles, Psychology and Behavioral Sciences Collection, APA PsycInfo, SocINDEX with Full Text, and Web of Science: Core Collections.
  • Grey Literature: In addition to databases, forensic or ‘expansive’ searches can be conducted. This includes: grey literature database searches (e.g. OpenGrey, WorldCat, Ethos), conference proceedings, unpublished reports, theses, clinical trial databases, and searches by names of authors of relevant publications. Independent research bodies may also be good sources of material, e.g. Centre for Research in Ethnic Relations, Joseph Rowntree Foundation, Carers UK.
  • Citation Searching : Reference lists often lead to highly cited and influential papers in the field, providing valuable context and background information for the review.
  • Contacting Experts: Reaching out to researchers or experts in the field can provide access to unpublished data or ongoing research not yet publicly available.

It is important to note that this may not be an exhaustive list of all potential databases.

Search String Construction

It is recommended to consult topic experts on the review team and advisory board in order to create as complete a list of search terms as possible for each concept.

To retrieve the most relevant results, a search string is used. This string is made up of:

  • Keywords:  Search terms should be relevant to the research questions, key variables, participants, and research design. Searches should include indexed terms, titles, and abstracts. Additionally, each database has specific indexed terms, so a targeted search strategy must be created for each database.
  • Synonyms: These are words or phrases with similar meanings to the keywords, as authors may use different terms to describe the same concepts. Including synonyms helps cover variations in terminology and increases the chances of finding all relevant studies. For example, a drug intervention may be referred to by its generic name or by one of its several proprietary names.
  • Truncation symbols : These broaden the search by capturing variations of a keyword. They function by locating every word that begins with a specific root. For example, if a user was researching interventions for smoking, they might use a truncation symbol to search for “smok*” to retrieve records with the words “smoke,” “smoker,” “smoking,” or “smokes.” This can save time and effort by eliminating the need to input every variation of a word into a database.
  • Boolean operators: The use of Boolean operators (AND/OR/NEAR/NOT) helps to combine these terms effectively, ensuring that the search strategy is both sensitive and specific. For instance, using “AND” narrows the search to include only results containing both terms, while “OR” expands it to include results containing either term.

When conducting these searches, it is important to combine browsing of texts (publications) with periods of more focused systematic searching. This iterative process allows the search to evolve as the review progresses.

It is important to note that this information may not be entirely comprehensive and up-to-date.

Studies were identified by searching PubMed, PsycINFO, and the Cochrane Library. We conducted searches for studies published between the first available year and April 1, 2009, using the search term mindfulness combined with the terms meditation, program, therapy, or intervention and anxi*, depress*, mood, or stress. Additionally, an extensive manual review was conducted of reference lists of relevant studies and review articles extracted from the database searches. Articles determined to be related to the topic of mindfulness were selected for further examination.
Specify the inclusion and exclusion criteria for the review. PRISMA 2020 Checklist

Before beginning the literature search, researchers should establish clear eligibility criteria for study inclusion.

To maintain transparency and minimize bias, eligibility criteria for study inclusion should be established a priori. Ideally, researchers should aim to include only high-quality randomized controlled trials that adhere to the intention-to-treat principle.

The selection of studies should not be arbitrary, and the rationale behind inclusion and exclusion criteria should be clearly articulated in the research protocol.

When specifying the inclusion and exclusion criteria, consider the following aspects:

  • Intervention Characteristics: Researchers might decide that, in order to be included in the review, an intervention must have specific characteristics. They might require the intervention to last for a certain length of time, or they might determine that only interventions with a specific theoretical basis are appropriate for their review.
  • Population Characteristics: A meta-analysis might focus on the effects of an intervention for a specific population. For instance, researchers might choose to focus on studies that included only nurses or physicians.
  • Outcome Measures: Researchers might choose to include only studies that used outcome measures that met a specific standard.
  • Age of Participants: If a meta-analysis is examining the effects of a treatment or intervention for children, the authors of the review will likely choose to exclude any studies that did not include children in the target age range.
  • Diagnostic Status of Participants: Researchers conducting a meta-analysis of treatments for anxiety will likely exclude any studies where the participants were not diagnosed with an anxiety disorder.
  • Study Design: Researchers might determine that only studies that used a particular research design, such as a randomized controlled trial, will be included in the review.
  • Control Group: In a meta-analysis of an intervention, researchers might choose to include only studies that included certain types of control groups, such as a waiting list control or another type of intervention.
  • Publication status : Decide whether only published studies will be included or if unpublished works, such as dissertations or conference proceedings, will also be considered.
Studies were selected if (a) they included a mindfulness-based intervention, (b) they included a clinical sample (i.e., participants had a diagnosable psychological or physical/medical disorder), (c) they included adult samples (18 – 65 years of age), (d) the mindfulness program was not coupled with treatment using acceptance and commitment therapy or dialectical behavior therapy, (e) they included a measure of anxiety and/or mood symptoms at both pre and postintervention, and (f) they provided sufficient data to perform effect size analyses (i.e., means and standard deviations, t or F values, change scores, frequencies, or probability levels). Studies were excluded if the sample overlapped either partially or completely with the sample of another study meeting inclusion criteria for the meta-analysis. In these cases, we selected for inclusion the study with the larger sample size or more complete data for measures of anxiety and depression symptoms. For studies that provided insufficient data but were otherwise appropriate for the analyses, authors were contacted for supplementary data.

Iterative Process

The iterative nature of developing a search strategy stems from the need to refine and adapt the search process based on the information encountered at each stage.

A single attempt rarely yields the perfect final strategy. Instead, it is an evolving process involving a series of test searches, analysis of results, and discussions among the review team.

Here’s how the iterative process unfolds:

  • Initial Strategy Formulation: Based on the research question, the team develops a preliminary search strategy, including identifying relevant keywords, synonyms, databases, and search limits.
  • Test Searches and Refinement: The initial search strategy is then tested on chosen databases. The results are reviewed for relevance, and the search strategy is refined accordingly. This might involve adding or modifying keywords, adjusting Boolean operators, or reconsidering the databases used.
  • Discussions and Iteration: The search results and proposed refinements are discussed within the review team. The team collaboratively decides on the best modifications to improve the search’s comprehensiveness and relevance.
  • Repeating the Cycle: This cycle of test searches, analysis, discussions, and refinements is repeated until the team is satisfied with the strategy’s ability to capture all relevant studies while minimizing irrelevant results.

By constantly refining the search strategy based on the results and feedback, researchers can be more confident that they have identified all relevant studies.

This iterative process ensures that the applied search strategy is sensitive enough to capture all relevant studies while maintaining a manageable scope.

Throughout this process, meticulous documentation of the search strategy, including any modifications, is crucial for transparency and future replication of the meta-analysis.

Step 3: Search the Literature

Conduct a systematic search of the literature using clearly defined search terms and databases.

Applying the search strategy involves entering the constructed search strings into the respective databases’ search interfaces. These search strings, crafted using Boolean operators, truncation symbols, wildcards, and database-specific syntax, aim to retrieve all potentially relevant studies addressing the research question.

The researcher, during this stage, interacts with the database’s features to refine the search and manage the retrieved results.

This might involve employing search filters provided by the database to focus on specific study designs, publication types, or other relevant parameters.

Applying the search strategy is not merely a mechanical process of inputting terms; it demands a thorough understanding of database functionalities and a discerning eye to adjust the search based on the nature of retrieved results.

Step 4: Screening & Selecting Research Articles

Once the literature search is complete, the next step is to screen and select the studies that will be included in the meta-analysis.

This involves carefully reviewing each study to determine its relevance to the research question and its methodological quality.

The goal is to identify studies that are both relevant to the research question and of sufficient quality to contribute to a meaningful synthesis.

Studies meeting the eligibility criteria are usually saved into electronic databases, such as Endnote or Mendeley , and include title, authors, date and publication journal along with an abstract (if available).

Selection Process

Specify the methods used to decide whether a study met the inclusion criteria of the review, including how many reviewers screened each record and each report retrieved, whether they worked independently, and if applicable, details of automation tools used in the process. PRISMA 2020 Checklist

The selection process in a meta-analysis involves multiple reviewers to ensure rigor and reliability.

Two reviewers should independently screen titles and abstracts, removing duplicates and irrelevant studies based on predefined inclusion and exclusion criteria.

  • Initial screening of titles and abstracts: After applying the search strategy, the next step involves screening the titles and abstracts of the identified articles against the predefined inclusion and exclusion criteria. During this initial screening, reviewers aim to identify potentially relevant studies while excluding those clearly outside the scope of the review. It is crucial to prioritize over-inclusion at this stage, meaning that reviewers should err on the side of keeping studies even if there is uncertainty about their relevance. This cautious approach helps minimize the risk of inadvertently excluding potentially valuable studies.
  • Retrieving and assessing full texts: For studies which a definitive decision cannot be made based on the title and abstract alone, reviewers need to obtain the full text of the articles for a comprehensive assessment against the predefined inclusion and exclusion criteria. This stage involves meticulously reviewing the full text of each potentially relevant study to determine its eligibility definitively.
  • Resolution of Disagreements : In cases of disagreement between reviewers regarding a study’s eligibility, a predefined strategy involving consensus-building discussions or arbitration by a third reviewer should be in place to reach a final decision. This collaborative approach ensures a fair and impartial selection process, further strengthening the review’s reliability.

PRISMA Flowchart

The PRISMA flowchart is a visual representation of the study selection process within a systematic review.

The flowchart illustrates the step-by-step process of screening, filtering, and selecting studies based on predefined inclusion and exclusion criteria.

The flowchart visually depicts the following stages:

  • Identification: The initial number of titles and abstracts identified through database searches.
  • Screening: The screening process, based on titles and abstracts.
  • Eligibility: Full-text copies of the remaining records are retrieved and assessed for eligibility.
  • Inclusion: Applying the predefined inclusion criteria resulted in the inclusion of publications that met all the criteria for the review.
  • Exclusion: The flowchart details the reasons for excluding the remaining records.

This systematic and transparent approach, as visualized in the PRISMA flowchart, ensures a robust and unbiased selection process, enhancing the reliability of the systematic review’s findings.

The flowchart serves as a visual record of the decisions made during the study selection process, allowing readers to assess the rigor and comprehensiveness of the review.

  • How to fill a PRISMA flow diagram


Step 5: Evaluating the Quality of Studies

Data Collection Process

Specify the methods used to collect data from reports, including how many reviewers collected data from each report, whether they worked independently, any processes for obtaining or confirming data from study investigators, and if applicable, details of automation tools used in the process. PRISMA 2020 Checklist

Data extraction focuses on information relevant to the research question, such as risk or recovery factors related to a particular phenomenon.

Extract data relevant to the research question, such as effect sizes, sample sizes, means, standard deviations, and other statistical measures.

It can be useful to focus on the authors’ interpretations of findings rather than individual participant quotes, as the latter lacks the full context of the original data.

The coding of studies in a meta-analysis involves carefully and systematically extracting data from each included study in a standardized and reliable manner. This step is essential for ensuring the accuracy and validity of the meta-analysis’s findings.

This information is then used to calculate effect sizes, examine potential moderators, and draw overall conclusions.

Coding procedures typically involve creating a standardized record form or coding protocol. This form guides the extraction of data from each study in a consistent and organized manner. Two independent observers can help to ensure accuracy and minimize errors during data extraction.

Beyond basic information like authors and publication year, code crucial study characteristics relevant to the research question.

For example, if the meta-analysis focuses on the effects of a specific therapy, relevant characteristics to code might include:
  • Study characteristics: Publication year, authors, country of origin, publication status (published: peer-reviewed journal articles and book chapters; unpublished: government reports, websites, theses/dissertations, conference presentations, unpublished manuscripts).
  • Intervention : Type (e.g., CBT), duration of treatment, frequency (e.g., weekly sessions), delivery method (e.g., individual, group, online), intention-to-treat analysis (Yes/No)
  • Outcome measures : Primary vs. secondary outcomes, time points of measurement (e.g., post-treatment, follow-up).
  • Moderators : Participant characteristics that might moderate the effect size. (e.g., age, gender, diagnosis, socioeconomic status, education level, comorbidities).
  • Study design: Design (RCT, quasi-experiment, etc.), blinding, control group used (e.g., waitlist control, treatment as usual), study setting (clinical, community, online/remote, inpatient vs. outpatient), pre-registration (yes/no), allocation method (simple randomization, block randomization, etc.).
  • Sample : Recruitment method (snowball, random, etc.), sample size (total and groups), sample location (treatment & control group), attrition rate, overlap with sample(s) from another study?
  • Adherence to reporting guidelines : e.g., CONSORT, STROBE, PRISMA
  • Funding source : Government, industry, non-profit, etc.
  • Effect size: The Comprehensive Meta-Analysis program is used to compute d and/or r (a hand-computed alternative is sketched after this list). Include up to 3 digits after the decimal point for effect size and internal consistency information. Also record the page number and table number from which the information is coded. This helps when checking reliability and accuracy, to ensure coders are working from the same information.
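As a sketch of the effect-size computation a coder might replicate by hand, the following Python function derives Cohen's d, the small-sample-corrected Hedges' g, and the variance of g from the group means, standard deviations, and sample sizes extracted from a study; the numbers are invented.

```python
import numpy as np

def smd_and_variance(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d, Hedges' g, and the variance of g from summary statistics."""
    sd_pooled = np.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sd_pooled
    J = 1 - 3 / (4 * (n1 + n2 - 2) - 1)   # small-sample correction factor
    g = J * d
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    var_g = J**2 * var_d
    return d, g, var_g

# Hypothetical treatment vs control summary data from one coded study
d, g, var_g = smd_and_variance(m1=75.0, sd1=10.0, n1=40, m2=70.0, sd2=11.0, n2=42)
print(f"d = {d:.3f}, g = {g:.3f}, var(g) = {var_g:.4f}")
```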

Before applying the coding protocol to all studies, it’s crucial to pilot test it on a small subset of studies. This helps identify any ambiguities, inconsistencies, or areas for improvement in the coding protocol before full-scale coding begins.

It’s common to encounter missing data in primary research articles. Develop a clear strategy for handling missing data, which might involve contacting study authors, using imputation methods, or performing sensitivity analyses to assess the impact of missing data on the overall results.

Quality Appraisal Tools

Researchers use standardized tools to assess the quality and risk of bias in the quantitative studies included in the meta-analysis. Some commonly used tools include:

  • Cochrane Risk of Bias tool: Recommended by the Cochrane Collaboration for assessing randomized controlled trials (RCTs). Evaluates potential biases in selection, performance, detection, attrition, and reporting.
  • Newcastle-Ottawa Scale: Used for assessing the quality of non-randomized studies, including case-control and cohort studies. Evaluates selection, comparability, and outcome assessment.
  • ROBINS-I: Assesses risk of bias in non-randomized studies of interventions. Evaluates confounding, selection bias, classification of interventions, deviations from intended interventions, missing data, measurement of outcomes, and selection of reported results.
  • QUADAS-2: Specifically designed for diagnostic accuracy studies. Assesses risk of bias and applicability concerns in patient selection, index test, reference standard, and flow and timing.

By using these tools, researchers can ensure that the studies included in their meta-analysis are of high methodological quality and contribute reliable quantitative data to the overall analysis.

Step 6: Choice of Effect Size

The choice of effect size metric is typically determined by the research question and the nature of the dependent variable.

  • Odds Ratio (OR) : For instance, if researchers are working in medical and health sciences where binary outcomes are common (e.g., yes/no, failed/success), effect sizes like relative risk and odds ratio are often used.
  • Mean Difference : Studies focusing on experimental or between-group comparisons often employ mean differences. The raw mean difference, or unstandardized mean difference, is suitable when the scale of measurement is inherently meaningful and comparable across studies.
  • Standardized Mean Difference (SMD) : If studies use different scales or measures, the standardized mean difference (e.g., Cohen’s d) is more appropriate. When analyzing observational studies, the correlation coefficient is commonly chosen as the effect size.
  • Pearson correlation coefficient (r) : A statistical measure frequently employed in meta-analysis to examine the strength of the relationship between two continuous variables.
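For binary outcomes, the effect size and its uncertainty are usually computed on the log scale from a 2×2 table and only converted back at the end. A minimal Python sketch with an invented table is shown below for both the odds ratio and the risk ratio.

```python
import numpy as np

# Hypothetical 2x2 table: events / non-events in treatment and control groups
a, b = 12, 38    # treatment: events, non-events
c, d = 20, 30    # control:   events, non-events

# Odds ratio and 95% CI (computed on the log scale)
log_or = np.log((a * d) / (b * c))
se_log_or = np.sqrt(1/a + 1/b + 1/c + 1/d)
ci_or = np.exp([log_or - 1.96 * se_log_or, log_or + 1.96 * se_log_or])
print(f"OR = {np.exp(log_or):.2f}, 95% CI {ci_or[0]:.2f} to {ci_or[1]:.2f}")

# Risk ratio and 95% CI
log_rr = np.log((a / (a + b)) / (c / (c + d)))
se_log_rr = np.sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))
ci_rr = np.exp([log_rr - 1.96 * se_log_rr, log_rr + 1.96 * se_log_rr])
print(f"RR = {np.exp(log_rr):.2f}, 95% CI {ci_rr[0]:.2f} to {ci_rr[1]:.2f}")
```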

Conversion of effect sizes to a common measure

It may be necessary to convert reported findings to the chosen primary effect size. The goal is to harmonize different effect-size measures to a common metric for meaningful comparison and analysis.

This conversion allows researchers to include studies that report findings using various effect size metrics. For instance, r can be approximately converted to d, and vice versa, using specific equations. Similarly, r can be derived from an odds ratio using another formula.

Many equations relevant to converting effect sizes can be found in Rosenthal (1991).
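The snippet below implements three commonly cited approximate conversions (of the kind collected in Rosenthal, 1991, and later handbooks); treat them as approximations that rest on distributional assumptions rather than exact identities.

```python
import numpy as np

def r_to_d(r):
    """Correlation r to standardized mean difference d."""
    return 2 * r / np.sqrt(1 - r**2)

def d_to_r(d):
    """Standardized mean difference d to correlation r (roughly equal group sizes assumed)."""
    return d / np.sqrt(d**2 + 4)

def odds_ratio_to_d(odds_ratio):
    """Odds ratio to d via the logistic-distribution approximation."""
    return np.log(odds_ratio) * np.sqrt(3) / np.pi

print(r_to_d(0.30))          # r = .30  ->  d ~ 0.63
print(d_to_r(0.50))          # d = .50  ->  r ~ 0.24
print(odds_ratio_to_d(2.0))  # OR = 2.0 ->  d ~ 0.38
```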

Step 7: Assessing Heterogeneity

Heterogeneity refers to the variation in effect sizes across studies after accounting for within-study sampling error: no variation would mean all studies showed the same effect (no heterogeneity), while greater variation between studies indicates more heterogeneity.

Assessing heterogeneity matters because it helps us understand if the study intervention works consistently across different contexts and guides how we combine and interpret the results of multiple studies.

While little heterogeneity allows us to be more confident in our overall conclusion, significant heterogeneity necessitates further investigation into its underlying causes.

How to assess heterogeneity

  • Homogeneity Test : Meta-analyses typically include a homogeneity test to determine if the effect sizes are estimating the same population parameter. The test statistic, denoted as Q, is a weighted sum of squares that follows a chi-square distribution. A significant Q statistic suggests that the effect sizes are heterogeneous.
  • I² Statistic: The I² statistic is a relative measure of heterogeneity that represents the ratio of between-study variance (τ²) to the total variance (between-study variance plus within-study variance). Higher I² values indicate greater heterogeneity.
  • Prediction Interval : Examining the width of a prediction interval can provide insights into the degree of heterogeneity. A wide prediction interval suggests substantial heterogeneity in the population effect size.
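All three diagnostics can be computed directly from the effect sizes and their within-study variances. The Python sketch below (invented data) computes the Q statistic and its P value, a DerSimonian–Laird estimate of τ², I², and an approximate 95% prediction interval for the true effect in a new study.

```python
import numpy as np
from scipy.stats import chi2, t

# Hypothetical effect sizes and within-study variances
es  = np.array([0.10, 0.25, 0.40, 0.15, 0.35, 0.05])
var = np.array([0.02, 0.03, 0.01, 0.04, 0.02, 0.03])
k = len(es)

w = 1.0 / var
fixed = np.sum(w * es) / np.sum(w)

Q = np.sum(w * (es - fixed) ** 2)      # homogeneity (Q) test statistic
df = k - 1
p_Q = chi2.sf(Q, df)

C = np.sum(w) - np.sum(w**2) / np.sum(w)
tau2 = max(0.0, (Q - df) / C)          # DerSimonian-Laird between-study variance
I2 = max(0.0, (Q - df) / Q) * 100      # percentage of total variation

# Random-effects mean and an approximate 95% prediction interval
w_re = 1.0 / (var + tau2)
mu = np.sum(w_re * es) / np.sum(w_re)
se_mu = np.sqrt(1.0 / np.sum(w_re))
half = t.ppf(0.975, df=k - 2) * np.sqrt(tau2 + se_mu**2)

print(f"Q = {Q:.2f} (P = {p_Q:.2f}), tau^2 = {tau2:.3f}, I^2 = {I2:.0f}%")
print(f"Prediction interval: {mu - half:.2f} to {mu + half:.2f}")
```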

Step 8: Choosing the Meta-Analytic Model

Meta-analysts address heterogeneity by choosing between fixed-effects and random-effects analytical models.

Use a random-effects model if heterogeneity is high. Use a fixed-effect model if heterogeneity is low, or if all studies are functionally identical and you are not seeking to generalize to a range of scenarios.

Although a statistical test for homogeneity can help assess the variability in effect sizes across studies, it shouldn’t dictate the choice between fixed and random effects models.

The decision of which model to use is ultimately a conceptual one, driven by the researcher’s understanding of the research field and the goals of the meta-analysis.

If the number of studies is limited, a fixed-effects analysis is more appropriate, while more studies are required for a stable estimate of the between-study variance in a random-effects model.

It is important to note that using a random-effects model is generally a more conservative approach.

Fixed-effects models

  • Assumes all studies are measuring the exact same thing
  • Gives much more weight to larger studies
  • Use when studies are very similar

Fixed-effects models assume that there is one true effect size underlying all studies. The goal is to estimate this common effect size with the greatest precision, which is achieved by minimizing within-study (sampling) error.

Consequently, studies are weighted by the inverse of their variance.

This means that larger studies, which generally have smaller variances, are assigned greater weight in the analysis because they provide more precise estimates of the common effect size.

Advantages:

  • Simplicity: The fixed-effect model is straightforward to implement and interpret, making it computationally simpler.
  • Precision: When the assumption of a common effect size is met, fixed-effect models provide more precise estimates with narrower confidence intervals compared to random-effects models.
  • Suitable for Conditional Inferences: Fixed-effect models are appropriate when the goal is to make inferences specifically about the studies included in the meta-analysis, without generalizing to a broader population.

Limitations:

  • Restrictive Assumptions: The fixed-effect model assumes all studies estimate the same population parameter, which is often unrealistic, particularly with studies drawn from diverse methodologies or populations.
  • Limited Generalizability: Findings from fixed-effect models are conditional on the included studies, limiting their generalizability to other contexts or populations.
  • Sensitivity to Heterogeneity: Fixed-effect models are sensitive to the presence of heterogeneity among studies, and may produce misleading results if substantial heterogeneity exists.

Random-effects models

  • Assumes studies might be measuring slightly different things
  • Gives more balanced weight to both large and small studies
  • Use when studies might vary in methods or populations

Random-effects models assume that the true effect size can vary across studies. The goal here is to estimate the mean of these varying effect sizes, considering both within-study variance and between-study variance (heterogeneity).

This approach acknowledges that each study might estimate a slightly different effect size due to factors beyond sampling error, such as variations in study populations, interventions, or designs.

This balanced weighting prevents large studies from disproportionately influencing the overall effect size estimate, leading to a more representative average effect size that reflects the distribution of effects across a range of studies.

Advantages:

  • Realistic Assumptions: Random-effects models acknowledge the presence of between-study variability by assuming true effects are randomly distributed, making it more suitable for real-world research scenarios.
  • Generalizability: Random-effects models allow for broader inferences to be made about a population of studies, enhancing the generalizability of findings.
  • Accommodation of Heterogeneity: Random-effects models explicitly model heterogeneity, providing a more accurate representation of the overall effect when studies have varying effect sizes.

Limitations:

  • Complexity: Random-effects models are computationally more complex, requiring the estimation of additional parameters, such as between-study variance.
  • Reduced Precision: Confidence intervals tend to be wider compared to fixed-effect models, particularly when between-study heterogeneity is substantial.
  • Requirement for Sufficient Studies: Accurate estimation of between-study variance necessitates a sufficient number of studies, making random-effects models less reliable with smaller meta-analyses.
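To see why random-effects weighting is described as "more balanced", compare the fixed-effect weight 1/vᵢ with the random-effects weight 1/(vᵢ + τ²) for one large and one small study. The numbers below are invented, and τ² is assumed to have been estimated already (for example, by the DerSimonian–Laird method).

```python
import numpy as np

var_within = np.array([0.005, 0.080])   # within-study variances: large study, small study
tau2 = 0.04                             # assumed between-study variance

w_fixed  = 1.0 / var_within             # fixed-effect weights
w_random = 1.0 / (var_within + tau2)    # random-effects weights

for label, w in [("fixed-effect", w_fixed), ("random-effects", w_random)]:
    share = 100 * w / w.sum()
    print(f"{label}: large study {share[0]:.0f}%, small study {share[1]:.0f}%")
# fixed-effect:   large study ~94%, small study ~6%
# random-effects: large study ~73%, small study ~27%
```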

Step 9: Perform the Meta-Analysis

This step involves statistically combining the effect sizes from the chosen studies to estimate the effect in a population. Meta-analysis uses a weighted mean of the effect sizes, typically giving larger weights to more precise studies, often those with larger sample sizes.

This weighting scheme makes statistical sense because an effect size with good sampling accuracy (i.e., likely to be an accurate reflection of reality) is weighted highly, while effect sizes from studies with lower sampling accuracy are given less weight in the calculations.

The process is as follows:

  • Calculate weights for each study
  • Multiply each study’s effect by its weight
  • Add up all these weighted effects
  • Divide by the sum of all weights

Estimating effect size using fixed effects

The fixed-effects model in meta-analysis operates under the assumption that all included studies are estimating the same true effect size.

This model focuses solely on within-study variance when determining the weight of each study.

The weight is calculated as the inverse of the within-study variance, which typically results in larger studies receiving substantially more weight in the analysis.

This approach is based on the idea that larger studies provide more precise estimates of the true effect.

The weighted mean effect size (M) is calculated by summing the products of each study’s effect size (ESi) and its corresponding weight (wi) and dividing that sum by the total sum of the weights: M = Σ(wi × ESi) / Σwi.

1. Calculate weights (wi) for each study:

The weight is often the inverse of the variance of the effect size. This means studies with larger sample sizes and less variability will have greater weight, as they provide more precise estimates of the effect size.

This weighting scheme reflects the assumption in a fixed-effect model that all studies are estimating the same true effect size, and any observed differences in effect sizes are solely due to sampling error. Therefore, studies with less sampling error (i.e., smaller variances) are considered more reliable and are given more weight in the analysis.

Here’s the formula for calculating the weight in a fixed-effect meta-analysis:

Wi = 1 / VYi

  • Wi represents the weight assigned to study i.
  • VYi is the within-study variance for study i.

Practical steps:

  • The weight for each study is calculated as: Weight = 1 / (within-study variance)
  • For example: Let’s say a study reports a within-study variance of 0.04. The weight for this study would be: 1 / 0.04 = 25
  • Calculate the weight for every study included in your meta-analysis using this method.
  • These weights will be used in subsequent calculations, such as computing the weighted mean effect size.
  • Note : In a fixed-effects model, we do not calculate or use τ² (tau squared), which represents between-study variance. This is only used in random-effects models.

2. Multiply each study’s effect by its weight:

After calculating the weight for each study, multiply the effect size by its corresponding weight. This step is crucial because it ensures that studies with more precise effect size estimates contribute proportionally more to the overall weighted mean effect size.

  • For each study, multiply its effect size by the weight we just calculated.

3. Add up all these weighted effects:

Sum up all the products from step 2.

4. Divide by the sum of all weights:

  • Add up all the weights we calculated in step 1.
  • Divide the sum from step 3 by this total weight.
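Putting these four steps together, here is a minimal Python sketch of a fixed-effect pooled estimate. The effect sizes and variances are invented purely for illustration and do not come from any real studies.

import math

# Hypothetical effect sizes (e.g., standardized mean differences) and
# their within-study variances for four studies
effect_sizes = [0.30, 0.45, 0.25, 0.50]
variances    = [0.04, 0.02, 0.08, 0.05]

# Step 1: weight = inverse of the within-study variance
weights = [1.0 / v for v in variances]

# Steps 2-4: weighted mean M = sum(w_i * ES_i) / sum(w_i)
weighted_sum = sum(w * es for w, es in zip(weights, effect_sizes))
total_weight = sum(weights)
M = weighted_sum / total_weight

# Standard error and 95% confidence interval of the pooled estimate
se = math.sqrt(1.0 / total_weight)
print(f"Fixed-effect estimate: {M:.3f} (95% CI {M - 1.96 * se:.3f} to {M + 1.96 * se:.3f})")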

Implications of the fixed-effects model

  • Larger studies (with smaller within-study variance) receive substantially more weight.
  • This model assumes that differences between study results are due only to sampling error.
  • It’s most appropriate when studies are very similar in methods and sample characteristics.

Estimating effect size using random effects

Random effects meta-analysis is slightly more complicated because multiple sources of differences potentially affecting effect sizes must be accounted for.

The main difference in the random effects model is the inclusion of τ² (tau squared) in the weight calculation. This accounts for between-study heterogeneity, recognizing that studies might be measuring slightly different effects.

This process results in an overall effect size that takes into account both within-study and between-study variability, making it more appropriate when studies differ in methods or populations.

The model estimates the variance of the true effect sizes (τ²). This requires a reasonable number of studies, so random effects estimation might not be feasible with very few studies.

Estimation is typically done using statistical software, with restricted maximum likelihood (REML) being a common method.

1. Calculate weights for each study:

In a random-effects meta-analysis, the weight assigned to each study (W*i) is calculated as the inverse of that study’s variance, similar to a fixed-effect model. However, the variance in a random-effects model considers both the within-study variance (VYi) and the between-studies variance (τ²).

The inclusion of τ² in the denominator of the weight formula reflects the random-effects model’s assumption that the true effect size can vary across studies.

This means that in addition to sampling error, there is another source of variability that needs to be accounted for when weighting the studies. The between-studies variance, τ², represents this additional source of variability.

Here’s the formula for calculating the weight in a random-effects meta-analysis:

W*i = 1 / (VYi + τ²)

  • W*i represents the weight assigned to study i.
  • τ² is the estimated between-studies variance.

First, we need to calculate something called τ² (tau squared). This represents the between-study variance.

The estimation of τ² can be done using different methods, one common approach being the method of moments (the DerSimonian and Laird method).

The formula for τ² using the method of moments is: τ² = (Q – df) / C

  • Q is the homogeneity statistic.
  • df is the degrees of freedom (number of studies -1).
  • C is a constant calculated from the fixed-effect (inverse-variance) study weights: C = Σwi − (Σwi²) / (Σwi).
  • The weight for each study is then calculated as: Weight = 1 / (within-study variance + τ²). This is different from the fixed effects model because we’re adding τ² to account for between-study variability.

Then, as in the fixed-effect model, multiply each study’s effect size by its weight, add up the weighted effects, and divide by the sum of the weights (this time using the random-effects weights W*i), as sketched below.
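Here is a minimal Python sketch of the DerSimonian–Laird calculation described above, reusing the invented numbers from the fixed-effect example; it is an illustration only, not a substitute for dedicated meta-analysis software.

import math

effect_sizes = [0.30, 0.45, 0.25, 0.50]   # hypothetical effect sizes
variances    = [0.04, 0.02, 0.08, 0.05]   # within-study variances

# Fixed-effect weights are needed first, to compute Q and C
w = [1.0 / v for v in variances]
fe_mean = sum(wi * es for wi, es in zip(w, effect_sizes)) / sum(w)

Q = sum(wi * (es - fe_mean) ** 2 for wi, es in zip(w, effect_sizes))  # homogeneity statistic
df = len(effect_sizes) - 1
C = sum(w) - sum(wi ** 2 for wi in w) / sum(w)

# Method-of-moments estimate of the between-study variance, truncated at zero
tau2 = max(0.0, (Q - df) / C)

# Random-effects weights and pooled estimate
w_star = [1.0 / (v + tau2) for v in variances]
re_mean = sum(wi * es for wi, es in zip(w_star, effect_sizes)) / sum(w_star)
se = math.sqrt(1.0 / sum(w_star))

print(f"tau^2 = {tau2:.4f}")
print(f"Random-effects estimate: {re_mean:.3f} "
      f"(95% CI {re_mean - 1.96 * se:.3f} to {re_mean + 1.96 * se:.3f})")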

Implications of the random-effects model

  • Weights are more balanced between large and small studies compared to the fixed-effects model.
  • It’s most appropriate when studies vary in methods, sample characteristics, or other factors that might influence the true effect size.
  • The random-effects model typically produces wider confidence intervals, reflecting the additional uncertainty from between-study variability.
  • Results are more generalizable to a broader population of studies beyond those included in the meta-analysis.
  • This model is often more realistic for social and behavioral sciences, where true effects may vary across different contexts or populations.

Step 10: Sensitivity Analysis

Assess the robustness of your findings by repeating the analysis using different statistical methods, models (fixed-effects and random-effects), or inclusion criteria. This helps determine how sensitive your results are to the choices made during the process.

Sensitivity analysis strengthens a meta-analysis by revealing how robust the findings are to the various decisions and assumptions made during the process. It helps to determine if the conclusions drawn from the meta-analysis hold up when different methods, criteria, or data subsets are used.

This is especially important since opinions may differ on the best approach to conducting a meta-analysis, making the exploration of these variations crucial.

Here are some key ways sensitivity analysis contributes to a more robust meta-analysis:

  • Assessing Impact of Different Statistical Methods : A sensitivity analysis can involve calculating the overall effect using different statistical methods, such as fixed and random effects models. This comparison helps determine if the chosen statistical model significantly influences the overall results. For instance, in the meta-analysis of β-blockers after myocardial infarction, both fixed and random effects models yielded almost identical overall estimates. This suggests that the meta-analysis findings are resilient to the statistical method employed.
  • Evaluating the Influence of Trial Quality and Size : By analyzing the data with and without trials of questionable quality or varying sizes, researchers can assess the impact of these factors on the overall findings.
  • Examining the Effect of Trials Stopped Early : Including trials that were stopped early due to interim analysis results can introduce bias. Sensitivity analysis helps determine if the inclusion or exclusion of such trials noticeably changes the overall effect. In the example of the β-blocker meta-analysis, excluding trials stopped early had a negligible impact on the overall estimate.
  • Addressing Publication Bias : It’s essential to assess and account for publication bias, which occurs when studies with statistically significant results are more likely to be published than those with null or nonsignificant findings. This can be accomplished by employing techniques like funnel plots, statistical tests (e.g., Begg and Mazumdar’s rank correlation test, Egger’s test), and sensitivity analyses.

By systematically varying different aspects of the meta-analysis, researchers can assess the robustness of their findings and address potential concerns about the validity of their conclusions.

This process ensures a more reliable and trustworthy synthesis of the research evidence.
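One simple and widely used check is a leave-one-out analysis, in which the pooled estimate is recomputed with each study omitted in turn. The sketch below reuses the invented fixed-effect numbers from earlier and is not tied to any particular software package; a large shift in the pooled estimate when a single study is dropped flags that study for closer scrutiny.

def pooled_fixed_effect(effects, variances):
    # Inverse-variance weighted (fixed-effect) pooled estimate
    weights = [1.0 / v for v in variances]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)

effects   = [0.30, 0.45, 0.25, 0.50]   # hypothetical effect sizes
variances = [0.04, 0.02, 0.08, 0.05]

print(f"All studies included: {pooled_fixed_effect(effects, variances):.3f}")

# Leave-one-out: recompute the pooled estimate omitting each study in turn
for i in range(len(effects)):
    sub_e = effects[:i] + effects[i + 1:]
    sub_v = variances[:i] + variances[i + 1:]
    print(f"Without study {i + 1}: {pooled_fixed_effect(sub_e, sub_v):.3f}")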

Common Mistakes

When conducting a meta-analysis, several common pitfalls can arise, potentially undermining the validity and reliability of the findings. The methodological literature cautions against these mistakes and offers guidance on conducting methodologically sound meta-analyses.

  • Insufficient Number of Studies: If there are too few primary studies available, a meta-analysis might not be appropriate. While a meta-analysis can technically be conducted with only two studies, the research community might not view findings based on a limited number of studies as reliable evidence. A small number of studies could suggest that the research field is not mature enough for meaningful synthesis.
  • Inappropriate Combination of Studies : Meta-analyses should not simply combine studies indiscriminately. Avoid the “apples and oranges” problem, where studies with different research objectives, designs, measures, or samples are inappropriately combined. Such practices can obscure important differences between studies and lead to misleading conclusions.
  • Misinterpreting Heterogeneity : One common mistake is using the Q statistic or p-value from a test of heterogeneity as the sole indicator of heterogeneity. While these statistics can signal heterogeneity, they do not quantify the extent of variation in effect sizes.
  • Over-Reliance on Published Studies : This dependence on published literature introduces the risk of publication bias, where studies with statistically significant or favorable results are more likely to be published. Failure to acknowledge and address publication bias can lead to overestimating the true effect size.
  • Neglecting Study Quality : Including studies with poor methodological quality can bias the results of a meta-analysis leading to unreliable and inaccurate effect size estimates. The decision of which studies to include should be based on predefined eligibility criteria to ensure the quality and relevance of the synthesis.
  • Fixation on Statistical Significance: Placing excessive emphasis on the statistical significance of an overall effect while neglecting its practical significance is a critical mistake in meta-analysis, as it is in primary studies. Consider both statistical and clinical or substantive significance.
  • Misinterpreting Significance Testing in Subgroup Analyses: When comparing effect sizes across subgroups, merely observing that an effect is statistically significant in one subgroup but not another is insufficient. Instead, conduct a formal test of the difference in effects between subgroups, or report the difference in effects with a confidence interval.
  • Ignoring Dependence : Neglecting dependence among effect sizes, particularly when multiple effect sizes are extracted from the same study, is a mistake. This oversight can inflate Type I error rates and lead to inaccurate estimations of average effect sizes and standard errors.
  • Inadequate Reporting : Failing to transparently and comprehensively report the meta-analysis process is a crucial mistake. A meta-analysis should include a detailed written protocol outlining the research question, search strategy, inclusion criteria, and analytical methods.

Reading List

  • Bar-Haim, Y., Lamy, D., Pergamin, L., Bakermans-Kranenburg, M. J., & Van Ijzendoorn, M. H. (2007). Threat-related attentional bias in anxious and nonanxious individuals: a meta-analytic study .  Psychological bulletin ,  133 (1), 1.
  • Borenstein, M., Hedges, L. V., Higgins, J. P., & Rothstein, H. R. (2021).  Introduction to meta-analysis . John Wiley & Sons.
  • Crits-Christoph, P. (1992). A Meta-analysis .  American Journal of Psychiatry ,  149 , 151-158.
  • Duval, S. J., & Tweedie, R. L. (2000). A nonparametric “trim and fill” method of accounting for publication bias in meta-analysis. Journal of the American Statistical Association, 95 (449), 89–98.
  • Egger, M., Davey Smith, G., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test . BMJ, 315 (7109), 629–634.
  • Egger, M., Smith, G. D., & Phillips, A. N. (1997). Meta-analysis: principles and procedures .  Bmj ,  315 (7121), 1533-1537.
  • Field, A. P., & Gillett, R. (2010). How to do a meta‐analysis .  British Journal of Mathematical and Statistical Psychology ,  63 (3), 665-694.
  • Hedges, L. V., & Pigott, T. D. (2004). The power of statistical tests for moderators in meta-analysis .  Psychological methods ,  9 (4), 426.
  • Hedges, L. V., & Olkin, I. (2014).  Statistical methods for meta-analysis . Academic press.
  • Hofmann, S. G., Sawyer, A. T., Witt, A. A., & Oh, D. (2010). The effect of mindfulness-based therapy on anxiety and depression: A meta-analytic review .  Journal of consulting and clinical psychology ,  78 (2), 169.
  • Littell, J. H., Corcoran, J., & Pillai, V. (2008). Systematic reviews and meta-analysis . Oxford University Press.
  • Lyubomirsky, S., King, L., & Diener, E. (2005). The benefits of frequent positive affect: Does happiness lead to success? .  Psychological bulletin ,  131 (6), 803.
  • Macnamara, B. N., & Burgoyne, A. P. (2022). Do growth mindset interventions impact students’ academic achievement? A systematic review and meta-analysis with recommendations for best practices.  Psychological Bulletin .
  • Polanin, J. R., & Pigott, T. D. (2015). The use of meta‐analytic statistical significance testing .  Research Synthesis Methods ,  6 (1), 63-73.
  • Rodgers, M. A., & Pustejovsky, J. E. (2021). Evaluating meta-analytic methods to detect selective reporting in the presence of dependent effect sizes .  Psychological methods ,  26 (2), 141.
  • Rosenthal, R. (1991). Meta-analysis: a review.  Psychosomatic medicine ,  53 (3), 247-271.
  • Tipton, E., Pustejovsky, J. E., & Ahmadi, H. (2019). A history of meta‐regression: Technical, conceptual, and practical developments between 1974 and 2018 .  Research synthesis methods ,  10 (2), 161-179.
  • Zhao, J. G., Zeng, X. T., Wang, J., & Liu, L. (2017). Association between calcium or vitamin D supplementation and fracture incidence in community-dwelling older adults: a systematic review and meta-analysis.  Jama ,  318 (24), 2466-2482.


Meta-Analysis – Guide with Definition, Steps & Examples

Published by Owen Ingram on April 26th, 2023; revised on April 26, 2023

“A meta-analysis is a formal, epidemiological, quantitative study design that uses statistical methods to generalise the findings of the selected independent studies.”

Meta-analysis and systematic review are two of the most rigorous strategies for synthesising research evidence. When researchers look for the best available evidence concerning their research question, they are advised to begin at the top of the evidence pyramid. Evidence in the form of meta-analyses or systematic reviews addressing important questions is valued in academia because it informs decision-making.

What is Meta-Analysis?

Meta-analysis estimates the overall effect across individual independent research studies by systematically synthesising or merging their results. Meta-analysis is not only about reaching a larger effective sample size by combining several smaller studies. It involves systematic methods for evaluating differences between participants, variability in results (also known as heterogeneity), and how sensitive the findings are to the chosen systematic review protocol.

When Should you Conduct a Meta-Analysis?

Meta-analysis has become a widely used research method in medical sciences and other fields for several reasons. The technique involves statistically summarising the results of the independent studies identified in a systematic review.

The Cochrane Handbook explains that “an important step in a systematic review is the thoughtful consideration of whether it is appropriate to combine the numerical results of all, or perhaps some, of the studies. Such a meta-analysis yields an overall statistic (together with its confidence interval) that summarizes the effectiveness of an experimental intervention compared with a comparator intervention” (section 10.2).

A researcher or a practitioner should choose meta-analysis when the following outcomes are desirable. 

To generate new hypotheses or settle controversies arising from conflicting research studies. Meta-analysis makes it possible to quantify and evaluate variable results and to identify the extent of conflict in the literature.

To find research gaps left unfilled and address questions not posed by individual studies. Primary research studies involve specific types of participants and interventions. A review of these studies with variable characteristics and methodologies can allow the researcher to gauge the consistency of findings across a wider range of participants and interventions. With the help of meta-analysis, the reasons for differences in the effect can also be explored. 

To provide convincing evidence. Estimating the effects with a larger sample size and interventions can provide convincing evidence. Many academic studies are based on a very small dataset, so the estimated intervention effects in isolation are not fully reliable.

Elements of a Meta-Analysis

Deeks et al. (2019), Haidich (2010), and Grant & Booth (2009) explored the characteristics, strengths, and weaknesses of conducting a meta-analysis. These are briefly explained below.

Characteristics: 

  • A systematic review must be completed before conducting the meta-analysis because it provides a summary of the findings of the individual studies synthesised. 
  • You can only conduct a meta-analysis by synthesising studies in a systematic review. 
  • The studies selected for statistical analysis for the purpose of meta-analysis should be similar in terms of comparison, intervention, and population. 

Strengths: 

  • A meta-analysis takes place after the systematic review. The end product is a comprehensive quantitative analysis that is complicated but reliable. 
  • It gives more value and weight to existing studies that, on their own, do not hold much practical value.
  • Policy-makers and academicians cannot base their decisions on individual research studies. Meta-analysis provides them with a comprehensive, solid analysis of evidence on which to make informed decisions.

Criticisms: 

  • The meta-analysis uses studies exploring similar topics. Finding similar studies for the meta-analysis can be challenging.
  • When and if biases in the individual studies or those related to reporting and specific research methodologies are involved, the meta-analysis results could be misleading.

Steps of Conducting the Meta-Analysis 

The process of conducting the meta-analysis has remained a topic of debate among researchers and scientists. However, the following 5-step process is widely accepted. 

Step 1: Research Question

The first step in conducting clinical research involves identifying a research question and proposing a hypothesis . The potential clinical significance of the research question is then explained, and the study design and analytical plan are justified.

Step 2: Systematic Review 

The purpose of a systematic review (SR) is to address a research question by identifying all relevant studies that meet the required quality standards for inclusion. While established journals typically serve as the primary source for identified studies, it is important to also consider unpublished data to avoid publication bias or the exclusion of studies with negative results.

While some meta-analyses may limit their focus to randomized controlled trials (RCTs) for the sake of obtaining the highest quality evidence, other experimental and quasi-experimental studies may be included if they meet the specific inclusion/exclusion criteria established for the review.

Step 3: Data Extraction

After selecting studies for the meta-analysis, researchers extract summary data or outcomes, as well as sample sizes and measures of data variability for both intervention and control groups. The choice of outcome measures depends on the research question and the type of study, and may include numerical or categorical measures.

For instance, numerical means may be used to report differences in scores on a questionnaire or changes in a measurement, such as blood pressure. In contrast, risk measures like odds ratios (OR) or relative risks (RR) are typically used to report differences in the probability of belonging to one category or another, such as vaginal birth versus cesarean birth.
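For a dichotomous outcome, the extracted counts for each study form a 2 × 2 table. The sketch below shows, with invented counts, how such a table can be converted into an odds ratio, its log transform, and the large-sample variance that is later used for weighting.

import math

# Hypothetical 2 x 2 table for one study:
#                 event   no event
# intervention      12        88
# control           20        80
a, b = 12, 88   # intervention group
c, d = 20, 80   # control group

odds_ratio = (a * d) / (b * c)
log_or = math.log(odds_ratio)

# Standard large-sample variance of the log odds ratio
var_log_or = 1 / a + 1 / b + 1 / c + 1 / d
se = math.sqrt(var_log_or)

ci_low = math.exp(log_or - 1.96 * se)
ci_high = math.exp(log_or + 1.96 * se)
print(f"OR = {odds_ratio:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")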

Step 4: Standardisation and Weighting Studies

After gathering all the required data, the fourth step involves computing suitable summary measures from each study for further examination. These measures are typically referred to as Effect Sizes and indicate the difference in average scores between the control and intervention groups. For instance, it could be the variation in blood pressure changes between study participants who used drug X and those who used a placebo.

Since the units of measurement often differ across the included studies, standardization is necessary to create comparable effect size estimates. Standardization is accomplished by determining, for each study, the average score for the intervention group, subtracting the average score for the control group, and dividing the result by the relevant measure of variability in that dataset.

In some cases, the results of certain studies must carry more significance than others. Larger studies, as measured by their sample sizes, are deemed to produce more precise estimates of effect size than smaller studies. Additionally, studies with less variability in data, such as smaller standard deviation or narrower confidence intervals, are typically regarded as higher quality in study design. A weighting statistic that aims to incorporate both of these factors, known as inverse variance, is commonly employed.
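The sketch below illustrates this standardisation step for one hypothetical study, computing a standardized mean difference (Cohen's d) from group summary statistics together with an approximate variance that can later feed into inverse-variance weighting. All numbers are invented.

import math

# Hypothetical summary data for one study
n1, mean1, sd1 = 50, 12.0, 4.0   # intervention group
n2, mean2, sd2 = 48, 10.5, 4.5   # control group

# Pooled standard deviation and standardized mean difference (Cohen's d)
sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
d = (mean1 - mean2) / sd_pooled

# Approximate variance of d, later used for inverse-variance weighting
var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))

print(f"SMD (Cohen's d) = {d:.3f}, variance = {var_d:.4f}")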

Step 5: Absolute Effect Estimation

The ultimate step in conducting a meta-analysis is to choose and utilize an appropriate model for comparing Effect Sizes among diverse studies. Two popular models for this purpose are the Fixed Effects and Random Effects models. The Fixed Effects model relies on the premise that each study is evaluating a common treatment effect, implying that all studies would have estimated the same Effect Size if sample variability were equal across all studies.

Conversely, the Random Effects model posits that the true treatment effects in individual studies may vary from each other, and endeavors to consider this additional source of interstudy variation in Effect Sizes. The existence and magnitude of this latter variability is usually evaluated within the meta-analysis through a test for ‘heterogeneity.’

Forest Plot

The results of a meta-analysis are often visually presented using a “Forest Plot”. This type of plot displays, for each study included in the analysis, a horizontal line indicating the effect size estimate (here, a risk ratio) and its 95% confidence interval. Figure A provides an example of a hypothetical Forest Plot in which drug X reduces the risk of death in all three studies.

However, the first study was larger than the other two, and the estimates from the two smaller studies were not statistically significant, as indicated by their confidence interval lines crossing the value of 1. The size of each box represents the relative weight assigned to that study by the meta-analysis. The diamond represents the combined estimate of the drug’s effect, which is more precise than any single study; its centre marks the combined risk ratio estimate and its width marks the 95% confidence interval limits.

Figure A: Hypothetical Forest Plot
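For readers who want to reproduce this kind of figure, the rough matplotlib sketch below draws a simple forest plot from study-level risk ratios and confidence intervals. All values are invented and the styling is deliberately minimal.

import matplotlib.pyplot as plt

# Hypothetical risk ratios, 95% CIs, and relative weights for three studies plus the pooled estimate
labels  = ["Study 1", "Study 2", "Study 3", "Pooled"]
rr      = [0.80, 0.75, 0.85, 0.80]
ci_low  = [0.70, 0.50, 0.55, 0.72]
ci_high = [0.91, 1.12, 1.31, 0.89]
weights = [60, 20, 20, 0]

fig, ax = plt.subplots()
y_positions = list(range(len(labels), 0, -1))

for y, est, lo, hi, w, lab in zip(y_positions, rr, ci_low, ci_high, weights, labels):
    ax.plot([lo, hi], [y, y], color="black")                       # confidence interval line
    if lab == "Pooled":
        ax.plot(est, y, marker="D", color="black", markersize=10)  # diamond for the pooled estimate
    else:
        ax.plot(est, y, marker="s", color="black", markersize=4 + w / 10)  # box sized by weight

ax.axvline(1.0, linestyle="--", color="grey")  # line of no effect for ratio measures
ax.set_yticks(y_positions)
ax.set_yticklabels(labels)
ax.set_xscale("log")
ax.set_xlabel("Risk ratio (log scale)")
plt.tight_layout()
plt.show()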

Relevance to Practice and Research 

  Evidence Based Nursing commentaries often include recently published systematic reviews and meta-analyses, as they can provide new insights and strengthen recommendations for effective healthcare practices. Additionally, they can identify gaps or limitations in current evidence and guide future research directions.

The quality of the data available for synthesis is a critical factor in the strength of conclusions drawn from meta-analyses, and this is influenced by the quality of individual studies and the systematic review itself. However, meta-analysis cannot overcome issues related to underpowered or poorly designed studies.

Therefore, clinicians may still encounter situations where the evidence is weak or uncertain, and where higher-quality research is required to improve clinical decision-making. While such findings can be frustrating, they remain important for informing practice and highlighting the need for further research to fill gaps in the evidence base.

Methods and Assumptions in Meta-Analysis 

Ensuring the credibility of findings is imperative in all types of research, including meta-analyses. To validate the outcomes of a meta-analysis, the researcher must confirm that the research techniques used were accurate in measuring the intended variables. Typically, researchers establish the validity of a meta-analysis by testing the outcomes for homogeneity or the degree of similarity between the results of the combined studies.

Homogeneity is preferred in meta-analyses as it allows the data to be combined without needing adjustments to suit the study’s requirements. To determine homogeneity, researchers assess heterogeneity, the opposite of homogeneity. Two widely used statistical methods for evaluating heterogeneity in research results are Cochran’s Q and the I² (I-squared) index.
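The sketch below shows how Cochran's Q and the I² index can be computed from study effect sizes and within-study variances, using the standard definitions Q = Σ wi (ESi − M)² and I² = (Q − df) / Q. The numbers are invented for illustration.

effects   = [0.30, 0.45, 0.25, 0.50]   # hypothetical effect sizes
variances = [0.04, 0.02, 0.08, 0.05]   # within-study variances

weights = [1.0 / v for v in variances]
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)

# Cochran's Q: weighted sum of squared deviations from the pooled estimate
Q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
df = len(effects) - 1

# I^2: proportion of total variation attributable to between-study heterogeneity
I2 = max(0.0, (Q - df) / Q) * 100 if Q > 0 else 0.0

print(f"Q = {Q:.2f} on {df} df, I^2 = {I2:.1f}%")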

Difference Between Meta-Analysis and Systematic Reviews

Meta-analysis and systematic reviews are both research methods used to synthesise evidence from multiple studies on a particular topic. However, there are some key differences between the two.

Systematic reviews involve a comprehensive and structured approach to identifying, selecting, and critically appraising all available evidence relevant to a specific research question. This process involves searching multiple databases, screening the identified studies for relevance and quality, and summarizing the findings in a narrative report.

Meta-analysis, on the other hand, involves using statistical methods to combine and analyze the data from multiple studies, with the aim of producing a quantitative summary of the overall effect size. Meta-analysis requires the studies to be similar enough in terms of their design, methodology, and outcome measures to allow for meaningful comparison and analysis.

Therefore, systematic reviews are broader in scope and summarize the findings of all studies on a topic, while meta-analyses are more focused on producing a quantitative estimate of the effect size of an intervention across multiple studies that meet certain criteria. In some cases, a systematic review may be conducted without a meta-analysis if the studies are too diverse or the quality of the data is not sufficient to allow for statistical pooling.

Software Packages For Meta-Analysis

Meta-analysis can be done through software packages, including free and paid options. One of the most commonly used software packages for meta-analysis is RevMan by the Cochrane Collaboration.

Assessing the Quality of Meta-Analysis 

Assessing the quality of a meta-analysis involves evaluating the methods used to conduct the analysis and the quality of the studies included. Here are some key factors to consider:

  • Study selection: The studies included in the meta-analysis should be relevant to the research question and meet predetermined criteria for quality.
  • Search strategy: The search strategy should be comprehensive and transparent, including databases and search terms used to identify relevant studies.
  • Study quality assessment: The quality of included studies should be assessed using appropriate tools, and this assessment should be reported in the meta-analysis.
  • Data extraction: The data extraction process should be systematic and clearly reported, including any discrepancies that arose.
  • Analysis methods: The meta-analysis should use appropriate statistical methods to combine the results of the included studies, and these methods should be transparently reported.
  • Publication bias: The potential for publication bias should be assessed and reported in the meta-analysis, including any efforts to identify and include unpublished studies.
  • Interpretation of results: The results should be interpreted in the context of the study limitations and the overall quality of the evidence.
  • Sensitivity analysis: Sensitivity analysis should be conducted to evaluate the impact of study quality, inclusion criteria, and other factors on the overall results.

Overall, a high-quality meta-analysis should be transparent in its methods and should clearly report the limitations of the included studies and the overall quality of the evidence.


Examples of Meta-Analysis

  • Stanley, T. D., & Jarrell, S. B. (1989). Meta-regression analysis: A quantitative method of literature surveys. Journal of Economic Surveys, 3(2), 161–170.
  • Datta, D. K., Pinches, G. E., & Narayanan, V. K. (1992). Factors influencing wealth creation from mergers and acquisitions: A meta-analysis. Strategic Management Journal, 13, 67–84.
  • Glass, G. (1983). Synthesising empirical research: Meta-analysis. In S. A. Ward & L. J. Reed (Eds.), Knowledge structure and use: Implications for synthesis and interpretation. Philadelphia: Temple University Press.
  • Wolf, F. M. (1986). Meta-analysis: Quantitative methods for research synthesis. Sage University Paper No. 59.
  • Hunter, J. E., Schmidt, F. L., & Jackson, G. B. (1982). Meta-analysis: Cumulating research findings across studies. Beverly Hills, CA: Sage.

Frequently Asked Questions

What is a meta-analysis in research?

Meta-analysis is a statistical method used to combine results from multiple studies on a specific topic. By pooling data from various sources, meta-analysis can provide a more precise estimate of the effect size of a treatment or intervention and identify areas for future research.

Why is meta-analysis important?

Meta-analysis is important because it combines and summarizes results from multiple studies to provide a more precise and reliable estimate of the effect of a treatment or intervention. This helps clinicians and policymakers make evidence-based decisions and identify areas for further research.

What is an example of a meta-analysis?

A meta-analysis of studies evaluating physical exercise’s effect on depression in adults is an example. Researchers gathered data from 49 studies involving a total of 2669 participants. The studies used different types of exercise and measures of depression, which made it difficult to compare the results.

Through meta-analysis, the researchers calculated an overall effect size and determined that exercise was associated with a statistically significant reduction in depression symptoms. The study also identified that moderate-intensity aerobic exercise, performed three to five times per week, was the most effective. The meta-analysis provided a more comprehensive understanding of the impact of exercise on depression than any single study could provide.

What is the definition of meta-analysis in clinical research?

Meta-analysis in clinical research is a statistical technique that combines data from multiple independent studies on a particular topic to generate a summary or “meta” estimate of the effect of a particular intervention or exposure.

This type of analysis allows researchers to synthesise the results of multiple studies, potentially increasing the statistical power and providing more precise estimates of treatment effects. Meta-analyses are commonly used in clinical research to evaluate the effectiveness and safety of medical interventions and to inform clinical practice guidelines.

Is meta-analysis qualitative or quantitative?

Meta-analysis is a quantitative method used to combine and analyze data from multiple studies. It involves the statistical synthesis of results from individual studies to obtain a pooled estimate of the effect size of a particular intervention or treatment. Therefore, meta-analysis is considered a quantitative approach to research synthesis.


  • Open access
  • Published: 03 March 2017

Meta-evaluation of meta-analysis: ten appraisal questions for biologists

  • Shinichi Nakagawa 1 , 2 ,
  • Daniel W. A. Noble 1 ,
  • Alistair M. Senior 3 , 4 &
  • Malgorzata Lagisz 1  

BMC Biology volume  15 , Article number:  18 ( 2017 ) Cite this article

43k Accesses

331 Citations

97 Altmetric

Metrics details

Meta-analysis is a statistical procedure for analyzing the combined data from different studies, and can be a major source of concise up-to-date information. The overall conclusions of a meta-analysis, however, depend heavily on the quality of the meta-analytic process, and an appropriate evaluation of the quality of meta-analysis (meta-evaluation) can be challenging. We outline ten questions biologists can ask to critically appraise a meta-analysis. These questions could also act as simple and accessible guidelines for the authors of meta-analyses. We focus on meta-analyses using non-human species, which we term ‘biological’ meta-analysis. Our ten questions are aimed at enabling a biologist to evaluate whether a biological meta-analysis embodies ‘mega-enlightenment’, a ‘mega-mistake’, or something in between.

Meta-analyses can be important and informative, but are they all?

Last year saw 40 years since the coining of the term ‘meta-analysis’ by Gene Glass in 1976 [ 1 , 2 ]. Meta-analyses, in which data from multiple studies are combined to evaluate an overall effect, or effect size, were first introduced to the medical and social sciences, where humans are the main species of interest [ 3 , 4 , 5 ]. Decades later, meta-analysis has infiltrated different areas of biological sciences [ 6 ], including ecology, evolutionary biology, conservation biology, and physiology. Here non-human species, or even ecosystems, are the main focus [ 7 , 8 , 9 , 10 , 11 , 12 ]. Despite this somewhat later arrival, interest in meta-analysis has been rapidly increasing in biological sciences. We have argued that the remarkable surge in interest over the last several years may indicate that meta-analysis is superseding traditional (narrative) reviews as a more objective and informative way of summarizing biological topics [ 8 ].

It is likely that the majority of us (biologists) have never conducted a meta-analysis. Chances are, however, that almost all of us have read at least one. Meta-analysis can not only provide quantitative information (such as overall effects and consistency among studies), but also qualitative information (such as dominant research trends and current knowledge gaps). In contrast to that of many medical and social scientists [ 3 , 5 ], the training of a biologist does not typically include meta-analysis [ 13 ] and, consequently, it may be difficult for a biologist to evaluate and interpret a meta-analysis. As with original research studies, the quality of meta-analyses vary immensely. For example, recent reviews have revealed that many meta-analyses in ecology and evolution miss, or perform poorly, several critical steps that are routinely implemented in the medical and social sciences [ 14 , 15 ] (but also see [ 16 , 17 ]).

The aim of this review is to provide ten appraisal questions that one should ask when reading a meta-analysis (cf., [ 18 , 19 ]), although these questions could also be used as simple and accessible guidelines for researchers conducting meta-analyses. In this review, we only deal with ‘narrow sense’ or ‘formal’ meta-analyses, where a statistical model is used to combine common effect sizes across studies, and the model takes into account sampling error, which is a function of sample size upon which each effect size is based (more details below; for discussions on the definitions of meta-analysis, see [ 15 , 20 , 21 ]). Further, our emphasis is on ‘biological’ meta-analyses, which deal with non-human species, including model organisms (nematodes, fruit flies, mice, and rats [ 22 ]) and non-model organisms, multiple species, or even entire ecosystems. For medical and social science meta-analyses concerning human subjects, large bodies of literature and excellent guidelines already exist, especially from overseeing organizations such as the Cochrane (Collaboration) and the Campbell Collaboration. We refer to the literature and the practices from these ‘experienced’ disciplines where appropriate. An overview and roadmap of this review is presented in Fig.  1 . Clearly, we cannot cover all details, but we cite key references in each section so that interested readers can follow up.

Fig. 1 Mapping the process (on the left) and main evaluation questions (on the right) for meta-analysis. References to the relevant figures (Figs. 2, 3, 4, 5 and 6) are included in the blue ovals.

Q1: Is the search systematic and transparently documented?

When we read a biological meta-analysis, it used to be (and probably still is) common to see a statement like “a comprehensive search of the literature was conducted” without mention of the date and type of databases the authors searched. Documentation on keyword strings and inclusion criteria is often also very poor, making replication of search outcomes difficult or impossible. Superficial documentation also makes it hard to tell whether the search really was comprehensive, and, more importantly, systematic.

A comprehensive search attempts to identify (almost) all relevant studies/data for a given meta-analysis, and would thus not only include multiple major databases for finding published studies, but also make use of various lesser-known databases to locate reports and unpublished studies. Despite the common belief that search results should be similar among major databases, overlaps can sometimes be only moderate. For example, overlap in search results between Web of Science and Scopus (two of the most popular academic databases) is only 40–50% in many major fields [ 23 ]. As well as reading that a search is comprehensive, it is not uncommon to read that a search was systematic. A systematic search needs to follow a set of pre-determined protocols aimed at minimizing bias in the resulting data set. For example, a search of a single database, with pre-defined focal questions, search strings, and inclusion/exclusion criteria, can be considered systematic, negating some bias, though not necessarily being comprehensive. It is notable that a comprehensive search is preferable but not necessary (and often very difficult to do) whereas a systematic search is a must [ 24 ].

For most meta-analyses in medicine and social sciences, the search steps are systematic and well documented for reproducibility. This is because these studies follow a protocol named the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement [ 25 , 26 ]; note that a meta-analysis should usually be a part of a systematic review, although a systematic review may or may not include meta-analysis. The PRISMA statement facilitates transparency in reporting meta-analytic studies. Although it was developed for health sciences, we believe that the details of the four key elements of the PRISMA flow diagram (‘identification’, ‘screening’, ‘eligibility’, and ‘included’) should also be reported in a biological meta-analysis [ 8 ]. Figure  2 shows: A) the key ideas of the PRISMA statement, which the reader should compare with the content of a biological meta-analysis; and B) an example of a PRISMA diagram, which should be included as part of meta-analysis documentation. The bottom line is that one should assess whether search and screening procedures are reproducible and systematic (if not comprehensive; to minimize potential bias), given what is described in the meta-analytic paper [ 27 , 28 ].

Preferred Reporting Items for Systematic Reviews and Meta-Analyses. (PRISMA). a The main components of a systematic review or meta-analysis. The data search (identification) stage should, ideally, be preceded by the development of a detailed study protocol and its preregistration. Searching at least two literature databases, along with other sources of published and unpublished studies (using backward and forward citations, reviews, field experts, own data, grey and non-English literature) is recommended. It is also necessary to report search dates and exact keyword strings. The screening and eligibility stage should be based on a set of predefined study inclusion and exclusion criteria. Criteria might differ for the initial screening (title, abstract) compared with the full-text screening, but both need to be reported in detail. It is good practice to have at least two people involved in screening, with a plan in place for disagreement resolution and calculating disagreement rates. It is recommended that the list of studies excluded at the full-text screening stage, with reasons for their exclusion, is reported. It is also necessary to include a full list of studies included in the final dataset, with their basic characteristics. The extraction and coding (included) stage may also be performed by at least two people (as is recommended in medical meta-analysis). The authors should record the figures, tables, or text fragments within each paper from which the data were extracted, as well as report intermediate calculations, transformations, simplifications, and assumptions made during data extraction. These details make tracing mistakes easier and improve reproducibility. Documentation should include: a summary of the dataset, information on data and study details requested from authors, details of software used, and code for analyses (if applicable). b It is now becoming compulsory to present a PRISMA diagram, which records the flow of information starting from the data search and leading to the final data set. WoS Web of Science

Q2: What question and what effect size?

A meta-analysis should not just be descriptive. The best meta-analyses ask questions or test hypotheses, as is the case with original research. The meta-analytic questions and hypotheses addressed will generally determine the types of effect size statistics the authors use [ 29 , 30 , 31 , 32 ], as we explain below. The three broad groups of effect size statistics are based on: 1) the difference between the means of two groups (for example, control versus treatment); 2) the relationship, or correlation, between two variables; and 3) the incidence of two outcomes (for example, dead or alive) in two groups (often represented in a 2 by 2 contingency table); see [ 3 , 7 ] for comprehensive lists of effect size statistics. Corresponding common effect size statistics are: 1) standardized mean difference (SMD; often referred to as d , Cohen’s d , Hedges’ d or Hedges’ g ) and the natural logarithm (log) of the response ratio (denoted as either ln R or ln RR [ 33 ]); 2) Fisher’s z -transformed correlation coefficient (often denoted as Zr ); and 3) the natural logarithm of the odds ratio (ln OR ) and relative risk (ln RR ; not to be confused with the response ratio).

We have also used and developed methods associated with less common effect size statistics such as log hazard ratio (ln HR ) for comparing survival curves [ 34 , 35 , 36 , 37 ], and also the log coefficient of variation ratio (ln CVR ) for comparing differences between the variances, rather than means, of two groups [ 38 , 39 , 40 ]. It is important to assess whether a study used an appropriate effect size statistic for the focal question. For example, when the authors are interested in the effect of a certain treatment, they should typically use SMD or response ratio, rather than Zr . Most biological meta-analyses will use one of the standardized effect sizes mentioned above. These effect sizes are referred to as standardized because they are unit-less (dimension-less), and thus are comparable across studies, even if those studies use different units for reporting (for example, size can be measured by weight [g] or length [cm]). However, unstandardized effect sizes (raw mean difference or regression coefficients) can be used, as happens in medical and social sciences, when all studies use common and directly comparable units (for example, blood pressure [mmHg]).

That being said, a biological meta-analysis will often bring together original studies of different types (such as combinations of experimental and observational studies). As a general rule, SMD is considered a better fit for experimental studies, whereas Zr is better for observational (correlational) studies. In some cases different effect sizes might be calculated for different studies in a meta-analysis and then be converted to a common type prior to analysis: for example, Zr and SMD (and also ln OR ) are inter-convertible. Thus, if we were, for example, interested in the effect of temperature on growth, we could combine results from experimental studies that compare mean growth at two temperatures (SMD) with results from observational studies that compare growth across a temperature gradient ( Zr ) in a single meta-analysis by transforming SMD from experimental studies to Zr [ 29 , 30 , 31 , 32 ].
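As a minimal sketch of such a conversion, the function below turns an SMD into Fisher's z-transformed correlation using the standard d-to-r formula; the group sizes and effect size are hypothetical, and the conversion relies on the usual large-sample approximations.

import math

def smd_to_zr(d, n1, n2):
    # Convert a standardized mean difference to Fisher's z-transformed r
    a = (n1 + n2) ** 2 / (n1 * n2)   # equals 4 when the two group sizes are equal
    r = d / math.sqrt(d ** 2 + a)
    return math.atanh(r)             # Fisher's z transformation

# Hypothetical experimental study reporting d = 0.6 with 30 individuals per group
print(f"Zr = {smd_to_zr(0.6, 30, 30):.3f}")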

Q3: Is non-independence taken into account?

Statistical non-independence occurs when data points (in this case, effect sizes) are somewhat related to each other. For example, multiple effect sizes may be taken from a single study, making such effect sizes correlated. Failing to account for non-independence among effect sizes (or data points) can lead to erroneous conclusions [ 14 , 41 – 44 ]—typically, an invalid conclusion of statistical significance (type I error; also see Q7). Many authors do not correct for non-independence (see [ 15 ]). There are two main reasons for this: the authors may be unaware of non-independence among effect sizes or they may have difficulty in appropriately accounting for the correlated structure despite being aware of the problem.

To help the reader to detect non-independence where the authors have failed to take it into account, we have illustrated four common types of dependent effect sizes in Fig.  3 , with the legend including a biological example for each type. Phylogenetic relatedness (Fig.  3d ) is unique to biological meta-analyses that include multiple species [ 14 , 42 , 45 ]. Correction for phylogenetic non-independence can now be implemented in several mainstream software packages, including metafor [ 46 ].

Common sources of non-independence in biological meta-analyses. a – d Hypothetical examples of the four most common scenarios of non-independence ( a - d ). Orange lines and arrows indicate correlations between effect sizes. Effect size estimate ( gray boxes , ‘ ES ’) is the ratio of (or difference between) the means of two groups (control versus treatment). Scenarios a , b , and d may apply to other types of effect sizes (e.g., correlation), while scenario c is unique to situations where two or more groups are compared to one control group. a Multiple effect sizes can be calculated from a single study. Effect sizes in study 3 are not independent of each other because effects (ES3 and ES4) are derived from two experiments using samples from the same population. For example, a study exposed females and males to increased temperatures, and the results are reported separately for the two sexes. b Effect sizes taken from the same study (study 3) are derived from different traits measured from the same subjects, resulting in correlations among these effect sizes. For example, body mass and body length are both indicators of body size, with studies 1 and 2 reporting just one of these measurements and study 3 reporting both for the same group of individuals. c Effect sizes can be correlated via contrast with a common ‘control’ group of individuals; for example, both effect sizes from study 3 share a common control treatment. A study may, for example, compare a balanced diet (control) with two levels of a protein-enriched diet. d In a multi-species study effect sizes can be correlated when they are based on data from organisms from the same taxonomic unit, due to evolutionary history. Effect sizes taken from studies 3 and 4 are not independent, because these studies were performed on the same species ( Sp.3 ). Additionally, all species share a phylogenetic history, and thus all effect sizes can be correlated with one another in accordance with time since evolutionary divergence between species

Where non-independence goes uncorrected because of the difficulty of appropriately accounting for the correlated structure, it is usually because the non-independence is incompatible with the two traditional meta-analytic models (the fixed-effect and the random-effects models—see Q4) that are implemented in widely used software (for example, Metawin [ 47 ]). Therefore, it was (and still is) common to see averaging of non-independent effect sizes or the selection of one among several related effect sizes. These solutions are not necessarily incorrect (see [ 48 ]), but may be limiting, and clearly lead to a loss of information [ 14 , 49 ]. The reader should be aware that it is preferable to model non-independence directly by using multilevel meta-analytic models (see Q4) if the dataset contains a sufficient number of studies (complex models usually require a large sample size) [ 14 ].
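As a crude illustration of that simplest (information-losing) remedy, the sketch below averages dependent effect sizes within each study before any pooling step; the study labels and values are invented.

from collections import defaultdict

# Hypothetical effect sizes keyed by the study they came from;
# studies 1 and 3 each contribute two non-independent effect sizes.
records = [
    ("study1", 0.40), ("study1", 0.30),
    ("study2", 0.55),
    ("study3", 0.20), ("study3", 0.10),
]

by_study = defaultdict(list)
for study, es in records:
    by_study[study].append(es)

# One averaged effect size per study: removes the within-study dependence,
# but loses information compared with a multilevel model
averaged = {study: sum(vals) / len(vals) for study, vals in by_study.items()}
print(averaged)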

Q4: Which meta-analytic model?

There are three main kinds of meta-analytic models, which differ in their assumptions about the data being analyzed, but for all three the common and primary goal is to estimate an overall effect (but see Q5). These models are: i) fixed-effect models (also referred to as common-effect models [ 31 ]); ii) random-effects models [ 50 ]; and iii) multilevel (hierarchical) models [ 14 , 49 ]. We have depicted these three kinds of models in Fig.  4 . When assessing a meta-analysis, the reader should be aware of the different assumptions each model makes. For the fixed-effect (Fig.  4a ) and random-effects (Fig.  4b ) models, all effect sizes are assumed to be independent (that is, one effect per study, with no other sources of non-independence; see Q3). The other major assumption of a fixed-effect model is that all effect sizes share a common mean, and thus that variation among data is solely attributable to sampling error (that is, the sampling variance, v i , which is related to the sample size for each effect size; Fig.  4a ). This assumption, however, is unrealistic for most biological meta-analyses (see [ 22 ]), especially those involving multiple populations, species, and/or ecosystems [ 14 , 51 ]. The use of a fixed-effect model could be justified where the effect sizes are obtained from the same species or population (assuming one effect per study and that the effect sizes are independent of each other). Random-effects models relax the assumption that all studies are based on samples from the same underlying population, meaning that these models can be used when different studies are likely to quantify different underlying mean effects (for example, one study design yields a different effect than another), as is likely to be the case for a biological meta-analysis (Fig.  4b ). A random-effects model needs to quantify the between-study variance, τ 2 , and to estimate this variance correctly requires a sample size of perhaps over ten effect sizes. Thus, random-effects models may not be appropriate for a meta-analysis with very few effect sizes, and fixed-effect models may be appropriate in such situations (bearing in mind the aforementioned assumptions). Multilevel models relax the assumptions of independence made by fixed-effect and random-effects models; that is, for example, these models allow for multiple effect sizes to come from the same study, which may be the case if one study contains several different experimental treatments, or the same experimental treatment is applied across species within one study. The simplest multilevel model depicted in Fig.  4c includes study effects, but it is probably not difficult to imagine this multilevel approach being extended to incorporate more ‘levels’, such as species effects, as well (for more details see [ 13 , 52 , 53 ,, 14 , 41 , 45 , 49 , 51 – 54 ]; incorporating the types of non-independence described in Fig.  3b–d requires modeling of correlation and covariance matrices).

Visualizations of the three main types of meta-analytic models and their assumptions. a The fixed-effect model can be written as y i  =  b 0  +  e i , where y i is the observed effect for the i th study ( i  = 1… k ; orange circles ), b 0 is the overall effect (overall mean; thick grey line and black diamond ) for all k studies and e i is the deviation from b 0 for the i th study ( dashed orange lines ), and e i is distributed with the sampling variance ν i ( orange curves ); note that this variance is sometimes called within-study variance in the literature, but we reserve this term for the multilevel model below. b The random-effects model can be written as y i  =  b 0  +  s i  +  e i , where b 0 is the overall mean for different studies, each of which has a different study-specific mean ( green squares and green solid lines ), deviating by s i ( green dashed lines ) from b 0 , s i is distributed with a variance of τ 2 (the between-study variance; green curves ); note that this is the conventional notation for the between-study variance, but in a biological meta-analysis, it can be referred to as, say, σ 2 [study] . The other notation is as above. Displayed on the top-right is the formula for the heterogeneity statistic, I 2 for the random-effects model, where \( \overline{v} \) is a typical sampling variance (perhaps, most easily conceptualized as the average value of sampling variances, ν i ). c The simplest multilevel model can be written as y ij  =  b 0  +  s i  +  u ij  +  e ij , where u ij is the deviation from s i for j th effect size for the i th study ( blue triangles and dashed blue lines ) and is distributed with the variance of σ 2 (the within-study variance or it may be denoted as σ 2 [effect size] ; blue curves ), e ij is the deviation from u ij , and the other notations are the same as above. Each of k studies has m effect sizes ( j  = 1… m ). Displayed on the top-right is the multilevel meta-analysis formula for the heterogeneity statistic, I 2 , where both the numerator and denominator include the within-study variance, σ 2 , in addition to what appears in the formula for the random-effects model

It is important for you, as the reader, to check whether the authors, given their data, employed an appropriate model or set of models (see Q3), because results from inappropriate models could lead to erroneous conclusions. For example, applying a fixed-effect model when a random-effects model is more appropriate may lead to errors in both the estimated magnitude of the overall effect and its uncertainty [ 55 ]. As can be seen from Fig. 4, each of the three main meta-analytic models assumes that effect sizes are distributed around an overall effect ( b 0 ). The reader should also be aware that this estimated overall effect (meta-analytic mean) is most commonly presented in an accompanying forest plot (or plots) [ 22 , 56 , 57 ]. Figure 5a is a forest plot of the kind typically seen in the medical and social sciences, with overall means from both the fixed-effect or common-effect meta-analysis (FEMA/CEMA) model and the random-effects meta-analysis (REMA) model. In a multiple-species meta-analysis, you may see a more elaborate forest plot such as that in Fig. 5b.

Examples of forest plots used in a biological meta-analysis to represent effect sizes and their associated precisions. a A conventional forest plot displaying the magnitude and uncertainty (95% confidence interval, CI) of each effect size in the dataset, as well as reporting the associated numerical values and a reference to the original paper. The sizes of the shapes representing point estimates are usually scaled based on their precision (1/Standard error). Diamonds at the bottom of the plot display the estimated overall mean based on both fixed-effect meta-analysis/‘common-effect’ meta-analysis ( FEMA/CEMA ) and random-effects meta-analysis ( REMA ) models. b A forest plot that has been augmented to display a phylogenetic relationship between different taxa in the analysis; the estimated d seems on average to be higher in some clades than in the others. A diamond at the bottom summarizes the aggregate mean as estimated by a multi-level meta-analysis accounting for the given phylogenetic structure. On the right is the number of effect sizes for each species ( k ), although similarly one could also display the number of individuals/sample-size ( n ), where only one effect size per species is included. c As well as displaying overall effect ( diamond ), forest plots are sometimes used to display the mean effects from different sub-groups of the data (e.g., effects separated by sex or treatment type), as estimated with data sub-setting or meta-regression, or even a slope from meta-regression (indicating how an effect changes with increasing continuous variable, e.g., dosage). d Different magnitudes of correlation coefficient ( r ), and associated 95% CIs, p values, and the sample size on which each estimate is based. The space is shaded according to effect magnitude based on established guidelines; light grey , medium grey , and dark grey correspond to small, medium, and large effects, respectively
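
A bare-bones version of the conventional forest plot in panel a can be drawn with matplotlib, as in the sketch below; the effect sizes, standard errors, labels, and overall mean are all invented for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Invented effect sizes, standard errors, and study labels
labels = ["Study 1", "Study 2", "Study 3", "Study 4", "Study 5"]
yi = np.array([0.41, 0.12, 0.55, -0.03, 0.30])
se = np.array([0.20, 0.14, 0.30, 0.22, 0.17])
overall, overall_se = 0.26, 0.08            # e.g., from a random-effects model

ypos = np.arange(len(yi), 0, -1)            # studies from top to bottom
plt.errorbar(yi, ypos, xerr=1.96 * se, fmt="s", color="black", capsize=3)
plt.errorbar([overall], [0], xerr=[1.96 * overall_se],
             fmt="D", color="black", markersize=10)   # overall mean as a diamond
plt.axvline(0, linestyle="--", color="grey")           # line of no effect
plt.yticks(list(ypos) + [0], labels + ["Overall (REMA)"])
plt.xlabel("Effect size (95% CI)")
plt.tight_layout()
plt.show()
```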

Q5: Is the level of consistency among studies reported?

The overall effect reported by a meta-analysis cannot be properly interpreted without an analysis of the heterogeneity, or inconsistency, among effect sizes. For example, an overall mean of zero can be achieved when effect sizes are all zero (homogeneous; that is, the between-study variance is 0) or when all effect sizes are very different (heterogeneous; the between-study variance is >0) but centered on zero, and clearly one should draw different conclusions in each case. Rather disturbingly, we have recently found that in ecology and evolutionary biology, tests of heterogeneity and their corresponding statistics ( τ 2 , Q , and I 2 ) are only reported in about 40% of meta-analyses [ 58 ]. Cochran’s Q (often referred to as Q total or Q T ) is a test statistic for the between-study variance ( τ 2 ), which allows one to assess whether the estimated between-study variance is non-zero (in other words, whether a fixed-effect model is appropriate, as this model assumes τ 2  = 0) [ 59 ]. As a test statistic, Q is often presented with a corresponding p value, which is interpreted in the conventional manner. However, if presented without the associated τ 2 , Q can be misleading because, as is the case with most statistical tests, Q is more likely to be significant when more studies are included, even if τ 2 is relatively small (see also Q7); the reader should therefore check whether both statistics are presented. Having said that, the magnitude of the between-study variance ( τ 2 ) can be hard to interpret because it is dependent on the scale of the effect size. The heterogeneity statistic, I 2 , which is a type of intra-class correlation, has also been recommended as it addresses some of the issues associated with Q and τ 2 [ 60 , 61 ]. I 2 ranges from 0 to 1 (or 0 to 100%) and indicates how much of the variation in effect sizes is due to the between-study variance ( τ 2 ; Fig. 4b) or, more generally, the proportion of variance not attributable to sampling (error) variance ( v̄ ; see Fig. 4b, c; for more details and extensions, see [ 13 , 14 , 49 , 58 ]). Tentatively suggested benchmarks for I 2 are low, medium, and high heterogeneity of 25, 50, and 75% [ 61 ]. These values are often used in meta-analyses in the medical and social sciences for interpreting the degree of heterogeneity [ 62 , 63 ]. However, we have shown that the average I 2 in meta-analyses in ecology and evolution may be as high as 92%, which may not be surprising as these meta-analyses are not confined to a single species (or to human subjects) [ 58 ]. Accordingly, the reader should consider whether these conventional benchmarks are applicable to the biological meta-analysis under consideration. The quantification and reporting of heterogeneity statistics is essential for any meta-analysis, and you need to make sure that some combination of these three statistics is reported in a meta-analysis before making generalisations based on the overall mean effect (except when using fixed-effect models).
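
For reference, the three statistics discussed here are commonly defined as follows ( k effect sizes, inverse-variance weights w i  = 1/ v i ; the DerSimonian–Laird estimator shown is one of several estimators of τ 2 ):

```latex
Q = \sum_{i=1}^{k} w_i \left( y_i - \hat{\mu}_{\mathrm{FE}} \right)^{2},
\qquad w_i = \frac{1}{v_i}

\hat{\tau}^{2}_{\mathrm{DL}} =
\max\!\left( 0,\; \frac{Q - (k - 1)}{\sum_i w_i - \sum_i w_i^{2} / \sum_i w_i} \right)

I^{2} = \frac{\hat{\tau}^{2}}{\hat{\tau}^{2} + \bar{v}}
\quad \text{or, based on } Q: \quad
I^{2} = \max\!\left( 0,\; \frac{Q - (k - 1)}{Q} \right)
```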

Q6: Are the causes of variation among studies investigated?

After quantifying variation among effect sizes beyond sampling variation ( I 2 ), it is important to understand the factors, or moderators, that might explain this additional variation, because doing so can elucidate important processes mediating variation in the strength of effect. Moderators are equivalent to explanatory (independent) variables or predictors in a normal linear model [ 8 , 49 , 62 ]. For example, in a meta-analysis examining the effect of experimentally increased temperature on growth using SMD (a control versus treatment comparison), studies might vary in the magnitude of temperature increase: say 10 versus 20 °C in the first study, but 12 versus 16 °C in the second. In this case, the moderator of interest is the temperature difference between control and treatment groups (10 °C for the first study and 4 °C for the second). This difference in study design may explain variation in the magnitude of the observed effect sizes (that is, the SMD of growth at the two temperatures). Models that examine the effects of moderators are referred to as meta-regressions. One important thing to note is that meta-regression is just a special type of weighted regression. Therefore, the usual standard practices for regression analysis also apply to meta-regression. This means that, as a reader, you may want to check for the inclusion of too many predictors/moderators in a single model, or ‘over-fitting’ (the rule of thumb is that the authors may need at least ten effect sizes per estimated moderator) [ 64 ], and for ‘fishing expeditions’ (also known as ‘data dredging’ or ‘p-hacking’; that is, non-hypothesis-based exploration for statistical significance [ 28 , 65 , 66 ]).
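
As a minimal illustration, a meta-regression with a single continuous moderator (an invented temperature difference) can be approximated by a weighted regression with weights 1/( v i  + τ 2 ). This hand-rolled version gives reasonable point estimates but not proper meta-analytic standard errors; a dedicated package (for example, metafor in R) handles this correctly.

```python
import numpy as np
import statsmodels.api as sm

# Invented data: SMDs (yi), sampling variances (vi), and the moderator
# (temperature difference between treatment and control, in degrees C)
yi = np.array([0.8, 0.3, 1.1, 0.2, 0.6, 0.9])
vi = np.array([0.05, 0.04, 0.10, 0.03, 0.06, 0.08])
temp_diff = np.array([10, 4, 12, 3, 8, 11])

tau2 = 0.02                        # assume tau^2 was estimated beforehand (e.g., DL)
weights = 1.0 / (vi + tau2)        # meta-analytic weights

X = sm.add_constant(temp_diff)     # intercept + slope for the moderator
fit = sm.WLS(yi, X, weights=weights).fit()
print(fit.params)                  # [intercept, slope]: change in SMD per degree C
```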

Moderators can be correlated with each other (that is, be subject to the multicollinearity problem) and this dependence, in turn, could lead authors to attribute an effect to the wrong moderator [ 67 ]. For example, in the aforementioned meta-analysis of temperature on growth, the study may claim that females grew faster than males when exposed to increased temperatures. However, if most females came from studies where higher temperature increases were used but males were usually exposed to small increases, the moderators for sex and temperature would be confounded. Accordingly, the effect may be due to the severity of the temperature change rather than a sex effect. Readers should check whether the authors have examined potential confounding effects of moderators and reported how different potential moderators are related to one another. It is also important to know the sources of the moderator data; for example, species-specific data can be obtained from sources (papers, books, databases) other than the primary studies from which effect sizes were taken (Q1). Meta-regression results can be presented in a forest plot, as in Fig.  5c (see also Q6 and Fig.  6e, f ; the standardization of moderators may often be required for analyzing moderators [ 68 ]).
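
An informal way to spot such confounding before interpreting a meta-regression is simply to cross-tabulate or summarize the moderators against one another; the sketch below uses invented moderator data.

```python
import pandas as pd

# Invented moderator data, one row per effect size
mods = pd.DataFrame({
    "sex": ["F", "F", "F", "M", "M", "M"],
    "temp_diff": [10, 12, 11, 3, 4, 5],   # degrees C
})

# If one sex is concentrated at large temperature differences,
# the two moderators are confounded
print(mods.groupby("sex")["temp_diff"].describe())
print(pd.crosstab(mods["sex"], mods["temp_diff"] > 8))
```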

Graphical assessment tools for testing for publication bias. a A funnel plot showing greater variance among effects that have larger standard errors ( SE ) and that are thus more susceptible to sampling variability. Some studies in the lower right corner of the plot, opposite to most major findings, with large SE (less likely to detect significant results) are potentially missing (not shown), suggesting publication bias. b Often funnel plots are depicted using precision (1/SE), giving a different perspective of publication bias, where studies with low precision (or large SE) are expected to show greater sampling variability compared to studies with high precision (or low SE). Note that the data in panel b are the same as in panel a , except that a trim-and-fill analysis has been performed in b . A trim-and-fill analysis estimates the number of studies missing from the meta-analysis and creates ‘mirrored’ studies on the opposite side of the funnel ( unfilled dots ) to estimate how the overall effect size estimate is impacted by these missing studies. c Radial (Galbraith) plot in which the slope should be close to zero, if little publication bias exists, indicating little asymmetry in a corresponding funnel plot (compare it with b ); radial plots are closely associated with Egger’s tests. d Cumulative meta-analysis showing how the effect size changes as the number of studies on a particular topic increases. In this situation, the addition of effect size estimates led to convergence on an overall estimate of 0.36, and the confidence intervals decrease as the precision of the estimate increases. e Bubble plot showing a temporal trend in effect size ( Zr ) across years. Here effect sizes are weighted by their precision; larger bubbles indicate more precise estimates and smaller bubbles less precise. f Bubble plot of the relationship between effect size and impact factors of journals, indicating that larger magnitudes of effect sizes (the absolute values of Zr ) tend to be published in higher impact journals

Another way of exploring heterogeneity is to run separate meta-analyses on data subsets (for example, separating effect sizes by the sex of exposed animals). This is similar to running a meta-regression with categorical moderators (often referred to as subgroup analysis), with the key difference being that the authors can obtain heterogeneity statistics (such as I 2 ) for each subset in a subset analysis [ 69 ]. It is important to note that many meta-analytic studies include more than one meta-analysis, because several different types of data are included, even though these data pertain to one topic (for example, the effect of increased temperature not only on body growth, but also on parasite load). You, as a reader, will need to evaluate whether the authors’ sub-grouping or sub-setting of their data makes sense biologically; hopefully the authors will have provided clear justification (Q1).

Q7: Are effects interpreted in terms of biological importance?

Meta-analyses should focus on biological importance (which is reflected in estimated effects and their uncertainties) rather than on p values and statistical significance, as is outlined in Fig. 5d [ 29 , 70 – 72 ]. It should be clear to most readers that interpreting results only in terms of statistical significance ( p values) can be misleading. For example, in terms of effect magnitudes and uncertainties, ES4 and ES6 in Fig. 5d are nearly identical, yet ES4 is statistically significant while ES6 is not. Also, ES1–3 are all what people would describe as ‘highly significant’, but their magnitudes of effect, and thus their biological relevance, are very different. The term ‘effective thinking’ is used to refer to the philosophy of placing emphasis on the interpretation of the overall effect size in terms of biological importance rather than statistical significance [ 29 ]. It is useful for the reader to know that ES1–3 in Fig. 5d can be classified as what Jacob Cohen proposed as small, medium, and large effects, which are r  = 0.1, 0.3, and 0.5, respectively [ 73 ]; for SMD, the corresponding benchmarks are d (SMD) = 0.2, 0.5, and 0.8 [ 29 , 61 ]. Researchers may have good intuition for the biological relevance of a particular r value, but this may not be the case for SMD. Thus, it may be helpful to know that Cohen’s benchmarks for r and d are comparable. Having said that, these benchmarks, along with those for I 2 , have to be used carefully, because what constitutes a biologically important effect magnitude can vary according to the biological question and system (for example, a 1% difference in fitness would not matter in ecological time but it certainly does over evolutionary time). We stress that authors should primarily be discussing their effect sizes (point estimates) and their uncertainties in terms of interval estimates (confidence intervals, or credible intervals, CIs) [ 29 , 70 , 72 ]. Meta-analysts can certainly note statistical significance, which is related to CI width, but direct description of precision may be more useful. Note that effect magnitude and precision are exactly what are displayed in forest plots (Fig. 5).
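
For readers more at home with one metric than the other, the standard conversions between r and d (assuming equal group sizes) make the correspondence explicit; note that the two sets of benchmarks match only approximately under this conversion:

```latex
d = \frac{2r}{\sqrt{1 - r^{2}}}, \qquad r = \frac{d}{\sqrt{d^{2} + 4}}
```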

Q8: Has publication bias been considered?

Meta-analysts have to assume that research is published regardless of statistical significance, and that authors have not selectively reported results (that is, that there is no publication bias and no reporting bias) [ 74 , 75 , 76 ]. This is unlikely. Therefore, meta-analysts should check for publication bias using statistical and graphical tools. The reader should know that the commonly used methods for assessing publication bias are funnel plots (Fig. 6a, b), radial (Galbraith) plots (Fig. 6c), and Egger’s (regression) tests [ 57 , 77 , 78 ]; these methods visually or statistically (Egger’s test) help to detect funnel asymmetry, which can be caused by publication bias [ 79 ]. However, you should also know that funnel asymmetry may be an artifact of too small a number of effect sizes. Further, funnel asymmetry can result from heterogeneity (non-zero between-study variance, τ 2 ) [ 77 , 80 ]. Some readily implementable methods for correcting for publication bias also exist, such as trim-and-fill methods [ 81 , 82 ] or the use of the p curve [ 83 ]. The reader should be aware that these methods have shortcomings; for example, the trim-and-fill method can under- or overestimate an overall effect size, while the p curve probably only works when effect sizes come from tightly controlled experiments [ 83 , 84 , 85 , 86 ] (see Q9; note that ‘selection modeling’ is an alternative approach, but it is more technically difficult [ 79 ]). A less contentious topic in this area is the time-lag bias, where the magnitudes of an effect diminish over time [ 87 , 88 , 89 ]. This bias can easily be tested with a cumulative meta-analysis and visualized using a forest plot [ 90 , 91 ] (Fig. 6d) or a bubble plot combined with meta-regression (Fig. 6e; note that journal impact factor can also be associated with the magnitudes of effect sizes [ 92 ], Fig. 6f).
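
One widely used check, Egger's regression, regresses the standardized effect size ( y i /SE i ) on precision (1/SE i ); an intercept clearly different from zero suggests funnel asymmetry. A minimal sketch with invented data (using statsmodels so that the intercept's p value is available) is:

```python
import numpy as np
import statsmodels.api as sm

# Invented effect sizes and standard errors
yi = np.array([0.42, 0.31, 0.50, 0.12, 0.65, 0.28, 0.38, 0.55])
se = np.array([0.21, 0.15, 0.30, 0.08, 0.35, 0.12, 0.18, 0.33])

z = yi / se                      # standardized effect sizes
prec = 1.0 / se                  # precision

X = sm.add_constant(prec)        # intercept + slope on precision
fit = sm.OLS(z, X).fit()

# The intercept is Egger's test statistic; a small p value indicates asymmetry
print(f"Egger intercept = {fit.params[0]:.3f}, p = {fit.pvalues[0]:.3f}")
```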

Alarmingly, meta-reviews have found that only half of meta-analyses in ecology and evolution assessed publication bias [ 14 , 15 ]. Disappointingly, there are no perfect solutions for detecting and correcting for publication bias, because we never really know with certainty what kinds of data are actually missing (although usually statistically non-significant and small effect sizes are underrepresented in the dataset; see also Q9). Regardless, the existing tools should still be used and the presentation of results from at least two different methods is recommended.

Q9: Are results really robust and unbiased?

Although meta-analyses from the medical and social sciences are often accompanied by sensitivity analysis [ 69 , 93 ], biological meta-analyses are often devoid of such tests. Sensitivity analyses include not only running meta-analysis and meta-regression without influential effect sizes or studies (for example, many effect sizes that come from one study or one clear outlier effect size; sometimes also termed ‘subset analysis’), but also, for example, comparing meta-analytic models with and without modeling non-independence (Q3–5), or other alternative analyses [ 44 , 93 ]. Analyses related to publication bias could generally also be regarded as part of a sensitivity analysis (Q8). In addition, it is worthwhile checking if the authors discuss missing data [ 94 , 95 ] (different from publication bias; Q8). Two major cases of missing data in meta-analysis are: 1) a lack of the information required to obtain sampling variance for a portion of the dataset (for example, missing standard deviations); and 2) missing information for moderators [ 96 ] (for example, most studies report the sex of animals used but a few studies do not). For the former, the authors should run models both with and without data with sampling variance information; note that without sampling variance (that is, unweighted meta-analysis) the analysis becomes a normal linear model [ 21 ]. For both cases 1 and 2, the authors could use data imputation techniques (as of yet, this is not standard practice). Although data imputation methods are rather technical, their implementation is becoming easier [ 96 , 97 , 98 ]. Furthermore, it may often be important to consider the sample size (the number and precision of constituent effect sizes) and statistical power of a meta-analysis. One of the main reasons to conduct meta-analysis is to increase statistical power. However, where an overall effect is expected to be small (as is often the case with biological phenomena) it is possible that a meta-analysis may be underpowered [ 99 , 100 , 101 ].
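
A common form of sensitivity analysis, leave-one-out, refits the model k times with each effect size removed in turn and checks whether the pooled estimate changes materially. A minimal random-effects version, with invented data and a DerSimonian–Laird τ 2 , is sketched below.

```python
import numpy as np

def re_mean(yi, vi):
    """Random-effects pooled mean with a DerSimonian-Laird tau^2."""
    w = 1.0 / vi
    mu_fe = np.sum(w * yi) / np.sum(w)
    Q = np.sum(w * (yi - mu_fe) ** 2)
    C = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (Q - (len(yi) - 1)) / C)
    w_re = 1.0 / (vi + tau2)
    return np.sum(w_re * yi) / np.sum(w_re)

yi = np.array([0.41, 0.12, 0.55, -0.03, 0.30, 0.95])   # invented; last value is an outlier
vi = np.array([0.04, 0.02, 0.09, 0.05, 0.03, 0.06])

print(f"All effect sizes: {re_mean(yi, vi):.3f}")
for i in range(len(yi)):
    keep = np.arange(len(yi)) != i
    print(f"Leaving out effect {i + 1}: {re_mean(yi[keep], vi[keep]):.3f}")
```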

Q10: Is the current state (and lack) of knowledge summarized?

In the discussion of a meta-analysis, it is reasonable to expect the authors to discuss what conventional wisdoms the meta-analysis has confirmed or refuted and what new insights the meta-analysis has revealed [ 8 , 19 , 71 , 100 ]. New insights from meta-analyses are known as ‘review-generated evidence’ (as opposed to ‘study-generated evidence’) [ 18 ] because only aggregation of studies can generate such insights. This is analogous to comparative analyses bringing biologists novel understanding of a topic which would be impossible to obtain from studying a single species in isolation [ 14 ]. Because meta-analysis brings available (published) studies together in a systematic and/or comprehensive way (but see Q1), the authors can also summarize less quantitative themes along with the meta-analytic results. For example, the authors could point out what types of primary studies are lacking (that is, identify knowledge gaps). Also, the study should provide clear future directions for the topic under investigation [ 8 , 19 , 71 , 100 ]; for example, what types of empirical work are required to push the topic forward. An obvious caveat is that the value of these new insights, knowledge gaps and future directions is contingent upon the answers to the previous nine questions (Q1–9).

Post meta-evaluation: more to think about

Given that we are advocates of meta-analysis, we are certainly biased in saying ‘meta-analyses are enlightening’. A more nuanced interpretation of what we really mean is that meta-analyses are enlightening when they are done well. Mary Smith and Gene Glass published the first research synthesis carrying the label of ‘meta-analysis’ in 1977 [ 102 ]. At the time, their study and the general concept were ridiculed with the term ‘mega-silliness’ [ 103 ] (see also [ 16 , 17 ]). Although the results of this first meta-analysis on the efficacy of psychotherapies still stand strong, it is entirely possible for a meta-analysis to contain many mistakes. In a similar vein, Robert Whittaker warned that the careless use of meta-analyses could lead to ‘mega-mistakes’, reinforcing his case by drawing upon examples from ecology [ 104 , 105 ].

Even where a meta-analysis is conducted well, a future meta-analysis can sometimes yield a completely opposing conclusion from the original (see [ 106 ] for examples from medicine and the reasons why). Thus, medical and social scientists are aware that updating meta-analyses is extremely important, especially given that time-lag bias is a common phenomenon [ 87 , 88 , 89 ]. Although updating is still rare in biological meta-analyses [ 8 ], we believe this should become part of the research culture in the biological sciences. We appreciate the view of John Ioannidis who wrote, “Eventually, all research [both primary and meta-analytic] can be seen as a large, ongoing, cumulative meta-analysis” [ 106 ] (cf. effective thinking; Fig.  6d ).

Finally, we have to note that we have just scratched the surface of the enormous subject of meta-analysis. For example, we did not cover other relevant topics such as multilevel (hierarchical) meta-analytic and meta-regression models [ 14 , 45 , 49 ], which allow more complex sources of non-independence to be modeled, as well as multivariate (multi-response) meta-analyses [ 107 ] and network meta-analyses [ 108 ]. Many of the ten appraisal questions above, however, are also relevant for these extended methods. More importantly, we believe that asking the ten questions above will readily equip biologists with the knowledge necessary to differentiate among mega-enlightenment, mega-mistakes, and something in-between.

Glass GV. Primary, secondary, and meta-analysis research. Educ Res. 1976;5:3–8.

Glass GV. Meta-analysis at middle age: a personal history. Res Synth Methods. 2015;6(3):221–31.

Cooper H, Hedges LV, Valentine JC. The handbook of research synthesis and meta-analysis. New York: Russell Sage Foundation; 2009.

Hedges L, Olkin I. Statistical methods for meta-analysis. New York: Academic Press; 1985.

Egger M, Smith GD, Altman DG. Systematic reviews in health care: meta-analysis in context. 2nd ed. London: BMJ; 2001.

Arnqvist G, Wooster D. Meta-analysis: synthesizing research findings in ecology and evolution. Trends Ecol Evol. 1995;10:236–40.

Koricheva J, Gurevitch J, Mengersen K. Handbook of meta-analysis in ecology and evolution. Princeton: Princeton University Press; 2013.

Nakagawa S, Poulin R. Meta-analytic insights into evolutionary ecology: an introduction and synthesis. Evolutionary Ecol. 2012;26:1085–99.

van der Worp HB, Howells DW, Sena ES, Porritt MJ, Rewell S, O'Collins V, Macleod MR. Can animal models of disease reliably inform human studies? PLoS Med. 2010;7(3), e1000245.

Stewart G. Meta-analysis in applied ecology. Biol Lett. 2010;6(1):78–81.

Stewart GB, Schmid CH. Lessons from meta-analysis in ecology and evolution: the need for trans-disciplinary evidence synthesis methodologies. Res Synth Methods. 2015;6(2):109–10.

Lortie CJ, Stewart G, Rothstein H, Lau J. How to critically read ecological meta-analyses. Res Synth Methods. 2015;6(2):124–33.

Nakagawa S, Kubo T. Statistical models for meta-analysis in ecology and evolution (in Japanese). Proc Inst Stat Math. 2016;64(1):105–21.

Nakagawa S, Santos ESA. Methodological issues and advances in biological meta-analysis. Evol Ecol. 2012;26:1253–74.

Koricheva J, Gurevitch J. Uses and misuses of meta-analysis in plant ecology. J Ecol. 2014;102:828–44.

Page MJ, Moher D. Mass production of systematic reviews and meta-analyses: an exercise in mega-silliness? Milbank Q. 2016;94(5):515–9.

Ioannidis JPA. The mass production of redundant, misleading, and conflicted systematic reviews and meta-analyses. Milbank Q. 2016;94(5):485–514.

Cooper HM. Research synthesis and meta-analysis : a step-by-step approach. 4th ed. London: SAGE; 2010.

Rothstein HR, Lorite CJ, Stewart GB, Koricheva J, Gurevitch J. Quality standards for research syntheses. In: Koricheva J, Gurevitch J, Mengersen K, editors. The handbook of meta-analysis in ecology and evolution. Princeton: Princeton University Press; 2013. p. 323–38.

Vetter D, Rücker G, Storch I. Meta-analysis: a need for well-defined usage in ecology and conservation biology. Ecosphere. 2013;6:1–24.

Morrissey M. Meta-analysis of magnitudes, differences, and variation in evolutionary parameters. J Evol Biol. 2016;29(10):1882–904.

Vesterinen HM, Sena ES, Egan KJ, Hirst TC, Churolov L, Currie GL, Antonic A, Howells DW, Macleod MR. Meta-analysis of data from animal studies: a practical guide. J Neurosci Methods. 2014;221:92–102.

Mongeon P, Paul-Hus A. The journal coverage of Web of Science and Scopus: a comparative analysis. Scientometrics. 2016;106(1):213–28.

Côté IM, Jennions MD. The procedure of meta-analysis in a nutshell. In: Koricheva J, Gurevitch J, Mengersen K, editors. The handbook of meta-analysis in ecology and evolution. Princeton: Princeton University Press; 2013. p. 14–24.

Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JPA, Clarke M, Devereaux PJ, Kleijnen J, Moher D. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. PLoS Med. 2009;6:e1000100. doi: 10.1371/journal.pmed.1000100 .

Moher D, Liberati A, Tetzlaff J, Altman DG, Group P. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Ann Internal Med. 2009;151:264–9.

Ellison AM. Repeatability and transparency in ecological research. Ecology. 2010;91(9):2536–9.

Parker TH, Forstmeier W, Koricheva J, Fidler F, Hadfield JD, Chee YE, Kelly CD, Gurevitch J, Nakagawa S. Transparency in ecology and evolution: real problems, real solutions. Trends Ecol Evol. 2016;31(9):711–9.

Nakagawa S, Cuthill IC. Effect size, confidence interval and statistical significance: a practical guide for biologists. Biol Rev. 2007;82:591–605.

Borenstein M. Effect size for continuous data. In: Cooper H, Hedges LV, Valentine JC, editors. The handbook of research synthesis and meta-analysis. New York: Russell Sage Foundation; 2009. p. 221–35.

Borenstein M, Hedges LV, Higgens JPT, Rothstein HR. Introduction to meta-analysis. West Sussex: Wiley; 2009.

Fleiss JL, Berlin JA. Effect sizes for dichotomous data. In: Cooper H, Hedges LV, Valentine JC, editors. The handbook of research synthesis and meta-analysis. New York: Russell Sage Foundation; 2009. p. 237–53.

Hedges LV, Gurevitch J, Curtis PS. The meta-analysis of response ratios in experimental ecology. Ecology. 1999;80(4):1150–6.

Hector KL, Lagisz M, Nakagawa S. The effect of resveratrol on longevity across species: a meta-analysis. Biol Lett. 2012. doi: 10.1098/rsbl.2012.0316 .

Lagisz M, Hector KL, Nakagawa S. Life extension after heat shock exposure: Assessing meta-analytic evidence for hormesis. Ageing Res Rev. 2013;12(2):653–60.

Nakagawa S, Lagisz M, Hector KL, Spencer HG. Comparative and meta-analytic insights into life-extension via dietary restriction. Aging Cell. 2012;11:401–9.

Garratt M, Nakagawa S, Simons MJ. Comparative idiosyncrasies in life extension by reduced mTOR signalling and its distinctiveness from dietary restriction. Aging Cell. 2016;15(4):737–43.

Nakagawa S, Poulin R, Mengersen K, Reinhold K, Engqvist L, Lagisz M, Senior AM. Meta-analysis of variation: ecological and evolutionary applications and beyond. Methods Ecol Evol. 2015;6(2):143–52.

Senior AM, Nakagawa S, Lihoreau M, Simpson SJ, Raubenheimer D. An overlooked consequence of dietary mixing: a varied diet reduces interindividual variance in fitness. Am Nat. 2015;186(5):649–59.

Senior AM, Gosby AK, Lu J, Simpson SJ, Raubenheimer D. Meta-analysis of variance: an illustration comparing the effects of two dietary interventions on variability in weight. Evol Med Public Health. 2016;2016(1):244–55.

Mengersen K, Jennions MD, Schmid CH. Statistical models for the meta-analysis of non-independent data. In: Koricheva J, Gurevitch J, Mengersen K, editors. The handbook of meta-analysis in ecology and evolution. Princeton: Princeton University Press; 2013. p. 255–83.

Lajeunesse MJ. Meta-analysis and the comparative phylogenetic method. Am Nat. 2009;174(3):369–81.

Chamberlain SA, Hovick SM, Dibble CJ, Rasmussen NL, Van Allen BG, Maitner BS. Does phylogeny matter? Assessing the impact of phylogenetic information in ecological meta-analysis. Ecol Lett. 2012;15:627–36.

Noble DWA, Lagisz M, O'Dea RE, Nakagawa S. Non-independence and sensitivity analyses in ecological and evolutionary meta-analyses. Mol Ecol. 2017; in press. doi: 10.1111/mec.14031 .

Hadfield J, Nakagawa S. General quantitative genetic methods for comparative biology: phylogenies, taxonomies and multi-trait models for continuous and categorical characters. J Evol Biol. 2010;23:494–508.

Viechtbauer W. Conducting meta-analyses in R with the metafor package. J Stat Software. 2010;36(3):1–48.

Rosenberg MS, Adams DC, Gurevitch J. MetaWin: statistical software for meta-analysis. 2nd ed. Sunderland: Sinauer; 2000.

Marín-Martínez F, Sánchez-Meca J. Averaging dependent effect sizes in meta-analysis: a cautionary note about procedures. Spanish J Psychol. 1999;2:32–8.

Cheung MWL. Modeling dependent effect sizes with three-level meta-analyses: a structural equation modeling approach. Psychol Methods. 2014;19:211–29.

Sutton AJ, Higgins JPI. Recent developments in meta-analysis. Stat Med. 2008;27(5):625–50.

Mengersen K, Schmid CH, Jennions MD, Gurevitch J. Statistical models and approcahes to inference. In: Koricheva J, Gurevitch J, Mengersen K, editors. The handbook of meta-analysis in ecology and evolution. Princeton: Princeton University Press; 2013. p. 89–107.

Lajeunesse MJ. Meta-analysis and the comparative phylogenetic method. Am Nat. 2009;174:369–81.

Lajeunesse MJ. On the meta-analysis of response ratios for studies with correlated and multi-group designs. Ecology. 2011;92:2049–55.

Lajeunesse MJ, Rosenberg MS, Jennions MD. Phylogenetic nonindepedence and meta-analysis. In: Koricheva J, Gurevitch J, Mengersen K, editors. The handbook of meta-analysis in ecology and evolution. Princeton: Princeton University Press; 2013. p. 284–99.

Borenstein M, Hedges LV, Higgens JPT, Rothstein H. A basic introduction to fixed-effect and andom-effects models for meta-analysis. Res Synth Methods. 2010;1:97–111.

Vetter D, Rucker G, Storch I. Meta-analysis: a need for well-defined usage in ecology and conservation biology. Ecosphere. 2013;4(6):1–24.

Anzures-Cabrera J, Higgins JPT. Graphical displays for meta-analysis: an overview with suggestions for practice. Res Synth Methods. 2010;1(1):66–80.

Senior AM, Grueber CE, Kamiya T, Lagisz M, O'Dwyer K, Santos ESA, Nakagawa S. Heterogeneity in ecological and evolutionary meta-analyses: its magnitudes and implications. Ecology. 2016; in press.

Cochran WG. The combination of estimates from different experiments. Biometrics. 1954;10(1):101–29.

Higgins JPT, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med. 2002;12:1539–58.

Higgins JPT, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ. 2003;327:557–60.

Huedo-Medina TB, Sanchez-Meca J, Marin-Martinez F, Botella J. Assessing heterogeneity in meta-analysis: Q statistic or I-2 index? Psychol Methods. 2006;11(2):193–206.

Rucker G, Schwarzer G, Carpenter JR, Schumacher M. Undue reliance on I-2 in assessing heterogeneity may mislead. BMC Med Res Methodol. 2008;8:79.

Harrell FEJ. Regression modeling strategies with applications to linear models, logistic regression, and survival analysis. New York: Springer; 2001.

Ioannidis JPA. Why most published research findings are false. PLoS Med. 2005;2(8):696–701.

Simmons JP, Nelson LD, Simonsohn U. False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol Sci. 2011;22(11):1359–66.

Lipsey MW. Those confounded moderators in meta-analysis: Good, bad, and ugly. Ann Am Acad Polit Social Sci. 2003;587:69–81.

Schielzeth H. Simple means to improve the interpretability of regression coefficients. Methods Ecol Evol. 2010;1(2):103–13.

Higgins JPT, Green S. Cochrane handbook for systematic reviews of interventions. West Sussex: Wiley-Blackwell; 2009.

Cumming G, Finch S. A primer on the understanding, use, and calculation of confidence intervals that are based on central and noncentral distributions. Educ Psychol Meas. 2001;61:532–84.

Jennions MD, Lorite CJ, Koricheva J. Role of meta-analysis in interpreting the scientific literature. In: Koricheva J, Gurevitch J, Mengersen K, editors. The handbook of meta-analysis in ecology and evolution. Princeton: Princeton University Press; 2013. p. 364–80.

Thompson B. What future quantitative social science research could look like: confidence intervals for effect sizes. Educ Res. 2002;31:25–32.

Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. Hillsdale: Lawrence Erlbaum; 1988.

Rothstein HR, Sutton AJ, Borenstein M. Publication bias in meta-analysis: prevention, assessment and adjustments. Chichester: Wiley; 2005.

Sena ES, van der Worp HB, Bath PMW, Howells DW, Macleod MR. Publication bias in reports of animal stroke studies leads to major overstatement of efficacy. PLoS Biol. 2010;8(3), e1000344.

Moller AP, Jennions MD. Testing and adjusting for publication bias. Trends Ecol Evol. 2001;16(10):580–6.

Egger M, Smith GD, Schneider M, Minder C. Bias in meta-analysis detected by a simple, graphical test. BMJ. 1997;315:629–34.

Sterne JAC, Egger M. Funnel plots for detecting bias in meta-analysis: guidelines on choice of axis. J Clin Epidemiol. 2001;54:1046–55.

Sutton AJ. Publication bias. In: Cooper H, Hedges L, Valentine J, editors. The handbook of research synthesis and meta-analysis. New York: Russell Sage Foundation; 2009. p. 435–52.

Lau J, Ioannidis JPA, Terrin N, Schmid CH, Olkin I. Evidence based medicine--the case of the misleading funnel plot. BMJ. 2006;333(7568):597–600.

Duval S, Tweedie R. Trim and fill: a simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics. 2000;56:455–63.

Duval S, Tweedie R. A nonparametric "trim and fill" method of accounting for publication bias in meta-analysis. J Am Stat Assoc. 2000;95(449):89–98.

Simonsohn U, Nelson LD, Simmons JP. p-curve and effect size: correcting for publication bias using only significant results. Perspect Psychol Sci. 2014;9(6):666–81.

Terrin N, Schmid CH, Lau J, Olkin I. Adjusting for publication bias in the presence of heterogeneity. Stat Med. 2003;22(13):2113–26.

Bruns SB, Ioannidis JPA. p-curve and p-hacking in observational research. PLoS One. 2016;11(2), e0149144.

Schuch FB, Vancampfort D, Rosenbaum S, Richards J, Ward PB, Veronese N, Solmi M, Cadore EL, Stubbs B. Exercise for depression in older adults: a meta-analysis of randomized controlled trials adjusting for publication bias. Rev Bras Psiquiatr. 2016;38(3):247–54.

Jennions MD, Moller AP. Relationships fade with time: a meta-analysis of temporal trends in publication in ecology and evolution. Proc R Soc Lond B Biol Sci. 2002;269(1486):43–8.

Trikalinos TA, Ioannidis JP. Assessing the evolution of effect sizes over time. In: Rothstein H, Sutton AJ, Borenstein M, editors. Publication bias in meta-analysis: prevention, assessment and adjustments. Chichester: Wiley; 2005. p. 241–59.

Koricheva J, Jennions MD, Lau J. Temporal trends in effect sizes: causes, detection and implications. In: Koricheva J, Gurevitch J, Mengersen K, editors. The handbook of meta-analysis in ecology and evolution. Princeton: Princeton University Press; 2013. p. 237–54.

Lau J, Schmid CH, Chalmers TC. Cumulative meta-analysis of clinical trials builds evidence for exemplary medical care. J Clin Epidemiol. 1995;48(1):45–57. discussion 59–60.

Leimu R, Koricheva J. Cumulative meta-analysis: a new tool for detection of temporal trends and publication bias in ecology. Proc R Soc Lond B Biol Sci. 2004;271(1551):1961–6.

Murtaugh PA. Journal quality, effect size, and publication bias in meta-analysis. Ecology. 2002;83(4):1162–6.

Greenhouse JB, Iyengar S. Sensitivity analysis and diagnostics. In: Cooper H, Hedges L, Valentine J, editors. The handbook of research synthesis and meta-analysis. New York: Russell Sage Foundation; 2009. p. 417–34.

Lajeunesse MJ. Recovering missing or partial data from studies: a survey. In: Koricheva J, Gurevitch J, Mengersen K, editors. The handbook of meta-analysis in ecology and evolution. Princeton: Princeton University Press; 2013. p. 195–206.

Nakagawa S, Freckleton RP. Missing inaction: the dangers of ignoring missing data. Trends Ecol Evol. 2008;23(11):592–6.

Ellington EH, Bastille-Rousseau G, Austin C, Landolt KN, Pond BA, Rees EE, Robar N, Murray DL. Using multiple imputation to estimate missing data in meta-regression. Methods Ecol Evol. 2015;6(2):153–63.

Gurevitch J, Nakagawa S. Research synthesis methods in ecology. In: Fox GA, Negrete-Yankelevich S, Sosa VJ, editors. Ecological statistics: contemporary theory and application. Oxford: Oxford University Press; 2015. p. 201–28.

Nakagawa S. Missing data: mechanisms, methods and messages. In: Fox GA, Negrete-Yankelevich S, Sosa VJ, editors. Ecological statistics. Oxford: Oxford University Press; 2015. p. 81–105.

Ioannidis J, Patsopoulos N, Evangelou E. Uncertainty in heterogeneity estimates in meta-analyses. BMJ. 2007;335:914–6.

Jennions MD, Lorite CJ, Koricheva J. Using meta-analysis to test ecological and evolutionary theory. In: Koricheva J, Gurevitch J, Mengersen K, editors. The handbook of meta-analysis in ecology and evolution. Princeton: Princeton University Press; 2013. p. 38–403.

Lajeunesse MJ. Power statistics for meta-analysis: tests for mean effects and homogeneity. In: Koricheva J, Gurevitch J, Mengersen K, editors. The handbook of meta-analysis in ecology and evolution. Princeton: Princeton University Press; 2013. p. 348–63.

Smith ML, Glass GV. Meta-analysis of psychotherapy outcome studies. Am Psychologist. 1977;32(9):752–60.

Eysenck HJ. Exercise in mega-silliness. Am Psychologist. 1978;33(5):517.

Whittaker RJ. Meta-analyses and mega-mistakes: calling time on meta-analysis of the species richness-productivity relationship. Ecology. 2010;91(9):2522–33.

Whittaker RJ. In the dragon's den: a response to the meta-analysis forum contributions. Ecology. 2010;91(9):2568–71.

Ioannidis JP. Meta-research: the art of getting it wrong. Res Synth Methods. 2010;3:169–84.

Jackson D, Riley R, White IR. Multivariate meta-analysis: potential and promise. Stat Med. 2011;30(20):2481–98.

Salanti G, Schmid CH. Special issue on network meta-analysis: introduction from the editors. Res Synth Methods. 2012;3(2):69–70.

Acknowledgements

We are grateful for comments on our article from the members of I-DEEL. We also thank John Brookfield, one anonymous referee, and the BMC Biology editorial team for comments, which significantly improved our article. SN acknowledges an ARC (Australian Research Council) Future Fellowship (FT130100268), DWAN is supported by an ARC Discovery Early Career Research Award (DE150101774) and UNSW Vice Chancellors Fellowship. AMS is supported by a Judith and David Coffey Fellowship from the University of Sydney.

Competing interests

The authors declare that they have no competing interests.

Author information

Authors and affiliations.

Evolution & Ecology Research Centre and School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, NSW, 2052, Australia

Shinichi Nakagawa, Daniel W. A. Noble & Malgorzata Lagisz

Diabetes and Metabolism Division, Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst, Sydney, NSW, 2010, Australia

Shinichi Nakagawa

Charles Perkins Centre, University of Sydney, Sydney, NSW, 2006, Australia

Alistair M. Senior

School of Mathematics and Statistics, University of Sydney, Sydney, NSW, 2006, Australia

Corresponding author

Correspondence to Shinichi Nakagawa .

Additional information

All authors contributed equally to the preparation of this manuscript

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article.

Nakagawa, S., Noble, D.W.A., Senior, A.M. et al. Meta-evaluation of meta-analysis: ten appraisal questions for biologists. BMC Biol 15 , 18 (2017). https://doi.org/10.1186/s12915-017-0357-7

Download citation

Published : 03 March 2017

DOI : https://doi.org/10.1186/s12915-017-0357-7


Keywords

  • Effect size
  • Biological importance
  • Non-independence
  • Meta-regression
  • Meta-research
  • Publication bias
  • Quantitative synthesis
  • Reporting bias
  • Statistical significance
  • Systematic review

BMC Biology

ISSN: 1741-7007

  • Open access
  • Published: 01 August 2019

A step by step guide for conducting a systematic review and meta-analysis with simulation data

  • Gehad Mohamed Tawfik 1 , 2 ,
  • Kadek Agus Surya Dila 2 , 3 ,
  • Muawia Yousif Fadlelmola Mohamed 2 , 4 ,
  • Dao Ngoc Hien Tam 2 , 5 ,
  • Nguyen Dang Kien 2 , 6 ,
  • Ali Mahmoud Ahmed 2 , 7 &
  • Nguyen Tien Huy 8 , 9 , 10  

Tropical Medicine and Health volume  47 , Article number:  46 ( 2019 ) Cite this article

829k Accesses

300 Citations

94 Altmetric

Metrics details

The number of studies relating to tropical medicine and health has increased strikingly over the last few decades. In this field, a well-conducted systematic review and meta-analysis (SR/MA) is considered a feasible solution for keeping clinicians abreast of current evidence-based medicine. Understanding the steps of an SR/MA is of paramount importance for conducting one, yet doing so is not easy, and researchers face a number of obstacles along the way. To address these hindrances, this methodology study aims to provide a step-by-step approach, mainly for beginners and junior researchers in tropical medicine and other health care fields, on how to properly conduct an SR/MA; the steps described combine our experience and expertise with well-known and accepted international guidance.

We suggest that all SR/MA steps be carried out by 2–3 reviewers working independently, with disagreements resolved by discussion, to ensure data quality and accuracy.

SR/MA steps include developing the research question, forming the eligibility criteria, building the search strategy, searching databases, registering the protocol, title and abstract screening, full-text screening, manual searching, data extraction, quality assessment, data checking, statistical analysis, double data checking, and manuscript writing.

Introduction

The number of studies published in the biomedical literature, especially in tropical medicine and health, has increased strikingly over the last few decades. This abundance of literature makes clinical medicine increasingly complex, and knowledge from multiple studies is often needed to inform a particular clinical decision. However, the available studies are often heterogeneous in their design, operational quality, and study populations, and may address the research question in different ways, which adds to the complexity of synthesizing evidence and drawing conclusions [ 1 ].

Systematic reviews and meta-analyses (SR/MAs) provide a high level of evidence, as represented by the evidence-based pyramid. Therefore, a well-conducted SR/MA is considered a feasible solution for keeping health clinicians up to date with contemporary evidence-based medicine.

Unlike a systematic review, an unsystematic narrative review tends to be descriptive: authors often select articles based on their own point of view, which leads to poor quality. A systematic review, on the other hand, is defined as a review that uses a systematic method to summarize evidence on a question according to a detailed and comprehensive plan of study. Despite the growing number of guidelines for conducting a systematic review effectively, the basic steps generally run from framing the question, to identifying relevant work (developing criteria and searching for articles), appraising the quality of the included studies, summarizing the evidence, and interpreting the results [ 2 , 3 ]. In reality, however, these apparently simple steps are not easy to carry out, and researchers can run into many difficulties for which no detailed guidance exists.

Conducting an SR/MA in tropical medicine and health can be difficult, especially for young researchers, so understanding its essential steps is crucial. To address these obstacles, we provide a flow diagram (Fig. 1 ) that illustrates the stages of an SR/MA study in detail, step by step. This methodology study aims to provide a step-by-step approach, mainly for beginners and junior researchers in tropical medicine and other health care fields, on how to properly and succinctly conduct an SR/MA; the steps described combine our experience and expertise with well-known and accepted international guidance.

figure 1

Detailed flow diagram guideline for systematic review and meta-analysis steps. Note : Star icon refers to “2–3 reviewers screen independently”

Methods and results

Detailed steps for conducting any systematic review and meta-analysis.

We reviewed the methods reported in published SR/MAs in tropical medicine and other healthcare fields, together with published guidance such as the Cochrane guidelines [ 4 ], to collect the lowest-bias method for each step of SR/MA conduct. We also drew on the guidelines that we apply in our own studies for all SR/MA steps. We combined these methods into a detailed flow diagram that shows how each SR/MA step is conducted.

Any SR/MA must follow the widely accepted Preferred Reporting Items for Systematic Review and Meta-analysis statement (PRISMA checklist 2009) (Additional file 5 : Table S1) [ 5 ].

We illustrate our methods with an explanatory simulation example on the topic of “evaluating the safety of the Ebola vaccine,” Ebola being a rare but often fatal tropical disease. All the methods described reflect internationally followed standards, combined with our own accumulated experience in conducting systematic reviews. A systematic review on this topic is currently being conducted by researchers in our group, prompted in part by the 2013–2016 Ebola outbreak in Africa, which resulted in significant mortality and morbidity. Furthermore, since there are many published and ongoing trials assessing the safety of Ebola vaccines, we thought this would provide a good opportunity to tackle this much-debated issue. Ebola has also flared up again: a new, fatal outbreak has been ongoing in the Democratic Republic of the Congo since August 2018, infecting more than 1000 people according to the World Health Organization and killing 629 people to date. It is therefore considered the second worst Ebola outbreak, after the one in West Africa in 2014, which infected more than 26,000 people and killed about 11,300 over the course of the outbreak.

Research question and objectives

Like other study designs, the research question of SR/MA should be feasible, interesting, novel, ethical, and relevant. Therefore, a clear, logical, and well-defined research question should be formulated. Usually, two common tools are used: PICO or SPIDER. PICO (Population, Intervention, Comparison, Outcome) is used mostly in quantitative evidence synthesis. Authors demonstrated that PICO holds more sensitivity than the more specific SPIDER approach [ 6 ]. SPIDER (Sample, Phenomenon of Interest, Design, Evaluation, Research type) was proposed as a method for qualitative and mixed methods search.

Here we recommend a combined approach, using either one or both of the SPIDER and PICO tools to achieve a comprehensive search, depending on time and resource limitations. When we apply this to our assumed research topic, being of a qualitative nature, the SPIDER approach is the more valid choice.

PICO is usually used for systematic reviews and meta-analyses of clinical trials. For observational studies (without an intervention or comparator), as in many tropical medicine and epidemiological questions, it is usually enough to use only P (patient) and O (outcome) to formulate a research question. The population (P) must be indicated clearly, followed by the intervention (I) or exposure. Next, the intervention is compared (C) with other interventions, e.g., placebo. Finally, the relevant outcomes (O) need to be clarified.

To facilitate comprehension, we choose the Ebola virus disease (EVD) as an example. Currently, the vaccine for EVD is being developed and under phase I, II, and III clinical trials; we want to know whether this vaccine is safe and can induce sufficient immunogenicity to the subjects.

An example of an SR/MA research question based on PICO for this issue is as follows: What are the safety and immunogenicity of the Ebola vaccine in humans? (P: healthy subjects (humans), I: vaccination, C: placebo, O: safety or adverse effects)

Preliminary research and idea validation

We recommend a preliminary search to identify relevant articles, ensure the validity of the proposed idea, avoid duplicating previously addressed questions, and confirm that enough articles exist for the analysis. Moreover, themes should focus on relevant and important health-care issues, consider global needs and values, reflect the current science, and be consistent with the adopted review methods. Gaining familiarity with, and a deep understanding of, the study field through relevant videos and discussions is of paramount importance for better retrieval of results. If this step is ignored, the study may have to be abandoned as soon as a similar, previously published study is discovered, meaning that time has been wasted on a problem that has already been tackled.

To do this, we can start with a simple search in PubMed or Google Scholar using the terms Ebola AND vaccine. While doing so, we identified a systematic review and meta-analysis of the determinant factors influencing the antibody response to Ebola vaccination in non-human primates and humans [ 7 ], a relevant paper to read for deeper insight and to identify gaps that help refine our research question or purpose. We can still conduct a systematic review and meta-analysis of the Ebola vaccine because we are evaluating a different outcome (safety) and a different population (humans only).

Inclusion and exclusion criteria

Eligibility criteria are based on the PICO approach, study design, and date. Exclusion criteria are mostly unrelated papers, duplicates, papers without an available full text, or abstract-only papers. These exclusions should be stated in advance to protect the researcher from bias. The inclusion criteria would be articles involving the target patients, the investigated interventions, or the comparison between two studied interventions; briefly, articles containing information that answers our research question. Most importantly, the information should be clear and sufficient, whether positive or negative, to answer the question.

For the topic we have chosen, the inclusion criteria are: (1) any clinical trial evaluating the safety of the Ebola vaccine and (2) no restriction regarding country, patient age, race, gender, publication language, or date. The exclusion criteria are as follows: (1) studies of the Ebola vaccine in non-human subjects or in vitro; (2) studies with data that cannot be reliably extracted, or with duplicate or overlapping data; (3) abstract-only papers such as preceding papers, conference abstracts, editorials, author responses, theses, and books; (4) articles without an available full text; and (5) case reports, case series, and systematic review studies. The PRISMA flow diagram template used in SR/MA studies is shown in Fig. 2 .

figure 2

PRISMA flow diagram of studies’ screening and selection

Search strategy

A standard search strategy is first developed in PubMed and then modified for each specific database to obtain the most relevant results. The basic search strategy is built on the research question formulation (i.e., PICO or PICOS). Search strategies are constructed to include free-text terms (e.g., in the title and abstract) and any appropriate subject indexing (e.g., MeSH) expected to retrieve eligible studies, with the help of an expert in the review topic or an information specialist. Additionally, we advise against including terms for the outcomes, as doing so may prevent the database from retrieving eligible studies in which the outcome is not mentioned explicitly.

The improvement of the search term is made while doing a trial search and looking for another relevant term within each concept from retrieved papers. To search for a clinical trial, we can use these descriptors in PubMed: “clinical trial”[Publication Type] OR “clinical trials as topic”[MeSH terms] OR “clinical trial”[All Fields]. After some rounds of trial and refinement of search term, we formulate the final search term for PubMed as follows: (ebola OR ebola virus OR ebola virus disease OR EVD) AND (vaccine OR vaccination OR vaccinated OR immunization) AND (“clinical trial”[Publication Type] OR “clinical trials as topic”[MeSH Terms] OR “clinical trial”[All Fields]). Because the study for this topic is limited, we do not include outcome term (safety and immunogenicity) in the search term to capture more studies.
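
To give a sense of how such a search string can be run programmatically, the sketch below submits the final PubMed query to NCBI's public E-utilities (esearch) endpoint and prints the hit count and the first PubMed IDs; field tags behave as in the PubMed web interface, and heavier use should follow NCBI's usage policies (for example, adding an API key).

```python
import requests

query = ('(ebola OR ebola virus OR ebola virus disease OR EVD) AND '
         '(vaccine OR vaccination OR vaccinated OR immunization) AND '
         '("clinical trial"[Publication Type] OR '
         '"clinical trials as topic"[MeSH Terms] OR "clinical trial"[All Fields])')

resp = requests.get(
    "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
    params={"db": "pubmed", "term": query, "retmode": "json", "retmax": 20},
)
result = resp.json()["esearchresult"]
print("Hits:", result["count"])
print("First PMIDs:", result["idlist"])
```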

Search databases, import all results to a library, and export to an Excel sheet

According to the AMSTAR guidelines, at least two databases have to be searched for an SR/MA [ 8 ], but the more databases you search, the greater the yield and the more accurate and comprehensive the results. The choice of databases depends mostly on the review question; for a study of clinical trials, you will rely mostly on Cochrane, mRCTs, or the International Clinical Trials Registry Platform (ICTRP). Here, we propose 12 databases (PubMed, Scopus, Web of Science, EMBASE, GHL, VHL, Cochrane, Google Scholar, ClinicalTrials.gov, mRCTs, POPLINE, and SIGLE), which together cover almost all published articles in tropical medicine and other health-related fields. Among these databases, POPLINE focuses on reproductive health. Researchers should choose the relevant databases according to the research topic. Some databases do not support Boolean operators or quotation marks, and others have their own search syntax. Therefore, the initial search terms need to be modified for each database to obtain the desired results; guides for searching each online database are presented in Additional file 5 : Table S2, and the detailed search strategy for each database in Additional file 5 : Table S3. The search term created in PubMed needs customization based on the specific characteristics of each database. An example of a Google Scholar advanced search for our topic is as follows:

  • With all of the words: ebola virus
  • With at least one of the words: vaccine vaccination vaccinated immunization
  • Where my words occur: in the title of the article
  • With all of the words: EVD

Finally, all records are collected into one EndNote library so that duplicates can be removed before exporting to an Excel sheet. Using the remove-duplicates function with two settings is mandatory: all references that have (1) the same title and author and were published in the same year, or (2) the same title and author and were published in the same journal, are deleted. The references remaining after this step should be exported to an Excel file with the essential information for screening, such as the authors' names, publication year, journal, DOI, URL, and abstract.
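As an illustration of the two de-duplication rules above, the sketch below removes duplicates from an exported sheet in R. The file name, column names, and packages (readxl, writexl) are assumptions for the example, not a prescribed format.

```r
# A minimal sketch: drop duplicates from an exported reference sheet using
# the two rules described above (same title/author/year, or same
# title/author/journal). File and column names are hypothetical.
library(readxl)
library(writexl)

refs <- read_excel("library_export.xlsx")  # columns: title, authors, year, journal, doi, url, abstract

norm <- function(x) tolower(trimws(x))

dup_year    <- duplicated(data.frame(norm(refs$title), norm(refs$authors), refs$year))
dup_journal <- duplicated(data.frame(norm(refs$title), norm(refs$authors), norm(refs$journal)))

screening <- refs[!(dup_year | dup_journal), ]
write_xlsx(screening, "screening_sheet.xlsx")
```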

Protocol writing and registration

Protocol registration at an early stage guarantees transparency in the research process and protects against duplication of effort. It also serves as documented proof of the team's plan of action, research question, eligibility criteria, intervention/exposure, quality assessment, and pre-analysis plan. We recommend that researchers send the protocol to the principal investigator (PI) for revision and then upload it to a registry site. There are many registry sites available for SR/MAs, such as those proposed by the Cochrane and Campbell collaborations; however, we recommend registering the protocol in PROSPERO because it is more straightforward. The layout of a protocol template, according to PROSPERO, can be found in Additional file 5: File S1.

Title and abstract screening

Decisions to select retrieved articles for further assessment are based on the eligibility criteria, to minimize the chance of including non-relevant articles. According to Cochrane guidance, two reviewers are required for this step, but this can be tiring for beginners and junior researchers; based on our experience, we therefore propose that at least three reviewers work independently to reduce the chance of error, particularly in teams with a large number of authors, as this adds scrutiny and ensures proper conduct. In most cases, quality is better with three reviewers than with two: when two reviewers hold different opinions they cannot reach a decision on their own, so a third opinion is crucial. Examples of systematic reviews conducted with this strategy (by different groups of researchers in our research group) and published successfully, on topics relevant to tropical medicine and disease, are given in [ 9 , 10 , 11 ].

In this step, duplicates are also removed manually whenever the reviewers find them. When there is doubt about a decision on an article, the team should be inclusive rather than exclusive until the lead author or PI makes a decision after discussion and consensus. All excluded records should be given an exclusion reason.

Full text downloading and screening

Many search engines provide links to free full-text articles. If the full text is not found, we can search research websites such as ResearchGate, which offers the option of requesting the full text directly from the authors; alternatively, we can explore the archives of the relevant journals or contact the PI to purchase the article if available. As before, 2–3 reviewers work independently to decide which full texts to include according to the eligibility criteria, reporting the reason for each exclusion. Any disagreement is resolved by discussion to reach a final decision.

Manual search

To reduce bias, one has to exhaust all possibilities by performing an explicit hand search to retrieve reports that may have been missed by the initial search [ 12 ]. We apply five methods of manual searching: searching the reference lists of included studies/reviews, contacting authors, contacting experts, and following the related articles and citing articles in PubMed and Google Scholar.

We describe here three consecutive methods to increase and refine the yield of manual searching: first, searching the reference lists of included articles; second, performing citation tracking, in which the reviewers track all the articles that cite each of the included articles, which may involve electronic database searching; and third, similarly to citation tracking, following all “related to” or “similar” articles. Each of these methods can be performed by 2–3 independent reviewers, and every potentially relevant article must undergo further scrutiny against the inclusion criteria, following the same process used for records retrieved from the electronic databases, i.e., title/abstract and then full-text screening.

We propose independent reviewing by assigning each team member a “tag” and a distinct method, then compiling all the results at the end to compare and discuss differences; this maximizes retrieval and minimizes bias. The number of articles found through manual searching should be stated before they are added to the overall included records.

Data extraction and quality assessment

This step entails collecting data from the included full texts in a structured extraction Excel sheet, which has been pilot-tested beforehand on a few randomly selected studies. We recommend extracting both adjusted and non-adjusted data, so that the analysis can later pool the estimates that account for the largest number of confounding factors [ 13 ]. The extraction should be carried out by 2–3 independent reviewers. The sheet is usually organized into study and patient characteristics, outcomes, and the quality assessment (QA) tool.

Data presented only in graphs should be extracted with software tools such as WebPlotDigitizer [ 14 ]. Most of the equations that can be used during extraction, prior to analysis, to estimate the standard deviation (SD) from other variables are provided in Additional file 5: File S2, with their references: Hozo et al. [ 15 ], Wan et al. [ 16 ], and Van Rijkom et al. [ 17 ]. A variety of tools are available for the QA, depending on the study design: the ROB-2 Cochrane tool for randomized controlled trials [ 18 ], presented in Additional file 1: Figure S1 and Additional file 2: Figure S2 using data from a previously published article [ 19 ]; the NIH tool for observational and cross-sectional studies [ 20 ]; the ROBINS-I tool for non-randomized studies of interventions [ 21 ]; the QUADAS-2 tool for diagnostic studies; the QUIPS tool for prognostic studies; the CARE tool for case reports; and ToxRtool for in vivo and in vitro studies. We recommend that 2–3 reviewers independently assess the quality of the studies and add the assessments to the data extraction form before inclusion in the analysis, to reduce the risk of bias. In the NIH tool for observational studies (cohort and cross-sectional), as in this Ebola case, reviewers rate each of the 14 items as yes, no, or not applicable. An overall score is calculated by summing the items, with yes scored as one and no or not applicable scored as zero. Each paper is then classified as a poorly, fairly, or well conducted study, where a score of 0–5 is considered poor, 6–9 fair, and 10–14 good.
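The scoring rule described above is simple enough to compute in a few lines; the sketch below applies it to one hypothetical set of item ratings (the ratings themselves are invented for illustration).

```r
# A minimal sketch of the NIH-tool scoring rule: each of the 14 items is
# coded 1 for "yes" and 0 for "no"/"not applicable", the items are summed,
# and the total is classified as poor (0-5), fair (6-9) or good (10-14).
items <- c("yes", "no", "yes", "yes", "NA", "yes", "yes", "no",
           "yes", "yes", "no", "yes", "yes", "yes")   # hypothetical ratings

score <- sum(items == "yes")
quality <- cut(score,
               breaks = c(-1, 5, 9, 14),
               labels = c("poor", "fair", "good"))
score    # 10
quality  # "good"
```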

In the Ebola case example above, authors can extract the following information: names of authors, country of patients, year of publication, study design (case report, cohort study, clinical trial, or RCT), sample size, time point after Ebola infection, follow-up interval after vaccination, efficacy, safety, adverse effects after vaccination, and the QA sheet (Additional file 6: Data S1).

Data checking

Because human error and bias are expected, we recommend a data-checking step in which every included article is compared against its entry in the extraction sheet, using evidence photos (e.g., screenshots of the source text), to detect mistakes in the data. We advise assigning the articles to 2–3 independent reviewers, ideally not the ones who extracted those articles. When resources are limited, each reviewer is assigned articles different from the ones they extracted in the previous stage.

Statistical analysis

Investigators use different methods for combining and summarizing the findings of included studies. Before analysis, there is an important step called data cleaning, in which the analyst organizes the extraction sheet into a form that can be read by the analytical software. The analysis is of two types, qualitative and quantitative. Qualitative analysis mostly describes the data in SR studies, while quantitative analysis consists of two main types: MA and network meta-analysis (NMA). Subgroup, sensitivity, and cumulative analyses and meta-regression are appropriate for testing whether the results are consistent, for investigating the effect of certain confounders on the outcome, and for finding the best predictors. Publication bias should be assessed to investigate the presence of missing studies, which can affect the summary estimate.

To illustrate a basic meta-analysis, we provide imaginary data for the research question about Ebola vaccine safety (in terms of adverse events 14 days after injection) and immunogenicity (rise in the geometric mean titer of Ebola virus antibodies 6 months after injection). Assume that, after searching and data extraction, we decided to analyse the safety and immunogenicity of Ebola vaccine “A”. Other Ebola vaccines were not meta-analysed because of the limited number of studies (they are instead included in the narrative review). The imaginary data for the vaccine safety meta-analysis can be accessed in Additional file 7: Data S2. To do the meta-analysis, we can use free software such as RevMan [ 22 ] or the R package meta [ 23 ]. In this example, we use the R package meta; its tutorial can be accessed through the “General Package for Meta-Analysis” tutorial PDF [ 23 ]. The R code and accompanying guidance for the meta-analysis can be found in Additional file 5: File S3.
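To give a flavour of what the code in Additional file 5: File S3 does, the sketch below runs a random-effects meta-analysis of one adverse event (arthralgia) with the meta package. The event counts are made-up placeholders, not the values in Additional file 7: Data S2.

```r
# A minimal sketch (hypothetical counts): random-effects meta-analysis of
# arthralgia after intramuscular injection of vaccine A versus placebo.
library(meta)

arthralgia <- data.frame(
  study   = c("Study A", "Study B", "Study C", "Study D", "Study E", "Study F"),
  event.v = c(12,  8,  20,  5,  15, 10),   # arthralgia events, vaccine A arm
  n.v     = c(100, 80, 150, 60, 120, 90),
  event.p = c(11,  9,  18,  4,  16,  9),   # arthralgia events, placebo arm
  n.p     = c(100, 80, 150, 60, 120, 90)
)

m <- metabin(event.e = event.v, n.e = n.v,
             event.c = event.p, n.c = n.p,
             studlab = study, data = arthralgia,
             sm = "OR",          # pool odds ratios
             method = "MH",      # Mantel-Haenszel weighting
             method.tau = "DL")  # DerSimonian-Laird between-study variance

summary(m)  # pooled OR, 95% CI, p value, I^2
forest(m)   # forest plot, as in Fig. 3
```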

For the analysis, we assume that the studies are heterogeneous in nature; therefore, we choose a random-effects model. We analysed the safety of Ebola vaccine A. From the data table, we can see several adverse events occurring after intramuscular injection of vaccine A into the study subjects. Suppose that we include six studies that fulfil our inclusion criteria. We can then run a random-effects meta-analysis, using the R meta package, for each of the adverse events extracted from the studies; arthralgia is used as the example here.

From the results shown in Additional file 3: Figure S3, we can see that the odds ratio (OR) of arthralgia is 1.06 (95% CI 0.79 to 1.42), p value = 0.71, which means that there is no association between intramuscular injection of Ebola vaccine A and arthralgia: the OR is close to one, and the p value is not significant (> 0.05).

In a meta-analysis, we can also visualize the results in a forest plot. Figure 3 shows an example of a forest plot from the simulated analysis.

Figure 3. Random-effects model forest plot for the comparison of vaccine A versus placebo

In the forest plot, we can see the six studies (A to F) and their respective ORs (95% CI). The green box represents the effect size (in this case, the OR) of each study; the bigger the box, the greater the weight of the study (i.e., the bigger its sample size). The blue diamond represents the pooled OR of the six studies. The diamond crosses the vertical line at OR = 1, which indicates that the association is not significant, as the diamond lies almost equally on both sides of the line. We can confirm this from the 95% confidence interval, which includes one, and from the p value of > 0.05.

For heterogeneity, we see that I² = 0%, which means that no heterogeneity is detected; the studies are relatively homogeneous (which is rare in real data). To evaluate publication bias for the meta-analysis of the adverse event arthralgia, we can use the metabias function from the R meta package (Additional file 4: Figure S4) together with visualization using a funnel plot. The results of the publication bias assessment are shown in Fig. 4. The p value associated with this test is 0.74, indicating symmetry of the funnel plot; we can confirm this by inspecting the funnel plot itself.
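Continuing from the m object in the sketch above, the publication bias test and funnel plot could be produced as follows. Note that metabias normally expects at least ten studies, so the minimum is lowered here purely for illustration.

```r
# A minimal sketch: Egger-type linear regression test of funnel plot
# asymmetry and the corresponding funnel plot, continuing from 'm' above.
# metabias() requires >= 10 studies by default; k.min is relaxed for the demo.
metabias(m, method.bias = "linreg", k.min = 6)
funnel(m)  # funnel plot, as in Fig. 4
```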

Figure 4. Publication bias funnel plot for the comparison of vaccine A versus placebo

Looking at the funnel plot, the numbers of studies on the left and right sides of the plot are the same; the plot is therefore symmetrical, indicating that no publication bias is detected.

Sensitivity analysis is a procedure used to discover how the significance of the pooled estimate is influenced by removing one study at a time from the MA. If all included studies have p values < 0.05, removing any single study will not change the significant association. Sensitivity analysis is only performed when there is a significant pooled association; because the p value of the MA here is 0.71, well above 0.05, a sensitivity analysis is not needed for this case study example. If, by contrast, only two of the included studies drive the significance, removing either of them may result in a loss of significance.
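If a leave-one-out sensitivity analysis were needed, the meta package provides it directly; the sketch below again assumes the m object from the earlier example.

```r
# A minimal sketch: leave-one-out (influence) analysis, recomputing the
# pooled random-effects estimate with each study omitted in turn.
metainf(m, pooled = "random")
```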

Double data checking

For further assurance of the quality of the results, the analysed data should be rechecked against the full-text data, using evidence photos, so that the PI of the study can verify them easily.

Manuscript writing, revision, and submission to a journal

The manuscript is written around the four standard scientific sections: introduction, methods, results, and discussion, usually followed by a conclusion. Preparing a table of study and patient characteristics is a mandatory step; a template is provided in Additional file 5: Table S4.

After finishing the manuscript, the characteristics table, and the PRISMA flow diagram, the team should send them to the PI for thorough revision, respond to the PI's comments, and finally choose a suitable journal for the manuscript, one whose field fits the work and whose impact factor is reasonable. The authors should also read the journal's author guidelines carefully before submitting the manuscript.

The role of evidence-based medicine in biomedical research is growing rapidly, and SR/MAs are increasingly common in the medical literature. This paper has sought to provide a comprehensive approach to enable reviewers to produce high-quality SR/MAs. We hope that readers will gain general knowledge of how to conduct an SR/MA and the confidence to perform one, although this kind of study requires more complex steps than a narrative review.

Beyond the basic steps for conducting an MA, there are many advanced steps that are applied for specific purposes. One of these is meta-regression, which is performed to investigate the association between a confounder and the results of the MA. Furthermore, there are other types of analysis beyond the standard MA, such as NMA and mega MA. In an NMA, we investigate the differences between several comparisons when there are not enough data to enable a standard meta-analysis; it uses both direct and indirect comparisons to conclude which of the competing interventions is best. Mega MA, or MA of individual patient data, summarizes the results of independent studies by using their individual subject data. Because a more detailed analysis can be done, it is useful for repeated-measures and time-to-event analyses, and it can also perform analysis of variance and multiple regression; however, it requires a homogeneous dataset and is time-consuming to conduct [ 24 ].

Conclusions

The steps of a systematic review/meta-analysis are: developing and validating the research question, forming the eligibility criteria, building the search strategy, searching the databases, importing all results into a library and exporting them to an Excel sheet, writing and registering the protocol, title and abstract screening, full-text screening, manual searching, data extraction and quality assessment, data checking, statistical analysis, double data checking, and manuscript writing, revision, and submission to a journal.

Availability of data and materials

Not applicable.

Abbreviations

NMA: Network meta-analysis

PI: Principal investigator

PICO: Population, Intervention, Comparison, Outcome

PRISMA: Preferred Reporting Items for Systematic Review and Meta-analysis statement

QA: Quality assessment

SPIDER: Sample, Phenomenon of Interest, Design, Evaluation, Research type

SR/MA: Systematic review and meta-analysis

References

1. Bello A, Wiebe N, Garg A, Tonelli M. Evidence-based decision-making 2: systematic reviews and meta-analysis. Methods Mol Biol. 2015;1281:397–416.
2. Khan KS, Kunz R, Kleijnen J, Antes G. Five steps to conducting a systematic review. J R Soc Med. 2003;96(3):118–21.
3. Rys P, Wladysiuk M, Skrzekowska-Baran I, Malecki MT. Review articles, systematic reviews and meta-analyses: which can be trusted? Pol Arch Med Wewn. 2009;119(3):148–56.
4. Higgins JPT, Green S, editors. Cochrane Handbook for Systematic Reviews of Interventions, Version 5.1.0 (updated March 2011). The Cochrane Collaboration; 2011.
5. Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. BMJ. 2009;339:b2535.
6. Methley AM, Campbell S, Chew-Graham C, McNally R, Cheraghi-Sohi S. PICO, PICOS and SPIDER: a comparison study of specificity and sensitivity in three search tools for qualitative systematic reviews. BMC Health Serv Res. 2014;14:579.
7. Gross L, Lhomme E, Pasin C, Richert L, Thiebaut R. Ebola vaccine development: systematic review of pre-clinical and clinical studies, and meta-analysis of determinants of antibody response variability after vaccination. Int J Infect Dis. 2018;74:83–96.
8. Shea BJ, Reeves BC, Wells G, Thuku M, Hamel C, Moran J, et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ. 2017;358:j4008.
9. Giang HTN, Banno K, Minh LHN, Trinh LT, Loc LT, Eltobgy A, et al. Dengue hemophagocytic syndrome: a systematic review and meta-analysis on epidemiology, clinical signs, outcomes, and risk factors. Rev Med Virol. 2018;28(6):e2005.
10. Morra ME, Altibi AMA, Iqtadar S, Minh LHN, Elawady SS, Hallab A, et al. Definitions for warning signs and signs of severe dengue according to the WHO 2009 classification: systematic review of literature. Rev Med Virol. 2018;28(4):e1979.
11. Morra ME, Van Thanh L, Kamel MG, Ghazy AA, Altibi AMA, Dat LM, et al. Clinical outcomes of current medical approaches for Middle East respiratory syndrome: a systematic review and meta-analysis. Rev Med Virol. 2018;28(3):e1977.
12. Vassar M, Atakpo P, Kash MJ. Manual search approaches used by systematic reviewers in dermatology. J Med Libr Assoc. 2016;104(4):302.
13. Naunheim MR, Remenschneider AK, Scangas GA, Bunting GW, Deschler DG. The effect of initial tracheoesophageal voice prosthesis size on postoperative complications and voice outcomes. Ann Otol Rhinol Laryngol. 2016;125(6):478–84.
14. Rohatgi A. WebPlotDigitizer. 2014.
15. Hozo SP, Djulbegovic B, Hozo I. Estimating the mean and variance from the median, range, and the size of a sample. BMC Med Res Methodol. 2005;5(1):13.
16. Wan X, Wang W, Liu J, Tong T. Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range. BMC Med Res Methodol. 2014;14(1):135.
17. Van Rijkom HM, Truin GJ, Van't Hof MA. A meta-analysis of clinical studies on the caries-inhibiting effect of fluoride gel treatment. Caries Res. 1998;32(2):83–92.
18. Higgins JPT, Altman DG, Gotzsche PC, Juni P, Moher D, Oxman AD, et al. The Cochrane Collaboration's tool for assessing risk of bias in randomised trials. BMJ. 2011;343:d5928.
19. Tawfik GM, Tieu TM, Ghozy S, Makram OM, Samuel P, Abdelaal A, et al. Speech efficacy, safety and factors affecting lifetime of voice prostheses in patients with laryngeal cancer: a systematic review and network meta-analysis of randomized controlled trials. J Clin Oncol. 2018;36(15_suppl):e18031.
20. Wannemuehler TJ, Lobo BC, Johnson JD, Deig CR, Ting JY, Gregory RL. Vibratory stimulus reduces in vitro biofilm formation on tracheoesophageal voice prostheses. Laryngoscope. 2016;126(12):2752–7.
21. Sterne JAC, Hernán MA, Reeves BC, Savović J, Berkman ND, Viswanathan M, et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ. 2016;355:i4919.
22. The Nordic Cochrane Centre, The Cochrane Collaboration. Review Manager (RevMan). Version 5.0. Copenhagen; 2008.
23. Schwarzer G. meta: An R package for meta-analysis. R News. 2007;7(3):40–5.
24. Simms LLH. Meta-analysis versus mega-analysis: is there a difference? Oral budesonide for the maintenance of remission in Crohn's disease [thesis]. Faculty of Graduate Studies, University of Western Ontario; 1998.


Acknowledgements

This study was conducted (in part) at the Joint Usage/Research Center on Tropical Disease, Institute of Tropical Medicine, Nagasaki University, Japan.

Author information

Authors and Affiliations

Faculty of Medicine, Ain Shams University, Cairo, Egypt

Gehad Mohamed Tawfik

Online research Club http://www.onlineresearchclub.org/

Gehad Mohamed Tawfik, Kadek Agus Surya Dila, Muawia Yousif Fadlelmola Mohamed, Dao Ngoc Hien Tam, Nguyen Dang Kien & Ali Mahmoud Ahmed

Pratama Giri Emas Hospital, Singaraja-Amlapura street, Giri Emas village, Sawan subdistrict, Singaraja City, Buleleng, Bali, 81171, Indonesia

Kadek Agus Surya Dila

Faculty of Medicine, University of Khartoum, Khartoum, Sudan

Muawia Yousif Fadlelmola Mohamed

Nanogen Pharmaceutical Biotechnology Joint Stock Company, Ho Chi Minh City, Vietnam

Dao Ngoc Hien Tam

Department of Obstetrics and Gynecology, Thai Binh University of Medicine and Pharmacy, Thai Binh, Vietnam

Nguyen Dang Kien

Faculty of Medicine, Al-Azhar University, Cairo, Egypt

Ali Mahmoud Ahmed

Evidence Based Medicine Research Group & Faculty of Applied Sciences, Ton Duc Thang University, Ho Chi Minh City, 70000, Vietnam

Nguyen Tien Huy

Faculty of Applied Sciences, Ton Duc Thang University, Ho Chi Minh City, 70000, Vietnam

Department of Clinical Product Development, Institute of Tropical Medicine (NEKKEN), Leading Graduate School Program, and Graduate School of Biomedical Sciences, Nagasaki University, 1-12-4 Sakamoto, Nagasaki, 852-8523, Japan


Contributions

NTH and GMT were responsible for the idea and its design. The figure was done by GMT. All authors contributed to the manuscript writing and approval of the final version.

Corresponding author

Correspondence to Nguyen Tien Huy .

Ethics declarations

Ethics approval and consent to participate; consent for publication; competing interests

The authors declare that they have no competing interests.


Additional files

Additional file 1:

Figure S1. Risk of bias assessment graph of included randomized controlled trials. (TIF 20 kb)

Additional file 2:

Figure S2. Risk of bias assessment summary. (TIF 69 kb)

Additional file 3:

Figure S3. Arthralgia results of random effect meta-analysis using R meta package. (TIF 20 kb)

Additional file 4:

Figure S4. Arthralgia linear regression test of funnel plot asymmetry using R meta package. (TIF 13 kb)

Additional file 5:

Table S1. PRISMA 2009 Checklist. Table S2. Manipulation guides for online database searches. Table S3. Detailed search strategy for twelve database searches. Table S4. Baseline characteristics of the patients in the included studies. File S1. PROSPERO protocol template file. File S2. Extraction equations that can be used prior to analysis to get missed variables. File S3. R codes and its guidance for meta-analysis done for comparison between EBOLA vaccine A and placebo. (DOCX 49 kb)

Additional file 6:

Data S1. Extraction and quality assessment data sheets for EBOLA case example. (XLSX 1368 kb)

Additional file 7:

Data S2. Imaginary data for EBOLA case example. (XLSX 10 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.


About this article

Cite this article

Tawfik, G.M., Dila, K.A.S., Mohamed, M.Y.F. et al. A step by step guide for conducting a systematic review and meta-analysis with simulation data. Trop Med Health 47 , 46 (2019). https://doi.org/10.1186/s41182-019-0165-6


Received : 30 January 2019

Accepted : 24 May 2019

Published : 01 August 2019

DOI : https://doi.org/10.1186/s41182-019-0165-6



Introduction to Meta-Analysis: A Guide for the Novice


Free Meta-Analysis Software and Macros

  • MetaXL (Version 2.0)
  • RevMan (Version 5.3)
  • Meta-Analysis Macros for SAS, SPSS, and Stata

Opposing theories and disparate findings populate the field of psychology; scientists must interpret the results of any single study in the context of its limitations. Meta-analysis is a robust tool that can help researchers overcome these challenges by assimilating data across studies identified through a literature review. In other words, rather than surveying participants, a meta-analysis surveys studies. The goal is to calculate the direction and/or magnitude of an effect across all relevant studies, both published and unpublished. Despite the utility of this statistical technique, it can intimidate a beginner who has no formal training in the approach. However, any motivated researcher with a statistics background can complete a meta-analysis. This article provides an overview of the main steps of basic meta-analysis.

Meta-analysis has many strengths. First, meta-analysis provides an organized approach for handling a large number of studies. Second, the process is systematic and documented in great detail, which allows readers to evaluate the researchers’ decisions and conclusions. Third, meta-analysis allows researchers to examine an effect within a collection of studies in a more sophisticated manner than a qualitative summary.

However, meta-analysis also involves numerous challenges. First, it consumes a great deal of time and requires a great deal of effort. Second, meta-analysis has been criticized for aggregating studies that are too different (i.e., mixing “apples and oranges”). Third, some scientists argue that the objective coding procedure used in meta-analysis ignores the context of each individual study, such as its methodological rigor. Fourth, when a researcher includes low-quality studies in a meta-analysis, the limitations of these studies impact the mean effect size (i.e., “garbage in, garbage out”). As long as researchers are aware of these issues and consider the potential influence of these limitations on their findings, meta-analysis can serve as a powerful and informative approach to help us draw conclusions from a large literature base.

  Identifying the Right Question

Similar to any research study, a meta-analysis begins with a research question. Meta-analysis can be used in any situation where the goal is to summarize quantitative findings from empirical studies. It can be used to examine different types of effects, including prevalence rates (e.g., percentage of rape survivors with depression), growth rates (e.g., changes in depression from pretreatment to posttreatment), group differences (e.g., comparison of treatment and placebo groups on depression), and associations between variables (e.g., correlation between depression and self-esteem). To select the effect metric, researchers should consider the statistical form of the results in the literature. Any given meta-analysis can focus on only one metric at a time. While selecting a research question, researchers should think about the size of the literature base and select a manageable topic. At the same time, they should make sure the number of existing studies is large enough to warrant a meta-analysis.

Determining Eligibility Criteria

After choosing a relevant question, researchers should then identify and explicitly state the types of studies to be included. These criteria ensure that the studies overlap enough in topic and methodology that it makes sense to combine them. The inclusion and exclusion criteria depend on the specific research question and characteristics of the literature. First, researchers can specify relevant participant characteristics, such as age or gender. Second, researchers can identify the key variables that must be included in the study. Third, the language, date range, and types (e.g., peer-reviewed journal articles) of studies should be specified. Fourth, pertinent study characteristics, such as experimental design, can be defined. Eligibility criteria should be clearly documented and relevant to the research question. Specifying the eligibility criteria prior to conducting the literature search allows the researcher to perform a more targeted search and reduces the number of irrelevant studies. Eligibility criteria can also be revised later, because the researcher may become aware of unforeseen issues during the literature search stage.

Conducting a Literature Search and Review

The next step is to identify, retrieve, and review published and unpublished studies. The goal is to be exhaustive; however, being too broad can result in an overwhelming number of studies to review.

Online databases, such as PsycINFO and PubMed, compile millions of searchable records, including peer-reviewed journals, books, and dissertations.  In addition, through these electronic databases, researchers can access the full text of many of the records. It is important that researchers carefully choose search terms and databases, because these decisions impact the breadth of the review. Researchers who aren’t familiar with the research topic should consult with an expert.

Additional ways to identify studies include searching conference proceedings, examining reference lists of relevant studies, and directly contacting researchers. After the literature search is completed, researchers must evaluate each study for inclusion using the eligibility criteria. At least a subset of the studies should be reviewed by two individuals (i.e., double coded) to serve as a reliability check. It is vital that researchers keep meticulous records of this process; for publication, a flow diagram is typically required to depict the search and results. Researchers should allow adequate time, because this step can be quite time consuming.

Calculating Effect Size

Next, researchers calculate an effect size for each eligible study. The effect size is the key component of a meta-analysis because it encodes the results in a numeric value that can then be aggregated. Examples of commonly used effect size metrics include Cohen’s d (i.e., group differences) and Pearson’s r (i.e., association between two variables). The effect size metric is based on the statistical form of the results in the literature and the research question. Because studies that include more participants provide more accurate estimates of an effect than those that include fewer participants, it is important to also calculate the precision of the effect size (e.g., standard error).
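As a concrete illustration of this calculation (not taken from the article), a Cohen's d and its approximate standard error can be computed from one study's summary statistics as follows; the numbers are hypothetical.

```r
# A minimal sketch: Cohen's d and its approximate standard error from the
# summary statistics of one hypothetical study (group means, SDs, ns).
m1 <- 22.4; sd1 <- 6.1; n1 <- 48   # e.g., treatment group depression score
m2 <- 25.9; sd2 <- 6.8; n2 <- 52   # e.g., placebo group depression score

sd_pooled <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))
d  <- (m1 - m2) / sd_pooled                              # effect size
se <- sqrt((n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2)))  # its precision

c(d = d, se = se)
```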

Meta-analysis software guides researchers through the calculation process by requesting the necessary information for the specified effect size metric. I have identified some potentially useful resources and programs below. Although meta-analysis software makes effect size calculations simple, it is good practice for researchers to understand what computations are being used.

The effect size and precision of each individual study are aggregated into a summary statistic, which can be done with meta-analysis software. Researchers should confirm that the effect sizes are independent of each other (i.e., no overlap in participants). Additionally, researchers must select either a fixed effects model (i.e., assumes all studies share one true effect size) or a random effects model (i.e., assumes the true effect size varies among studies). The random effects model is typically preferred when the studies have been conducted using different methodologies. Depending on the software, additional specifications or adjustments may be possible.

During analysis, the effect sizes of the included studies are weighted by their precision (e.g., inverse of the sampling error variance) and the mean is calculated. The mean effect size represents the direction and/or magnitude of the effect summarized across all eligible studies. This statistic is typically accompanied by an estimate of its precision (e.g., confidence interval) and p value representing statistical significance. Forest plots are a common way of displaying meta-analysis results.
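For readers who want to see these steps in code, the sketch below pools a handful of hypothetical effect sizes with inverse-variance weights under a random-effects model using the R meta package; the package choice and the values are assumptions for illustration (the article lists other software options above).

```r
# A minimal sketch: pooling hypothetical per-study effect sizes and standard
# errors with inverse-variance weights under a random-effects model.
library(meta)

d  <- c(-0.42, -0.31, -0.55, -0.18, -0.37)   # hypothetical Cohen's d values
se <- c(0.14, 0.20, 0.11, 0.25, 0.16)        # their standard errors

pooled <- metagen(TE = d, seTE = se,
                  studlab = paste("Study", 1:5),
                  sm = "SMD",
                  method.tau = "DL")          # DerSimonian-Laird random effects

summary(pooled)  # mean effect size, 95% CI, p value, Q, tau^2, I^2
forest(pooled)   # forest plot of the individual and pooled effects
```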

Depending on the situation, follow-up analyses may be advised. Researchers can quantify heterogeneity (e.g., Q, τ², I²), which is a measure of the variation among the effect sizes of included studies. Moderator variables, such as the quality of the studies or age of participants, may be included to examine sources of heterogeneity. Because published studies may be biased towards significant effects, it is important to evaluate the impact of publication bias (e.g., funnel plot, Rosenthal’s Fail-safe N). Sensitivity analysis can indicate how the results of the meta-analysis would change if one study were excluded from the analysis.

If properly conducted and clearly documented, meta-analyses often make significant contributions to a specific field of study and therefore stand a good chance of being published in a top-tier journal. The biggest obstacle for most researchers who attempt meta-analysis for the first time is the amount of work and organization required for proper execution, rather than their level of statistical knowledge.

Recommended Resources

Borenstein, M., Hedges, L. V., Higgins, J. P., & Rothstein, H. R. (2009). Introduction to meta-analysis . Hoboken, NJ: Wiley.

Cooper, H., Hedges, L., & Valentine, J. (2009). The handbook of research synthesis and meta-analysis (2nd ed.). New York, NY: Russell Sage Foundation.

Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis . Thousand Oaks, California: Sage Publications.

Rothstein, H. R., Sutton, A. J., & Borenstein, M. (2005). Publication bias in meta-analysis: Prevention, assessment, and adjustments . Hoboken, NJ: Wiley.


It is nice to see the software we developed (MetaXL) being mentioned. However, the reason we developed the software and made publicly available for free is that we disagree with an important statement in the review. This statement is “researchers must select either a fixed effects model (i.e., assumes all studies share one true effect size) or a random effects model (i.e., assumes the true effect size varies among studies)”. We developed MetaXL because we think that the random effects model is seriously flawed and should be abandoned. We implemented in MetaXL two additional models, the Inverse Variance heterogeneity model and the Quality Effects model, both meant to be used in case of heterogeneity. More details are in the User Guide, available from the Epigear website.


About the Author

Laura C. Wilson is an Assistant Professor in the Psychology Department at the University of Mary Washington. She earned a PhD in Clinical Psychology from Virginia Tech and MA in General/Experimental Psychology from The College of William & Mary. Her main area of expertise is post-trauma functioning, particularly in survivors of sexual violence or mass trauma (e.g., terrorism, mass shootings, combat). She also has interest in predictors of violence and aggression, including psychophysiological and personality factors.


Study Design 101: Meta-Analysis


A subset of systematic reviews; a method for systematically combining pertinent qualitative and quantitative study data from several selected studies to develop a single conclusion that has greater statistical power. This conclusion is statistically stronger than the analysis of any single study, due to increased numbers of subjects, greater diversity among subjects, or accumulated effects and results.

Meta-analysis would be used for the following purposes:

  • To establish statistical significance with studies that have conflicting results
  • To develop a more correct estimate of effect magnitude
  • To provide a more complex analysis of harms, safety data, and benefits
  • To examine subgroups with individual numbers that are not statistically significant

If the individual studies are randomized controlled trials (RCTs), combining several selected RCT results provides the highest level of evidence in the evidence hierarchy, followed by systematic reviews, which analyze all available studies on a topic.

Advantages

  • Greater statistical power
  • Confirmatory data analysis
  • Greater ability to extrapolate to the general population affected
  • Considered an evidence-based resource

Disadvantages

  • Difficult and time consuming to identify appropriate studies
  • Not all studies provide adequate data for inclusion and analysis
  • Requires advanced statistical techniques
  • Heterogeneity of study populations

Design pitfalls to look out for

The studies pooled for review should be similar in type (i.e. all randomized controlled trials).

Are the studies being reviewed all the same type of study or are they a mixture of different types?

The analysis should include published and unpublished results to avoid publication bias.

Does the meta-analysis include any appropriate relevant studies that may have had negative outcomes?

Fictitious Example

Do individuals who wear sunscreen have fewer cases of melanoma than those who do not wear sunscreen? A MEDLINE search was conducted using the terms melanoma, sunscreening agents, and zinc oxide, resulting in 8 randomized controlled studies, each with between 100 and 120 subjects. All of the studies showed a positive effect between wearing sunscreen and reducing the likelihood of melanoma. The subjects from all eight studies (total: 860 subjects) were pooled and statistically analyzed to determine the effect of the relationship between wearing sunscreen and melanoma. This meta-analysis showed a 50% reduction in melanoma diagnosis among sunscreen-wearers.

Real-life Examples

Goyal, A., Elminawy, M., Kerezoudis, P., Lu, V., Yolcu, Y., Alvi, M., & Bydon, M. (2019). Impact of obesity on outcomes following lumbar spine surgery: A systematic review and meta-analysis. Clinical Neurology and Neurosurgery, 177 , 27-36. https://doi.org/10.1016/j.clineuro.2018.12.012

This meta-analysis was interested in determining whether obesity affects the outcome of spinal surgery. Some previous studies have shown higher perioperative morbidity in patients with obesity while other studies have not shown this effect. This study looked at surgical outcomes including "blood loss, operative time, length of stay, complication and reoperation rates and functional outcomes" between patients with and without obesity. A meta-analysis of 32 studies (23,415 patients) was conducted. There were no significant differences for patients undergoing minimally invasive surgery, but patients with obesity who had open surgery had experienced higher blood loss and longer operative times (not clinically meaningful) as well as higher complication and reoperation rates. Further research is needed to explore this issue in patients with morbid obesity.

Nakamura, A., van Der Waerden, J., Melchior, M., Bolze, C., El-Khoury, F., & Pryor, L. (2019). Physical activity during pregnancy and postpartum depression: Systematic review and meta-analysis. Journal of Affective Disorders, 246 , 29-41. https://doi.org/10.1016/j.jad.2018.12.009

This meta-analysis explored whether physical activity during pregnancy prevents postpartum depression. Seventeen studies were included (93,676 women) and analysis showed a "significant reduction in postpartum depression scores in women who were physically active during their pregnancies when compared with inactive women." Possible limitations or moderators of this effect include intensity and frequency of physical activity, type of physical activity, and timepoint in pregnancy (e.g. trimester).

Related Terms

Systematic Review

A document often written by a panel that provides a comprehensive review of all relevant studies on a particular clinical or health-related topic/question.

Publication Bias

A phenomenon in which studies with positive results have a better chance of being published, are published earlier, and are published in journals with higher impact factors. Therefore, conclusions based exclusively on published studies can be misleading.

Now test yourself!

1. A Meta-Analysis pools together the sample populations from different studies, such as Randomized Controlled Trials, into one statistical analysis and treats them as one large sample population with one conclusion.

a) True b) False

2. One potential design pitfall of Meta-Analyses that is important to pay attention to is:

a) Whether it is evidence-based. b) If the authors combined studies with conflicting results. c) If the authors appropriately combined studies so they did not compare apples and oranges. d) If the authors used only quantitative data.



Chapter 10: Analysing data and undertaking meta-analyses

Jonathan J Deeks, Julian PT Higgins, Douglas G Altman; on behalf of the Cochrane Statistical Methods Group

Key Points:

  • Meta-analysis is the statistical combination of results from two or more separate studies.
  • Potential advantages of meta-analyses include an improvement in precision, the ability to answer questions not posed by individual studies, and the opportunity to settle controversies arising from conflicting claims. However, they also have the potential to mislead seriously, particularly if specific study designs, within-study biases, variation across studies, and reporting biases are not carefully considered.
  • It is important to be familiar with the type of data (e.g. dichotomous, continuous) that result from measurement of an outcome in an individual study, and to choose suitable effect measures for comparing intervention groups.
  • Most meta-analysis methods are variations on a weighted average of the effect estimates from the different studies.
  • Studies with no events contribute no information about the risk ratio or odds ratio. For rare events, the Peto method has been observed to be less biased and more powerful than other methods.
  • Variation across studies (heterogeneity) must be considered, although most Cochrane Reviews do not have enough studies to allow for the reliable investigation of its causes. Random-effects meta-analyses allow for heterogeneity by assuming that underlying effects follow a normal distribution, but they must be interpreted carefully. Prediction intervals from random-effects meta-analyses are a useful device for presenting the extent of between-study variation.
  • Many judgements are required in the process of preparing a meta-analysis. Sensitivity analyses should be used to examine whether overall findings are robust to potentially influential decisions.

Cite this chapter as: Deeks JJ, Higgins JPT, Altman DG (editors). Chapter 10: Analysing data and undertaking meta-analyses. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.4 (updated August  2023). Cochrane, 2023. Available from www.training.cochrane.org/handbook .

10.1 Do not start here!

It can be tempting to jump prematurely into a statistical analysis when undertaking a systematic review. The production of a diamond at the bottom of a plot is an exciting moment for many authors, but results of meta-analyses can be very misleading if suitable attention has not been given to formulating the review question; specifying eligibility criteria; identifying and selecting studies; collecting appropriate data; considering risk of bias; planning intervention comparisons; and deciding what data would be meaningful to analyse. Review authors should consult the chapters that precede this one before a meta-analysis is undertaken.

10.2 Introduction to meta-analysis

An important step in a systematic review is the thoughtful consideration of whether it is appropriate to combine the numerical results of all, or perhaps some, of the studies. Such a meta-analysis yields an overall statistic (together with its confidence interval) that summarizes the effectiveness of an experimental intervention compared with a comparator intervention. Potential advantages of meta-analyses include the following:

  • T o improve precision . Many studies are too small to provide convincing evidence about intervention effects in isolation. Estimation is usually improved when it is based on more information.
  • To answer questions not posed by the individual studies . Primary studies often involve a specific type of participant and explicitly defined interventions. A selection of studies in which these characteristics differ can allow investigation of the consistency of effect across a wider range of populations and interventions. It may also, if relevant, allow reasons for differences in effect estimates to be investigated.
  • To settle controversies arising from apparently conflicting studies or to generate new hypotheses . Statistical synthesis of findings allows the degree of conflict to be formally assessed, and reasons for different results to be explored and quantified.

Of course, the use of statistical synthesis methods does not guarantee that the results of a review are valid, any more than it does for a primary study. Moreover, like any tool, statistical methods can be misused.

This chapter describes the principles and methods used to carry out a meta-analysis for a comparison of two interventions for the main types of data encountered. The use of network meta-analysis to compare more than two interventions is addressed in Chapter 11 . Formulae for most of the methods described are provided in the RevMan Web Knowledge Base under Statistical Algorithms and calculations used in Review Manager (documentation.cochrane.org/revman-kb/statistical-methods-210600101.html), and a longer discussion of many of the issues is available ( Deeks et al 2001 ).

10.2.1 Principles of meta-analysis

The commonly used methods for meta-analysis follow the following basic principles:

  • Meta-analysis is typically a two-stage process. In the first stage, a summary statistic is calculated for each study, to describe the observed intervention effect in the same way for every study. For example, the summary statistic may be a risk ratio if the data are dichotomous, or a difference between means if the data are continuous (see Chapter 6 ).

  • In the second stage, a summary (pooled) intervention effect estimate is calculated as a weighted average of the intervention effects estimated in the individual studies, where the weights reflect the amount of information that each study contributes.

  • The combination of intervention effect estimates across studies may optionally incorporate an assumption that the studies are not all estimating the same intervention effect, but estimate intervention effects that follow a distribution across studies. This is the basis of a random-effects meta-analysis (see Section 10.10.4 ). Alternatively, if it is assumed that each study is estimating exactly the same quantity, then a fixed-effect meta-analysis is performed.
  • The standard error of the summary intervention effect can be used to derive a confidence interval, which communicates the precision (or uncertainty) of the summary estimate; and to derive a P value, which communicates the strength of the evidence against the null hypothesis of no intervention effect.
  • As well as yielding a summary quantification of the intervention effect, all methods of meta-analysis can incorporate an assessment of whether the variation among the results of the separate studies is compatible with random variation, or whether it is large enough to indicate inconsistency of intervention effects across studies (see Section 10.10 ).
  • The problem of missing data is one of the numerous practical considerations that must be thought through when undertaking a meta-analysis. In particular, review authors should consider the implications of missing outcome data from individual participants (due to losses to follow-up or exclusions from analysis) (see Section 10.12 ).

Meta-analyses are usually illustrated using a forest plot . An example appears in Figure 10.2.a . A forest plot displays effect estimates and confidence intervals for both individual studies and meta-analyses (Lewis and Clarke 2001). Each study is represented by a block at the point estimate of intervention effect with a horizontal line extending either side of the block. The area of the block indicates the weight assigned to that study in the meta-analysis while the horizontal line depicts the confidence interval (usually with a 95% level of confidence). The area of the block and the confidence interval convey similar information, but both make different contributions to the graphic. The confidence interval depicts the range of intervention effects compatible with the study’s result. The size of the block draws the eye towards the studies with larger weight (usually those with narrower confidence intervals), which dominate the calculation of the summary result, presented as a diamond at the bottom.

Figure 10.2.a Example of a forest plot from a review of interventions to promote ownership of smoke alarms (DiGuiseppi and Higgins 2001). Reproduced with permission of John Wiley & Sons


10.3 A generic inverse-variance approach to meta-analysis

A very common and simple version of the meta-analysis procedure is commonly referred to as the inverse-variance method . This approach is implemented in its most basic form in RevMan, and is used behind the scenes in many meta-analyses of both dichotomous and continuous data.

The inverse-variance method is so named because the weight given to each study is chosen to be the inverse of the variance of the effect estimate (i.e. 1 over the square of its standard error). Thus, larger studies, which have smaller standard errors, are given more weight than smaller studies, which have larger standard errors. This choice of weights minimizes the imprecision (uncertainty) of the pooled effect estimate.

10.3.1 Fixed-effect method for meta-analysis

A fixed-effect meta-analysis using the inverse-variance method calculates a weighted average as:

$$\hat{\theta} = \frac{\sum_i Y_i / SE_i^2}{\sum_i 1 / SE_i^2}$$

where Y i is the intervention effect estimated in the i th study, SE i is the standard error of that estimate, and the summation is across all studies. The basic data required for the analysis are therefore an estimate of the intervention effect and its standard error from each study. A fixed-effect meta-analysis is valid under an assumption that all effect estimates are estimating the same underlying intervention effect, which is referred to variously as a ‘fixed-effect’ assumption, a ‘common-effect’ assumption or an ‘equal-effects’ assumption. However, the result of the meta-analysis can be interpreted without making such an assumption (Rice et al 2018).
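To make the calculation concrete, here is a minimal sketch of a fixed-effect inverse-variance meta-analysis in Python. The study estimates and standard errors are invented purely for illustration and do not come from any real review; the pooled standard error is the square root of the reciprocal of the total weight, and a conventional normal approximation gives the 95% confidence interval.

```python
# Minimal sketch of a fixed-effect inverse-variance meta-analysis.
# Y and SE below are invented illustrative numbers (e.g. log odds ratios).
import math

Y  = [0.10, 0.55, -0.05, 0.62]   # intervention effect estimate per study
SE = [0.12, 0.15, 0.20, 0.18]    # standard error of each estimate

w = [1 / se**2 for se in SE]                       # inverse-variance weights
pooled = sum(wi * yi for wi, yi in zip(w, Y)) / sum(w)
pooled_se = math.sqrt(1 / sum(w))                  # SE of the pooled estimate

ci_low  = pooled - 1.96 * pooled_se
ci_high = pooled + 1.96 * pooled_se
print(f"Pooled effect {pooled:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```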

10.3.2 Random-effects methods for meta-analysis

A variation on the inverse-variance method is to incorporate an assumption that the different studies are estimating different, yet related, intervention effects (Higgins et al 2009). This produces a random-effects meta-analysis, and the simplest version is known as the DerSimonian and Laird method (DerSimonian and Laird 1986). Random-effects meta-analysis is discussed in detail in Section 10.10.4 .

10.3.3 Performing inverse-variance meta-analyses

Most meta-analysis programs perform inverse-variance meta-analyses. Usually the user provides summary data from each intervention arm of each study, such as a 2×2 table when the outcome is dichotomous (see Chapter 6, Section 6.4 ), or means, standard deviations and sample sizes for each group when the outcome is continuous (see Chapter 6, Section 6.5 ). This avoids the need for the author to calculate effect estimates, and allows the use of methods targeted specifically at different types of data (see Sections 10.4 and 10.5 ).

When the data are conveniently available as summary statistics from each intervention group, the inverse-variance method can be implemented directly. For example, estimates and their standard errors may be entered directly into RevMan under the ‘Generic inverse variance’ outcome type. For ratio measures of intervention effect, the data must be entered into RevMan as natural logarithms (for example, as a log odds ratio and the standard error of the log odds ratio). However, it is straightforward to instruct the software to display results on the original (e.g. odds ratio) scale. It is possible to supplement or replace this with a column providing the sample sizes in the two groups. Note that the ability to enter estimates and standard errors creates a high degree of flexibility in meta-analysis. It facilitates the analysis of properly analysed crossover trials, cluster-randomized trials and non-randomized trials (see Chapter 23 ), as well as outcome data that are ordinal, time-to-event or rates (see Chapter 6 ).
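As a rough illustration of preparing such data for a generic inverse-variance analysis, the sketch below converts a published odds ratio and its 95% confidence interval into a log odds ratio and standard error. The numbers are hypothetical, and the calculation assumes a symmetric 95% interval on the log scale (so the log limits span 2 × 1.96 standard errors), as described in Chapter 6, Section 6.3.

```python
# Sketch: deriving a log odds ratio and its SE from a reported OR and 95% CI.
# The OR and CI limits are invented for illustration.
import math

or_point, ci_low, ci_high = 0.75, 0.58, 0.97

log_or = math.log(or_point)
se_log_or = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)

print(f"log OR = {log_or:.4f}, SE = {se_log_or:.4f}")
```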

10.4 Meta-analysis of dichotomous outcomes

There are four widely used methods of meta-analysis for dichotomous outcomes, three fixed-effect methods (Mantel-Haenszel, Peto and inverse variance) and one random-effects method (DerSimonian and Laird inverse variance). All of these methods are available as analysis options in RevMan. The Peto method can only combine odds ratios, whilst the other three methods can combine odds ratios, risk ratios or risk differences. Formulae for all of the meta-analysis methods are available elsewhere (Deeks et al 2001).

Note that having no events in one group (sometimes referred to as ‘zero cells’) causes problems with computation of estimates and standard errors with some methods: see Section 10.4.4 .

10.4.1 Mantel-Haenszel methods

When data are sparse, either in terms of event risks being low or study size being small, the estimates of the standard errors of the effect estimates that are used in the inverse-variance methods may be poor. Mantel-Haenszel methods are fixed-effect meta-analysis methods using a different weighting scheme that depends on which effect measure (e.g. risk ratio, odds ratio, risk difference) is being used (Mantel and Haenszel 1959, Greenland and Robins 1985). They have been shown to have better statistical properties when there are few events. As this is a common situation in Cochrane Reviews, the Mantel-Haenszel method is generally preferable to the inverse variance method in fixed-effect meta-analyses. In other situations the two methods give similar estimates.

10.4.2 Peto odds ratio method

Peto’s method can only be used to combine odds ratios (Yusuf et al 1985). It uses an inverse-variance approach, but uses an approximate method of estimating the log odds ratio, and uses different weights. An alternative way of viewing the Peto method is as a sum of ‘O – E’ statistics. Here, O is the observed number of events and E is an expected number of events in the experimental intervention group of each study under the null hypothesis of no intervention effect.
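To show what the ‘O – E’ and ‘V’ quantities look like for a single study, here is a minimal sketch using one hypothetical 2×2 table (all counts invented). It uses the standard hypergeometric variance and the Peto approximation to the log odds ratio, (O – E)/V, with standard error 1/√V; this is a sketch of the usual construction, not a substitute for meta-analysis software.

```python
# Sketch of per-study 'O – E' and 'V' statistics behind Peto's method,
# for one hypothetical 2x2 table (counts are invented).
import math

a, n1 = 4, 120      # events and group size, experimental arm
c, n2 = 9, 118      # events and group size, comparator arm

N  = n1 + n2
m1 = a + c          # total events
m2 = N - m1         # total non-events

O = a                                          # observed events, experimental arm
E = n1 * m1 / N                                # expected events under the null
V = (n1 * n2 * m1 * m2) / (N**2 * (N - 1))     # hypergeometric variance

log_or = (O - E) / V                           # Peto approximation to the log OR
se_log_or = 1 / math.sqrt(V)
print(f"Peto OR = {math.exp(log_or):.2f}, SE(log OR) = {se_log_or:.3f}")
```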

The approximation used in the computation of the log odds ratio works well when intervention effects are small (odds ratios are close to 1), events are not particularly common and the studies have similar numbers in experimental and comparator groups. In other situations it has been shown to give biased answers. As these criteria are not always fulfilled, Peto’s method is not recommended as a default approach for meta-analysis.

Corrections for zero cell counts are not necessary when using Peto’s method. Perhaps for this reason, this method performs well when events are very rare (Bradburn et al 2007); see Section 10.4.4.1 . Also, Peto’s method can be used to combine studies with dichotomous outcome data with studies using time-to-event analyses where log-rank tests have been used (see Section 10.9 ).

10.4.3 Which effect measure for dichotomous outcomes?

Effect measures for dichotomous data are described in Chapter 6, Section 6.4.1 . The effect of an intervention can be expressed as either a relative or an absolute effect. The risk ratio (relative risk) and odds ratio are relative measures, while the risk difference and number needed to treat for an additional beneficial outcome are absolute measures. A further complication is that there are, in fact, two risk ratios. We can calculate the risk ratio of an event occurring or the risk ratio of no event occurring. These give different summary results in a meta-analysis, sometimes dramatically so.

The selection of a summary statistic for use in meta-analysis depends on balancing three criteria (Deeks 2002). First, we desire a summary statistic that gives values that are similar for all the studies in the meta-analysis and subdivisions of the population to which the interventions will be applied. The more consistent the summary statistic, the greater is the justification for expressing the intervention effect as a single summary number. Second, the summary statistic must have the mathematical properties required to perform a valid meta-analysis. Third, the summary statistic would ideally be easily understood and applied by those using the review. The summary intervention effect should be presented in a way that helps readers to interpret and apply the results appropriately. Among effect measures for dichotomous data, no single measure is uniformly best, so the choice inevitably involves a compromise.

Consistency: Empirical evidence suggests that relative effect measures are, on average, more consistent than absolute measures (Engels et al 2000, Deeks 2002, Rücker et al 2009). For this reason, it is wise to avoid performing meta-analyses of risk differences, unless there is a clear reason to suspect that risk differences will be consistent in a particular clinical situation. On average there is little difference between the odds ratio and risk ratio in terms of consistency (Deeks 2002). When the study aims to reduce the incidence of an adverse event, there is empirical evidence that risk ratios of the adverse event are more consistent than risk ratios of the non-event (Deeks 2002). Selecting an effect measure based on what is the most consistent in a particular situation is not a generally recommended strategy, since it may lead to a selection that spuriously maximizes the precision of a meta-analysis estimate.

Mathematical properties: The most important mathematical criterion is the availability of a reliable variance estimate. The number needed to treat for an additional beneficial outcome does not have a simple variance estimator and cannot easily be used directly in meta-analysis, although it can be computed from the meta-analysis result afterwards (see Chapter 15, Section 15.4.2 ). There is no consensus regarding the importance of two other often-cited mathematical properties: the fact that the behaviour of the odds ratio and the risk difference do not rely on which of the two outcome states is coded as the event, and the odds ratio being the only statistic which is unbounded (see Chapter 6, Section 6.4.1 ).

Ease of interpretation: The odds ratio is the hardest summary statistic to understand and to apply in practice, and many practising clinicians report difficulties in using it. There are many published examples where authors have misinterpreted odds ratios from meta-analyses as risk ratios. Although odds ratios can be re-expressed for interpretation (as discussed here), there must be some concern that routine presentation of the results of systematic reviews as odds ratios will lead to frequent over-estimation of the benefits and harms of interventions when the results are applied in clinical practice. Absolute measures of effect are thought to be more easily interpreted by clinicians than relative effects (Sinclair and Bracken 1994), and allow trade-offs to be made between likely benefits and likely harms of interventions. However, they are less likely to be generalizable.

It is generally recommended that meta-analyses are undertaken using risk ratios (taking care to make a sensible choice over which category of outcome is classified as the event) or odds ratios. This is because it seems important to avoid using summary statistics for which there is empirical evidence that they are unlikely to give consistent estimates of intervention effects (the risk difference), and it is impossible to use statistics for which meta-analysis cannot be performed (the number needed to treat for an additional beneficial outcome). It may be wise to plan to undertake a sensitivity analysis to investigate whether choice of summary statistic (and selection of the event category) is critical to the conclusions of the meta-analysis (see Section 10.14 ).

It is often sensible to use one statistic for meta-analysis and to re-express the results using a second, more easily interpretable statistic. For example, often meta-analysis may be best performed using relative effect measures (risk ratios or odds ratios) and the results re-expressed using absolute effect measures (risk differences or numbers needed to treat for an additional beneficial outcome – see Chapter 15, Section 15.4 . This is one of the key motivations for ‘Summary of findings’ tables in Cochrane Reviews: see Chapter 14 ). If odds ratios are used for meta-analysis they can also be re-expressed as risk ratios (see Chapter 15, Section 15.4 ). In all cases the same formulae can be used to convert upper and lower confidence limits. However, all of these transformations require specification of a value of baseline risk that indicates the likely risk of the outcome in the ‘control’ population to which the experimental intervention will be applied. Where the chosen value for this assumed comparator group risk is close to the typical observed comparator group risks across the studies, similar estimates of absolute effect will be obtained regardless of whether odds ratios or risk ratios are used for meta-analysis. Where the assumed comparator risk differs from the typical observed comparator group risk, the predictions of absolute benefit will differ according to which summary statistic was used for meta-analysis.
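As a small illustration of this re-expression, the sketch below converts a summary odds ratio into an absolute risk difference for an assumed comparator group risk. The odds ratio and baseline risk are invented; repeating the same calculation with the confidence limits of the odds ratio gives an approximate interval for the risk difference.

```python
# Sketch: re-expressing a summary odds ratio as an absolute risk difference
# for an assumed comparator group risk (both numbers invented).
odds_ratio = 0.70
assumed_comparator_risk = 0.20          # assumed risk in the 'control' population

comparator_odds = assumed_comparator_risk / (1 - assumed_comparator_risk)
experimental_odds = odds_ratio * comparator_odds
experimental_risk = experimental_odds / (1 + experimental_odds)

risk_difference = experimental_risk - assumed_comparator_risk
print(f"Absolute risk difference: {risk_difference:.3f}")
```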

10.4.4 Meta-analysis of rare events

For rare outcomes, meta-analysis may be the only way to obtain reliable evidence of the effects of healthcare interventions. Individual studies are usually under-powered to detect differences in rare outcomes, but a meta-analysis of many studies may have adequate power to investigate whether interventions do have an impact on the incidence of the rare event. However, many methods of meta-analysis are based on large sample approximations, and are unsuitable when events are rare. Thus authors must take care when selecting a method of meta-analysis (Efthimiou 2018).

There is no single risk at which events are classified as ‘rare’. Certainly risks of 1 in 1000 constitute rare events, and many would classify risks of 1 in 100 the same way. However, the performance of methods when risks are as high as 1 in 10 may also be affected by the issues discussed in this section. What is typical is that a high proportion of the studies in the meta-analysis observe no events in one or more study arms.

10.4.4.1 Studies with no events in one or more arms

Computational problems can occur when no events are observed in one or both groups in an individual study. Inverse variance meta-analytical methods involve computing an intervention effect estimate and its standard error for each study. For studies where no events were observed in one or both arms, these computations often involve dividing by a zero count, which yields a computational error. Most meta-analytical software routines (including those in RevMan) automatically check for problematic zero counts, and add a fixed value (typically 0.5) to all cells of a 2×2 table where the problems occur. The Mantel-Haenszel methods require zero-cell corrections only if the same cell is zero in all the included studies, and hence need to use the correction less often. However, in many software applications the same correction rules are applied for Mantel-Haenszel methods as for the inverse-variance methods. Odds ratio and risk ratio methods require zero cell corrections more often than difference methods, except for the Peto odds ratio method, which encounters computation problems only in the extreme situation of no events occurring in all arms of all studies.
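The sketch below shows the fixed 0.5 correction in its simplest form for one hypothetical study with no events in the experimental arm (counts invented), before computing the study's odds ratio and the usual standard error of the log odds ratio. It is only a toy version of what meta-analysis software does automatically.

```python
# Sketch of the fixed 0.5 zero-cell correction before computing a study's
# odds ratio and SE (counts are invented; the experimental arm has no events).
import math

a, b = 0, 50        # experimental arm: events, non-events
c, d = 5, 45        # comparator arm: events, non-events

if 0 in (a, b, c, d):
    a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5   # add 0.5 to every cell

log_or = math.log((a * d) / (b * c))
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
print(f"OR = {math.exp(log_or):.2f}, SE(log OR) = {se_log_or:.2f}")
```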

Whilst the fixed correction meets the objective of avoiding computational errors, it usually has the undesirable effect of biasing study estimates towards no difference and over-estimating variances of study estimates (consequently down-weighting inappropriately their contribution to the meta-analysis). Where the sizes of the study arms are unequal (which occurs more commonly in non-randomized studies than randomized trials), they will introduce a directional bias in the treatment effect. Alternative non-fixed zero-cell corrections have been explored by Sweeting and colleagues, including a correction proportional to the reciprocal of the size of the contrasting study arm, which they found preferable to the fixed 0.5 correction when arm sizes were not balanced (Sweeting et al 2004).

10.4.4.2 Studies with no events in either arm

The standard practice in meta-analysis of odds ratios and risk ratios is to exclude studies from the meta-analysis where there are no events in both arms. This is because such studies do not provide any indication of either the direction or magnitude of the relative treatment effect. Whilst it may be clear that events are very rare on both the experimental intervention and the comparator intervention, no information is provided as to which group is likely to have the higher risk, or on whether the risks are of the same or different orders of magnitude (when risks are very low, they are compatible with very large or very small ratios). Whilst one might be tempted to infer that the risk would be lowest in the group with the larger sample size (as the upper limit of the confidence interval would be lower), this is not justified as the sample size allocation was determined by the study investigators and is not a measure of the incidence of the event.

Risk difference methods superficially appear to have an advantage over odds ratio methods in that the risk difference is defined (as zero) when no events occur in either arm. Such studies are therefore included in the estimation process. Bradburn and colleagues undertook simulation studies which revealed that all risk difference methods yield confidence intervals that are too wide when events are rare, and have associated poor statistical power, which make them unsuitable for meta-analysis of rare events (Bradburn et al 2007). This is especially relevant when outcomes that focus on treatment safety are being studied, as the ability to identify correctly (or attempt to refute) serious adverse events is a key issue in drug development.

It is likely that outcomes for which no events occur in either arm may not be mentioned in reports of many randomized trials, precluding their inclusion in a meta-analysis. It is unclear, though, when working with published results, whether failure to mention a particular adverse event means there were no such events, or simply that such events were not included as a measured endpoint. Whilst the results of risk difference meta-analyses will be affected by non-reporting of outcomes with no events, odds and risk ratio based methods naturally exclude these data whether or not they are published, and are therefore unaffected.

10.4.4.3 Validity of methods of meta-analysis for rare events

Simulation studies have revealed that many meta-analytical methods can give misleading results for rare events, which is unsurprising given their reliance on asymptotic statistical theory. Their performance has been judged suboptimal either through results being biased, confidence intervals being inappropriately wide, or statistical power being too low to detect substantial differences.

In the following we consider the choice of statistical method for meta-analyses of odds ratios. Appropriate choices appear to depend on the comparator group risk, the likely size of the treatment effect and consideration of balance in the numbers of experimental and comparator participants in the constituent studies. We are not aware of research that has evaluated risk ratio measures directly, but their performance is likely to be very similar to corresponding odds ratio measurements. When events are rare, estimates of odds and risks are near identical, and results of both can be interpreted as ratios of probabilities.

Bradburn and colleagues found that many of the most commonly used meta-analytical methods were biased when events were rare (Bradburn et al 2007). The bias was greatest in inverse variance and DerSimonian and Laird odds ratio and risk difference methods, and the Mantel-Haenszel odds ratio method using a 0.5 zero-cell correction. As already noted, risk difference meta-analytical methods tended to show conservative confidence interval coverage and low statistical power when risks of events were low.

At event rates below 1% the Peto one-step odds ratio method was found to be the least biased and most powerful method, and provided the best confidence interval coverage, provided there was no substantial imbalance between treatment and comparator group sizes within studies, and treatment effects were not exceptionally large. This finding was consistently observed across three different meta-analytical scenarios, and was also observed by Sweeting and colleagues (Sweeting et al 2004).

This finding was noted despite the method producing only an approximation to the odds ratio. For very large effects (e.g. risk ratio=0.2) when the approximation is known to be poor, treatment effects were under-estimated, but the Peto method still had the best performance of all the methods considered for event risks of 1 in 1000, and the bias was never more than 6% of the comparator group risk.

In other circumstances (i.e. event risks above 1%, very large effects at event risks around 1%, and meta-analyses where many studies were substantially imbalanced) the best performing methods were the Mantel-Haenszel odds ratio without zero-cell corrections, logistic regression and an exact method. None of these methods is available in RevMan.

Methods that should be avoided with rare events are the inverse-variance methods (including the DerSimonian and Laird random-effects method) (Efthimiou 2018). These directly incorporate the study’s variance in the estimation of its contribution to the meta-analysis, but these are usually based on a large-sample variance approximation, which was not intended for use with rare events. We would suggest that incorporation of heterogeneity into an estimate of a treatment effect should be a secondary consideration when attempting to produce estimates of effects from sparse data – the primary concern is to discern whether there is any signal of an effect in the data.

10.5 Meta-analysis of continuous outcomes

An important assumption underlying standard methods for meta-analysis of continuous data is that the outcomes have a normal distribution in each intervention arm in each study. This assumption may not always be met, although it is unimportant in very large studies. It is useful to consider the possibility of skewed data (see Section 10.5.3 ).

10.5.1 Which effect measure for continuous outcomes?

The two summary statistics commonly used for meta-analysis of continuous data are the mean difference (MD) and the standardized mean difference (SMD). Other options are available, such as the ratio of means (see Chapter 6, Section 6.5.1 ). Selection of summary statistics for continuous data is principally determined by whether studies all report the outcome using the same scale (when the mean difference can be used) or using different scales (when the standardized mean difference is usually used). The ratio of means can be used in either situation, but is appropriate only when outcome measurements are strictly greater than zero. Further considerations in deciding on an effect measure that will facilitate interpretation of the findings appears in Chapter 15, Section 15.5 .

The different roles played in MD and SMD approaches by the standard deviations (SDs) of outcomes observed in the two groups should be understood.

For the mean difference approach, the SDs are used together with the sample sizes to compute the weight given to each study. Studies with small SDs are given relatively higher weight whilst studies with larger SDs are given relatively smaller weights. This is appropriate if variation in SDs between studies reflects differences in the reliability of outcome measurements, but is probably not appropriate if the differences in SD reflect real differences in the variability of outcomes in the study populations.

For the standardized mean difference approach, the SDs are used to standardize the mean differences to a single scale, as well as in the computation of study weights. Thus, studies with small SDs lead to relatively higher estimates of SMD, whilst studies with larger SDs lead to relatively smaller estimates of SMD. For this to be appropriate, it must be assumed that between-study variation in SDs reflects only differences in measurement scales and not differences in the reliability of outcome measures or variability among study populations, as discussed in Chapter 6, Section 6.5.1.2 .

These assumptions of the methods should be borne in mind when unexpected variation of SDs is observed across studies.
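As a minimal sketch of how the SDs enter the standardized mean difference for a single study, the example below computes the simple Cohen's d form from invented group summaries (review software typically also applies a small-sample adjustment, not shown here). It illustrates the point made above: a larger pooled SD yields a smaller SMD for the same difference in means.

```python
# Sketch: standardized mean difference (Cohen's d form) for one study,
# using invented group summaries.
import math

m1, sd1, n1 = 12.4, 3.1, 40    # experimental arm mean, SD, sample size
m2, sd2, n2 = 10.9, 3.5, 38    # comparator arm

pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
smd = (m1 - m2) / pooled_sd    # larger pooled SD -> smaller SMD, as described above
print(f"SMD = {smd:.2f}")
```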

10.5.2 Meta-analysis of change scores

In some circumstances an analysis based on changes from baseline will be more efficient and powerful than comparison of post-intervention values, as it removes a component of between-person variability from the analysis. However, calculation of a change score requires measurement of the outcome twice and in practice may be less efficient for outcomes that are unstable or difficult to measure precisely, where the measurement error may be larger than true between-person baseline variability. Change-from-baseline outcomes may also be preferred if they have a less skewed distribution than post-intervention measurement outcomes. Although sometimes used as a device to ‘correct’ for unlucky randomization, this practice is not recommended.

The preferred statistical approach to accounting for baseline measurements of the outcome variable is to include the baseline outcome measurements as a covariate in a regression model or analysis of covariance (ANCOVA). These analyses produce an ‘adjusted’ estimate of the intervention effect together with its standard error. These analyses are the least frequently encountered, but as they give the most precise and least biased estimates of intervention effects they should be included in the analysis when they are available. However, they can only be included in a meta-analysis using the generic inverse-variance method, since means and SDs are not available for each intervention group separately.

In practice an author is likely to discover that the studies included in a review include a mixture of change-from-baseline and post-intervention value scores. However, mixing of outcomes is not a problem when it comes to meta-analysis of MDs. There is no statistical reason why studies with change-from-baseline outcomes should not be combined in a meta-analysis with studies with post-intervention measurement outcomes when using the (unstandardized) MD method. In a randomized study, MD based on changes from baseline can usually be assumed to be addressing exactly the same underlying intervention effects as analyses based on post-intervention measurements. That is to say, the difference in mean post-intervention values will on average be the same as the difference in mean change scores. If the use of change scores does increase precision, appropriately, the studies presenting change scores will be given higher weights in the analysis than they would have received if post-intervention values had been used, as they will have smaller SDs.

When combining the data on the MD scale, authors must be careful to use the appropriate means and SDs (either of post-intervention measurements or of changes from baseline) for each study. Since the mean values and SDs for the two types of outcome may differ substantially, it may be advisable to place them in separate subgroups to avoid confusion for the reader, but the results of the subgroups can legitimately be pooled together.

In contrast, post-intervention value and change scores should not in principle be combined using standard meta-analysis approaches when the effect measure is an SMD. This is because the SDs used in the standardization reflect different things. The SD when standardizing post-intervention values reflects between-person variability at a single point in time. The SD when standardizing change scores reflects variation in between-person changes over time, so will depend on both within-person and between-person variability; within-person variability in turn is likely to depend on the length of time between measurements. Nevertheless, an empirical study of 21 meta-analyses in osteoarthritis did not find a difference between combined SMDs based on post-intervention values and combined SMDs based on change scores (da Costa et al 2013). One option is to standardize SMDs using post-intervention SDs rather than change score SDs. This would lead to valid synthesis of the two approaches, but we are not aware that an appropriate standard error for this has been derived.

A common practical problem associated with including change-from-baseline measures is that the SD of changes is not reported. Imputation of SDs is discussed in Chapter 6, Section 6.5.2.8 .

10.5.3 Meta-analysis of skewed data

Analyses based on means are appropriate for data that are at least approximately normally distributed, and for data from very large trials. If the true distribution of outcomes is asymmetrical, then the data are said to be skewed. Review authors should consider the possibility and implications of skewed data when analysing continuous outcomes (see MECIR Box 10.5.a ). Skew can sometimes be diagnosed from the means and SDs of the outcomes. A rough check is available, but it is only valid if a lowest or highest possible value for an outcome is known to exist. Thus, the check may be used for outcomes such as weight, volume and blood concentrations, which have lowest possible values of 0, or for scale outcomes with minimum or maximum scores, but it may not be appropriate for change-from-baseline measures. The check involves calculating the observed mean minus the lowest possible value (or the highest possible value minus the observed mean), and dividing this by the SD. A ratio less than 2 suggests skew (Altman and Bland 1996). If the ratio is less than 1, there is strong evidence of a skewed distribution.
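A minimal sketch of this rough check, for an outcome whose lowest possible value is zero (the mean and SD are invented):

```python
# Sketch of the rough skewness check described above, for an outcome with a
# lowest possible value of zero (summary numbers are invented).
mean, sd, lowest_possible = 4.5, 3.0, 0.0

ratio = (mean - lowest_possible) / sd
if ratio < 1:
    print("Strong evidence of a skewed distribution")
elif ratio < 2:
    print("Skew is suggested")
else:
    print("No indication of skew from this check")
```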

Transformation of the original outcome data may reduce skew substantially. Reports of trials may present results on a transformed scale, usually a log scale. Collection of appropriate data summaries from the trialists, or acquisition of individual patient data, is currently the approach of choice. Appropriate data summaries and analysis strategies for the individual patient data will depend on the situation. Consultation with a knowledgeable statistician is advised.

Where data have been analysed on a log scale, results are commonly presented as geometric means and ratios of geometric means. A meta-analysis may be then performed on the scale of the log-transformed data; an example of the calculation of the required means and SD is given in Chapter 6, Section 6.5.2.4 . This approach depends on being able to obtain transformed data for all studies; methods for transforming from one scale to the other are available (Higgins et al 2008b). Log-transformed and untransformed data should not be mixed in a meta-analysis.

MECIR Box 10.5.a Relevant expectations for conduct of intervention reviews

Addressing skewed data

Skewed data are sometimes not summarized usefully by means and standard deviations. While statistical methods are approximately valid for large sample sizes, skewed outcome data can lead to misleading results when studies are small.

10.6 Combining dichotomous and continuous outcomes

Occasionally authors encounter a situation where data for the same outcome are presented in some studies as dichotomous data and in other studies as continuous data. For example, scores on depression scales can be reported as means, or as the percentage of patients who were depressed at some point after an intervention (i.e. with a score above a specified cut-point). This type of information is often easier to understand, and more helpful, when it is dichotomized. However, deciding on a cut-point may be arbitrary, and information is lost when continuous data are transformed to dichotomous data.

There are several options for handling combinations of dichotomous and continuous data. Generally, it is useful to summarize results from all the relevant, valid studies in a similar way, but this is not always possible. It may be possible to collect missing data from investigators so that this can be done. If not, it may be useful to summarize the data in three ways: by entering the means and SDs as continuous outcomes, by entering the counts as dichotomous outcomes and by entering all of the data in text form as ‘Other data’ outcomes.

There are statistical approaches available that will re-express odds ratios as SMDs (and vice versa), allowing dichotomous and continuous data to be combined (Anzures-Cabrera et al 2011). A simple approach is as follows. Based on an assumption that the underlying continuous measurements in each intervention group follow a logistic distribution (which is a symmetrical distribution similar in shape to the normal distribution, but with more data in the distributional tails), and that the variability of the outcomes is the same in both experimental and comparator participants, the odds ratios can be re-expressed as a SMD according to the following simple formula (Chinn 2000):

$$\text{SMD} = \frac{\sqrt{3}}{\pi}\,\ln(\text{odds ratio}) \approx 0.5513 \times \ln(\text{odds ratio})$$

The standard error of the log odds ratio can be converted to the standard error of a SMD by multiplying by the same constant (√3/π=0.5513). Alternatively SMDs can be re-expressed as log odds ratios by multiplying by π/√3=1.814. Once SMDs (or log odds ratios) and their standard errors have been computed for all studies in the meta-analysis, they can be combined using the generic inverse-variance method. Standard errors can be computed for all studies by entering the data as dichotomous and continuous outcome type data, as appropriate, and converting the confidence intervals for the resulting log odds ratios and SMDs into standard errors (see Chapter 6, Section 6.3 ).
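A minimal sketch of this conversion, using an invented log odds ratio and standard error, is shown below; the reverse direction simply multiplies by π/√3.

```python
# Sketch of the Chinn (2000) conversion between a log odds ratio and an SMD
# (the log OR and its SE are invented).
import math

log_or, se_log_or = -0.60, 0.25
k = math.sqrt(3) / math.pi          # approximately 0.5513

smd = log_or * k
se_smd = se_log_or * k
print(f"SMD = {smd:.3f} (SE {se_smd:.3f})")

# To go the other way, multiply an SMD and its SE by math.pi / math.sqrt(3).
```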

10.7 Meta-analysis of ordinal outcomes and measurement scales

Ordinal and measurement scale outcomes are most commonly meta-analysed as dichotomous data (if so, see Section 10.4 ) or continuous data (if so, see Section 10.5 ) depending on the way that the study authors performed the original analyses.

Occasionally it is possible to analyse the data using proportional odds models. This is the case when ordinal scales have a small number of categories, the numbers falling into each category for each intervention group can be obtained, and the same ordinal scale has been used in all studies. This approach may make more efficient use of all available data than dichotomization, but requires access to statistical software and results in a summary statistic for which it is challenging to find a clinical meaning.

The proportional odds model uses the proportional odds ratio as the measure of intervention effect (Agresti 1996) (see Chapter 6, Section 6.6 ), and can be used for conducting a meta-analysis in advanced statistical software packages (Whitehead and Jones 1994). Estimates of log odds ratios and their standard errors from a proportional odds model may be meta-analysed using the generic inverse-variance method (see Section 10.3.3 ). If the same ordinal scale has been used in all studies, but in some reports has been presented as a dichotomous outcome, it may still be possible to include all studies in the meta-analysis. In the context of the three-category model, this might mean that for some studies category 1 constitutes a success, while for others both categories 1 and 2 constitute a success. Methods are available for dealing with this, and for combining data from scales that are related but have different definitions for their categories (Whitehead and Jones 1994).

10.8 Meta-analysis of counts and rates

Results may be expressed as count data when each participant may experience an event, and may experience it more than once (see Chapter 6, Section 6.7 ). For example, ‘number of strokes’, or ‘number of hospital visits’ are counts. These events may not happen at all, but if they do happen there is no theoretical maximum number of occurrences for an individual. Count data may be analysed using methods for dichotomous data if the counts are dichotomized for each individual (see Section 10.4 ), continuous data (see Section 10.5 ) and time-to-event data (see Section 10.9 ), as well as being analysed as rate data.

Rate data occur if counts are measured for each participant along with the time over which they are observed. This is particularly appropriate when the events being counted are rare. For example, a woman may experience two strokes during a follow-up period of two years. Her rate of strokes is one per year of follow-up (or, equivalently 0.083 per month of follow-up). Rates are conventionally summarized at the group level. For example, participants in the comparator group of a clinical trial may experience 85 strokes during a total of 2836 person-years of follow-up. An underlying assumption associated with the use of rates is that the risk of an event is constant across participants and over time. This assumption should be carefully considered for each situation. For example, in contraception studies, rates have been used (known as Pearl indices) to describe the number of pregnancies per 100 women-years of follow-up. This is now considered inappropriate since couples have different risks of conception, and the risk for each woman changes over time. Pregnancies are now analysed more often using life tables or time-to-event methods that investigate the time elapsing before the first pregnancy.

Analysing count data as rates is not always the most appropriate approach and is uncommon in practice. This is because:

  • the assumption of a constant underlying risk may not be suitable; and
  • the statistical methods are not as well developed as they are for other types of data.

The results of a study may be expressed as a rate ratio , that is the ratio of the rate in the experimental intervention group to the rate in the comparator group. The (natural) logarithms of the rate ratios may be combined across studies using the generic inverse-variance method (see Section 10.3.3 ). Alternatively, Poisson regression approaches can be used (Spittal et al 2015).
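As a small illustration, the sketch below computes a log rate ratio and its standard error from group-level event counts and person-time, ready for a generic inverse-variance analysis. The comparator counts echo the example above; the experimental arm figures are invented, and the standard error uses the usual Poisson approximation √(1/e₁ + 1/e₂).

```python
# Sketch: log rate ratio and its SE from group-level counts and person-time
# (experimental arm numbers are invented).
import math

events_exp, time_exp = 64, 2710.0     # events and person-years, experimental group
events_comp, time_comp = 85, 2836.0   # events and person-years, comparator group

rate_ratio = (events_exp / time_exp) / (events_comp / time_comp)
log_rr = math.log(rate_ratio)
se_log_rr = math.sqrt(1 / events_exp + 1 / events_comp)   # Poisson approximation
print(f"Rate ratio = {rate_ratio:.2f}, SE(log rate ratio) = {se_log_rr:.3f}")
```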

In a randomized trial, rate ratios may often be very similar to risk ratios obtained after dichotomizing the participants, since the average period of follow-up should be similar in all intervention groups. Rate ratios and risk ratios will differ, however, if an intervention affects the likelihood of some participants experiencing multiple events.

It is possible also to focus attention on the rate difference (see Chapter 6, Section 6.7.1 ). The analysis again can be performed using the generic inverse-variance method (Hasselblad and McCrory 1995, Guevara et al 2004).

10.9 Meta-analysis of time-to-event outcomes

Two approaches to meta-analysis of time-to-event outcomes are readily available to Cochrane Review authors. The choice of which to use will depend on the type of data that have been extracted from the primary studies, or obtained from re-analysis of individual participant data.

If ‘O – E’ and ‘V’ statistics have been obtained (see Chapter 6, Section 6.8.2 ), either through re-analysis of individual participant data or from aggregate statistics presented in the study reports, then these statistics may be entered directly into RevMan using the ‘O – E and Variance’ outcome type. There are several ways to calculate these ‘O – E’ and ‘V’ statistics. Peto’s method applied to dichotomous data (Section 10.4.2 ) gives rise to an odds ratio; a log-rank approach gives rise to a hazard ratio; and a variation of the Peto method for analysing time-to-event data gives rise to something in between (Simmonds et al 2011). The appropriate effect measure should be specified. Only fixed-effect meta-analysis methods are available in RevMan for ‘O – E and Variance’ outcomes.

Alternatively, if estimates of log hazard ratios and standard errors have been obtained from results of Cox proportional hazards regression models, study results can be combined using generic inverse-variance methods (see Section 10.3.3 ).

If a mixture of log-rank and Cox model estimates are obtained from the studies, all results can be combined using the generic inverse-variance method, as the log-rank estimates can be converted into log hazard ratios and standard errors using the approaches discussed in Chapter 6, Section 6.8 .
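As a small illustration of that conversion, the sketch below turns log-rank ‘O – E’ and ‘V’ statistics into a log hazard ratio and standard error suitable for the generic inverse-variance method, using the standard relationships log HR ≈ (O – E)/V and SE = 1/√V. The O – E and V values are invented.

```python
# Sketch: converting log-rank 'O – E' and 'V' statistics into a log hazard
# ratio and SE for the generic inverse-variance method (values invented).
import math

o_minus_e, v = -6.5, 18.2

log_hr = o_minus_e / v
se_log_hr = 1 / math.sqrt(v)
print(f"HR = {math.exp(log_hr):.2f}, SE(log HR) = {se_log_hr:.3f}")
```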

10.10 Heterogeneity

10.10.1 What is heterogeneity?

Inevitably, studies brought together in a systematic review will differ. Any kind of variability among studies in a systematic review may be termed heterogeneity. It can be helpful to distinguish between different types of heterogeneity. Variability in the participants, interventions and outcomes studied may be described as clinical diversity (sometimes called clinical heterogeneity), and variability in study design, outcome measurement tools and risk of bias may be described as methodological diversity (sometimes called methodological heterogeneity). Variability in the intervention effects being evaluated in the different studies is known as statistical heterogeneity , and is a consequence of clinical or methodological diversity, or both, among the studies. Statistical heterogeneity manifests itself in the observed intervention effects being more different from each other than one would expect due to random error (chance) alone. We will follow convention and refer to statistical heterogeneity simply as heterogeneity .

Clinical variation will lead to heterogeneity if the intervention effect is affected by the factors that vary across studies; most obviously, the specific interventions or patient characteristics. In other words, the true intervention effect will be different in different studies.

Differences between studies in terms of methodological factors, such as use of blinding and concealment of allocation sequence, or if there are differences between studies in the way the outcomes are defined and measured, may be expected to lead to differences in the observed intervention effects. Significant statistical heterogeneity arising from methodological diversity or differences in outcome assessments suggests that the studies are not all estimating the same quantity, but does not necessarily suggest that the true intervention effect varies. In particular, heterogeneity associated solely with methodological diversity would indicate that the studies suffer from different degrees of bias. Empirical evidence suggests that some aspects of design can affect the result of clinical trials, although this is not always the case. Further discussion appears in Chapter 7 and Chapter 8 .

The scope of a review will largely determine the extent to which studies included in a review are diverse. Sometimes a review will include studies addressing a variety of questions, for example when several different interventions for the same condition are of interest (see also Chapter 11 ) or when the differential effects of an intervention in different populations are of interest. Meta-analysis should only be considered when a group of studies is sufficiently homogeneous in terms of participants, interventions and outcomes to provide a meaningful summary (see MECIR Box 10.10.a. ). It is often appropriate to take a broader perspective in a meta-analysis than in a single clinical trial. A common analogy is that systematic reviews bring together apples and oranges, and that combining these can yield a meaningless result. This is true if apples and oranges are of intrinsic interest on their own, but may not be if they are used to contribute to a wider question about fruit. For example, a meta-analysis may reasonably evaluate the average effect of a class of drugs by combining results from trials where each evaluates the effect of a different drug from the class.

MECIR Box 10.10.a Relevant expectations for conduct of intervention reviews


Meta-analyses of very diverse studies can be misleading, for example where studies use different forms of control. Clinical diversity does not necessarily indicate that a meta-analysis should not be performed. However, authors must be clear about the underlying question that all studies are addressing.

There may be specific interest in a review in investigating how clinical and methodological aspects of studies relate to their results. Where possible these investigations should be specified a priori (i.e. in the protocol for the systematic review). It is legitimate for a systematic review to focus on examining the relationship between some clinical characteristic(s) of the studies and the size of intervention effect, rather than on obtaining a summary effect estimate across a series of studies (see Section 10.11 ). Meta-regression may best be used for this purpose, although it is not implemented in RevMan (see Section 10.11.4 ).

10.10.2 Identifying and measuring heterogeneity

It is essential to consider the extent to which the results of studies are consistent with each other (see MECIR Box 10.10.b ). If confidence intervals for the results of individual studies (generally depicted graphically using horizontal lines) have poor overlap, this generally indicates the presence of statistical heterogeneity. More formally, a statistical test for heterogeneity is available. This Chi 2 (χ 2 , or chi-squared) test is included in the forest plots in Cochrane Reviews. It assesses whether observed differences in results are compatible with chance alone. A low P value (or a large Chi 2 statistic relative to its degree of freedom) provides evidence of heterogeneity of intervention effects (variation in effect estimates beyond chance).

MECIR Box 10.10.b Relevant expectations for conduct of intervention reviews

Assessing statistical heterogeneity

The presence of heterogeneity affects the extent to which generalizable conclusions can be formed. It is important to identify heterogeneity in case there is sufficient information to explain it and offer new insights. Authors should recognize that there is much uncertainty in measures such as I 2 and Tau when there are few studies. Thus, use of simple thresholds to diagnose heterogeneity should be avoided.

Care must be taken in the interpretation of the Chi 2 test, since it has low power in the (common) situation of a meta-analysis when studies have small sample size or are few in number. This means that while a statistically significant result may indicate a problem with heterogeneity, a non-significant result must not be taken as evidence of no heterogeneity. This is also why a P value of 0.10, rather than the conventional level of 0.05, is sometimes used to determine statistical significance. A further problem with the test, which seldom occurs in Cochrane Reviews, is that when there are many studies in a meta-analysis, the test has high power to detect a small amount of heterogeneity that may be clinically unimportant.

Some argue that, since clinical and methodological diversity always occur in a meta-analysis, statistical heterogeneity is inevitable (Higgins et al 2003). Thus, the test for heterogeneity is irrelevant to the choice of analysis; heterogeneity will always exist whether or not we happen to be able to detect it using a statistical test. Methods have been developed for quantifying inconsistency across studies that move the focus away from testing whether heterogeneity is present to assessing its impact on the meta-analysis. A useful statistic for quantifying inconsistency is:

$$I^2 = \frac{Q - df}{Q} \times 100\%$$

In this equation, Q is the Chi 2 statistic and df is its degrees of freedom (Higgins and Thompson 2002, Higgins et al 2003). I 2 describes the percentage of the variability in effect estimates that is due to heterogeneity rather than sampling error (chance).
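A minimal sketch of the calculation is shown below, reusing the invented study estimates and standard errors from the earlier fixed-effect sketch: Q is the weighted sum of squared deviations of the study estimates from the fixed-effect pooled estimate, and I² is truncated at zero when Q is smaller than its degrees of freedom.

```python
# Sketch: Cochran's Q and the I-squared statistic from study estimates and
# standard errors (numbers are invented, as in the earlier sketch).
import math

Y  = [0.10, 0.55, -0.05, 0.62]
SE = [0.12, 0.15, 0.20, 0.18]

w = [1 / se**2 for se in SE]
pooled = sum(wi * yi for wi, yi in zip(w, Y)) / sum(w)     # fixed-effect estimate

Q  = sum(wi * (yi - pooled)**2 for wi, yi in zip(w, Y))    # heterogeneity statistic
df = len(Y) - 1
i_squared = max(0.0, (Q - df) / Q) * 100 if Q > 0 else 0.0
print(f"Q = {Q:.2f} on {df} df, I^2 = {i_squared:.0f}%")
```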

Thresholds for the interpretation of the I 2 statistic can be misleading, since the importance of inconsistency depends on several factors. A rough guide to interpretation in the context of meta-analyses of randomized trials is as follows:

  • 0% to 40%: might not be important;
  • 30% to 60%: may represent moderate heterogeneity*;
  • 50% to 90%: may represent substantial heterogeneity*;
  • 75% to 100%: considerable heterogeneity*.

*The importance of the observed value of I 2 depends on (1) magnitude and direction of effects, and (2) strength of evidence for heterogeneity (e.g. P value from the Chi 2 test, or a confidence interval for I 2 : uncertainty in the value of I 2 is substantial when the number of studies is small).

10.10.3 Strategies for addressing heterogeneity

Review authors must take into account any statistical heterogeneity when interpreting results, particularly when there is variation in the direction of effect (see MECIR Box 10.10.c ). A number of options are available if heterogeneity is identified among a group of studies that would otherwise be considered suitable for a meta-analysis.

MECIR Box 10.10.c  Relevant expectations for conduct of intervention reviews

Considering statistical heterogeneity when interpreting the results

The presence of heterogeneity affects the extent to which generalizable conclusions can be formed. If a fixed-effect analysis is used, the confidence intervals ignore the extent of heterogeneity. If a random-effects analysis is used, the result pertains to the mean effect across studies. In both cases, the implications of notable heterogeneity should be addressed. It may be possible to understand the reasons for the heterogeneity if there are sufficient studies.

  • Check again that the data are correct. Severe apparent heterogeneity can indicate that data have been incorrectly extracted or entered into meta-analysis software. For example, if standard errors have mistakenly been entered as SDs for continuous outcomes, this could manifest itself in overly narrow confidence intervals with poor overlap and hence substantial heterogeneity. Unit-of-analysis errors may also be causes of heterogeneity (see Chapter 6, Section 6.2 ).  
  • Do not do a meta-analysis. A systematic review need not contain any meta-analyses. If there is considerable variation in results, and particularly if there is inconsistency in the direction of effect, it may be misleading to quote an average value for the intervention effect.  
  • Explore heterogeneity. It is clearly of interest to determine the causes of heterogeneity among results of studies. This process is problematic since there are often many characteristics that vary across studies from which one may choose. Heterogeneity may be explored by conducting subgroup analyses (see Section 10.11.3 ) or meta-regression (see Section 10.11.4 ). Reliable conclusions can only be drawn from analyses that are truly pre-specified before inspecting the studies’ results, and even these conclusions should be interpreted with caution. Explorations of heterogeneity that are devised after heterogeneity is identified can at best lead to the generation of hypotheses. They should be interpreted with even more caution and should generally not be listed among the conclusions of a review. Also, investigations of heterogeneity when there are very few studies are of questionable value.  
  • Ignore heterogeneity. Fixed-effect meta-analyses ignore heterogeneity. The summary effect estimate from a fixed-effect meta-analysis is normally interpreted as being the best estimate of the intervention effect. However, the existence of heterogeneity suggests that there may not be a single intervention effect but a variety of intervention effects. Thus, the summary fixed-effect estimate may be an intervention effect that does not actually exist in any population, and therefore have a confidence interval that is meaningless as well as being too narrow (see Section 10.10.4 ).  
  • Perform a random-effects meta-analysis. A random-effects meta-analysis may be used to incorporate heterogeneity among studies. This is not a substitute for a thorough investigation of heterogeneity. It is intended primarily for heterogeneity that cannot be explained. An extended discussion of this option appears in Section 10.10.4 .  
  • Reconsider the effect measure. Heterogeneity may be an artificial consequence of an inappropriate choice of effect measure. For example, when studies collect continuous outcome data using different scales or different units, extreme heterogeneity may be apparent when using the mean difference but not when the more appropriate standardized mean difference is used. Furthermore, choice of effect measure for dichotomous outcomes (odds ratio, risk ratio, or risk difference) may affect the degree of heterogeneity among results. In particular, when comparator group risks vary, homogeneous odds ratios or risk ratios will necessarily lead to heterogeneous risk differences, and vice versa. However, it remains unclear whether homogeneity of intervention effect in a particular meta-analysis is a suitable criterion for choosing between these measures (see also Section 10.4.3 ).  
  • Exclude studies. Heterogeneity may be due to the presence of one or two outlying studies with results that conflict with the rest of the studies. In general it is unwise to exclude studies from a meta-analysis on the basis of their results as this may introduce bias. However, if an obvious reason for the outlying result is apparent, the study might be removed with more confidence. Since usually at least one characteristic can be found for any study in any meta-analysis which makes it different from the others, this criterion is unreliable because it is all too easy to fulfil. It is advisable to perform analyses both with and without outlying studies as part of a sensitivity analysis (see Section 10.14 ). Whenever possible, potential sources of clinical diversity that might lead to such situations should be specified in the protocol.

10.10.4 Incorporating heterogeneity into random-effects models

The random-effects meta-analysis approach incorporates an assumption that the different studies are estimating different, yet related, intervention effects (DerSimonian and Laird 1986, Borenstein et al 2010). The approach allows us to address heterogeneity that cannot readily be explained by other factors. A random-effects meta-analysis model involves an assumption that the effects being estimated in the different studies follow some distribution. The model represents our lack of knowledge about why real, or apparent, intervention effects differ, by considering the differences as if they were random. The centre of the assumed distribution describes the average of the effects, while its width describes the degree of heterogeneity. The conventional choice of distribution is a normal distribution. It is difficult to establish the validity of any particular distributional assumption, and this is a common criticism of random-effects meta-analyses. The importance of the assumed shape for this distribution has not been widely studied.

To undertake a random-effects meta-analysis, the standard errors of the study-specific estimates (SE i in Section 10.3.1 ) are adjusted to incorporate a measure of the extent of variation, or heterogeneity, among the intervention effects observed in different studies (this variation is often referred to as Tau-squared, τ 2 , or Tau 2 ). The amount of variation, and hence the adjustment, can be estimated from the intervention effects and standard errors of the studies included in the meta-analysis.
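As a rough illustration, the sketch below applies the DerSimonian and Laird moment estimator of Tau² and the resulting random-effects weights and pooled estimate, again reusing the invented study data from the earlier sketches. Comparing the weights with the fixed-effect weights shows the redistribution towards smaller studies discussed below.

```python
# Sketch of the DerSimonian and Laird estimate of Tau^2 and the resulting
# random-effects weights and pooled estimate (study data are invented).
import math

Y  = [0.10, 0.55, -0.05, 0.62]
SE = [0.12, 0.15, 0.20, 0.18]

w = [1 / se**2 for se in SE]                              # fixed-effect weights
pooled_fe = sum(wi * yi for wi, yi in zip(w, Y)) / sum(w)
Q  = sum(wi * (yi - pooled_fe)**2 for wi, yi in zip(w, Y))
df = len(Y) - 1

c = sum(w) - sum(wi**2 for wi in w) / sum(w)
tau2 = max(0.0, (Q - df) / c)                             # between-study variance

w_re = [1 / (se**2 + tau2) for se in SE]                  # random-effects weights
pooled_re = sum(wi * yi for wi, yi in zip(w_re, Y)) / sum(w_re)
se_re = math.sqrt(1 / sum(w_re))
print(f"Tau^2 = {tau2:.4f}, random-effects estimate = {pooled_re:.3f} (SE {se_re:.3f})")
```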

In a heterogeneous set of studies, a random-effects meta-analysis will award relatively more weight to smaller studies than such studies would receive in a fixed-effect meta-analysis. This is because small studies are more informative for learning about the distribution of effects across studies than for learning about an assumed common intervention effect.

Note that a random-effects model does not ‘take account’ of the heterogeneity, in the sense that it is no longer an issue. It is always preferable to explore possible causes of heterogeneity, although there may be too few studies to do this adequately (see Section 10.11 ).

10.10.4.1 Fixed or random effects?

A fixed-effect meta-analysis provides a result that may be viewed as a ‘typical intervention effect’ from the studies included in the analysis. In order to calculate a confidence interval for a fixed-effect meta-analysis the assumption is usually made that the true effect of intervention (in both magnitude and direction) is the same value in every study (i.e. fixed across studies). This assumption implies that the observed differences among study results are due solely to the play of chance (i.e. that there is no statistical heterogeneity).

A random-effects model provides a result that may be viewed as an ‘average intervention effect’, where this average is explicitly defined according to an assumed distribution of effects across studies. Instead of assuming that the intervention effects are the same, we assume that they follow (usually) a normal distribution. The assumption implies that the observed differences among study results are due to a combination of the play of chance and some genuine variation in the intervention effects.

The random-effects method and the fixed-effect method will give identical results when there is no heterogeneity among the studies.

When heterogeneity is present, a confidence interval around the random-effects summary estimate is wider than a confidence interval around a fixed-effect summary estimate. This will happen whenever the I 2 statistic is greater than zero, even if the heterogeneity is not detected by the Chi 2 test for heterogeneity (see Section 10.10.2 ).

Sometimes the central estimate of the intervention effect is different between fixed-effect and random-effects analyses. In particular, if results of smaller studies are systematically different from results of larger ones, which can happen as a result of publication bias or within-study bias in smaller studies (Egger et al 1997, Poole and Greenland 1999, Kjaergard et al 2001), then a random-effects meta-analysis will exacerbate the effects of the bias (see also Chapter 13, Section 13.3.5.6 ). A fixed-effect analysis will be affected less, although strictly it will also be inappropriate.

The decision between fixed- and random-effects meta-analyses has been the subject of much debate, and we do not provide a universal recommendation. Some considerations in making this choice are as follows:

  • Many have argued that the decision should be based on an expectation of whether the intervention effects are truly identical, preferring the fixed-effect model if this is likely and a random-effects model if this is unlikely (Borenstein et al 2010). Since it is generally considered to be implausible that intervention effects across studies are identical (unless the intervention has no effect at all), this leads many to advocate use of the random-effects model.
  • Others have argued that a fixed-effect analysis can be interpreted in the presence of heterogeneity, and that it makes fewer assumptions than a random-effects meta-analysis. They then refer to it as a ‘fixed-effects’ meta-analysis (Peto et al 1995, Rice et al 2018).
  • Under any interpretation, a fixed-effect meta-analysis ignores heterogeneity. If the method is used, it is therefore important to supplement it with a statistical investigation of the extent of heterogeneity (see Section 10.10.2 ).
  • In the presence of heterogeneity, a random-effects analysis gives relatively more weight to smaller studies and relatively less weight to larger studies. If there is additionally some funnel plot asymmetry (i.e. a relationship between intervention effect magnitude and study size), then this will push the results of the random-effects analysis towards the findings in the smaller studies. In the context of randomized trials, this is generally regarded as an unfortunate consequence of the model.
  • A pragmatic approach is to plan to undertake both a fixed-effect and a random-effects meta-analysis, with an intention to present the random-effects result if there is no indication of funnel plot asymmetry. If there is an indication of funnel plot asymmetry, then both methods are problematic. It may be reasonable to present both analyses or neither, or to perform a sensitivity analysis in which small studies are excluded or addressed directly using meta-regression (see Chapter 13, Section 13.3.5.6 ).
  • The choice between a fixed-effect and a random-effects meta-analysis should never be made on the basis of a statistical test for heterogeneity.

10.10.4.2 Interpretation of random-effects meta-analyses

The summary estimate and confidence interval from a random-effects meta-analysis refer to the centre of the distribution of intervention effects, but do not describe the width of the distribution. Often the summary estimate and its confidence interval are quoted in isolation and portrayed as a sufficient summary of the meta-analysis. This is inappropriate. The confidence interval from a random-effects meta-analysis describes uncertainty in the location of the mean of systematically different effects in the different studies. It does not describe the degree of heterogeneity among studies, as may be commonly believed. For example, when there are many studies in a meta-analysis, we may obtain a very tight confidence interval around the random-effects estimate of the mean effect even when there is a large amount of heterogeneity. A solution to this problem is to consider a prediction interval (see Section 10.10.4.3 ).

Methodological diversity creates heterogeneity through biases variably affecting the results of different studies. The random-effects summary estimate will only correctly estimate the average intervention effect if the biases are symmetrically distributed, leading to a mixture of over-estimates and under-estimates of effect, which is unlikely to be the case. In practice it can be very difficult to distinguish whether heterogeneity results from clinical or methodological diversity, and in most cases it is likely to be due to both, so these distinctions are hard to draw in the interpretation.

When there is little information, either because there are few studies or if the studies are small with few events, a random-effects analysis will provide poor estimates of the amount of heterogeneity (i.e. of the width of the distribution of intervention effects). Fixed-effect methods such as the Mantel-Haenszel method will provide more robust estimates of the average intervention effect, but at the cost of ignoring any heterogeneity.

10.10.4.3 Prediction intervals from a random-effects meta-analysis

An estimate of the between-study variance in a random-effects meta-analysis is typically presented as part of its results. The square root of this number (i.e. Tau) is the estimated standard deviation of underlying effects across studies. Prediction intervals are a way of expressing this value in an interpretable way.

To motivate the idea of a prediction interval, note that for absolute measures of effect (e.g. risk difference, mean difference, standardized mean difference), an approximate 95% range of normally distributed underlying effects can be obtained by creating an interval from 1.96×Tau below the random-effects mean, to 1.96×Tau above it. (For relative measures such as the odds ratio and risk ratio, an equivalent interval needs to be based on the natural logarithm of the summary estimate.) In reality, both the summary estimate and the value of Tau are associated with uncertainty. A prediction interval seeks to present the range of effects in a way that acknowledges this uncertainty (Higgins et al 2009). A simple 95% prediction interval can be calculated as:

M ± t_{k−2} × √(Tau² + SE(M)²)

where M is the summary mean from the random-effects meta-analysis, t_{k−2} is the 95% percentile of a t-distribution with k−2 degrees of freedom, k is the number of studies, Tau² is the estimated amount of heterogeneity and SE(M) is the standard error of the summary mean.

The term ‘prediction interval’ relates to the use of this interval to predict the possible underlying effect in a new study that is similar to the studies in the meta-analysis. A more useful interpretation of the interval is as a summary of the spread of underlying effects in the studies included in the random-effects meta-analysis.
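
A minimal sketch of this calculation, with hypothetical values for M, SE(M), Tau² and k, might look as follows.

```python
# Sketch of the 95% prediction interval formula quoted above (hypothetical inputs).
import numpy as np
from scipy import stats

M, se_M = 0.30, 0.08      # random-effects summary mean and its standard error
tau2, k = 0.05, 12        # estimated heterogeneity (Tau^2) and number of studies

t_crit = stats.t.ppf(0.975, df=k - 2)          # two-sided 95% critical value of t on k-2 df
half_width = t_crit * np.sqrt(tau2 + se_M**2)
print("95% prediction interval:", (round(M - half_width, 3), round(M + half_width, 3)))
```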

Prediction intervals have proved a popular way of expressing the amount of heterogeneity in a meta-analysis (Riley et al 2011). They are, however, strongly based on the assumption of a normal distribution for the effects across studies, and can be very problematic when the number of studies is small, in which case they can appear spuriously wide or spuriously narrow. Nevertheless, we encourage their use when the number of studies is reasonable (e.g. more than ten) and there is no clear funnel plot asymmetry.

10.10.4.4 Implementing random-effects meta-analyses

As introduced in Section 10.3.2 , the random-effects model can be implemented using an inverse-variance approach, incorporating a measure of the extent of heterogeneity into the study weights. RevMan implements a version of random-effects meta-analysis that is described by DerSimonian and Laird, making use of a ‘moment-based’ estimate of the between-study variance (DerSimonian and Laird 1986). The attraction of this method is that the calculations are straightforward, but it has a theoretical disadvantage in that the confidence intervals are slightly too narrow to encompass full uncertainty resulting from having estimated the degree of heterogeneity.
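
The following is a bare-bones sketch of the moment-based (DerSimonian and Laird) estimate of Tau² for an inverse-variance analysis, using invented data; it is not RevMan’s code, only an illustration of the calculation described above.

```python
# DerSimonian-Laird moment-based estimate of Tau^2 (illustrative data only).
import numpy as np

effects = np.array([0.10, 0.55, 0.25, 0.95])
se = np.array([0.12, 0.20, 0.15, 0.22])

w = 1 / se**2                                   # fixed-effect weights
mu_fixed = np.sum(w * effects) / np.sum(w)      # fixed-effect pooled estimate
Q = np.sum(w * (effects - mu_fixed)**2)         # Cochran's Q statistic
k = len(effects)
c = np.sum(w) - np.sum(w**2) / np.sum(w)        # scaling constant
tau2_DL = max(0.0, (Q - (k - 1)) / c)           # moment estimator, truncated at zero
print("DerSimonian-Laird Tau^2:", round(tau2_DL, 4))
```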

For many years, RevMan has implemented two random-effects methods for dichotomous data: a Mantel-Haenszel method and an inverse-variance method. Both use the moment-based approach to estimating the amount of between-studies variation. The difference between the two is subtle: the former estimates the between-study variation by comparing each study’s result with a Mantel-Haenszel fixed-effect meta-analysis result, whereas the latter estimates it by comparing each study’s result with an inverse-variance fixed-effect meta-analysis result. In practice, the difference is likely to be trivial.

There are alternative methods for performing random-effects meta-analyses that have better technical properties than the DerSimonian and Laird approach with a moment-based estimate (Veroniki et al 2016). Most notable among these is an adjustment to the confidence interval proposed by Hartung and Knapp and by Sidik and Jonkman (Hartung and Knapp 2001, Sidik and Jonkman 2002). This adjustment widens the confidence interval to reflect uncertainty in the estimation of between-study heterogeneity, and it should be used if available to review authors. An alternative option to encompass full uncertainty in the degree of heterogeneity is to take a Bayesian approach (see Section 10.13 ).
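
A rough sketch of the Hartung-Knapp(-Sidik-Jonkman) adjustment is shown below, again with invented data and an assumed Tau²: the variance of the summary estimate is re-computed from the weighted residual variation and the confidence interval uses a t-distribution with k−1 degrees of freedom.

```python
# Sketch of the Hartung-Knapp/Sidik-Jonkman confidence interval adjustment.
import numpy as np
from scipy import stats

effects = np.array([0.10, 0.55, 0.25, 0.95])
se = np.array([0.12, 0.20, 0.15, 0.22])
tau2 = 0.09                                     # Tau^2 from any estimator (assumed here)

w = 1 / (se**2 + tau2)                          # random-effects weights
mu = np.sum(w * effects) / np.sum(w)
k = len(effects)

# Hartung-Knapp variance: weighted residual variation scaled by (k - 1)
var_hk = np.sum(w * (effects - mu)**2) / ((k - 1) * np.sum(w))
t_crit = stats.t.ppf(0.975, df=k - 1)           # t quantile rather than normal quantile
ci = (mu - t_crit * np.sqrt(var_hk), mu + t_crit * np.sqrt(var_hk))
print("HKSJ 95% CI:", tuple(round(x, 3) for x in ci))
```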

An empirical comparison of different ways to estimate between-study variation in Cochrane meta-analyses has shown that they can lead to substantial differences in estimates of heterogeneity, but seldom have major implications for estimating summary effects (Langan et al 2015). Several simulation studies have concluded that an approach proposed by Paule and Mandel should be recommended (Langan et al 2017); whereas a comprehensive recent simulation study recommended a restricted maximum likelihood approach, although noted that no single approach is universally preferable (Langan et al 2019). Review authors are encouraged to select one of these options if it is available to them.
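
For illustration, the Paule and Mandel estimate can be obtained by solving for the value of Tau² at which the generalized heterogeneity statistic equals its expectation (k−1); the sketch below does this by simple bisection on invented data.

```python
# Sketch of the Paule-Mandel estimator of Tau^2 via bisection (illustrative data only).
import numpy as np

effects = np.array([0.10, 0.55, 0.25, 0.95])
var = np.array([0.12, 0.20, 0.15, 0.22])**2
k = len(effects)

def generalized_Q(tau2):
    w = 1 / (var + tau2)
    mu = np.sum(w * effects) / np.sum(w)
    return np.sum(w * (effects - mu)**2)

if generalized_Q(0.0) <= k - 1:
    tau2_pm = 0.0                    # no excess variation beyond chance
else:
    lo, hi = 0.0, 10.0               # Q is decreasing in tau2, so bisection works
    for _ in range(200):
        mid = (lo + hi) / 2
        if generalized_Q(mid) > k - 1:
            lo = mid
        else:
            hi = mid
    tau2_pm = (lo + hi) / 2
print("Paule-Mandel Tau^2:", round(tau2_pm, 4))
```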

10.11 Investigating heterogeneity

10.11.1 Interaction and effect modification

Does the intervention effect vary with different populations or intervention characteristics (such as dose or duration)? Such variation is known as interaction by statisticians and as effect modification by epidemiologists. Methods to search for such interactions include subgroup analyses and meta-regression. All methods have considerable pitfalls.

10.11.2 What are subgroup analyses?

Subgroup analyses involve splitting all the participant data into subgroups, often in order to make comparisons between them. Subgroup analyses may be done for subsets of participants (such as males and females), or for subsets of studies (such as different geographical locations). Subgroup analyses may be done as a means of investigating heterogeneous results, or to answer specific questions about particular patient groups, types of intervention or types of study.

Subgroup analyses of subsets of participants within studies are uncommon in systematic reviews based on published literature because sufficient details to extract data about separate participant types are seldom published in reports. By contrast, such subsets of participants are easily analysed when individual participant data have been collected (see Chapter 26 ). The methods we describe in the remainder of this chapter are for subgroups of studies.

Findings from multiple subgroup analyses may be misleading. Subgroup analyses are observational by nature and are not based on randomized comparisons. False negative and false positive significance tests increase in likelihood rapidly as more subgroup analyses are performed. If their findings are presented as definitive conclusions there is clearly a risk of people being denied an effective intervention or treated with an ineffective (or even harmful) intervention. Subgroup analyses can also generate misleading recommendations about directions for future research that, if followed, would waste scarce resources.

It is useful to distinguish between the notions of ‘qualitative interaction’ and ‘quantitative interaction’ (Yusuf et al 1991). Qualitative interaction exists if the direction of effect is reversed, that is if an intervention is beneficial in one subgroup but is harmful in another. Qualitative interaction is rare. This may be used as an argument that the most appropriate result of a meta-analysis is the overall effect across all subgroups. Quantitative interaction exists when the size of the effect varies but not the direction, that is if an intervention is beneficial to different degrees in different subgroups.

10.11.3 Undertaking subgroup analyses

Meta-analyses can be undertaken in RevMan both within subgroups of studies and across all studies irrespective of their subgroup membership. It is tempting to compare effect estimates in different subgroups by considering the meta-analysis results from each subgroup separately. This should only be done informally by comparing the magnitudes of effect. Noting that either the effect or the test for heterogeneity in one subgroup is statistically significant whilst that in the other subgroup is not statistically significant does not indicate that the subgroup factor explains heterogeneity. Since different subgroups are likely to contain different amounts of information and thus have different abilities to detect effects, it is extremely misleading simply to compare the statistical significance of the results.

10.11.3.1 Is the effect different in different subgroups?

Valid investigations of whether an intervention works differently in different subgroups involve comparing the subgroups with each other. It is a mistake to compare within-subgroup inferences such as P values. If one subgroup analysis is statistically significant and another is not, then the latter may simply reflect a lack of information rather than a smaller (or absent) effect. When there are only two subgroups, non-overlap of the confidence intervals indicates statistical significance, but note that the confidence intervals can overlap to a small degree and the difference still be statistically significant.

A formal statistical approach should be used to examine differences among subgroups (see MECIR Box 10.11.a ). A simple significance test to investigate differences between two or more subgroups can be performed (Borenstein and Higgins 2013). This procedure consists of undertaking a standard test for heterogeneity across subgroup results rather than across individual study results. When the meta-analysis uses a fixed-effect inverse-variance weighted average approach, the method is exactly equivalent to the test described by Deeks and colleagues (Deeks et al 2001). An I 2 statistic is also computed for subgroup differences. This describes the percentage of the variability in effect estimates from the different subgroups that is due to genuine subgroup differences rather than sampling error (chance). Note that these methods for examining subgroup differences should be used only when the data in the subgroups are independent (i.e. they should not be used if the same study participants contribute to more than one of the subgroups in the forest plot).
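
As a simple illustration of such a test, the sketch below applies a heterogeneity (Q) test to two made-up subgroup summary estimates and also computes the corresponding I² for subgroup differences.

```python
# Sketch: test for subgroup differences as a Q test across subgroup summary estimates.
# Subgroup estimates and standard errors are invented.
import numpy as np
from scipy import stats

subgroup_est = np.array([0.15, 0.45])    # pooled effect in each subgroup
subgroup_se = np.array([0.08, 0.10])     # standard error of each subgroup estimate

w = 1 / subgroup_se**2
overall = np.sum(w * subgroup_est) / np.sum(w)
Q_between = np.sum(w * (subgroup_est - overall)**2)
df = len(subgroup_est) - 1
p_value = stats.chi2.sf(Q_between, df)
I2_between = max(0.0, (Q_between - df) / Q_between) * 100

print(f"Q = {Q_between:.2f}, df = {df}, P = {p_value:.3f}, I^2 for subgroup differences = {I2_between:.0f}%")
```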

If fixed-effect models are used for the analysis within each subgroup, then these statistics relate to differences in typical effects across different subgroups. If random-effects models are used for the analysis within each subgroup, then the statistics relate to variation in the mean effects in the different subgroups.

An alternative method for testing for differences between subgroups is to use meta-regression techniques, in which case a random-effects model is generally preferred (see Section 10.11.4 ). Tests for subgroup differences based on random-effects models may be regarded as preferable to those based on fixed-effect models, due to the high risk of false-positive results when a fixed-effect model is used to compare subgroups (Higgins and Thompson 2004).

MECIR Box 10.11.a Relevant expectations for conduct of intervention reviews

Comparing subgroups

Concluding that there is a difference in effect in different subgroups on the basis of differences in the level of statistical significance within subgroups can be very misleading.

10.11.4 Meta-regression

If studies are divided into subgroups (see Section 10.11.2 ), this may be viewed as an investigation of how a categorical study characteristic is associated with the intervention effects in the meta-analysis. For example, studies in which allocation sequence concealment was adequate may yield different results from those in which it was inadequate. Here, allocation sequence concealment, being either adequate or inadequate, is a categorical characteristic at the study level. Meta-regression is an extension to subgroup analyses that allows the effect of continuous, as well as categorical, characteristics to be investigated, and in principle allows the effects of multiple factors to be investigated simultaneously (although this is rarely possible due to inadequate numbers of studies) (Thompson and Higgins 2002). Meta-regression should generally not be considered when there are fewer than ten studies in a meta-analysis.

Meta-regressions are similar in essence to simple regressions, in which an outcome variable is predicted according to the values of one or more explanatory variables . In meta-regression, the outcome variable is the effect estimate (for example, a mean difference, a risk difference, a log odds ratio or a log risk ratio). The explanatory variables are characteristics of studies that might influence the size of intervention effect. These are often called ‘potential effect modifiers’ or covariates. Meta-regressions usually differ from simple regressions in two ways. First, larger studies have more influence on the relationship than smaller studies, since studies are weighted by the precision of their respective effect estimate. Second, it is wise to allow for the residual heterogeneity among intervention effects not modelled by the explanatory variables. This gives rise to the term ‘random-effects meta-regression’, since the extra variability is incorporated in the same way as in a random-effects meta-analysis (Thompson and Sharp 1999).

The regression coefficient obtained from a meta-regression analysis will describe how the outcome variable (the intervention effect) changes with a unit increase in the explanatory variable (the potential effect modifier). The statistical significance of the regression coefficient is a test of whether there is a linear relationship between intervention effect and the explanatory variable. If the intervention effect is a ratio measure, the log-transformed value of the intervention effect should always be used in the regression model (see Chapter 6, Section 6.1.2.1 ), and the exponential of the regression coefficient will give an estimate of the relative change in intervention effect with a unit increase in the explanatory variable.
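
The sketch below illustrates the mechanics in a deliberately simplified form: effect estimates are regressed on a hypothetical study-level covariate (‘dose’) by weighted least squares with weights 1/SE²; a full random-effects meta-regression, as implemented in dedicated packages, would additionally estimate and incorporate residual heterogeneity (Tau²). All data are invented.

```python
# Simplified weighted (fixed-effect) meta-regression of effect estimates on a covariate.
import numpy as np

effects = np.array([0.10, 0.25, 0.30, 0.55, 0.60])   # study effect estimates
se = np.array([0.15, 0.12, 0.20, 0.18, 0.25])
dose = np.array([10.0, 20.0, 30.0, 40.0, 50.0])      # hypothetical potential effect modifier

W = np.diag(1 / se**2)                                # studies weighted by precision
X = np.column_stack([np.ones_like(dose), dose])       # intercept + covariate
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ effects)   # weighted least squares
cov_beta = np.linalg.inv(X.T @ W @ X)

slope, slope_se = beta[1], np.sqrt(cov_beta[1, 1])
print(f"change in effect per unit dose: {slope:.4f} (SE {slope_se:.4f})")
```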

Meta-regression can also be used to investigate differences for categorical explanatory variables as done in subgroup analyses. If there are J subgroups, membership of particular subgroups is indicated by using J minus 1 dummy variables (which can only take values of zero or one) in the meta-regression model (as in standard linear regression modelling). The regression coefficients will estimate how the intervention effect in each subgroup differs from a nominated reference subgroup. The P value of each regression coefficient will indicate the strength of evidence against the null hypothesis that the characteristic is not associated with the intervention effect.

Meta-regression may be performed using the ‘metareg’ macro available for the Stata statistical package, or using the ‘metafor’ package for R, as well as other packages.

10.11.5 Selection of study characteristics for subgroup analyses and meta-regression

Authors need to be cautious about undertaking subgroup analyses, and interpreting any that they do. Some considerations are outlined here for selecting characteristics (also called explanatory variables, potential effect modifiers or covariates) that will be investigated for their possible influence on the size of the intervention effect. These considerations apply similarly to subgroup analyses and to meta-regressions. Further details may be obtained elsewhere (Oxman and Guyatt 1992, Berlin and Antman 1994).

10.11.5.1 Ensure that there are adequate studies to justify subgroup analyses and meta-regressions

It is very unlikely that an investigation of heterogeneity will produce useful findings unless there is a substantial number of studies. A useful rule of thumb, borrowed from advice for simple regression analyses, is that at least ten observations (i.e. ten studies in a meta-analysis) should be available for each characteristic modelled. However, even this will be too few when the covariates are unevenly distributed across studies.

10.11.5.2 Specify characteristics in advance

Authors should, whenever possible, pre-specify characteristics in the protocol that later will be subject to subgroup analyses or meta-regression. The plan specified in the protocol should then be followed (data permitting), without undue emphasis on any particular findings (see MECIR Box 10.11.b ). Pre-specifying characteristics reduces the likelihood of spurious findings, first by limiting the number of subgroups investigated, and second by preventing knowledge of the studies’ results influencing which subgroups are analysed. True pre-specification is difficult in systematic reviews, because the results of some of the relevant studies are often known when the protocol is drafted. If a characteristic was overlooked in the protocol, but is clearly of major importance and justified by external evidence, then authors should not be reluctant to explore it. However, such post-hoc analyses should be identified as such.

MECIR Box 10.11.b Relevant expectations for conduct of intervention reviews

Interpreting subgroup analyses

If subgroup analyses are conducted, selective reporting or over-interpretation of particular subgroups or particular subgroup analyses should be avoided. This is a problem especially when multiple subgroup analyses are performed. This does not preclude the use of sensible and honest post hoc subgroup analyses.

10.11.5.3 Select a small number of characteristics

The likelihood of a false-positive result among subgroup analyses and meta-regression increases with the number of characteristics investigated. It is difficult to suggest a maximum number of characteristics to look at, especially since the number of available studies is unknown in advance. If more than one or two characteristics are investigated it may be sensible to adjust the level of significance to account for making multiple comparisons.

10.11.5.4 Ensure there is scientific rationale for investigating each characteristic

Selection of characteristics should be motivated by biological and clinical hypotheses, ideally supported by evidence from sources other than the included studies. Subgroup analyses using characteristics that are implausible or clinically irrelevant are not likely to be useful and should be avoided. For example, a relationship between intervention effect and year of publication is seldom in itself clinically informative, and if identified runs the risk of initiating a post-hoc data dredge of factors that may have changed over time.

Prognostic factors are those that predict the outcome of a disease or condition, whereas effect modifiers are factors that influence how well an intervention works in affecting the outcome. Confusion between prognostic factors and effect modifiers is common in planning subgroup analyses, especially at the protocol stage. Prognostic factors are not good candidates for subgroup analyses unless they are also believed to modify the effect of intervention. For example, being a smoker may be a strong predictor of mortality within the next ten years, but there may not be reason for it to influence the effect of a drug therapy on mortality (Deeks 1998). Potential effect modifiers may include participant characteristics (age, setting), the precise interventions (dose of active intervention, choice of comparison intervention), how the study was done (length of follow-up) or methodology (design and quality).

10.11.5.5 Be aware that the effect of a characteristic may not always be identified

Many characteristics that might have important effects on how well an intervention works cannot be investigated using subgroup analysis or meta-regression. These are characteristics of participants that might vary substantially within studies, but that can only be summarized at the level of the study. An example is age. Consider a collection of clinical trials involving adults ranging from 18 to 60 years old. There may be a strong relationship between age and intervention effect that is apparent within each study. However, if the mean ages for the trials are similar, then no relationship will be apparent by looking at trial mean ages and trial-level effect estimates. The problem is one of aggregating individuals’ results and is variously known as aggregation bias, ecological bias or the ecological fallacy (Morgenstern 1982, Greenland 1987, Berlin et al 2002). It is even possible for the direction of the relationship across studies to be the opposite of the direction of the relationship observed within each study.

10.11.5.6 Think about whether the characteristic is closely related to another characteristic (confounded)

The problem of ‘confounding’ complicates interpretation of subgroup analyses and meta-regressions and can lead to incorrect conclusions. Two characteristics are confounded if their influences on the intervention effect cannot be disentangled. For example, if those studies implementing an intensive version of a therapy happened to be the studies that involved patients with more severe disease, then one cannot tell which aspect is the cause of any difference in effect estimates between these studies and others. In meta-regression, co-linearity between potential effect modifiers leads to similar difficulties (Berlin and Antman 1994). Computing correlations between study characteristics will give some information about which study characteristics may be confounded with each other.

10.11.6 Interpretation of subgroup analyses and meta-regressions

Appropriate interpretation of subgroup analyses and meta-regressions requires caution (Oxman and Guyatt 1992).

  • Subgroup comparisons are observational. It must be remembered that subgroup analyses and meta-regressions are entirely observational in their nature. These analyses investigate differences between studies. Even if individuals are randomized to one group or other within a clinical trial, they are not randomized to go in one trial or another. Hence, subgroup analyses suffer the limitations of any observational investigation, including possible bias through confounding by other study-level characteristics. Furthermore, even a genuine difference between subgroups is not necessarily due to the classification of the subgroups. As an example, a subgroup analysis of bone marrow transplantation for treating leukaemia might show a strong association between the age of a sibling donor and the success of the transplant. However, this probably does not mean that the age of donor is important. In fact, the age of the recipient is probably a key factor and the subgroup finding would simply be due to the strong association between the age of the recipient and the age of their sibling.  
  • Was the analysis pre-specified or post hoc? Authors should state whether subgroup analyses were pre-specified or undertaken after the results of the studies had been compiled (post hoc). More reliance may be placed on a subgroup analysis if it was one of a small number of pre-specified analyses. Performing numerous post-hoc subgroup analyses to explain heterogeneity is a form of data dredging. Data dredging is condemned because it is usually possible to find an apparent, but false, explanation for heterogeneity by considering lots of different characteristics.  
  • Is there indirect evidence in support of the findings? Differences between subgroups should be clinically plausible and supported by other external or indirect evidence, if they are to be convincing.  
  • Is the magnitude of the difference practically important? If the magnitude of a difference between subgroups will not result in different recommendations for different subgroups, then it may be better to present only the overall analysis results.  
  • Is there a statistically significant difference between subgroups? To establish whether there is a different effect of an intervention in different situations, the magnitudes of effects in different subgroups should be compared directly with each other. In particular, statistical significance of the results within separate subgroup analyses should not be compared (see Section 10.11.3.1 ).  
  • Are analyses looking at within-study or between-study relationships? For patient and intervention characteristics, differences in subgroups that are observed within studies are more reliable than analyses of subsets of studies. If such within-study relationships are replicated across studies then this adds confidence to the findings.

10.11.7 Investigating the effect of underlying risk

One potentially important source of heterogeneity among a series of studies is when the underlying average risk of the outcome event varies between the studies. The underlying risk of a particular event may be viewed as an aggregate measure of case-mix factors such as age or disease severity. It is generally measured as the observed risk of the event in the comparator group of each study (the comparator group risk, or CGR). The notion is controversial in its relevance to clinical practice since underlying risk represents a summary of both known and unknown risk factors. Problems also arise because comparator group risk will depend on the length of follow-up, which often varies across studies. However, underlying risk has received particular attention in meta-analysis because the information is readily available once dichotomous data have been prepared for use in meta-analyses. Sharp provides a full discussion of the topic (Sharp 2001).

Intuition would suggest that participants are more or less likely to benefit from an effective intervention according to their risk status. However, the relationship between underlying risk and intervention effect is a complicated issue. For example, suppose an intervention is equally beneficial in the sense that for all patients it reduces the risk of an event, say a stroke, to 80% of the underlying risk. Then it is not equally beneficial in terms of absolute differences in risk in the sense that it reduces a 50% stroke rate by 10 percentage points to 40% (number needed to treat=10), but a 20% stroke rate by 4 percentage points to 16% (number needed to treat=25).
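
The arithmetic in this example can be checked directly:

```python
# A constant risk ratio of 0.8 gives different absolute benefits (and numbers
# needed to treat) at different underlying (comparator group) risks.
for baseline_risk in (0.50, 0.20):
    treated_risk = 0.8 * baseline_risk
    arr = baseline_risk - treated_risk          # absolute risk reduction
    nnt = 1 / arr                               # number needed to treat
    print(f"CGR {baseline_risk:.0%}: treated {treated_risk:.0%}, ARR {arr:.0%}, NNT {nnt:.0f}")
```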

Use of different summary statistics (risk ratio, odds ratio and risk difference) will demonstrate different relationships with underlying risk. Summary statistics that show close to no relationship with underlying risk are generally preferred for use in meta-analysis (see Section 10.4.3 ).

Investigating any relationship between effect estimates and the comparator group risk is also complicated by a technical phenomenon known as regression to the mean. This arises because the comparator group risk forms an integral part of the effect estimate. A high risk in a comparator group, observed entirely by chance, will on average give rise to a higher than expected effect estimate, and vice versa. This phenomenon results in a false correlation between effect estimates and comparator group risks. There are methods, which require sophisticated software, that correct for regression to the mean (McIntosh 1996, Thompson et al 1997). These should be used for such analyses, and statistical expertise is recommended.

10.11.8 Dose-response analyses

The principles of meta-regression can be applied to the relationships between intervention effect and dose (commonly termed dose-response), treatment intensity or treatment duration (Greenland and Longnecker 1992, Berlin et al 1993). Conclusions about differences in effect due to differences in dose (or similar factors) are on stronger ground if participants are randomized to one dose or another within a study and a consistent relationship is found across similar studies. While authors should consider these effects, particularly as a possible explanation for heterogeneity, they should be cautious about drawing conclusions based on between-study differences. Authors should be particularly cautious about claiming that a dose-response relationship does not exist, given the low power of many meta-regression analyses to detect genuine relationships.

10.12 Missing data

10.12.1 Types of missing data

There are many potential sources of missing data in a systematic review or meta-analysis (see Table 10.12.a ). For example, a whole study may be missing from the review, an outcome may be missing from a study, summary data may be missing for an outcome, and individual participants may be missing from the summary data. Here we discuss a variety of potential sources of missing data, highlighting where more detailed discussions are available elsewhere in the Handbook .

Whole studies may be missing from a review because they are never published, are published in obscure places, are rarely cited, or are inappropriately indexed in databases. Thus, review authors should always be aware of the possibility that they have failed to identify relevant studies. There is a strong possibility that such studies are missing because of their ‘uninteresting’ or ‘unwelcome’ findings (that is, in the presence of publication bias). This problem is discussed at length in Chapter 13 . Details of comprehensive search methods are provided in Chapter 4 .

Some studies might not report any information on outcomes of interest to the review. For example, there may be no information on quality of life, or on serious adverse effects. It is often difficult to determine whether this is because the outcome was not measured or because the outcome was not reported. Furthermore, failure to report that outcomes were measured may be dependent on the unreported results (selective outcome reporting bias; see Chapter 7, Section 7.2.3.3 ). Similarly, summary data for an outcome, in a form that can be included in a meta-analysis, may be missing. A common example is missing standard deviations (SDs) for continuous outcomes. This is often a problem when change-from-baseline outcomes are sought. We discuss imputation of missing SDs in Chapter 6, Section 6.5.2.8 . Other examples of missing summary data are missing sample sizes (particularly those for each intervention group separately), numbers of events, standard errors, follow-up times for calculating rates, and sufficient details of time-to-event outcomes. Inappropriate analyses of studies, for example of cluster-randomized and crossover trials, can lead to missing summary data. It is sometimes possible to approximate the correct analyses of such studies, for example by imputing correlation coefficients or SDs, as discussed in Chapter 23, Section 23.1 , for cluster-randomized studies and Chapter 23,Section 23.2 , for crossover trials. As a general rule, most methodologists believe that missing summary data (e.g. ‘no usable data’) should not be used as a reason to exclude a study from a systematic review. It is more appropriate to include the study in the review, and to discuss the potential implications of its absence from a meta-analysis.

It is likely that in some, if not all, included studies, there will be individuals missing from the reported results. Review authors are encouraged to consider this problem carefully (see MECIR Box 10.12.a ). We provide further discussion of this problem in Section 10.12.3 ; see also Chapter 8, Section 8.5 .

Missing data can also affect subgroup analyses. If subgroup analyses or meta-regressions are planned (see Section 10.11 ), they require details of the study-level characteristics that distinguish studies from one another. If these are not available for all studies, review authors should consider asking the study authors for more information.

Table 10.12.a Types of missing data in a meta-analysis, with some possible reasons

  • Missing studies: publication bias; search not sufficiently comprehensive.
  • Missing outcomes: outcome not measured; selective reporting bias.
  • Missing summary data: selective reporting bias; incomplete reporting.
  • Missing individuals: lack of intention-to-treat analysis; attrition from the study; selective reporting bias.
  • Missing study-level characteristics (for subgroup analysis or meta-regression): characteristic not measured; incomplete reporting.

MECIR Box 10.12.a Relevant expectations for conduct of intervention reviews

Addressing missing outcome data

Incomplete outcome data can introduce bias. In most circumstances, authors should follow the principles of intention-to-treat analyses as far as possible (this may not be appropriate for adverse effects or if trying to demonstrate equivalence). Risk of bias due to incomplete outcome data is addressed in the Cochrane risk-of-bias tool. However, statistical analyses and careful interpretation of results are additional ways in which the issue can be addressed by review authors. Imputation methods can be considered (accompanied by, or in the form of, sensitivity analyses).

10.12.2 General principles for dealing with missing data

There is a large literature of statistical methods for dealing with missing data. Here we briefly review some key concepts and make some general recommendations for Cochrane Review authors. It is important to think why data may be missing. Statisticians often use the terms ‘missing at random’ and ‘not missing at random’ to represent different scenarios.

Data are said to be ‘missing at random’ if the fact that they are missing is unrelated to actual values of the missing data. For instance, if some quality-of-life questionnaires were lost in the postal system, this would be unlikely to be related to the quality of life of the trial participants who completed the forms. In some circumstances, statisticians distinguish between data ‘missing at random’ and data ‘missing completely at random’, although in the context of a systematic review the distinction is unlikely to be important. Data that are missing at random may not be important. Analyses based on the available data will often be unbiased, although based on a smaller sample size than the original data set.

Data are said to be ‘not missing at random’ if the fact that they are missing is related to the actual missing data. For instance, in a depression trial, participants who had a relapse of depression might be less likely to attend the final follow-up interview, and more likely to have missing outcome data. Such data are ‘non-ignorable’ in the sense that an analysis of the available data alone will typically be biased. Publication bias and selective reporting bias lead by definition to data that are ‘not missing at random’, and attrition and exclusions of individuals within studies often do as well.

The principal options for dealing with missing data are:

  1. analysing only the available data (i.e. ignoring the missing data);
  2. imputing the missing data with replacement values, and treating these as if they were observed (e.g. last observation carried forward, imputing an assumed outcome such as assuming all were poor outcomes, imputing the mean, imputing based on predicted values from a regression analysis);
  3. imputing the missing data and accounting for the fact that these were imputed with uncertainty (e.g. multiple imputation, simple imputation methods (as point 2) with adjustment to the standard error); and
  4. using statistical models to allow for missing data, making assumptions about their relationships with the available data.

Option 2 is practical in most circumstances and very commonly used in systematic reviews. However, it fails to acknowledge uncertainty in the imputed values and results, typically, in confidence intervals that are too narrow. Options 3 and 4 would require involvement of a knowledgeable statistician.

Five general recommendations for dealing with missing data in Cochrane Reviews are as follows:

  • Whenever possible, contact the original investigators to request missing data.
  • Make explicit the assumptions of any methods used to address missing data: for example, that the data are assumed missing at random, or that missing values were assumed to have a particular value such as a poor outcome.
  • Follow the guidance in Chapter 8 to assess risk of bias due to missing outcome data in randomized trials.
  • Perform sensitivity analyses to assess how sensitive results are to reasonable changes in the assumptions that are made (see Section 10.14 ).
  • Address the potential impact of missing data on the findings of the review in the Discussion section.

10.12.3 Dealing with missing outcome data from individual participants

Review authors may undertake sensitivity analyses to assess the potential impact of missing outcome data, based on assumptions about the relationship between missingness in the outcome and its true value. Several methods are available (Akl et al 2015). For dichotomous outcomes, Higgins and colleagues propose a strategy involving different assumptions about how the risk of the event among the missing participants differs from the risk of the event among the observed participants, taking account of uncertainty introduced by the assumptions (Higgins et al 2008a). Akl and colleagues propose a suite of simple imputation methods, including a similar approach to that of Higgins and colleagues based on relative risks of the event in missing versus observed participants. Similar ideas can be applied to continuous outcome data (Ebrahim et al 2013, Ebrahim et al 2014). Particular care is required to avoid double counting events, since it can be unclear whether reported numbers of events in trial reports apply to the full randomized sample or only to those who did not drop out (Akl et al 2016).
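
In the spirit of these approaches (though not a reproduction of any specific published method), the sketch below imputes events among participants with missing dichotomous outcome data under different assumptions about how their event risk compares with that of observed participants, and shows how the risk ratio moves; all numbers are invented.

```python
# Sketch: sensitivity of a risk ratio to assumptions about missing dichotomous outcome data.
def risk_ratio_with_imputation(events, observed, missing, ratio_missing_vs_observed):
    """Risk ratio after imputing events among missing participants in each arm."""
    risks = []
    for ev, obs, miss, rel in zip(events, observed, missing, ratio_missing_vs_observed):
        risk_obs = ev / obs
        imputed_events = rel * risk_obs * miss       # assumed events among the missing
        risks.append((ev + imputed_events) / (obs + miss))
    return risks[0] / risks[1]                       # intervention risk / control risk

events = (30, 45)        # observed events in intervention and control arms
observed = (150, 148)    # participants with outcome data
missing = (20, 25)       # participants with missing outcome data

# Assume missing control participants have the same risk as observed controls,
# while missing intervention participants have 1x, 1.5x or 2x the observed risk.
for rel in (1.0, 1.5, 2.0):
    rr = risk_ratio_with_imputation(events, observed, missing, (rel, 1.0))
    print(f"assumed relative risk among missing (intervention arm) = {rel}: RR = {rr:.2f}")
```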

Although there is a tradition of implementing ‘worst case’ and ‘best case’ analyses clarifying the extreme boundaries of what is theoretically possible, such analyses may not be informative for the most plausible scenarios (Higgins et al 2008a).

10.13 Bayesian approaches to meta-analysis

Bayesian statistics is an approach to statistics based on a different philosophy from that which underlies significance tests and confidence intervals. It is essentially about updating of evidence. In a Bayesian analysis, initial uncertainty is expressed through a prior distribution about the quantities of interest. Current data and assumptions concerning how they were generated are summarized in the likelihood . The posterior distribution for the quantities of interest can then be obtained by combining the prior distribution and the likelihood. The likelihood summarizes both the data from studies included in the meta-analysis (for example, 2×2 tables from randomized trials) and the meta-analysis model (for example, assuming a fixed effect or random effects). The result of the analysis is usually presented as a point estimate and 95% credible interval from the posterior distribution for each quantity of interest, which look much like classical estimates and confidence intervals. Potential advantages of Bayesian analyses are summarized in Box 10.13.a . Bayesian analysis may be performed using WinBUGS software (Smith et al 1995, Lunn et al 2000), within R (Röver 2017), or – for some applications – using standard meta-regression software with a simple trick (Rhodes et al 2016).

A difference between Bayesian analysis and classical meta-analysis is that the interpretation is directly in terms of belief: a 95% credible interval for an odds ratio is that region in which we believe the odds ratio to lie with probability 95%. This is how many practitioners actually interpret a classical confidence interval, but strictly in the classical framework the 95% refers to the long-term frequency with which 95% intervals contain the true value. The Bayesian framework also allows a review author to calculate the probability that the odds ratio has a particular range of values, which cannot be done in the classical framework. For example, we can determine the probability that the odds ratio is less than 1 (which might indicate a beneficial effect of an experimental intervention), or that it is no larger than 0.8 (which might indicate a clinically important effect). It should be noted that these probabilities are specific to the choice of the prior distribution. Different meta-analysts may analyse the same data using different prior distributions and obtain different results. It is therefore important to carry out sensitivity analyses to investigate how the results depend on any assumptions made.

In the context of a meta-analysis, prior distributions are needed for the particular intervention effect being analysed (such as the odds ratio or the mean difference) and – in the context of a random-effects meta-analysis – on the amount of heterogeneity among intervention effects across studies. Prior distributions may represent subjective belief about the size of the effect, or may be derived from sources of evidence not included in the meta-analysis, such as information from non-randomized studies of the same intervention or from randomized trials of other interventions. The width of the prior distribution reflects the degree of uncertainty about the quantity. When there is little or no information, a ‘non-informative’ prior can be used, in which all values across the possible range are equally likely.
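
As a toy illustration of these ideas, the sketch below performs a conjugate normal-normal Bayesian fixed-effect meta-analysis of invented log odds ratios with a vague prior, and reads off a 95% credible interval and the probability that the odds ratio is below 1. A realistic random-effects Bayesian analysis would also place a prior on the heterogeneity and would normally be fitted in dedicated software such as WinBUGS or R.

```python
# Toy conjugate Bayesian fixed-effect meta-analysis on the log odds ratio scale.
import numpy as np
from scipy import stats

log_or = np.array([-0.35, -0.10, -0.50, -0.20])   # study log odds ratios (invented)
se = np.array([0.20, 0.25, 0.30, 0.15])

prior_mean, prior_sd = 0.0, 2.0                    # vague (weakly informative) prior

post_precision = 1 / prior_sd**2 + np.sum(1 / se**2)
post_mean = (prior_mean / prior_sd**2 + np.sum(log_or / se**2)) / post_precision
post_sd = np.sqrt(1 / post_precision)

ci = np.exp(stats.norm.ppf([0.025, 0.975], loc=post_mean, scale=post_sd))
p_benefit = stats.norm.cdf(0, loc=post_mean, scale=post_sd)   # P(log OR < 0) = P(OR < 1)
print(f"posterior OR {np.exp(post_mean):.2f}, 95% credible interval {ci.round(2)}")
print(f"P(OR < 1) = {p_benefit:.3f}")
```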

Most Bayesian meta-analyses use non-informative (or very weakly informative) prior distributions to represent beliefs about intervention effects, since many regard it as controversial to combine objective trial data with subjective opinion. However, prior distributions are increasingly used for the extent of among-study variation in a random-effects analysis. This is particularly advantageous when the number of studies in the meta-analysis is small, say fewer than five or ten. Libraries of data-based prior distributions are available that have been derived from re-analyses of many thousands of meta-analyses in the Cochrane Database of Systematic Reviews (Turner et al 2012).

Box 10.13.a Some potential advantages of Bayesian meta-analysis

Some potential advantages of Bayesian approaches over classical methods for meta-analyses are that they:

  • allow direct probability statements about the quantities of interest (for example, the probability that the odds ratio is less than 1);
  • incorporate relevant external evidence or prior belief through the prior distribution;
  • allow fully for the uncertainty in the amount of between-study heterogeneity, which is particularly helpful when there are few studies; and
  • extend naturally to decision-making by incorporating the utilities of various clinical outcome states.

Statistical expertise is strongly recommended for review authors who wish to carry out Bayesian analyses. There are several good texts (Sutton et al 2000, Sutton and Abrams 2001, Spiegelhalter et al 2004).

10.14 Sensitivity analyses

The process of undertaking a systematic review involves a sequence of decisions. Whilst many of these decisions are clearly objective and non-contentious, some will be somewhat arbitrary or unclear. For instance, if eligibility criteria involve a numerical value, the choice of value is usually arbitrary: for example, defining groups of older people may reasonably have lower limits of 60, 65, 70 or 75 years, or any value in between. Other decisions may be unclear because a study report fails to include the required information. Some decisions are unclear because the included studies themselves never obtained the information required: for example, the outcomes of those who were lost to follow-up. Further decisions are unclear because there is no consensus on the best statistical method to use for a particular problem.

It is highly desirable to prove that the findings from a systematic review are not dependent on such arbitrary or unclear decisions by using sensitivity analysis (see MECIR Box 10.14.a ). A sensitivity analysis is a repeat of the primary analysis or meta-analysis in which alternative decisions or ranges of values are substituted for decisions that were arbitrary or unclear. For example, if the eligibility of some studies in the meta-analysis is dubious because they do not contain full details, sensitivity analysis may involve undertaking the meta-analysis twice: the first time including all studies and, second, including only those that are definitely known to be eligible. A sensitivity analysis asks the question, ‘Are the findings robust to the decisions made in the process of obtaining them?’
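
A minimal illustration of this idea, with invented data and an arbitrary flag for studies of doubtful eligibility, is to repeat the same pooled analysis with and without the dubious studies.

```python
# Sketch: repeat a fixed-effect, inverse-variance meta-analysis including and then
# excluding studies whose eligibility is in doubt (all values hypothetical).
import numpy as np

effects = np.array([0.10, 0.30, 0.25, 0.60, 0.05])
se = np.array([0.12, 0.20, 0.15, 0.35, 0.30])
dubious = np.array([False, False, True, False, True])   # eligibility uncertain

def pooled(effects, se):
    w = 1 / se**2
    return round(np.sum(w * effects) / np.sum(w), 3), round(np.sqrt(1 / np.sum(w)), 3)

print("all studies:        ", pooled(effects, se))
print("definitely eligible:", pooled(effects[~dubious], se[~dubious]))
```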

MECIR Box 10.14.a Relevant expectations for conduct of intervention reviews

Sensitivity analysis

It is important to be aware of whether results are robust, since the strength of the conclusions may be correspondingly strengthened or weakened.

There are many decision nodes within the systematic review process that can generate a need for a sensitivity analysis. Examples include:

Searching for studies:

  • Should abstracts whose results cannot be confirmed in subsequent publications be included in the review?

Eligibility criteria:

  • Characteristics of participants: where a majority but not all people in a study meet an age range, should the study be included?
  • Characteristics of the intervention: what range of doses should be included in the meta-analysis?
  • Characteristics of the comparator: what criteria are required to define usual care to be used as a comparator group?
  • Characteristics of the outcome: what time point or range of time points are eligible for inclusion?
  • Study design: should blinded and unblinded outcome assessment be included, or should study inclusion be restricted by other aspects of methodological criteria?

What data should be analysed?

  • Time-to-event data: what assumptions of the distribution of censored data should be made?
  • Continuous data: where standard deviations are missing, when and how should they be imputed? Should analyses be based on change scores or on post-intervention values?
  • Ordinal scales: what cut-point should be used to dichotomize short ordinal scales into two groups?
  • Cluster-randomized trials: what values of the intraclass correlation coefficient should be used when trial analyses have not been adjusted for clustering?
  • Crossover trials: what values of the within-subject correlation coefficient should be used when this is not available in primary reports?
  • All analyses: what assumptions should be made about missing outcomes? Should adjusted or unadjusted estimates of intervention effects be used?

Analysis methods:

  • Should fixed-effect or random-effects methods be used for the analysis?
  • For dichotomous outcomes, should odds ratios, risk ratios or risk differences be used?
  • For continuous outcomes, where several scales have assessed the same dimension, should results be analysed as a standardized mean difference across all scales or as mean differences individually for each scale?

Some sensitivity analyses can be pre-specified in the study protocol, but many issues suitable for sensitivity analysis are only identified during the review process where the individual peculiarities of the studies under investigation are identified. When sensitivity analyses show that the overall result and conclusions are not affected by the different decisions that could be made during the review process, the results of the review can be regarded with a higher degree of certainty. Where sensitivity analyses identify particular decisions or missing information that greatly influence the findings of the review, greater resources can be deployed to try and resolve uncertainties and obtain extra information, possibly through contacting trial authors and obtaining individual participant data. If this cannot be achieved, the results must be interpreted with an appropriate degree of caution. Such findings may generate proposals for further investigations and future research.

Reporting of sensitivity analyses in a systematic review may best be done by producing a summary table. Rarely is it informative to produce individual forest plots for each sensitivity analysis undertaken.

Sensitivity analyses are sometimes confused with subgroup analysis. Although some sensitivity analyses involve restricting the analysis to a subset of the totality of studies, the two methods differ in two ways. First, sensitivity analyses do not attempt to estimate the effect of the intervention in the group of studies removed from the analysis, whereas in subgroup analyses, estimates are produced for each subgroup. Second, in sensitivity analyses, informal comparisons are made between different ways of estimating the same thing, whereas in subgroup analyses, formal statistical comparisons are made across the subgroups.

10.15 Chapter information

Editors: Jonathan J Deeks, Julian PT Higgins, Douglas G Altman; on behalf of the Cochrane Statistical Methods Group

Contributing authors: Douglas Altman, Deborah Ashby, Jacqueline Birks, Michael Borenstein, Marion Campbell, Jonathan Deeks, Matthias Egger, Julian Higgins, Joseph Lau, Keith O’Rourke, Gerta Rücker, Rob Scholten, Jonathan Sterne, Simon Thompson, Anne Whitehead

Acknowledgements: We are grateful to the following for commenting helpfully on earlier drafts: Bodil Als-Nielsen, Deborah Ashby, Jesse Berlin, Joseph Beyene, Jacqueline Birks, Michael Bracken, Marion Campbell, Chris Cates, Wendong Chen, Mike Clarke, Albert Cobos, Esther Coren, Francois Curtin, Roberto D’Amico, Keith Dear, Heather Dickinson, Diana Elbourne, Simon Gates, Paul Glasziou, Christian Gluud, Peter Herbison, Sally Hollis, David Jones, Steff Lewis, Tianjing Li, Joanne McKenzie, Philippa Middleton, Nathan Pace, Craig Ramsey, Keith O’Rourke, Rob Scholten, Guido Schwarzer, Jack Sinclair, Jonathan Sterne, Simon Thompson, Andy Vail, Clarine van Oel, Paula Williamson and Fred Wolf.

Funding: JJD received support from the National Institute for Health Research (NIHR) Birmingham Biomedical Research Centre at the University Hospitals Birmingham NHS Foundation Trust and the University of Birmingham. JPTH is a member of the NIHR Biomedical Research Centre at University Hospitals Bristol NHS Foundation Trust and the University of Bristol. JPTH received funding from National Institute for Health Research Senior Investigator award NF-SI-0617-10145. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.


Systematic Reviews and Meta Analysis


Systematic review Q & A

What is a systematic review?

A systematic review is a guided filtering and synthesis of all available evidence addressing a specific, focused research question, generally about a specific intervention or exposure. The use of standardized, systematic methods and pre-selected eligibility criteria reduces the risk of bias in identifying, selecting and analyzing relevant studies. A well-designed systematic review includes clear objectives, pre-selected criteria for identifying eligible studies, an explicit methodology, a thorough and reproducible search of the literature, an assessment of the validity or risk of bias of each included study, and a systematic synthesis, analysis and presentation of the findings of the included studies. A systematic review may include a meta-analysis.

For details about carrying out systematic reviews, see the Guides and Standards section of this guide.

Is my research topic appropriate for systematic review methods?

A systematic review is best deployed to test a specific hypothesis about a healthcare or public health intervention or exposure. By focusing on a single intervention or a few specific interventions for a particular condition, the investigator can ensure a manageable results set. Moreover, examining a single intervention or a small set of related interventions, exposures, or outcomes will simplify the assessment of studies and the synthesis of the findings.

Systematic reviews are poor tools for hypothesis generation: for instance, to determine what interventions have been used to increase the awareness and acceptability of a vaccine, or to investigate the ways that predictive analytics have been used in health care management. In the first case, we don't know what interventions to search for and so have to screen all the articles about awareness and acceptability. In the second, there is no agreed-upon set of methods that make up predictive analytics, and health care management is far too broad. The search will necessarily be incomplete, vague and very large all at the same time. In most cases, reviews without clearly and exactly specified populations, interventions, exposures, and outcomes will produce results sets that quickly outstrip the resources of a small team and offer no consistent way to assess and synthesize findings from the studies that are identified.

If not a systematic review, then what?

You might consider performing a scoping review. This framework allows iterative searching over a reduced number of data sources and does not require assessing individual studies for risk of bias. The framework includes built-in mechanisms to adjust the analysis as the work progresses and more is learned about the topic. A scoping review won't help you limit the number of records you'll need to screen (broad questions lead to large results sets), but it may give you a means of dealing with a large set of results.

This tool can help you decide what kind of review is right for your question.

Can my student complete a systematic review during her summer project?

Probably not. Systematic reviews are a lot of work. Between creating the protocol, building and running a quality search, collecting all the papers, evaluating the studies that meet the inclusion criteria, and extracting and analyzing the summary data, a well-done review can require dozens to hundreds of hours of work spanning several months. A systematic review also requires subject expertise, statistical support and a librarian to help design and run the search. Be aware that librarians sometimes have queues for their search time, so it may take several weeks to complete and run a search. Moreover, all guidelines for carrying out systematic reviews recommend that at least two subject experts screen the studies identified in the search. The first round of screening can consume 1 hour per screener for every 100-200 records. A systematic review is a labor-intensive team effort.

How can I know if my topic has been reviewed already?

Before starting out on a systematic review, check to see if someone has done it already. In PubMed you can use the systematic review subset to limit to a broad group of papers that is enriched for systematic reviews. You can invoke the subset by selecting it from the Article Types filters to the left of your PubMed results, or you can append AND systematic[sb] to your search. For example:

"neoadjuvant chemotherapy" AND systematic[sb]

The systematic review subset is very noisy, however. To quickly focus on systematic reviews (knowing that you may be missing some), simply search for the word systematic in the title:

"neoadjuvant chemotherapy" AND systematic[ti]

Any PRISMA-compliant systematic review will be captured by this method since including the words "systematic review" in the title is a requirement of the PRISMA checklist. Cochrane systematic reviews do not include 'systematic' in the title, however. It's worth checking the Cochrane Database of Systematic Reviews independently.

You can also search for protocols that will indicate that another group has set out on a similar project. Many investigators will register their protocols in PROSPERO, a registry of review protocols. Other published protocols as well as Cochrane Review protocols appear in the Cochrane Methodology Register, a part of the Cochrane Library.


What is Meta-Analysis? Definition, Research & Examples


Are you looking to harness the power of data and uncover meaningful insights from a multitude of research studies? In a world overflowing with information, meta-analysis emerges as a guiding light, offering a systematic and quantitative approach to distilling knowledge from a sea of research.

This guide will demystify the art and science of meta-analysis, walking you through the process, from defining your research question to interpreting the results. Whether you're an academic researcher, a policymaker, or a curious mind eager to explore the depths of data, this guide will equip you with the tools and understanding needed to undertake robust and impactful meta-analyses.

What is a Meta-Analysis?

Meta-analysis is a quantitative research method that involves the systematic synthesis and statistical analysis of data from multiple individual studies on a particular topic or research question. It aims to provide a comprehensive and robust summary of existing evidence by pooling the results of these studies, often leading to more precise and generalizable conclusions.

The primary purpose of meta-analysis is to:

  • Quantify Effect Sizes:  Determine the magnitude and direction of an effect or relationship across studies.
  • Evaluate Consistency:  Assess the consistency of findings among studies and identify sources of heterogeneity.
  • Enhance Statistical Power:  Increase the statistical power to detect significant effects by combining data from multiple studies.
  • Generalize Results:  Provide more generalizable results by analyzing a more extensive and diverse sample of participants or contexts.
  • Examine Subgroup Effects:  Explore whether the effect varies across different subgroups or study characteristics.

Importance of Meta-Analysis

Meta-analysis plays a crucial role in scientific research and evidence-based decision-making. Here are key reasons why meta-analysis is highly valuable:

  • Enhanced Precision:  By pooling data from multiple studies, meta-analysis provides a more precise estimate of the effect size, reducing the impact of random variation.
  • Increased Statistical Power:  The combination of numerous studies enhances statistical power, making it easier to detect small but meaningful effects.
  • Resolution of Inconsistencies:  Meta-analysis can help resolve conflicting findings in the literature by systematically analyzing and synthesizing evidence.
  • Identification of Moderators:  It allows for the identification of factors that may moderate the effect, helping to understand when and for whom interventions or treatments are most effective.
  • Evidence-Based Decision-Making:  Policymakers, clinicians, and researchers use meta-analysis to inform evidence-based decision-making, leading to more informed choices in healthcare, education, and other fields.
  • Efficient Use of Resources:  Meta-analysis can guide future research by identifying gaps in knowledge, reducing duplication of efforts, and directing resources to areas with the most significant potential impact.

Types of Research Questions Addressed

Meta-analysis can address a wide range of research questions across various disciplines. Some common types of research questions that meta-analysis can tackle include:

  • Treatment Efficacy:  Does a specific medical treatment, therapy, or intervention have a significant impact on patient outcomes or symptoms?
  • Intervention Effectiveness:  How effective are educational programs, training methods, or interventions in improving learning outcomes or skills?
  • Risk Factors and Associations:  What are the associations between specific risk factors, such as smoking or diet, and the likelihood of developing certain diseases or conditions?
  • Impact of Policies:  What is the effect of government policies, regulations, or interventions on social, economic, or environmental outcomes?
  • Psychological Constructs:  How do psychological constructs, such as self-esteem, anxiety, or motivation, influence behavior or mental health outcomes?
  • Comparative Effectiveness:  Which of two or more competing interventions or treatments is more effective for a particular condition or population?
  • Dose-Response Relationships:  Is there a dose-response relationship between exposure to a substance or treatment and the likelihood or severity of an outcome?

Meta-analysis is a versatile tool that can provide valuable insights into a wide array of research questions, making it an indispensable method in evidence synthesis and knowledge advancement.

Meta-Analysis vs. Systematic Review

In evidence synthesis and research aggregation, meta-analysis and systematic reviews are two commonly used methods, each serving distinct purposes while sharing some similarities. Let's explore the differences and similarities between these two approaches.

Meta-Analysis

  • Purpose:  Meta-analysis is a statistical technique used to combine and analyze quantitative data from multiple individual studies that address the same research question. The primary aim of meta-analysis is to provide a single summary effect size that quantifies the magnitude and direction of an effect or relationship across studies.
  • Data Synthesis:  In meta-analysis, researchers extract and analyze numerical data, such as means, standard deviations, correlation coefficients, or odds ratios, from each study. These effect size estimates are then combined using statistical methods to generate an overall effect size and associated confidence interval.
  • Quantitative:  Meta-analysis is inherently quantitative, focusing on numerical data and statistical analyses to derive a single effect size estimate.
  • Main Outcome:  The main outcome of a meta-analysis is the summary effect size, which provides a quantitative estimate of the research question's answer.

Systematic Review

  • Purpose:  A systematic review is a comprehensive and structured overview of the available evidence on a specific research question. While systematic reviews may include meta-analysis, their primary goal is to provide a thorough and unbiased summary of the existing literature.
  • Data Synthesis:  Systematic reviews involve a meticulous process of literature search, study selection, data extraction, and quality assessment. Researchers may narratively synthesize the findings, providing a qualitative summary of the evidence.
  • Qualitative:  Systematic reviews are often qualitative in nature, summarizing and synthesizing findings in a narrative format. They do not always involve statistical analysis.
  • Main Outcome:  The primary outcome of a systematic review is a comprehensive narrative summary of the existing evidence. While some systematic reviews include meta-analyses, not all do so.

Key Differences

  • Nature of Data:  Meta-analysis primarily deals with quantitative data and statistical analysis, while systematic reviews encompass both quantitative and qualitative data, often presenting findings in a narrative format.
  • Focus on Effect Size:  Meta-analysis focuses on deriving a single, quantitative effect size estimate, whereas systematic reviews emphasize providing a comprehensive overview of the literature, including study characteristics, methodologies, and key findings.
  • Synthesis Approach:  Meta-analysis is a quantitative synthesis method, while systematic reviews may use both quantitative and qualitative synthesis approaches.

Commonalities

  • Structured Process:  Both meta-analyses and systematic reviews follow a structured and systematic process for literature search, study selection, data extraction, and quality assessment.
  • Evidence-Based:  Both approaches aim to provide evidence-based answers to specific research questions, offering valuable insights for decision-making in various fields.
  • Transparency:  Both meta-analyses and systematic reviews prioritize transparency and rigor in their methodologies to minimize bias and enhance the reliability of their findings.

While meta-analysis and systematic reviews share the overarching goal of synthesizing research evidence, they differ in their approach and main outcomes. Meta-analysis is quantitative, focusing on effect sizes, while systematic reviews provide comprehensive overviews, utilizing both quantitative and qualitative data to summarize the literature. Depending on the research question and available data, one or both of these methods may be employed to provide valuable insights for evidence-based decision-making.

How to Conduct a Meta-Analysis?

Planning a meta-analysis is a critical phase that lays the groundwork for a successful and meaningful study. We will explore each component of the planning process in more detail, ensuring you have a solid foundation before diving into data analysis.

How to Formulate Research Questions?

Your research questions are the guiding compass of your meta-analysis. They should be precise and tailored to the topic you're investigating. To craft effective research questions:

  • Clearly Define the Problem:  Start by identifying the specific problem or topic you want to address through meta-analysis.
  • Specify Key Variables:  Determine the essential variables or factors you'll examine in the included studies.
  • Frame Hypotheses:  If applicable, create clear hypotheses that your meta-analysis will test.

For example, if you're studying the impact of a specific intervention on patient outcomes, your research question might be: "What is the effect of Intervention X on Patient Outcome Y in published clinical trials?"

Eligibility Criteria

Eligibility criteria define the boundaries of your meta-analysis. By establishing clear criteria, you ensure that the studies you include are relevant and contribute to your research objectives. Key considerations for eligibility criteria include:

  • Study Types:  Decide which types of studies will be considered (e.g., randomized controlled trials, cohort studies, case-control studies).
  • Publication Time Frame:  Specify the publication date range for included studies.
  • Language:  Determine whether studies in languages other than your primary language will be included.
  • Geographic Region:  If relevant, define any geographic restrictions.

Your eligibility criteria should strike a balance between inclusivity and relevance. Excluding certain studies based on valid criteria ensures the quality and relevance of the data you analyze.

Search Strategy

A robust search strategy is fundamental to identifying all relevant studies. To create an effective search strategy:

  • Select Databases:  Choose appropriate databases that cover your research area (e.g., PubMed, Scopus, Web of Science).
  • Keywords and Search Terms:  Develop a comprehensive list of relevant keywords and search terms related to your research questions.
  • Search Filters:  Utilize search filters and Boolean operators (AND, OR) to refine your search queries.
  • Manual Searches:  Consider conducting hand-searches of key journals and reviewing the reference lists of relevant studies for additional sources.

Remember that the goal is to cast a wide net while maintaining precision to capture all relevant studies.
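
For example, a query for a hypothetical research question about mindfulness-based interventions and workplace stress might combine synonyms with OR and link distinct concepts with AND; the exact terms, field tags, and filters would depend on your own question and chosen database:

("mindfulness" OR meditation) AND ("workplace stress" OR "occupational stress" OR burnout)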

Data Extraction

Data extraction is the process of systematically collecting information from each selected study. It involves retrieving key data points, including:

  • Study Characteristics:  Author(s), publication year, study design, sample size, duration, and location.
  • Outcome Data:  Effect sizes, standard errors, confidence intervals, p-values, and any other relevant statistics.
  • Methodological Details:  Information on study quality, risk of bias, and potential sources of heterogeneity.

Creating a standardized data extraction form is essential to ensure consistency and accuracy throughout this phase. Spreadsheet software, such as Microsoft Excel, is commonly used for data extraction.
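
As a rough illustration, here is a minimal Python sketch of such a form written out as a CSV file; the field names and the single example record are hypothetical and should be adapted to your own protocol.

# A minimal sketch of a standardized extraction form; all field names and
# values below are hypothetical placeholders.
import csv

FIELDS = [
    "study_id", "first_author", "year", "design",
    "n_intervention", "n_control",
    "mean_intervention", "sd_intervention",
    "mean_control", "sd_control",
    "risk_of_bias",
]

records = [
    {"study_id": "S1", "first_author": "Smith", "year": 2019, "design": "RCT",
     "n_intervention": 52, "n_control": 49,
     "mean_intervention": 14.2, "sd_intervention": 3.1,
     "mean_control": 16.0, "sd_control": 3.4,
     "risk_of_bias": "low"},
]

with open("extraction_form.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()       # one header row with the agreed column names
    writer.writerows(records)  # one row per included study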

Quality Assessment

Assessing the quality of included studies is crucial to determine their reliability and potential impact on your meta-analysis. Various quality assessment tools and checklists are available, depending on the study design. Some commonly used tools include:

  • Newcastle-Ottawa Scale:  Used for assessing the quality of non-randomized studies (e.g., cohort, case-control studies).
  • Cochrane Risk of Bias Tool:  Designed for evaluating randomized controlled trials.

Quality assessment typically involves evaluating aspects such as study design, sample size, data collection methods, and potential biases. This step helps you weigh the contribution of each study to the overall analysis.

How to Conduct a Literature Review?

Conducting a thorough literature review is a critical step in the meta-analysis process. We will explore the essential components of a literature review, from designing a comprehensive search strategy to establishing clear inclusion and exclusion criteria and, finally, the study selection process.

Comprehensive Search

To ensure the success of your meta-analysis, it's imperative to cast a wide net when searching for relevant studies. A comprehensive search strategy involves:

  • Selecting Relevant Databases:  Identify databases that cover your research area comprehensively, such as PubMed, Scopus, Web of Science, or specialized databases specific to your field.
  • Creating a Keyword List:  Develop a list of relevant keywords and search terms related to your research questions. Think broadly and consider synonyms, acronyms, and variations.
  • Using Boolean Operators:  Utilize Boolean operators (AND, OR) to combine keywords effectively and refine your search.
  • Applying Filters:  Employ search filters (e.g., publication date range, study type) to narrow down results based on your eligibility criteria.

Remember that the goal is to leave no relevant stone unturned, as missing key studies can introduce bias into your meta-analysis.

Inclusion and Exclusion Criteria

Clearly defined inclusion and exclusion criteria are the gatekeepers of your meta-analysis. These criteria ensure that the studies you include meet your research objectives and maintain the quality of your analysis. Consider the following factors when establishing criteria:

  • Study Types:  Determine which types of studies are eligible for inclusion (e.g., randomized controlled trials, observational studies, case reports).
  • Publication Time Frame:  Specify the time frame within which studies must have been published.
  • Language:  Decide whether studies in languages other than your primary language will be included or excluded.
  • Geographic Region:  If applicable, define any geographic restrictions.
  • Relevance to Research Questions:  Ensure that selected studies align with your research questions and objectives.

Your inclusion and exclusion criteria should strike a balance between inclusivity and relevance. Rigorous criteria help maintain the quality and applicability of the studies included in your meta-analysis.

Study Selection Process

The study selection process involves systematically screening and evaluating each potential study to determine whether it meets your predefined inclusion criteria. Here's a step-by-step guide:

  • Screen Titles and Abstracts:  Begin by reviewing the titles and abstracts of the retrieved studies. Exclude studies that clearly do not meet your inclusion criteria.
  • Full-Text Assessment:  Assess the full text of potentially relevant studies to confirm their eligibility. Pay attention to study design, sample size, and other specific criteria.
  • Data Extraction:  For studies that meet your criteria, extract the necessary data, including study characteristics, effect sizes, and other relevant information.
  • Record Exclusions:  Keep a record of the reasons for excluding studies. This transparency is crucial for the reproducibility of your meta-analysis.
  • Resolve Discrepancies:  If multiple reviewers are involved, resolve any disagreements through discussion or a third-party arbitrator.

Maintaining a clear and organized record of your study selection process is essential for transparency and reproducibility. Software tools like EndNote or Covidence can facilitate the screening and data extraction process.
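
One simple way to keep that record machine-readable is to tally the counts that later feed a PRISMA-style flow diagram. The minimal Python sketch below uses hypothetical numbers and exclusion reasons purely for illustration.

# A minimal screening log; all counts and reasons are hypothetical.
from collections import Counter

records_identified = 1248              # total hits across all database searches
duplicates_removed = 312
titles_abstracts_screened = records_identified - duplicates_removed

abstract_exclusions = Counter({
    "wrong population": 401,
    "wrong intervention": 277,
    "not a primary study": 158,
})
full_texts_assessed = titles_abstracts_screened - sum(abstract_exclusions.values())

full_text_exclusions = Counter({"no usable outcome data": 48, "duplicate cohort": 12})
studies_included = full_texts_assessed - sum(full_text_exclusions.values())

print(f"Screened: {titles_abstracts_screened}, "
      f"assessed in full text: {full_texts_assessed}, "
      f"included: {studies_included}")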

By following these systematic steps in conducting a literature review, you ensure that your meta-analysis is built on a solid foundation of relevant and high-quality studies.

Data Extraction and Management

As you progress in your meta-analysis journey, the data extraction and management phase becomes paramount. We will delve deeper into the critical aspects of this phase, including the data collection process, data coding and transformation, and how to handle missing data effectively.

Data Collection Process

The data collection process is the heart of your meta-analysis, where you systematically extract essential information from each selected study. To ensure accuracy and consistency:

  • Create a Data Extraction Form:  Develop a standardized data extraction form that includes all the necessary fields for collecting relevant data. This form should align with your research questions and inclusion criteria.
  • Data Extractors:  Assign one or more reviewers to extract data from the selected studies. Ensure they are familiar with the form and the specific data points to collect.
  • Double-Check Accuracy:  Implement a verification process where a second reviewer cross-checks a random sample of data extractions to identify discrepancies or errors.
  • Extract All Relevant Information:  Collect data on study characteristics, participant demographics, outcome measures, effect sizes, confidence intervals, and any additional information required for your analysis.
  • Maintain Consistency:  Use clear guidelines and definitions for data extraction to ensure uniformity across studies.

Data Coding and Transformation

After data collection, you may need to code and transform the extracted data to ensure uniformity and compatibility across studies. This process involves:

  • Coding Categorical Variables:  If studies report data differently, code categorical variables consistently. For example, ensure that categories like "male" and "female" are coded consistently across studies.
  • Standardizing Units of Measurement:  Convert all measurements to a common unit if studies use different measurement units. For instance, if one study reports height in inches and another in centimeters, standardize to one unit for comparability.
  • Calculating Effect Sizes:  Calculate effect sizes and their standard errors or variances if they are not directly reported in the studies. Common effect size measures include Cohen's d, odds ratio (OR), and hazard ratio (HR).
  • Data Transformation:  Transform data if necessary to meet assumptions of statistical tests. Common transformations include log transformation for skewed data or arcsine transformation for proportions.
  • Heterogeneity Adjustment:  Consider using transformation methods to address heterogeneity among studies, such as applying the Freeman-Tukey double arcsine transformation for proportions.

The goal of data coding and transformation is to make sure that data from different studies are compatible and can be effectively synthesized during the analysis phase. Spreadsheet software like Excel or statistical software like R can be used for these tasks.
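
As a small illustration of these steps, the Python sketch below converts heights reported in inches to centimetres and log-transforms a right-skewed cost outcome; the numbers are invented, and the transformations you actually need depend on your own data.

# A minimal harmonization sketch; all values are illustrative.
import math

INCH_TO_CM = 2.54

study_1_heights_in = [64, 70, 68]                     # study reports inches
study_1_heights_cm = [h * INCH_TO_CM for h in study_1_heights_in]

skewed_costs = [120, 340, 95, 2100]                   # right-skewed outcome
log_costs = [math.log(x) for x in skewed_costs]       # log transformation

print([round(h, 1) for h in study_1_heights_cm])
print([round(v, 2) for v in log_costs])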

Handling Missing Data

Missing data is a common challenge in meta-analysis, and how you handle it can impact the validity and precision of your results. Strategies for handling missing data include:

  • Contact Authors:  If feasible, contact the authors of the original studies to request missing data or clarifications.
  • Imputation:  Consider using appropriate imputation methods to estimate missing values, but exercise caution and report the imputation methods used.
  • Sensitivity Analysis:  Conduct sensitivity analyses to assess the impact of missing data on your results by comparing the main analysis to alternative scenarios.

Remember that transparency in reporting how you handled missing data is crucial for the credibility of your meta-analysis.

By following these steps in data extraction and management, you will ensure the integrity and reliability of your meta-analysis dataset.

Meta-Analysis Example

Meta-analysis is a versatile research method that can be applied to various fields and disciplines, providing valuable insights by synthesizing existing evidence.

Example 1: Analyzing the Impact of Advertising Campaigns on Sales

Background:  A market research agency is tasked with assessing the effectiveness of advertising campaigns on sales outcomes for a range of consumer products. They have access to multiple studies and reports conducted by different companies, each analyzing the impact of advertising on sales revenue.

Meta-Analysis Approach:

  • Study Selection:  Identify relevant studies that meet specific inclusion criteria, such as the type of advertising campaign (e.g., TV commercials, social media ads), the products examined, and the sales metrics assessed.
  • Data Extraction:  Collect data from each study, including details about the advertising campaign (e.g., budget, duration), sales data (e.g., revenue, units sold), and any reported effect sizes or correlations.
  • Effect Size Calculation:  Calculate effect sizes (e.g., correlation coefficients) based on the data provided in each study, quantifying the strength and direction of the relationship between advertising and sales.
  • Data Synthesis:  Employ meta-analysis techniques to combine the effect sizes from the selected studies. Compute a summary effect size and its confidence interval to estimate the overall impact of advertising on sales.
  • Publication Bias Assessment:  Use funnel plots and statistical tests to assess the potential presence of publication bias, ensuring that the meta-analysis results are not unduly influenced by selective reporting.

Findings:  Through meta-analysis, the market research agency discovers that advertising campaigns have a statistically significant and positive impact on sales across various product categories. The findings provide evidence for the effectiveness of advertising efforts and assist companies in making data-driven decisions regarding their marketing strategies.

Examples like this illustrate how meta-analysis can be applied in diverse domains, from tech startups seeking to optimize user engagement to market research agencies evaluating the impact of advertising campaigns. By systematically synthesizing existing evidence, meta-analysis empowers decision-makers with valuable insights for informed choices and evidence-based strategies.

How to Assess Study Quality and Bias?

Ensuring the quality and reliability of the studies included in your meta-analysis is essential for drawing accurate conclusions. We'll show you how you can assess study quality using specific tools, evaluate potential bias, and address publication bias.

Quality Assessment Tools

Quality assessment tools provide structured frameworks for evaluating the methodological rigor of each included study. The choice of tool depends on the study design. Here are some commonly used quality assessment tools:

For Randomized Controlled Trials (RCTs):

  • Cochrane Risk of Bias Tool:  This tool assesses the risk of bias in RCTs based on six domains: random sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessment, incomplete outcome data, and selective reporting.
  • Jadad Scale:  A simpler tool specifically for RCTs, the Jadad Scale focuses on randomization, blinding, and the handling of withdrawals and dropouts.

For Observational Studies:

  • Newcastle-Ottawa Scale (NOS):  The NOS assesses the quality of cohort and case-control studies based on three categories: selection, comparability, and outcome.
  • ROBINS-I:  Designed for non-randomized studies of interventions, the Risk of Bias in Non-randomized Studies of Interventions tool evaluates bias in domains such as confounding, selection bias, and measurement bias.
  • MINORS:  The Methodological Index for Non-Randomized Studies (MINORS) assesses non-comparative studies and includes items related to study design, reporting, and statistical analysis.

Bias Assessment

Evaluating potential sources of bias is crucial to understanding the limitations of the included studies. Common sources of bias include:

  • Selection Bias:  Occurs when the selection of participants is not random or representative of the target population.
  • Performance Bias:  Arises when participants or researchers are aware of the treatment or intervention status, potentially influencing outcomes.
  • Detection Bias:  Occurs when outcome assessors are not blinded to the treatment groups.
  • Attrition Bias:  Results from incomplete data or differential loss to follow-up between treatment groups.
  • Reporting Bias:  Involves selective reporting of outcomes, where only positive or statistically significant results are published.

To assess bias, reviewers often use the quality assessment tools mentioned earlier, which include domains related to bias, or they may specifically address bias concerns in the narrative synthesis.

We'll move on to the core of meta-analysis: data synthesis. We'll explore different effect size measures, fixed-effect versus random-effects models, and techniques for assessing and addressing heterogeneity among studies.

Data Synthesis

Now that you've gathered data from multiple studies and assessed their quality, it's time to synthesize this information effectively.

Effect Size Measures

Effect size measures quantify the magnitude of the relationship or difference you're investigating in your meta-analysis. The choice of effect size measure depends on your research question and the type of data provided by the included studies. Here are some commonly used effect size measures:

Continuous Outcome Data:

  • Cohen's d:  Measures the standardized mean difference between two groups. It's suitable for continuous outcome variables.
  • Hedges' g:  Similar to Cohen's d but incorporates a correction factor for small sample sizes.

Binary Outcome Data:

  • Odds Ratio (OR):  Used for dichotomous outcomes, such as success/failure or presence/absence.
  • Risk Ratio (RR):  Similar to OR but used when the outcome is relatively common.
  • Risk Difference (RD):  Measures the absolute difference in event rates between two groups.

Time-to-Event Data:

  • Hazard Ratio (HR):  Used in survival analysis to assess the risk of an event occurring over time.

Selecting the appropriate effect size measure depends on the nature of your data and the research question. When effect sizes are not directly reported in the studies, you may need to calculate them using available data, such as means, standard deviations, and sample sizes.

Formula for Cohen's d:

d = (Mean of Group A - Mean of Group B) / Pooled Standard Deviation
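
A minimal Python sketch of this calculation, together with the Hedges' g small-sample correction mentioned above, could look as follows; the summary statistics are invented for illustration.

# Standardized mean difference from summary statistics (illustrative values).
from math import sqrt

def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Cohen's d using the pooled standard deviation."""
    pooled_sd = sqrt(((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2)
                     / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled_sd

def hedges_g(d, n_a, n_b):
    """Small-sample correction applied to Cohen's d."""
    correction = 1 - 3 / (4 * (n_a + n_b - 2) - 1)
    return correction * d

d = cohens_d(14.2, 3.1, 52, 16.0, 3.4, 49)   # hypothetical group summaries
print(round(d, 3), round(hedges_g(d, 52, 49), 3))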

Fixed-Effect vs. Random-Effects Models

In meta-analysis, you can choose between fixed-effect and random-effects models to combine the results of individual studies:

Fixed-Effect Model:

  • Assumes that all included studies share a common true effect size.
  • Accounts for only within-study variability (sampling error).
  • Appropriate when studies are very similar or when there's minimal heterogeneity.

Random-Effects Model:

  • Acknowledges that there may be variability in effect sizes across studies.
  • Accounts for both within-study variability (sampling error) and between-study variability (real differences between studies).
  • More conservative and applicable when there's substantial heterogeneity.

The choice between these models should be guided by the degree of heterogeneity observed among the included studies. If heterogeneity is significant, the random-effects model is often preferred, as it provides a more robust estimate of the overall effect.
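
To make the two models concrete, the Python sketch below pools hypothetical effect sizes with inverse-variance (fixed-effect) weights and then repeats the pooling after adding a DerSimonian-Laird estimate of the between-study variance (tau-squared). It is a simplified illustration, not a replacement for dedicated meta-analysis software.

# Fixed-effect and random-effects pooling; effect sizes and variances are illustrative.
effects   = [0.42, 0.31, 0.58, 0.12]     # e.g. standardized mean differences
variances = [0.04, 0.09, 0.05, 0.07]     # variance of each effect size

# Fixed-effect model: weight each study by the inverse of its variance.
w = [1 / v for v in variances]
fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)

# DerSimonian-Laird estimate of the between-study variance (tau^2).
k = len(effects)
q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))   # Cochran's Q
c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
tau2 = max(0.0, (q - (k - 1)) / c)

# Random-effects model: add tau^2 to each study's variance before weighting.
w_re = [1 / (v + tau2) for v in variances]
random_effects = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
se_re = (1 / sum(w_re)) ** 0.5

print(round(fixed, 3), round(random_effects, 3), round(tau2, 3), round(se_re, 3))

When tau-squared is greater than zero, the random-effects estimate gives relatively more weight to smaller studies and its confidence interval is wider, which is why it is generally the more conservative choice under heterogeneity.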

Forest Plots

Forest plots are graphical representations commonly used in meta-analysis to display the results of individual studies along with the combined summary estimate. Key components of a forest plot include:

  • Vertical Line:  Represents the null effect (e.g., no difference or no effect).
  • Horizontal Lines:  Represent the confidence intervals for each study's effect size estimate.
  • Diamond or Square:  Represents the summary effect size estimate, with its width indicating the confidence interval around the summary estimate.
  • Study Names:  Listed on the left side of the plot, identifying each study.

Forest plots help visualize the distribution of effect sizes across studies and provide insights into the consistency and direction of the findings.
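
A rough matplotlib sketch of such a plot is shown below; the study names, effect sizes, and confidence limits are illustrative, and dedicated meta-analysis packages produce far more complete forest plots.

# A bare-bones forest plot with made-up studies and a pooled "Summary" row.
import matplotlib.pyplot as plt

studies = ["Study A", "Study B", "Study C", "Summary"]
effects = [0.42, 0.31, 0.58, 0.43]
ci_low  = [0.03, -0.28, 0.14, 0.17]
ci_high = [0.81, 0.90, 1.02, 0.69]

fig, ax = plt.subplots(figsize=(6, 3))
rows = range(len(studies) - 1, -1, -1)              # first study at the top

for row, est, lo, hi, name in zip(rows, effects, ci_low, ci_high, studies):
    marker = "D" if name == "Summary" else "s"      # diamond marks the pooled result
    ax.plot([lo, hi], [row, row], color="black")    # confidence interval
    ax.plot(est, row, marker, color="black")        # point estimate

ax.axvline(0, linestyle="--", color="grey")         # null effect (no difference)
ax.set_yticks(list(rows))
ax.set_yticklabels(studies)
ax.set_xlabel("Effect size (e.g. Cohen's d)")
plt.tight_layout()
plt.show()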

Heterogeneity Assessment

Heterogeneity refers to the variability in effect sizes among the included studies. It's important to assess and understand heterogeneity as it can impact the interpretation of your meta-analysis results. Standard methods for assessing heterogeneity include:

  • Cochran's Q Test:  A statistical test that assesses whether there is significant heterogeneity among the effect sizes of the included studies.
  • I² Statistic:  A measure that quantifies the proportion of total variation in effect sizes that is due to heterogeneity. I² values range from 0% to 100%, with higher values indicating greater heterogeneity.

Assessing heterogeneity is crucial because it informs your choice of meta-analysis model (fixed-effect vs. random-effects) and whether subgroup analyses or sensitivity analyses are warranted to explore potential sources of heterogeneity.
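
For illustration, the short Python sketch below computes Cochran's Q and I² from hypothetical effect sizes and variances; the SciPy call that turns Q into a p-value is optional and assumes SciPy is installed.

# Heterogeneity statistics from illustrative effect sizes and variances.
effects   = [0.42, 0.31, 0.58, 0.12]
variances = [0.04, 0.09, 0.05, 0.07]

w = [1 / v for v in variances]
pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)

q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
df = len(effects) - 1
i_squared = max(0.0, (q - df) / q) * 100    # % of variation due to heterogeneity

try:
    from scipy.stats import chi2
    p_value = chi2.sf(q, df)                # p-value of the Q test
except ImportError:
    p_value = None

print(round(q, 3), round(i_squared, 1), p_value)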

How to Interpret Meta-Analysis Results?

With the data synthesis complete, it's time to make sense of the results of your meta-analysis.

Meta-Analytic Summary

The meta-analytic summary is the culmination of your efforts in data synthesis. It provides a consolidated estimate of the effect size and its confidence interval, combining the results of all included studies. To interpret the meta-analytic summary effectively:

  • Effect Size Estimate:  Understand the primary effect size estimate, such as Cohen's d, odds ratio, or hazard ratio, and its associated confidence interval.
  • Significance:  Determine whether the summary effect size is statistically significant. This is indicated when the confidence interval does not include the null value (e.g., 0 for Cohen's d or 1 for odds ratio); a quick computational check is sketched after this list.
  • Magnitude:  Assess the magnitude of the effect size. Is it large, moderate, or small, and what are the practical implications of this magnitude?
  • Direction:  Consider the direction of the effect. Is it in the hypothesized direction, or does it contradict the expected outcome?
  • Clinical or Practical Significance:  Reflect on the clinical or practical significance of the findings. Does the effect size have real-world implications?
  • Consistency:  Evaluate the consistency of the findings across studies. Are most studies in agreement with the summary effect size estimate, or are there outliers?
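
To make the significance check flagged above concrete, the brief Python sketch below derives a 95% confidence interval and a two-sided p-value from a hypothetical pooled estimate and its standard error.

# Confidence interval and p-value for a pooled estimate (illustrative numbers).
from math import erf, sqrt

pooled, se = 0.43, 0.13

ci_low, ci_high = pooled - 1.96 * se, pooled + 1.96 * se
z = pooled / se
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))   # two-sided normal p-value

print((round(ci_low, 2), round(ci_high, 2)), round(p_value, 4))
# Here the interval excludes 0 (the null value for a mean difference),
# so the pooled effect would be called statistically significant at the 5% level.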

Subgroup Analyses

Subgroup analyses allow you to explore whether the effect size varies across different subgroups of studies or participants. This can help identify potential sources of heterogeneity or assess whether the intervention's effect differs based on specific characteristics. Steps for conducting subgroup analyses:

  • Define Subgroups:  Clearly define the subgroups you want to investigate based on relevant study characteristics (e.g., age groups, study design, intervention type).
  • Analyze Subgroups:  Calculate separate summary effect sizes for each subgroup and compare them to the overall summary effect.
  • Assess Heterogeneity:  Evaluate whether subgroup differences are statistically significant. If so, this suggests that the effect size varies significantly among subgroups.
  • Interpretation:  Interpret the subgroup findings in the context of your research question. Are there meaningful differences in the effect across subgroups? What might explain these differences?

Subgroup analyses can provide valuable insights into the factors influencing the overall effect size and help tailor recommendations for specific populations or conditions.

Sensitivity Analyses

Sensitivity analyses are conducted to assess the robustness of your meta-analysis results by exploring how different choices or assumptions might affect the findings. Common sensitivity analyses include:

  • Exclusion of Low-Quality Studies:  Repeating the meta-analysis after excluding studies with low quality or a high risk of bias.
  • Changing Effect Size Measure:  Re-running the analysis using a different effect size measure to assess whether the choice of measure significantly impacts the results.
  • Publication Bias Adjustment:  Applying methods like the trim-and-fill procedure to adjust for potential publication bias.
  • Subsample Analysis:  Analyzing a subset of studies based on specific criteria or characteristics to investigate their impact on the summary effect.

Sensitivity analyses help assess the robustness and reliability of your meta-analysis results, providing a more comprehensive understanding of the potential influence of various factors.
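
One common variant of such subsample analysis is a leave-one-out check, sketched below in Python with hypothetical inputs: the meta-analysis is repeated with each study omitted in turn to see whether any single study drives the pooled result.

# Leave-one-out sensitivity analysis with illustrative data.
effects   = [0.42, 0.31, 0.58, 0.12]
variances = [0.04, 0.09, 0.05, 0.07]

def pooled_fixed(es, vs):
    """Inverse-variance (fixed-effect) pooled estimate."""
    w = [1 / v for v in vs]
    return sum(wi * yi for wi, yi in zip(w, es)) / sum(w)

full = pooled_fixed(effects, variances)
for i in range(len(effects)):
    es = effects[:i] + effects[i + 1:]
    vs = variances[:i] + variances[i + 1:]
    print(f"Omitting study {i + 1}: pooled = {pooled_fixed(es, vs):.3f} "
          f"(all studies: {full:.3f})")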

Reporting and Publication

The final stages of your meta-analysis involve preparing your findings for publication.

Manuscript Preparation

When preparing your meta-analysis manuscript, consider the following:

  • Structured Format:  Organize your manuscript following a structured format, including sections such as introduction, methods, results, discussion, and conclusions.
  • Clarity and Conciseness:  Write your findings clearly and concisely, avoiding jargon or overly technical language. Use tables and figures to enhance clarity.
  • Transparent Methods:  Provide detailed descriptions of your methods, including eligibility criteria, search strategy, data extraction, and statistical analysis.
  • Incorporate Tables and Figures:  Present your meta-analysis results using tables and forest plots to visually convey key findings.
  • Interpretation:  Interpret the implications of your findings, discussing the clinical or practical significance and limitations.

Transparent Reporting Guidelines

Adhering to transparent reporting guidelines ensures that your meta-analysis is transparent, reproducible, and credible. Some widely recognized guidelines include:

  • PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses):  PRISMA provides a checklist and flow diagram for reporting systematic reviews and meta-analyses, enhancing transparency and rigor.
  • MOOSE (Meta-analysis of Observational Studies in Epidemiology):  MOOSE guidelines are designed for meta-analyses of observational studies and provide a framework for transparent reporting.
  • ROBINS-I:  If your meta-analysis involves non-randomized studies, follow the Risk Of Bias In Non-randomized Studies of Interventions guidelines for reporting.

Adhering to these guidelines ensures that your meta-analysis is transparent, reproducible, and credible. It enhances the quality of your research and aids readers and reviewers in assessing the rigor of your study.

PRISMA Statement

The PRISMA statement is a valuable resource for conducting and reporting systematic reviews and meta-analyses. Key elements of PRISMA include:

  • Title:  Clearly indicate that your paper is a systematic review or meta-analysis.
  • Structured Abstract:  Provide a structured summary of your study, including objectives, methods, results, and conclusions.
  • Transparent Reporting:  Follow the PRISMA checklist, which covers items such as the rationale, eligibility criteria, search strategy, data extraction, and risk of bias assessment.
  • Flow Diagram:  Include a flow diagram illustrating the study selection process.

By adhering to the PRISMA statement, you enhance the transparency and credibility of your meta-analysis, facilitating its acceptance for publication and aiding readers in evaluating the quality of your research.

Conclusion for Meta-Analysis

Meta-analysis is a powerful tool that allows you to combine and analyze data from multiple studies to find meaningful patterns and make informed decisions. It helps you see the bigger picture and draw more accurate conclusions than individual studies alone. Whether you're in healthcare, education, business, or any other field, the principles of meta-analysis can be applied to enhance your research and decision-making processes. Remember that conducting a successful meta-analysis requires careful planning, attention to detail, and transparency in reporting. By following the steps outlined in this guide, you can embark on your own meta-analysis journey with confidence, contributing to the advancement of knowledge and evidence-based practices in your area of interest.



University of Tasmania, Australia

Systematic Reviews for Health: 1. Formulate the Research Question


Step 1. Formulate the Research Question

A systematic review is based on a pre-defined specific research question (Cochrane Handbook, 1.1). The first step in a systematic review is to determine its focus - you should clearly frame the question(s) the review seeks to answer (Cochrane Handbook, 2.1). It may take you a while to develop a good review question - it is an important step in your review. Well-formulated questions will guide many aspects of the review process, including determining eligibility criteria, searching for studies, collecting data from included studies, and presenting findings (Cochrane Handbook, 2.1).

The research question should be clear and focused - not too vague, too specific or too broad.

You may like to consider some of the techniques mentioned below to help you with this process. They can be useful but are not necessary for a good search strategy.

PICO - to search for quantitative review questions

  • P (Patient/Population): the most important characteristics of the patient (e.g. age, disease/condition, gender)
  • I (Intervention): the main intervention (e.g. drug treatment, diagnostic/screening test)
  • C (Comparison, if appropriate): the main alternative (e.g. placebo, standard therapy, no treatment, gold standard)
  • O (Outcome): what you are trying to accomplish, measure, improve, affect (e.g. reduced mortality or morbidity, improved memory)

Richardson, WS, Wilson, MC, Nishikawa, J & Hayward, RS 1995, 'The well-built clinical question: A key to evidence-based decisions', ACP Journal Club , vol. 123, no. 3, pp. A12-A12 .


A variant of PICO is PICOS. The S stands for Study design: it establishes which study designs are appropriate for answering the question, e.g. a randomised controlled trial (RCT). There are also PICOC (C for context) and PICOT (T for timeframe).

You may find this document on PICO / PIO / PEO useful:

  • Framing a PICO / PIO / PEO question (developed by Teesside University)

SPIDER - to search for qualitative and mixed methods research studies

  • S: Sample
  • PI: Phenomenon of Interest
  • D: Design
  • E: Evaluation
  • R: Research type

Cooke, A, Smith, D & Booth, A 2012, 'Beyond PICO: the SPIDER tool for qualitative evidence synthesis', Qualitative Health Research, vol. 22, no. 10, pp. 1435-1443.


SPICE - to search for qualitative evidence

  • S: Setting (where?)
  • P: Perspective (for whom?)
  • I: Intervention (what?)
  • C: Comparison (compared with what?)
  • E: Evaluation (with what result?)

Cleyle, S & Booth, A 2006, 'Clear and present questions: Formulating questions for evidence based practice', Library hi tech , vol. 24, no. 3, pp. 355-368.

ECLIPSE - to search for health policy/management information

  • E: Expectation (improvement, information or innovation)
  • C: Client group (at whom the service is aimed)
  • L: Location (where is the service located?)
  • I: Impact (outcomes)
  • P: Professionals (who is involved in providing/improving the service)
  • Se: Service (for which service are you looking for information?)

Wildridge, V & Bell, L 2002, 'How clip became eclipse: A mnemonic to assist in searching for health policy/management information', Health Information & Libraries Journal , vol. 19, no. 2, pp. 113-115.

There are many more techniques available. See the below guide from the CQUniversity Library for an extensive list:

  • Question frameworks overview from Framing your research question guide, developed by CQUniversity Library

This is the specific research question used in the example:

"Is animal-assisted therapy more effective than music therapy in managing aggressive behaviour in elderly people with dementia?"

Within this question are the four PICO concepts:

  • P: elderly patients with dementia
  • I: animal-assisted therapy
  • C: music therapy
  • O: aggressive behaviour

S - Study design

This is a therapy question. The best study design to answer a therapy question is a randomised controlled trial (RCT). You may decide to include only studies that used an RCT design in the systematic review (see Step 8).
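To make the mapping from question to search concrete, here is a minimal Python sketch (the terms, quoting and Boolean structure are illustrative only and not a validated search strategy; a real strategy needs synonyms, controlled vocabulary and database-specific syntax):

```python
# Illustrative only: hold the example PICO question as a simple data structure,
# then combine it into a naive Boolean search string.
pico = {
    "P": ["dementia", "elderly"],
    "I": ["animal-assisted therapy"],
    "C": ["music therapy"],
    "O": ["aggressive behaviour", "aggression"],
}

# OR the terms within each concept, then AND the concepts together
blocks = [" OR ".join(f'"{term}"' for term in terms) for terms in pico.values()]
search_string = " AND ".join(f"({block})" for block in blocks)
print(search_string)
```

Running this prints a single Boolean string combining the four concepts, which is roughly the shape of strategy that the later steps of this guide go on to refine.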



Meta-analysis

Reviewed by Psychology Today Staff

Meta-analysis is an objective examination of published data from many studies of the same research topic identified through a literature search. Through the use of rigorous statistical methods, it can reveal patterns hidden in individual studies and can yield conclusions that have a high degree of reliability. It is a method of analysis that is especially useful for gaining an understanding of complex phenomena when independent studies have produced conflicting findings.

Meta-analysis provides much of the underpinning for evidence-based medicine. It is particularly helpful in identifying risk factors for a disorder, diagnostic criteria, and the effects of treatments on specific populations of people, as well as quantifying the size of the effects. Meta-analysis is well-suited to understanding the complexities of human behavior.

  • How Does It Differ From Other Studies?
  • When Is It Used?
  • What Are Some Important Things Revealed by Meta-analysis?


There are well-established scientific criteria for selecting studies for meta-analysis. Usually, meta-analysis is conducted on the gold standard of scientific research—randomized, controlled, double-blind trials. In addition, published guidelines not only describe standards for the inclusion of studies to be analyzed but also rank the quality of different types of studies. For example, cohort studies are likely to provide more reliable information than case reports.

Through statistical methods applied to the original data collected in the included studies, meta-analysis can account for and overcome many differences in the way the studies were conducted, such as the populations studied, how interventions were administered, and what outcomes were assessed and how. Meta-analyses, and the questions they are attempting to answer, are typically specified and registered with a scientific organization, and, with the protocols and methods openly described and reviewed independently by outside investigators, the research process is highly transparent.


Meta-analysis is often used to validate observed phenomena, determine the conditions under which effects occur, and get enough clarity in clinical decision-making to indicate a course of therapeutic action when individual studies have produced disparate findings. In reviewing the aggregate results of well-controlled studies meeting criteria for inclusion, meta-analysis can also reveal which research questions, test conditions, and research methods yield the most reliable results, not only providing findings of immediate clinical utility but furthering science.

The technique can be used to answer social and behavioral questions large and small. For example, to clarify whether or not having more options makes it harder for people to settle on any one item, a meta-analysis of over 53 conflicting studies on the phenomenon was conducted. The meta-analysis revealed that choice overload exists—but only under certain conditions. You will have difficulty selecting a TV show to watch from the massive array of possibilities, for example, if the shows differ from each other in multiple ways or if you don’t have any strong preferences when you finally get to sit down in front of the TV.


A meta-analysis conducted in 2000, for example, answered the question of whether physically attractive people have “better” personalities . Among other traits, they prove to be more extroverted and have more social skills than others. Another meta-analysis, in 2014, showed strong ties between physical attractiveness as rated by others and having good mental and physical health. The effects on such personality factors as extraversion are too small to reliably show up in individual studies but real enough to be detected in the aggregate number of study participants. Together, the studies validate hypotheses put forth by evolutionary psychologists that physical attractiveness is important in mate selection because it is a reliable cue of health and, likely, fertility.


A brief introduction of meta‐analyses in clinical practice and research

Xiao-Meng Wang, Xi-Ru Zhang, Zhi-Hao Li, Wen-Fang Zhong

1 Department of Epidemiology, School of Public Health, Southern Medical University, Guangzhou, Guangdong, China

ASSOCIATED DATA

Data sharing is not applicable to this article because no datasets were generated or analyzed during the current study.

With the explosive growth of medical information, it is almost impossible for healthcare providers to review and evaluate all relevant evidence to make the best clinical decisions. Meta‐analyses, which summarize all existing evidence and quantitatively synthesize individual studies, have become the best available evidence for informing clinical practice. This article introduces the common methods, steps, principles, strengths and limitations of meta‐analyses and aims to help healthcare providers and researchers obtain a basic understanding of meta‐analyses in clinical practice and research.


1. INTRODUCTION

With the explosive growth of medical information, it has become almost impossible for healthcare providers to review and evaluate all related evidence to inform their decision making. 1 , 2 Furthermore, the inconsistent and often even conflicting conclusions of different studies can confuse these individuals. Systematic reviews, which comprehensively and systematically summarize all relevant empirical evidence, were developed to resolve such situations. 3 Many systematic reviews contain a meta-analysis, which uses statistical methods to combine the results of individual studies. 4 Through meta-analyses, researchers can objectively and quantitatively synthesize results from different studies and increase the statistical power and precision of effect estimates. 5 In the late 1970s, meta-analyses began to appear regularly in the medical literature. 6 Subsequently, a plethora of meta-analyses have emerged and their growth has been exponential over time. 7 When conducted properly, a meta-analysis of medical studies is considered decisive evidence because it occupies a top level in the hierarchy of evidence. 8

An understanding of the principles, performance, advantages and weaknesses of meta‐analyses is important. Therefore, we aim to provide a basic understanding of meta‐analyses for clinicians and researchers in the present article by introducing the common methods, principles, steps, strengths and limitations of meta‐analyses.

2. COMMON META‐ANALYSIS METHODS

There are many types of meta‐analysis methods (Table  1 ). In this article, we mainly introduce five meta‐analysis methods commonly used in clinical practice.

Meta‐analysis methods

  • Aggregate data meta-analysis: extracting summary results of studies available in published accounts
  • Individual participant data meta-analysis: collecting individual participant-level data from original studies
  • Cumulative meta-analysis: adding studies to a meta-analysis based on a predetermined order
  • Network meta-analysis: combining direct and indirect evidence to compare the effectiveness of different interventions
  • Meta-analysis of diagnostic test accuracy: identifying and synthesizing evidence on the accuracy of tests
  • Prospective meta-analysis: conducting a meta-analysis of studies that specify research selection criteria, hypotheses and analyses, but for which the results are not yet known
  • Sequential meta-analysis: combining the methodology of cumulative meta-analysis with the technique of formal sequential testing, which can sequentially evaluate the available evidence at consecutive interim steps during data collection
  • Meta-analysis of adverse events: following the basic meta-analysis principles to analyze the incidence of adverse events across studies

2.1. Aggregated data meta‐analysis

Although more information can be obtained based on individual participant‐level data from original studies, it is usually impossible to obtain these data from all included studies in meta‐analysis because such data may have been corrupted, or the main investigator may no longer be contacted or refuse to release the data. Therefore, by extracting summary results of studies available in published accounts, an aggregate data meta‐analysis (AD‐MA) is the most commonly used of all the quantitative approaches. 9 A study has found that > 95% of published meta‐analyses were AD‐MA. 10 In addition, AD‐MA is the mainstay of systematic reviews conducted by the US Preventive Services Task Force, the Cochrane Collaboration and many professional societies. 9 Moreover, AD‐MA can be completed relatively quickly at a low cost, and the data are relatively easy to obtain. 11 , 12 However, AD‐MA has very limited control over the data. A challenge with AD‐MA is that the association between an individual participant‐level covariate and the effect of the interventions at the study level may not reflect the individual‐level effect modification of that covariate. 13 It is also difficult to extract sufficient compatible data to undertake meaningful subgroup analyses in AD‐MA. 14 Furthermore, AD‐MA is prone to ecological bias, as well as to confounding from variables not included in the model, and may have limited power. 15

2.2. Individual participant data meta‐analysis

An individual participant data meta‐analysis (IPD‐MA) is considered the “gold standard” for meta‐analysis; this type of analysis collects individual participant‐level data from original studies. 15 Compared with AD‐MA, IPD‐MA has many advantages, including improved data quality, a greater variety of analytical types that can be performed and the ability to obtain more reliable results. 16 , 17

It is crucial to maintain clusters of participants within studies in the statistical implementation of an IPD-MA. Clusters can be retained during the analysis using a one-step or two-step approach. 18 In the one-step approach, the individual participant data from all studies are modeled simultaneously, at the same time as accounting for the clustering of participants within studies. 19 This approach requires a model specific to the type of data being synthesized and an appropriate account of the meta-analysis assumptions (e.g. fixed or random effects across studies). Cheng et al. 20 proposed using a one-step IPD-MA to handle binary rare events and found that this method was superior to the traditional methods of inverse variance, the Mantel–Haenszel method and the Yusuf–Peto method. In the two-step approach, the individual participant data from each study are analyzed separately to produce aggregate data for each study (e.g. a mean treatment effect estimate and its standard error) using a statistical method appropriate for the type of data being analyzed (e.g. a linear regression model might be fitted for continuous responses, or a Cox regression might be applied for time-to-event data). The aggregate data are then combined to obtain a summary effect in the second step using a suitable model, such as weighting studies by the inverse of the variance. 21 For example, using a two-step IPD-MA, Grams et al. 22 found that apolipoprotein-L1 kidney-risk variants were not associated with incident cardiovascular disease or death independent of kidney measures.

Compared with the two-step approach, the one-step IPD-MA is recommended for small meta-analyses 23 and, conveniently, requires only one model to be specified; however, it demands a careful distinction between within-study and between-study variability. 24 The two-step IPD-MA is more laborious, although it allows the use of traditional, well-known meta-analysis techniques in the second step, such as those used by the Cochrane Collaboration (e.g. the Mantel–Haenszel method).
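As a minimal sketch of the two-step idea (with simulated individual participant data and a simple difference in means standing in for whatever model suits the data type; the numbers are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

def analyse_one_study(n, true_effect):
    """Step 1: analyse one study's IPD; here a difference in means and its standard error."""
    treated = rng.normal(true_effect, 1.0, n)
    control = rng.normal(0.0, 1.0, n)
    effect = treated.mean() - control.mean()
    se = np.sqrt(treated.var(ddof=1) / n + control.var(ddof=1) / n)
    return effect, se

# Simulated IPD from three studies of different sizes
effects, ses = map(np.array, zip(*(analyse_one_study(n, 0.5) for n in (40, 80, 150))))

# Step 2: combine the per-study estimates by inverse-variance weighting
weights = 1 / ses**2
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1 / np.sum(weights))
print(f"pooled effect {pooled:.2f} "
      f"(95% CI {pooled - 1.96 * pooled_se:.2f} to {pooled + 1.96 * pooled_se:.2f})")
```

Retaining the per-study analysis in step 1 is what preserves the clustering of participants within studies, rather than pooling all participants as if they came from one trial.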

2.3. Cumulative meta‐analysis

Meta‐analyses are traditionally used retrospectively to review existing evidence. However, current evidence often undergoes several updates as new studies become available. Thus, updated data must be continuously obtained to simplify and digest the ever‐expanding literature. Therefore, cumulative meta‐analysis was developed, which adds studies to a meta‐analysis based on a predetermined order and then tracks the magnitude of the mean effect and its variance. 25 A cumulative meta‐analysis can be performed multiple times; not only can it obtain summary results and provide a comparison of the dynamic results, but also it can assess the impact of newly added studies on the overall conclusions. 26 For example, initial observational studies and systematic reviews and meta‐analyses suggested that frozen embryo transfer was better for mothers and babies; however, recent primary studies have begun to challenge these conclusions. 27 Maheshwari et al . 27 therefore conducted a cumulative meta‐analysis to investigate whether these conclusions have remained consistent over time and found that the decreased risks of harmful outcomes associated with pregnancies conceived from frozen embryos have been consistent in terms of direction and magnitude of effect over several years, with an increasing precision around the point estimates. Furthermore, continuously updated cumulative meta‐analyses may avoid unnecessary large‐scale randomized controlled trials (RCTs) and prevent wasted research efforts. 28

2.4. Network meta‐analysis

Although RCTs can directly compare the effectiveness of interventions, most of them compare the effectiveness of an intervention with a placebo, and there is almost no direct comparison between different interventions. 29 , 30 Network meta‐analyses comprise a relatively recent development that combines direct and indirect evidence to compare the effectiveness between different interventions. 31 Evidence obtained from RCTs is considered as direct evidence, whereas evidence obtained through one or more common comparators is considered as indirect evidence. For example, when comparing interventions A and C, direct evidence refers to the estimate of the relative effects between A and C. When no RCTs have directly compared interventions A and C, these interventions can be compared indirectly if both have been compared with B (placebo or some standard treatments) in other studies (forming an A–B–C “loop” of evidence). 32 , 33

A valid network meta‐analysis can correctly combine the relative effects of more than two studies and obtain a consistent estimate of the relative effectiveness of all interventions in one analysis. 34 This meta‐analysis may lead to a greater accuracy of estimating intervention effectiveness and the ability to compare all available interventions to calculate the rank of different interventions. 34 , 35 For example, phosphodiesterase type 5 inhibitors (PDE5‐Is) are the first‐line therapy for erectile dysfunction, although there are limited available studies on the comparative effects of different types of PDE5‐Is. 36 Using a network meta‐analysis, Yuan et al . 36 calculated the absolute effects and the relative rank of different PDE5‐Is to provide an overview of the effectiveness and safety of all PDE5‐Is.

Notably, a network meta‐analysis should satisfy the transitivity assumption, in which there are no systematic differences between the available comparisons other than the interventions being compared 37 ; in other words, the participants could be randomized to any of the interventions in a hypothetical RCT consisting of all the interventions included in the network meta‐analysis.
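The simplest building block of such a network is an indirect comparison through a common comparator (often called the Bucher method). A minimal sketch with made-up log odds ratios, assuming the transitivity condition above holds:

```python
import numpy as np

# Hypothetical direct estimates (log odds ratios) and standard errors
d_AB, se_AB = -0.40, 0.15   # intervention A vs common comparator B (e.g. placebo)
d_CB, se_CB = -0.10, 0.20   # intervention C vs the same comparator B

# Indirect comparison of A vs C through B: effects subtract, variances add
d_AC = d_AB - d_CB
se_AC = np.sqrt(se_AB**2 + se_CB**2)

lo, hi = d_AC - 1.96 * se_AC, d_AC + 1.96 * se_AC
print(f"indirect log OR, A vs C: {d_AC:.2f} (95% CI {lo:.2f} to {hi:.2f})")
print(f"indirect OR, A vs C: {np.exp(d_AC):.2f} ({np.exp(lo):.2f} to {np.exp(hi):.2f})")
```

A full network meta-analysis generalises this idea, combining direct and indirect evidence across the whole network and allowing the interventions to be ranked.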

2.5. Meta‐analysis of diagnostic test accuracy

Sensitivity and specificity are commonly used to assess diagnostic accuracy. However, diagnostic tests in clinical practice are rarely 100% specific or sensitive. 38 It is difficult to obtain accurate estimates of sensitivity and specificity in small diagnostic accuracy studies. 39 , 40 Even in a large sample size study, the number of cases may still be small as a result of the low prevalence. By identifying and synthesizing evidence on the accuracy of tests, the meta‐analysis of diagnostic test accuracy (DTA) provides insight into the ability of medical tests to detect the target diseases 41 ; it also can provide estimates of test performance, allow comparisons of the accuracy of different tests and facilitate the identification of sources of variability. 42 For example, the FilmArray® (Biomerieux, Marcy‐l'Étoile, France) meningitis/encephalitis (ME) panel can detect the most common pathogens in central nervous system infections, although reports of false positives and false negatives are confusing. 43 Based on meta‐analysis of DTA, Tansarli et al . 43 calculated that the sensitivity and specificity of the ME panel were both > 90%, indicating that the ME panel has high diagnostic accuracy.
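To make the quantities being synthesised concrete, sensitivity and specificity can be computed per study from 2x2 counts; the numbers below are invented (not from the FilmArray studies), and a real DTA meta-analysis would pool them with a bivariate or hierarchical model rather than by simple averaging:

```python
# Hypothetical 2x2 counts per study: (true positives, false negatives, true negatives, false positives)
studies = [
    (45, 5, 90, 10),
    (30, 2, 60, 8),
    (80, 6, 150, 12),
]

for i, (tp, fn, tn, fp) in enumerate(studies, start=1):
    sensitivity = tp / (tp + fn)   # proportion of diseased patients correctly detected
    specificity = tn / (tn + fp)   # proportion of non-diseased patients correctly ruled out
    print(f"study {i}: sensitivity {sensitivity:.2f}, specificity {specificity:.2f}")
```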

3. HOW TO PERFORM A META‐ANALYSIS

3.1. Frame a question

Researchers must formulate an appropriate research question at the beginning. A well‐formulated question will guide many aspects of the review process, including determining eligibility criteria, searching for studies, collecting data from included studies, structuring the syntheses and presenting results. 44 There are some tools that may facilitate the construction of research questions, including PICO, as used in clinical practice 45 ; PEO and SPICE, as used for qualitative research questions 46 , 47 ; and SPIDER, as used for mixed‐methods research. 48

3.2. Form the search strategy

It is crucial for researchers to formulate a search strategy in advance that includes inclusion and exclusion criteria, as well as a standardized data extraction form. The definition of inclusion and exclusion criteria depends on established question elements, such as publication dates, research design, population and results. Reasonable inclusion and exclusion criteria will reduce the risk of bias, increase transparency and make the review systematic. Broad criteria may increase the heterogeneity between studies, and narrow criteria may make it difficult to find studies; therefore, a compromise should be found. 49

3.3. Search of the literature databases

To minimize bias and avoid hampering the interpretation of outcomes, the search strategy should be as comprehensive as possible, employing multiple databases, such as PubMed, Embase, the Cochrane Central Register of Controlled Trials, Scopus, Web of Science and Google Scholar. 50 , 51 Removing language restrictions and actively searching non-English bibliographic databases may also help researchers to perform a comprehensive meta-analysis. 52

3.4. Select the articles

The selection or rejection of articles should be guided by the pre-defined criteria. 53 Two independent reviewers may screen the articles, with any disagreements resolved by consensus through discussion. First, the titles and abstracts of all papers retrieved by the search should be read and the inclusion and exclusion criteria applied to determine whether each paper meets them. Then, the full texts of the provisionally included articles should be reviewed and the criteria applied once more. Finally, the reference lists of the included articles should be searched to widen the coverage as much as possible. 54

3.5. Data extraction

A pre‐formed standardized data extraction form should be used to extract data of included studies. All data should be carefully converted using uniform standards. Simultaneous extraction by multiple researchers might also make the extracted data more accurate.

3.6. Assess quality of articles

Checklists and scales are often used to assess the quality of articles. For example, the Cochrane Collaboration's tool 55 is usually used to assess the quality of RCTs, whereas the Newcastle–Ottawa Scale 56 is one of the most common methods for assessing the quality of non-randomized studies. In addition, the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool 57 is often used to evaluate the quality of diagnostic accuracy studies.

3.7. Test for heterogeneity

Several methods have been proposed to detect and quantify heterogeneity, such as Cochran's Q test and the I² statistic. Cochran's Q test is used to determine whether there is heterogeneity among the primary studies or whether the variation observed is due to chance, 58 but it may be underpowered when few studies are included or event rates are low. 59 Therefore, p < 0.10 (not 0.05) is taken to indicate the presence of heterogeneity, given the low statistical power and insensitivity of Cochran's Q test. 60 Another common measure of heterogeneity is the I² value, which describes the percentage of variation across studies that is attributable to heterogeneity rather than chance; this value does not depend on the number of studies. 61 I² values of 25%, 50% and 75% are considered to indicate low, moderate and high heterogeneity, respectively. 60
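A minimal sketch of both statistics on made-up effect estimates and standard errors, using the usual definitions Q = sum of w_i * (y_i - y_bar)^2 with inverse-variance weights, and I² = max(0, (Q - df)/Q):

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical study effects (e.g. mean differences) and their standard errors
y = np.array([1.2, 0.8, 1.5, 0.3])
se = np.array([0.40, 0.35, 0.50, 0.30])

w = 1 / se**2                              # inverse-variance (fixed-effect) weights
y_bar = np.sum(w * y) / np.sum(w)          # pooled fixed-effect estimate
Q = np.sum(w * (y - y_bar) ** 2)           # Cochran's Q
df = len(y) - 1
p_heterogeneity = chi2.sf(Q, df)           # compare with 0.10, not 0.05
I2 = max(0.0, (Q - df) / Q) * 100          # I^2 as a percentage

print(f"Q = {Q:.2f}, p = {p_heterogeneity:.3f}, I^2 = {I2:.1f}%")
```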

3.8. Estimate the summary effect

Fixed effects and random effects models are commonly used to estimate the summary effect in a meta-analysis. 62 Fixed effects models, which treat the variability between study results as random (sampling) variation, simply weight individual studies by their precision (the inverse of the variance). Conversely, random effects models assume a different underlying effect for each study and consider this an additional source of variation that is randomly distributed. A substantial difference between the summary effects calculated by fixed effects and random effects models will be observed only if the studies are markedly heterogeneous (heterogeneity p < 0.10), and the random effects model typically provides wider confidence intervals than the fixed effects model. 63 , 64
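A minimal sketch of both models on the same made-up data, weighting by inverse variance for the fixed effects model and adding a DerSimonian–Laird estimate of the between-study variance (tau²) for the random effects model:

```python
import numpy as np

y = np.array([1.2, 0.8, 1.5, 0.3])       # hypothetical study effects
se = np.array([0.40, 0.35, 0.50, 0.30])  # hypothetical standard errors

# Fixed effects: weight each study by the inverse of its variance
w = 1 / se**2
fixed = np.sum(w * y) / np.sum(w)
fixed_se = np.sqrt(1 / np.sum(w))

# Random effects: DerSimonian-Laird estimate of the between-study variance tau^2
Q = np.sum(w * (y - fixed) ** 2)
df = len(y) - 1
c = np.sum(w) - np.sum(w**2) / np.sum(w)
tau2 = max(0.0, (Q - df) / c)

w_re = 1 / (se**2 + tau2)                # random effects weights
random_eff = np.sum(w_re * y) / np.sum(w_re)
random_se = np.sqrt(1 / np.sum(w_re))

print(f"fixed effects:  {fixed:.2f} (95% CI +/- {1.96 * fixed_se:.2f})")
print(f"random effects: {random_eff:.2f} (95% CI +/- {1.96 * random_se:.2f}), tau^2 = {tau2:.3f}")
```

With markedly heterogeneous studies, tau² grows and the random effects interval widens relative to the fixed effects interval, as described above.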

3.9. Evaluate sources of heterogeneity

Several methods have been proposed to explore the possible reasons for heterogeneity. According to factors such as ethnicity, the number of studies or clinical features, subgroup analyses can be performed that divide the total data into several groups to assess the impact of a potential source of heterogeneity. Sensitivity analysis is a common approach for examining the sources of heterogeneity on a case‐by‐case basis. 65 In sensitivity analysis, one or more studies are excluded at a time and the impact of removing each or several studies is evaluated on the summary results and the between‐study heterogeneity. Sequential and combinatorial algorithms are usually implemented to evaluate the change in between‐study heterogeneity as one or more studies are excluded from the calculations. 66 Moreover, a meta‐regression model can explain heterogeneity based on study‐level covariates. 67
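A minimal sketch of a leave-one-out sensitivity analysis on made-up data, re-computing the pooled fixed-effect estimate and I² with each study omitted in turn:

```python
import numpy as np

y = np.array([1.2, 0.8, 1.5, 0.3])       # hypothetical study effects
se = np.array([0.40, 0.35, 0.50, 0.30])  # hypothetical standard errors

def pool(effects, errors):
    """Fixed-effect pooled estimate and I^2 (%) for a set of studies."""
    w = 1 / errors**2
    pooled = np.sum(w * effects) / np.sum(w)
    Q = np.sum(w * (effects - pooled) ** 2)
    df = len(effects) - 1
    i2 = max(0.0, (Q - df) / Q) * 100 if Q > 0 else 0.0
    return pooled, i2

for omitted in range(len(y)):
    keep = np.arange(len(y)) != omitted
    pooled, i2 = pool(y[keep], se[keep])
    print(f"omitting study {omitted + 1}: pooled = {pooled:.2f}, I^2 = {i2:.0f}%")
```

A large shift in the pooled estimate or in I² when a particular study is dropped flags that study as a likely source of heterogeneity.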

3.10. Assess publication bias

A funnel plot is a scatterplot that is commonly used to assess publication bias. In a funnel plot, the x ‐axis indicates the study effect and the y ‐axis indicates the study precision, such as the standard error or sample size. 68 , 69 If there is no publication bias, the plot will have a symmetrical inverted funnel; conversely, asymmetry indicates the possibility of publication bias.
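A minimal sketch of a funnel plot with matplotlib on simulated data (the y-axis is inverted so the most precise studies sit at the top of the funnel):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
se = rng.uniform(0.05, 0.5, 30)        # hypothetical standard errors for 30 studies
effect = rng.normal(0.3, se)           # hypothetical effects scattered around 0.3

fig, ax = plt.subplots()
ax.scatter(effect, se, s=15)
ax.axvline(0.3, linestyle="--", color="grey")  # assumed underlying (pooled) effect
ax.invert_yaxis()                              # precise studies at the top
ax.set_xlabel("Effect estimate")
ax.set_ylabel("Standard error")
ax.set_title("Funnel plot (simulated, no publication bias)")
plt.show()
```

Removing the small, unfavourable studies from these simulated data would hollow out one corner of the funnel, which is the asymmetry the plot is meant to reveal; as discussed later, asymmetry can also reflect heterogeneity rather than publication bias.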

3.11. Present results

A forest plot is a valid and useful tool for summarizing the results of a meta‐analysis. In a forest plot, the results from each individual study are shown as a blob or square; the confidence interval, usually representing 95% confidence, is shown as a horizontal line that passes through the square; and the summary effect is shown as a diamond. 70
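A minimal sketch of that layout with matplotlib, using made-up effects and 95% confidence intervals; the squares are scaled by fixed-effect weight and the summary estimate is drawn with a diamond marker on its own row:

```python
import numpy as np
import matplotlib.pyplot as plt

labels = ["Study 1", "Study 2", "Study 3", "Study 4"]
y = np.array([1.2, 0.8, 1.5, 0.3])               # hypothetical effect estimates
se = np.array([0.40, 0.35, 0.50, 0.30])
lo, hi = y - 1.96 * se, y + 1.96 * se            # 95% confidence intervals

w = 1 / se**2                                    # fixed-effect weights
pooled = np.sum(w * y) / np.sum(w)
pooled_se = np.sqrt(1 / np.sum(w))

rows = np.arange(len(y), 0, -1)                  # studies listed top to bottom
fig, ax = plt.subplots()
ax.errorbar(y, rows, xerr=[y - lo, hi - y], fmt="none", ecolor="black")    # CI lines
ax.scatter(y, rows, marker="s", s=200 * w / w.max(), color="black")        # boxes sized by weight
ax.errorbar([pooled], [0], xerr=1.96 * pooled_se, fmt="D", color="black")  # summary estimate
ax.axvline(0, linestyle="--", color="grey")      # line of no effect
ax.set_yticks(list(rows) + [0])
ax.set_yticklabels(labels + ["Pooled"])
ax.set_xlabel("Effect estimate (95% CI)")
plt.show()
```

Published forest plots usually render the summary as a true diamond whose width spans the confidence interval, but the geometry above conveys the same information.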

4. PRINCIPLES OF META‐ANALYSIS PERFORMANCE

There are five important principles of meta-analysis performance that should be emphasized. First, the search scope of a meta-analysis should be expanded as much as possible to capture all relevant research, and it is important to remove language restrictions and actively search non-English bibliographic databases. Second, any meta-analysis should include studies selected according to strict criteria established in advance. Third, appropriate tools must be selected to evaluate the quality of evidence according to the type of primary studies. Fourth, the most suitable statistical model should be chosen for the meta-analysis and a weighted mean estimate of the effect size should be calculated. Finally, the possible causes of heterogeneity should be identified and publication bias must be assessed.

5. STRENGTHS OF META‐ANALYSIS

Meta‐analyses have several strengths. First, a major advantage is their ability to improve the precision of effect estimates with considerably increased statistical power, which is particularly important when the power of the primary study is limited as a result of the small sample size. Second, a meta‐analysis has more power to detect small but clinically significant effects and to examine the effectiveness of interventions in demographic or clinical subgroups of participants, which can help researchers identify beneficial (or harmful) effects in specific groups of patients. 71 , 72 Third, meta‐analyses can be used to analyze rare outcomes and outcomes that individual studies were not designed to test (e.g. adverse events). Fourth, meta‐analyses can be used to examine heterogeneity in study results and explore possible sources in case this heterogeneity would lead to bias from “mixing apples and oranges”. 73 Furthermore, meta‐analyses can compare the effectiveness of various interventions, supplement the existing evidence, and then offer a rational and helpful way of addressing a series of practical difficulties that plague healthcare providers and researchers. Lastly, meta‐analyses may resolve disputes caused by apparently conflicting studies, determine whether new studies are necessary for further investigation and generate new hypotheses for future studies. 7 , 74

6. LIMITATIONS OF META‐ANALYSIS

6.1. Missing related research

The primary limitation of a meta‐analysis is missing related research. Even in the ideal case in which all relevant studies are available, a faulty search strategy can miss some of these studies. Small differences in search strategies can produce large differences in the set of studies found. 75 When searching databases, relevant research can be missed as a result of the omission of keywords. The search engine (e.g. PubMed, Google) may also affect the type and number of studies that are found. 76 Moreover, it may be impossible to identify all relevant evidence if the search scope is limited to one or two databases. 51 , 77 Finally, language restrictions and the failure to search non‐English bibliographic databases may also lead to an incomplete meta‐analysis. 52 Comprehensive search strategies for different databases and languages might help solve this issue.

6.2. Publication bias

Publication bias means that positive findings are more likely to be published and then identified through literature searches rather than ambiguous or negative findings. 78 This is an important and key source of bias that is recognized as a potential threat to the validity of results. 79 The real research effect may be exaggerated or even falsely positive if only published articles are included. 80 For example, based on studies registered with the US Food and Drug Administration, Turner et al . 81 reviewed 74 trials of 12 antidepressants to assess publication bias and its influence on apparent efficacy. It was found that antidepressant studies with favorable outcomes were 16 times more likely to be published than those with unfavorable outcomes, and the apparent efficacy of antidepressants increased between 11% and 69% when the non‐published studies were not included in the analysis. 81 Moreover, failing to identify and include non‐English language studies may also increase publication bias. 82 Therefore, all relevant studies should be identified to reduce the impact of publication bias on meta‐analysis.

6.3. Selection bias

Because many of the studies identified are not directly related to the subject of the meta-analysis, it is crucial for researchers to select which studies to include based on defined criteria. Failing to evaluate, select or reject studies against strict, pre-defined quality criteria may also increase the possibility of selection bias. Missing or inappropriate quality assessment tools may lead to the inclusion of low-quality studies. If a meta-analysis includes low-quality studies, its results will be biased and incorrect, which is also called "garbage in, garbage out". 83 Strictly defined criteria for included studies and scoring by at least two researchers might help reduce the possibility of selection bias. 84 , 85

6.4. Unavailability of information

The best‐case scenario for meta‐analyses is the availability of individual participant data. However, most individual research reports only contain summary results, such as the mean, standard deviation, proportions, relative risk and odds ratio. In addition to the possibility of reporting errors, the lack of information can severely limit the types of analyses and conclusions that can be achieved in a meta‐analysis. For example, the unavailability of information from individual studies may preclude the comparison of effects in predetermined subgroups of participants. Therefore, if feasible, the researchers could contact the author of the primary study for individual participant data.

6.5. Heterogeneity

Although the studies included in a meta-analysis address the same research hypothesis, there is still the potential for several areas of heterogeneity. 86 Heterogeneity may exist in various parts of the studies' design and conduct, including participant selection, the interventions/exposures or outcomes studied, data collection, data analyses and selective reporting of results. 87 Although differences in results can be addressed by assessing the heterogeneity of the studies and performing subgroup analyses, 88 the results of the meta-analysis may become meaningless, and may even obscure the real effect, if the selected studies are too heterogeneous to be comparable. For example, Nicolucci et al. 89 conducted a review of 150 published randomized trials on the treatment of lung cancer. Their review showed serious methodological drawbacks and concluded that heterogeneity made a meta-analysis of the existing trials unlikely to be constructive. 89 Therefore, combining data from studies with large heterogeneity in a meta-analysis is not recommended.

6.6. Misleading funnel plot

Funnel plots are appealing because they are a simple technique for investigating the possibility of publication bias. However, they use a simple display to detect a complex phenomenon, and this can be misleading. For example, a lack of symmetry in a funnel plot can also be caused by heterogeneity. 90 Another problem with funnel plots is the difficulty of interpreting them when few studies are included. Readers may also be misled by the choice of axes or the outcome measure. 91 Therefore, in the absence of a consensus on how the plot should be constructed, asymmetrical funnel plots should be interpreted cautiously. 91

6.7. Inevitable subjectivity

Researchers must make numerous judgments when performing meta‐analyses, 92 which inevitably introduces considerable subjectivity into the meta‐analysis review process. For example, there is often a certain amount of subjectivity when deciding how similar studies should be before it is appropriate to combine them. To minimize subjectivity, at least two researchers should jointly conduct a meta‐analysis and reach a consensus.

7. CONCLUSIONS

The explosion of medical information and differences between individual studies make it almost impossible for healthcare providers to make the best clinical decisions. Meta-analyses, which summarize all eligible evidence and quantitatively synthesize individual results on a specific clinical question, have become the best available evidence for informing clinical practice and are increasingly important in medical research. This article has described the basic concept, common methods, principles, steps, strengths and limitations of meta-analyses to help clinicians and investigators better understand meta-analyses and make clinical decisions based on the best evidence.

AUTHOR CONTRIBUTIONS

CM designed and directed the study. XMW and XRZ had primary responsibility for drafting the manuscript. CM, ZHL, WFZ and PY provided insightful discussions and suggestions. All authors critically reviewed the manuscript for important intellectual content.

CONFLICT OF INTEREST STATEMENT

The authors declare that they have no conflicts of interest.

ACKNOWLEDGEMENTS

This work was supported by the Project Supported by Guangdong Province Universities and Colleges Pearl River Scholar Funded Scheme (2019 to CM) and the Construction of High‐level University of Guangdong (G820332010, G618339167 and G618339164 to CM). The funders played no role in the study design or implementation; manuscript preparation, review or approval; or the decision to submit the manuscript for publication.

Wang X‐M, Zhang X‐R, Li Z‐H, Zhong W‐F, Yang P, Mao C. A brief introduction of meta‐analyses in clinical practice and research . J Gene Med . 2021; 23 :e3312. 10.1002/jgm.3312 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]

Xiao‐Meng Wang and Xi‐Ru Zhang contributed equally to this work.


  • Open access
  • Published: 19 August 2024

Patient reported measures of continuity of care and health outcomes: a systematic review

  • Patrick Burch 1 ,
  • Alex Walter 1 ,
  • Stuart Stewart 1 &
  • Peter Bower 1  

BMC Primary Care volume  25 , Article number:  309 ( 2024 ) Cite this article

120 Accesses

7 Altmetric

Metrics details

There is a considerable amount of research showing an association between continuity of care and improved health outcomes. However, the methods used in most studies examine only the pattern of interactions between patients and clinicians through administrative measures of continuity. The patient experience of continuity can also be measured by using patient reported experience measures. Unlike administrative measures, these can allow elements of continuity such as the presence of information or how joined up care is between providers to be measured. Patient experienced continuity is a marker of healthcare quality in its own right. However, it is unclear if, like administrative measures, patient reported continuity is also linked to positive health outcomes.

Cohort and interventional studies that examined the relationship between patient reported continuity of care and a health outcome were eligible for inclusion. Medline, EMBASE, CINAHL and the Cochrane Library were searched in April 2021. Citation searching of published continuity measures was also performed. QUIP and Cochrane risk of bias tools were used to assess study quality. A box-score method was used for study synthesis.

Nineteen studies were eligible for inclusion. Fifteen studies measured continuity using a validated, multifactorial questionnaire or the continuity/co-ordination subscale of another instrument. Two studies placed patients into discrete groups of continuity based on pre-defined questions, one used a bespoke questionnaire, and one calculated an administrative measure of continuity using patient reported data. Outcome measures examined were quality of life (n = 11), self-reported health status (n = 8), emergency department use or hospitalisation (n = 7), indicators of function or wellbeing (n = 6), mortality (n = 4) and physiological measures (n = 2). Analysis was limited by the relatively small number of heterogeneous studies. The majority of studies showed a link between at least one measure of continuity and one health outcome.

Whilst there is emerging evidence of a link between patient reported continuity and several outcomes, the evidence is not as strong as that for administrative measures of continuity. This may be because administrative measures record something different to patient reported measures, or that studies using patient reported measures are smaller and less able to detect smaller effects. Future research should use larger sample sizes to clarify if a link does exist and what the potential mechanisms underlying such a link could be. When measuring continuity, researchers and health system administrators should carefully consider what type of continuity measure is most appropriate.

Peer Review reports

Introduction

Continuity of primary care is associated with multiple positive outcomes including reduced hospital admissions, lower costs and a reduction in mortality [ 1 , 2 , 3 ]. Providing continuity is often seen as being in tension with providing rapid access to appointments [ 4 ], and many health systems have chosen to focus primary care policy on access rather than continuity [ 5 , 6 , 7 ]. Continuity has fallen in several primary care systems and this has led to calls to improve it [ 8 , 9 ]. However, it is sometimes unclear exactly what continuity is and what should be improved.

In its most basic form, continuity of care can be defined as a continuous relationship between a patient and a healthcare professional [ 10 ]. However, from the patient perspective, continuity of care can also be experienced as joined up seamless care from multiple providers [ 11 ].

One of the most commonly cited models of continuity by Haggerty et al. defines continuity as

“ …the degree to which a series of discrete healthcare events is experienced as coherent and connected and consistent with the patient’s medical needs and personal context. Continuity of care is distinguished from other attributes of care by two core elements—care over time and the focus on individual patients” [ 11 ].

It then breaks continuity down into three parts (see Table 1) [ 11 ]. Other academic models of patient continuity exist, but they contain elements which are broadly analogous [ 10 , 12 , 13 , 14 ].

Continuity can be measured through administrative measures or by asking patients about their experience of continuity [ 16 ]. Administrative measures are commonly used as they allow continuity to be calculated easily for large numbers of patient consultations. Administrative measures capture one element of continuity – the frequency or pattern of professionals seen by a patient [ 16 , 17 ]. There are multiple studies and several systematic reviews showing that better health outcomes are associated with administrative measures of continuity of care [ 1 , 2 , 18 , 19 ]. One of the most recent of these reviews used a box-score method to assess the relationship between reduced mortality and continuity (i.e., counting the numbers of studies reporting significant and non-significant relationships) [ 18 ]. The review examined thirteen studies and found a positive association in nine. Administrative measures of continuity cannot capture aspects of continuity such as informational or management continuity or the nature of the relationship between the patient and clinicians. To address this, several patient-reported experience measures (PREMs) of continuity have been developed that attempt to capture the patient experience of continuity beyond the pattern in which they see particular clinicians [ 14 , 17 , 20 , 21 ]. Studies have shown a variable correlation between administrative and patient reported measures of continuity and their relationship to health outcomes [ 22 ]. Pearson correlation coefficients vary between 0.11 and 0.87 depending on what is measured and how [ 23 , 24 ]. This suggests that they are capturing different things and that both types of measure have their uses and drawbacks [ 23 , 25 ]. Patients may have good administrative measures of continuity but report a poor experience. Conversely, administrative measures of continuity may be poor, but a patient may report a high level of experienced continuity. Patient experienced continuity and patient satisfaction with healthcare are aims in their own right in many healthcare systems [ 26 ]. Whilst this is laudable, it may be unclear to policy makers if prioritising patient-experienced continuity will improve health outcomes.

This review seeks to answer two questions.

Is patient reported continuity of care associated with positive health outcomes?

Are particular types of patient reported continuity (relational, informational or management) associated with positive health outcomes?

A review protocol was registered with PROSPERO in June 2021 (ID: CRD42021246606).

Search strategy

A structured search was undertaken using appropriate search terms on Medline, EMBASE, CINAHL and the Cochrane Library in April 2021 (see Appendix ). The searches were limited to the last 20 years. This age limitation reflects the period in which the more holistic description of continuity (as exemplified by Haggerty et al. 2003) became more prominent. In addition to database searches, existing reviews of PREMs of continuity and co-ordination were searched for appropriate measures. Citation searching of these measures was then undertaken to locate studies that used these outcome measures.

Inclusion criteria

Full text papers were reviewed if the title or abstract suggested that the paper measured (a) continuity through a PREM and (b) a health outcome. Health outcomes were defined as outcomes that measured a direct effect on patient health (e.g., health status) or patient use of emergency or inpatient care. Papers with outcomes relating to patient satisfaction or satisfaction with a particular service were excluded as were process measures (such as quality of documentation, cost to health care provider). Cohort and interventional studies were eligible for inclusion, if they reported data on the relationship between continuity and a relevant health outcome. Cross-sectional studies were excluded because of the risk of recall bias [ 27 ].

The majority of participants in a study had to be aged over 16, based in a healthcare setting and receiving healthcare from healthcare professionals (medical or non-medical). We felt that patients under 16 were unlikely to be asked to fill out continuity PREMs. Studies that used PREMs to quantitatively measure one or more elements of experienced continuity of care or coordination were eligible for inclusion [ 11 ]. Any PREMs that could map to one or more of the three key elements of Haggerty's definition (Table 1) were eligible for inclusion. The types of continuity measured by each study were mapped to the Haggerty concepts of continuity by at least two reviewers independently. Our search also included patient reported measures of co-ordination, as a previous review of continuity PREMs highlighted the conceptual overlap between patient experienced continuity and some measures of patient experienced co-ordination [ 17 ]. Whilst there are different definitions of co-ordination, the concept of patient perceived co-ordination is arguably the same as management continuity [ 13 , 14 , 28 ]. Patient reported measures of care co-ordination were reviewed by two reviewers to see whether they measured the concept of management continuity. Because of the overlap between concepts of continuity and other theories (e.g., patient-centred care, quality of care), in studies where it was not clear that continuity was being measured, agreement on their inclusion/exclusion, with documented reasons, was reached after discussion between three of the reviewers (PB, SS and AW). Disagreements were resolved by documented group discussion. Some PREMs measured concepts of continuity alongside other concepts such as access. These studies were eligible for inclusion only if measurements of continuity were reported and analysed separately.

Data abstraction

All titles/abstracts were initially screened by one reviewer (PB). 20% of the abstracts were independently reviewed by 2 other reviewers (SS and AW), blinded to the results of the initial screening. All full text reviews were done by two blinded reviewers independently. Disagreements were resolved by group discussion between PB, SS, AW and PBo. Excel was used for collation of search results, titles, and abstracts. Rayyan was used in the full text review process.

Data extraction was performed independently by two reviewers. The following data were extracted to an Excel spreadsheet: study design, setting, participant inclusion criteria, method of measurement of continuity, type of continuity measured, outcomes analysed, temporal relationship of continuity to outcomes in the study, co-variates, and quantitative data for continuity measures and outcomes. Disagreements were resolved by documented discussion or involvement of a third reviewer.

Study risk of bias assessment

Cohort studies were assessed for risk of bias at a study level using the QUIP tool by two reviewers acting independently [ 29 ]. Trials were assessed using the Cochrane risk of bias tool. The use of the QUIP tool was a deviation from the review protocol, as the Newcastle-Ottawa tool specified in the protocol was less suitable for the type of cohort studies returned by the search. Any disagreements in rating were resolved by documented discussion.

As outlined in our original protocol, our preferred analysis strategy was to perform meta-analysis. However, we were unable to do this as insufficient numbers of studies reported data amenable to the calculation of an effect size. Instead, we used a box-score method [ 30 ]. This involved assessing and tabulating the relationship between each continuity measure and each outcome in each study. These relationships were recorded as either positive, negative or non-significant (using a conventional p  value of < 0.05 as our cut off for significance). Advantages and disadvantages of this method are explored in the discussion section. Where a study used both bivariate analysis and multivariate analysis, the results from the multivariate analysis were extracted. Results were marked as “mixed” where more than one measure for an outcome was used and the significance/direction differed between outcome measures. Sensitivity analysis of study quality and size was carried out.
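As an illustration of the box-score tabulation (the codings below are made up and are not the review's data), each continuity-outcome relationship is simply counted by its reported direction and significance:

```python
from collections import Counter

# Hypothetical coding of each continuity-outcome relationship in each study
codings = ["positive", "non-significant", "positive", "mixed",
           "non-significant", "negative", "positive", "non-significant"]

counts = Counter(codings)
for category in ("positive", "negative", "non-significant", "mixed"):
    print(f"{category}: {counts[category]}")
```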

Figure  1 shows the search results and number of inclusions/exclusions. Studies were excluded for a number of reasons including; having inappropriate outcome measures [ 31 ], focusing on non-adult patient populations [ 32 ] and reporting insufficient data to examine the relationship between continuity and outcomes [ 33 ]. All studies are described in Table  2 .

figure 1

Results of search strategy –NB. 18 studies provided 19 assessments

Study settings

Studies took place in 9 different, mostly economically developed, countries. Studies were set in primary care [ 5 ], hospital/specialist outpatient [ 7 ], hospital in-patient [ 5 ], or the general population [ 2 ].

Study design and assessment of bias

All included studies, apart from one trial [ 34 ], were cohort studies. Study duration varied from 2 months to 5 years. Most studies were rated as being low-moderate or moderate risk of bias, due to outcomes being patient reported, issues with recruitment, inadequately describing cohort populations, significant rates of attrition and/or failure to account for patients lost to follow up.

Measurement of continuity

The majority of the studies (15/19) measured continuity using a validated, multifactorial patient reported measure of continuity or using the continuity/co-ordination subscale of another validated instrument. Two studies placed patients into discrete groups of continuity based on answers to pre-defined questions (e.g., do you have a regular GP that you see? ) [ 35 , 36 ], one used a bespoke questionnaire [ 34 ], and one calculated an administrative measure of continuity (UPC – Usual Provider of Care index) using patient reported visit data collected from patient interviews [ 37 ]. Ten studies reported more than one type of patient reported continuity, four reported relational continuity, three reported overall continuity, one informational continuity and one management continuity.

Study outcomes

Most of the studies reported more than one outcome measure. To enable comparison across studies we grouped the most common outcome measures together. These were quality of life ( n  = 11), self-reported health status ( n  = 8), emergency department use or hospitalisation ( n  = 7), and mortality ( n  = 4). Other outcomes reported included physiological parameters e.g., blood pressure or blood test parameters ( n  = 2) [ 36 , 38 ] and other indicators of functioning or well-being ( n  = 6).

Association between outcomes and continuity measures

Twelve of the nineteen studies demonstrated at least one statistically significant association between at least one patient reported measure of continuity and at least one outcome. However, ten of these studies examined more than one outcome measure. Two of these significant studies showed negative findings; better informational continuity was associated with worse self-reported disease status [ 35 ] and improved continuity was related to increased admissions and ED use [ 39 ]. Four studies demonstrated no association between measures of continuity and any health outcomes.

The four most commonly reported types of outcomes were analysed separately (Table 3). All the outcomes had a majority of studies showing no significant association with continuity or a mixed/unclear association. Sensitivity analysis of the results in Table 3, excluding high and moderate-high risk studies, did not change this finding. Each of these outcomes was also examined in relation to the type of continuity that was measured (Table 4). Apart from the relationship between informational continuity and quality of life, all other combinations of continuity type/outcome had a majority of studies showing no significant association with continuity or a mixed/unclear association. However, the relationship between informational continuity and quality of life was only examined in two separate studies [ 40 , 41 ]. One of these studies contained fewer than 100 patients and was removed when sensitivity analysis of study size was carried out [ 40 ]. Sensitivity analysis of the results in Table 4, excluding high and moderate-high risk studies, did not change the findings.

Two sensitivity analyses were carried out (a) removing all studies with less than 100 participants and (b) those with less than 1000 participants. There were only five studies with at least 1000 participants. These all showed at least one positive association between continuity and health outcome. Of note, three of these five studies examined emergency department use/readmissions and all three found a significant positive association.

Discussion

Continuity of care is a multi-dimensional concept that is often linked to positive health outcomes. There is strong evidence that administrative measures of continuity are associated with improved health outcomes, including reductions in mortality, healthcare costs and healthcare utilisation [3, 18, 19]. Our interpretation of the evidence in this review is that there is an emerging link between patient reported continuity and health outcomes. Most studies in the review contained at least one significant association between continuity and a health outcome; however, when outcome measures were examined individually, the findings were less consistent.

The evidence for a link between patient reported continuity and health outcomes is not as strong as that for administrative measures. There are several possible explanations for this. The review retrieved a relatively small number of studies that examined a range of different outcomes, in different patient populations and settings, using different measures of continuity. This resulted in small numbers of studies examining the relationship of a particular measure of continuity with a particular outcome (Table 4). The studies in the review took place in a wide variety of countries and healthcare settings, and it may be that the effects of continuity vary across contexts. Finally, in comparison to studies of administrative measures of continuity, the studies in this review were small: the median number of participants was 486, compared with 39,249 in a recent systematic review examining administrative measures of continuity [18]. Smaller studies are less able to detect small effect sizes, and this may be the principal reason for the difference between the results of this review and previous reviews of administrative measures of continuity. When studies with fewer than 1000 participants were excluded, all remaining studies showed at least one positive finding, and there was a consistent association between continuity and a reduction in emergency department use/re-admissions. This suggests that a modest association between certain outcomes and patient reported continuity may be present but that, because of the small effect sizes involved, larger studies are needed to demonstrate it. It should also be noted that the box score method does not take account of differences in study size.
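To make the point about study size concrete, the sketch below approximates the power of a simple two-group comparison to detect a small standardised effect, assumed here to be d = 0.1 purely for illustration, at the median sample size of the included studies (486) and at the sample size reported for the administrative-measures review (39,249). The normal-approximation formula and the chosen effect size are assumptions, not a re-analysis of the included studies, which used a variety of designs and outcome types.

```python
from math import sqrt
from scipy.stats import norm

def two_sample_power(total_n, effect_size_d, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means
    with equal group sizes, using the normal approximation."""
    n_per_group = total_n / 2
    z_alpha = norm.ppf(1 - alpha / 2)           # two-sided critical value
    noncentrality = effect_size_d * sqrt(n_per_group / 2)
    return norm.cdf(noncentrality - z_alpha)    # ignores the far tail

for total_n in (486, 39_249):
    print(total_n, round(two_sample_power(total_n, effect_size_d=0.1), 2))
# Approximate output: 486 -> 0.2, 39249 -> 1.0
```

Under these assumptions, a study of 486 participants has roughly a one-in-five chance of detecting the effect, whereas a study of 39,249 participants would almost always detect it.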

Continuity is not a concept that is universally agreed upon. We mapped concepts of continuity onto the commonly used Haggerty framework [11]. Apart from the use of the Nijmegen Continuity of Care questionnaire in three studies [42], every study measured continuity using a different method and concept of continuity. We could have used other theoretical constructs of continuity for the mapping of measures. It was not possible to find the exact questions asked of patients in every study, so several of the continuity measures were mapped on the basis of the higher-level descriptions given by the authors. The diversity of patient measures may account for some of the variability in findings between studies. However, it may be that the nature of continuity captured by patient reported measures is less closely linked to health outcomes than that captured by administrative measures. Administrative measures capture the pattern of interactions between patients and clinicians. All studies in this review (apart from Study 18) used patient reported experience measures (PREMs) that attempt to capture something different from the pattern in which a patient sees a clinician. Depending on the specific measure used, this includes aspects of information transfer between services, how joined-up care was between different providers, and the nature of the patient-clinician relationship. PREMs can only capture what the patient perceives and remembers. The experience of continuity for the patient is important in its own right; however, it may be that the aspects of continuity that are most linked to positive health outcomes are best reflected by administrative measures. Sidaway-Lee et al. have hypothesised why relational continuity may be linked to health outcomes [43], including the ability of a clinician to think more holistically and the motivation to "go the extra mile" for a patient. Whilst these are difficult to measure directly, it may be that administrative measures are a better proxy marker than PREMs for these aspects of continuity.

Conclusions/future work

This review shows a potential emerging relationship between patient reported continuity and health outcomes. However, the evidence for this association is currently weaker than that demonstrated in previous reviews of administrative measures of continuity.

If continuity is to be measured and improved, as is being proposed in some health systems [44], these findings have potential implications for what type of measure we should use. Measurement of health system performance often drives change [45]. Health systems may respond to calls to improve continuity differently, depending on how continuity is measured. Continuity PREMs are important, and patient experienced continuity should be a goal in its own right. However, it is the fact that continuity is linked to multiple positive health care and health system outcomes that is often given as the reason for pursuing it as a goal [8, 44, 46]. Whilst this review shows there is emerging evidence of such a link, it is not as strong as that found in studies of administrative measures. If, as has been shown in other work, PREMs and administrative measures are capturing different things [23, 24], we need to choose our measures of continuity carefully.

Larger studies are required to confirm the emerging link between patient experienced continuity and outcomes shown in this paper. Future studies, where possible, should collect both administrative and patient reported measures of continuity and seek to understand the relative importance of the three different aspects of continuity (relational, informational, managerial). The relationship between patient experienced continuity and outcomes is likely to vary between different groups, and future work should examine differential effects in different patient populations. There are now several validated measures of patient experienced continuity [17, 20, 21, 42]. Whilst there may be an argument that more should be developed, the use of a standardised questionnaire (such as the Nijmegen questionnaire) where possible would enable closer comparison between patient experiences in different healthcare settings.

Data availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

References

1. Gray DJP, Sidaway-Lee K, White E, Thorne A, Evans PH. Continuity of care with doctors - a matter of life and death? A systematic review of continuity of care and mortality. BMJ Open. 2018;8(6):1–12.
2. Barker I, Steventon A, Deeny SR. Association between continuity of care in general practice and hospital admissions for ambulatory care sensitive conditions: cross sectional study of routinely collected, person level data. BMJ. 2017;356.
3. Bazemore A, Merenstein Z, Handler L, Saultz JW. The impact of interpersonal continuity of primary care on health care costs and use: a critical review. Ann Fam Med. 2023;21(3):274–9.
4. Palmer W, Hemmings N, Rosen R, Keeble E, Williams S, Imison C. Improving access and continuity in general practice. The Nuffield Trust; 2018 [cited 2022 Jan 15]. https://www.nuffieldtrust.org.uk/research/improving-access-and-continuity-in-general-practice
5. Pettigrew LM, Kumpunen S, Rosen R, Posaner R, Mays N. Lessons for 'large-scale' general practice provider organisations in England from other inter-organisational healthcare collaborations. Health Policy. 2019;123(1):51–61.
6. Glenister KM, Guymer J, Bourke L, Simmons D. Characteristics of patients who access zero, one or multiple general practices and reasons for their choices: a study in regional Australia. BMC Fam Pract. 2021;22(1):2.
7. Kringos D, Boerma W, Bourgueil Y, Cartier T, Dedeu T, Hasvold T, et al. The strength of primary care in Europe: an international comparative study. Br J Gen Pract. 2013;63(616):e742–50.
8. Salisbury H. Helen Salisbury: everyone benefits from continuity of care. BMJ. 2023;382:p1870.
9. Gray DP, Sidaway-Lee K, Johns C, Rickenbach M, Evans PH. Can general practice still provide meaningful continuity of care? BMJ. 2023;383:e074584.
10. Ladds E, Greenhalgh T. Modernising continuity: a new conceptual framework. Br J Gen Pract. 2023;73(731):246–8.
11. Haggerty JL, Reid R, Freeman G, Starfield B, Adair CE, McKendry R. Continuity of care: a multidisciplinary review. BMJ. 2003;327(7425):1219–21.
12. Freeman G, Shepperd S, Robinson I, Ehrich K, Richards S, Pitman P, et al. Continuity of care: report of a scoping exercise for the National Co-ordinating Centre for NHS Service Delivery and Organisation R&D. 2001 [cited 2020 Oct 15]. https://njl-admin.nihr.ac.uk/document/download/2027166
13. Saultz JW. Defining and measuring interpersonal continuity of care. Ann Fam Med. 2003;1(3):134–43.
14. Uijen AA, Schers HJ, Schellevis FG, van den Bosch WJHM. How unique is continuity of care? A review of continuity and related concepts. Fam Pract. 2012;29(3):264–71.
15. Murphy M, Salisbury C. Relational continuity and patients' perception of GP trust and respect: a qualitative study. Br J Gen Pract. 2020;70(698):e676–83.
16. Gray DP, Sidaway-Lee K, Whitaker P, Evans P. Which methods are most practicable for measuring continuity within general practices? Br J Gen Pract. 2023;73(731):279–82.
17. Uijen AA, Schers HJ. Which questionnaire to use when measuring continuity of care. J Clin Epidemiol. 2012;65(5):577–8.
18. Baker R, Bankart MJ, Freeman GK, Haggerty JL, Nockels KH. Primary medical care continuity and patient mortality. Br J Gen Pract. 2020;70(698):e600–11.
19. van Walraven C, Oake N, Jennings A, Forster AJ. The association between continuity of care and outcomes: a systematic and critical review. J Eval Clin Pract. 2010;16(5):947–56.
20. Aller MB, Vargas I, Garcia-Subirats I, Coderch J, Colomés L, Llopart JR, et al. A tool for assessing continuity of care across care levels: an extended psychometric validation of the CCAENA questionnaire. Int J Integr Care. 2013;13:1–11.
21. Haggerty JL, Roberge D, Freeman GK, Beaulieu C, Bréton M. Validation of a generic measure of continuity of care: when patients encounter several clinicians. Ann Fam Med. 2012;10(5):443–51.
22. Bentler SE, Morgan RO, Virnig BA, Wolinsky FD, Hernandez-Boussard T. The association of longitudinal and interpersonal continuity of care with emergency department use, hospitalization, and mortality among Medicare beneficiaries. PLoS ONE. 2014;9(12):1–18.
23. Bentler SE, Morgan RO, Virnig BA, Wolinsky FD. Do claims-based continuity of care measures reflect the patient perspective? Med Care Res Rev. 2014;71(2):156–73.
24. Rodriguez HP, Marshall RE, Rogers WH, Safran DG. Primary care physician visit continuity: a comparison of patient-reported and administratively derived measures. J Gen Intern Med. 2008;23(9):1499–502.
25. Adler R, Vasiliadis A, Bickell N. The relationship between continuity and patient satisfaction: a systematic review. Fam Pract. 2010;27(2):171–8.
26. Bodenheimer T, Sinsky C. From triple to quadruple aim: care of the patient requires care of the provider. Ann Fam Med. 2014;12(6):573–6.
27. Althubaiti A. Information bias in health research: definition, pitfalls, and adjustment methods. J Multidiscip Healthc. 2016;9:211–7.
28. Schultz EM, McDonald KM. What is care coordination? Int J Care Coord. 2014;17(1–2):5–24.
29. Hayden J, van der Windt D, Cartwright J, Côté P, Bombardier C. Assessing bias in studies of prognostic factors. Ann Intern Med. 2013;158(4):280–6.
30. Green BF, Hall JA. Quantitative methods for literature reviews. Annu Rev Psychol. 1984;35(1):37–54.
31. Safran DG, Montgomery JE, Chang H, Murphy J, Rogers WH. Switching doctors: predictors of voluntary disenrollment from a primary physician's practice. J Fam Pract. 2001;50(2):130–6.
32. Burns T, Catty J, Harvey K, White S, Jones IR, McLaren S, et al. Continuity of care for carers of people with severe mental illness: results of a longitudinal study. Int J Soc Psychiatry. 2013;59(7):663–70.
33. Engelhardt JB, Rizzo VM, Della Penna RD, Feigenbaum PA, Kirkland KA, Nicholson JS, et al. Effectiveness of care coordination and health counseling in advancing illness. Am J Manag Care. 2009;15(11):817–25.
34. Uijen AA, Bischoff EWMA, Schellevis FG, Bor HHJ, van den Bosch WJHM, Schers HJ. Continuity in different care modes and its relationship to quality of life: a randomised controlled trial in patients with COPD. Br J Gen Pract. 2012;62(599):422–8.
35. Humphries C, Jaganathan S, Panniyammakal J, Singh S, Dorairaj P, Price M, et al. Investigating discharge communication for chronic disease patients in three hospitals in India. PLoS ONE. 2020;15(4):1–20.
36. Konrad TR, Howard DL, Edwards LJ, Ivanova A, Carey TS. Physician-patient racial concordance, continuity of care, and patterns of care for hypertension. Am J Public Health. 2005;95(12):2186–90.
37. van Walraven C, Taljaard M, Etchells E, Bell CM, Stiell IG, Zarnke K, et al. The independent association of provider and information continuity on outcomes after hospital discharge: implications for hospitalists. J Hosp Med. 2010;5(7):398–405.
38. Gulliford MC, Naithani S, Morgan M. Continuity of care and intermediate outcomes of type 2 diabetes mellitus. Fam Pract. 2007;24(3):245–51.
39. Kaneko M, Aoki T, Mori H, Ohta R, Matsuzawa H, Shimabukuro A, et al. Associations of patient experience in primary care with hospitalizations and emergency department visits on isolated islands: a prospective cohort study. J Rural Health. 2019;35(4):498–505.
40. Beesley VL, Janda M, Burmeister EA, Goldstein D, Gooden H, Merrett ND, et al. Association between pancreatic cancer patients' perception of their care coordination and patient-reported and survival outcomes. Palliat Support Care. 2018;16(5):534–43.
41. Valaker I, Fridlund B, Wentzel-Larsen T, Nordrehaug JE, Rotevatn S, Råholm MB, et al. Continuity of care and its associations with self-reported health, clinical characteristics and follow-up services after percutaneous coronary intervention. BMC Health Serv Res. 2020;20(1):1–15.
42. Uijen AA, Schellevis FG, van den Bosch WJHM, Mokkink HGA, van Weel C, Schers HJ. Nijmegen continuity questionnaire: development and testing of a questionnaire that measures continuity of care. J Clin Epidemiol. 2011;64(12):1391–9.
43. Sidaway-Lee K, Gray DP, Evans P, Harding A. What mechanisms could link GP relational continuity to patient outcomes? Br J Gen Pract. 2021;(June):278–81.
44. House of Commons Health and Social Care Committee. The future of general practice. 2022. https://publications.parliament.uk/pa/cm5803/cmselect/cmhealth/113/report.html
45. Close J, Byng R, Valderas JM, Britten N, Lloyd H. Quality after the QOF? Before dismantling it, we need a redefined measure of 'quality'. Br J Gen Pract. 2018;68(672):314–5.
46. Gray DJP. Continuity of care in general practice. BMJ. 2017;356:j84.


Acknowledgements

Not applicable.

Funding

Patrick Burch carried out this work as part of a PhD Fellowship funded by THIS Institute.

Author information

Authors and Affiliations

Centre for Primary Care and Health Services Research, Institute of Population Health, University of Manchester, Manchester, England

Patrick Burch, Alex Walter, Stuart Stewart & Peter Bower


Contributions

PBu conceived the review and performed the searches. PBu, AW and SS performed the paper selection, review and data abstraction. PBo helped with the design of the review and was involved in resolving reviewer disputes. All authors contributed towards the drafting of the final manuscript.

Corresponding author

Correspondence to Patrick Burch.

Ethics declarations

Ethics approval, consent for publication, and competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Burch, P., Walter, A., Stewart, S. et al. Patient reported measures of continuity of care and health outcomes: a systematic review. BMC Prim. Care 25, 309 (2024). https://doi.org/10.1186/s12875-024-02545-8


Received: 27 March 2023

Accepted: 29 July 2024

Published: 19 August 2024

DOI: https://doi.org/10.1186/s12875-024-02545-8
