
Viewpoint

Caring about Power Analyses

Jennifer E. Murray, Scott T. Barrett, Rebecca L. Brock, and Rick A. Bevins*
Department of Psychology, University of Nebraska-Lincoln, Lincoln, Nebraska 68588-0308, United States

ABSTRACT: Everyone should care deeply about statistical power and effect size, given that current estimates of wasted, nonreproducible, and exaggerated research findings range from 50 to 85%, and given the mandates from the National Institutes of Health (NIH) that proposal reviewers focus on scientific rigor and that investigators consider sex as a biological variable. In this Viewpoint, we provide recommendations and resources regarding power analyses aimed at enhancing rigor, and hence decreasing waste, when designing experiments. As part of this effort, we also make recommendations for reporting key statistics that will aid others in estimating sample size based on published research.

KEYWORDS: Data analyses, effect size, replication crisis, scientific rigor



REFLECTIONS

In recent times, the word "crisis" has been used liberally by scientists, the National Institutes of Health (NIH), and grant review panels to describe the estimates that 50−85% of research findings are wasted, nonreproducible, or exaggerated.1,2 Recent writings on this topic offer a host of suggestions to address the issue, including a priori registration of studies; more transparent and detailed reporting of methods and design; open sourcing of protocols, data, computer programs, and applications; and improved training of scientists in methods, controls, and statistics (e.g., refs 1 and 3). These recommendations vary in how controversial they are. However, irrespective of individual opinion regarding any particular recommendation, we can all agree that it is in our best interest as a scientific community to generate reliable and replicable data using the most powerful and efficient approach possible for the question being addressed.

Across the "crisis" conversation, one issue comes up consistently regardless of the source. This issue, at least in our opinion, also falls on the less controversial side of the continuum and is readily addressable in comparison to many of the other recommendations. Namely, we need to embrace power analyses and employ best practices for estimating the sample size necessary for a given study design. Further, once these estimates are established, we need to adhere to them and not stop a study early, leaving it underpowered, simply because initial exploratory analyses revealed a hypothesis-confirming outcome.

If scientific reasons are not enough, there are also practical reasons for caring about power analyses. Recent updates to grant submission guidelines from the NIH require evidence of rigorous experimental design, and a substantive power analysis is increasingly a clear expectation scored by reviewers within the Approach section.4 Unfortunately, many of us are not trained in quantitative methods beyond giving a formulaic nod to power analyses that may or may not be sufficient to muster a pass by reviewers. Given this situation, it is not surprising that such calculations may not have the influence we wish from a scientific rigor or replication perspective.

With these issues in mind, we have written this Viewpoint to emphasize the necessity of careful power analyses and to offer scientists in disciplines allied with chemical neuroscience a concise resource for a priori power analyses. To this end, we include a table of quality resources presenting relatively straightforward approaches to navigating power calculations and their presentation in reports and proposals. Further, we provide a number of recommendations to strengthen this component of rigor in grant submissions, as well as suggestions for reporting power analyses and effect sizes in publication.



RESOURCES

Several questions must be asked before proceeding with a power analysis. What type of data analytic technique will I implement to test each of my hypotheses? What is the anticipated size of each effect based on prior research and theory? Do I anticipate missing data? What sample size would be feasible given my intended study design and methods? Given the potential list of questions to address, it is essential to plan ahead and set aside ample time for the development of a sound approach to estimating power and sample size. In Table 1, we provide a list of resources to facilitate this process.
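To make these questions concrete, the sketch below shows one way to translate an anticipated effect size into a target sample size using the pwr package for R (listed in Table 1). The design, effect size, alpha, and power values are illustrative assumptions, not recommendations.

```r
# A minimal a priori power analysis, assuming a two-group between-subjects
# design analyzed with an independent-samples t test. The anticipated effect
# size (d = 0.5) is a placeholder; base yours on prior research, theory, a
# meta-analysis, or pilot data.
library(pwr)

plan <- pwr.t.test(d = 0.5,            # anticipated standardized mean difference
                   sig.level = 0.05,   # two-tailed alpha
                   power = 0.80,       # desired probability of detecting the effect
                   type = "two.sample")

# pwr.t.test solves for whichever argument is left unspecified -- here, n per group.
n_per_group <- ceiling(plan$n)         # round up; partial subjects cannot be run
n_per_group
```

If attrition or planned exclusions are anticipated (one of the questions raised above), the target n should be inflated accordingly before the study begins.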


RECOMMENDATIONS

In this closing section, we offer a few of our personal recommendations for the reader's consideration. We highlight lessons learned from the school of hard knocks (e.g., Make Friends) as well as areas where we personally have not done so well and where we hope the discipline as a whole, ourselves included, will do better in the future (e.g., Report More) in addressing these issues of rigor and replication.

Make Friends. When developing a research plan, consider adding an expert in quantitative methods at the outset. We cannot emphasize enough that the quantitative expert should be consulted, or even recruited, early in the planning and writing process. An extensive literature review, a meta-analysis, and/or further preliminary data may be required; each of these takes time, and there are no shortcuts to a scientifically rigorous power analysis or data analytic approach. By working early and often with a quantitative expert, if an issue arises in the design of an experiment (e.g., consideration of estrous phase in female rodents), there is someone informed on the team who can provide input on how this will affect the analyses, including the sample size needed to detect relevant main effects and interactions. Additionally, the thoughtful inclusion of this quantitative expert as part of the investigative team assures reviewers that unexpected challenges in data analyses will be met.

Do Not Skimp. Depending on the complexity of the proposed project, you might need to run a series of power analyses that are specific to each experiment within the proposal, account for a range of possible effect sizes, and consider several plausible sample sizes (see the sketch below). Although space is typically quite limited in a proposal, you should include as much detail as possible about the nature of the power analyses. Careful consideration should be given to the particular needs of each experimental approach, and if multiple approaches are proposed across the Aims, separate power analyses should be conducted and described. Evidence of such consideration is vital for positive reviews regarding scientific rigor. With that, we should not have to say that one should avoid the sloppiness inherent in slapping a nondescript sentence at the end of the Data Analysis section.

Follow and Assess. Closely related research in the field, including pilot data if available, should be used to calculate each power analysis. A thoroughly conducted power analysis will yield the most appropriate a priori sample size for the proposed experiment, and data derived from this sample size should provide sufficient power to observe an effect if that effect exists. Arbitrarily adding subjects to select conditions, eliminating purported outliers, and stopping the experiment when the desired p-value is attained are inappropriate approaches to ensuring that the results are robust. Some would even say these practices are irresponsible, and such approaches most certainly contribute to our current so-called replicability "crisis". As noted earlier, there are legitimate reasons for not including a subject in the analyses (e.g., missed placement of a microelectrode, failure to self-administer a drug). The rule here, however, should be transparency: report the starting sample size, the number of subjects not included in the analyses, the reason(s) and criteria for exclusion, and, for clarity, the final sample size.
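As one hedged illustration of the Do Not Skimp recommendation, the sketch below tabulates power across a small grid of plausible effect sizes and per-group sample sizes for a hypothetical one-way, three-group design; every numerical value is a placeholder to be replaced with estimates appropriate to the experiment at hand.

```r
# A sketch of a sensitivity analysis for a hypothetical one-way design with
# three groups. Power is computed over a range of plausible effect sizes
# (Cohen's f) and candidate per-group sample sizes; all values shown here
# are illustrative placeholders.
library(pwr)

effect_sizes <- c(0.10, 0.25, 0.40)    # small, medium, large f (Cohen's benchmarks)
group_ns     <- c(10, 15, 20, 30)      # per-group sample sizes under consideration

grid <- expand.grid(f = effect_sizes, n = group_ns)
grid$power <- mapply(
  function(f, n) pwr.anova.test(k = 3, n = n, f = f, sig.level = 0.05)$power,
  grid$f, grid$n
)

# One row per combination of effect size and sample size, with achieved power;
# a table like this documents how feasibility depends on the assumed effect size.
print(round(grid, 3))
```

Presenting such a grid, rather than a single point estimate, also gives reviewers a direct view of how robust the proposed sample sizes are to optimistic effect-size assumptions.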

Table 1. Resources for Navigating a Power Analysis

Practical guidelines
- https://stats.idre.ucla.edu/other/mult-pkg/seminars/intro-power/
- Murray, D. M. (2008) Sample size, detectable difference, and power. In The Complete Writing Guide to NIH Behavioral Science Grants (Scheier, L. M., and Dewey, W. L., Eds.), pp 89−106, Oxford University Press, New York.
- McClelland, G. H. (2000) Increasing statistical power without increasing sample size. Am. Psychol. 55, 963−964.

Planning for certain types of analysis

Indirect effects
- Thoemmes, F., MacKinnon, D., and Reiser, M. (2010) Power analysis for complex mediational designs using Monte Carlo methods. Structural Equation Modeling: A Multidisciplinary Journal 17, 510−534.
- Fritz, M. S., and MacKinnon, D. P. (2007) Required sample size to detect the mediated effect. Psychol. Sci. 18, 233−239.
- Hoyle, R. H., and Kenny, D. A. (1999) Sample size, reliability, and tests of statistical mediation. In Statistical Strategies for Small Sample Research (Hoyle, R. H., Ed.), pp 195−222, Sage, Thousand Oaks, CA.

Direct effects
- McClelland, G. H., and Judd, C. M. (1993) Statistical difficulties of detecting interactions and moderator effects. Psychol. Bull. 114, 376−390.

Longitudinal data analysis
- Rast, P., and Hofer, S. M. (2014) Longitudinal design considerations to optimize power to detect variances and covariances among rates of change: Simulation results based on actual longitudinal studies. Psychol. Methods 19, 133−154.
- Muthén, B. O., and Curran, P. J. (1997) General longitudinal modeling of individual differences in experimental designs: A latent variable framework for analysis and power estimation. Psychol. Methods 2, 371−402.

Structural equation modeling
- MacCallum, R. C., Browne, M. W., and Sugawara, H. M. (1996) Power analysis and determination of sample size for covariance structure modeling. Psychol. Methods 1, 130−149.
- Muthén, L. K., and Muthén, B. O. (2002) How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling: A Multidisciplinary Journal 9, 599−620.

Latent class analysis
- Nylund, K. L., Asparouhov, T., and Muthén, B. O. (2007) Deciding on the number of classes in latent class analysis and growth mixture modeling: A Monte Carlo simulation study. Structural Equation Modeling: A Multidisciplinary Journal 14, 535−569.
- Tein, J.-Y., Coxe, S., and Cham, H. (2013) Statistical power to detect the correct number of classes in latent profile analysis. Structural Equation Modeling: A Multidisciplinary Journal 20, 640−657.

Multilevel modeling
- Snijders, T. A. B. (2005) Power and sample size in multilevel modeling. Encycl. Stat. Behav. Sci. 3, 1570−1573.
- Raudenbush, S. W., and Xiao-Feng, L. (2001) Effects of study duration, frequency of observation, and sample size on power in studies of group differences in polynomial change. Psychol. Methods 6, 387−401.
- Pornprasertmanit, S., and Schneider, W. J. (2014) Accuracy in parameter estimation in cluster randomized designs. Psychol. Methods 19, 356−379.
- Moerbeek, M., van Breukelen, G. J. P., and Berger, M. P. F. (2000) Design issues for experiments in multilevel populations. J. Educ. Behav. Stat. 25, 271−284.

Special consideration: missing data
- Schoemann, A. M., Miller, P., Pornprasertmanit, S., and Wu, W. (2014) Using Monte Carlo simulations to determine power and sample size for planned missing designs. Int. J. Behav. Dev. 38, 471−479.
- Graham, J. W., Taylor, B. J., and Cumsille, P. E. (2001) Planned missing data designs in the analysis of change. In New Methods for the Analysis of Change (Collins, L. M., and Sayer, A. G., Eds.), pp 323−343, American Psychological Association, Washington, DC.

Popular software programs for power analyses
- G*Power: http://www.gpower.hhu.de/en.html
- Mplus: https://www.statmodel.com/
- Optimal Design (for nested data): http://hlmsoft.net/od/
- SAS PROC POWER: https://www.sas.com/en_us/software/stat.html
- SPSS SamplePower: https://www.ibm.com/analytics/us/en/technology/spss/
- PASS: https://www.ncss.com/software/pass/
- pwr package for R: https://cran.r-project.org/web/packages/pwr/pwr.pdf


Report More. In an ideal scientific world, the a priori power analyses as well as the post hoc effect sizes would be included at each level of research dissemination for every experiment. Such inclusion is invaluable to research transparency and rigor, and it provides an ever-expanding foundation upon which to conduct future power analyses and build the scientific field.5 To effect some of this change, journals should make reporting of statistical power and effect sizes the standard rather than the exception. Further, null effects and small effect sizes are equally valuable, and these should be disseminated in the same way as significant effects and large effect sizes so as to reduce unintentional replications in the field. The merits of such reporting are frequently discussed but rarely acted upon. Though resolving this disconnect lies in the hands of publishers, journal editors, and reviewers, there are preprint platforms in many fields (e.g., PsyArXiv and bioRxiv) to which such results can be reported. Still, journals should change their practices rather than requiring scientists to rely solely on third-party platforms for this additional information.
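To illustrate the kind of reporting that lets others build on published work (cf. ref 5), the sketch below computes a standardized effect size from hypothetical summary statistics and then shows how a reader could use that reported value to plan the sample size for a follow-up study; all numbers are invented placeholders.

```r
# A sketch of why reported summary statistics matter: the group means, SDs,
# and ns below are hypothetical placeholders standing in for values a paper
# would report alongside its test statistics.
library(pwr)

m1 <- 12.4; sd1 <- 3.1; n1 <- 14      # e.g., drug group (placeholder values)
m2 <-  9.8; sd2 <- 3.4; n2 <- 14      # e.g., vehicle group (placeholder values)

# Cohen's d from the pooled standard deviation
sd_pooled <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))
d <- (m1 - m2) / sd_pooled

# A reader planning a replication or extension can solve for the per-group n
# needed to detect an effect of this size with 80% power.
ceiling(pwr.t.test(d = d, sig.level = 0.05, power = 0.80, type = "two.sample")$n)
```

Reporting means, variability, and sample sizes in this way costs a sentence or a table cell, yet it is exactly what a future investigator needs to run an informed a priori power analysis.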



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Telephone: 402-472-3721.

ORCID

Rick A. Bevins: 0000-0002-2438-2264

Funding

R.A.B. and J.E.M. were partially supported by NIH Grants DA034389 and DA039356 while preparing this manuscript.

Notes

The opinions expressed in this Viewpoint are solely those of the authors. The authors declare no competing financial interest.



REFERENCES

(1) Ioannidis, J. P. A. (2014) How to make more published research true. PLoS Med. 11 (10), e1001747.
(2) Nosek, B. A., et al. (2015) Estimating the reproducibility of psychological science. Science 349, aac4716.
(3) Breur, T. (2016) Statistical power analysis and the contemporary "crisis" in social sciences. Journal of Marketing Analytics 4, 61−65.
(4) National Institutes of Health (2016) Rigor and reproducibility. Retrieved from https://grants.nih.gov/reproducibility/index.htm on September 5, 2017.
(5) Lakens, D. (2013) Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs. Frontiers in Psychology 4, 863.
