While exploring some old data recently to see whether it was publishable, I began to contemplate the things we take for granted in scientific studies. Statistics are so commonly used today that any paper is expected to have a smattering of tests and the all-important p-values. (The incorrect use of alpha as interchangeable with the p-value is an irksome issue, but one that others have explored, as is the question of the value of the p-value itself!) Canned programs like Systat, SPSS, and even SAS or Minitab enable quick data analysis, which has revolutionized many fields of study. As the scientific community continues to grow and the pressure to get grants and publish increases, we should be more concerned about the level of statistical knowledge our peers and students have. Knowledge of the software is not knowledge of statistical theory. I would hazard an argument that the latter is more important than the former.
Take, for instance, the issue of sample size. In some fields, small sample sizes are all that is available. For paleontologists, for example, what you find is what you have. Yet sample size directly affects the power of the statistical test at hand. So, given a small sample size, a post hoc power analysis is a valuable self-assessment of whether your conclusions are warranted.
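To make the link between sample size and power concrete, here is a minimal sketch in Python using only the standard library. It relies on a normal approximation to a two-sided, two-sample comparison of means, so it slightly overstates power for very small samples compared with an exact t-based calculation. The function name `approx_power_two_sample` and the standardized effect size of 0.5 are my own illustrative assumptions, not values from any particular study.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_two_sample(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test of means.

    Uses a normal approximation (not the exact noncentral t), with
    `effect_size` expressed as a standardized mean difference.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected shift of the test statistic under the alternative.
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability of landing in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical illustration: an assumed effect size of 0.5 at several sample sizes.
for n in (5, 10, 20, 50, 100):
    print(f"n per group = {n:3d}  approx. power = {approx_power_two_sample(0.5, n):.2f}")
```

Running the loop shows how slowly power climbs when only a handful of specimens are available, which is the point of checking it before drawing conclusions.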
Data distribution is another issue. Without a normal distribution (or a tendency toward it, at the very least), one cannot rely on parametric methods to estimate population parameters accurately. While non-parametric statistical analysis has developed tremendously over the years, it is often viewed as a weak version of the more powerful multivariate analysis. How many people take the time to test whether their data are normally distributed before they run their factor analysis, discriminant function, time regression, and so on?
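Checking normality need not take long. As one possible illustration, the sketch below implements the Jarque-Bera test, which compares sample skewness and kurtosis with the values expected under a normal distribution. I chose it only because it can be written with the Python standard library; it is an asymptotic test and is unreliable for small samples, where something like Shapiro-Wilk in a statistics package is the more usual choice. The function name `jarque_bera` and the simulated data are assumptions made for the example.

```python
import random
from math import exp


def jarque_bera(data):
    """Return the Jarque-Bera statistic and its asymptotic p-value.

    Under normality the statistic is approximately chi-square with
    2 degrees of freedom, whose upper tail probability is exp(-x / 2).
    """
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(data) / n
    # Central moments (population form, dividing by n).
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    if m2 == 0:
        raise ValueError("data have zero variance")
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    return jb, exp(-jb / 2)


# Simulated (not real) data: one roughly normal sample, one strongly skewed.
rng = random.Random(0)
normal_like = [rng.gauss(0, 1) for _ in range(500)]
skewed = [rng.expovariate(1.0) for _ in range(500)]

for label, sample in (("normal-like", normal_like), ("skewed", skewed)):
    jb, p = jarque_bera(sample)
    print(f"{label:12s} JB = {jb:8.2f}  p = {p:.4f}")
```

A small p-value suggests the data depart from normality, in which case a transformation or a non-parametric method deserves consideration before running the planned analysis.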
Every single statistical test has a number of assumptions and rules. Those who consume studies that include statistical analyses assume (perhaps incorrectly) that each study has been properly vetted by reviewers and editors to ensure that it meets these assumptions and that the test used is the most appropriate one. But we don’t know. Quite simply, we rely on three things. First, we trust that our colleagues adhere to professional ethical standards. Second, in cases of honest error, we trust that the peer review process flags them for correction before publication. Third, if the other two fail, we trust that studies seeking to replicate the original will be a final failsafe. Unless the topic is a hot one, though, that ultimate failsafe may never happen, or may take so long that the original unwarranted conclusion takes root and becomes entrenched.
What do we do before results are replicated? Think critically. Read everything that is published as if you were the peer reviewer. Any gaps in reporting or unanswered questions should raise flags that the study at hand may not be the best one to cite in one’s own paper, at least not without some discussion of its problems.
Thinking critically requires extra time, precious extra time. But I think it produces a better, stronger community that fosters greater attention to detail and better results overall. I would like to see more of it, as opposed to sloppy lit reviews, hurried and often trite, uncritical discussions of methods, selective presentation of results that disguises true problems with the study, and overblown conclusions.