More Than Half of Psychology Studies Fail Reproducibility Test

A team of psychologists led by social psychology expert Brian Nosek of the Center for Open Science (USA) attempted to replicate 100 studies from various fields of psychology, all originally published in leading scientific journals. The results were striking: according to the researchers, only 39 of the 100 studies could be reproduced, and even those with varying degrees of similarity to the original research. The full analysis, published in Science, showed that statistically significant results were obtained in only 36 percent of the replications, and the average effect size was about half of what was originally reported in the articles.

The project to verify the reproducibility of psychological research began in November 2011, following several reports of fraud, exaggeration, and statistical errors in psychology publications. An independent group selected 100 articles from top peer-reviewed journals, covering a wide range of topics, from how children and adults respond to fear to comparative studies of different methods for teaching arithmetic.

The researchers then attempted to replicate each of these studies, and in April 2015 the group published preliminary results. The scientists conducting the replications had to evaluate, against several criteria, how closely they were able to reproduce each study. Based on these evaluations, each study was assigned one of seven ranks, from the highest (“practically identical”) to the lowest (“no similarity at all”). In the end, 15 studies could not be replicated at all, and the results of 61 articles in total were deemed non-replicable.

The full report, published today, paints an even more troubling picture: although 97 percent of the original studies reported statistically significant results, those results were replicated in just over a third of cases, and the average effect size was roughly halved.

Some scientists have suggested that as many as 80 percent of all psychology studies may be non-reproducible: Nosek’s team drew its sample only from the most respected peer-reviewed journals, so the failure rate across the field as a whole is likely even higher.

Reproducibility as a Key Scientific Standard

Reproducibility of research results is considered one of the most important criteria for scientific knowledge (alongside Karl Popper’s criterion of falsifiability). In general, reproducibility can be defined as the closeness of results from repeated experiments, provided all conditions are replicated (method, study design, tools, instruments, algorithms, sample selection, etc.). To ensure experiments can be repeated, scientific articles are expected to include detailed descriptions of their methods. A result is considered reliable if several independent research groups can repeat the experiment and obtain similar results.
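To make the idea of replication concrete, here is a minimal sketch (not taken from Nosek’s project) of one common way a replication attempt is judged: run the same comparison on the new sample and check whether the effect is again statistically significant and of similar size. All the group data below are hypothetical placeholders.

# A minimal sketch of comparing an original experiment with its replication.
# The data are simulated placeholders, not results from any real study.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical measurements: treatment vs. control in the original study
orig_treatment = rng.normal(0.8, 1.0, 40)
orig_control = rng.normal(0.0, 1.0, 40)

# Hypothetical measurements from an attempted replication with a larger sample
rep_treatment = rng.normal(0.3, 1.0, 80)
rep_control = rng.normal(0.0, 1.0, 80)

def cohens_d(a, b):
    # Effect size: difference in means scaled by the pooled standard deviation
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled_sd

for label, treatment, control in [("original", orig_treatment, orig_control),
                                  ("replication", rep_treatment, rep_control)]:
    t_stat, p_value = stats.ttest_ind(treatment, control)
    print(f"{label}: d = {cohens_d(treatment, control):.2f}, p = {p_value:.4f}")

In such a toy setup, a replication would typically be counted as successful if the effect is again statistically significant and its size is comparable to the original.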

There is currently an ongoing debate about reproducibility in the social sciences and humanities, as well as in interdisciplinary fields such as psychology. In these fields it is now considered important to distinguish replication (repeating the experiment itself, as described above) from reproducibility in a narrower, computational sense. Here, the researcher provides the dataset and a description of the data-processing methods (including, for example, code and algorithms). If another scientist, using the same data and methods, obtains similar results, the work is considered reproducible.
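As an illustration of this narrower, data-and-code sense of reproducibility, here is a minimal sketch under invented assumptions: the file name, column names, and reported correlation are hypothetical stand-ins for whatever the original authors would actually share.

# A minimal sketch of a computational reproducibility check: re-run the
# authors' described analysis on their shared data and compare the result
# with the value reported in the paper. All names and values are hypothetical.
import pandas as pd
from scipy import stats

REPORTED_CORRELATION = 0.42   # value claimed in the (hypothetical) article
TOLERANCE = 0.01              # how close a recomputed value must be to count as reproduced

# Shared dataset accompanying the (hypothetical) article
data = pd.read_csv("shared_dataset.csv")

# Re-run the described analysis: Pearson correlation between two measures
r, p = stats.pearsonr(data["anxiety_score"], data["reaction_time"])

reproduced = abs(r - REPORTED_CORRELATION) <= TOLERANCE
print(f"recomputed r = {r:.3f} (p = {p:.4f}); reproduced: {reproduced}")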
