The three Rs of study design

“You can’t fix by analysis what you bungled by design.”
Light, Singer and Willett (1990, page v)

Good study design is paramount. The three Rs of study design are randomisation, replication and reducing variation.

Randomisation


Suppose you want to investigate the effect of a new support program on the well-being of tertiary students. How might you set up the study? It’s important to compare outcomes for students taking the new program with those accessing “usual care” (the standard support program). How might you determine which students receive usual care and which participate in the new program? One way would be to invite students needing support to try out the new program.

But consider the goal here: if students do better on the new program, can it be recommended on the basis that the program itself caused the better well-being?

If allocation to the program groups was random, rather than by choice, then the strongest explanation for a difference in outcomes is the difference between the programs. Without randomisation, when there is an apparent effect, we are always left wondering: Did the new program cause the effect, or was it something else? Were the students who chose the new program systematically different from the students who did not? Responding to an invitation, or not, is itself a systematic difference, and it might be associated with the outcome.

Randomisation ensures that the groups are balanced, on average, on other possible causes of the outcome. Researchers sometimes try to achieve balance in other ways, for example by matching on important characteristics, but matching can only balance characteristics that are known and measured. Randomisation achieves balance automatically; groups are balanced, on average, on both known and unknown other causes of the outcome.
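The contrast between self-selection and random allocation can be illustrated with a small simulation. This is only a sketch under stated assumptions: the hidden trait ("motivation"), the acceptance probabilities of 0.7 and 0.3, and the sample size are all hypothetical values chosen for illustration, not figures from any study.

```python
import random
import statistics


def simulate_allocation(n_students=10_000, seed=1):
    """Return the between-group gap in a hidden trait under two allocation schemes."""
    rng = random.Random(seed)

    # Hypothetical unmeasured trait (say, motivation), standardised: mean 0, sd 1.
    motivation = [rng.gauss(0.0, 1.0) for _ in range(n_students)]

    # Self-selection (assumed): more motivated students are more likely
    # to accept the invitation to the new program.
    chose_new = [rng.random() < (0.7 if m > 0 else 0.3) for m in motivation]

    # Randomisation: a fair coin decides, regardless of motivation.
    random_new = [rng.random() < 0.5 for _ in motivation]

    def gap(in_new_program):
        new = [m for m, flag in zip(motivation, in_new_program) if flag]
        usual = [m for m, flag in zip(motivation, in_new_program) if not flag]
        return statistics.fmean(new) - statistics.fmean(usual)

    return gap(chose_new), gap(random_new)


self_selected_gap, randomised_gap = simulate_allocation()
print(f"Gap in hidden trait, self-selected groups: {self_selected_gap:.2f}")
print(f"Gap in hidden trait, randomised groups:    {randomised_gap:.2f}")
```

Under these assumptions the self-selected groups differ noticeably on the hidden trait, while the randomised groups differ only by a small chance amount, even though the trait was never measured or used in the allocation.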

The random allocation of treatments means that the different treatments are the strongest explanation of markedly different group results. But this is not the only possible explanation. There could be some small amount of imbalance between the groups that is not completely eradicated by randomisation; this is known as “residual confounding”. For large sample sizes and big differences in the results, residual confounding becomes less and less plausible.

Replication


Replication, in statistical terms, refers to making independent observations on a sample from a population of interest, under the same conditions. This idea is important for ensuring that we observe the variation that is relevant to the question of interest. For example, ten blood pressure measurements on one individual will usually be less variable than ten blood pressure measurements on ten different individuals.

When observations are strongly associated, perhaps because they are close in time or space, but the data are treated as if they were independent observations, this is called “false replication”. An analysis that assumes independent observations will then give incorrect results; typically it overstates the precision of the estimates.

In a world of big data, with automated recording of a vast amount of information, it sometimes appears that a data set has a very large sample size. However, measurements made close together in time and/or space on the same individual can be strongly related, so the data cannot be treated as a large sample of independent observations.
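A rough sketch of false replication, using the blood pressure setting: many readings are taken on a small number of people, and the standard error of the overall mean is computed in two ways. The numbers used here (20 people, 50 readings each, a between-person standard deviation of 10 and a within-person standard deviation of 5) are hypothetical and chosen only to make the point visible.

```python
import random
import statistics


def standard_errors(n_people=20, n_per_person=50,
                    between_sd=10.0, within_sd=5.0, seed=2):
    """Standard error of the mean: naive (all readings) vs. person-level."""
    rng = random.Random(seed)

    readings_by_person = []
    for _ in range(n_people):
        # Each person has their own usual level (hypothetical values).
        usual_level = rng.gauss(120.0, between_sd)
        readings_by_person.append(
            [rng.gauss(usual_level, within_sd) for _ in range(n_per_person)]
        )

    # Naive analysis: treat every reading as an independent observation.
    all_readings = [r for person in readings_by_person for r in person]
    naive_se = statistics.stdev(all_readings) / len(all_readings) ** 0.5

    # Person-level analysis: one summary value per person, so the sample
    # size is the number of people, not the number of readings.
    person_means = [statistics.fmean(person) for person in readings_by_person]
    person_se = statistics.stdev(person_means) / n_people ** 0.5

    return naive_se, person_se


naive_se, person_se = standard_errors()
print(f"Naive SE (1000 'independent' readings): {naive_se:.2f}")
print(f"SE based on 20 person means:            {person_se:.2f}")
```

The naive standard error is much smaller because it counts 1000 observations, but the readings come from only 20 people. Analysing one summary per person is just one simple way to respect the structure; models with a person-level random effect are another.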

In designing and analysing a quantitative study, we may need to consider whether we have replication at all the levels of variation relevant to the study. To understand variation at different levels, consider researching study scores in English in the final year of school. Suppose a random sample of 80 students is obtained from across the whole state. A second sample, also of 80 students, is obtained by choosing four schools at random, and then two classes from each of the schools; this second sample consists of students from eight classes in four schools. In the first sample, there is no replication at the class or school level. In the second, as is common in educational contexts, there are three levels of variation, namely: school, class (within school) and student (within class). Replication is relevant at every level of variation, and the analysis must account for this structure.
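The two sampling schemes can be compared with a simulation of how much the sample mean varies from one sample to the next. This is a sketch only: the standard deviations for the school, class and student levels, the overall mean of 30, and the choice of ten students per class are assumed values. The simple random sample is also assumed to draw every student from a different school and class, which is reasonable only if the state has many schools.

```python
import random
import statistics

# Hypothetical values for illustration only.
MEAN_SCORE = 30.0
SCHOOL_SD, CLASS_SD, STUDENT_SD = 3.0, 2.0, 5.0


def simple_random_sample_mean(rng, n_students=80):
    """Mean of 80 students, each assumed to come from a different school and class."""
    scores = [
        MEAN_SCORE
        + rng.gauss(0.0, SCHOOL_SD)
        + rng.gauss(0.0, CLASS_SD)
        + rng.gauss(0.0, STUDENT_SD)
        for _ in range(n_students)
    ]
    return statistics.fmean(scores)


def clustered_sample_mean(rng, n_schools=4, classes_per_school=2,
                          students_per_class=10):
    """Mean of 80 students from eight classes in four schools."""
    scores = []
    for _ in range(n_schools):
        school_effect = rng.gauss(0.0, SCHOOL_SD)      # shared by the whole school
        for _ in range(classes_per_school):
            class_effect = rng.gauss(0.0, CLASS_SD)    # shared by the whole class
            for _ in range(students_per_class):
                student_effect = rng.gauss(0.0, STUDENT_SD)
                scores.append(MEAN_SCORE + school_effect
                              + class_effect + student_effect)
    return statistics.fmean(scores)


rng = random.Random(4)
srs_means = [simple_random_sample_mean(rng) for _ in range(2000)]
clustered_means = [clustered_sample_mean(rng) for _ in range(2000)]

sd_srs = statistics.stdev(srs_means)
sd_clustered = statistics.stdev(clustered_means)
print(f"SD of sample mean, simple random sample: {sd_srs:.2f}")
print(f"SD of sample mean, clustered sample:     {sd_clustered:.2f}")
```

Both samples contain 80 students, yet under these assumptions the mean from the clustered sample is considerably more variable, because it rests on only four schools and eight classes. An analysis that treated the clustered sample as 80 independent students would miss this.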

Reducing variation

In an overall statistical model for data, there is usually a systematic part and a random part. The random part captures the left-over, unexplained variation that the systematic part does not account for. Analysis using such models is more effective when this left-over variation is smaller, so reducing it is desirable. How can this be achieved? The answer depends on the particulars of the study context and requires good domain knowledge. There are, however, statistical principles that can be used, and blocking is one example. In blocking, researchers group experimental units that are similar to one another into blocks and then apply each treatment of interest within each block, so that treatments are compared among like units. The introduction of blocking was an important development in agricultural field experiments, and the technique has since been applied to experiments in the human and animal sciences.
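The benefit of blocking can be sketched by simulating the same set of field plots under two allocation schemes. Everything numerical here is an assumption made for illustration: 20 blocks of two plots each, a block standard deviation of 4, a plot standard deviation of 1, and a true treatment effect of 2.

```python
import random
import statistics


def simulate_designs(n_blocks=20, effect=2.0, block_sd=4.0, plot_sd=1.0,
                     n_experiments=2000, seed=3):
    """Estimated treatment effects from blocked and unblocked allocation."""
    rng = random.Random(seed)
    blocked_estimates, unblocked_estimates = [], []

    for _ in range(n_experiments):
        # Two plots per block; plots in a block share the block's effect.
        block_effects = [rng.gauss(0.0, block_sd) for _ in range(n_blocks)]
        untreated_yield = [
            (b + rng.gauss(0.0, plot_sd), b + rng.gauss(0.0, plot_sd))
            for b in block_effects
        ]

        # Blocked design: within each block, a coin flip picks the treated plot,
        # and the effect is estimated from within-block differences.
        differences = []
        for first, second in untreated_yield:
            if rng.random() < 0.5:
                first, second = second, first
            differences.append((first + effect) - second)
        blocked_estimates.append(statistics.fmean(differences))

        # Unblocked design: shuffle all plots and treat half, ignoring blocks.
        plots = [y for pair in untreated_yield for y in pair]
        rng.shuffle(plots)
        half = len(plots) // 2
        treated = [y + effect for y in plots[:half]]
        control = plots[half:]
        unblocked_estimates.append(
            statistics.fmean(treated) - statistics.fmean(control)
        )

    return blocked_estimates, unblocked_estimates


blocked, unblocked = simulate_designs()
sd_blocked = statistics.stdev(blocked)
sd_unblocked = statistics.stdev(unblocked)
print(f"SD of estimated effect, blocked design:   {sd_blocked:.2f}")
print(f"SD of estimated effect, unblocked design: {sd_unblocked:.2f}")
```

Both designs are randomised and both recover the assumed effect on average. The blocked design gives a far less variable estimate because the large block-to-block differences cancel out of each within-block comparison instead of ending up in the left-over variation.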


Light, R.J., Singer, J.D., & Willett, J.B. (1990). By design: Planning research on higher education. Harvard University Press.