Sample size calculation is a key aspect of medical research. The sample size is the number of people from the population of interest who will be studied. By calculating it, researchers can support the validity and generalizability of their results. A well-chosen sample size also helps them meet practical constraints, such as time, ethical regulations, and limited funding.
Can researchers be completely certain about the right sample size, though? Because only a portion of the population is studied, researchers cannot be sure that this portion is a perfectly accurate representation of the whole population. Errors are common in research. For instance, sampling error can occur. Sampling error is the difference between a value calculated from the sample (such as a mean or a proportion) and the true value in the population. It arises simply because only a sample was studied. Researchers should therefore always aim to minimize error and bias. Confidence intervals are often used to express how precisely the sample results estimate the population values, and therefore how far the results can be generalized (“Sample Size in Statistics (How to Find it): Excel, Cochran’s Formula, General Tips,” 2018).
Oversized and Undersized Studies
In general, the sample needs to be large enough to make the results generalizable. It also needs to be small enough that the research questions can be answered with the resources available (Peat, 2011). However, as explained above, calculating the sample size is always prone to error, and it involves subjective judgments. For example, in large samples, some differences may appear statistically significant even though they are unimportant in clinical settings. On the other hand, small samples may reveal clinically important differences that do not reach statistical significance because the sample is too small.
Experts need to be familiar with these issues to avoid them. The problems described above are known as oversized and undersized studies and clinical trials. When a study is oversized, a type I error may occur. A type I error is the incorrect rejection of a true null hypothesis. The null hypothesis is true, but researchers reject it and accept the alternative hypothesis, which is the hypothesis their team is exploring. Oversized studies may also waste resources and become unethical, because more subjects are enrolled than necessary. When a study is undersized, both type I and type II errors may occur. A type II error is the failure to reject a false null hypothesis. In other words, researchers retain the null hypothesis even though the alternative hypothesis is true. A small sample often leads to inadequate statistical analyses. Undersized studies may also be unethical, simply because they cannot fulfill their research goals (Peat, 2011). Note that when a study turns out to be inadequately sized, it may be better to terminate it than to waste resources or mislead subjects.
Power and Probability
Before calculating the sample size, researchers need to consider essential factors. These include the power of their study and the level of significance, which is the probability of a type I error they are willing to accept. The power of a study is the probability of rejecting a false null hypothesis. In other words, power reflects how likely researchers are to detect a true difference of a given size. The higher the power, the lower the probability of making a type II error (power = 1 − β, where β is the probability of a type II error).
This is a vital practical issue, because in clinical settings type I and type II errors can have different consequences (Peat, 2011). Consider a study of a new cancer treatment. The null hypothesis is that the new and the existing treatments are equally effective. A type I error would mean that the existing treatment is rejected and the new intervention accepted, even though the new intervention is no better. If the new treatment is in fact less effective or more expensive, this error can harm patients. A type II error, on the other hand, would mean that the new treatment is not accepted, even though it is more effective than the existing one. As a result, many patients would be denied the better treatment.
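As an illustration, the power of a comparison of two means can be approximated with the normal distribution. The sketch below assumes two equal groups, a known common standard deviation, and a two-sided test. The function name `power_two_means` and the example numbers are hypothetical and are not taken from Peat's tables.

```python
from math import sqrt
from statistics import NormalDist

def power_two_means(delta: float, sd: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test for a difference in means.

    Assumes equal group sizes and a known common standard deviation; the tiny
    probability of rejecting in the wrong direction is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)       # about 1.96 for alpha = 0.05
    se = sd * sqrt(2 / n_per_group)         # standard error of the difference in means
    return z.cdf(abs(delta) / se - z_crit)  # P(reject H0 | true difference = delta)

# For the same hypothetical difference (5 units, SD 10), power rises with group size.
for n in (20, 63, 100):
    print(n, round(power_two_means(delta=5, sd=10, n_per_group=n), 2))
```

With about 63 subjects per group the approximate power reaches 80%, so the probability of a type II error is about 20%.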
Calculating the Sample Size and Subgroups
Technological advancements support research and medicine. Nevertheless, although many computer programs can help researchers calculate the sample size, the best way to perform calculations is to use a table. Tables are clear, accurate, and simple. Using a table can be particularly helpful for chi-square tests and McNemar’s tests. McNemar’s test is used for paired binary (nominal) data arranged in a 2×2 table. An example is the same subjects classified as smokers or non-smokers before and after an intervention.
One of the major factors that experts need to consider is the type of variable measured and whether the results will be analyzed in subgroups. For instance, the results may be reported separately for the categories of a variable such as gender (male and female). In that case the sample size may need to be doubled so that each subgroup yields clinically and statistically meaningful results.
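For a chi-square comparison of two proportions, the numbers found in such tables can be approximated with a standard normal-approximation formula. The sketch below assumes equal group sizes and no continuity correction, so its results may differ slightly from published tables. The function name and the example proportions are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_two_proportions(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Subjects per group to detect p1 versus p2 with a two-sided chi-square (z) test.

    Normal-approximation formula without continuity correction; equal group sizes.
    """
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    z_b = z.inv_cdf(power)
    p_bar = (p1 + p2) / 2                      # pooled proportion under the null hypothesis
    numerator = (z_a * sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)    # round up to whole subjects

# Hypothetical example: 30% of one group versus 15% of the other have the outcome.
print(n_two_proportions(0.30, 0.15))
```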
Confidence Intervals and Prevalence
Another important aspect of calculating the sample size is determining the confidence interval and the prevalence. As explained above, studying a sample allows experts to estimate a value, such as a mean, that represents the population of interest. They can do this without testing the whole population or wasting resources. However, it is uncertain how well the sample represents the population, so experts need to determine a confidence interval. A confidence interval is a range of values that is likely to contain the true population value. It is based on a confidence level, which is often 95%. Suppose the sampling procedure were repeated many times in the same population. About 95% of the intervals calculated in this way would then contain the true value of the parameter of interest.
Estimating prevalence is also crucial. Prevalence is the proportion of a population that has a disease at a given time, and it is a measure of the burden of the disease (Ward, 2013). For instance, researchers may need to consider the prevalence of Alzheimer’s disease in the community (Beckett et al., 1992). Research plans therefore need to ensure an adequate number of cases and controls. Describing how the sample size was calculated is also an important part of research, so the process should be documented (Peat, 2011).
Estimating prevalence is a complicated process, and it is even more complicated for rare events. For example, in a new surgical trial, serious adverse effects may not occur at all. However, this does not mean that there are no risks.
For rare events, an upper limit of risk should be agreed upon before calculating the sample size. The “rule of three” can help. If no events are observed in n subjects, the upper limit of the 95% confidence interval for the risk is approximately 3/n. Rearranging gives n = 3 / (upper limit of risk). For example, suppose the agreed upper limit of risk is one in ten, or 10%. Then n = 3/0.1 = 30, so 30 subjects would be required. If none of them has a serious adverse effect, experts can be 95% confident that the true risk is no higher than about 10%.
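The repeated-sampling meaning of a 95% confidence interval can be checked with a small simulation. The sketch below draws many samples from a hypothetical normal population with a known standard deviation and builds a z-based interval from each sample. It then counts how often the interval contains the true mean. The population values, the sample size, and the function name are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def ci_coverage(mu=50.0, sigma=10.0, n=25, reps=2000, level=0.95, seed=1):
    """Fraction of z-based confidence intervals that contain the true mean mu."""
    rng = random.Random(seed)                  # fixed seed for a reproducible run
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / sqrt(n)           # sigma is treated as known
    hits = 0
    for _ in range(reps):
        sample_mean = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if sample_mean - half_width <= mu <= sample_mean + half_width:
            hits += 1
    return hits / reps

print(ci_coverage())  # expected to be close to 0.95
```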
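When the aim is to estimate a prevalence, Cochran’s formula, n = z²p(1 − p)/d², is a common starting point. This is the formula named in the “Sample Size in Statistics” source cited above. The sketch assumes simple random sampling from a large population. The expected prevalence p and the margin of error d in the example are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_for_prevalence(p: float, margin: float, level: float = 0.95) -> int:
    """Cochran's formula n = z^2 * p * (1 - p) / d^2 for estimating a proportion.

    Assumes simple random sampling from a large population (no finite-population correction).
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Expected prevalence 10%, to be estimated within +/- 3 percentage points.
print(n_for_prevalence(0.10, 0.03))
```

When nothing is known about the prevalence, p = 0.5 is often used because it gives the largest (most conservative) sample size.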
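The rule of three can be written as a one-line calculation and compared with the exact upper limit. The exact upper limit is the risk p that solves (1 − p)ⁿ = 0.05 when zero events are observed. The function names below are hypothetical, and the exact limit shown is the one-sided 95% version.

```python
from math import ceil

def n_rule_of_three(upper_risk: float) -> int:
    """Sample size such that zero events gives a 95% upper risk limit of about upper_risk."""
    return ceil(round(3 / upper_risk, 9))  # rounding guards against floating-point noise

def exact_upper_limit(n: int, level: float = 0.95) -> float:
    """Exact one-sided upper limit for the risk when 0 events are seen in n subjects."""
    return 1 - (1 - level) ** (1 / n)

n = n_rule_of_three(0.10)
print(n, round(exact_upper_limit(n), 3))  # 30 0.095
```

The exact limit (about 9.5%) is slightly below 10%, which shows that 3/n is a convenient, slightly conservative approximation.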
Effect of Compliance
There are many other factors that researchers need to consider. Practical limitations, such as poor compliance with the intervention, often become an obstacle in clinical trials. In general, if non-compliance is high, the size of the intervention groups needs to be doubled (Peat, 2011).
Apart from adjusting the sample size, experts can use run-in periods to improve compliance. Excluding non-compliant subjects during a run-in phase (before randomization) can help experts maintain the power of the study, especially in studies that measure efficacy. In studies that measure effectiveness, however, this approach may reduce generalizability. In the end, research goals and practical constraints need to be balanced.
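One commonly used adjustment divides the sample size by (1 − c)², where c is the expected proportion of non-compliant subjects. This adjustment is an assumption here and not necessarily the method behind Peat's recommendation. The reasoning is that non-compliance dilutes the observed difference by roughly a factor of (1 − c). With 30% non-compliance the factor is about 2, which is consistent with the rule of thumb of doubling. The function name is hypothetical.

```python
from math import ceil

def adjust_for_noncompliance(n: int, noncompliance: float) -> int:
    """Inflate a per-group sample size for non-compliance (simple dilution model)."""
    if not 0 <= noncompliance < 1:
        raise ValueError("noncompliance must be in [0, 1)")
    return ceil(n / (1 - noncompliance) ** 2)

# A hypothetical study that needs 100 compliant subjects per group.
for c in (0.0, 0.1, 0.3):
    print(c, adjust_for_noncompliance(100, c))
```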
Calculating the Sample Size: Continuous Outcome Variables and Non-parametric Outcome Measurements
Medical research is a complicated process, and deciding on the primary outcome variable is crucial. However, as stated earlier, all factors that affect the topic of interest should be considered. Tables should always be consulted. They can help experts calculate the sample size for continuous outcome variables, for both paired and unpaired data (Peat, 2011). The effect size is the smallest difference between groups that is considered clinically important. The sample size also depends on the standard deviation, which measures the variability in the collected data. Suppose experts need to assess subjects who weigh between 45 kg and 100 kg (Kadam & Bhalerao, 2010). This is a large variability, so a large sample will be needed.
However, the outcome variables may not be normally distributed, or they may be ordinal, for example when there are more than two categories and data are collected with Borg or Likert scales. In these cases the standard deviation is not an appropriate summary, and sample size calculations based on it cannot be used directly. Again, describing the goals and the statistical procedures employed is vital.
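The roles of the effect size and the standard deviation can be seen in the usual normal-approximation formula for comparing two means: n per group = 2(z₁₋α/₂ + z₁₋β)²σ²/Δ². The sketch below is a minimal version that assumes two equal, unpaired groups. Tables based on the t-distribution may give slightly larger numbers. The function name and the example values are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_two_means(delta: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Subjects per group to detect a difference delta between two means (two-sided test)."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)   # significance level
    z_b = z.inv_cdf(power)           # power = 1 - beta
    return ceil(2 * (z_a + z_b) ** 2 * sd ** 2 / delta ** 2)

# Same clinically important difference (5 kg), two different amounts of variability.
print(n_two_means(delta=5, sd=10))
print(n_two_means(delta=5, sd=20))
```

Because the standard deviation enters the formula squared, doubling the variability roughly quadruples the required sample. This is why a population with widely varying weights needs a large sample.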
Cases and Controls: Increasing the Power of the Study
Another way to increase power is to change the ratio of controls to cases. For rare events, populations, and diseases, the number of available cases is limited. In these situations the power of the study can be improved by enrolling more controls, for instance, more than two controls for each case. This is particularly helpful for testing the efficacy of new or expensive interventions (Peat, 2011).
Note that a trade-off should also be considered. A trade-off is a decision to give up one quality to gain another. Here, each additional control adds cost but contributes less extra power than the one before.
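This trade-off can be quantified with a widely used approximation. If n cases are needed with one control per case, then about n(k + 1)/(2k) cases are needed with k controls per case. The approximation is an assumption here, and the function name is hypothetical.

```python
from math import ceil

def cases_needed(n_one_to_one: int, controls_per_case: int) -> int:
    """Cases required with k controls per case, given the number needed with 1:1 matching.

    Uses the approximation n_k = n * (k + 1) / (2k).
    """
    k = controls_per_case
    return ceil(n_one_to_one * (k + 1) / (2 * k))

# Suppose 100 cases would be needed with one control each.
for k in (1, 2, 3, 4, 6):
    n_cases = cases_needed(100, k)
    print(k, n_cases, n_cases * k)  # controls per case, cases, controls
```

Moving from one to two controls per case saves a quarter of the cases. Beyond about four controls per case, the savings become small while the number of controls keeps growing.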
Odds Ratio and Relative Risk
The odds ratio (OR) measures the association between an exposure and an outcome (Szumilas, 2010). It can reveal whether an outcome is associated with the exposure of interest, in other words, whether a particular exposure may be a risk factor for a disease. An odds ratio of one means that the exposure is not associated with the outcome. An odds ratio greater than one means that exposure is associated with higher odds of the outcome, and the larger the value, the stronger the association. An odds ratio below one means lower odds. Logistic regression can also be used to estimate odds ratios, and confidence intervals show how precise the estimate is (Peat, 2011).
When studies are stratified by confounders, or when matched case-control designs are used, overmatching should be avoided, because it can reduce the efficiency of the study.
Correlation coefficients can also be used to calculate the sample size, and again, tables are vital. In general, a correlation coefficient shows how strong the linear relationship between two variables is. In linear regression analysis, Pearson’s correlation coefficient is usually used as the indicator.
Note that a correlation coefficient must differ significantly from zero to be considered significant (Peat, 2011). The p-value is compared with a chosen level of significance, usually 0.05, so p < 0.05 is accepted as significant. This means that, if there were no true effect, results at least as extreme as those observed would be expected less than 5% of the time by chance alone. As explained earlier, statistically significant associations do not always indicate clinically important differences.
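As a worked example, an odds ratio and its confidence interval can be calculated directly from a 2×2 table. The sketch below uses the common log-based (Woolf) interval. The cell counts and the function name are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def odds_ratio(a: int, b: int, c: int, d: int, level: float = 0.95):
    """Odds ratio with a Woolf (log-based) confidence interval.

    2x2 layout: a = exposed cases,   b = exposed controls,
                c = unexposed cases, d = unexposed controls.
    """
    or_ = (a * d) / (b * c)                   # odds of exposure in cases / in controls
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return or_, exp(log(or_) - z * se), exp(log(or_) + z * se)

print(odds_ratio(20, 80, 10, 90))
```

Here the odds ratio is 2.25, but the 95% confidence interval just includes 1. This illustrates why a point estimate above one is not, on its own, evidence that an exposure is a risk factor.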
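Sample sizes for detecting a correlation are often approximated with Fisher’s z-transformation: n = ((z₁₋α/₂ + z₁₋β)/C)² + 3, where C = ½ ln((1 + r)/(1 − r)). The sketch below is a minimal version for a two-sided test of r = 0. The function name and the example correlations are hypothetical.

```python
from math import ceil, log
from statistics import NormalDist

def n_for_correlation(r: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Subjects needed to detect a Pearson correlation r (two-sided test of r = 0).

    Fisher z-approximation: n = ((z_a + z_b) / C)^2 + 3 with C = 0.5 * ln((1 + r) / (1 - r)).
    """
    z = NormalDist()
    c = 0.5 * log((1 + r) / (1 - r))
    return ceil(((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / c) ** 2 + 3)

for r in (0.1, 0.3, 0.5):
    print(r, n_for_correlation(r))
```

Weak correlations need far larger samples than strong ones, which is why the expected size of the correlation should be stated when the sample size is reported.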
Repeatability and Agreement
When calculating the sample size, no matter what variables or procedures have been employed, repeatability and agreement are two factors that should be considered.
To ensure repeatability, experts can increase the sample size. In studies with too few subjects, the number of measurements taken for each subject can be increased instead. Usually, a sample of 30 is the minimum. On the other hand, to assess agreement between two continuously distributed measurements, a sample of 100 is acceptable (Peat, 2011). Of course, more subjects are needed for categorical data.
Analysis of Variance and Multivariate Analysis of Variance
Analysis of variance (ANOVA) and multivariate analysis of variance (MANOVA) are two popular statistical procedures, and the planned analysis affects how the sample size is calculated. ANOVA tests the differences between the mean values of several groups, e.g., different treatments. While ANOVA tests only one dependent variable at a time, MANOVA can test several dependent variables simultaneously. When interpreting results, note that an effect size of 0.1–0.2 is considered small, 0.25–0.4 medium, and 0.5 large.
For MANOVA, an ad hoc method of calculating the sample size is needed. Ad hoc means “for a particular purpose only,” so an ad hoc method works only for the specific purpose it was designed for (“Ad Hoc Analysis and Testing,” 2015).
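For a one-way ANOVA, a standard effect-size index is Cohen’s f. It is the standard deviation of the group means divided by the common within-group standard deviation. The text does not state which index its thresholds refer to, so using Cohen’s f here is an assumption. The function name and the example group means are hypothetical.

```python
from statistics import pstdev

def cohens_f(group_means, sd_within: float) -> float:
    """Cohen's f for one-way ANOVA: spread of the group means / within-group SD.

    Assumes equal group sizes, so every group mean is weighted equally.
    """
    return pstdev(group_means) / sd_within  # population SD of the means, not the sample SD

# Three hypothetical treatments with means 10, 12 and 14 and a within-group SD of 8.
print(round(cohens_f([10, 12, 14], sd_within=8), 3))
```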
In survival analyses, the time to event, or survival time, can vary from days to years.
Since the number of events, such as deaths, is the focus of interest, experts can increase either the number of subjects or the length of the follow-up period. As explained above, describing the sample size calculation is vital. This includes factors such as the level of significance, the power of the study, the expected effect size, and the standard deviation in the population.
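Why either lever works can be seen from Schoenfeld’s approximation for a log-rank comparison of two equal groups. The required number of events is d = 4(z₁₋α/₂ + z₁₋β)²/(ln HR)², and the number of subjects is d divided by the probability that a subject has the event during follow-up. Using Schoenfeld’s formula is an assumption, since the text does not name a method. The hazard ratio, the event probabilities, and the function names are hypothetical.

```python
from math import ceil, log
from statistics import NormalDist

def events_needed(hazard_ratio: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Schoenfeld's approximation for the number of events, 1:1 allocation, two-sided test."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return 4 * z_sum ** 2 / log(hazard_ratio) ** 2

def subjects_needed(hazard_ratio: float, p_event: float, **kwargs) -> int:
    """Total subjects, given the probability that a subject has the event during follow-up."""
    return ceil(events_needed(hazard_ratio, **kwargs) / p_event)

print(ceil(events_needed(0.7)))           # events (e.g. deaths) required
print(subjects_needed(0.7, p_event=0.5))  # shorter follow-up: fewer events per subject
print(subjects_needed(0.7, p_event=0.8))  # longer follow-up: fewer subjects needed
```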
Clinical trials are complex and involve numerous ethical issues. Therefore, experts can conduct statistical analyses before recruitment and follow-up are complete. This is known as interim analysis (Peat, 2011).
Interim analyses can help experts decide whether or not to continue a clinical trial. This can prevent failures at later stages and cut costs. Interim analyses can also be used to check and reassess the planned sample size and the recruitment process (Kumar & Chakraborty, 2016). Nevertheless, the number of interim analyses should be limited and decided before the study begins. What’s more, because such analyses must be unbiased, an independent monitoring committee can be asked to perform them.
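One way to plan a limited number of interim looks in advance is an alpha-spending function. This approach is named here as an assumption, since the text does not specify a method. The Lan–DeMets function that mimics O’Brien–Fleming boundaries spends very little of the overall significance level at early looks. The sketch below only computes the cumulative alpha spent at each information fraction t. Converting these values into actual stopping boundaries requires numerical integration that is not shown. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def obf_alpha_spent(t: float, alpha: float = 0.05) -> float:
    """Cumulative two-sided alpha spent at information fraction t (0 < t <= 1).

    Lan-DeMets spending function approximating O'Brien-Fleming boundaries.
    """
    z = NormalDist()
    return 2 - 2 * z.cdf(z.inv_cdf(1 - alpha / 2) / sqrt(t))

# Four pre-planned analyses: after 25%, 50%, 75% and 100% of the information.
for t in (0.25, 0.5, 0.75, 1.0):
    print(t, round(obf_alpha_spent(t), 4))
```

Almost all of the 5% is kept for the final analysis, so stopping early requires a very small p-value. This matches the principle that early stopping decisions should be based on a stringent level of significance.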
Internal Pilot Studies
Internal pilot studies can also be used to calculate the sample size. Such studies use data from the first patients enrolled. By analyzing the results obtained from these subjects, experts can estimate the variance in the reference group and recalculate the sample size. Note that experts need to remain blinded to the results. It is also important to understand that these results should not be used on their own to test the study hypothesis. Instead, they are included in the final analysis (Peat, 2011). Recalculating the sample size helps the study keep its intended power and improves its efficiency, which saves effort and resources. Depending on the study goals, the preliminary analysis may use 20 subjects in a study with a total of 40 participants, or 100 subjects in a study of 1000 participants.
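The recalculation step can be sketched as follows. The standard deviation is estimated from the first subjects’ outcome values, and the per-group sample size is recomputed with the same two-means formula used at the planning stage. The rule of never reducing the sample below the original plan is a common convention and is assumed here. The pilot values, the planned size, and the function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist, stdev

def n_two_means(delta: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group size for comparing two means (normal approximation, two-sided)."""
    z = NormalDist()
    return ceil(2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2 * sd ** 2 / delta ** 2)

def recalculated_n(planned_n: int, pilot_values, delta: float, **kwargs) -> int:
    """Re-estimate the per-group size from internal-pilot data, never going below the plan."""
    sd_hat = stdev(pilot_values)  # SD estimated from the first subjects' outcome values
    return max(planned_n, n_two_means(delta, sd_hat, **kwargs))

# Planned with SD = 10 (63 per group), but the first 20 outcomes look more variable.
pilot = [52, 71, 45, 60, 38, 66, 49, 74, 57, 41, 63, 55, 48, 70, 36, 59, 67, 44, 62, 53]
print(recalculated_n(planned_n=63, pilot_values=pilot, delta=5))
```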
Also, professionals should differentiate classical pilot studies from internal pilot studies. Usually, pilot studies are conducted before the actual study to test if the recruitment procedure and the equipment employed are effective. While results obtained from a classical pilot study are not included in the analysis, results from internal pilot studies are used in the final analysis.
Clinical trials aim to improve medicine and find the best treatment. However, new medications may have various unknown side effects. To tackle the problem of possible adverse effects, safety analysis is paramount (Peat, 2011).
Usually, after a predefined number of subjects has been recruited, experts perform a safety analysis, and the results are interpreted by an external monitoring committee. In the end, a careful assessment of risks and benefits is crucial in medicine.
Stopping a Study
Equipoise is another principle that is vital for patients’ well-being. Equipoise refers to genuine uncertainty among experts about which treatment is better. Clinical equipoise has been proposed as the justification for randomizing patients when the clinical merits of an intervention are uncertain (Hey et al., 2017). Ethical considerations should always come first, and patients who enroll should not have to worry about receiving inferior treatment or insufficient care.
However, decisions to stop a study should follow established rules. Sometimes, continuing a study and performing further analyses, such as adjusting for confounders or examining subgroups, can reveal potential benefits.
Because interim analyses can reveal important results early, experts may decide to stop a study. This decision can be based on both statistical and ethical grounds. For instance, if an interim analysis shows that a new treatment is toxic, researchers will not recruit any more subjects. Although a larger sample would be needed to answer all questions about efficacy, subjects cannot be exposed to such risks. Apart from safety concerns, clinical trials can be stopped prematurely if clear differences between groups emerge early. They can also be stopped if early results suggest that no meaningful difference will be found.
Stopping a study is a delicate matter, and an external committee needs to be consulted. Some studies have been stopped prematurely without a good reason, so clear rules should be established. In general, to avoid a false positive result, the decision to stop a study early should require a stringent level of significance, that is, a very small p-value.
In conclusion, calculating the sample size is a complex process. In the end, patients are not only numbers but human beings.
Ad Hoc Analysis and Testing. (2015, September 27). Retrieved from http://www.statisticshowto.com/probability-and-statistics/find-sample-size/
Beckett, L., Scherr, P., & Evans, D. (1992). Population prevalence estimates from complex samples. Journal of Clinical Epidemiology, 45(4), 393–402.
Hey, S., London, A., Weijer, C., Rid, A., & Miller, F. (2017). Is the concept of clinical equipoise still relevant to research? BMJ.
Kadam, P., & Bhalerao, S. (2010). Sample size calculation. International Journal of Ayurveda Research, 1(1), 55–57.
Kumar, A., & Chakraborty, B. (2016). Interim analysis: A rational approach of decision making in clinical trial. Journal of Advanced Pharmaceutical Technology & Research, 7(4), 118–122.
Peat, J. (2011). Calculating the Sample Size. In Health Science Research. SAGE Publications Ltd.
Sample Size in Statistics (How to Find it): Excel, Cochran’s Formula, General Tips. (2018, January 15). Retrieved from http://www.statisticshowto.com/probability-and-statistics/find-sample-size/
Szumilas, M. (2010). Explaining Odds Ratios. Journal of the Canadian Academy of Child and Adolescent Psychiatry, 19(3), 227–229.
Ward, M. (2013). Estimating Disease Prevalence and Incidence using Administrative Data. The Journal of Rheumatology, 40(8), 1241–1243.