Reporting the Results

By | Health Sciences Research

Reporting the results is one of the fundamental aspects of clinical research. Accurate results benefit diagnosis, patient outcomes, and drug manufacturing. Nevertheless, measurements and findings are often prone to errors and bias (Bartlett & Frost, 2008).

Various research methods and statistical procedures exist to help experts reduce discrepancies and approach “true” values. In the end, the main aim of any clinical trial is accuracy.

Repeatability and Medical Data

Repeatability is a paramount factor that reveals the consistency of any clinical method. In other words, repeatability shows whether the same instrument, used on the same subject more than once, will lead to the same results (Peat, 2011). Note that much of the terminology in this area was shaped by Bland and Altman’s work on measurement agreement.

Although terms such as repeatability, reproducibility, reliability, consistency and test-retest variability are often used interchangeably, there are some slight differences. Repeatability, for instance, requires the same location, the same tool, the same observer, and the same subject. Consequently, the repeatability coefficient reveals the precision of a test and the expected difference between two repeated test findings over a short period of time. To assess repeatability in continuous data, statistics such as the intraclass correlation coefficient and Levene’s test of equal variance can be utilized. For categorical data, kappa and the proportion in agreement can support research. Reproducibility, on the other hand, refers to the ability to replicate medical studies; in other words, it reveals the agreement between results obtained from different subjects, via different tools, and at different locations (“What is the difference between repeatability and reproducibility?” 2014).

Data types also matter. As explained above, for continuously distributed measures, the measurement error (or standard error of measurement, SEM) and the intraclass correlation coefficient (ICC) are the two most effective indicators of repeatability (that is, of the reliability of a measure). The measurement error reveals within-subject test-retest variation; note that it is an absolute estimate of the range in which the true value can be found. The ICC, on the other hand, is a relative estimate of repeatability: it expresses the between-subject variance as a proportion of the total variance for continuous measures. When it comes to interpretation, a high ICC means that only a small proportion of the variance is due to within-subject variance; an ICC close to one means there is almost no within-subject variance.
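As a rough illustration of the measurement error, the sketch below uses invented test-retest readings; with two measurements per subject, the within-subject SD can be obtained from the SD of the paired differences:

```python
from statistics import mean, stdev

# Hypothetical test-retest data: the same instrument applied twice to
# each of eight subjects (units are arbitrary).
test1 = [12.1, 14.3, 13.8, 15.2, 11.9, 14.7, 13.1, 12.8]
test2 = [12.4, 14.0, 14.1, 15.0, 12.3, 14.5, 13.4, 12.5]

diffs = [b - a for a, b in zip(test1, test2)]

# With two measurements per subject, the within-subject SD (the
# measurement error, or SEM) is SD(differences) / sqrt(2).
sem = stdev(diffs) / 2 ** 0.5

# As an absolute estimate, roughly 1.96 * SEM on either side of a single
# reading gives the range in which the true value is expected to lie.
print(f"mean difference: {mean(diffs):.3f}")
print(f"measurement error (SEM): {sem:.3f}")
```

A small SEM relative to the scale of the measurements indicates good test-retest precision.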

For categorical data, there are also various methods, with kappa being one of the most useful statistics. Basically, kappa is similar to the ICC but applicable to categorical data; thus, kappa values close to one indicate total agreement (Peat, 2011). Note that measurement error in categorical data is also called misclassification error.

Continuous Data and True Values

Medical research is a curious niche full of unexpected outcomes. Variations and errors are quite common. Variations may occur even when the same subject is tested twice via the same tool. Such discrepancies might be a result of various factors: within-observer variation (intra-observer error), between-observer variation (inter-observer error), within-subject variation (test-retest error), and actual changes in the subject after a certain intervention (responsiveness). To be more precise, variations may occur due to changes: in the observer, the subject or the equipment.

Consequently, it’s hard to obtain true values. To guarantee accuracy, any good study design should ensure that more than one measurement is taken from each subject, so that repeatability can be estimated (Peat, 2011).

Selection Bias and Sample Size

Selection bias affects repeatability scores; therefore, studies with different subject selection criteria cannot be compared. Likewise, estimates from studies with three or four repeated measures cannot be compared with those from studies with only two.

Note that estimates of the ICC may be higher, and estimates of measurement error lower, if the inclusion criteria increase between-subject variation. For example, the ICC will usually be higher when subjects are selected randomly. Researchers should recruit a sample of at least 30 subjects to guarantee adequate estimates of variance.

Within-subject Variance and Two Paired Measurements

Paired data are vital in research and should be used to measure within-subject variance. The mean and the standard deviation (SD) of the differences must also be computed. The measurement error can then be transformed into a 95% range – the so-called limits of agreement – which gives 95% certainty that the true value for a subject lies within the calculated range (Peat, 2011).

  • Paired t-tests are beneficial in assessing systematic bias between observers.
  • A test of equal variance (e.g., Levene’s test of equal variance) can be helpful to assess repeatability in two different groups.
  • A plot of the mean value against the difference for each subject is also crucial. This is an effective method because the mean-vs-difference plot (or Bland–Altman plot) is usually clearer than a scatter plot.
  • Note that Kendall’s correlation can add more valuable insights to the study. Kendall’s tau-b correlation coefficient indicates the strength of association that exists between two variables (“Kendall’s Tau-b using SPSS Statistics”).
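The calculations behind the limits of agreement are straightforward; the sketch below uses invented paired readings from two observers:

```python
from statistics import mean, stdev

# Hypothetical paired readings from two observers on ten subjects.
obs_a = [101, 98, 105, 110, 95, 102, 99, 107, 103, 100]
obs_b = [103, 97, 108, 112, 96, 101, 102, 109, 104, 103]

means = [(a + b) / 2 for a, b in zip(obs_a, obs_b)]   # x-axis of the plot
diffs = [b - a for a, b in zip(obs_a, obs_b)]         # y-axis of the plot

bias = mean(diffs)        # systematic difference between the observers
sd_diff = stdev(diffs)

# Bland-Altman 95% limits of agreement: bias +/- 1.96 * SD of differences.
lower = bias - 1.96 * sd_diff
upper = bias + 1.96 * sd_diff

print(f"bias: {bias:.2f}")
print(f"95% limits of agreement: {lower:.2f} to {upper:.2f}")
```

The (means, diffs) pairs are exactly the coordinates one would plot on a mean-vs-difference chart.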

Measurement Error and Various Measurements per Subject

Nothing is black and white in medical research, and often more than two measures are required per subject. When more than two measurements are taken per subject, experts should calculate the variance for each subject and, from that, the within-subject variance; such values can be calculated via ANOVA. A mean-vs-standard-deviation plot can visualize the results. In addition, Kendall’s coefficient can indicate whether there is any systematic error.

Note that deciding on the most reliable measure out of a selection of different measures is a difficult task. In clinical settings, this is a vital process, as it may affect patients’ outcomes. Assessing measurement error is also fundamental: the measurement error indicates the range of normal and abnormal values around the baseline. In fact, these values can reveal either a positive or a negative effect of a treatment (Peat, 2011). Let’s say that previously abnormal values have come close to normal values; one interpretation is that the treatment has affected the disease in a positive direction.

ICC and Methods to Calculate ICC

The ICC is an essential indicator of the extent to which multiple measures taken from the same subjects are related to each other. This form of correlation is also known as a reliability coefficient. As explained above, a high ICC means that most of the variance is due to true differences between subjects, with only a small share due to measurement error (within-subject variance).

Unlike some other correlation coefficients (Pearson’s correlation, for example), the ICC can be calculated in several ways, and most statistical packages support it. The first method employs one-way analysis of variance and is used when the difference between observers is fixed. The second method, based on two-way analysis of variance, is useful when there are many observers. There is also a third, simplified method based on only two measures per subject.

  • In fact, P values can be computed for the ICC, which removes the need for a separate test of significance. However, note that Pearson’s correlation coefficient (R) is often used to describe repeatability or agreement, which can lead to false interpretations. For example, when there is a systematic difference (e.g., the second set of measures is consistently larger than the first), the correlation can be perfect and yet the repeatability poor.
  • A coefficient of variation, which is the within-subject SD divided by the mean of the measures, may be employed. Still, ICC is a more accurate indicator.
  • F tests can be performed to show whether the ICC differs from zero. Generally speaking, an F test is computed after methods such as ANOVA and regression to assess whether the mean values of two populations differ (“F Statistic / F Value: Simple Definition and Interpretation”).
  • Confidence intervals should be calculated as well to support the data analysis.
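As an illustration of the first (one-way ANOVA) approach, the sketch below computes a one-way random-effects ICC for made-up data with two measurements per subject:

```python
# Hypothetical data: five subjects, each measured twice.
data = [
    [10.2, 10.5],
    [12.1, 11.9],
    [ 9.8, 10.1],
    [11.4, 11.6],
    [13.0, 12.7],
]
n = len(data)          # number of subjects
k = len(data[0])       # measurements per subject

grand = sum(sum(row) for row in data) / (n * k)
subj_means = [sum(row) / k for row in data]

# One-way ANOVA sums of squares.
ss_between = k * sum((m - grand) ** 2 for m in subj_means)
ss_within = sum((x - m) ** 2 for row, m in zip(data, subj_means) for x in row)

ms_between = ss_between / (n - 1)
ms_within = ss_within / (n * (k - 1))

# One-way random-effects ICC: the share of total variance that is
# between-subject rather than within-subject.
icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
print(f"ICC = {icc:.3f}")
```

A value close to one, as here, indicates that almost all the variance reflects true differences between subjects.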

Measurement Error and ICC

To sum up, measurement error and the ICC are two paramount indicators in medical research. Since they convey different information, they should be reported together.

Note that while the ICC is a relative measure, related to the ratio of the measurement error to the total SD, the measurement error itself is an absolute value (Peat, 2011).

Repeatability of Categorical Data

The repeatability of categorical data (e.g., the presence of illnesses) collected via surveys and questionnaires is also vital. As explained above, when it comes to categorical data, measurement error is called misclassification error. Note that a few requirements are mandatory: 1) the questions, the mode of administration and the settings must be identical on each occasion; 2) subjects and observers must be blinded to the results; and 3) the time between the test and the retest must be appropriate. For community studies, repeatability values should be established in similar community settings (not in extreme subsamples). On top of that, patients who are tested very often should be excluded, as their answers may be well-rehearsed.

Kappa is the most popular statistic here: it relates the observed proportion in agreement to the agreement expected by chance. Usually, kappa is beneficial for measuring test-retest repeatability of self-administered surveys and between-observer agreement in interviews. Note that a kappa of zero reveals only chance agreement, while a value of one indicates perfect agreement. In addition, around 0.5 shows moderate agreement, above 0.7 good agreement, and above 0.8 very good agreement. We should mention that the average correct classification rate is an alternative to kappa: it is higher than the observed proportion in agreement, and it represents the probability of a consistent reply.
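A minimal sketch of Cohen’s kappa, using invented yes/no answers from a test-retest survey:

```python
# Hypothetical test-retest answers ("yes"/"no") from twelve respondents.
first  = ["yes", "no", "yes", "yes", "no", "no",
          "yes", "no", "yes", "no", "yes", "no"]
second = ["yes", "no", "yes", "no",  "no", "no",
          "yes", "yes", "yes", "no", "yes", "no"]

n = len(first)
categories = sorted(set(first) | set(second))

# Observed proportion in agreement.
p_observed = sum(a == b for a, b in zip(first, second)) / n

# Agreement expected by chance, from the marginal proportions.
p_expected = sum(
    (first.count(c) / n) * (second.count(c) / n) for c in categories
)

# Kappa: observed agreement corrected for chance agreement.
kappa = (p_observed - p_expected) / (1 - p_expected)
print(f"observed agreement: {p_observed:.2f}, kappa: {kappa:.2f}")
```

Here ten of twelve answers agree, but half that agreement would be expected by chance, so kappa lands in the “good agreement” range rather than near one.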

Repeatability and Validity

In the end, repeatability and validity go hand in hand. Basically, poor repeatability leads to poor validity of an instrument and limits the accuracy of results. Thus, many indicators and statistics can be employed. Since causes and outcomes are interconnected, it is not recommended to use the ICC in isolation – simply because the ICC is not responsive enough, on its own, to describe the consistency of an instrument.

We should mention that even a valid instrument may reveal some measurement error. In fact, measurement error has a simple clinical interpretation, which makes it a good statistic to use.

Agreement and Measures

Apart from repeatability, agreement is another paramount aspect of medical research. Agreement is defined as the extent to which two different methods used to measure a particular variable can be compared or substituted for one another. For example, experts should know when measurements taken from the same subject via different tools can be used interchangeably (Peat, 2011). Note that agreement, or comparability of tests, mainly assesses the criterion and construct validity of a test. Nevertheless, results will never be identical.

There are numerous statistics that can be used to measure agreement, and tables are available to guide experts in choosing an effective method for each situation. For example, just as with repeatability, the measurement error, the ICC, and paired tests are among the most powerful statistics for continuous data measured in the same units. Kappa, again, is the main indicator for the analysis of categorical data. When one measure is continuous and the other categorical, a receiver operating characteristic (ROC) curve can be employed (“What is a ROC curve?”).

Continuous Data and Units the Same

As mentioned earlier, different measures rarely give identical results. Even if experts measure weight via two different scales, figures won’t be exactly the same. Thus, when two measurements have to be used interchangeably or converted from one another, it must be clear how much error there will be after the conversion.

When figures are expressed in the same units, the agreement can be assessed via the measurement error or the mean value of the within-subject variance. Since these measures are calculated within the same group of subjects, methods are similar to the ones for repeatability.

Agreement and Mean-vs-differences Plot

Drawing a mean-vs-differences plot is also an effective method, which can be used alongside calculating the 95% limits of agreement. Note that when two tools do not agree well, this may be because they measure different variables or because one of the instruments is imprecise.

Note that when agreement is poor, consistent bias can be assessed by computing the rank correlation coefficient of the plot. Kendall’s correlation coefficient, for instance, can indicate whether the agreement and the size of the measurement are related. If the correlation is high, there is a systematic bias that varies with the size of the measurement. In case such a relationship occurs, a regression equation can help experts convert measurements from one instrument to the other. Usually, a regression equation helps researchers explore the connections between sets of data and predict future values. Note that linear regression fits a straight line through the data.
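To make the rank-correlation step concrete, here is a small sketch of Kendall’s tau (the simple tau-a form, which assumes essentially untied data) applied to invented mean-vs-difference values:

```python
def kendall_tau(x, y):
    """Kendall's tau (tau-a; assumes few or no tied pairs):
    (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical mean-vs-difference data: the differences grow with the
# size of the measurement, suggesting bias that varies with magnitude.
means = [10, 12, 15, 18, 22, 25, 30]
diffs = [0.1, 0.2, 0.2, 0.4, 0.5, 0.7, 0.9]

tau = kendall_tau(means, diffs)
print(f"Kendall's tau = {tau:.2f}")
```

A tau close to one, as here, would suggest fitting a regression equation to convert between the two instruments; statistical packages use the tie-corrected tau-b for real data.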

95% Limits of Agreement and Clinical Differences

Calculating the 95% limits of agreement is also essential. As a matter of fact, Bland and Altman defined the limits of agreement as the range in which 95% of the differences can be found (Peat, 2011). Note that a measure with poor repeatability will never agree well with another tool.

Variation is a common phenomenon in research. As described above, it is normal for two instruments to show some differences. However, such instruments can be used interchangeably in practice only when this range of differences is not clinically important. In the end, patients’ well-being is the main goal of science.

Continuous Data and Units Different

Medical research is a challenging process that involves numerous statistical procedures. In fact, measures expressed in different units must be compared from time to time. Measuring the extent to which different instruments agree is vital for estimating whether one measurement predicts the other.

When continuous data are expressed in different units, linear regression and correlation coefficients are the most accurate statistics; they help experts check to what extent the variation in one measure is explained by the other measure (Peat, 2011).

Agreement and Categorical Data

Continuous data are crucial, and so is categorical information. Categorical measurements, and the level of agreement between them, can reveal the utility of a test. To be more precise, the ability of a test to predict the presence or the absence of disease is paramount in medicine (Peat, 2011). In clinical settings, methods such as self-reports, observation of clinical symptoms and diagnostic tests (e.g., X-rays) can help experts classify patients according to the presence or absence of a disease (e.g., tuberculosis).

In this case, sensitivity and specificity are two essential statistics, as they can be applied to different populations. Sensitivity is defined as the proportion of ill subjects who are correctly identified by a positive test result. Specificity, on the other hand, is the proportion of disease-negative subjects who are correctly identified by a negative test result. What’s more, these indicators can be compared between studies that employ different selection criteria and different testing methods.

Yet the probability that the measure reveals the correct diagnosis is the most important aspect, and it is captured by the positive predictive value (PPV) and the negative predictive value (NPV) of a test (Peat, 2011). Note that PPV and NPV depend on the prevalence of a disease, so they cannot be compared across studies with different levels of prevalence. We should mention that for rare diseases, the PPV will be closer to zero; this is because experts cannot be certain that a positive result actually reveals an existing illness.
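All four statistics fall out of a simple 2×2 table; the sketch below uses made-up counts:

```python
# Hypothetical 2x2 table for a diagnostic test vs. true disease status.
tp, fp = 45, 10    # test positive: with disease / without disease
fn, tn = 5, 140    # test negative: with disease / without disease

sensitivity = tp / (tp + fn)   # ill subjects correctly test-positive
specificity = tn / (tn + fp)   # healthy subjects correctly test-negative

# Predictive values depend on the prevalence in the studied sample.
ppv = tp / (tp + fp)           # probability of disease given a positive test
npv = tn / (tn + fn)           # probability of no disease given a negative test

print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
print(f"PPV={ppv:.2f} NPV={npv:.2f}")
```

With rarer disease (fewer true positives for the same false-positive count), the PPV in this table would fall toward zero while sensitivity and specificity stayed unchanged.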

Likelihood Ratio and Confidence Intervals

The likelihood ratio is the most effective statistic used to compare different populations and clinical settings. The likelihood ratio is defined as the likelihood that certain findings would be expected in subjects with the disorder of interest, compared to subjects without that disease. This statistic incorporates both sensitivity and specificity and reveals how good a test is (e.g., the higher the value, the more effective the test will be). The likelihood ratio can be used to calculate pre-test and post-test odds, which can provide valuable clinical information (“Likelihood ratios”).
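The arithmetic behind this is short; the sketch below uses hypothetical test characteristics and a hypothetical pre-test probability:

```python
sensitivity, specificity = 0.90, 0.93   # hypothetical test characteristics

# Likelihood ratios combine sensitivity and specificity.
lr_positive = sensitivity / (1 - specificity)   # LR+ for a positive result
lr_negative = (1 - sensitivity) / specificity   # LR- for a negative result

prevalence = 0.10                               # pre-test probability
pre_test_odds = prevalence / (1 - prevalence)

# Post-test odds after a positive result, then back to a probability.
post_test_odds = pre_test_odds * lr_positive
post_test_prob = post_test_odds / (1 + post_test_odds)

print(f"LR+ = {lr_positive:.1f}, LR- = {lr_negative:.2f}")
print(f"post-test probability after a positive result: {post_test_prob:.2f}")
```

Here a positive result raises the probability of disease from 10% to roughly 60%, which is the kind of clinical information pre- and post-test odds provide.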

Note that all statistics described above, including likelihood ratio, reveal a certain degree of error (Peat, 2011). Therefore, their 95% confidence intervals should be calculated.

Continuous and Categorical Measures

Continuously distributed information (such as blood tests) is often needed in practice as it can predict the presence of a disease. Also, in order to predict the presence or the absence of a condition, experts need cut-off values, which can indicate normal and abnormal results. The ROC curve is the most effective method to obtain such information.

Note that in order to draw a ROC curve, the first step is to calculate the sensitivity and the specificity of the measure – for different cut-off points of the variable. The bigger the area under the curve is, the more effective the test is. Also, if experts want to check if one test can differentiate between two conditions, they can plot two ROC curves on the same graph and compare them (Peat, 2011).
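As a sketch of that procedure, the code below takes invented (sensitivity, specificity) pairs at several cut-off points and approximates the area under the ROC curve with the trapezoidal rule:

```python
# Hypothetical (sensitivity, specificity) pairs at increasing cut-offs.
points = [
    (1.00, 0.00),
    (0.95, 0.40),
    (0.88, 0.65),
    (0.75, 0.85),
    (0.50, 0.95),
    (0.00, 1.00),
]

# A ROC curve plots sensitivity against (1 - specificity); the area
# under the curve (AUC) can be approximated with the trapezoidal rule.
curve = sorted((1 - spec, sens) for sens, spec in points)
auc = sum(
    (x2 - x1) * (y1 + y2) / 2
    for (x1, y1), (x2, y2) in zip(curve, curve[1:])
)
print(f"approximate AUC = {auc:.3f}")
```

An AUC near 0.5 would mean the test is no better than chance; values closer to one, as here, indicate a more effective test.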

Relative Risk, Odds Ratios and Number Needed to Treat

Measures of association are also vital in reporting the results. The relative risk (RR), odds ratio (OR) and number needed to treat (NNT) can reveal if there’s a risk of a disease in patients exposed to a certain factor or a treatment (Peat, 2011).

Relative Risk and Associations

For prospective cohort and cross-sectional studies, relative risk is the most effective statistic to present associations between exposures and outcomes. Note that exposures can be related to personal choice (e.g., drinking) or occupational and environmental risks (e.g., pollutants).

Consequently, relative risk is calculated by comparing the prevalence of an illness in the exposed and non-exposed group. Note that relative risk depends on the time needed for an illness to develop.
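The calculation itself is a simple ratio; the sketch below uses a made-up cohort:

```python
# Hypothetical cohort: disease counts in exposed and non-exposed groups.
exposed_cases, exposed_total = 30, 200
unexposed_cases, unexposed_total = 10, 200

risk_exposed = exposed_cases / exposed_total
risk_unexposed = unexposed_cases / unexposed_total

# Relative risk: risk in the exposed group relative to the non-exposed.
relative_risk = risk_exposed / risk_unexposed
print(f"RR = {relative_risk:.1f}")
```

Here the exposed group’s risk is three times that of the non-exposed group; an RR of one would mean no association.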

Odds Ratios and Case-control Studies

Odds ratios are another essential statistic. They can be employed in case-control studies, in which it is impossible to calculate the relative risk due to the sampling method. The odds ratio represents the odds of exposure in cases relative to controls. Note that in such studies, the prevalence of a disease does not represent the prevalence in the community. Nevertheless, statistical procedures like multiple regression allow experts to apply odds ratios to cohort and cross-sectional studies. When confounders occur, experts may employ adjusted odds ratios, in which the effects of confounders have been removed from the association between risks and outcomes.

Here we should mention that both statistics – relative risk and odds ratios – are hard to interpret. Due to their complexity, a statistical program is recommended for computing their 95% confidence intervals. Note that the two statistics may sometimes differ when, in practice, the absolute effect of the exposure is the same; conversely, they may be statistically the same when the absolute effect is actually different. Such differences may mislead experts and lead to type I or type II errors. Thus, odds ratios are recommended only for case-control studies and rare diseases (Peat, 2011).
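As an illustration, the sketch below computes an odds ratio from an invented case-control table, with a 95% confidence interval via the common log-based (Woolf) method:

```python
from math import exp, log, sqrt

# Hypothetical case-control table: exposure among cases and controls.
a, b = 40, 60    # cases: exposed, unexposed
c, d = 20, 80    # controls: exposed, unexposed

# Cross-product odds ratio.
odds_ratio = (a * d) / (b * c)

# Woolf's (log-based) 95% confidence interval for the odds ratio.
se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
ci_low = exp(log(odds_ratio) - 1.96 * se_log_or)
ci_high = exp(log(odds_ratio) + 1.96 * se_log_or)

print(f"OR = {odds_ratio:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

If the interval excluded one, the association would be statistically significant at the 5% level; in practice a statistical package would be used for adjusted estimates.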

Number Needed to Treat and Practice

Analysis of medical data might be tricky. While odds ratios are difficult to interpret, the number needed to treat is a statistic that is extremely beneficial in practice. The number needed to treat is defined as the estimated number of patients who need to undergo treatment for one additional subject to benefit (Peat, 2011). In other words, this is the number of subjects whom experts need to treat to prevent one bad outcome (“Number Needed to Treat”).

Note that the number needed to treat represents the clinical effect of a new treatment; it should be balanced against the costs of treatment and possible negative effects for the controls. The number needed to treat can be calculated from meta-analyses and findings from different studies, and there are formulas to convert odds ratios to a number needed to treat. For example, if the number needed to treat for a new intervention equals four (NNT = 4), experts need to treat four people to prevent one bad outcome. Unsurprisingly, a treatment that prevents one bad outcome for every four patients treated is preferable to one that does so for every ten (Peat, 2011).
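When event rates in the two groups are available, the NNT is simply the reciprocal of the absolute risk reduction; the rates below are hypothetical:

```python
from math import ceil

# Hypothetical trial: event rates in the control and treated groups.
control_event_rate = 0.40
treated_event_rate = 0.15

# Absolute risk reduction (ARR) attributable to the treatment.
absolute_risk_reduction = control_event_rate - treated_event_rate

# NNT is the reciprocal of the ARR, conventionally rounded up.
nnt = ceil(1 / absolute_risk_reduction)
print(f"NNT = {nnt}")
```

Here treating four patients prevents, on average, one additional bad outcome.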

Matched and Paired Studies

There are different study designs, and case-control studies are a popular one. Basically, in case-control studies, cases and controls are matched on confounders. Note that removing the possible effects of confounders in the study design is more effective than adjusting for them at a later stage of the study. Also, we should mention that analyses may employ paired data and methods such as conditional logistic regression.

Another crucial characteristic of matched and paired analyses is that the unit of analysis is the match or pair – not the individual subject. Pairing also affects associations, confidence intervals and odds ratios (Peat, 2011). Usually, matched analyses reduce bias and improve the precision of confidence intervals.

More than One Control and Precision

In some situations, more than one control can be used for each case. This technique improves precision, and the number of controls counts toward the effective sample size.

Let’s say we have 30 cases and 60 controls (two per case); the number of matched case-control pairs will be 60.

Logistic Regression and t-tests

When data are not matched, experts can use logistic regression and calculate adjusted odds ratios. Note that logistic regression is used when more than one independent variable determines an outcome, and the outcome is a dichotomous variable (e.g., coded as 1 (pregnant) or 0 (not pregnant)).

In addition, the differences in outcomes between cases and controls can be tested via a paired t-test, which assesses whether the mean difference between two sets of observations (each subject measured twice) is zero. Multiple regression is also beneficial.

Exact Methods

Data analysis should be based on accurate statistical methods. It’s unethical to manipulate data to obtain statistically significant results that are not clinically important. When a disease is rare, exact methods can be employed. Exact methods can also be used for small sample sizes and for small groups in stratified analyses (Peat, 2011).

The differences between normal (asymptotic) methods and exact methods are:

  • Normal methods rely on big samples
  • Normal methods utilize normally distributed data
  • The variable of interest in normal methods is not rare
  • Exact methods require more complex statistical packages

Rate of Occurrence and Prevalence Statistics

To explore rare diseases, experts may investigate the incidence, or rate of occurrence, of a disease. The incidence reveals the number of new cases within a defined group and a defined period of time. Since some diseases are rare, this number is usually expressed per 10,000 or 100,000 subjects (e.g., children less than five years old).

Prevalence, on the other hand, is estimated from the total number of cases – regarding a specific illness, a given population, and a clear time period (e.g., 10% of the population in 2017). Prevalence is affected by factors, such as the number of deaths (Peat, 2011).

Confidence Intervals and Chi-square

Confidence intervals are also paramount in exact methods. As explained above, the 95% confidence intervals are defined as the range in which experts are 95% confident that the true value lies. Usually, the exact confidence intervals are based on the Poisson distribution. This type of distribution helps researchers investigate the probability of events in a certain period.

When we need to explore the association between a disease and other factors, such as age, we can employ a contingency table, which cross-classifies the data in order to show the number of participants in each subgroup. Chi-square statistics are of great help here. Usually, Pearson’s chi-square is used for large samples (more than 1,000 subjects, with at least five expected in each cell of the table). The continuity-adjusted chi-square can be used for samples under 1,000. Last but not least, Fisher’s exact test is applicable when there are fewer than five subjects in a cell. Chi-square tests are also used to analyze subsets of information (Peat, 2011).
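A minimal sketch of Pearson’s chi-square on an invented 2×2 contingency table:

```python
# Hypothetical contingency table: disease status by age group.
table = [
    [30, 70],   # younger: cases, non-cases
    [55, 45],   # older:   cases, non-cases
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

# Pearson's chi-square: sum of (observed - expected)^2 / expected,
# where expected counts come from the row and column margins.
chi_square = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (observed - expected) ** 2 / expected

print(f"chi-square = {chi_square:.2f} with 1 degree of freedom")
```

The statistic is then compared with the chi-square distribution (here with (2−1)×(2−1) = 1 degree of freedom) to obtain a P value.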

Reporting the results is a complicated process. Data types and statistical procedures may challenge scientific findings. Nevertheless, experts should always aim for accuracy – with the sole purpose to improve patients’ well-being.


Bartlett, J., & Frost, C. (2008). Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound in Obstetrics and Gynecology, 4, 466–475.

Kendall’s Tau-b using SPSS Statistics. Retrieved from

Likelihood Ratios. Retrieved from

Number needed to treat. Retrieved from

Peat, J. (2011). Reporting the Results. Health Science Research: SAGE Publications, Ltd.

F Statistic / F Value: Simple Definition and Interpretation. Retrieved from

What is a ROC curve? Retrieved from

What is the Difference Between Repeatability and Reproducibility? (2014, June 27). Retrieved from


Reviewing the Literature

By | Health Sciences Research

Reviewing the literature is a challenging step which researchers need to take on the journey to success. A literature review is not just an empty box on a study protocol that needs to be ticked off. It’s not a boring pile of articles that need to be summarized either. Reviewing the literature can help experts understand the topic of interest and justify the need for their study.

Note that there are different types of publications. Primary literature consists of original materials in peer-reviewed journals, conference papers, and reports. Secondary literature consists of interpretations and evaluations of the primary literature, such as review articles, meta-analyses, systematic reviews, and references. Last but not least, tertiary literature is a collection of primary and secondary sources, such as textbooks, encyclopedias, and handbooks (“Types of medical literature”).


Appraising the Literature

When it comes to calculating the sample size, in general, the sample needs to be big enough to guarantee the generalizability of the results, and small enough to answer the research questions with the resources available (Peat, 2011). However, calculating the sample size is always prone to errors and is, in part, a subjective process. For example, in large samples, some outcomes may appear statistically significant while being clinically unimportant. On the other hand, small samples may reveal important clinical differences which, due to the small sample size, do not reach statistical significance.

Experts need to be familiar with such issues in order to avoid them. The problems presented above are known as oversized and undersized studies and clinical trials. When a study is oversized, type I errors may occur. A type I error is the wrong rejection of a true null hypothesis: the null hypothesis is true, but researchers reject it and accept the alternative hypothesis, which is the one explored by their team. Thus, oversized studies may waste resources and become unethical due to excessive enrollment of subjects. On the other hand, when a study is undersized, both type I and type II errors may occur. A type II error is the failure to reject a false null hypothesis: researchers fail to reject a null hypothesis that is untrue when compared with the alternative. In fact, a small sample will often lead to inadequate statistical analyses. Undersized studies may also become unethical, simply because they cannot fulfill the research goals of the study (Peat, 2011). Note that when sampling errors occur, it’s better to terminate a study than to waste resources or mislead subjects.
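To make the trade-off concrete, here is a sketch of a standard large-sample formula for the number of subjects per group when comparing two means; the SD and the clinically important difference below are hypothetical:

```python
from math import ceil
from statistics import NormalDist

def subjects_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Approximate subjects per group to detect a difference `delta`
    between two means with SD `sigma` (two-sided test); a textbook
    large-sample formula."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. 1.96 for alpha=0.05
    z_beta = NormalDist().inv_cdf(power)            # e.g. 0.84 for 80% power
    return ceil(2 * sigma ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2)

# Hypothetical example: SD of 10 units, clinically important difference
# of 5 units, 5% significance level and 80% power.
n = subjects_per_group(sigma=10, delta=5)
print(f"approximately {n} subjects per group")
```

Halving the detectable difference would quadruple the required sample, which is exactly why undersized studies so easily miss clinically important effects.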

Since the main goal in medicine is to provide clear information about medical practices and patient care, medical literature is a source of knowledge and expertise. At the same time, the load of articles is increasing rapidly, so it’s hard to track accurate and relevant literature. Thus, critical appraisal of literature can help experts distinguish quality data from flawed experiments, which can mislead and harm evidence-based practice (Umesh et al., 2016).

Therefore, apart from designing a sophisticated study design and applying accurate research methods, critical appraisal skills are crucial for success.

Critical Appraisal

Critical appraisal is defined as the application of various scientific rules to assess the validity of a study and its findings. In other words, critical appraisal is the evaluation of the scientific merit of a medical study (Peat, 2011).

The most important step during any critical appraisal is to identify if the relationship between variables is causal or due to other effects, such as confounding, bias or chance. What’s more, the following steps are paramount to decide: 1) if a medical article is a valuable source of information; 2) if the employed research methods are valid, and 3) if the literature evidence is enough in order to implement changes in clinical practice (Peat, 2011):

  • Identify goals and hypotheses
  • Identify study designs and research methods
  • Assess the criteria for inclusion, exclusion, and sample size
  • Assess sources of bias and confounding
  • Appraise statistical methods and results
  • List strengths and weaknesses and draw a conclusion

A critical appraisal can help researchers prioritize medical research and justify the need for a new study or intervention. In case experts identify poor practices with poor efficacy and effectiveness, the gaps in knowledge should be addressed. Consequently, more rigorous studies should be designed and conducted.

Note that peer review is also a valuable aspect for any successful critical appraisal.

Systematic Reviews

Systematic reviews are also fundamental in medicine. They are defined as the procedure of selecting and combining the evidence from the most rigorous studies available (Peat, 2011). Narrative reviews and editorials often offer only the literature that supports the researcher’s point; systematic reviews, in contrast, should include all relevant studies and meaningful results.

Just as with critical appraisal, a checklist is available to help experts. After articles have been selected, their results can be combined via a meta-analysis, for instance by pooling odds ratios. Note that systematic reviews are not necessarily limited to randomized controlled trials.

  • Define outcome variables and interventions of interest
  • Define search strategies and literature databases
  • Define inclusion and exclusion criteria for studies
  • Conduct a literature search
  • Have two observers review the studies and reach a consensus
  • Conduct a review
  • Get data and conduct a meta-analysis

  • Submit and publish the final review

Cochrane Collaboration

The Cochrane Collaboration is an important part of the assessment of systematic reviews. In fact, it is the gold standard for assessing evidence for healthcare and practice. Up-to-date reviews of trials can save lives. It’s not surprising that the Cochrane Collaboration helps volunteers submit reviews and promote high standards in systematic reviews and practice. This approach also aims to: address specific health problems, train experts to avoid duplication, teach efficient search strategies, and support meta-analyses (Peat, 2011).

Thus, the Cochrane Collaboration has become an international network, with centers, books, and programs all over the world. Note that reviews are incorporated into the Cochrane Database of Systematic Reviews and the Cochrane Library. There are strict guidelines, of course, to establish rules and avoid duplication. The Cochrane structure comprises review groups, methods working groups, and centers. We should mention that a review group is a network of experts interested in a topic, who provide their own funding. Anyone can approach a review group, submit a topic for approval, and register their title. Then, a protocol should be submitted; after approval, the actual review can be conducted. Consequently, the group critically appraises the review and publishes the results. An editorial team and methods working groups are designed to support the review groups. On the other hand, Cochrane centers manage and coordinate the collaboration of researchers worldwide (Peat, 2011).

Evidence-based Practice

Evidence-based practice is another important key to medical success and patients’ well-being. It is defined as patient care based on the evidence of the best studies available. In other words, evidence-based practice uses the best available evidence to deliver patient care. From assessment to clinical trials, evidence-based practice emphasizes the importance of knowledge and data, simply because inaccuracy and guesswork can put patients at risk. Thus, there are strict rules this approach must follow (Peat, 2011):

  • Define the problem and break it down into questions that can be answered
  • Select relevant studies and choose the most accurate ones
  • Appraise the evidence and focus on reliability, validity, results, generalizability, etc.
  • Make clinical decisions and implement changes
  • If any information is missing, conduct a new study
  • Evaluate the outcome of any changes that have been implemented

Note that the Cochrane Collaboration can also support evidence-based practice. In fact, journals like Evidence-Based Medicine can provide appraisals of studies and facilitate research. The evidence-based approach can benefit medical prognoses, interventions, hospitalization rates, testing procedures, and new health-related topics. At the same time, experts must consider costs and risks, and most of all, patients’ well-being.


Reviewing the literature is a challenging task which can benefit medical care. For example, critical appraisal combats information overload to improve healthcare practices (Al-Jundi et al., 2017). Although reviewing the literature is a time-consuming process, it’s worth it.


Al-Jundi, A., & Sakka, S. (2017). Critical appraisal of clinical research. Journal of Clinical and Diagnostic Research, 11(5).

Peat, J. (2011). Reviewing the literature. Health science research. SAGE Publications, Ltd.

Reviewing the literature: A short guide for research students. Retrieved from

Types of medical literature (2018, April 5). Retrieved from

Umesh, G., Karippacheril, J., & Magazine, R. (2016). Critical appraisal of published literature. Indian Journal of Anaesthesia, 60 (9), p. 670-673.

Calculating the Sample Size


Sample Size

Sample size calculation is a paramount aspect of medical research. By calculating the sample size, or the right portion of the population to be tested, researchers can ensure validity and generalizability. Consequently, this can address practical constraints, such as time delays, ethical regulations, and insufficient funding.

Can researchers be completely certain about the right sample size, though? When there’s only a portion of the general population, professionals cannot be sure whether this particular portion is a fully accurate representation of the whole population. Unfortunately, errors are not rare in research. For instance, when it comes to sample size calculations, sampling error can occur. This phenomenon is defined as the researchers’ uncertainty about the generalizability of their results. Thus, examiners should always aim to minimize errors and bias. Note that confidence intervals are often used to ensure generalizability (“Sample Size in Statistics (How to Find it): Excel, Cochran’s Formula, General Tips,” 2018).


Oversized and Undersized Studies

In general, the sample needs to be big enough to guarantee the generalizability of the results, yet small enough for the research questions to be answered with the resources available (Peat, 2011). However, calculating the sample size is always prone to errors, as explained above. In fact, calculating the sample size is a subjective process. For example, in large samples, some outcomes may appear statistically significant while being unimportant in clinical settings. On the other hand, small samples may reveal some important clinical differences which, due to the small sample size, do not reach statistical significance.

Experts need to be familiar with such issues to avoid them. In fact, the problems presented above are known as oversized and undersized studies and clinical trials. When a study is oversized, a type I error may occur. A type I error is defined as the wrongful rejection of a true null hypothesis. To be more precise, this happens when the null hypothesis is true, but researchers reject it and accept the alternate one, which is the hypothesis explored by their team. Thus, oversized studies may waste resources and become unethical due to the excessive enrollment of subjects. On the other hand, when a study is undersized, both type I and type II errors may occur. A type II error is defined as the failure to reject a false null hypothesis. In other words, researchers may fail to reject a null hypothesis that is actually false. In fact, a small sample will often lead to inadequate statistical analyses. Undersized studies may also become unethical, simply because they won’t be able to fulfill the research goals of the study (Peat, 2011). Note that when sampling errors occur, it’s better to terminate a study rather than waste resources or mislead subjects.

Power and Probability

Before calculating the sample size, researchers need to consider essential factors, such as the power and the probability of their study. The power of a study is the probability of rejecting a false null hypothesis. In other words, the power of the study reveals whether researchers can detect significant changes and reduce the probability of making a type II error.

As a matter of fact, this is a vital practical issue. In clinical settings, type I and type II error can lead to different consequences (Peat, 2011). Let’s explore a study about a new cancer treatment. The null hypothesis will be that both the new and the existing treatment are the same. Type I error will mean that the existing treatment will be rejected, and the new intervention accepted. When a new treatment is less effective and more expensive, type I error can cause further damage to patients. Type II error, on the other hand, will mean that the new treatment won’t be accepted, even though it will be more effective than the existing one. Thus, due to type II error, many patients will be denied the new treatment.
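The tradeoff between type I error, type II error, effect size, and variability can be sketched with the standard normal-approximation formula for comparing two group means. This is a generic illustration, not Peat's tables; the 5-unit difference and standard deviation of 10 below are invented figures.

```python
import math
from statistics import NormalDist

def n_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate subjects per group to detect a difference in means
    (normal approximation; published sample-size tables give similar values)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # guards against type I error
    z_beta = NormalDist().inv_cdf(power)           # guards against type II error
    return math.ceil(2 * ((z_alpha + z_beta) * sd / effect) ** 2)

# Hypothetical example: detect a 5-unit difference when the SD is 10
n_per_group(5, 10)              # 63 subjects per group at 80% power
n_per_group(5, 10, power=0.90)  # raising power to 90% needs 85 per group
```

Raising power from 80% to 90% visibly inflates the required sample, which is the practical cost of reducing the risk of a type II error.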

Calculating the Sample Size and Subgroups

Technological advancements support research and medicine. Nevertheless, although many computer programs can help researchers calculate the sample size, the best way to perform calculations is to use a table. Tables are clear, accurate, and simple. As a matter of fact, using a table can be extremely helpful for chi-square tests and McNemar’s tests. Note that McNemar’s test is used mainly for paired nominal data in 2×2 tables (e.g., smokers and non-smokers).
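As a minimal sketch of the statistic behind those tables, McNemar's test depends only on the discordant pairs of a paired 2×2 table; the counts below are invented for illustration.

```python
def mcnemar_chi2(b, c):
    """Continuity-corrected McNemar chi-square for paired nominal data.
    b and c are the two discordant-pair counts; concordant pairs
    (where both paired measurements agree) do not enter the statistic."""
    return (abs(b - c) - 1) ** 2 / (b + c)

# Hypothetical discordant counts: 20 pairs changed one way, 10 the other
mcnemar_chi2(20, 10)  # 2.7, below the 3.84 critical value (1 df, p = 0.05)
```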

Perhaps one of the major factors that experts need to consider is the type of the measured variables. For instance, for categorical values, such as gender (e.g., male and female), the sample size should be doubled in order to provide clinically and statistically powerful results.

Confidence Intervals and Prevalence

Another important aspect when calculating the sample size is determining the confidence interval and prevalence. As explained above, choosing a sample helps experts find a mean that represents the population of interest, without testing the whole population or wasting resources. However, it’s uncertain whether the sample truly represents the population. Therefore, experts need to determine a confidence interval: the range within which the true population parameter is likely to lie. Confidence intervals are based on a confidence level, which is often 95%. That means that, hypothetically, if the sampling procedure were repeated within the same population, 95% of the intervals so constructed would contain the true parameter of interest.
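The Cochran's formula mentioned in the cited source makes this concrete: it gives the sample size needed to estimate a prevalence within a chosen margin at a given confidence level. The 10% prevalence and ±5 percentage-point margin below are illustrative assumptions.

```python
import math
from statistics import NormalDist

def n_for_prevalence(p, margin, confidence=0.95):
    """Cochran's formula: n = z^2 * p * (1 - p) / margin^2."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # 1.96 for 95%
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Estimating a ~10% prevalence to within +/- 5 percentage points
n_for_prevalence(0.10, 0.05)  # 139 subjects
```

Note that the worst case is p = 0.5, which yields the familiar figure of 385 subjects for a ±5% margin at 95% confidence.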

Estimating prevalence is also crucial. Prevalence is defined as the burden of a disease in a population (Ward, 2013). For instance, researchers may need to consider the prevalence of Alzheimer’s in the community (Beckett et al., 1992). Thus, research plans need to ensure an adequate number of cases and controls. Describing the process of calculating the sample size is also an important aspect of research (Peat, 2011). Documentation is crucial.

Rare Events

Considering prevalence is a complicated process. This task is even more complicated for rare events. For example, in a new surgical trial, serious adverse effects may not appear at all. However, that does not mean that there are no risks.

In rare events, to calculate the sample size, the upper limit of risk should be agreed upon in advance. The “rule of three” can help: if no events are observed among n subjects, the upper 95% confidence limit of the true risk is approximately 3/n. Suppose the agreed upper limit of risk is one in ten, or 10%. Setting 3/n equal to 0.1 and solving for n gives 3 divided by 0.1, which makes 30 subjects. So, 30 event-free subjects will be required to help experts conclude that the risk is unlikely to exceed 10%.
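The rule-of-three calculation inverts cleanly: for an agreed upper risk limit p, the required number of event-free subjects is roughly 3/p. A one-line sketch:

```python
import math

def rule_of_three_n(upper_risk):
    """Subjects needed (all event-free) to bound the true risk of a rare
    event below upper_risk with ~95% confidence: n = 3 / upper_risk."""
    return math.ceil(3 / upper_risk)

rule_of_three_n(0.10)  # 30 subjects for an upper risk limit of 10%
rule_of_three_n(0.01)  # 300 subjects for a 1% limit
```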

Effect of Compliance

There are many factors researchers need to consider. Practical limitations, such as compliance with intervention, often become an obstacle to clinical trials. In general, if non-compliance is high, the size of the intervention groups needs to be doubled (Peat, 2011).

Although tables are beneficial, run-in periods are another technique which can help experts ensure compliance. To be more precise, eliminating non-compliant subjects during the run-in phase, before randomization, can help experts maintain the power of the study, especially in studies that measure efficacy. Nevertheless, in studies that measure effectiveness, this approach may reduce generalizability. In the end, goals and practice need to be balanced.
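One common way to quantify the doubling rule of thumb (an illustration, not a formula given in the source): non-compliance dilutes the observed treatment effect by roughly (1 − r), so each group is inflated by 1/(1 − r)². The 30% non-compliance rate below is an invented example.

```python
import math

def adjust_for_noncompliance(n_per_group, noncompliance_rate):
    """Inflate a calculated group size for an anticipated proportion of
    non-compliers, assuming the effect is diluted by (1 - rate)."""
    return math.ceil(n_per_group / (1 - noncompliance_rate) ** 2)

# With ~30% non-compliance the group size roughly doubles,
# matching the rule of thumb cited in the text
adjust_for_noncompliance(100, 0.30)  # 205
```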

Calculating the Sample Size: Continuous Outcome Variables and Non-parametric Outcome Measurements

Medical research is a complicated process. Deciding on the primary outcome variable is crucial. However, as stated earlier, all factors that affect the topic of interest should be considered. Tables should always be consulted. They can help experts calculate the sample size for continuous outcome variables, for both paired and unpaired variables (Peat, 2011). Note that the effect size is defined as the smallest significant difference between groups. Also, the sample size depends on the standard deviation. Standard deviation is defined as the measure of variability in the collected data. Let’s say that experts need to assess subjects that weigh between 45 kg and 100 kg (Kadam & Bhalerao, 2010). This is a large variability, so a large sample will be needed.

However, if the variables are not normally distributed or are non-parametric (for instance, when there are more than two categories, or when data are collected via Borg and Likert scales), the standard deviation cannot be calculated. Again, describing the goals and the statistical procedures employed is vital.

Cases and Controls: Increasing the Power of the Study

Another research method is to balance the number of cases and controls. In rare events, populations, and diseases, the power of the study can be improved by enrolling more controls; for instance, more than two controls for each case. This is extremely helpful for testing the efficacy of new or expensive interventions (Peat, 2011).

Note that trade-off should also be considered. In simple words, the trade-off effect is defined as the decision of losing one quality to gain another.

Odds Ratio and Relative Risk

The odds ratio, or the measure of association between an exposure and an outcome, is also an important consideration (Szumilas, 2010). The odds ratio (OR) can reveal whether an outcome occurs as a result of the exposure of interest. In other words, the odds ratio can reveal whether a particular exposure is a risk factor for a disease. When the odds ratio equals one, the exposure does not affect the odds of the outcome; when it is greater than one, the odds of the outcome are higher among the exposed. Interestingly, logistic regression and confidence intervals can also be employed to determine the odds ratio (Peat, 2011).
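A short sketch of how an odds ratio and its confidence interval are computed from a 2×2 table; the counts are invented, and the log-based interval is the standard Woolf method rather than anything specific to the cited sources.

```python
import math
from statistics import NormalDist

def odds_ratio_ci(a, b, c, d, confidence=0.95):
    """Odds ratio for a 2x2 table with its Woolf (log-based) interval.
    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper

# Hypothetical table: 20 exposed cases, 80 exposed controls,
# 10 unexposed cases, 90 unexposed controls
or_, lo, hi = odds_ratio_ci(20, 80, 10, 90)  # OR = 2.25
# if the interval includes 1, the association is not significant
```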

We should mention that when stratification based on confounders or matched case-control designs are employed, overmatching is not a good approach, as it can reduce the efficiency of the study.

Correlation Coefficients

Correlation coefficients are also helpful to calculate the sample size. Again, tables are vital. In general, correlations show how strong the relationship between the two variables is. Usually, in linear regression analysis, Pearson’s correlation is used as a valuable indicator.

Note that the correlation coefficient should differ from zero for the association to be significant (Peat, 2011). The p-value describes the level of significance; usually, p<0.05 is accepted as significant. That means that the probability of observing the changes due to chance (not the intervention) is 5%. As a matter of fact, as explained earlier, statistically significant associations do not always indicate clinically important differences.
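For planning purposes, the sample size needed to show that a correlation differs from zero can be approximated via Fisher's z-transformation; this is a generic sketch, with r = 0.3 as an invented example rather than a value from the source.

```python
import math
from statistics import NormalDist

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Subjects needed to detect correlation r as different from zero.
    Fisher z approximation: n = ((z_a + z_b) / C)^2 + 3,
    where C = 0.5 * ln((1 + r) / (1 - r))."""
    z = NormalDist().inv_cdf
    c = 0.5 * math.log((1 + r) / (1 - r))
    return math.ceil(((z(1 - alpha / 2) + z(power)) / c) ** 2 + 3)

n_for_correlation(0.3)  # 85 subjects
n_for_correlation(0.5)  # 30 subjects: stronger correlations need fewer
```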

Repeatability and Agreement

When calculating the sample size, no matter what variables or procedures have been employed, repeatability and agreement are two factors that should be considered.

To ensure repeatability, experts can increase the sample size. For studies with insufficient subjects, the number of measurements taken from each subject can be increased. Usually, a sample of 30 is the minimum. On the other hand, to ensure agreement between two continuously distributed measurements, a sample of 100 is acceptable (Peat, 2011). Of course, more subjects are needed for categorical data.

Analysis of Variance and Multivariate Analysis of Variance

Analysis of variance (ANOVA) and multivariate analysis of variance (MANOVA) are two popular statistical procedures that play a significant role in the calculation of the sample size. ANOVA can be used to test the differences between the mean values of various groups, e.g., different treatments. While ANOVA is used to test only one dependent variable at a time, MANOVA can be used for several dependent variables at the same time. When it comes to results, note that an effect size of 0.1-0.2 is considered to be small, 0.25-0.4 medium, and 0.5 large.
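The effect-size convention quoted above corresponds to Cohen's f, the spread of the group means relative to the within-group standard deviation. A minimal sketch, with invented group means and SD:

```python
import math

def cohens_f(group_means, within_sd):
    """Cohen's f for ANOVA: SD of the group means (around the grand mean)
    divided by the pooled within-group SD.
    Roughly: 0.10 small, 0.25 medium, 0.40 large."""
    k = len(group_means)
    grand = sum(group_means) / k
    between_sd = math.sqrt(sum((m - grand) ** 2 for m in group_means) / k)
    return between_sd / within_sd

# Three hypothetical treatment groups with means 10, 12, 14 and SD 8
round(cohens_f([10, 12, 14], 8), 3)  # 0.204, a small-to-medium effect
```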

For MANOVA, an ad hoc method is needed. Usually, when researchers use ad hoc tests, that means that the method employed works only for the specific purpose it was designed for. In fact, ad hoc means “for a particular purpose only” (“Ad Hoc Analysis and Testing,” 2015).

Survival Analyses

The time to event or survival time can vary between days and years.

Since the number of deaths is the focus of interest, experts can either increase the number of subjects or the length of the follow-up period. As explained above, describing the sample size calculations is vital, including factors such as the level of significance, the power of the study, the expected effect size, and the standard deviation in the population.

Interim Analysis

Clinical trials are incredibly complex and involve numerous ethical issues. Therefore, experts can conduct a statistical analysis before the recruitment of participants has been completed. This method is known as interim analysis (Peat, 2011).

Interim analyses can be employed to help experts decide whether or not to continue a clinical trial. This can prevent failures at later stages and cut costs. Also, interim analyses can be used to check and reassess the planned sample size and recruitment process (Kumar & Chakraborty, 2016). Nevertheless, the number of interim analyses should be limited and decided prior to the actual study. What’s more, since such analyses must be unbiased, an independent monitoring committee can be asked to perform them.

Internal Pilot Studies

Internal pilot studies can be performed to calculate the sample size as well. Such studies involve the first patients enrolled. By analyzing the results obtained from the first subjects, experts can calculate the variance in the reference group and recalculate the sample size. Note that experts need to be blinded to the results. Also, it’s important to understand that these results should not be used separately to test the study hypothesis; they should be included in the final analysis (Peat, 2011). By recalculating the sample size, the power and the efficiency of the study increase, which saves a lot of effort and resources. Depending on the study goals, a preliminary analysis can be done with 20 subjects for a study with a total of 40 participants, or with 100 subjects for a study of 1,000 participants.

Also, professionals should differentiate classical pilot studies from internal pilot studies. Usually, pilot studies are conducted before the actual study to test if the recruitment procedure and the equipment employed are effective. While results obtained from a classical pilot study are not included in the analysis, results from internal pilot studies are used in the final analysis.

Safety Analysis

Clinical trials aim to improve medicine and find the best treatment. However, new medications may have various unknown side effects. To tackle the problem of possible adverse effects, safety analysis is paramount (Peat, 2011).

Usually, once subjects have been recruited, experts need to perform a safety analysis, with the results being interpreted by an external monitoring committee. In the end, risk-benefit assessment is crucial in medicine.


Stopping a Study

Equipoise is another principle that is vital and in favor of patients’ well-being. Equipoise refers to genuine uncertainty in the minds of the researchers about the clinical merits of an intervention; in fact, clinical equipoise has been proposed as a justification for randomization (Hey et al., 2017). Ethical considerations should always come first, and patients who enroll should not worry about receiving inferior treatment or insufficient care.

Since preliminary analyses can reveal valuable results, experts may decide to stop a study. This decision can be based on both statistical and ethical grounds. For instance, if an interim analysis shows toxicity of a new treatment, researchers will not recruit any more subjects; although a larger sample would be needed to answer all questions about efficacy, subjects cannot be exposed to such risks. Apart from possible risks, clinical trials can be stopped prematurely if obvious differences are revealed or, respectively, if non-significant results are obtained early.

However, stopping a study should follow established rules. Sometimes, by continuing, adjusting for confounders, and using subgroups, further analyses can reveal potential benefits. Stopping a study is a delicate topic, and an external committee needs to be consulted. In fact, some studies have been stopped prematurely without a real reason behind it. Thus, clear rules should be established. In general, to avoid a false positive result, the decision to stop a study should be based on a high level of significance, i.e., small p-values.

In conclusion, calculating the sample size is a complex process. In the end, patients are not only numbers but human beings.


Ad Hoc Analysis and Testing (2015, September, 27). Retrieved from

Beckett, L., Scherr, P., & Evands, D.  (1992). Population prevalence estimates from complex samples. Journal of Clinical Epidemiology, 45(4), p. 393-402.

Hey, S., London, A., Weijer, C., Rid, A., & Miller, F. (2017). Is the concept of clinical equipoise still relevant to research? BMJ.

Kadam, P., & Bhalerao, S. (2010). Sample size calculation. International Journal of Ayurveda Research, 1(1), p. 55-57.

Kumar, A., & Chakraborty, B. (2016). Interim analysis: A rational approach of decision making in clinical trial. Journal of Advanced Pharmaceutical Technology & Research, 7(4), p. 118-122.

Peat, J. (2011). Calculating the Sample Size. Health Science Research, SAGE Publications, Ltd.

Sample Size in Statistics (How to Find it): Excel, Cochran’s Formula, General Tips (2018, January 15). Retrieved from

Szumilas, M. (2010).  Explaining Odds Ratios. Journal of the Canadian Academy of Child and Adolescent Psychiatry, 19(3), p. 227–229.

Ward, M. (2013). Estimating Disease Prevalence and Incidence using Administrative Data. The Journal of Rheumatology, 40(8), p. 1241–1243.

Appraising Research Protocols


Research ideas are the core of any scientific progress. Nevertheless, planning is a fundamental step towards success within medical settings. Only a sophisticated study design can support experts during their research endeavors. Creating explicit hypotheses, employing adequate research methods, and computing statistical procedures are vital, simply because precision can lead to meaningful results, overcome bias, and improve patients’ well-being.

That said, even the most fascinating research ideas can fail due to poor management practices and insufficient funding. To attract funding, studies must be elegant, clear, and justified (Peat, 2011). A study protocol which explains the aim of the experiment and the main research steps is a must. The following considerations are fundamental:

  • When conducting a new study, experts should focus on areas which lack adequate data and evidence
  • Hypotheses should represent clear research questions and ideas
  • Study designs should answer the research questions and represent the main goal of the study
  • Methods and statistical procedures should be adequate to minimize errors and bias
  • Researchers should prepare a protocol and obtain ethical approval
  • Experts should estimate budget and find appropriate funding bodies

The most important aspect is to ensure that the intended study aspires for better healthcare and improved patient outcomes.


Core Checklist

Aims and hypotheses: Research ideas should be clear, and hypotheses should be testable. The very first section of the protocol must be elegant as it sets the direction of the whole document. When it comes to hypotheses, three is the maximum number to ensure clarity. Note that hypotheses which are complex, or which contain multiple causes, reflect unclear thinking. On top of that, hypotheses should differ for experimental designs and descriptive studies (Peat, 2011). Experts can employ a null hypothesis, which states there aren’t any significant differences between variables, or a priori/alternate hypothesis, which states the direction of the relationship between variables. Hypotheses can be numbered to ensure readability. In the end, the significance of the study (e.g., prevention) should be explained.

Background: The background section should be intriguing and sophisticated, just like the introduction of a journal article. As a matter of fact, this section should sell the study. Most of all, experts must explain in the protocol what the study will achieve and why, with an emphasis on any new information the study may acquire. Stating personal experience in the field is also recommended. Note that topic sentences can be used to clarify short paragraphs in order to foster readability.

Research methods: Research methods should match the aims of the study, and the process should follow certain time periods. Aspects, such as recruitment, sample size, generalizability, and confounders, should be described in detail. The inclusion and exclusion criteria should be listed, which may foster repeatability. Interim analyses, stopping rules, and methods to control for confounders are also vital.

Conducting the study: Another important aspect, which tackles all details of data collection, location, observers, training, documentation, and statistics. Research steps should fulfill the aim of the study and answer the stated research questions.

Statistical procedures: Data should be collected to answer specific research questions. It’s unethical to adjust data only to get statistically powerful results. Data types, variables, and missing data should be considered and documented. The interpretation of data should also be explained in the study protocol.

Budget and staff requirements: Costs should be precise, and various research aspects, such as training, should be considered. Requests and budget should be justified – a factor which can be supported by the actual significance of the study.

Methodological Studies

Often, to conduct a meaningful study, researchers have to develop a new testing tool. To establish the validity of new questionnaires and equipment and to ensure repeatability, methodological studies should be conducted. Note that many measures lack validity and lead to errors and bias. Thus, experts should not rely only on established and convenient instruments. Methodological studies focus on such issues and help researchers access high-quality data. In addition to the main checklist presented above, the following specific requirements should be considered in any methodological study protocol:

  • Study objectives: The protocol should focus on repeatability, sensitivity, specificity, agreement, and responsiveness.
  • Methods: Risks and pilot studies should be considered. The development of surveys and a timetable for data collection should be explained in detail.
  • Reducing bias: This is another important aspect that the study protocol should consider. From blinding procedures to randomization practices – methods should be clear and easy to understand.
  • Statistical methods: All statistical steps should be presented. This includes how the results of each data analysis will be used to fulfill the purposes of the study.

Clinical Studies

In clinical settings, studies that explore the effects of a new treatment are paramount. Experimental clinical studies aim to answer questions related to efficacy and effectiveness. Thus, such studies require high levels of precision. Note that in clinical trials and case-control studies, the selection criteria and allocation process play a crucial role in the generalizability of the results (Peat, 2011). Below are some additional steps to the core checklist, which clinical studies and protocols should follow:

  • Study design: All characteristics and matching criteria should be presented.
  • Treatment or intervention: Aspects, such as placebo and compliance, should be stated in the study protocol.
  • Methods: Sampling methods, recruitment procedures, outcome variables, measurements, pilot studies, and feedback to subjects need to be included in the protocol. Note that the validity of the methods is fundamental.
  • Reducing bias and confounders: All factors that may lead to bias and errors should be reported, including procedures to reduce bias and eliminate confounders.
  • Statistical methods: Here experts should consider if statistically important differences mean clinically important variations. The type of analysis also matters (e.g., intention to treat). Threshold or dose-response effects should be considered.

Epidemiological Studies

The relationship between exposures and outcomes is paramount in healthcare. Thus, epidemiological studies are vital. They are used to measure incidence and prevalence and to assess the effects of all environmental interventions. Here, the sample size is essential as many studies compare populations or subgroups over time (Peat, 2011). Precision is needed to ensure generalizability. When it comes to epidemiological studies, in addition to the main checklist, experts should consider the following specifics:

  • Study design: Experts should describe if the study is ecological, cross-sectional, cohort study, or intervention.
  • Recruitment: The selection criteria and sampling methods should be discussed.
  • Methods: Since medical terms vary, the actual definition used to describe the disease of interest should be stated in a clear manner.
  • Measurements: For any exposure and outcome measures aspects, such as repeatability, validity, and applicability, are crucial.
  • Reducing bias: The study protocol should explain how experts will increase response rates and improve follow-up procedures.
  • Statistical methods: Each method should be explained in detail, including any implications for possible causations.


While ambitious ideas make the world of science spin, funding is the main aspect that can bring innovations into practice. Obtaining funding is a challenging task, though. In order to get funding, research projects should be clear and clinically beneficial. In addition, application protocols should be beautifully prepared and presented (Peat, 2011).

Preparing a good application requires lots of patience, teamwork, and persistence. Paperwork can be burdensome but vital. Having a reliable team can help researchers prepare a grant application. Front-and-back pages (e.g., budget, bibliographies, ethics, signatures) should not be underestimated, and deadlines should not be taken lightly. Note that peer review and editing can be time-consuming.

When it comes to peer review, internal and external peer review can only benefit a grant application. The best option is to consult people who have been involved in research and grantsmanship. On top of that, the application should be presented to people who are not experts. If an application can be understood by people who are not familiar with the research topic, then the application will be easily understood by the people involved in the actual granting process. Although it might be frustrating to receive negative feedback and rewrite time-consuming aspects, always listen to your peers. Yet, try to differentiate useful advice from unscientific and personal comments (Peat, 2011).

Most of all, the study should be beautifully presented. The hypotheses should be clear, the aim of the study should be relevant to clinical practices, and the application should be organized and visually appealing. In fact, good presentation is recommended not only to receive funding but to contribute to the reputation of the research team.

It’s not only about novel ideas and good science. The committee panel receives numerous applications, so papers that are beautifully arranged have a better chance to succeed. There must be a logical flow, charts, and timelines. A topic sentence at the beginning of each paragraph can support readability. Large font, simple language, and sufficient white space are recommended.

Granting Process

The committee panel may consist of people who are not familiar with the research topic. As a matter of fact, only a small number of readers, such as the content experts and the spokesperson, will read the application in detail. They are the ones who’ll influence the decision of the whole granting committee. Thus, any research limitations should be tackled, pilot data presented, and budget justified.

Note that budget is often limited. Thus, justifying the budget (e.g., training, equipment, rewards) is vital. A budget made up of rounded figures suggests guesswork and a lack of precision. On top of that, when it comes to more expensive tools and staff, the budget should be explained precisely (Peat, 2011).

In the end, conducting a study is one of the most rewarding events for any research team or educational institution. Just like having an article published in a scientific journal, obtaining a funding grant is one of the best rewards for medical experts, because it shows that the grant application has incorporated the best science available, with the sole purpose of improving medicine and patients’ well-being.

Research Ethics

Apart from funding, research ethics are another paramount factor which researchers must consider. Ethical principles also play a crucial role in the funding process. Research ethics come before any other interests; therefore, they are clearly defined by governments and committees:

  • Medical studies should be approved by ethics committees
  • Research staff should be professional and motivated
  • The aims of the study should justify any potential risk for the subjects
  • Participants should be able to withdraw freely without a risk to their health
  • If a treatment proves to be harmful, research should be terminated
  • Participants should be informed, and consent sought
  • The well-being and feelings of subjects should always come first

Note that medical studies which include vulnerable people, unconscious patients, and children add some additional challenges, which experts should consider (Peat, 2011).

Last but not least, when it comes to recruitment, consent is essential. It’s unethical to recruit family members or people who cannot refuse to participate. On top of that, reimbursement should not be the only motive for subjects to participate.

Ethics Committees

To ensure good research ethics, any medical study should be approved by an appropriate ethics committee (Peat, 2011). A committee may consist of ministers of religion, lawyers, clinicians, etc. The committee must ensure that:

  • Patients are informed
  • Consent is obtained
  • Any possible risks are justified
  • Unethical research is prevented

Note that studies are ethical only when researchers are uncertain which treatment is more beneficial for patients. On the other hand, unethical situations may include:

  • Conducting studies that have no practical implications
  • Starting a new study without considering previous data and findings
  • Not following the study protocol
  • Conducting a study without a control group or exposing the control group to placebo (instead of standard treatment)
  • Testing children or vulnerable people when questions can be answered by adults
  • Including measures not approved by the ethics committee
  • Enrolling subjects only to get statistically powerful results
  • Stopping a study inadequately
  • Failing to analyze data and failing to report results

To sum up, planning and conducting medical studies, appraising research protocols and applying for funding, following research ethics and improving patients’ well-being are only a few of the most challenging aspects of research. Nevertheless, science is rewarding.


Peat, J. (2011). Appraising research protocols. Health Science Research. SAGE Publications, Ltd.

Medical Data: Planning the Analysis and Choosing the Right Statistical Methods

By | Health Sciences Research

Medical Data: The Core of Research

Analyzing data is one of the most important and exciting parts of any health-related study or clinical trial. In the end, only data can answer all research questions and hypotheses. Just like with any other aspect of research, analysis of data should undergo some careful consideration. It’s not surprising that good documentation is a must and every step of the data analysis should be recorded. In fact, good data management practices should provide excellent records and visual representations. Note that using a logbook is a good technique to support data analysis and documentation practices.

Most of all, data should be objective, clear, and truthful (Peat, 2011). Let’s not forget that it’s unethical to adjust information and datasets only to get statistically significant results, which are not clinically important. In the end, patients’ well-being and quality of life come before numbers and reports.


Data Analysis: Planning the Analysis

Before researchers start with the data analysis, there are a few recommendations that should be considered. The first step experts must take is to perform a univariate analysis. Only after that, bivariate and multivariate analyses can be conducted. This gives researchers a chance to analyze each variable in detail (Peat, 2011). It also helps data analysts get some meaningful insights into aspects, such as the type of variables, the range of values, possible errors, and skewness.

It’s important to mention that when data is not normally distributed, experts should employ non-parametric statistics or transform the data. Again, all steps must be entered into the logbook of the study.
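As a sketch of that check, the snippet below computes a simple skewness statistic for a hypothetical right-skewed set of measurements and shows how a log transformation pulls the distribution toward symmetry (the data and the `skewness` helper are illustrative, not from the text):

```python
import math
import statistics

def skewness(values):
    """Population skewness g1 = m3 / m2**1.5; positive means right skew."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed measurements (e.g., hospital stays, in days)
stays = [1, 2, 2, 3, 3, 3, 4, 5, 7, 10, 15, 30]
logged = [math.log(x) for x in stays]   # log-transform toward symmetry
print(round(skewness(stays), 2), round(skewness(logged), 2))
```

A clearly positive skewness suggests either transforming the data, as above, or switching to non-parametric methods.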

Dealing with Missing Data

Before proceeding with the analysis, researchers must have clear ideas and methods to deal with missing data. Missing data when random may affect the statistical power of the study but not the actual results. For instance, it’s common for participants to skip an item by accident, especially when the visual presentation of the questionnaire is not clear (Peat, 2011). Therefore, surveys and apps should be simple, clear, and user-friendly. If possible, experts can contact those subjects and ask them to clarify any missing information.

When this tendency is non-random, though, missing data may affect the generalizability of the study. As a matter of fact, people often avoid revealing information about their economic status, which may affect the results. A helpful tip is to include missing data in the prevalence rates but not in the bivariate or multivariate analyses. For continuous data, on the other hand, a mean value can be computed. Note that in this case, using the mean of the total sample is considered a conservative approach, so it’s often better to use subgroup mean values.
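A minimal sketch of subgroup mean imputation for continuous data might look as follows (the records, group labels, and values are hypothetical, chosen only to illustrate the idea):

```python
import statistics

# Hypothetical records: (age_group, income); None marks a missing value
records = [("young", 30), ("young", 34), ("young", None),
           ("older", 52), ("older", 48), ("older", None)]

# Subgroup means, computed from the observed values only
means = {}
for group in {g for g, _ in records}:
    observed = [v for g, v in records if g == group and v is not None]
    means[group] = statistics.fmean(observed)

# Replace each missing value with its subgroup mean, not the overall mean
imputed = [(g, v if v is not None else means[g]) for g, v in records]
print(imputed)
```

Each missing "young" value becomes the mean of the observed "young" values, which is typically closer to the truth than the mean of the whole sample.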

Analyzing Outliers

Outliers are values that lie far outside the rest of the data. Some are genuine extreme observations, while others are incorrect entries or anomalies. When the research sample is small, outliers can lead to type I and type II errors. That means that if outliers are included in the analysis, generalizability may be affected.

To deal with abnormalities, researchers can simply delete any outlying values (Peat, 2011). Another approach is to recode the outliers and replace them with values which are closer to the mean. All steps should be documented.
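The recoding approach can be sketched as follows: values beyond a chosen distance from the mean are pulled back to that limit, and every change is logged for documentation. The cut-off of two standard deviations and the sample data are illustrative assumptions, not prescriptions:

```python
import statistics

def recode_outliers(values, z=2.0):
    """Pull any value beyond mean +/- z*SD back to that limit, and
    return both the recoded data and a log of the changes made."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    low, high = mean - z * sd, mean + z * sd
    recoded, changes = [], []
    for v in values:
        new = min(max(v, low), high)
        if new != v:
            changes.append((v, round(new, 1)))  # document each recode
        recoded.append(new)
    return recoded, changes

# Hypothetical sample with one extreme value
data = [4, 5, 5, 6, 6, 7, 7, 8, 50]
recoded, changes = recode_outliers(data)
print(changes)
```

The `changes` list is exactly the kind of record that belongs in the study logbook.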

Categorizing Variables

Defining and categorizing variables are also crucial aspects of the data analysis. Not surprisingly, before any bivariate or multivariate analyses, all variables should be categorized.

Note that outcome variables are the dependent variables, which can be placed on the y-axis. Intervening variables, such as secondary and alternative outcomes, also go on the y-axis. On the other hand, explanatory variables, called independent variables, risk factors, exposures, and predictors should be plotted on the x-axis (Peat, 2011).

Data Documentation & Research

Data documentation is paramount. As explained above, all steps of the data analysis – from recoding outliers to performing univariate analyses – should be documented in a data management file (Peat, 2011). Aspects, such as structure, location, coding, and missing values, must be documented and kept safely.

Although digital solutions support clinical trials, print-outs should also be stored and secured. Files should be kept together and labeled accordingly. In the end, documentation should ensure transparency and reproducibility.

Interim Analyses

Dealing with data is tricky. Interim analyses are crucial in research as they can support good ethical principles and management practices. However, experts should try to minimize the number of interim analyses. The most desirable option is to have the dataset completed before the actual start of the analysis.

Note that medical data is sensitive. Therefore, it should be used only for the purposes it was collected for and for the hypotheses which were formulated prior to the study. Otherwise, data can be misused, a phenomenon known as data dredging or a fishing expedition (Peat, 2011). Data dredging is the practice of trawling through large datasets for relationships that appear significant but arise purely by chance. In fact, cross-sectional and case-control studies are sometimes prone to such practices.
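A small simulation illustrates why data dredging is dangerous: if 100 random "exposures" with no true relationship to a disease are each tested at the 5% significance level, a handful will appear "significant" by chance alone. All data here are simulated, and the helper is a plain Pearson chi-square statistic for a 2x2 table:

```python
import random

random.seed(1)

def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# 200 simulated subjects; disease status has NO relation to any exposure
disease = [random.random() < 0.3 for _ in range(200)]
false_hits = 0
for _ in range(100):                      # dredge through 100 random exposures
    exposed = [random.random() < 0.5 for _ in range(200)]
    a = sum(e and s for e, s in zip(exposed, disease))
    b = sum(e and not s for e, s in zip(exposed, disease))
    c = sum(not e and s for e, s in zip(exposed, disease))
    d = sum(not e and not s for e, s in zip(exposed, disease))
    if chi2_2x2(a, b, c, d) > 3.84:       # "significant" at the 5% level
        false_hits += 1
print(false_hits)   # roughly 5 spurious "findings" expected by chance
```

This is why hypotheses should be formulated before the data are analyzed, not fished out of them afterwards.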

Still, some datasets can be explored a step further. If a new study has an appropriate study design, an existing dataset can be used more than once to explore new relationships. With high-quality data, results that were not anticipated need not undermine the study. Note that research hypotheses should have biological plausibility, meaning that a plausible biological mechanism could explain a cause-and-effect relationship between a factor and a disease.

Data Analysis: The Methods

After all the corrections have been made, researchers can start with the actual statistical analysis. We should mention that the results represent the sample, which supposedly represents the population. However, in practice there are numerous differences between samples and populations, so results may vary each time the random sampling is repeated. This phenomenon is known as sampling variability, which researchers try to quantify and minimize.
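Sampling variability is easy to demonstrate with a simulation: repeated random samples drawn from the same simulated population produce different sample means, even though the population never changes (the population parameters below are invented purely for illustration):

```python
import random
import statistics

random.seed(42)

# Simulated "population" of 10,000 systolic blood pressures (mmHg)
population = [random.gauss(120, 15) for _ in range(10_000)]

# Each random sample of 50 subjects yields a slightly different mean
sample_means = [
    statistics.fmean(random.sample(population, 50)) for _ in range(200)
]
print(round(min(sample_means), 1), round(max(sample_means), 1))
```

The sample means scatter around the population mean; their spread is what the standard error of the mean quantifies.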

Univariate Methods

Univariate methods can help experts explore each variable. The frequencies of categorical data and the distribution of continuous variables should be calculated in order to gain meaningful results. There are tables which can help researchers decide if categorical data and groups should be combined or if procedures like chi-square (e.g., Pearson’s chi-square, Fisher’s exact test, etc.) should be conducted. Note that for categorical data with ordered categories, non-parametric statistics can be employed.

Since categories with small numbers can affect the results significantly, groups can be combined. This can be done by analyzing the distribution of each variable (Peat, 2011), which, as mentioned above, should happen before the start of any bivariate or multivariate analyses.
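As an illustration of the chi-square procedures mentioned above, the Pearson statistic for a 2x2 table can be computed directly, together with the usual rule of thumb that every expected cell count should be at least five (the table counts are hypothetical):

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    # Rule of thumb: every expected cell count should be at least 5;
    # otherwise, combine categories or use Fisher's exact test
    expected_ok = all(
        row * col / n >= 5
        for row in (a + b, c + d)
        for col in (a + c, b + d)
    )
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, expected_ok

# Hypothetical counts: disease yes/no among exposed (30/70) and unexposed (10/90)
stat, ok = chi_square_2x2(30, 70, 10, 90)
print(stat, ok)   # → 12.5 True; compare against the 1-df critical value 3.84
```

When `expected_ok` is false, combining sparse categories (as the text recommends) or switching to Fisher's exact test is the safer route.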

Continuous Data

Although the analysis of categorical data is pretty straightforward, continuous and discrete data also offer numerous insights. For continuous data, experts should check if figures are normally distributed or skewed. If not, transformation of the data or non-parametric methods should be considered (Peat, 2011). Yet, note that parametric methods provide greater statistical power for the same sample size when compared to non-parametric methods.

Apart from following the pathway above, experts should calculate basic summary statistics, such as the distribution, mean, standard deviation, median, and range of each variable (Peat, 2011). Note that the mean is defined as the average value of the data. The median is the central point of the data: half of the measures lie below it and half above it. The range, on the other hand, is the measure of spread from the lowest to the highest value.

When it comes to continuous data, experts should understand that the median and the mean are identical when the distribution is normal, and different when it is skewed. If a dataset is skewed to the right, the mean will be an over-estimate of the median value; if it is skewed to the left, the mean will be an under-estimate of the median value. Note that when the mean and the median alone are not informative, other summaries can be reported (often by calculating the 95% range of the values).
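The over-estimation under right skew is easy to verify on a small hypothetical dataset, where a few large values pull the mean well above the median:

```python
import statistics

# Hypothetical right-skewed data: a few large values pull the mean upwards
right_skewed = [2, 3, 3, 4, 4, 5, 6, 9, 15, 40]
mean = statistics.fmean(right_skewed)
median = statistics.median(right_skewed)
print(mean, median)   # → 9.1 4.5: the mean over-estimates the median
```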

Confidence Intervals

Confidence intervals are paramount values in research and data analysis. Just like with the mean and the median values, confidence intervals can help experts and statisticians make sense out of their datasets. Confidence intervals indicate a range of values within which the true summary statistic can be found (Streiner, 1996).

Interestingly, the 95% confidence interval is defined as an estimate of the range in which there is a 95% chance that the true value lies (Peat, 2011). While the 95% confidence interval measures precision, it’s important to remember that it differs from the interval defined by the mean +/-2 standard deviations (SDs). To be more precise, the SD indicates the variability of the original data points. The confidence intervals, on the other hand, are constructed based on the standard error of the mean, which is the variability of the mean values. Both values can help experts analyze their datasets in depth.
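The distinction can be sketched in a few lines: the standard deviation describes the individual data points, while the standard error of the mean (the SD divided by the square root of n) underpins the confidence interval. The sample below is invented, and for a sample this small a t-multiplier (about 2.26 for n = 10) would strictly be more appropriate than the 1.96 used here:

```python
import math
import statistics

# Hypothetical sample of fasting glucose measurements (mmol/L)
sample = [4.8, 5.1, 5.3, 4.9, 5.6, 5.0, 5.4, 5.2, 4.7, 5.5]
n = len(sample)
mean = statistics.fmean(sample)
sd = statistics.stdev(sample)      # variability of the individual data points
sem = sd / math.sqrt(n)            # variability of the sample mean
ci_low, ci_high = mean - 1.96 * sem, mean + 1.96 * sem   # approximate 95% CI
print(round(ci_low, 2), round(ci_high, 2))
```

Note how the confidence interval is much narrower than the mean ± 2 SD range, because it describes the precision of the mean rather than the spread of the data.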

Baseline Comparisons

Baseline comparisons are vital. One of the first steps experts need to take is to compare vital characteristics, such as confounders and other effects that may influence the results. Randomized trials usually ensure transparency and balance of confounders. However, some researchers perform a significance test comparing the baseline with the final measurement within each separate group, a practice that can be highly misleading (Bland & Altman, 2011). Note that mean values and SDs (rather than the standard error or 95% confidence interval) are usually needed to report baseline characteristics of continuously distributed data.

Most of all, experts should understand that sometimes statistical tests are not enough, and the absolute differences between the subjects are better indicators of any possible clinical differences between groups (Peat, 2011). In the end, all findings should be adjusted in bivariate and multivariate analyses, such as multiple regression.

Bivariate and Multivariate Methods

After computing fundamental initial steps, such as univariate analyses, distribution of variables, categorization of variables, and investigation of baseline characteristics, it’s time to continue further. Although bivariate and multivariate methods may sound more complicated, there are clear formulas, which experts can use.

Note that depending on the number of outcomes, there are different statistical techniques (Peat, 2011). Some of the main procedures for studies with one outcome are McNemar’s test, kappa, Friedman’s analysis of variance, the paired t-test, and the intra-class correlation. For two outcomes, on the other hand, statistics such as the likelihood ratio, Kendall’s correlation, the Wilcoxon test, the Mann-Whitney test, and logistic regression can be employed. Canonical correlation and adjusted odds ratios can be utilized for more than two outcomes. Note that multiple regression and logistic regression are two of the most popular tests employed in any multivariate analysis.
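As one concrete example from the list above, Cohen's kappa measures agreement between two raters beyond what chance alone would produce. A minimal implementation on hypothetical ratings might look like this:

```python
def cohens_kappa(rater1, rater2):
    """Cohen's kappa: agreement between two raters beyond chance."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement from each rater's marginal category frequencies
    expected = sum(
        (rater1.count(c) / n) * (rater2.count(c) / n)
        for c in set(rater1) | set(rater2)
    )
    return (observed - expected) / (1 - expected)

# Hypothetical yes/no judgements from two observers on eight subjects
r1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "no"]
r2 = ["yes", "yes", "no", "no", "no", "yes", "yes", "no"]
print(cohens_kappa(r1, r2))   # → 0.5 (moderate agreement)
```

A kappa of zero means the raters agree no more often than chance; a kappa of one means perfect agreement.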

Visual Representation

To sum up, dealing with data may seem complicated. Yet, it’s one of the most exciting aspects of research. There are numerous variables, categories, types of analyses, and statistical methods to help researchers analyze medical information. Most of all, to make sense out of this maze of figures and values, visual representations are crucial.

Experts should always provide tables, graphs, and charts to present their findings and ensure transparency. In the end, it’s not a secret that good documentation practices support scientific research. In practice, the visual representation can enhance communication between doctors and patients and boost the accuracy of diagnostic inferences (Garcia-Retamero & Hoffrage, 2013).

Because data is not only an abstract notion – data should be utilized to support people’s well-being and scientific progress.


Bland, J., & Altman, D. (2011). Comparisons against baseline within randomised groups are often used and can be highly misleading. Trials, 12.

Garcia-Retamero, R., & Hoffrage, U. (2013). Visual representation of statistical information improves diagnostic inferences in doctors and their patients. Social Science and Medicine, 83, p.27-33.

Peat, J. (2011). Analysing the data. Health Science Research. SAGE Publications, Ltd.

Streiner, D. (1996). Maintaining standards: differences between the standard deviation and standard error, and when to use each. The Canadian Journal of Psychiatry, 41(8), p. 498-502.

Why Is Continuous Data “Better” than Categorical or Discrete Data? (2017, April 7). Retrieved from

Subjective Outcomes: Questionnaires and Data Forms

By | Health Sciences Research

Questionnaires, Data Forms & Research

Medical research and data go hand in hand. Often, to collect health-related information, experts rely on surveys, data forms, and questionnaires. Such measurements are based on data about previous symptoms, demographics, and other vital details which objective methods and laboratory results may have missed. One of the biggest advantages of questionnaires and data forms is the fact that they are cost-effective, simple, and quick to administer. On top of that, such tools are focused on patients and their subjective opinion, which is the core topic of interest in the field of digital health.

However, developing a new questionnaire or a survey comes with many challenges. First, as medical research should be precise and valid, the first step in the development of a new tool is a detailed literature review. Also, experimental goals need to be clear: experts need to clarify what they will gain from the development of a new questionnaire and explore whether any other existing sources could provide the information instead (“A Step-by-Step Guide to Developing Effective Questionnaires and Survey Procedures for Program Evaluation & Research”). Consequently, research variables and coding schedules must be well-defined. All items must be able to detect subtle changes, and at the same time, they must be logical, organized, short, and simple to understand. To support the development of a new questionnaire, other experts need to be consulted: peer review, for instance, is a recommended technique (Peat, 2011). Most of all, pilot testing must be conducted to ensure good validity, reliability, and generalizability.


Choosing the Mode of Administration

One of the first aspects to consider is the mode of administration. There are different modes of administration: self-administered questionnaires, surveys administered by a caregiver or a family member, and researcher-administered forms – each with its various benefits and disadvantages (Peat, 2011). For instance, self-administered questionnaires are time and cost-effective, and as such, they are extremely beneficial in large samples. However, self-administered forms are prone to ambiguity and low response rates. Surveys administered by a family member or a caregiver can be used in pediatric populations or for adults who cannot respond for themselves. However, they can refer only to observed symptoms, such as vomiting – simply because some outcomes, such as morning stiffness, can be known only to the patients. On the other hand, surveys administered by a researcher can help experts collect more complex data, which is paramount in rare diseases. Unfortunately, such tools are more expensive and prone to interpretation bias.

Another important factor is the procedure itself: surveys can be done in person, via the phone and post or web-based. Recently, health technology has been established as an effective tool, and online surveys have slowly replaced paper forms. No matter what type of administration experts choose, it’s extremely important for experimenters to be consistent throughout the whole study, which will increase the internal validity and reliability of the new test.

Creating the Right Questions

Creating new research questions is perhaps one of the most exciting and challenging parts of research. As explained above, a literature review is crucial to help experts gain a better understanding of the topic of interest and the existing measurements. Knowing if there are other similar and valid surveys is vital, and in fact, it can save precious time (Peat, 2011).

All questions should be easy to understand and relevant. Therefore, the main characteristics of each sample (including size) should be considered. In fact, when it comes to participants, the sampling procedures (randomized, etc.) also need to be established. On top of that, researchers must decide if the questionnaire will be confidential for follow-up purposes or completely anonymous.

The content, the wording, the order and the length itself are other vital aspects researchers need to focus on. Note that sometimes the same question can be asked twice but in a different way to double check responses and social desirability.

We should mention that, usually, research questions tackle two types of medical information: qualitative (which is used to generate hypotheses) and quantitative (which is needed to test hypotheses) (“Questionnaire Design”). For qualitative data, exploratory and open questions might be better. Although they are more difficult to code and analyze, open questions widen the scope of research and help experts generate new ideas (Peat, 2011). On the other hand, standardized close-ended items are needed for collecting quantitative information. They also come with lots of challenges, such as attracting random responses. Still, they help researchers collect data quickly, via fixed and pre-coded replies (Peat, 2011).

In any case, experts should always try to reduce ambiguity. If one could measure exposures, confounders, outcomes, and demographics at the same time, this would be the ultimate testing tool. Therefore, focus groups can be extremely valuable for collecting new ideas, census forms can help researchers generate questions, and peer review can establish internal validity.

Note that when sensitive information, such as ethnicity or income, is needed, surveys become more complicated. Do not forget that such questions may reduce the response rate, so they can be excluded or added at the end of the survey. In fact, using wording similar to the national census, for instance, is a good method to make participants feel more comfortable (Peat, 2011).

The Power of Wording

As explained above, questions should be relevant, simple, valid, and responsive to change. Therefore, the wording is a paramount aspect of the development of a new survey.

  • Positive wording is more attractive and generates better response rates. Therefore, experts should avoid medical jargon and terms.
  • The construction of the items should be simple to understand and easy to answer, code, and analyze. Experts must ask only one question at a time. If needed, capitalization of words can be applied to clarify the question and help subjects focus better.
  • Pilot studies are a must. They can help experts deal with ambiguity. During a pilot study, experts can ask participants to rephrase questions they don’t understand and, consequently, achieve new and better wording.
  • If there’s one correct answer, the response options should be clear. If there are multiple-response categories, these different groups should have the same meaning for everyone. For instance, timing responses, such as ‘seldom’ or ‘rare,’ may mean different things to different people.
  • ‘Don’t know’ options should be avoided. People are more willing to choose ‘Don’t know,’ especially when the meaning of the question is unclear. What’s more, this effect can lead to low validity and generalizability of the tool.
  • Dealing with missing data is also vital and can help experts increase generalizability. In addition, researchers should decide if missing answers can be used as negative responses in the analysis.
  • Last but not least, when questionnaires are used in different languages, a second opinion and careful translation are a must.

Presentation Matters

Apart from the wording of the questions, the layout is also vital. Note that electronic surveys engage participants more, especially younger subjects. Tools should be clear, short, and attractive (Peat, 2011). They need to have:

  • Large font
  • Enough space
  • Mild colors
  • Numbered questions
  • Titles of the questionnaires
  • Instructions on how to answer

Note that tick boxes are more attractive than other response options. Circling, percentages or frequency of behaviors can be confusing. In fact, time responses may be inconsistent and may mean different things to different people, as explained above. A suggestion is, instead of ‘usually,’ to ask for a precise indication of frequency (‘1-6 times a year’, etc.).

Again, it’s crucial to decide if missing answers mean negative responses. For instance, a parent can skip a statement like ‘Is affectionate’ about their child. Well, we have to ask ourselves: have they missed the item unintentionally, or does that mean that the child is not affectionate?

When answers are given via a time scale, there are some tricky details as well. For instance, it’s a fact that subjects avoid the ends of the scale. So, instead of having a 5-point scale with ‘never’ and ‘always’ at the two ends, extend the scale and put ‘almost never’ and ‘almost always’ in between. On top of that, short scales may not detect subtle changes, which decreases responsiveness. Note that although scales require larger samples, they are statistically more powerful than ‘yes’ or ‘no’ answers, or other fixed responses, such as ‘true’ or ‘false’ and ‘agree’ or ‘disagree’ (“A Step-by-Step Guide to Developing Effective Questionnaires and Survey Procedures for Program Evaluation & Research”).

Last but not least, whether the questionnaire is confidential or anonymous, researchers should always add a ‘Thank you!’ section at the end of the survey.

Coding & Safety

When it comes to new questionnaires, coding should be considered. Coding is a method that benefits data collection and analysis.

Self-coding data collection forms, for instance, can spare researchers crucial time.

Usually, in paper forms, an expert transfers the codes to a spreadsheet (“How to analyze questionnaire responses”). Thus, to guarantee safety, observer data should also be entered, and hard copies of the original forms should be kept to maintain quality control.

Pilot Study Wanted

When the initial draft of the new questions is ready, a pilot study should be conducted, for both questionnaires and data collection forms. Note that a number of pilot studies might be required (Peat, 2011). Some of the main steps are to administer the test in the same way it will be administered and to a similar sample to the target population. Experts should check if all answers are completed, measure time, and ask for feedback. As a result, they should reword the items, exclude unclear questions, and shorten the scale. If needed, another pilot testing can be done.

Ensure Validity & Repeatability

Of course, validity and repeatability are crucial aspects. When it comes to internal validity, peer review may help experts ensure good face validity. Factor analysis and logistic regression, on the other hand, can help experts analyze which items contribute to the topic and to what extent. Cronbach’s alpha can also be used to assess the degree of correlation between the new items. Note that this coefficient varies between zero and one; when it’s too high (close to one), the results indicate that some items are nearly identical and should be excluded (Peat, 2011). For instance, ‘vomiting’ and ‘feeling sick’ may be too similar and overlapping.
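Cronbach's alpha can be computed from the item variances and the variance of the total score. The sketch below uses a hypothetical three-item questionnaire scored by six respondents (all numbers are invented for illustration):

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per item."""
    k = len(items)
    item_vars = sum(statistics.variance(col) for col in items)
    totals = [sum(scores) for scores in zip(*items)]  # per-subject total score
    return (k / (k - 1)) * (1 - item_vars / statistics.variance(totals))

# Hypothetical 3-item questionnaire scored 1-5 by six respondents
item1 = [4, 3, 5, 2, 4, 3]
item2 = [4, 2, 5, 3, 4, 2]
item3 = [5, 3, 4, 2, 5, 3]
print(round(cronbach_alpha([item1, item2, item3]), 2))   # → 0.89
```

A value this high indicates strong internal consistency; values very close to one would suggest, as the text notes, that some items are redundant.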

To sum up, choosing the outcomes is crucial and sometimes a new questionnaire or a data collection form may be required. In the end, it’s important to establish a harmonic relationship between clinical experience, measures, statistics, and theory – with the sole purpose to understand a disease and improve people’s well-being.


A Step-by-Step Guide to Developing Effective Questionnaires and Survey Procedures for Program Evaluation & Research. Retrieved from

How to Analyze Questionnaire Responses. Retrieved from

Peat, J. (2011). Choosing the Measurements. Health Science Research. SAGE Publications, Ltd.

Questionnaire Design. Retrieved from

Research Methodology: Validity

By | Health Sciences Research

Validity & Outcome Measurements

Designing a study is a complicated process and choosing the measurements is one of the essential milestones in research. From creating appropriate items to computing statistical analyses, ensuring validity is vital.

Although the definition of validity has undergone many modifications, there are two types of validity: external and internal; both being crucial factors to consider.


External Validity: Aim for Generalizability

External validity is known as generalizability or the extent to which scientific findings can be applied to other settings rather than the ones tested. In other words, external validity reveals if research outcomes apply to everyday life and the general population.

Note that external validity is a complex concept which can’t be measured in a single statistical analysis. Therefore, external validity must be agreed upon between experts. One good method is to implement strict inclusion and exclusion criteria, especially in medical settings. For instance, in clinical trials, a study with good external validity would involve hospitalized patients and reveal results which can be applied to the general population served by the same hospital. On the other hand, in population research, random sampling and high response rates are needed to guarantee external validity (Peat, 2011).

Here we should mention a couple of curious examples which tackle the problem of external validity. In psychology, for instance, conformity and diffusion of responsibility are common phenomena, and several studies have replicated some alarming findings about human nature. For example, a study conducted by Latané and Darley (1968) tested whether participants would help a sick person while waiting at a laboratory. The findings showed that people would only act if they thought they were the only person waiting in the laboratory. If there were more people around them, the chances of helping someone sick decreased. This study has good external validity. The case of Kitty Genovese, who was stabbed near her apartment while, reportedly, her neighbors watched passively through their windows, became a famous illustration of the bystander effect, the phenomenon whereby observers are less likely to help in the presence of other people.

Internal Validity: Improve Your Measurements

Internal validity can be defined as the degree to which a measurement is valid in what it claims to assess. To be more precise, this type of validity refers to the design of the study, or whether it measures what it is supposed to assess (Peat, 2011). In other words, internal validity explains how well a study avoids confounders and other nuisances, such as within-subject and between-observer errors.

To guarantee good internal validity, measurements need to be accurate and precise. Note that for objective measurements, such as spirometry, internal validity is not a major concern (Peat, 2011). However, for measures such as subjective outcomes, surrogate end-points, and predictions, internal validity is crucial. Therefore, when developing a new measurement or an instrument which can be prone to bias, such as a health-related quality-of-life tool, a lot of testing is needed to reach good internal validity.

Consequently, to show how robust each measurement is, there are four types of internal validity which researchers need to assess:

Face Validity

Face validity is an essential aspect to consider. Face validity, or measurement validity, shows whether a measurement appears, at face value, to assess what it is intended to test. As explained above, experts often need to agree upon this aspect of validity, as statistical analyses alone cannot establish it. This is extremely vital for subjective outcome measurements. Researchers need to assess whether a tool identifies all relevant changes and symptoms, whether it’s acceptable and precise, and whether it fulfills its purposes (Peat, 2011).

When it comes to the development of new questionnaires, for instance, face validity can be increased by the inclusion of questions which are relevant, reveal proper wording, and have a good response rate. Although the research panel decides on these issues, both researchers and subjects need to agree about the acceptability of any new survey.

Content Validity

Content validity, known as logical or rational validity, shows if a measurement manages to assess every facet of a theoretical construct. In other words, content validity shows if a questionnaire covers the domain of interest in an adequate way and if it represents the illness of interest precisely (Peat, 2011). This is an important indicator for both subjective and objective measurements. Content validity also needs to be discussed by experts to reach acceptability. Note that in every survey, each item may have different content validity.

Some of the techniques to increase content validity are to cover all aspects of the disease and to measure all confounders and nuisance variables. In questionnaires with many items, statistical procedures can help researchers include or eliminate items. For instance, factor analysis can be performed to find all items that belong to an independent domain and all items that cluster together. Also, Cronbach’s alpha is a vital indicator of internal consistency – if questions elicit similar replies, then they address the same dimension. Note that by eliminating questions that do not correlate with the rest, internal consistency may increase; however, that would limit the domains and the applicability of a tool. Thus, it’s better to include various questions to obtain a comprehensive picture of the disease or the treatment of interest.
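To make the internal-consistency idea concrete, Cronbach’s alpha can be computed directly from a subjects-by-items matrix of questionnaire scores. Below is a minimal sketch in Python; the ratings are hypothetical, invented purely for illustration:

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a subjects-by-items matrix of questionnaire scores.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])                                   # number of items
    item_vars = [variance(col) for col in zip(*scores)]  # per-item variance
    totals = [sum(row) for row in scores]                # per-subject total score
    return (k / (k - 1)) * (1 - sum(item_vars) / variance(totals))

# Hypothetical ratings: 5 subjects answering 3 items on a 1-5 scale
scores = [
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 2, 3],
    [4, 4, 4],
]
print(round(cronbach_alpha(scores), 2))  # → 0.94
```

Items that vary together drive the total-score variance up relative to the item variances, so alpha approaches 1 when questions address the same dimension – which is exactly why maximizing alpha by pruning items can narrow a questionnaire’s domains.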

Criterion Validity

Criterion validity reveals how well a measurement agrees with an established standard, and consequently, how it correlates with other research measures. Criterion validity is fundamental, especially when it comes to new measures. If new tools prove to be more beneficial than the established standard (e.g., time-effective, cost-effective and repeatable), then the old gold standard should be replaced. To assess criterion validity – or which tool is better or if two measures can be used interchangeably – the conditions of measurement should be identical, the order of the tests should be randomized, the interval between assessments should be short, and most of all, researchers and subjects should be blinded to the results (Peat, 2011).

What’s more, criterion validity can be used to predict outcomes: in other words, it can be utilized to predict the gold standard results. This property is known as the predictive utility. For example, the severity of back pain in predicting future back problems can be assessed. In this case, aspects, such as the history of pain, current therapy and objective outcomes (such as X-rays), can be included in the analysis.

Construct Validity

Construct validity can be defined as the extent to which a measurement assesses the theoretical construct it is supposed to test. Measuring validity is an important task, and as explained above, there are various techniques that can be implemented in research. Here we should mention a fundamental requirement regarding the study sample. Usually, a wide range of studies (especially when measuring construct and criterion validity) focus on two well-defined groups: subjects with a well-defined disease and healthy individuals. This extremity of choosing well-defined groups brings clarity to the analysis, but it has some disadvantages. One of the cons is that it limits the applications of a tool. Thus, it’s recommended to apply outcome measurements to individuals who are not diagnosed clearly or who have less severe symptoms. In addition, it’s beneficial if validity is measured in random samples.

Note that when it comes to the measurement of validity, the relationship between validity and repeatability also matters. Basically, repeatability or test-retest reliability refers to the precision of an instrument. It measures the variation in tools over a short period – with measures being administered to the same subjects, under identical conditions. Usually, criterion and construct validity improve when repeatability is high. Still, do not forget that good repeatability does not guarantee good validity.


Bartlett, J., & Frost, C. (2008). Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound in Obstetrics and Gynecology, 31(4).

Construct Validity. Retrieved from

Latané, B., & Darley, J. (1968). Bystander “Apathy.” American Scientist, 57, 244-268.

Peat, J. (2011). Choosing the Measurements. Health Science Research, SAGE Publications, Ltd.

What Is Validity (2013). Retrieved from

Choosing the Measurements: Outcome Measurements Matter

By | Health Sciences Research

Choosing the Outcome Measurements: Planning Is the Core of Research

Conducting a medical study is a challenging process. From choosing a study design to computing a statistical analysis, medical research is complicated. On top of that, health-related studies must follow safety and ethical regulations, which adds burden to researchers. To avoid any possible complications, planning is paramount.

Choosing the right measurements is an essential part of each study. Outcome measurements should be well-defined and accurate in order to help researchers understand the connection between variables, assess the benefits of a new intervention, and improve patients’ well-being (Peat, 2011).


Defining the Outcome and Explanatory Variables

When it comes to choosing the right measurements, explanatory and outcome variables also need to be clear and easy to measure in order to test the main hypothesis of the study. Note that an explanatory variable is a type of independent variable. While independent variables are completely autonomous and unaffected by any other factors, explanatory variables are not entirely independent (“Explanatory Variable & Response Variable: Simple Definition and Uses,” 2015). Still, explanatory variables are vital as they can explain any possible changes and can affect the dependent variable. In fact, most phenomena are interconnected. Let’s say one wants to measure the effects of fast food and soda drinks on weight: these variables are not completely independent, as food corners often offer menus that contain both options. Thus, independent and explanatory variables are two terms that are often used interchangeably. However, in precise clinical studies with multiple outcomes, all measured factors should be well-defined.

On the other hand, dependent variables, also called outcome and response variables, are the factors that are expected to change during an experiment. The outcome variable is the focus of any study, including clinical trials. For instance, experts may be interested in treatments that prolong the life of cancer patients. The type of treatment (chemotherapy, for example) will be the explanatory variable, while the survival time will be the outcome variable (“Explanatory Variable & Response Variable: Simple Definition and Uses,” 2015). Let’s not forget, though, that today’s medicine and healthcare technology focus not only on mortality rates but patients’ overall well-being and quality of life (especially in severe and chronic conditions).

Subjective Vs. Objective Outcome Measurements

Choosing the outcome measures can be a tricky task. All outcome measurements are vital for research, and experts can decide on either subjective or objective outcome measurements. Both subjective and objective outcome measurements have their benefits and challenges in practice, and there is no one-size-fits-all approach.

Subjective outcome measurements, for instance, are defined as any measurements that are open to interpretation. They can be self-administered or administered by an observer or a medical professional. One of the advantages of subjective measurements is that they are easy to administer, cost-effective, and rapid. As such, they are an effective method in clinical trials, which can assess if there’s an improvement in people’s self-reported status (Peat, 2011). Examples of subjective outcomes are questionnaires about the frequency of symptoms or illness severity. However, as the name suggests, subjective outcome measurements are based on subjective judgment, and as such, they can be prone to errors and bias. Therefore, when it comes to clinician-reported outcomes, for example, training of staff is crucial in order to avoid observer bias.

On the other hand, objective measurements gather medical information collected by standard instruments or professional equipment, which usually reduces bias. For instance, lab results and biochemistry data are examples of objective measurements (Peat, 2011). However, one of the disadvantages is that these measurements collect short-term data that changes quickly, such as blood pressure. Nevertheless, objective measurements are precise, and therefore, highly implemented in research.

Multiple Outcome Measurements

Multiple outcomes are also paramount in research and clinical trials, and experts need to consider them when choosing the outcome measurements. Since any new treatment affects various factors in one’s life, often a single measure is not enough to reflect all the physical, emotional and social changes in patients (Tyler et al., 2011). In fact, in cases when the most important outcome is unclear, and effectiveness and efficacy must be checked across various domains, multiple measures are needed (Peat, 2011). However, when designing a study with multiple outcome measures, there must be a clear differentiation between primary and secondary outcomes in order to overcome all statistical challenges in the analysis. In fact, researchers often use a small set of measuring tools as primary outcomes and a broader one as secondary.

Note that outcomes that are significant from a medical point of view might be perceived as less important by subjects. For instance, experts may aim to reduce hospitalization rates, while patients may aim further, such as returning to work.

Surrogate End-points & Clinical Trials

Note that in long-term clinical trials, choosing the right outcome measurements is a delicate process. In such trials, the primary outcome variable may be replaced by a surrogate end-point, or an alternative short-term outcome. Surrogate end-points are defined as biomarkers “intended to substitute for a clinical endpoint” (Aronson, 2005). Surrogate end-points facilitate research as they are easy to implement and more cost-effective. On top of that, there are some ethical issues that allow the use of surrogate end-points only. For instance, among physiological markers measured in laboratory settings, blood pressure may be used as a surrogate for stroke. Still, when the primary outcome is mortality, surrogate end-points cannot substitute for the true end-points in the long term, and more research is needed to confirm the findings and benefits of treatment.

In general, the clinical trial is a complicated process, which is often marked by low recruitment rates, financial demands, and ethical regulations. Research is needed, as new drugs and alternative treatments can improve people’s well-being and save lives.

Outcome Measurements: Qualities to Consider

No matter what outcome measurements researchers choose, there are some essential qualities that all measures need to cover:


All measurements need to reveal good validity. Validity can be explained as the degree to which a measurement is valid and strong in what it claims to measure. For instance, intelligence tests need to measure intelligence, not memory, in order to be valid.

  • Face validity is one of the essential types of validity. As the name suggests, face validity describes if a measurement is at face value and assesses what appears to be measured. In clinical settings, face validity shows if the outcome measures identify important symptoms and changes.
  • Content validity or logical/rational validity shows if a measurement manages to measure every facet of a theoretical construct. In medicine, content validity guarantees that measurement is relevant to the study and the illness in general.
  • Criterion validity reveals how well a measurement can predict a health-related outcome and correlates with other research measures.
  • Construct validity can be defined as the extent to which a test measures the construct it’s supposed to test.

Reliability & Repeatability:

Reliability or the consistency of measurement is also crucial. The repeatability of a measurement is essential because good test-retest reliability can avoid variability. In other words, the same test given to the same people in a short period should show the same results. The between-observer agreement should also be sought. Note that between-observer agreement or inter-rater reliability reveals the consensus between different raters.
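The between-observer agreement described above is commonly quantified with Cohen’s kappa (mentioned earlier for categorical data), which corrects the raw proportion of agreement for the agreement expected by chance alone. A minimal sketch, using hypothetical observer ratings:

```python
def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters assigning categorical labels to the same subjects."""
    n = len(rater1)
    # Observed proportion of agreement
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Agreement expected by chance, from each rater's marginal frequencies
    categories = set(rater1) | set(rater2)
    expected = sum(
        (rater1.count(c) / n) * (rater2.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)

# Hypothetical symptom ratings ("present"/"absent") from two observers
r1 = ["present", "absent", "present", "present", "absent", "absent"]
r2 = ["present", "absent", "absent", "present", "absent", "present"]
print(round(cohens_kappa(r1, r2), 2))  # → 0.33
```

A kappa of 1 indicates perfect agreement, 0 indicates agreement no better than chance; the value of 0.33 here reflects the two disagreements out of six ratings.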


Errors should be diminished by good outcome measurements. There are three types of systematic errors (Peat, 2011):

  • Subject errors are systematic errors related to subjects’ bias.
  • Observer errors are due to differences in observers and the administering and interpreting of a test.
  • Instrument errors are caused by the instrument itself. Thus, tools should be accurate and calibrated according to a standard (certain temperature, for example).


Responsiveness is another crucial quality (Tarrant et al., 2014). It measures the efficacy and effectiveness of an intervention and the extent to which the service quality meets clients’ needs. Measurements should be responsive to within-subject changes. Some tools cannot detect small changes, and therefore, they cannot be used as primary outcomes. For example, measurements, such as a 5-point scale in which the symptom frequency is categorized as ‘constant,’ ‘frequent,’ ‘occasional,’ ‘rare’ or ‘never,’ are not responsive; such tools cannot detect subtle changes in symptoms. On top of that, experts should consider not only physiological changes but the quality of life.

Sample Size:

The statistical power of a test is another factor to consider when choosing outcome measurements (Peat, 2011). In general, the sample size needs to be adequate in order to show clinical and statistical differences between groups. Often, studies focus only on primary outcomes, but in practice, a broad range of outcomes should be implemented in research.
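The power consideration can be made concrete with the standard normal-approximation formula for comparing two means, n per group = 2((z₁₋α⁄₂ + z₁₋β)·σ/Δ)². Below is a minimal Python sketch; the effect size and standard deviation are hypothetical values chosen for illustration:

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate sample size per group to detect a mean difference `delta`,
    given standard deviation `sigma`, two-sided significance `alpha`, and power."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)            # ~0.84 for 80% power
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# Hypothetical: detect a 5 mmHg blood-pressure difference, SD = 10 mmHg
print(n_per_group(delta=5, sigma=10))  # → 63 per group
```

Halving the detectable difference quadruples the required sample size, which is why subtle secondary outcomes often demand far larger studies than the primary outcome alone.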

Sample size and types of variables go together. Here we should mention that categorical values, for instance, need bigger samples. A categorical or nominal variable has two or more categories (without any order). Gender is a categorical variable with two categories (male and female), and there is no intrinsic ordering of the categories. Ordinal values, on the other hand, are similar, but there’s order in the categories. Note, though, that the spacing between the categories is inconsistent. Economic status – with low, medium, and high categories – is an example of ordinal values. Interval values, in turn, also have categories placed in order, but with equal spacing in between. Income of $5,000, $10,000 and $15,000 is a good example, as the size of each interval is the same ($5,000). Last but not least, continuously distributed measurements, such as blood pressure, are also vital (“What is the difference between categorical, ordinal and interval variables?” 2017).
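The distinctions above can be mirrored in code: a categorical variable supports only equality checks, an ordinal variable adds an order, and an interval variable adds equal, meaningful spacing. All values below are illustrative:

```python
# Categorical: categories with no intrinsic order - only equality is meaningful
gender = {"male", "female"}

# Ordinal: ordered categories, but the spacing between ranks is undefined
economic_status = {"low": 1, "medium": 2, "high": 3}
assert economic_status["high"] > economic_status["low"]   # order is meaningful

# Interval: ordered values with equal, meaningful spacing between them
incomes = [5000, 10000, 15000]
assert incomes[1] - incomes[0] == incomes[2] - incomes[1]  # equal $5,000 steps
```

The ranks assigned to ordinal categories are labels, not quantities – differences between them (e.g., “high” minus “low”) carry no meaning, unlike the differences between interval values.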

To sum up, choosing the outcome measurements is a serious step that lies on the path to the success of any health-related study. Because outcome measurements matter!


Aronson, J. (2005). Biomarkers and surrogate endpoints. British Journal of Clinical Pharmacology, 59(5), p.491-494.

Explanatory Variable & Response Variable: Simple Definition and Uses (2015, February 16). Retrieved from

Peat, J. (2011). Choosing the Measurements. Health Science Research, SAGE Publications, Ltd.

Tarrant, C., Angell, E., Baker, R., Boulton, M., Freeman, G., Wilkie, P., Jackson, P., Wobi, F., & Ketley, D. (2014). Responsiveness of primary care services: development of a patient-report measure – qualitative study and initial quantitative pilot testing. Health Services and Delivery Research, 2(46).

Tyler, K., Normand, S., & Horton, N. (2011). The Use and Abuse of Multiple Outcomes in Randomized Controlled Depression Trials. Contemporary Clinical Trials, 32(2), p. 299-304.

What is the difference between categorical, ordinal and interval variables? (2017). Retrieved from

Conducting the study – Managing Clinical Trials

By | Health Sciences Research

Managing Clinical Trials

Medical research revolves around challenging ideas, practical solutions, and optimistic results. However, even the most promising clinical trial can collapse if it faces poor management practices. Since medical research is a complicated and sensitive process, it requires expert management (Farrell et al., 2010). High standards can be achieved only via careful planning, good management, and regular collaboration between experts.

As a matter of fact, there are several aspects researchers need to consider prior to conducting a study to ensure good management practices and high standards (Peat, 2011):

  • Pilot studies are mandatory to test recruitment procedures and equipment. As a result, during the actual study, only pretested instruments and methods will be employed.
  • At the same time, interim and safety analyses need to be minimized because preliminary results, especially in small studies, can lead to bias.
  • Documentation and data safety are essential. All errors, changes, and deviations from the study protocol should be documented, which will facilitate the management process.
  • Handbooks can be beneficial to guide experts through the documentation process. The study handbook must be available at all times, and it must indicate vital details, such as study meetings and errors spotted in the data collection.
  • With information being transparent at all times, monitoring of data is also an essential aspect of good management practices.
  • Training of staff is paramount in any well-managed trial. Note that researchers must be blinded to the study results. In fact, rotation of staff is a good management practice to ensure standardization.
  • Last but not least, regular team meetings or online assessments of the whole research team can become the key to success. They can help researchers feel appreciated and supported.


Management and Monitoring Committees

Although medical teams play a leading role in research, monitoring committees are also needed to ensure good management practices. In fact, creating a hierarchy of committees, whose roles will be decided before conducting the study, is one of the most effective practices (Peat, 2011). While internal committees include investigators and staff, external committees consist of peers and other experts. Independent experts are a must: they can help researchers decide on the number of interim analyses, and they can support integrity. Note that one committee should be responsible specifically for data analyses.

Regular closed meetings are crucial for good management, decision-making, and quality of data. Last but not least, presentations and discussions (on both national and international levels) can facilitate the multi-centered approach which is needed in research (Farrell et al., 2010).

Management and Research Teams

Medical research is a diverse field of work, but in the end, research teams are the most active participants in any clinical trial. In fact, managing research teams does not differ much from managing business teams. Good teams consist of professionals with different expertise; therefore, diversity should be promoted to create a productive environment. To be more precise, each medical team includes numerous experts, such as a chief investigator, a trial manager, a data manager, administrative staff, statisticians, and programmers.

Training is also mandatory in various aspects, such as recruitment and data entry. On the other hand, all roles should be well-defined, and staff should be able to take responsibility for their actions. The most important aspect is to create an atmosphere marked by personal satisfaction and a friendly attitude.

The study coordinator motivates the staff and encourages research. The study coordinator should be familiar with all aspects of research and staff roles to promote high standards and integrity. From organizing paperwork and checking data entries to talking directly with participants – the study coordinator should be a knowledgeable person passionate about the project.

People who manage medical research must have excellent communication and organization skills and must express enthusiasm and innovative thinking. Managers must encourage professional development and personal satisfaction at the same time. They must create a level of trust and a balance between realistic priorities and interesting tasks. Most of all, good managers must celebrate research success.

Management and Data

Let’s not forget that clinical trials rely on data. Recruiting participants and collecting data are challenging, and so are data entry and analysis. Here are some basic principles and steps of managing medical information that experts must consider:

  • Code data and decide on missing values
  • Enter data into a database
  • Conduct multiple checks
  • Make corrections when necessary
  • Check for duplicates (codes, matching indicators, etc.)
  • Merge data from different instruments
  • Archive copies to ensure safety and confidentiality
  • Limit access to protect sensitive information

After data collection, all results must be entered into a database. Creating a good database design is difficult, but it’s a process that can spare lots of effort and errors at the later stages of any analysis. When planning a design, the type and the size of the tested variables need to be considered prior to the research.

Checking for errors is fundamental. Cross-checks can help experts spot values which are outside the utilized ranges and which consequently can’t be used in the analysis (Peat, 2011).
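Such range and duplicate checks are straightforward to automate. The sketch below is a minimal illustration; the field names, plausible ranges, and records are all hypothetical:

```python
# Hypothetical plausible ranges agreed for each field before data entry
RANGES = {"age": (18, 90), "systolic_bp": (70, 250)}

def range_errors(records):
    """Return (subject_id, field, value) for every value outside the agreed ranges."""
    errors = []
    for rec in records:
        for field, (low, high) in RANGES.items():
            value = rec.get(field)
            if value is not None and not low <= value <= high:
                errors.append((rec["id"], field, value))
    return errors

def duplicate_ids(records):
    """Return subject IDs that were entered more than once."""
    seen, dupes = set(), set()
    for rec in records:
        (dupes if rec["id"] in seen else seen).add(rec["id"])
    return dupes

records = [
    {"id": "S01", "age": 42, "systolic_bp": 128},
    {"id": "S02", "age": 17, "systolic_bp": 300},   # both values out of range
    {"id": "S01", "age": 42, "systolic_bp": 128},   # duplicate entry
]
print(range_errors(records))   # flags S02's age and blood pressure
print(duplicate_ids(records))  # → {'S01'}
```

Flagged values are then documented and corrected against the source forms rather than silently edited, in line with the documentation practices described above.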

The actual process of data entry is also prone to errors. Self-coding questionnaires are easier to analyze. Nevertheless, any research team must agree on either alphabetic or numeric coding prior to data entry. As explained above, range checks, visual checks, and cross-checks are a must. Also, good management should ensure that the entered data will be verified by another coordinator. Note that research staff should be blinded to the study results to avoid errors and bias. Once again, all errors should be documented in the data management manual of the study (Peat, 2011).

If there are missing data, a beneficial approach is to contact subjects to clarify or expand on the missing information. When more than one instrument has been utilized, several records should be created. Note that these records should match on at least two indicators to ensure the integrity and validity of the data (e.g., name and number). Often, commercial software can be utilized to support researchers in data analysis; such programs adopt a precise and complete approach to data management.

Connectivity Software: Tools that Support Data Management

In today’s digital health era, connectivity software and relational databases can be highly beneficial, especially when compared to Excel or Word tables. Such programs allow statistical packages to analyze the collected data directly – without having data exported and then imported again. This guarantees transparency and precision. On top of that, a read-only mode can be implemented, which ensures safety and integrity of data management.

However, some studies can’t access such facilities. When connectivity software or relational databases are not available, abbreviated files and spreadsheets can be created to export data and then analyze it (Peat, 2011). During this process, all changes must be monitored in the main file, so that the master database maintains high standards and integrity.

Security and Confidentiality: The Main Goal of Good Management Practices

When we talk about data management, the safety of data should always be the main aim of research. Technical issues, accidents, and theft happen, so backup files are also needed. It’s a good idea to archive files. Note that often computer departments deal with security and technical issues.

Medical information should support interoperability and collaboration, but safety is vital. Thus, rights can be restricted and read-only modes created. Confidentiality should also be considered, especially when it comes to pediatric populations and rare diseases. In fact, in some working files, all information about the subjects may be excluded to prevent administrative clerks from having access to people’s data (Peat, 2011).

Managing Clinical Trials: Conclusion

Management practices in medical research are paramount. Experts need guidance because clinical trials are complex, expensive, and time-consuming studies. Researchers should never rush or force participants. Quite the opposite: researchers should plan the study carefully and create a good working environment.

Good management practices imply that tasks and goals are clear well before the actual execution and analysis. All aspects of research must be planned in advance. Good planning means minimal work for participants and investigators. Technology is also fundamental: for instance, digital recruiting methods and online data forms can facilitate research. Note that when it comes to digital solutions, good managers should provide training of staff.

Most of all, good management practices must ensure that staff feels appreciated, driven, and motivated. Not surprisingly, communication and collaboration are paramount. In the end, medical findings should not be restricted to a single organization – but patients all over the world.

To sum up, regardless of the field of work – from clinical trials to business conglomerates – good management practices are vital to help teams succeed.


Farrell, B., Kenyon, S., & Shakur, H. (2010). Managing clinical trials. Trials, 11, 78. BioMed Central.

Peat, J. (2011). Conducting the Study. Health Science Research, SAGE Publications, Ltd.

Choosing the Outcome Measurements: Confounders, Effect-modifiers & Intervening Variables

By | Health Sciences Research

Study Designs: Basics of Research

Associations in Research

In clinical settings, measurements are often designed to assess the effect of a new drug and/or treatment on a specific outcome, such as frequency of symptoms. In practice, however, physical, emotional, cognitive, and social factors mix into one. Consequently, researchers can never be certain if the effect of the factor they measure is definite and independent or if it is affected by other exposures.

Choosing the outcome measurements becomes paramount. Often associations and connections between variables and outcomes are affected by nuisance factors, which may lead to bias and misinterpretations. Confounders and effect-modifiers are two types of nuisance factors or co-variates that researchers need to consider and reduce, either at the design stage or the data analysis stage (Peat, 2011). One of the best ways to minimize the effects of such factors is to conduct a broad randomized controlled trial. Apart from the study design, there are suitable statistical procedures that can benefit each study and reduce external nuisance.


Confounders Explained

Confounders are common co-variates, which are defined as risk factors related to both the exposure and the outcome of interest (without lying on the causative pathway) (LaMorte & Sullivan). In other words, confounders lead to distortion in the measurement of the associations between the variables of interest. To avoid critical misinterpretations, experts need to investigate all possible confounders (Peat, 2011).

A clear example of confounding is the study of birth order and the risk of Down syndrome conducted by Stark and Mantel (1966). The research team suggested that there was an increased risk of Down syndrome in relation to birth order: the risk was estimated as higher with each successive child. However, the fact that birth order is linked to maternal age cannot be ignored. Even if the sibling gap is small, it’s logical to assume that maternal age increases with each successive child. In fact, maternal age proved to be a crucial factor.

Another example of confounding is patients’ history of smoking in relation to heart disease and exercising. It’s been proven that smoking itself increases the risk of heart disease. At the same time, smoking is linked to exercising: usually, people who exercise on a regular basis smoke less. Thus, when experts analyze the association between heart disease and exercising, they need to consider the history of smoking (Peat, 2011).

Also, age can be a significant confounder. For instance, age can affect the relationship between physical activity and heart problems. Older people are usually less active, and at the same time, older people are at greater risk of developing a disease.

Eliminating Confounders

Confounders need to be reduced as they may lead to over- or under-estimation of factors. Over-estimation is the phenomenon of establishing a stronger association than the one that actually exists. On the other hand, under-estimation is the error of finding a weaker association than the real one. Removing the effects of confounders can be achieved at the design stage or the data analysis stage (Peat, 2011). Considering nuisance factors at the design stage is usually better, and as a matter of fact, it’s extremely important in studies prone to bias, such as case-control, cohort, cross-sectional, and ecological studies.

In fact, one of the best ways to control for confounders is the use of large randomized trials. Otherwise, if experts let the subjects allocate themselves to different groups, this may lead to selection bias and strengthen the confounding effect. Consider again the study which investigated the connection between heart issues and exercising. If experts let subjects self-allocate to an exposure group, smokers would often allocate themselves to a low exercise frequency group. Thus, smoking could become a significant confounder (Peat, 2011).

Sometimes mathematical adjustment of groups may be needed to reduce the uneven distribution of confounders. Controlling for confounders at the data analysis stage, however, is less effective. Note that it may include restriction, matching, and stratification.

A powerful method is to perform a stratified analysis. Stratified analyses require the measurement of confounders in subgroups, or strata. A confounder can have more than one category; each category is known as a stratum. All levels should be analyzed; for instance, there might be separate analyses for each gender (Peat, 2011). If the stratified estimates differ from the total estimate, this indicates a confounding effect. For instance, a study on the connection between living areas (urban and rural) and chronic bronchitis used a stratified analysis. The study showed that smoking is a confounder in the relationship between living areas and bronchitis, and in fact, smoking may be more prevalent in urban areas.

Stratified analysis requires mathematical adjustments, and therefore the sample size is also crucial. In large samples, confounders may appear statistically significant while in reality they are of no clinical importance. On the other hand, small samples might not reveal statistical significance while in reality the confounders do affect the results. Thus, confounders with an odds ratio (the measure of association between an exposure factor and an outcome variable) greater than OR = 2.0 should always be considered (Peat, 2011).
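The OR > 2.0 rule of thumb is easy to operationalize. The 2×2 counts in the usage example below are made up for illustration:

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Cross-product odds ratio from a 2x2 case-control table."""
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)

def flag_as_potential_confounder(or_value, threshold=2.0):
    # Rule of thumb from the text: consider adjusting for a variable whose OR
    # exceeds 2.0, regardless of the p-value the sample size happens to produce.
    return or_value > threshold
```

For a hypothetical table of 30/70 exposed cases/controls and 10/90 unexposed cases/controls, the odds ratio is about 3.9 and the variable would be flagged; a weaker association such as 20/80 versus 15/85 would not be.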

Multivariate and logistic regression analyses also minimize the effects of confounders; they are two of the most powerful statistical procedures available.
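How regression adjustment works can be sketched in pure Python. Everything below is an illustrative toy, not a production routine: the data are simulated so that the outcome depends only on a confounder z, while the exposure x is merely correlated with z, and the model is fitted by plain gradient descent.

```python
import math
import random

def fit_logistic(X, y, lr=1.0, iters=1000):
    """Logistic regression fitted by plain gradient descent (illustrative sketch)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            pred = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            for j in range(p):
                grad[j] += (pred - yi) * xi[j]
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta

def simulate(n=1500, seed=7):
    """Hypothetical cohort: outcome risk is 0.7 vs 0.2 depending only on the
    confounder z; the exposure x agrees with z 80% of the time."""
    rng = random.Random(seed)
    rows, y = [], []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        x = z if rng.random() < 0.8 else 1 - z   # exposure tracks the confounder
        p_outcome = 0.7 if z else 0.2            # risk driven by z alone
        y.append(1 if rng.random() < p_outcome else 0)
        rows.append((x, z))
    crude = [[1.0, float(x)] for x, _ in rows]               # intercept + exposure
    adjusted = [[1.0, float(x), float(z)] for x, z in rows]  # ... + confounder
    return crude, adjusted, y
```

Fitting the crude model yields a sizeable exposure coefficient; adding z to the model shrinks that coefficient toward zero, revealing that the apparent exposure effect was confounding.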

Effect-modifiers Revealed

While confounders are associated with both the exposure and the outcome, effect-modifiers are associated only with the outcome (“The difference between ‘Effect Modification’ & ‘Confounding,’” 2015). Effect-modifiers, or interactive variables, modify the effect of a causal factor on an outcome. Effect modification is also revealed by stratification – the same exposure has a different outcome in different subgroups, for instance, when a new drug has a different effect on men and women.

Effect-modifiers, however, are not a nuisance, because they can give valuable insights. In fact, they can be easily recognized within each stratum (Peat, 2011). The relationship between age and the risk of a disease is a clear example: when the risk of disease is estimated for different age strata, many studies show that the risk increases with age.

Recognizing Effect-modifiers

Stratified analysis can be a beneficial way to analyze effect-modifiers. Take a sample stratified by blood pressure and analyze the relationship between smoking and the risk of myocardial infarction. In this case, blood pressure acts as an effect-modifier: it has been shown that the risk of myocardial infarction attributable to smoking is higher in smokers with normal blood pressure than in smokers with high blood pressure (Peat, 2011).
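With hypothetical counts, this stratum-by-stratum comparison looks as follows: the smoking–infarction odds ratio is computed within each blood-pressure stratum and the two are compared. The tables and the 1.5-fold screening threshold are invented for illustration; in practice a formal test of homogeneity would be used.

```python
def odds_ratio(a, b, c, d):
    """a/b = smoker cases/controls, c/d = non-smoker cases/controls."""
    return (a * d) / (b * c)

# Hypothetical 2x2 tables per blood-pressure stratum:
# (smoker cases, smoker controls, non-smoker cases, non-smoker controls).
strata = {
    "normal_bp": (30, 70, 10, 90),
    "high_bp":   (40, 60, 30, 70),
}

stratum_ors = {name: odds_ratio(*table) for name, table in strata.items()}

def suggests_effect_modification(ors, fold=1.5):
    # Crude screen: stratum-specific ORs differing by more than `fold`-fold
    # suggest that the stratifying variable modifies the exposure's effect.
    lo, hi = min(ors.values()), max(ors.values())
    return hi / lo > fold
```

Here the smoking odds ratio is roughly 3.9 in the normal-blood-pressure stratum but only about 1.6 in the high-blood-pressure stratum, so blood pressure would be flagged as a candidate effect-modifier rather than averaged away.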

However, when there are more than a few effect-modifiers, multivariate analysis is better. Note that confounders and effect-modifiers are treated differently in a multivariate analysis.

In fact, the size of the estimates, or the β coefficients, is always vital (Peat, 2011). Also note that when the outcome variable is dichotomous, a larger sample is needed for the analysis to yield meaningful results.

Intervening Variables In Research

Intervening or mediating variables are also crucial factors that experts need to consider. They are hypothetical variables which cannot be observed directly in practice (“Intervening Variables”). Thus, it is not possible to say how much of an effect is due to the independent variable and how much is due to the intervening variable.

A classic example is the study of the connection between poverty (independent variable) and shorter lifespan (dependent variable). We cannot say that poverty directly affects longevity, so an intervening variable, such as lack of access to healthcare, can be vital in explaining the link. Although intervening variables are an alternative outcome of the exposure factor, they cannot be included as an exposure factor, simply because, as explained above, they are hypothetical constructs (Peat, 2011).

Intervening Variables & Analysis

As intervening variables are closely related to the outcome, they may distort all multivariate models (Peat, 2011).

An example is a study on the development of asthma. In this case, hay fever would be an intervening variable, as hay fever is part of the same allergic process as the development of asthma (triggered, for instance, by exposure to particles such as pollen). Thus, hay fever and asthma are strongly associated.

Confounders, Effect-modifiers Or Intervening Variables?

Deciding whether a factor is a confounder, an effect-modifier, or an intervening variable requires careful consideration and sound data analysis.

Note that all factors can be categorical or continuously distributed. In any case, misinterpretation of variables and associations may lead to bias.

Research should eliminate errors and bias because they can affect people’s well-being and lead to fatal outcomes.


References

Intervening Variables. Retrieved from

LaMorte, W., & Sullivan, L. Confounding and Effect Measure Modification. Retrieved from

Peat, J. (2011). Choosing the Measurement. Health Science Research, SAGE Publications, Ltd.

Stark, C., & Mantel, N. (1966). Effects of Maternal Age and Birth Order on the Risk of Mongolism and Leukemia. JNCI: Journal of the National Cancer Institute, 37(5), 687–698.

The difference between ‘Effect Modification’ & ‘Confounding’ (2015, June 4). Retrieved from