# SAMPLE SIZE ESTIMATION: AN OFTEN DISREGARDED, YET ESSENTIAL, DRIVER OF RESEARCH QUALITY AND INTEGRITY

## Abstract

The snowballing demand for research publications for student graduation and faculty promotion, the dearth of quality research specialists, and the bypassing by numerous journals of conventional inspection of research papers for appropriate study design and methodology have collectively perpetuated both obliviousness to, and deliberate violation of, basic research principles. More than anything else, we have pigeonholed research as merely ‘a concept of statistical significance; p-value of <0.05’. We are so obsessed with reaching this threshold that we very often neglect sample size estimation. This mindset and practice seriously undermines the integrity and validity of scientific research.

Sample size determination is an integral part of scientific research, regardless of study design. An incongruous, insufficient, or disproportionate sample size continues to limit the generalizability of research. In scientific research, sample size estimation is the statistical procedure of determining the number of study subjects required to draw inferences about a target population. Researchers in healthcare settings, however, often pay little attention to it. Instead, they select the most convenient figures for their studies, such as 20, 50, 100, or 500. Those who propose and approve such studies should understand that the choice of sample size has significant statistical and ethical consequences. A study with an excessively large sample might be considered unethical owing to the unnecessary involvement of additional participants and, similarly, the increased research expense. By the same token, a study with too small a sample would be incapable of detecting scientifically important effects.
Therefore, such research would be logically impractical. [1–3]

The sample should always be representative of the whole population. The greater the population, the greater the variation in the characteristics under study. A large sample helps detect even minute differences, since the probability of obtaining a result by chance falls as subjects are added, and it makes the sample more representative of the population. From a statistical point of view, research studies with an adequate number of participants have satisfactory statistical ‘power’ to identify study findings. [2,3]

How to calculate a sample size is beyond the scope of this article. However, we would like to underline a few general points that a researcher must consider while calculating the sample size for a research study:

1. Level of Significance: The level of significance is alpha (α), the probability of a type I error, that is, of rejecting the null hypothesis when it is actually true. (Hypothesis testing means choosing between two contradicting hypotheses: the null hypothesis states that there is no statistically significant difference between the estimated and hypothesized values in the studied population, whereas the alternative hypothesis states that there is such a difference.) Its usual values are 0.05 (5%) or 0.01 (1%). To elaborate, a 1% level of significance means that there is only 1 chance in 100 of falsely rejecting a true null hypothesis. [3–7]

2. Power (1 – β): Power is the probability that a research study detects a link or association between two variables, provided that the association exists. Power is the complement of beta (β), the probability of a type II error, that is, of failing to reject the null hypothesis when it is actually false.
Power is usually set at 80%, which means we accept that 20% of the time, or one time in five, we will fail to detect a real difference. High power to detect a difference between two groups is attained with a large sample size. Hence, it is essential to consider this statistical parameter in sample size calculation. [2–5]

3. Effect Size: Effect size is the measure of the difference or association in the values of a variable between study groups. For instance, if the average weight gain after one nutrition plan is 5 kg and after another nutrition plan is 10 kg, the absolute effect size is 5 kg. The relative effect size in this example, taken against the 10 kg plan, is 5 kg/10 kg, or 50%. We usually estimate effect size on the basis of pilot studies, earlier published reports, and clinical experience with the problem under investigation. Most importantly, the required sample size is small if the effect size between the study groups is large, and vice versa. [3–5, 8]

4. Population Variance: For a continuous outcome variable, the researcher is required to predict the population variance, which is the square of the standard deviation (SD). Because the variance is commonly unknown, researchers frequently look for estimates in published papers. A small sample size suffices for a homogeneous population because of its low variance or SD. Likewise, a large sample size is required if the population under study is heterogeneous, with a large variance or SD.

5. Non-response Rate: The non-response rate is the proportion of participants expected to quit the research study. The allowance made for it in sample size estimation is based on previous literature, experience, and the type of population, and it helps secure an unbiased estimate. [4]

An example of sample size calculation is given here. A researcher wants to compare the effectiveness of Losartan and Captopril in reducing systolic blood pressure.
A previous research study demonstrated average systolic blood pressures of 132.86 mmHg and 127.44 mmHg with the two drugs, respectively. The standard deviations were 15.34 mmHg and 18.23 mmHg, respectively. After adding a 5% non-response rate in each group, as reported in the previous study, what sample size is needed to study the difference in systolic blood pressure between the two groups? The level of significance (α) is 0.05 and the power (1 – β) of the study is 80%. The answer is that 160 subjects should be enrolled in each group.

It is worth noting that sample size calculation methods vary from one study design to another. The above example was carried out using OpenEpi, software that is freely available on the internet (Steps: OpenEpi → Sample Size → Mean Difference → Enter Data → Calculate). OpenEpi does not itself add the non-response rate, which has to be added manually, and the calculation shown does not apply to every research study. [9] Study scope, study population, study setting, and sampling technique are major considerations for sample size calculation. Professional statistical assistance is always highly recommended.

## References

1. Hickey GL, Grant SW, Dunning J, Siepe M. Statistical primer: sample size and power calculations-why, when and how? Eur J Cardiothorac Surg 2018;54(1):4–9.

2. Sami W, Alrukban MO, Waqas T, Asad MR, Afzal K. Sample size determination in health research. J Ayub Med Coll Abbottabad 2018;30(2):308–11.

3. Weber EJ, Hoo ZH. Why sample size estimates? Emerg Med J 2018;35(12):755–6.

4. Gupta KK, Attri JP, Singh A, Kaur H, Kaur G. Basic concepts for sample size calculation: Critical step for any clinical trials! Saudi J Anaesth 2016;10(3):328–31.

5. Das S, Mitra K, Mandal M. Sample size calculation: Basic principles. Indian J Anaesth 2016;60(9):652–6.

6. Banerjee A, Chitnis UB, Jadhav SL, Bhawalkar JS, Chaudhury S. Hypothesis testing, type I and type II errors. Ind Psychiatry J 2009;18(2):127–31.

7. Yarandi HN. Hypothesis testing. Clin Nurse Spec 1996;10(4):186–8.

8. Kadam P, Bhalerao S. Sample size calculation. Int J Ayurveda Res 2010;1(1):55–7.

9. Lê HG, Kang JM, Moe M, Jun H, Thái TL, Lee J, et al. Genetic polymorphism and natural selection of circumsporozoite surface protein in Plasmodium falciparum field isolates from Myanmar. Malar J 2018;17(1):361.
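
## Appendix: The Worked Example as a Short Script

The Losartan and Captopril example in the text can be checked with a few lines of Python. This is only a sketch under stated assumptions, not a description of how OpenEpi works internally: it assumes the common normal-approximation formula for comparing two means with equal group sizes and a two-sided test, `n = (SD1² + SD2²) × (z(1 − α/2) + z(power))² / (mean1 − mean2)²`, and it assumes the non-response allowance is added as `n × (1 + rate)`. The function names `n_per_group` and `add_non_response` are hypothetical and chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(mean1, mean2, sd1, sd2, alpha=0.05, power=0.80):
    """Per-group sample size for comparing two means.

    Assumes a two-sided test, equal group sizes, and the normal
    approximation (an assumption of this sketch, not stated in the text).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    delta = mean1 - mean2                          # absolute effect size
    n = (sd1 ** 2 + sd2 ** 2) * (z_alpha + z_beta) ** 2 / delta ** 2
    return ceil(n)                                 # round up to whole subjects


def add_non_response(n, rate):
    """Inflate n for expected non-response (assumed here: n * (1 + rate))."""
    return ceil(n * (1 + rate))


# Values taken from the worked example in the text.
base = n_per_group(132.86, 127.44, 15.34, 18.23)
enrolled = add_non_response(base, 0.05)
print(base, enrolled)  # prints: 152 160
```

Under these assumptions the script gives 152 subjects per group before the non-response allowance and 160 after it, which agrees with the figure quoted in the text. Raising `power` or shrinking the difference between the means increases the result, in line with the general points on power and effect size.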