Chapter 5 Statistical inference

Why did the estimates diverge from the true parameters in the previous examples? Why did the estimated treatment effect differ from the true treatment effect we used to generate the data? This brings us to an important concept in statistical analysis: the difference between the true parameter in the population and the sample estimate. A population refers to all members of a specified group, whereas a sample contains only a part, or subset, of the population from which it is taken. To generate the data in the simulations, we assumed a DGP in the population, out of which a single sample or multiple samples were drawn.

The mathematical notation differentiating true population parameters from sample estimates, or statistics, involves using Greek letters for true population parameters and Latin letters for sample estimates. For example, \(\mu\) stands for the true population mean, while \(\bar{x}\) stands for the sample mean, where the bar symbol ("\(\bar{}\)") indicates a mean. For standard deviation, \(\sigma\) stands for the true population parameter, while SD stands for the sample estimate. Sometimes the hat symbol ("\(\hat{}\)") is used to denote an estimator. For example, \(\hat{y_i}\) would be a model estimate, or prediction, of the observed \(y_i\).

The difference between the true population parameter and the sample estimate (assuming a correctly identified and represented DGP, and the use of unbiased estimators, among other issues) is due to sampling error, which we will cover shortly. In the simulations, we are certain about the true population parameters, but in real life we are uncertain about the true parameters and need to somehow quantify this uncertainty. This is the objective of statistical inference. In other words, with statistical inference we want to generalize from a sample to the population, while taking into account the uncertainty of the estimates.
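Sampling error can be made concrete with a short simulation. The sketch below (Python; the population mean and SD are hypothetical values chosen only for illustration) draws repeated samples from a known normal DGP and shows how each sample mean misses the true population mean:

```python
import random
import statistics

random.seed(42)

# Assumed DGP: a normally distributed population
# (mu and sigma are hypothetical values for illustration)
mu, sigma = 177.8, 10.16
n = 50  # sample size

# Draw several samples and compare each sample mean to the true mu
sample_means = []
for _ in range(5):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    sample_means.append(statistics.mean(sample))

for xbar in sample_means:
    print(f"sample mean = {xbar:.2f}, sampling error = {xbar - mu:+.2f}")
```

Each run of the loop mimics collecting one real-world sample: the estimates scatter around the true parameter, and that scatter is the sampling error.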

5.1 Two kinds of uncertainty, probability, and statistical inference

In the previous RCT example, using the Pre-test score and Group variables, we estimated the predictive performance of the model with Post-test as the target variable. The error (or uncertainty) around the individual Post-test scores was much bigger than SESOI, which made individual predictions practically useless. This prediction uncertainty decreased as new variables were introduced to the prediction model. With the model involving the Squat 1RM variable, we achieved much better predictive performance. This type of uncertainty can be called epistemic uncertainty (145), which results from lack of knowledge or incomplete information.

In the Prediction section of this book, I used irreducible error in the DGP to represent the stochastic component, or random error. Due to this random error, scores will differ from sample to sample. This type of uncertainty can be called aleatory uncertainty (145), which results from intrinsic randomness. Flagship examples of aleatory uncertainty are tossing a die, drawing a card from a shuffled pack, and random sampling, which produces the sampling error.

You can of course argue that aleatory uncertainty is ultimately epistemic uncertainty. For example, if I knew infinite details about the toss of a die, I would be able to predict exactly which number will land. Philosophers have been arguing about these issues for ages, and it is not in the domain of this book to delve deeper into the matters of uncertainty.

The theory of statistical inference, and statistics in general, rests on describing uncertainty using probability. Since there are two kinds of uncertainty, there are two interpretations of probability and its meaning. Aleatory uncertainties, like tossing a die or random sampling, are described using the long-run frequency definition of probability. For example, it can happen that I toss a six 4 times in a row, but in the long run, meaning an infinite number of tosses, the probability of tossing a six is equal to 1/6, or approximately 0.167, i.e. 16.7% (assuming a fair die of course). Probability viewed from this perspective represents the long-run relative frequency: the number of occurrences of the event of interest divided by the total number of events. For example, if I toss a die 1000 times and get a six 170 times, the estimated probability of tossing a six is equal to 170 / 1000, or 17%.
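This long-run behavior is easy to demonstrate with a simulation. The minimal sketch below (Python) rolls a fair die an increasing number of times and shows the relative frequency of sixes settling toward 1/6 as the number of rolls grows:

```python
import random

random.seed(20)

# Estimate P(six) by the long-run relative frequency of sixes
for n in (100, 1_000, 100_000):
    rolls = [random.randint(1, 6) for _ in range(n)]
    p_six = rolls.count(6) / n
    print(f"n = {n:>7}: estimated P(six) = {p_six:.4f}")

# As n grows, the estimate converges toward the true 1/6 (approx. 0.1667)
```

With small n the estimate can wander noticeably (just like tossing four sixes in a row), but with 100,000 rolls it sits very close to the theoretical value.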

With epistemic uncertainty, the probability of a proposition simply represents a degree of belief in the truth of that proposition (145). The degree-of-belief interpretation of probability is referred to as subjective or personal probability, while the long-run frequency interpretation is referred to as objective probability. There are two major schools of statistical inference, one leaning on the long-run frequency interpretation of probability, called frequentist, and the other leaning on the degree-of-belief interpretation, called Bayesian. There are of course many nuances and other schools of statistical inference (41,43,56) that are beyond the scope of this book. An additional approach to inference, and the one preferred in this book, is the bootstrap (43,79,159,160). But more about it later.

To better explain the differences between the frequentist and Bayesian approaches to statistical inference, I am going to use the known mean height and SD of the male population. The population parameter of interest is the population mean (Greek \(\mu\)), estimated using the sample mean (\(\bar{x}\)).
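Before contrasting the two approaches, it helps to see how \(\bar{x}\) behaves across repeated samples when the population parameters are known. The sketch below (Python; the mean and SD values are assumptions for illustration only) simulates the sampling distribution of the sample mean and compares its spread to the theoretical standard error \(\sigma / \sqrt{n}\):

```python
import random
import statistics

random.seed(1)

# Assumed (hypothetical) population parameters for male height, in cm
mu, sigma = 177.8, 10.16
n = 50  # sample size

# Simulate the sampling distribution of the sample mean x-bar
means = [statistics.mean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(10_000)]

print(f"mean of sample means:    {statistics.mean(means):.2f}")
print(f"SD of sample means (SE): {statistics.stdev(means):.2f}")
print(f"theoretical SE:          {sigma / n ** 0.5:.2f}")
```

The SD of the simulated sample means closely matches \(\sigma / \sqrt{n}\), which is the quantity the frequentist and Bayesian approaches each use, in their own way, to express uncertainty about \(\mu\).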

References

41. Dienes, Z. Understanding Psychology as a Science: An Introduction to Scientific and Statistical Inference. 2008 edition. New York: Red Globe Press, 2008.

43. Efron, B and Hastie, T. Computer Age Statistical Inference: Algorithms, Evidence, and Data Science. 1 edition. New York, NY: Cambridge University Press, 2016.

56. Gelman, A and Hennig, C. Beyond subjective and objective in statistics. Journal of the Royal Statistical Society: Series A (Statistics in Society) 180: 967–1033, 2017.

79. Hesterberg, TC. What Teachers Should Know About the Bootstrap: Resampling in the Undergraduate Statistics Curriculum. The American Statistician 69: 371–386, 2015.

145. O’Hagan, T. Dicing with the unknown. Significance 1: 132–133, 2004.

159. Rousselet, GA, Pernet, CR, and Wilcox, RR. A practical introduction to the bootstrap: A versatile method to make inferences by using data-driven simulations., 2019.

160. Rousselet, GA, Pernet, CR, and Wilcox, RR. The percentile bootstrap: A teaser with step-by-step instructions in R., 2019.