Chapter 1 Introduction

The real world is very complex and uncertain. To understand it and predict its behavior, we create maps and models (146,188). Statistical models are one such tool: they simplify a complex and ultimately uncertain reality in the hope of describing it, understanding it, predicting its behavior, and helping us make decisions and interventions (76,115,130,150). In the outstanding statistics book “Statistical Rethinking” (130), the author stresses the distinction between the Large World and the Small World, initially described by Leonard Savage (16,58,174):

"All statistical modeling has these same two frames: the small world of the model itself and the large world we hope to deploy the model in. Navigating between these two worlds remains a central challenge of statistical modeling. The challenge is aggravated by forgetting the distinction.

The small world is the self-contained logical world of the model. Within the small world, all possibilities are nominated. There are no pure surprises, like the existence of a huge continent between Europe and Asia. Within the small world of the model, it is important to be able to verify the model’s logic, making sure that it performs as expected under favorable assumptions. Bayesian models have some advantages in this regard, as they have reasonable claims to optimality: No alternative model could make better use of the information in the data and support better decisions, assuming the small world is an accurate description of the real world.

The large world is the broader context in which one deploys a model. In the large world, there may be events that were not imagined in the small world. Moreover, the model is always an incomplete representation of the large world, and so will make mistakes, even if all kinds of events have been properly nominated. The logical consistency of a model in the small world is no guarantee that it will be optimal in the large world. But it is certainly a warm comfort."

Creating “Small Worlds” relies heavily on making and accepting numerous assumptions, both known and unknown, as well as on prior expert knowledge, which is ultimately incomplete and fallible. Because all statistical models require subjective choices (56), there is no objective approach to making “Large World” inferences. The inference must therefore be made by us, and claims about the “Large World” will always be uncertain. Additionally, we should treat statistical models and statistical results as being much more incomplete and uncertain than is the current norm (8).

We must accept the pluralism of statistical models and of models in general (133,134), and move beyond the subjective-objective dichotomy by replacing it with virtues such as transparency, consensus, impartiality, correspondence to observable reality, awareness of multiple perspectives, awareness of context-dependence, and investigation of stability (56). Finally, we need to accept that we must act based on cumulative knowledge rather than rely solely on single studies or even single lines of research (8).

This discussion belongs to epistemology, scientific inference, and the philosophy of science, and is thus far beyond the scope of the present book (and the author). Nonetheless, it was essential to convey that statistical modeling is a process of creating “Small Worlds” and deploying them in the “Large World”. Statistical models aim to accomplish three main classes of tasks: description, prediction, and causal inference (76).

The following example will help in differentiating between these three classes of tasks. Consider a king facing a drought who must decide whether to invest resources in rain dances. The queen, upon seeing some rain clouds in the sky, must decide whether to carry her umbrella. The young prince, who likes to gamble during his hunting sessions, is interested in knowing which region of his father’s vast kingdom receives the most rain. All three would benefit from an empirical study of rain, but they have different requirements of the statistical model. The king requires causal inference: Do rain dances cause rain? The queen requires prediction: Does it look likely enough to rain for me to ask my servants to get my umbrella? The prince requires a simple quantitative summary, that is, description: Have I put my bets on the correct region?
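To make the three questions concrete, here is a minimal sketch in Python using simulated data. Everything in it is an assumption made for illustration: the region names, the rain amounts, the probabilities, and the variable names are all hypothetical. Two assumptions matter most. First, rain dances are assigned by a coin flip, which is what allows a simple difference in rain frequency to be read as a causal effect. Second, the dance is given no effect on rain by construction, so the king's estimate should land near zero.

```python
import random
from statistics import mean

random.seed(1)

# Hypothetical kingdom: assumed mean rainfall (mm) on a rainy day, per region.
wetness = {"north": 1.0, "south": 8.0, "east": 3.0}
regions = list(wetness)

# Simulate daily records. All probabilities below are made up.
days = []
for _ in range(3000):
    region = random.choice(regions)
    cloudy = random.random() < 0.4
    dance = random.random() < 0.5       # dance assigned by coin flip (randomized)
    p_rain = 0.7 if cloudy else 0.1     # clouds matter; the dance does not
    rained = int(random.random() < p_rain)
    rain_mm = random.expovariate(1 / wetness[region]) if rained else 0.0
    days.append({"region": region, "cloudy": cloudy, "dance": dance,
                 "rained": rained, "rain_mm": rain_mm})

# Description (the prince): which region receives the most rain on average?
mean_rain_by_region = {
    r: mean(d["rain_mm"] for d in days if d["region"] == r) for r in regions
}
wettest_region = max(mean_rain_by_region, key=mean_rain_by_region.get)

# Prediction (the queen): how often does it rain when clouds are seen?
p_rain_given_cloudy = mean(d["rained"] for d in days if d["cloudy"])
p_rain_given_clear = mean(d["rained"] for d in days if not d["cloudy"])

# Causal inference (the king): does dancing change the chance of rain?
# The difference in frequencies is a causal estimate only because the
# dance was randomized in this simulation.
dance_effect = (mean(d["rained"] for d in days if d["dance"])
                - mean(d["rained"] for d in days if not d["dance"]))

print("Mean rain by region (mm):", mean_rain_by_region)
print("Wettest region:", wettest_region)
print("P(rain | cloudy):", round(p_rain_given_cloudy, 2))
print("P(rain | clear):", round(p_rain_given_clear, 2))
print("Estimated effect of dancing on P(rain):", round(dance_effect, 3))
```

The same table of observations serves all three purposes, but each task asks something different of it. Had the dances been performed only on cloudy days, the same difference in frequencies would mix up the effect of dancing with the effect of clouds, and the king's question could not be answered this simply.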

The following sections will provide an overview of the three classes of tasks in statistical modeling. Data can be classified as being on one of four scales: nominal, ordinal, interval, or ratio. Techniques for description, prediction, and causal inference differ depending on the scale used. For the sake of simplicity and a big-picture overview, this book considers only examples using the ratio scale.

References

8. Amrhein, V, Trafimow, D, and Greenland, S. Inferential Statistics as Descriptive Statistics: There Is No Replication Crisis if We Don’t Expect Replication. The American Statistician 73: 262–270, 2019.

16. Binmore, K. Rational Decisions. Fourth Impression edition. Princeton, NJ: Princeton University Press, 2011.

56. Gelman, A and Hennig, C. Beyond Subjective and Objective in Statistics. Journal of the Royal Statistical Society: Series A (Statistics in Society) 180: 967–1033, 2017.

58. Gigerenzer, G, Hertwig, R, and Pachur, T. Heuristics: The Foundations of Adaptive Behavior. Reprint edition. Oxford University Press, 2015.

76. Hernán, MA, Hsu, J, and Healy, B. A Second Chance to Get Causal Inference Right: A Classification of Data Science Tasks. CHANCE 32: 42–49, 2019.

115. Lang, KM, Sweet, SJ, and Grandfield, EM. Getting beyond the Null: Statistical Modeling as an Alternative Framework for Inference in Developmental Science. Research in Human Development 14: 287–304, 2017.

130. McElreath, R. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. 1st edition. Boca Raton: Chapman and Hall/CRC, 2015.

133. Mitchell, S. Unsimple Truths: Science, Complexity, and Policy. Paperback edition. Chicago, Ill.: The University of Chicago Press, 2012.

134. Mitchell, SD. Integrative Pluralism. Biology & Philosophy 17: 55–70, 2002.

146. Page, SE. The Model Thinker: What You Need to Know to Make Data Work for You. Basic Books, 2018.

150. Pearl, J and Mackenzie, D. The Book of Why: The New Science of Cause and Effect. 1st edition. New York: Basic Books, 2018.

174. Savage, LJ. The Foundations of Statistics. 2nd revised edition. New York: Dover Publications, 1972.

188. Weinberg, G and McCann, L. Super Thinking: The Big Book of Mental Models. New York: Portfolio/Penguin, 2019.