Welcome


The aim of this book is to provide an overview of the three classes of tasks in statistical modeling: description, prediction and causal inference (76). Statistical inference is often required for all three tasks. Short introductions to frequentist null-hypothesis testing, Bayesian estimation and the bootstrap are provided. Special attention is given to practical significance, with the introduction of magnitude-based estimators and of statistical inference using the concept of the smallest effect size of interest (SESOI). Measurement error is discussed with the particular aim of interpreting individual change scores. In the second part of this book, common sports science problems are introduced and analyzed with the bmbstats package.

This book and the bmbstats package are both in active open-source development. Please feel free to contribute a pull request on GitHub when you spot an issue or have an idea for an improvement. I hope that both this book and the bmbstats package will become collaborative tools that can help up-and-coming as well as experienced researchers and sports scientists.

R and R packages

This book is fully reproducible and was written in R (154) using the R packages automatic (117), bayestestR (128), bmbstats (97), bookdown (205), boot (39), carData (52), caret (109), cowplot (201), directlabels (80), dorem (96), dplyr (194), effects (49,51), forcats (192), ggplot2 (190), ggridges (202), ggstance (70), hardhat (184), kableExtra (208), knitr (204), lattice (173), markdown (3), Metrics (64), minerva (1), mlr (17,19,116), mlr3 (116), mlrMBO (19), multilabel (152), nlme (151), OpenML (32), ParamHelpers (18), pdp (62), psych (156), purrr (69), readr (196), rpart (182), shorts (95), stringr (191), tibble (141), tidyr (195), tidyverse (193), vip (61), visreg (26), and vjsim (98).

License


This work, as a whole, is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

The code contained in this book is simultaneously available under the MIT license; this means that you are free to use it in your own packages, as long as you cite the source.

References

1. Albanese, D, Filosi, M, Visintainer, R, Riccadonna, S, Jurman, G, and Furlanello, C. Minerva and minepy: A C engine for the MINE suite and its R, Python and MATLAB wrappers. Bioinformatics bts707, 2012.

3. Allaire, J, Horner, J, Xie, Y, Marti, V, and Porte, N. markdown: Render markdown with the C library ‘sundown’. 2019. Available from: https://CRAN.R-project.org/package=markdown

17. Bischl, B, Lang, M, Kotthoff, L, Schiffner, J, Richter, J, Studerus, E, et al. mlr: Machine learning in R. Journal of Machine Learning Research 17: 1–5, 2016. Available from: http://jmlr.org/papers/v17/15-066.html

18. Bischl, B, Lang, M, Richter, J, Bossek, J, Horn, D, and Kerschke, P. ParamHelpers: Helpers for parameters in black-box optimization, tuning and machine learning. 2020. Available from: https://CRAN.R-project.org/package=ParamHelpers

19. Bischl, B, Richter, J, Bossek, J, Horn, D, Thomas, J, and Lang, M. mlrMBO: A modular framework for model-based optimization of expensive black-box functions. arXiv preprint arXiv:1703.03373, 2017.

26. Breheny, P and Burchett, W. Visualization of regression models using visreg. The R Journal 9: 56–71, 2017.

32. Casalicchio, G, Bossek, J, Lang, M, Kirchhoff, D, Kerschke, P, Hofner, B, et al. OpenML: An R package to connect to the machine learning platform OpenML. Computational Statistics 1–15, 2017.

39. Davison, AC and Hinkley, DV. Bootstrap methods and their applications. Cambridge: Cambridge University Press, 1997. Available from: http://statwww.epfl.ch/davison/BMA/

49. Fox, J. Effect displays in R for generalised linear models. Journal of Statistical Software 8: 1–27, 2003. Available from: http://www.jstatsoft.org/v08/i15/

51. Fox, J and Weisberg, S. Visualizing fit and lack of fit in complex regression models with predictor effect plots and partial residuals. Journal of Statistical Software 87: 1–27, 2018. Available from: https://www.jstatsoft.org/v087/i09

52. Fox, J, Weisberg, S, and Price, B. carData: Companion to applied regression data sets. 2019. Available from: https://CRAN.R-project.org/package=carData

61. Greenwell, B, Boehmke, B, and Gray, B. vip: Variable importance plots. 2020. Available from: https://CRAN.R-project.org/package=vip

62. Greenwell, BM. pdp: An R package for constructing partial dependence plots. The R Journal 9: 421–436, 2017. Available from: https://journal.r-project.org/archive/2017/RJ-2017-016/index.html

64. Hamner, B and Frasco, M. Metrics: Evaluation metrics for machine learning. 2018. Available from: https://CRAN.R-project.org/package=Metrics

69. Henry, L and Wickham, H. purrr: Functional programming tools. 2020. Available from: https://CRAN.R-project.org/package=purrr

70. Henry, L, Wickham, H, and Chang, W. ggstance: Horizontal ‘ggplot2’ components. 2020. Available from: https://CRAN.R-project.org/package=ggstance

76. Hernán, MA, Hsu, J, and Healy, B. A Second Chance to Get Causal Inference Right: A Classification of Data Science Tasks. CHANCE 32: 42–49, 2019.

80. Hocking, TD. directlabels: Direct labels for multicolor plots. 2020. Available from: https://CRAN.R-project.org/package=directlabels

95. Jovanovic, M. shorts: Short sprints. 2020. Available from: https://mladenjovanovic.github.io/shorts/

96. Jovanovic, M and Hemingway, BS. dorem: Dose response modeling. 2020. Available from: https://dorem.net

97. Jovanović, M. bmbstats: Bootstrap magnitude-based statistics. Belgrade, Serbia, 2020. Available from: https://github.com/mladenjovanovic/bmbstats

98. Jovanović, M. vjsim: Vertical jump simulator. 2020. Available from: https://mladenjovanovic.github.io/vjsim/

109. Kuhn, M. caret: Classification and regression training. 2020. Available from: https://CRAN.R-project.org/package=caret

116. Lang, M, Binder, M, Richter, J, Schratz, P, Pfisterer, F, Coors, S, et al. mlr3: A modern object-oriented machine learning framework in R. Journal of Open Source Software, 2019. Available from: https://joss.theoj.org/papers/10.21105/joss.01903

117. Lang, M, Kotthaus, H, Marwedel, P, Weihs, C, Rahnenfuehrer, J, and Bischl, B. Automatic model selection for high-dimensional survival analysis. Journal of Statistical Computation and Simulation 85: 62–76, 2014.

128. Makowski, D, Ben-Shachar, MS, and Lüdecke, D. bayestestR: Describing effects and their uncertainty, existence and significance within the Bayesian framework. Journal of Open Source Software 4: 1541, 2019. Available from: https://joss.theoj.org/papers/10.21105/joss.01541

141. Müller, K and Wickham, H. tibble: Simple data frames. 2020. Available from: https://CRAN.R-project.org/package=tibble

151. Pinheiro, J, Bates, D, DebRoy, S, Sarkar, D, and R Core Team. nlme: Linear and nonlinear mixed effects models. 2020. Available from: https://CRAN.R-project.org/package=nlme

152. Probst, P, Au, Q, Casalicchio, G, Stachl, C, and Bischl, B. Multilabel classification with R package mlr. arXiv preprint arXiv:1703.08991, 2017.

154. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2020. Available from: https://www.R-project.org/

156. Revelle, W. psych: Procedures for psychological, psychometric, and personality research. Evanston, Illinois: Northwestern University, 2019. Available from: https://CRAN.R-project.org/package=psych

173. Sarkar, D. Lattice: Multivariate data visualization with R. New York: Springer, 2008. Available from: http://lmdvr.r-forge.r-project.org

182. Therneau, T and Atkinson, B. rpart: Recursive partitioning and regression trees. 2019. Available from: https://CRAN.R-project.org/package=rpart

184. Vaughan, D and Kuhn, M. hardhat: Construct modeling packages. 2020. Available from: https://CRAN.R-project.org/package=hardhat

190. Wickham, H. ggplot2: Elegant graphics for data analysis. Springer-Verlag New York, 2016. Available from: https://ggplot2.tidyverse.org

191. Wickham, H. stringr: Simple, consistent wrappers for common string operations. 2019. Available from: https://CRAN.R-project.org/package=stringr

192. Wickham, H. forcats: Tools for working with categorical variables (factors). 2020. Available from: https://CRAN.R-project.org/package=forcats

193. Wickham, H, Averick, M, Bryan, J, Chang, W, McGowan, LD, François, R, et al. Welcome to the tidyverse. Journal of Open Source Software 4: 1686, 2019.

194. Wickham, H, François, R, Henry, L, and Müller, K. dplyr: A grammar of data manipulation. 2020. Available from: https://CRAN.R-project.org/package=dplyr

195. Wickham, H and Henry, L. tidyr: Tidy messy data. 2020. Available from: https://CRAN.R-project.org/package=tidyr

196. Wickham, H, Hester, J, and Francois, R. readr: Read rectangular text data. 2018. Available from: https://CRAN.R-project.org/package=readr

201. Wilke, CO. cowplot: Streamlined plot theme and plot annotations for ‘ggplot2’. 2019. Available from: https://CRAN.R-project.org/package=cowplot

202. Wilke, CO. ggridges: Ridgeline plots in ‘ggplot2’. 2020. Available from: https://CRAN.R-project.org/package=ggridges

204. Xie, Y. Dynamic documents with R and knitr. 2nd ed. Boca Raton, Florida: Chapman & Hall/CRC, 2015. Available from: https://yihui.org/knitr/

205. Xie, Y. bookdown: Authoring books and technical documents with R Markdown. Boca Raton, Florida: Chapman & Hall/CRC, 2016. Available from: https://github.com/rstudio/bookdown

208. Zhu, H. kableExtra: Construct complex table with ‘kable’ and pipe syntax. 2019. Available from: https://CRAN.R-project.org/package=kableExtra