Model Validation for Applied Data Science
November 19 @ 12:00 pm - 2:00 pm
In this workshop, we will discuss the basics of creating, comparing, and validating predictive models using a case study from the health sciences. We will demonstrate categorical prediction with logistic regression and numerical prediction with a regression tree approach. We will calculate accuracy measures appropriate to each type of model, and use cross-validation to find the model parameters that generate the best predictions. Finally, we will interpret the results for insights into the real-world process being modeled. Although this workshop works with health data, the conceptual framework and principles discussed should generalize to research in other domains.
– Fit a logistic regression model
– Fit a random forest model
– Use cross-validation to tune model parameters
– Estimate the accuracy of predictions for future data
– Interpret model parameters
This workshop is open to learners at all levels, but prior experience with R is required to participate fully in this interactive, hands-on session.
Please follow the DataLab install guides (https://datalab.ucdavis.edu/install-guide/) to install R and RStudio before the workshop. DataLab office hours are held via Zoom and in person on Wednesdays from 1:30 pm to 3:00 pm. If you need help troubleshooting the installations, drop by office hours before the workshop. See https://datalab.ucdavis.edu/office-hours/ for details.
Instructors: Wesley Brooks, Vladimir Filkov
Wesley Brooks holds a Statistics Ph.D. from the University of Wisconsin. He works at the DataLab as a Data Scientist.
Vladimir Filkov is a Professor of Computer Science and DataLab’s director for translational data science, leading the Health Data Science and Systems research and learning cluster.
Location: Zoom. Please register to receive Zoom link.
Cost: Free of charge.