Workshop – Machine Learning in R: Clustering and Classification (Part 2 of 2)

May 30 @ 10:00 am - 12:00 pm

This two-part workshop series provides an introduction to using R for two popular machine learning techniques: clustering and classification. Clustering involves identifying groups of similar observations (called clusters) within data. It can be an effective tool for finding patterns and is an important part of exploratory data analysis. Classification refers to modeling a categorical response variable. Classification models can provide insight into the relationship between the predictors and the response, as well as a way to make predictions about new observations.

Sessions in Spring 2024 are 10 AM – 12 PM on May 23 and 30. This registration reserves your spot for both sessions. In the first session (May 23) we’ll begin with the advantages and disadvantages of several popular algorithms for clustering, and work through examples of how to run clustering algorithms in R. In the second session (May 30) we’ll provide an overview of popular classification models, and then delve into the details of actually using them. We’ll cover how to choose a model, how to partition data into training and test sets, how to use cross-validation to tune model hyperparameters, and how to evaluate the performance of models in R. We’ll also explain some strategies you can use to improve model performance. The series concludes with a brief discussion of the machine learning landscape and how you can continue to learn more about machine learning and apply it to your research.

Register to join in person or via broadcast on Zoom.

After this workshop series, learners should be able to:

  • Assess whether classification or clustering is relevant to their research problems and data sets;
  • Explain the tradeoffs between popular clustering algorithms;
  • Run a clustering algorithm on their data;
  • Build and train a classification model on their data;
  • Use cross-validation to estimate accuracy and tune hyperparameters for classification models;
  • Identify strategies to improve results from classification models.



This workshop is designed for researchers who have data that they are already working with in R. Participants must have taken DataLab’s “Overview of Statistical Machine Learning,” “R Basics,” and “Regression in R” workshop series, or have equivalent prior experience. Completion of DataLab’s “Intermediate R” series is recommended but not required. Participants must be comfortable with basic R syntax. The focus of the workshop is on implementing clustering and classification in R, and not on learning the R language itself. Bring a laptop with the latest versions of R and RStudio installed and running.

Can’t make it to this training? Check out the upcoming workshop schedule. Recordings of prior workshops are also available in DataLab’s training archive.




Shields Library, room 360


DataLab: Data Science and Informatics (DSI)