Data Analysis Collaboratory 2-Week Summer Workshop
This two-week hands-on workshop is for researchers with already functioning data analysis projects who want to “level up” their project to take advantage of bigger resources (High Performance Computing and/or the cloud), use more automation, improve their reproducibility, expand their output formats and visualizations, conduct parameter sweeps, add more data sets, and/or otherwise work to improve and extend their current work. Attendees should bring their own “pile of scripts,” workflows, large data sets, etc. This interactive workshop is intended to help you make progress on your own research!
Week one is a guided tour that will get everyone’s current Linux/R/Python-based workflows running on the UC Davis Farm compute cluster. The second week is a collaborative research computing hackfest to level up those workflows by improving their speed, results, scalability, reproducibility, and validity.
This in-person workshop is free and open to all researchers, including graduate students. Researchers external to UCD are welcome as well! Space is limited, and DataLab Affiliates will be given priority. Please note that remote participation is not available. All workshop materials will be made openly available after the workshop.
Workshop Information:
When: June 20th-30th, 2023, 9am-5pm each day (except weekends; includes break for lunch).
Where: Shields Library 360 (DataLab), UC Davis main campus.
Cost: Free
Contact: Titus Brown, ctbrown@ucdavis.edu
Recommended Prerequisites:
Prior to attending, learners should:
- be comfortable with the basics of remote computing & ssh (e.g., through Lesson 4 of this workshop series);
- have an already functioning project that makes use of R or Python scripts/notebooks or shell scripts on Linux or Mac OS X computers.
Experience with any other specific technology is not required!
Attendee Information:
- Attendees are expected to attend both weeks of this in-person workshop. Events are scheduled from 9am-5pm on weekdays, with no meetings on weekends. There will be a flexibly scheduled break for lunch.
- The workshop will be held on the UC Davis campus in the DataLab at Shields Library.
- UC Davis graduate students, postdocs, and faculty are welcome!
- External researchers are also welcome, space permitting!
- We cannot support work that relies solely on closed or sensitive data sets, as Farm is not HIPAA compliant; please inquire if you think this might apply to you.
- We particularly welcome pairs of researchers from research groups who are working on related or overlapping projects!
Workshop Content:
Technologies that will be discussed and demonstrated during this workshop include, but are not limited to:
- shell scripting for automation
- conda for software installation
- slurm for HPC job coordination
- git and github for version control
- Python and JupyterLab
- R, RStudio, RMarkdown, Shiny (and maybe Posit)
- snakemake for workflow automation
- the Farm cluster for analysis (other campus HPC systems will be supported on a case-by-case basis; inquire for more information)
We are happy to work with any open source software that can be installed on remote Linux systems. Closed source or commercial software will be considered on a case-by-case basis.
Concepts that will be discussed:
- designing computational workflows for enhanced science, better reproducibility, and improved efficiency
- parameter tuning and parameter sweeps
- integrating visualization and cross-validation into your workflows
- parallelism vs. I/O vs. memory intensive workflows
- getting your data ready for machine learning and AI
Click here to apply by end of day on Wed, May 17th, 2023 for full consideration.