UC Davis is world renowned for its teaching and research in STEM (Science, Technology, Engineering, and Math), and in 2016 Forbes magazine ranked UC Davis as the “best value college for women in STEM.” Through a combination of research experiences for undergraduate and graduate students, DataLab collaborated with UC Davis STEM Strategies to apply data science tools and techniques to better understand these strengths and share them with various stakeholder communities. Using web scraping and machine learning (specifically Natural Language Processing), we programmatically analyzed the language surrounding STEM on the UC Davis STEM Portal, a discovery tool and promotional resource for all STEM-related activities at UC Davis. We then used topic modeling, an established and powerful tool for understanding the content of large text collections, to compare the language of the STEM Portal with STEM-identified content on other UC Davis web pages and to identify potential opportunities for diversifying our STEM offerings and making them more equitably discoverable.
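The project's own code is not shown here, so the following is only a minimal sketch of what a topic-model comparison between two sources can look like. It assumes latent Dirichlet allocation (LDA) fitted with collapsed Gibbs sampling in plain Python; the project may well have used a different model or library. Scraping is omitted: page text is assumed to be already extracted. The toy page texts, the stopword list, and the helper names (`tokenize`, `lda_gibbs`, `top_words`, `mean_topic_share`) are hypothetical. The sketch fits topics to pages from both sources, then averages topic proportions per source, so topics that are common on one site but rare on the other stand out.

```python
import random
import re

# Tiny illustrative stopword list (an assumption; real pipelines use a fuller list).
STOPWORDS = {"the", "and", "of", "in", "to", "for", "a", "at", "on", "with", "our", "is"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA (an assumed model choice) to tokenized docs via collapsed Gibbs sampling.

    Returns (doc_topic, topic_word, vocab): doc_topic[d][k] is the estimated share
    of topic k in document d; topic_word[k][w] is the estimated P(word w | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]      # tokens in doc d assigned to topic k
    nkw = [[0] * V for _ in range(K)]  # times word w is assigned to topic k
    nk = [0] * K                       # total tokens assigned to topic k
    z = []                             # current topic of every token

    # Start with a random topic for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
            z[d].append(k)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], z[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][wid] -= 1
                nk[k] -= 1
                # Resample its topic from the collapsed full conditional.
                weights = [(ndk[d][t] + alpha) * (nkw[t][wid] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                ndk[d][k] += 1
                nkw[k][wid] += 1
                nk[k] += 1
                z[d][i] = k

    doc_topic = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
                 for d, doc in enumerate(docs)]
    topic_word = [[(nkw[k][w] + beta) / (nk[k] + V * beta) for w in range(V)]
                  for k in range(K)]
    return doc_topic, topic_word, vocab


def top_words(topic_word, vocab, n=5):
    """The n most probable words for each topic."""
    return [[vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
            for row in topic_word]


def mean_topic_share(doc_topic, indices):
    """Average topic proportions over a subset of documents (e.g. one website)."""
    K = len(doc_topic[0])
    return [sum(doc_topic[d][k] for d in indices) / len(indices) for k in range(K)]


# Hypothetical stand-ins for scraped page text; not the project's data.
portal_pages = [
    "undergraduate research opportunities in engineering and computer science",
    "summer research program for women in engineering",
    "math tutoring and science outreach events for students",
]
other_pages = [
    "plant sciences field research and agricultural biology",
    "veterinary medicine and animal biology research labs",
    "computer science seminar on data and machine learning",
]
docs = [tokenize(t) for t in portal_pages + other_pages]
doc_topic, topic_word, vocab = lda_gibbs(docs, num_topics=3, iterations=100)
portal_share = mean_topic_share(doc_topic, range(len(portal_pages)))
other_share = mean_topic_share(doc_topic, range(len(portal_pages), len(docs)))
for k, words in enumerate(top_words(topic_word, vocab)):
    print(f"topic {k}: {words}  portal={portal_share[k]:.2f}  other={other_share[k]:.2f}")
```

With only six short toy pages the topics are noisy and depend on the random seed; the point is the shape of the comparison (fit once on both sources, then contrast per-source topic shares), not any particular result.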
This project began as part of a course-based undergraduate research experience (CURE) taught by DataLab in Fall 2019. CUREs provide real-world research opportunities to undergraduate students as an integral part of their coursework. The enrolled first-year students were challenged to design and pursue a research question; gain and demonstrate technical skills in the computational data sciences (with a focus on the applied skills of web scraping, data cleaning, text mining, and data visualization); and work together to explore and communicate their findings. Graduate student researcher Qiusi (Lyra) Sun further pursued and refined these pilot results in summer and fall 2020 to provide actionable insights to UC Davis STEM Strategies.