From the DataLab

Newsletter Archives

DataLab News

Data Challenges as Opportunities for Experiential Learning: Reflection on DataLab’s CA Election 2020 Data Challenge

As a case study to explore how intentionally organized data challenges can serve as opportunities for short-format experiential learning, we discuss our experience organizing a month-long data challenge sponsored by the UC Davis DataLab: Data Science and Informatics department and the Scholars Strategy Network that coincided with the November 2020 election.

Call for 2021 Start-Up Research Project Collaborations

UC Davis DataLab: Data Science and Informatics is accepting applications from UC Davis faculty and professional researchers for Start-Up Project Collaborations for the 2021 academic year. These exploratory, or early-phase, research projects pair domain-area researchers with DataLab’s data scientists to test basic hypotheses related to data-driven domain problems. This represents a […]

Data Driven Transit Report

A new report co-authored by former DataLab Postdoctoral Scholar Jane Carlen illuminates the factors that impact bicycling comfort. Read the full report here. In this study, researchers use survey data to analyze bicycling comfort and its relationship with socio-demographics, bicycling attitudes, and bicycling behavior. An existing survey of students, faculty, and staff at UC Davis […]

Data Challenge Winners

With the presentation of the showcase (recording available here), the California Election Data Challenge 2020 has concluded! We want to thank everyone who participated and volunteered their time to the project. Congratulations go out to the three winning teams, install.packages(“tidywitches”), MissDemeanors, and Catch-22, with honorable mention going to teams Dialysis Analysis and Wobbler Costs. You […]

CA Election 2020 Data Challenge

In collaboration with the Scholars Strategy Network, we are launching the California Election 2020 Data Challenge, which leverages data science and public data to help us understand this year’s ballot initiatives. All members of the UC Davis community can participate; students and postdoctoral scholars are eligible to win awards of up to $500. About the Challenge: Participants […]

Informatics for CA Water Data

Establishing data management workflows to develop and implement a database architecture for Sustainable Groundwater Management data from multiple geographies and organizations. Data fragmentation is one of the most challenging aspects of water governance and research. Data about water management organizations, infrastructure projects, permits, hydrological features, water supply, and water quality are collected via different systems, […]

Bibliographic approach to the role of science in policy making

Tracing citations in U.S. National Environmental Policy Act-compliant reports and the role of science in decision-making. Although science-informed policymaking is frequently touted as a solution to policy design and implementation dilemmas (e.g., Howlett 2009; Cairney 2016; Parkhurst 2017), there are few empirical studies of how scientific information informs policymaking (Desmarais and Hird 2014; Newman et […]

Assessing data on services utilization of children with Autism

Harmonizing data to help identify care improvement targets for children with complex issues such as Autism. Lack of access to combined mental health, educational and developmental disabilities services data limits our ability to understand how essential services provided by these systems can affect outcomes for children. While limited research to date suggests that services in […]

Identifying minimum infrastructure needs for comfortable bicycling

We analyzed transportation survey data from the UC Davis community in which individuals were asked to rate their comfort level biking on certain streets based on 10-second videos of those streets. We implemented Bayesian models with random effects to determine which features of streets and individuals had the strongest relationships with comfort ratings. Not surprisingly, […]

Creating Co-Author Networks in R

A co-author network is a great way to get a snapshot view of the breadth and depth of an individual’s body of research. I created such graphs and corresponding visualizations to highlight and celebrate the work of UC Davis scholars. In this post I will describe the packages I used to do this, common roadblocks […]


Archive-Vision (archv or arch-v) is a collection of computer vision programs written in C++ which utilizes functions from the OpenCV library to perform analysis on large image sets. The primary function is to locate recurring patterns within each image in a set of images. Arch-v locates features from a given seed image within an imageset […]

Digitizing American Viticultural Areas (AVAs)

Collaborative project mapping wine regions for research applications in the environmental sciences, history, and economics of American viticulture. DataLab, in conjunction with UCSB, Virginia Tech, and other partner organizations, and with contributions from the general public, is creating a publicly accessible geospatial version of American Viticultural Areas boundaries. Using the text descriptions from the ATPF Code of regulations, we are […]

Assessing Impact of Outreach through Software Citation in Geodynamics

The Computational Infrastructure for Geodynamics is a community of software users and user-developers who model physical processes in the Earth and planetary interiors. From 2010 to 2018, the community of researchers published upward of 638 peer-reviewed papers in more than 124 venues. We analyzed this corpus of publications to understand the impact of CIG workshops and […]

Social Networks of Citation

Tracing scholarly influence in medicine. The purpose of this project was to create a peer network of all publications and collaborations that stem from a single faculty member. The network was successfully created by mining MEDLINE data. Project partners: Richard Kravitz (Researcher), Bruce Abbott (Health Sciences Librarian), Ranjodh Dhaliwal (Graduate Researcher)

English Short Title Catalogue

This project was originally intended to create a “machine-readable catalogue of books, pamphlets and other ephemeral material printed in English-speaking countries from 1701-1800.” Project partners: Brian Geiger (Principal Investigator), Luis Baquera (Principal Investigator), Nick Laiacona (Principal Investigator)

Places in Walt Whitman

Merging text mining and the geospatial sciences to map the poetry of Walt Whitman. The American poet Walt Whitman worked during the period of transition from transcendentalism to realism and, due to this, many of his writings are rooted in physical spaces. Uncovering those spatial relationships provides another lens by which to understand American literature. […]

Predicting Length of Hospital Stays

One of the most significant problems that hospitals across the country currently face is predicting how long each patient will remain in the hospital. This project is attempting to build a better predictive model by taking into account both quantitative and qualitative data from hospitals. The main source of information […]

Gender and Citation Disparities

Leveraging bibliometrics to measure the impact of scholarly publications and explore under-representation and attribution in science. Citation counts help a research community understand the importance of a given scholarly work, but implicit bias can affect how researchers cite one another. By employing bibliometrics and text mining, we helped researchers in the social sciences explore […]


BIBFLOW is a two-year project funded by the Institute of Museum and Library Services. Its purpose is to investigate the future of library services, including cataloging and related workflows, new data models, and new encoding and exchange formats. At the end of the two-year timetable, there will […]

Play the Knave Modlab

The project, in coordination with the DSI, involves the creation of a gaming environment in which students recreate scenes from many works of Shakespeare. Movement and vocal data are gathered as participants act out a given scene. The data is then turned into a video of the production and […]

The Pioneering Punjabis Digital Archive

The Pioneering Punjabis Digital Archive offers a window into the story of South Asian immigrants from the Punjab region in north India to California since the turn of the twentieth century. Explore over 700 video interviews, speeches, diaries, photographs, articles, and letters in which Punjabi Americans share their life stories, values, and contributions to […]