Project Archive

This page contains all the past projects and collaborations at the UC Davis DataLab. These include major grant-funded partnerships and smaller exploratory projects.

If you would like to see our current projects, please look at our projects page.

  • BML

    Activating Ocean Acidification Research

    DataLab is partnering with Professor Tessa Hill at the Bodega Marine Laboratory to support efforts to develop an oceanographic synthesis of publicly available datasets to understand how changing ocean conditions will impact coastal habitats along the western USA. This effort aims to provide detailed interpretations of how ocean acidification and related stressors can be managed by decision makers along the U.S. West Coast.

    Read More>>

  • Two volunteer surveyors stand on a sandy beach with clipboards and look out over the Pacific Ocean


    In recent years, public participation in data collection has made a variety of new data sets available. These community and citizen science efforts not only provide increased opportunity to generate novel insights, but also engage the public in the research process and grow widespread appreciation for local systems. For instance, through the MPA Watch program, over 1,500 volunteer surveyors played a crucial role in collecting data about California’s marine protected areas (MPAs). These data help California policymakers conserve and manage the state’s ocean and coastal habitats and improve our understanding of how MPAs are used by the public, what ecosystem services they provide, and their effectiveness in providing healthy habitats for wildlife.

    Read More>>

  • Arch-V


    Archive-Vision (archv or arch-v) is a collection of computer vision programs written in C++ that use functions from the OpenCV library to analyze large image sets. Its primary function is to locate recurring patterns within each image in a set of images. Arch-v locates features from a given seed image within an image set and outputs the image(s) with the most similarities. The first program, processImages.cpp, generates text files containing the keypoints and their mathematical descriptors; with the keypoints, analysis can be done to compare images and find matches. The second program, scanDatabase.cpp, finds the images that are most similar to a given seed image. The third program, drawMatches.cpp, compares two images, locates their matches based on homography, then draws the keypoints and their corresponding matches; this is most useful when the best matches have already been found.

    Read More>>

  • Datalab-image

    Assessing data on services utilization of children with Autism

    Lack of access to combined mental health, educational, and developmental disabilities services data limits our ability to understand how essential services provided by these systems can affect outcomes for children. While limited research to date suggests that services in one sector may affect utilization in another, identifying cross-system patterns of care that lead to better outcomes for children with Autism Spectrum Disorder (ASD) is even more complex due to differences in classification processes and eligibility definitions. In particular, neither researchers nor community agencies have so far leveraged educational outcomes, which are key for all children, including those with ASD, to help understand how we can improve coordinated care for children with complex mental health concerns.

    Read More>>

  • aspect_author_plot_no_labels

    Assessing Impact of Outreach through Software Citation in Geodynamics

    The Computational Infrastructure for Geodynamics (CIG) is a community of software users and user-developers who model physical processes in the Earth and planetary interiors. From 2010 to 2018, the community of researchers published upward of 638 peer-reviewed papers in more than 124 venues. We analyzed this corpus of publications to understand the impact of CIG workshops and tutorials, measured through software citation. We automated article analysis using text extraction and tokenization techniques. Patterns in co-mentioned software suggest that usage of some tools cuts across many domains.

    Read More>>

  • Shields


    BIBFLOW is a two-year project funded by the Institute of Museum and Library Services. Its purpose is to investigate the future of library services, including cataloging and related workflows, new data models, and new encoding and exchange formats. At the end of the two-year timetable, there will be a roadmap for the academic and library communities that will serve as a guide to the changes occurring in academia.

    Read More>>

  • Monarch butterfly (Jim Hudgins/USFWS)

    Bibliographic Approach to the Role of Science in Policy Making

    The U.S. National Environmental Policy Act (NEPA) requires every federal agency to analyze and document the potential environmental impacts of its proposed projects using the best available science. Documents produced in compliance with NEPA thus provide a unique opportunity to evaluate the extent to which science informs government decision-making. DataLab worked with researchers in Environmental Science and Policy to help quantify and characterize the use of science in federal agencies’ environmental assessments.

    Read More>>

  • Book_PublicLibraries

    Cartography for Publications

    DataLab’s geospatial team, led by Geospatial Data Specialist Michele Tobias, collaborates with researchers to produce publication-quality map graphics for inclusion in journal articles and books. Below are recent map collaborations, with topics described in general terms.

    Read More>>

  • library_of_congress

    Chronicling the rise of “creativity”

    The Creativeness Digital Scholarship Group (CDSG) is a team of researchers uncovering and exploring the forgotten sources, meanings, and social worlds of creativeness prior to the meteoric rise of a scientific “creativity” in the 1970s. The CDSG focuses on applying a range of Natural Language Processing and Machine Learning techniques to perform an archaeology of discourses of creativeness and related concepts, unearthing new finds, making new connections, and interpreting their cultural and political relevance for the time period in which they were embedded. Most of our sources are from the post-Civil War period to the end of the Space Race, roughly the century from 1870 to 1970. This was a period in which the noun “creativity” rarely appeared and took its current form only toward this century’s end, especially during the 1950s.

    Read More>>

  • Capture_Cities

    City General Plans Topic Modeling & Mapping

    This project was a collaboration between Catherine Brinkley, professor in the UC Davis Department of Human Ecology, and DataLab. Catherine sought to understand the general plan documents for the cities in the state of California through topic modeling. Catherine and her team assembled the many general plan documents, and DataLab staff performed a topic modeling analysis on the text of the documents and joined the resulting table to a spatial vector dataset containing city boundaries so that the dataset can be easily mapped in a GIS.

    Read More>>

  • kellog_interactive_scrot

    Creating Co-Author Networks in R

    A co-author network is a great way to get a snapshot view of the breadth and depth of an individual’s body of research. I created such graphs and corresponding visualizations to highlight and celebrate the work of UC Davis scholars.

    In this post I will describe the packages I used to do this, common roadblocks, and ways around them. I will highlight the use of interactive and dynamic co-author networks, which are especially useful for visualizing large co-author networks. I will assume some familiarity with R and experience working with data structures like lists and vectors, but no prior familiarity with packages for working with networks.

    Read More>>

  • Grass Valley Fire Districts Map from 1908

    Digitized Maps Demonstration

    Special Collections is undertaking a project to identify and digitize unique maps in our collection. Library volunteer Scott Sibbett is working with Map Assistant Dawn Collings to identify which of the library’s holdings are unique in the University of California system. The pilot focuses on out-of-copyright maps. After the list of maps is complete, high-quality scanning will begin, starting with the smaller maps that can be scanned on our existing scanning equipment, followed by larger maps that will be scanned off site.

    Read More>>

  • bioportal_2

    Disease BioPortal

    The Disease BioPortal dashboard provides data to researchers, veterinarians, and farmers interested in tracking and analyzing disease outbreaks in livestock. Currently, researchers at BioPortal are interested in expanding the data they collect and provide through their platform, particularly with a view toward making predictive assessments of outbreak events. The DataLab worked with project partners Beatriz Martinez (Vet Medicine) and Xin Liu (Computer Science) to incorporate two new capabilities into BioPortal: first, regularly updated weather data for selected geographies to check for potentially outbreak-inducing weather conditions, and second, live monitoring of social media posts to watch for early warnings of developing outbreaks.

    Read More>>

  • 2009-5602

    English Short Title Catalogue

    This project was originally intended to create a “machine-readable catalogue of books, pamphlets and other ephemeral material printed in English-speaking countries from 1701-1800.”

    Read More>>

  • wine_featured_image

    Extracting wine price data from historical catalogs

    This project is a collaboration between the DataLab and UC Davis Library funded by the Sloan Foundation to extract historical price data from an archive of wine catalogs published by Sherry Lehmann. The primary goal of the project was to create a database of historical price information that could help wine economists study wine markets over time. Secondary goals included the development of open-source table-extraction software for images built upon the Rtesseract package (an R interface to the tesseract OCR – Optical Character Recognition – system), and hosting hackathons promoting authentic data science skills for UC Davis students. 

    Read More>>

  • Brinkley-story-magnified2-Oct-2017-02-960x600-c-center

    Gender and Citation Disparities

    Leveraging bibliometrics to measure the impact of scholarly publications and explore under-representation and attribution in science. Citation counts help a research community understand the importance of a given scholarly work, but implicit bias can affect how researchers cite one another. By employing bibliometrics and text mining, we aided researchers in the social sciences in exploring the disparity between citation counts and scholarly influence for two pivotal case studies: Rachel Carson’s Silent Spring and Jane Jacobs’ The Death and Life of Great American Cities.

    Read More>>

  • collector_summary

    Identifying minimum infrastructure needs for comfortable bicycling

    We implemented Bayesian models with random effects to determine which features of streets and individuals had the strongest relationships with comfort ratings. Not surprisingly, we found a mix of street-level and individual characteristics to be important predictors. We found random effects to be important for controlling for individual tendencies to rank low or high, and for interactions between street-level variables that we couldn’t put explicitly in our models.

    Read More>>

  • Capture_InternetNewsArchive

    Immigration in the Media: TV News Archive Scraping

    This project was a collaboration between DataLab and Professor Caitlin Patler and postdoctoral scholar Robin Savinar in the UC Davis Department of Sociology to scrape the Internet Archive’s TV News database for metadata on TV news programs with keywords related to immigration. While the Internet Archive has an API for searching some parts of its databases, at the time of this research and the publication of this story there was no way to use the available APIs to search the transcripts of the news stories. We solved this problem by using traditional web scraping methods to first search the captions database and then scrape the needed metadata from the links appearing in the search results. We combined the scraped metadata with data from the FCC about station locations to assess the possibility of mapping the results and determined that local TV schedule data would be needed to complete the mapping in a comprehensive way.

    Read More>>

  • Aerial view at 2500 feet looking south of the Dutch Slough Tidal Marsh Restoration Project construction site, in the Sacramento-San Joaquin Delta near Oakley, California (Ken James/California Department of Water Resources)

    Informatics for CA Water Data

    This collaboration between DataLab and researchers in UC Davis’ Environmental Science and Policy department is establishing data management workflows to develop and implement a database architecture that can be used to assemble water data at different levels of aggregation, extend to new datasets, visualize and map data in different ways for policy stakeholders, and eventually become available to other researchers and government agencies. This Start-Up project focuses on sustainable groundwater management datasets, specifically those related to California’s 2014 Sustainable Groundwater Management Act (SGMA).

    Read More>>

  • bikes and buses on campus

    Micromobility impacts on personal automobile usage

    North American cities are experiencing a rising demand for bicycling and bike share services, especially for dockless bike and scooter shares (also known as micromobility services). Because micromobility services can reduce the use of vehicles in cities, they have great potential to reduce emissions, increase accessibility, positively affect the climate and human health, and ensure equity, among many other benefits. By quantifying the scale of micromobility service use, cities and regions can better understand the widespread effects of ensuring access to such services in their communities.

    Read More>>

  • netreport

    The DataLab Network Report

    The DataLab Network Report takes in a network dataset and generates common network metrics, along with accompanying interactive visualizations. Each measure includes an explanation of how it is generated and what it looks like in the context of the specific network used for the report. In the BIS2A network, we quickly found that one learning concept, an exercise of walking through the energy story of a reaction, was central to both student and expert understanding of the course materials.

    Read More>>

  • The Pioneering Punjabis Digital Archive

    The Pioneering Punjabis Digital Archive offers a window into the story of South Asian immigrants from the Punjab region in north India to California since the turn of the twentieth century. Explore over 700 video interviews, speeches, diaries, photographs, articles, and letters in which Punjabi Americans share their life stories, values, and contributions to California’s history over the last hundred and twenty years.

    Read More>>

  • WaltWhitman2

    Places in Walt Whitman

    Merging text mining and the geospatial sciences to map the poetry of Walt Whitman. The American poet Walt Whitman worked during the period of transition from transcendentalism to realism and, due to this, many of his writings are rooted in physical spaces. Uncovering those spatial relationships provides another lens by which to understand American literature. This project used text mining to extract all locations mentioned in Whitman’s works, which were then assembled into a visual map for further exploration.

    Read More>>

  • Play the Knave

    Play the Knave Modlab

    The project, in coordination with the DSI, involves the creation of a gaming environment in which students recreate scenes from many works of Shakespeare. Movement and vocal data are gathered as participants act out a given scene. The data are then turned into a video of the production that can be shared with others. This is an exploratory project in which the researchers aim not only to bring about a better understanding of Shakespeare’s works but also to recognize speech and movement patterns.

    Read More>>

  • Screen Shot 2017-01-24 at 2.03.14 PM

    Social Networks of Citation

    Tracing scholarly influence in medicine. The purpose of this project was to create a peer network of all publications and collaborations that stem from a single faculty member. The network was successfully created by mining MEDLINE data.

    Read More>>

  • STEM

    STEM Portal

    UC Davis is world renowned for its teaching and research in STEM (Science, Technology, Engineering, and Math), and in 2016 Forbes magazine ranked UC Davis as the “best value college for women in STEM.” Through a combination of undergraduate and graduate experimental research education opportunities, DataLab collaborated with UC Davis STEM Strategies to leverage data science tools and techniques to better understand these strengths and share them with various stakeholder communities.

    Read More>>

  • COVID phases plot

    UC Davis Medical Center COVID models

    The UC Davis DataLab has been working with the UC Davis Medical Center (UCDMC) to improve their models for predicting COVID-19. With a new surge of admissions underway caused by the Delta variant, accurate prediction of bed occupancy and admissions is important for planning and resource allocation amid changing conditions. The latest iteration extends the horizon of predictions from two days to seven and has reduced the error rate of predictions for admissions by 8%.

    Read More>>

  • Covid_map

    Unlocking Insights from Public Data: A case study with COVID-19 exposure data

    To help our community further engage with public data and to illustrate how data science workflows and tools can be combined to create impactful visualizations, the DataLab team focused on a dataset that has not yet been visualized: the AB 685 COVID-19 exposure data.

    Read More>>

  • Understanding...

    Vineyard Research Cartography

    California’s topography, sustainability, and climate diversity contribute to its worldwide renown in viticulture and enable researchers at UC Davis to closely examine the state’s numerous wine varieties and the conditions of their production. In a series of such projects, chemical engineer Ron C. Runnebaum and his team examine the chemical and sensorial properties of west coast-grown Pinot noirs and how they are affected by the coast’s varied topography and climate. To help readers better understand the geographic scale of this research, DataLab created a set of maps portraying the vineyard study sites for three of these papers.

    Read More>>

  • water review countries dendrogram


    wateReview is an interactive, inclusive, and collaboratively designed platform that provides a comprehensive look at the landscape of water research in Latin America and the Caribbean (LAC). DataLab initially assisted the LAWR researchers as part of our start-up project program. The collaboration expanded, and DataLab staff assisted with expanding the literature review, performing text analysis on the corpus of research papers, and visualizing the results in a publicly accessible way. 

    Read More>>
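
As a rough illustration of the idea behind the "Creating Co-Author Networks in R" entry above, the sketch below builds a weighted co-author edge list from per-paper author lists. It is not the method used in that post, which relies on R packages; this version uses only the Python standard library, and the function name `coauthor_edges` and the author lists are hypothetical placeholders, not real data.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(papers):
    """Count how often each pair of authors appears on the same paper."""
    edges = Counter()
    for authors in papers:
        # Deduplicate and sort so each pair has one canonical ordering.
        unique = sorted(set(authors))
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical author lists, one per paper (placeholder names, not real data).
papers = [
    ["Author A", "Author B", "Author C"],
    ["Author A", "Author B"],
    ["Author C", "Author D"],
]

edges = coauthor_edges(papers)
for (a, b), weight in sorted(edges.items()):
    print(f"{a} -- {b}: {weight}")
```

Each key in `edges` is an undirected author pair and each value is the number of shared papers, which is the kind of weighted edge list that a network visualization tool would typically take as input.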