DataLab Projects

Current Projects

  • BML

    Activating Ocean Acidification Research

    DataLab is partnering with Professor Tessa Hill at the Bodega Marine Laboratory to support efforts to develop an oceanographic synthesis of publicly available datasets to understand how changing ocean conditions will impact coastal habitats along the western USA. This effort aims to provide detailed interpretations of how ocean acidification and related stressors can be managed by decision makers along the U.S. West Coast.

    Read More>>

  • Datalab-image

    Assessing data on services utilization of children with Autism

    Lack of access to combined mental health, educational, and developmental disabilities services data limits our ability to understand how essential services provided by these systems can affect outcomes for children. While limited research to date suggests that services in one sector may affect utilization in another, identifying cross-system patterns of care that lead to better outcomes for children with Autism Spectrum Disorder (ASD) is even more complex due to differences in classification processes and eligibility definitions. In particular, neither researchers nor community agencies have so far leveraged educational outcomes, which are key for all children, including those with ASD, to help understand how we can improve coordinated care for children with complex mental health concerns.

    Read More>>

  • Monarch butterfly (Jim Hudgins/USFWS)

    Bibliographic approach to the role of science in policy making

    Although science-informed policymaking is frequently touted as a solution to policy design and implementation dilemmas (e.g., Howlett 2009; Cairney 2016; Parkhurst 2017), there are few empirical studies of how scientific information informs policymaking (Desmarais and Hird 2014; Newman et al. 2017). DataLab is working with researchers in Environmental Science and Policy to help quantify and characterize the use of science in federal agencies’ environmental assessments.

    Read More>>

  • AVA_image

    Digitizing American Viticultural Areas (AVAs)

    DataLab, in conjunction with UCSB, Virginia Tech, and other partner organizations, and with contributions from the general public, is creating a publicly accessible geospatial version of the American Viticultural Area boundaries. We are building this dataset from the official text descriptions in the ATPF Code of regulations. These data are freely available in GeoJSON and shapefile formats. This dataset provides wine researchers with an important tool as they examine the scientific, economic, and historical aspects of viticulture.

    Read More>>

  • Digital Humanities

    English Broadside Ballad Archive

    The English Broadside Ballad Archive (EBBA) was created to catalog and showcase all surviving ballads from 17th-century England, currently around 10,000 unique ballads. EBBA was started in 2003 at the University of California, Santa Barbara, its institutional home, by Dr. Patricia Fumerton, who continues to serve as the director of the Archive. DataLab’s Executive Director, Carl Stahmer, has served as the archive’s Associate Director since 2008 and is responsible for overseeing the archive’s technical development. As EBBA’s collection of ballads has grown, DataLab has worked to expand the capabilities of the archive by providing functionality that allows users to apply computational methods to perform advanced analysis of the materials archived in the collection.

    Read More>>

  • Aerial view of the Dutch Slough Tidal Marsh Restoration Project in the Sacramento-San Joaquin Delta near Oakley, California (Ken James/California Department of Water Resources)

    Informatics for CA Water Data

    This collaboration between DataLab and researchers in UC Davis’ Environmental Science and Policy department is establishing data management workflows to develop and implement a database architecture that can be used to assemble water data at different levels of aggregation, extend to new datasets, visualize and map data in different ways for policy stakeholders, and eventually become available to other researchers and government agencies. This Start-Up project focuses on sustainable groundwater management datasets, specifically the 2014 Sustainable Groundwater Management Act (SGMA) in California.

    Read More>>

  • Getty-trust-logo

    Shared Cataloging of Early Printed Images

    Through the generous support of The Getty Foundation, DataLab is working to develop an infrastructure that leverages Content-Based Image Recognition (CBIR) to facilitate shared cataloging of early printed images from the early modern period. Our vision is to develop an environment in which a cataloger or archivist who is describing an image can use CBIR to search across collections and institutions for copies of the same or similar images, retrieve the cataloging records for matched images, and easily ingest retrieved cataloging data into the local datastore. In short, we intend to provide an infrastructure that allows image catalogers to quickly and easily ask, “Has anyone else described an image like this?” and, if so, “How was it described?” Such a system would improve the quality and interoperability of descriptive metadata and speed up image cataloging efforts, thereby improving access to collections worldwide.

    Read More>>

Past Projects

  • Arch-V

    Archive-Vision

    Archive-Vision (archv or arch-v) is a collection of computer vision programs written in C++ that use functions from the OpenCV library to analyze large image sets. Its primary function is to locate recurring patterns across a set of images: Arch-v locates features from a given seed image within an image set and outputs the image(s) with the most similarities. The first program, processImages.cpp, generates text files containing the keypoints and their mathematical descriptors; with the keypoints, images can be compared to find matches. The second program, scanDatabase.cpp, finds the images that are most similar to a given seed image. The third program, drawMatches.cpp, compares two images, locates their matches based on homography, then draws the keypoints and their corresponding matches; this is most useful when the best matches have already been found.
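
    The ranking step described above can be illustrated with a minimal sketch. This is not Arch-v's code (which is C++ built on OpenCV); it is a plain-Python toy that assumes descriptors are short numeric tuples and that two descriptors "match" when their Euclidean distance is under a hypothetical threshold, max_distance. The function names and file names are invented for illustration.

```python
import math


def count_matches(seed_descriptors, image_descriptors, max_distance=0.5):
    """Count seed descriptors whose nearest neighbour in the image is close enough."""
    matches = 0
    for seed_vec in seed_descriptors:
        # Distance to the closest descriptor in the candidate image.
        nearest = min(
            (math.dist(seed_vec, vec) for vec in image_descriptors),
            default=math.inf,
        )
        if nearest <= max_distance:
            matches += 1
    return matches


def rank_by_similarity(seed_descriptors, image_set, max_distance=0.5):
    """Return (image name, match count) pairs, best match first."""
    scores = [
        (name, count_matches(seed_descriptors, descriptors, max_distance))
        for name, descriptors in image_set.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy 2-D descriptors (hypothetical); real keypoint descriptors have many more dimensions.
seed = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
image_set = {
    "a.jpg": [(0.1, 0.0), (1.0, 1.1), (5.0, 5.0)],
    "b.jpg": [(9.0, 9.0)],
    "c.jpg": [(0.0, 0.1), (1.1, 1.0), (2.0, 0.2)],
}
ranking = rank_by_similarity(seed, image_set)
```

    Here ranking lists c.jpg first because all three seed descriptors find a close neighbour in it. A real pipeline would also verify matches geometrically, as the homography step in drawMatches.cpp does.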

    Read More>>

  • aspect_author_plot_no_labels

    Assessing Impact of Outreach through Software Citation in Geodynamics

    The Computational Infrastructure for Geodynamics (CIG) is a community of software users and user-developers who model physical processes in the Earth and planetary interiors. From 2010 to 2018, the community of researchers published upward of 638 peer-reviewed papers in more than 124 venues. We analyzed this corpus of publications to understand the impact of CIG workshops and tutorials, measured through software citation. We automated article analysis using text extraction and tokenization techniques. Patterns in co-mentioned software suggest that usage of some tools cuts across many domains.
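
    As a rough illustration of counting co-mentioned software after tokenization, the sketch below uses only the Python standard library. The watch list of software names, the sample sentences, and the function names are assumptions made for this example; the project's actual extraction pipeline is not described here.

```python
import re
from collections import Counter
from itertools import combinations

# Example watch list (an assumption for this sketch, not the project's list).
SOFTWARE_NAMES = {"aspect", "pylith", "specfem3d"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def mentioned_software(text, names=SOFTWARE_NAMES):
    """Return the set of watched software names that appear in the text."""
    return set(tokenize(text)) & names


def co_mention_counts(documents, names=SOFTWARE_NAMES):
    """Count how many documents mention each pair of software names together."""
    counts = Counter()
    for text in documents:
        found = sorted(mentioned_software(text, names))
        counts.update(combinations(found, 2))
    return counts


# Invented sample text standing in for extracted article bodies.
documents = [
    "We ran ASPECT and PyLith on the same mesh.",
    "PyLith was used for the fault model.",
    "Results from ASPECT, PyLith, and SPECFEM3D were compared.",
]
pairs = co_mention_counts(documents)
```

    Sorting the names before pairing keeps each pair in one canonical order, so ("aspect", "pylith") and ("pylith", "aspect") are never counted separately.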

    Read More>>

  • Shields

    BIBFLOW

    BIBFLOW is a two-year project that is funded by the Institute of Museum and Library Services. The purpose of this project is to investigate the future of library services that can include cataloging and related workflows, new data models, and new encoding and exchange formats. At the end of the two-year time table, there will be a roadmap for the academic and library communities that would serve as a guide for the changes that are occurring in academia.

    Read More>>

  • library_of_congress

    Chronicling the rise of “creativity”

    The Creativeness Digital Scholarship Group (CDSG) is a team of researchers uncovering and exploring the forgotten sources, meanings, and social worlds of creativeness prior to the meteoric rise of a scientific “creativity” in the 1970s. The CDSG focuses on applying a range of Natural Language Processing and Machine Learning techniques to perform an archaeology of discourses of creativeness and related concepts, unearthing new finds, making new connections, and interpreting their cultural and political relevance for the time period in which they were embedded. Most of our sources are from the post-Civil War period to the end of the Space Race, roughly the century from 1870 to 1970. This was a period in which the noun “creativity” rarely appeared and took its current form only toward this century’s end, especially during the 1950s.

    Read More>>

  • Capture_Cities

    City General Plans Topic Modeling & Mapping

    This project was a collaboration between Catherine Brinkley, professor in the UC Davis Department of Human Ecology, and DataLab. Catherine sought to understand the general plan documents for the cities in the state of California through topic modeling. Catherine and her team assembled the many general plan documents, and DataLab staff performed a topic modeling analysis on the text of the documents and joined the resulting table to a spatial vector dataset containing city boundaries so that the results could be easily mapped in a GIS.
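
    The join step can be sketched as an attribute join between a table of topic results and GeoJSON-style city boundary features. This is a minimal illustration under stated assumptions: the key field "city", the "top_topic" column, and the placeholder city names are hypothetical, and the geometries are left empty rather than invented.

```python
def join_topics_to_features(features, topic_rows, key="city"):
    """Copy topic columns onto each boundary feature that shares the same key value."""
    topics_by_city = {row[key]: row for row in topic_rows}
    joined = []
    for feature in features:
        name = feature["properties"][key]
        # Cities without a topic row keep their original properties unchanged.
        extra = {k: v for k, v in topics_by_city.get(name, {}).items() if k != key}
        merged = {**feature, "properties": {**feature["properties"], **extra}}
        joined.append(merged)
    return joined


# Placeholder boundary features; real features would carry polygon geometries.
features = [
    {"type": "Feature", "geometry": None, "properties": {"city": "City A"}},
    {"type": "Feature", "geometry": None, "properties": {"city": "City B"}},
]
# Placeholder topic table, one row per city.
topic_rows = [{"city": "City A", "top_topic": "housing"}]

joined = join_topics_to_features(features, topic_rows)
```

    Building new dictionaries instead of modifying the inputs leaves the original boundary data untouched, which makes it easy to rerun the join after the topic model changes.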

    Read More>>

  • kellog_interactive_scrot

    Creating Co-Author Networks in R

    A co-author network is a great way to get a snapshot view of the breadth and depth of an individual’s body of research. I created such graphs and corresponding visualizations to highlight and celebrate the work of UC Davis scholars.

    In this post I will describe the packages I used to do this, common roadblocks, and ways around them. I will highlight the use of interactive and dynamic co-author networks, which are especially useful for visualizing large co-author networks. I will assume some familiarity with R and experience working with data structures like lists and vectors, but no prior familiarity with packages for working with networks.
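
    The core data structure behind a co-author network is a weighted edge list. The sketch below shows one way to build it; it is written in Python rather than R, uses only the standard library instead of the network packages the post covers, and relies on placeholder author names and hypothetical function names.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(papers):
    """Map each unordered author pair to the number of papers they share."""
    edges = Counter()
    for authors in papers:
        unique = sorted(set(authors))  # ignore duplicate names on one paper
        edges.update(combinations(unique, 2))
    return edges


def collaborator_counts(edges):
    """Count the distinct co-authors of each author (the node degree)."""
    degree = Counter()
    for first, second in edges:
        degree[first] += 1
        degree[second] += 1
    return degree


# Placeholder author lists, one list per paper.
papers = [
    ["Author A", "Author B", "Author C"],
    ["Author A", "Author B"],
    ["Author C"],
]
edges = coauthor_edges(papers)
degree = collaborator_counts(edges)
```

    The edge weights (shared paper counts) and node degrees are typical inputs for sizing links and nodes in a network visualization.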

    Read More>>

  • Grass Valley Fire Districts Map from 1908

    Digitized Maps Demonstration

    Special Collections is undertaking a project to identify and digitize unique maps in our collection. Library volunteer Scott Sibbett is working with Map Assistant Dawn Collings to identify which of the library’s holdings are unique in the University of California system. The pilot focuses on out-of-copyright maps. After the list of maps is complete, high-quality scanning will begin, starting with the smaller maps that can be scanned on our existing scanning equipment, followed by larger maps that will be scanned off site.

    Read More>>

  • 2009-5602

    English Short Title Catalogue

    This project was originally intended to create a “machine-readable catalogue of books, pamphlets and other ephemeral material printed in English-speaking countries from 1701-1800.”

    Read More>>

  • wine_featured_image

    Extracting wine price data from historical catalogs

    This project is a collaboration between the DataLab and UC Davis Library funded by the Sloan Foundation to extract historical price data from an archive of wine catalogs published by Sherry Lehmann. The primary goal of the project was to create a database of historical price information that could help wine economists study wine markets over time. Secondary goals included the development of open-source table-extraction software for images built upon the Rtesseract package (an R interface to the tesseract OCR – Optical Character Recognition – system), and hosting hackathons promoting authentic data science skills for UC Davis students. 
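
    Once OCR has produced text for a catalog page, turning a line into a structured price record might look like the sketch below. The line layout (item name, dot leaders, bottle price, case price) is an assumption made for illustration, as are the field names and the sample line; the project's own software was built on Rtesseract and is not reproduced here.

```python
import re

# Assumed layout: "<item name> .... <bottle price> <case price>"
LINE_PATTERN = re.compile(
    r"^(?P<name>.+?)\s*\.{2,}\s*(?P<bottle>\d+\.\d{2})\s+(?P<case>\d+\.\d{2})$"
)


def parse_price_line(line):
    """Return a price record for a catalog line, or None if the line does not fit."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    return {
        "name": match["name"].strip(),
        "bottle": float(match["bottle"]),
        "case": float(match["case"]),
    }


# Invented sample lines standing in for OCR output.
record = parse_price_line("EXAMPLE ESTATE RED 1961 .... 2.99 32.50")
heading = parse_price_line("SPECIAL OFFERS")
```

    Returning None for lines that do not fit the pattern lets headings and OCR noise be skipped or logged for manual review instead of producing bad rows.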

    Read More>>

  • Brinkley-story-magnified2-Oct-2017-02-960x600-c-center

    Gender and Citation Disparities

    Leveraging bibliometrics to measure the impact of scholarly publications and explore under-representation and attribution in science. Citation counts help a research community understand the importance of a given scholarly work, but implicit bias can affect how researchers cite one another. By employing bibliometrics and text mining, we helped researchers in the social sciences explore the disparity between citation counts and scholarly influence for two pivotal case studies: Rachel Carson’s Silent Spring and Jane Jacobs’ The Death and Life of Great American Cities.

    Read More>>

  • collector_summary

    Identifying minimum infrastructure needs for comfortable bicycling

    We implemented Bayesian models with random effects to determine which features of streets and individuals had the strongest relationships with comfort ratings. Not surprisingly, we found a mix of street-level and individual characteristics to be important predictors. We found random effects to be important for controlling for individual tendencies to rank low or high, and for interactions between street-level variables that we couldn’t put explicitly in our models.

    Read More>>

  • Capture_InternetNewsArchive

    Immigration in the Media: TV News Archive Scraping

    This project was a collaboration between DataLab and Professor Caitlin Patler and postdoctoral scholar Robin Savinar of the UC Davis Department of Sociology to scrape the Internet Archive’s TV News database for metadata on TV news programs with keywords related to immigration. While the Internet Archive has an API for searching some parts of its databases, at the time of this research and the publication of this story there was no way to use the available APIs to search the transcripts of the news stories. We solved this problem by using traditional web scraping methods to first search the captions database and then scrape the needed metadata from the links appearing in the search results. We combined the scraped metadata with FCC data on station locations to assess the possibility of mapping the results and determined that local TV schedule data would be needed to complete the mapping comprehensively.

    Read More>>

  • The Pioneering Punjabis Digital Archive

    The Pioneering Punjabis Digital Archive (http://pioneeringpunjabis.ucdavis.edu/) offers a window into the story of South Asian immigrants from the Punjab region in north India to California since the turn of the twentieth century. Explore over 700 video interviews, speeches, diaries, photographs, articles, and letters in which Punjabi Americans share their life stories, values, and contributions to California’s history over the last hundred and twenty years.

    Read More>>

  • WaltWhitman2

    Places in Walt Whitman

    Merging text mining and the geospatial sciences to map the poetry of Walt Whitman. The American poet Walt Whitman worked during the period of transition from transcendentalism to realism, and as a result many of his writings are rooted in physical spaces. Uncovering those spatial relationships provides another lens through which to understand American literature. This project used text mining to extract all locations mentioned in Whitman’s works, which were then assembled into a visual map for further exploration.
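
    One simple way to extract locations from text is a gazetteer lookup, sketched below. This is an illustrative stand-in, not necessarily the method the project used: the short place list, the sample sentence, and the function name are all assumptions, and a real gazetteer would also supply coordinates for mapping.

```python
import re
from collections import Counter

# Tiny hypothetical gazetteer; a real one would hold many more names plus coordinates.
GAZETTEER = ["Brooklyn", "Manhattan", "Long Island"]


def count_places(text, gazetteer=GAZETTEER):
    """Count whole-word occurrences of each gazetteer name in the text."""
    counts = Counter()
    for place in gazetteer:
        pattern = r"\b" + re.escape(place) + r"\b"
        hits = re.findall(pattern, text)
        if hits:
            counts[place] = len(hits)
    return counts


# Invented sample sentence, not a quotation from Whitman.
sample = "From Brooklyn I crossed to Manhattan, then back to Brooklyn and on to Long Island."
places = count_places(sample)
```

    The word-boundary anchors prevent a name from matching inside a longer word, and re.escape keeps multi-word names such as "Long Island" safe to embed in the pattern.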

    Read More>>

  • Medical_Center3

    Predicting Length of Hospital Stays

    One of the most significant problems that hospitals across the country currently face is predicting how long each patient will remain in the hospital. This project is attempting to build a better predictive model by taking into account both quantitative and qualitative data from hospitals. The main source of information comes from classifying and mining doctors’ and nurses’ notes and using that information to create a model that better estimates each patient’s length of stay.

    Read More>>

  • Play the Knave

    Play the Knave Modlab

    The project, in coordination with the DSI, involves the creation of a gaming environment in which students recreate scenes from many works of Shakespeare. Movement and vocal data are gathered as participants act out a given scene. The data are then used to create a video of the production that can be shared with others. This is an exploratory project in which the researchers are trying not only to bring about a better understanding of Shakespeare’s works but also to recognize speech and movement patterns.

    Read More>>

  • Screen Shot 2017-01-24 at 2.03.14 PM

    Social Networks of Citation

    Tracing scholarly influence in medicine. The purpose of this project was to create a peer network of all publications and collaborations that stem from a single faculty member. The network was successfully created by mining MEDLINE data.

    Read More>>