Unlocking Insights from Public Data: A Case Study with COVID-19 Exposure Data
Map of Potential COVID-19 Exposures on UC Davis Main Campus; Updated Twice Daily
The Power of Data Visualization
Throughout the pandemic we have seen a proliferation of public data reporting and sharing, which for many may have been their first exposure to dashboards and real-time data visualization. Data visualizations are arguably the most powerful tool we have to enable and share insights from our data. Good data visualizations strive to accurately and faithfully display data, allowing us to quickly and efficiently detect patterns, make connections, and draw conclusions. Good data visualizations make data accessible. But what goes into turning public data into an impactful visual?
UC Davis has made available and visualized a plethora of data on its COVID-19 dashboard, where we can explore our university’s high vaccination rates and low positivity rates. To help our community further engage with these public data and to illustrate how data science workflows and tools can be combined to create similar impactful visualizations, the DataLab team focused on a dataset that has not yet been visualized – the AB 685 COVID-19 exposure data. All the code from this project is available, so you too can learn to turn public tabular data into your own interactive data visualization.
About AB 685
UC Davis makes all known potential worksite exposures to COVID-19 on campus publicly available through the Potential Worksite Exposure Reporting (AB 685) web portal. These data report only known potential exposures, and do not directly address exposure risk. While the exposure data tables in the web portal are comprehensive, it can be difficult to interpret these data beyond identifying potential exposures for specific buildings on specific days. Visualizing these data by layering the exposures onto a campus map can help those unfamiliar with campus geography (including our first- and second-year students who are attending in person for the first time this fall) better assess which potential exposures may be more relevant to them. Adding an interactive time series component further enhances our ability to quickly assess the extent of that relevance and to explore other patterns in the data. Making the visualization dynamic allows for real-time public access to these insights.
The Making of an Interactive and Dynamic Visualization
To create this data visualization, the UC Davis DataLab wrote computer scripts to collect the exposure data from the public AB 685 web portal, and automated these scripts to run twice daily so that the map stays nearly in sync with the campus's online dataset. The potential exposure worksites are then paired with known building names on campus, and the potential exposure dates are converted into date ranges. These spatial and temporal components are then combined with a map of the campus and shown on a timeline. Campus buildings on this map are shown in gold during the time frames of potential exposures, helping students and the broader campus community understand where and when a COVID-19 exposure may have occurred.
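To make the date-range step concrete, here is a minimal sketch in Python. It is not the project's actual code (that lives in the project repository). It assumes each exposure record lists individual calendar dates and that consecutive days are merged into one range, which is only one plausible reading of the step described above. The building name, dates, and function names are hypothetical.

```python
from datetime import date, timedelta


def to_date_ranges(dates):
    """Collapse individual dates into sorted (start, end) ranges of consecutive days."""
    ranges = []
    for d in sorted(set(dates)):
        # Extend the current range when this date is the day after its end.
        if ranges and d - ranges[-1][1] <= timedelta(days=1):
            ranges[-1] = (ranges[-1][0], d)
        else:
            ranges.append((d, d))
    return ranges


def is_highlighted(ranges, day):
    """Return True if the building should be shown in gold on the given day."""
    return any(start <= day <= end for start, end in ranges)


# Hypothetical input: potential exposure dates keyed by building name.
exposures = {
    "Hypothetical Hall": [date(2021, 9, 20), date(2021, 9, 21), date(2021, 9, 24)],
}

ranges_by_building = {name: to_date_ranges(ds) for name, ds in exposures.items()}
```

With ranges in hand, a timeline slider only needs to call is_highlighted for each building at the selected day to decide which buildings to color.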
As with nearly all data science projects, a majority of the effort in creating the final data visualization centered on “data munging”: the process of cleaning and transforming data from its “raw” form into formats that can be analyzed and displayed. For example, a lack of standardization in worksite names from the AB 685 web portal meant writing additional cleaning scripts to unify the dataset with the underlying spatial geography. While “Activities and Recreation Center” is the official name of the campus fitness center, the web portal exposure dataset includes more colloquial and variable names for this building, including “ARC” and “Activities and Recreation Center (ARC)”. The ARC example was common enough that we could build a dictionary to recognize it, but a large number of worksites in the dataset had to be checked by hand. As data scientists, we don’t always get to design our database, and so we develop clever techniques for parsing and aligning data. Future work to automate the entire visualization workflow would include leveraging additional text mining techniques for name matching.
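The dictionary approach to name matching can be sketched as follows. This is an illustrative Python example under stated assumptions, not the cleaning scripts from the project: the alias table contains only the ARC variants mentioned above, and the function names and the choice to return None for unmatched names (so they can be checked by hand) are our own.

```python
import re

# Official building names known to the campus map (only one shown here).
OFFICIAL_NAMES = ["Activities and Recreation Center"]

# Hand-built dictionary from normalized variant names to the official name.
ALIASES = {
    "arc": "Activities and Recreation Center",
    "activities and recreation center (arc)": "Activities and Recreation Center",
}


def normalize(raw):
    """Lowercase a name and collapse runs of whitespace for comparison."""
    return re.sub(r"\s+", " ", raw).strip().lower()


def match_building(raw):
    """Return the official building name, or None if manual review is needed."""
    key = normalize(raw)
    if key in ALIASES:
        return ALIASES[key]
    for official in OFFICIAL_NAMES:
        if normalize(official) == key:
            return official
    return None  # Unrecognized worksite: flag it to be checked by hand.


# Hypothetical raw worksite names as they might appear in the exposure table.
raw_names = ["ARC", "Activities and  Recreation Center (ARC)", "Unknown Annex"]
matched = {name: match_building(name) for name in raw_names}
```

Returning None rather than guessing keeps the automated step conservative: anything the dictionary does not cover is left for a person to resolve.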
The UC Davis DataLab hopes the community finds this visualization useful as an exploration tool and as an example of the power of a good data visualization. If you would like to learn more about the code we used to make this visualization, see the publicly available project repository. Want to learn how you can use these same tools to make your own interactive data visualization? Keep an eye out for our upcoming workshops and check out the DataLab training archives for recordings and learner guides from our past workshops.
Stay safe, Aggies, and remember to follow campus COVID-19 policies, keep an eye on campus announcements, and sign up for CA Notify to get personalized COVID-19 alerts as we start this fall quarter back on campus.
Resources
- To learn more about how DataLab is helping UC Davis fight COVID-19, see the project description and blog post for Predicting COVID-19 hospital admissions at UC Davis Medical Center.
- To learn more about how DataLab is helping teach students data visualization, see our training archive, specifically our recent Data Visualization Principles and Critical Approach to Data Visualization workshops.
This post was written by graduate student Jared Joseph and Dr. Pamela Reynolds with contributions from Dr. Michele Tobias and Jessica Nusbaum. The data visualization project was initiated and led by Dr. Tobias, DataLab’s geospatial data specialist, with significant contributions from undergraduates Elijah Stockwell and Sebastian Lopez, and feedback from the wider DataLab team. Questions? Contact us.