The Internet Archive’s TV News search interface can search closed-captioning transcripts, a capability that the Internet Archive’s current APIs do not offer.

This project was a collaboration with Professor Caitlin Patler and postdoctoral scholar Robin Savinar in the UC Davis Department of Sociology and DataLab. Its goal was to scrape the Internet Archive’s TV News database for metadata on TV news programs matching keywords related to immigration. While the Internet Archive has an API for searching some parts of its databases, at the time of this research and the publication of this story the available APIs could not search the transcripts of the news stories. We solved this problem with traditional web scraping methods: we first searched the captions database, then scraped the needed metadata from the pages linked in the search results. We then combined the scraped metadata with FCC data on station locations to assess whether the results could be mapped, and determined that local TV schedule data would be needed to map them comprehensively.
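
The two-step approach described above (collect links from a search results page, then pull metadata from each linked page) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the sample HTML, the `example.org` base URL, the `/details/` link marker, and the function names `extract_result_links` and `extract_metadata` are all assumptions made for the example. It parses HTML strings that are already in hand; fetching the pages over the network is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect unique href values from <a> tags that contain a marker string."""

    def __init__(self, marker):
        super().__init__()
        self.marker = marker
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.marker in href and href not in self.links:
            self.links.append(href)


class MetaCollector(HTMLParser):
    """Collect <meta> tags as a dict keyed by their property or name."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        key = attr_map.get("property") or attr_map.get("name")
        if key and attr_map.get("content") is not None:
            self.meta[key] = attr_map["content"]


def extract_result_links(search_html, base_url, marker="/details/"):
    """Step 1: return absolute URLs for result links on a search page.

    The marker is an assumption about how program pages are addressed.
    """
    parser = LinkCollector(marker)
    parser.feed(search_html)
    return [urljoin(base_url, href) for href in parser.links]


def extract_metadata(program_html):
    """Step 2: return the metadata found in a program page's <meta> tags."""
    parser = MetaCollector()
    parser.feed(program_html)
    return parser.meta


# Hypothetical stand-ins for a search results page and a program page.
SAMPLE_SEARCH = """
<div class="result"><a href="/details/EXAMPLE_PROGRAM_1">Clip one</a></div>
<div class="result"><a href="/details/EXAMPLE_PROGRAM_2">Clip two</a></div>
<a href="/about">About</a>
"""
SAMPLE_PROGRAM = """
<html><head>
<meta property="og:title" content="Example Evening News">
<meta name="description" content="Placeholder description">
</head></html>
"""

if __name__ == "__main__":
    for url in extract_result_links(SAMPLE_SEARCH, "https://example.org/search"):
        print(url)
    print(extract_metadata(SAMPLE_PROGRAM))
```

In a real run, each URL returned by the first step would be downloaded and its HTML passed to the second step. The selectors would also need to be adapted to the markup the site actually serves.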