What are some of the new initiatives from the U.S. Government Publishing Office? Director Hugh Halpern offers an update, which has been incorporated into our program for tonight’s Building Democracy’s Library event.
Many thanks to Director Halpern and the U.S. Government Publishing Office for sharing this update!
Join us at the Internet Archive this Saturday, January 7, for a government data hackathon! We are hosting an informal hackathon working with White House social media data, government web data, and data from election-related collections. We will provide more gov data than you can shake a script at! If you are interested in attending, please register using this form. The event will take place at our headquarters at 300 Funston Avenue from 10 a.m. to 5 p.m.
We have been working with the White House on their admirable project to provide public access to eight years of White House social media data for research and creative reuse. Read more about their efforts in this blog post. Copies of this data will be publicly accessible at archive.org. We have also been furiously archiving the federal government web as part of our collaborative End of Term Web Archive, and we have collected a voluminous amount of media and web data from the 2016 election cycle. Data from these projects, and others, will be made publicly accessible for folks to analyze, study, and do fun, interesting things with.
At Saturday’s hackathon, we will give an overview of the available datasets, host short talks from affiliated projects and services, and point to tools and methods for analyzing the hackathon’s data. We plan for a loose, informal event. Datasets that will be available at the event and publicly accessible online include:
- Obama Administration White House social media from 2009 to the present, including Twitter, Tumblr, Vine, Facebook, and (possibly) YouTube
- Comprehensive web archive data of current White House websites: whitehouse.gov, petitions.whitehouse.gov, letsmove.gov and other .gov websites
- The End of Term Web Archives, a large-scale collaborative effort to preserve the federal government web (.gov/.mil) at presidential transitions, including web data from 2008, 2012, and our current 2016 project
- Special sub-collections of government data, such as every PowerPoint file in the Internet Archive’s web archive from the .mil web domain
- Extensive archives of social media data related to the 2016 election, including data from candidates, pundits, and media
- Full-text transcripts of Donald Trump’s campaign speeches
- Python notebooks, cluster computing tools, and pointers to methods for playing with data at scale.
Much of this data was collected in partnership with other libraries and with the support of external funders. We thank, foremost, the current White House Office of Digital Strategy staff for their advocacy for open access and for working with us and others to make their social media open to the public. We also thank our End of Term Web Archive partners and related community efforts helping to preserve the .gov web, as well as the funders that have supported many of the collecting and engineering efforts that make all this data publicly accessible, including the Institute of Museum and Library Services, Altiscale, the Knight Foundation, the Democracy Fund, the Kahle-Austin Foundation, and others.