Preserving U.S. Government Websites and Data as the Obama Term Ends

Long before the 2016 presidential election cycle, librarians understood an often-overlooked fact: vast amounts of government data and digital information are at risk of vanishing when a presidential term ends and administrations change. For example, 83% of .gov PDFs disappeared between 2008 and 2012.

That is why the Internet Archive, along with partners from the Library of Congress, University of North Texas, George Washington University, Stanford University, California Digital Library, and other public and private libraries, is hard at work on the End of Term Web Archive, a wide-ranging effort to preserve the entirety of the federal government web presence, especially the .gov and .mil domains, along with federal websites on other domains and official government social media accounts.

While it is not the only project the Internet Archive is undertaking to preserve government websites, FTP sites, and databases at this time, the End of Term Web Archive is a far-reaching one.

The Internet Archive is collecting webpages from over 6,000 government domains and over 200,000 hosts, along with feeds from around 10,000 official federal social media accounts. The effort is likely to preserve hundreds of millions of individual government webpages and data and could end up totaling well over 100 terabytes of archived material. Over its full history of web archiving, the Internet Archive has preserved over 3.5 billion URLs from the .gov domain, including over 45 million PDFs.

This end-of-term collection builds on similar initiatives in 2008 and 2012 by original partners Internet Archive, Library of Congress, University of North Texas, and California Digital Library to document the “gov web,” which has no mandated, domain-wide single custodian. For instance, the National Institute for Literacy (NIFL) website was archived in 2008, and the domain went offline in 2011. Similarly, the Sustainable Development Indicators (SDI) site was later taken down. Other websites, such as invasivespecies.gov, were later folded into larger agency domains. Every web page archived is accessible through the Wayback Machine, and past and current End of Term collections are full-text searchable through the main End of Term portal. We have also worked with additional partners to provide access to the full data for use in data-mining research and projects.

The project has received considerable press attention this year, with related stories in The New York Times, Politico, The Washington Post, Library Journal, Motherboard, and others.

“No single government entity is responsible for archiving the entire federal government’s web presence,” explained Jefferson Bailey, the Internet Archive’s Director of Web Archiving.  “Web data is already highly ephemeral and websites without a mandated custodian are even more imperiled. These sites include significant amounts of publicly-funded federal research, data, projects, and reporting that may only exist or be published on the web. This is tremendously important historical information. It also creates an amazing opportunity for libraries and archives to join forces and resources and collaborate to archive and provide permanent access to this material.”

This year has also seen a significant increase in citizen- and librarian-driven “hackathons” and “nomination-a-thons,” where subject experts and concerned information professionals crowdsource lists of high-value or endangered websites for the End of Term archiving partners to crawl. Librarian groups in New York City are holding nomination events to make sure important sites are preserved. And universities such as the University of Toronto are holding “guerrilla archiving” events focused specifically on preserving climate-related data.

We need your help too! You can use the End of Term Nomination Tool to nominate any .gov or other government website or social media account, and it will be archived by the project team. If you have other ideas, please comment here or send them to info@archive.org. You can also help by donating to the Internet Archive to support our continued mission to provide “Universal Access to All Knowledge.”

14 thoughts on “Preserving U.S. Government Websites and Data as the Obama Term Ends”

  1. Pingback: Preserving U.S. Government Websites and Data as the Obama Term Ends | Library Stuff

  2. Pingback: The Internet Archive aims to preserve 100 terabytes of government website data… just in case | GamingSoFun

  3. Pingback: Q&A: Michelle Murphy, the U of T professor who's racing to preserve climate-change data before Donald Trump takes office

  4. Mike H.

    Thank you for doing this. Preserving all possible data and documentation is a fundamentally essential task, especially ahead of the US government’s transition to a new administration that has already stated they plan to dismantle entire agencies.

  5. Dc

    Amen! When the American citizens who are the forgotten, neglected and somewhat punished, are in charge of the government and pay them to work for us and to be open and honest! You CANNOT depend on it! So, Big Thank You for keeping our Government transparent!

  6. Pingback: Would Like to Archive Government Web Services, not just Web Sites– Please help | Internet Archive Blogs


  8. Pingback: A new President and the End of Term Web Archive | Reading, Writing, Research

  9. Kevin Tinsley

    Have you considered that the “government data” is not yours to hide, copy, or manipulate? There is no such thing as “settled science”, and you are doing the nation an extreme disservice by your behavior. The data and studies were paid for with my, i.e. the American taxpayer’s, dollars. You have no more of a right to try to keep it from the public servants whom elected representatives have placed over you than they do to erase it. Make it public, with no little tricks to “adjust” the data to fit your goals, and maybe the people who doubt you will believe you. Your claims of Trump putting an “ideologue” in charge ring false, for you are behaving as an ideologue by claiming your, and only your, ideas are the correct ones.

  10. I love Internet Archive

    Everyone was worried about climate change data being erased, but what about Trump’s own views on climate change? Seems someone already did the job:

    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/418542137899491328
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/349973299889057792
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/316252016190054400
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/475668993928212480
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/435574043354611712
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/270628609817976834
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/435393088383889408
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/412159674042294272
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/326875628966117376
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/349973845228269569
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/512246203967619072
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/338448296022511618
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/488825209189711873
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/427226424987385856
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/417818392826232832
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/488926006225285120
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/431018674695442432
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/428418323660165120
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/653385381526806528
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/404420095113715712
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/408977616926830592
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/319377285687939072
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/428416406280241153
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/408380302206443520
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/521862351218573312
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/489381851350319107
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/407505938774757376
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/568387798924963840
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/493935815207043072
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/420333882597466112
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/450964791985971200
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/326874524576526337
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/422819593120256000
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/568021533131718656
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/408018451362766849
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/416909004984844288
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/334254335116587008
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/535102735830773760
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/338978381636984832
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/428954382915223552
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/417816035107299328
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/264010129106665472
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/488813607958757376
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/264007296970018816
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/427556692109574146
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/412162068989874176
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/372781203239104512
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/440811151283486720
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/326781792340299776
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/408983789830815744
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/416539702096052224
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/338429342646423553
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/402217536751951872
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/423179182198104064
    https://web.archive.org/web/https://twitter.com/realdonaldtrump/status/314744479821205505

  11. Carol Delahoyde

    I am not sure the Library of Congress can be trusted. In the past I discovered what is called the 10,000 Name Petition at the Library of Congress. That particular petition was signed by 10,000 people of the Commonwealth of Virginia, petitioning for religious freedom. My ancestor was a signer, and obviously I was working on genealogy. That being said, I had also used that petition in discussions of separation of Church and State. The link I had saved to the Library of Congress page no longer leads to said petition.

    Thinking further, maybe this just proves your point. My question would be: where is the copy of the 10,000 Name Petition? Removed from view so as to further the argument for theocracy?

    1. Carol Delahoyde

      Upon further research in old genealogical work, I have found this original reference to the petition.
      The Ten Thousand Name Petition, Miscellaneous petitions, 1776-1777, Legislative Petitions, General Assembly, Record Group 78, Archival and Information Services Division, Library of Virginia. (This document has been microfilmed.)

      This is published in the Virginia Genealogical Society, Vol 37, No. 1
