Preserving U.S. Government Websites and Data as the Obama Term Ends

Since long before the 2016 presidential election cycle, librarians have understood this often-overlooked fact: vast amounts of government data and digital information are at risk of vanishing when a presidential term ends and administrations change. For example, 83% of .gov PDFs disappeared between 2008 and 2012.

That is why the Internet Archive, along with partners from the Library of Congress, University of North Texas, George Washington University, Stanford University, California Digital Library, and other public and private libraries, is hard at work on the End of Term Web Archive, a wide-ranging effort to preserve the entire federal government web presence, especially the .gov and .mil domains, along with federal websites on other domains and official government social media accounts.

The End of Term Web Archive is not the only project the Internet Archive is undertaking to preserve government websites, FTP sites, and databases at this time, but it is a far-reaching one.

The Internet Archive is collecting webpages from over 6,000 government domains, over 200,000 hosts, and feeds from around 10,000 official federal social media accounts. The effort is likely to preserve hundreds of millions of individual government webpages and datasets and could total well over 100 terabytes of archived material. Over its full history of web archiving, the Internet Archive has preserved over 3.5 billion URLs from the .gov domain, including over 45 million PDFs.

This end-of-term collection builds on similar initiatives in 2008 and 2012 by the original partners (Internet Archive, Library of Congress, University of North Texas, and California Digital Library) to document the “gov web,” which has no mandated, domain-wide single custodian. For instance, the National Institute for Literacy (NIFL) website was captured in 2008; the domain went offline in 2011. Similarly, the Sustainable Development Indicators (SDI) site was later taken down, and other websites were later folded into larger agency domains. Every archived web page is accessible through the Wayback Machine, and past and current End of Term collections are full-text searchable through the main End of Term portal. We have also worked with additional partners to provide access to the full data for use in data-mining research and projects.

The project has received considerable press attention this year, with related stories in The New York Times, Politico, The Washington Post, Library Journal, Motherboard, and others.

“No single government entity is responsible for archiving the entire federal government’s web presence,” explained Jefferson Bailey, the Internet Archive’s Director of Web Archiving.  “Web data is already highly ephemeral and websites without a mandated custodian are even more imperiled. These sites include significant amounts of publicly-funded federal research, data, projects, and reporting that may only exist or be published on the web. This is tremendously important historical information. It also creates an amazing opportunity for libraries and archives to join forces and resources and collaborate to archive and provide permanent access to this material.”

This year has also seen a significant increase in citizen- and librarian-driven “hackathons” and “nomination-a-thons,” where subject experts and concerned information professionals crowdsource lists of high-value or endangered websites for the End of Term archiving partners to crawl. Librarian groups in New York City are holding nomination events to make sure important sites are preserved, and universities such as the University of Toronto are holding “guerrilla archiving” events focused specifically on preserving climate-related data.

We need your help too! You can use the End of Term Nomination Tool to nominate any .gov or other government website or social media account, and the project team will archive it. If you have other ideas, please comment here or send ideas to   You can also help by donating to the Internet Archive to support our continued mission to provide “Universal Access to All Knowledge.”

This entry was posted in Announcements, News.

14 Responses to Preserving U.S. Government Websites and Data as the Obama Term Ends

  1. Pingback: Preserving U.S. Government Websites and Data as the Obama Term Ends | Library Stuff

  2. Pingback: The Internet Archive aims to preserve 100 terabytes of government website data… just in case | GamingSoFun

  3. Pingback: Q&A: Michelle Murphy, the U of T professor who's racing to preserve climate-change data before Donald Trump takes office

  4. Mike H. says:

    Thank you for doing this. Preserving all possible data and documentation is a fundamentally essential task, especially ahead of the US government’s transition to a new administration that has already stated they plan to dismantle entire agencies.

  5. Thanks for sharing your thoughts about meta_keyword. Regards

  6. Dc says:

    Amen! When the American citizens who are the forgotten, neglected and somewhat punished, are in charge of the government and pay them to work for us and to be open and honest! You CANNOT depend on it! So, Big Thank You for keeping our Government transparent!

  7. Cameron says:

    Where is the archive for Hillary Clinton’s 33,000 deleted emails?

  8. Pingback: Would Like to Archive Government Web Services, not just Web Sites– Please help | Internet Archive Blogs

  9. Pingback: Would Like to Archive Government Web Services, not just Web Sites– Please help | Internet Archive Blogs

  10. Pingback: A new President and the End of Term Web Archive | Reading, Writing, Research

  11. Kevin Tinsley says:

    Have you considered that the “government data” is not yours to hide, copy, or manipulate? There is no such thing as “settled science”, and you are doing the nation an extreme disservice by your behavior. The data and studies were paid for with my, i.e. the American taxpayer’s, dollars. You have no more of a right to try to keep it from the public servants whom elected representatives have placed over you than they do to erase it. Make it public, with no little tricks to “adjust” the data to fit your goals, and maybe the people who doubt you will believe you. Your claims of Trump putting an “ideologue” in charge rings false, for you are behaving as an ideologue by claiming your, and only your, ideas are the correct ones.

  12. I love Internet Archive says:

    Everyone was worried about climate change data being erased, but what about Trump’s own views on climate change? Seems someone already did the job:

  13. Carol Delahoyde says:

    I am not sure the Library of Congress can be trusted. In the past I discovered what is called the 10,000 Name Petition at the Library of Congress. That particular petition was signed by 10,000 people of the Commonwealth of Virginia, petitioning for religious freedom. My ancestor was a signer and obviously I was working on genealogy. That being said, I had also used that petition in discussions of separation of Church and State. The link I had saved to the Library of Congress page no longer leads to said petition.

    Thinking further, maybe this just proves your point. My question would be: where is the copy of the 10,000 Name Petition? Removed from view so as to further the argument for theocracy?

    • Carol Delahoyde says:

      Upon further research in old genealogical work I have found this original reference to the petition.
      The Ten Thousand Name Petition, Miscellaneous petitions, 1776-1777, Legislative Petitions, General Assembly, Record Group 78, Archival and Information Services Division, Library of Virginia. (This document has been Microfilmed)

      This is published in the Virginia Genealogical Society, Vol. 37, No. 1