Web Archiving with National Libraries


After the Internet Archive started web archiving in the late 1990s, national libraries also took their first steps towards systematic preservation of the web. Over 30 national libraries currently have a web archiving programme. Many of them archive the web under a legal mandate that extends the Legal Deposit system to cover non-print publications, enabling heritage institutions such as national libraries to collect copies of online publications within a country or state.

The Internet Archive has a long tradition of working with national libraries. As a key provider of web archiving technologies and services, the Internet Archive has made available open source software for crawling and access, enabling national bodies to undertake web archiving locally. The Internet Archive also runs a global web archiving service for the general public, a tailored broad crawling service for national libraries, and Archive-It, a subscription service for creating, managing, accessing and storing web archive collections. Many national libraries are partners in these services.

The Internet Archive conducted a stakeholders’ consultation exercise between November 2015 and March 2016. The aim was to understand current practices, review the Internet Archive’s current services in this light, and explore new services for national libraries. Thirty organizations and individuals were consulted, representing national libraries, archives, researchers, independent consultants and web archiving service providers.

The main findings of the consultation are summarized below. They give an overview of current web archiving practices at national libraries, a general impression of progress in web archiving, and specific feedback on the Internet Archive’s role and services.

  • Strategy and organization
    Web archiving has become increasingly important in national libraries’ strategies. Many want to own the activity and develop the capability in-house. This requires integrating web archives with the library’s other collections and with traditional library practice for collection development. Budget cuts and lack of resources were observed at many national libraries, making it difficult to sustain the ongoing development of tools for web archiving.
  • Quality and comprehensiveness of collection
    There is general frustration about content gaps in the web archives. National libraries also have a strong desire to collect the portions of Twitter, YouTube, Facebook and other social media that are considered part of their respective national domains. They would also like to use web archiving as a complementary tool for collecting digital objects that are published on the web and captured in web archives, such as eBooks, eJournals, music and maps.
  • Access and research use
    National web archives are, in general, little used because of access restrictions. Many national libraries wish to support research use of their web archives by engaging with researchers to understand their requirements and eventually embedding web archive collections into the research process.
  • Reflection on 20 years of web archiving
    While there is recognition of the progress in web archiving, there is also a general feeling that the community is stuck with a certain way of doing things, has made no significant technological progress in the last ten years, and is being outpaced by the fast-evolving web.
  • Perception and expectation of Internet Archive’s services
    Aspects of the Internet Archive’s current services are unknown or misperceived. Stakeholders wish for services that complement what national libraries undertake locally and help them put in place better web archives. There is a strong expectation for the Internet Archive to lead the ongoing collaborative development of Heritrix and the Wayback software in particular. A number of national libraries have expressed the need for a service supporting the use of key software, including maintenance, support and new features. There is also clearly expressed interest in services that can help libraries collect advanced content such as social media and embedded videos.

The Internet Archive would like to thank the participants again for being open with us and for providing valuable input, which will inform the development and improvement of our services.

The full consultation report can be accessed at https://archive.org/details/InternetArchiveStakeholdersConsultationFindingsPublic.

5 thoughts on “Web Archiving with National Libraries”

  1. Pingback: Web Archiving with National Libraries | Library Stuff

  2. Pingback: New Report: “Web Archiving at National Libraries Findings of Stakeholders’ Consultation by the Internet Archive” | LJ INFOdocket

  3. Leigh Anne Dear

    I realize I’m on the outside looking in but it would be helpful to unaffiliated researchers such as myself (retired, educated etc.) if you’d define some terms; stakeholder for example. That way folks such as myself better understand the general scope of your reach. All in all though I’m glad to see my tax dollars at work in such an endeavor as this. Truly.

    1. helen Post author

      Thank you for your feedback. For this particular stakeholders’ consultation exercise, the aim was to understand national libraries’ web archiving requirements and review the Internet Archive’s current services in this light. The researchers involved in the exercise are actively involved in using web archives operated by the Internet Archive and national libraries.

  4. Pingback: Paul Biba’s eBook, eLibrary and ePublishing news compilation for week ending Saturday, May 28 | The Digital Reader
