Every four years, before and after the U.S. presidential election, a team of libraries and research organizations, including the Internet Archive, works together to preserve material from U.S. government websites during the transition of administrations.
These “End of Term” (EOT) Web Archive projects have been completed for term transitions in 2004, 2008, 2012, 2016, and 2020, with 2024 well underway. The effort preserves a record of the U.S. government as it changes over time for historical and research purposes.
With two-thirds of the process complete, the 2024/2025 EOT crawl has collected more than 500 terabytes of material, including more than 100 million unique web pages. All this information, produced by the U.S. government, the largest publisher in the world, is preserved and available for public access at the Internet Archive.
“Access by the people to the records and output of the government is critical,” said Mark Graham, director of the Internet Archive’s Wayback Machine and a participant in the EOT Web Archive project. “Much of the material published by the government has health, safety, security and education benefits for us all.”
The EOT Web Archive project is part of the Internet Archive’s daily routine of recording what’s happening on the web. For more than 25 years, the Internet Archive has worked to preserve material from web-based social media platforms, news sources, governments, and elsewhere across the web. Access to these preserved web pages is provided by the Wayback Machine. “It’s just part of what we do day in and day out,” Graham said.
To support the EOT Web Archive project, the Internet Archive devotes staff and technical infrastructure to focus on preserving U.S. government sites. The web archives are based on seed lists of government websites and nominations from the general public. Coverage includes websites in the .gov and .mil web domains, as well as government websites hosted on .org, .edu, and other top-level domains.
The Internet Archive provides a variety of discovery and access interfaces to help the public search and understand the material, including APIs and a full text index of the collection. Researchers, journalists, students, and citizens from across the political spectrum rely on these archives to help understand changes in policy, regulations, staffing and other dimensions of the U.S. government.
As an added layer of preservation, the 2024/2025 EOT Web Archive will be uploaded to the Filecoin network for long-term storage, where previous term archives are already stored. While separate from the EOT collaboration, this effort is part of the Internet Archive’s Democracy’s Library project. Filecoin Foundation (FF) and Filecoin Foundation for the Decentralized Web (FFDW) support Democracy’s Library to ensure public access to government research and publications worldwide.
According to Graham, the large volume of material in the 2024/2025 EOT crawl reflects two things: the team gets better with experience every term, and the increasing use of the web as a publishing platform means more material to archive. He also credits the EOT Web Archive’s success to the support and collaboration from its partners.
Web archiving is more than just preserving history; it’s about ensuring access to information for future generations. The End of Term Web Archive serves to safeguard versions of government websites that might otherwise be lost. By preserving this information and making it accessible, the EOT Web Archive has empowered researchers, journalists and citizens to trace the evolution of government policies and decisions.
More questions? Visit https://eotarchive.org/ to learn more about the End of Term Web Archive.
The following guest post from Aaron O’Donovan (aodonovan@columbuslibrary.org), Columbus Metropolitan Library Special Collections Manager, is part of a series written by members of the Internet Archive’s Community Webs program. Community Webs advances the capacity of community-focused memory organizations to build web and digital archives documenting local histories and underrepresented voices.
As a local history and genealogy department in a public library, our materials run the gamut from books from the 1700s about the creation of our country to yearbooks of local high schools that patrons like to peruse for nostalgia’s sake. In addition to our approximately 90,000 reference books, our archives room holds approximately 2,500 linear feet of photographic material, records, and manuscript material. We are constantly seeking new opportunities to expand access to our collections for our patrons, and when the opportunity arose to digitize materials as part of the Community Webs program, I knew what I wanted to digitize first: local neighborhood newspapers of Columbus.
We joined the Community Webs program in 2017 to archive important cultural and local government websites of Columbus, Ohio. The catalyst for the project was the belief that we had done a good job of telling the story of Columbus in its first 150 years, but we were missing the story of the city’s more recent evolution and failing to record the present. With the aim of capturing more recent changes to our city, we focused on archiving our city government website, as well as social service websites, especially those helping new immigrants in our city. Because of the Community Webs program, we were able to take a snapshot of the diverse populations that were making their homes in Columbus, and the medium of web archiving was the only way we were able to tell the stories of these new immigrant communities, including the Somalian, Nepalese, Bhutanese, and Mexican populations. To further this focus on migration patterns into Columbus, we felt it was important to make the neighborhood newspapers that we had on microfilm accessible, because they featured stories and obituaries on immigrant populations who came to Columbus in the mid-19th century and early 20th century.
The newspapers had been preserved on microfilm for decades, but we were never able to digitize them due to the time commitment involved for a project that size. During my time in the local history field in Columbus, it has become clear to me that our library patrons crave hyper-local history material that personally connects their stories to the place they live. While general local history topics about Columbus are popular, nothing is more popular in our library than content generated from Columbus neighborhoods. The opportunity to finally digitize neighborhood newspapers and make them accessible to our patrons was one that I could not pass up.
The most important newspaper for the library to digitize was the Columbus Call and Post, a historic Black newspaper that served Columbus from 1962-2007. For years patrons have asked us if the newspaper was digitized, but unfortunately all the library had was microfilm starting in 1972, which was very difficult to browse and ultimately did not serve our patrons’ needs for accessibility. Because the Internet Archive performed optical character recognition (OCR) on the text of the newspapers, researchers can now use keyword searching to find an address, a business name, or a personal name, and so locate news stories that mention the people and places that they hold in their memory.
Digitizing the microfilm of the Call and Post also complemented another project we began several years ago when we partnered with the King Arts Complex to digitize the photograph archive of the iconic newspaper, which was donated to the organization in the mid-1990s. Many of the photographs in the collection have little to no information attached to them (information written on the back of the photographic prints, the name of the photographer, etc.). Digitization of the Call and Post provided additional information to match and apply to the photographs in the archive, adding an enhanced level of searchability and accessibility to this collection. The collections work together to preserve Black history in a way that was not possible before because much of the content from the Call and Post was unique and rare. Being able to bring this newspaper back into the public consciousness has been a thrilling experience for us.
As the project continued to take shape, we felt it was important to represent Columbus neighborhoods geographically, which also enabled us to represent economically and ethnically diverse communities throughout Columbus history. Our most accessed newspaper thus far has been the Hilltop Record, a title which focused on a local neighborhood with strong Appalachian ties and has a long history of covering the issues of working-class citizens on the westside of Columbus. Other digitized community newspapers include:
· The Linden NE News showcases stories from north Columbus, an area that has experienced several demographic shifts throughout its 100 years of history.
The rarest newspapers digitized for this project were also some of the oldest newspapers that were preserved on microfilm in our collection. Among those titles is the Ohio Columbian (1853-1856), an anti-slavery newspaper that reported on Underground Railroad activities as they were happening in Ohio and surrounding states. It has potential for illuminating our understanding and knowledge of individuals that were involved in assisting enslaved people seeking freedom in the 1850s. Other newspapers with great research potential include early (and shorter) runs of Black newspapers that had not been digitized before this project, including The Columbus Recorder (1927), The Columbus Voice (1929), which was edited by Florence W. Oakfield, and The Ohio Torch (1928-1930), the longest running newspaper for the Black community during the 1920s. We are excited to report that researchers are already using these resources to better understand Columbus history more objectively and completely.
With this support from the Internet Archive and the National Historical Publications and Records Commission, we have been able to help our local users find information that was not available elsewhere. Recently, we had a researcher request an obituary from June of 1964 when our two major newspapers were on strike. Thankfully, the South Side Spectator had been digitized and was available through the Internet Archive. Our librarian was able to locate the obituary that was only available in that newspaper. We also got this enthusiastic email from a regular library patron after we informed them that we had digitized the Hilltop Record and it was now keyword searchable on the Internet Archive: “OH MY GOSH! ARE YOU SERIOUS!?! THAT’S FANTASTIC! Have I told you lately how much I love you guys? You rock my world! Thank you so much for everything you do. I am so grateful for everyone in Local History & Genealogy.”
Moreover, the librarians are using the digitized newspapers in regular programming, furthering our promotion of these new digitized collections. Every month the library hosts a virtual Black Heritage Collection Spotlight on a notable person or topic from Black history in Columbus. The images and news articles from the digitized Call and Post are used frequently for the program, and we look forward to learning about more ways the digitized newspapers are used in local research to highlight and deepen our community’s connections to Columbus’ past.
The Internet Archive and Community Webs are thankful for the support from the National Historical Publications & Records Commission for Collaborative Access to Diverse Public Library Local History Collections, which will digitize and provide access to a diverse range of local history archives that represent the experiences of immigrant, indigenous, and African American communities throughout the United States.
Last week, along with a DDoS attack and exposure of patron email addresses and encrypted passwords, the Internet Archive’s website JavaScript was defaced, leading us to bring the site down to assess and improve our security.
The stored data of the Internet Archive is safe and we are working on resuming services safely. This new reality requires heightened attention to cyber security and we are responding. We apologize for the impact of these library services being unavailable.
The Wayback Machine, Archive-It, scanning, and national library crawls have resumed, as well as email, blog, helpdesk, and social media communications. Our team is working around the clock across time zones to bring other services back online. In coming days more services will resume, some starting in read-only mode as full restoration will take more time.
We’re taking a cautious, deliberate approach to rebuild and strengthen our defenses. Our priority is ensuring the Internet Archive comes online stronger and more secure.
The following guest post from Dee Bowers (they/them), Archives Manager at the Brooklyn Public Library Center for Brooklyn History, is part of a series written by members of the Internet Archive’s Community Webs program. Community Webs advances the capacity of community-focused memory organizations to build web and digital archives documenting local histories and underrepresented voices.
Some say as many as one in seven Americans have family roots in Brooklyn, and I expect the newly digitized Brooklyn city directories now available through the Internet Archive will get heavy use from genealogists, historians, authors, journalists, students, and even artists to trace connections to the diverse and ever-changing borough.
Title page, Spooner’s Brooklyn Directory 1822. Brooklyn Public Library, Center for Brooklyn History.
In addition to helping us preserve this web-based content, Community Webs has now also made it possible to increase access to our physical collections through digitization. As part of the Collaborative Access to Diverse Public Library Local History Collections project, made possible by a grant from the National Historical Publications and Records Commission, we were able to partner with the Internet Archive to digitize 236 microfiche sheets of Brooklyn city directories.
Microfiche sheet from the Brooklyn city directories, 1822. Brooklyn Public Library, Center for Brooklyn History.
These directories show the movement, growth, and changing nature of immigrant populations in Brooklyn in the early to mid 19th century and help document the immigrant experience by providing data on the residency and, in some cases, ethnicities of Brooklynites over time. We knew that expanding digital access would be extremely useful to the many researchers who use our online resources, especially since our number one research topic is genealogy. The project is also directly in line with our mission:
“Democratize access to Brooklyn’s history and be dedicated to expanding and diversifying representation of the history of the borough by unifying resources and expertise, and broadening reach and impact.”
By increasing the visibility of these collections through digitization and freely available public access, researchers and historians will have a richer, more accessible view into the diversity of American history. The history of Brooklyn is extraordinarily diverse but, like many archives, our collections don’t always tell the fullness of those stories. By expanding access to our city directories, we provide insight into earlier residents of Brooklyn and enable diverse communities to trace their Brooklyn roots to a greater degree.
Screenshot of the early Brooklyn directories in the Internet Archive.
Here’s an example of how the directories look in the Internet Archive. As the screenshot above shows, they include content beyond directory listings. In this case, there’s a chronological listing of “memoranda” (notable moments in Brooklyn history), including “June 11, 1812 – News received in Brooklyn, of the declaration of war between the United States and Great Britain.”
One example of research that can be conducted with these directories is finding out more about early Black Brooklynites. Slavery was abolished in New York State in 1827, so the earliest days of post-enslavement Brooklyn are represented in the digitized directories.
Screenshot of 1857 directory on the Internet Archive with the highlighted surname “Hodges.”
By searching the text of the directories using keywords, I picked out an individual to learn more about, Rev. William J. Hodges, who lived on Broadway in Brooklyn in 1857. By cross-referencing with our digitized newspapers, I was able to find out more about him and his abolitionist activism in Brooklyn and beyond. It turns out he was not born in Brooklyn, nor did he reside there very long, but he did make an impact during his time there, as he founded the Colored Political Association of Kings County (which is the modern-day borough of Brooklyn).
“Local Items,” June 5, 1856, Brooklyn Times Union, page 2.
If not for the digitized city directories, I doubt I ever would have learned of Rev. Hodges and his time in Brooklyn. I hope that many more stories like these will emerge once researchers start digging into these directories.
Directory advertisement for T. Reeve, Architect and Builder.
The directories also contain items like this – an advertisement showing this architect and builder’s office on Schermerhorn Street in Downtown Brooklyn. This part of Brooklyn looks very different now, and this insight into what it looked like pre-photography is invaluable, particularly for people conducting house, building, and neighborhood research.
The directories are linked on our Search Our Collections page. We also have a tutorial for using the digitized directories. Additionally, we have several related research guides which assist researchers in exploring various topics. These materials are in the public domain, and we hope they will be used for a broad spectrum of applications, from family research to demographic research to writing to artwork. We are grateful to Community Webs, the Internet Archive, and the National Historical Publications and Records Commission for making this material available and searchable online and allowing us to expand access across the borough, city, and beyond.
The Internet Archive and Community Webs are thankful for the support from the National Historical Publications & Records Commission for Collaborative Access to Diverse Public Library Local History Collections, which will digitize and provide access to a diverse range of local history archives that represent the experiences of immigrant, indigenous, and African American communities throughout the United States.
On August 13, Community Webs members from all over the US and Canada gathered in Chicago for the 2024 Community Webs National Symposium. Launched in 2017, Internet Archive’s Community Webs program empowers public libraries and other cultural heritage organizations to document their communities. Members of the program receive access to Internet Archive’s Archive-It web archiving service and Vault digital preservation service as well as training, technical support, and opportunities for professional development.
Members of Internet Archive’s Community Webs program at the Community Webs National Symposium
This event was made possible in part by support from the Mellon Foundation. Held at the Museum of Contemporary Art Chicago, this year’s symposium was an opportunity for members to learn together and connect with each other. The day was organized around two workshops designed to support the community archiving and digital preservation work happening across Community Webs member institutions.
The first workshop, “Collective Wisdom: Collaborative Learning to Support Your Community Archiving Projects,” was taught by Natalie Milbrodt, CUNY University Archivist and co-founder of the Queens Memory Project. Attendees spent time working in small groups to create definitions of “Community Archiving” and reflect on some of the shared challenges and opportunities they were experiencing when engaging in community-centered work. This workshop emphasized the value of the collective wisdom of Community Webs members and will inform future educational opportunities. The community archives focus of this workshop also supported the Community Webs Affiliates Program, which encourages relationship-building among public libraries and other community-focused cultural heritage and social service organizations to broaden access to archiving tools for documenting the lives of their patrons.
Attendees work together to discuss strategies for documenting their communities
In the second half of the day, Stacey Erdman and Jaime Schumacher of Digital POWRR led a “Walk the Workflow” workshop which demonstrated a step-by-step digital preservation process using a variety of free preservation tools including Internet Archive’s Vault digital preservation system.
A main goal for the symposium was to provide an opportunity for Community Webs members to connect and learn from each other. Throughout the day, attendees discussed projects, shared ideas, described lessons learned, and brainstormed possible avenues for future collaboration.
A digital preservation workshop provided attendees with strategies for supporting long term preservation of digital collections
The following day, Community Webs members toured the Chicago Public Library Special Collections. Johanna Russ, Unit Head for Special Collections, gave a presentation about the complex, multi-year project CPL undertook to preserve and provide access to the records of the Chicago Park District. Highlights from this collection were available for attendees to view in the reading room.
That afternoon, the Archive-It Partner Meeting provided opportunities for Community Webs members and other Archive-It users to spend some time with Internet Archive staff to discuss topics such as strategies for capturing social media and making web archives more useful.
Community Webs members view highlights from Chicago Public Library’s special collections
In-person events like this are instrumental in achieving a key goal of the program: offering opportunities for networking and professional development for Community Webs members. Internet Archive’s support for this national network of practitioners empowers their work on a local level to preserve and provide access to digital heritage sources reflecting the unique life and culture of their communities.
Partners on the NEH-supported project, Increasing Access to Diverse Public Library Local History Collections
Since 2017, Community Webs has partnered with public libraries and heritage organizations to document and diversify the historical record. These organizations have collectively archived over 100 terabytes of web-based community heritage materials, including more than 800 collections documenting the lives of those often underrepresented in history. In 2023, Community Webs began offering collection digitization and access with support from the National Historical Publications and Records Commission (NHPRC). Today, Community Webs is happy to announce $345,000 in additional support from the National Endowment for the Humanities to digitize and provide open access to more than 411,000 local history collection items from seven Community Webs partners: Athens-Clarke County Library, Belen Public Library, District of Columbia Public Library, Evanston History Center, Jersey City Free Public Library, San Francisco Public Library, and William B. Harlan Memorial Library.
Community Webs partner collections include a diverse range of content from across the country representing the lives of immigrant, Black, and minority communities throughout US history. This includes records created by and for them, such as the Julius Hobson Papers from District of Columbia Public Library, the Belen Harvey House Collection from Belen Public Library, and the Local and Regional Family Histories collection from the William B. Harlan Memorial Library.
ACE Newsletter, Vol. 1, No. 3, Julius Hobson Papers on Federal Job Discrimination (source)
The collections also contain items that document city and municipal agencies that significantly impact minority communities. Digitization of this material will produce a deeper understanding of how systems of power and legal structures can regulate or even erase minority community histories, especially in regards to housing and economic opportunities. For example, the Athens City Engineer Records from Athens-Clarke County Library, the African American Housing and History collection from Evanston History Center, and the San Francisco Redevelopment Agency Records from San Francisco Public Library show the impact of urban redevelopment on Black and minority neighborhoods. The Municipal Records and agency scrapbooks from Jersey City Free Public Library show the ways that politics and economic changes impacted immigrant and minority communities.
Ashley Shull, Collections Coordinator, Athens-Clarke County Library shares what this project means to the community:
“The opportunity to be involved in a project proposal like this with the Internet Archive and our other library partners is invaluable to our community. The increased access to our Athens City Engineer collection will provide, not only local citizens, but academic researchers from around the world as well as current Athens-Clarke County Government officials insight into the past planning activities of our community. This is especially important as our local government embarks on a new Comprehensive Community Plan.”
John Beekman, Chief Librarian, Jersey City Free Public Library, also emphasized the impact of access to important city records:
“The Jersey City Free Public Library is honored to work with esteemed libraries from across the country on this innovative project spearheaded by the Internet Archive’s Community Webs program. The municipal minutes and records that make up the bulk of our contribution contain a wealth of information, not only on the workings of city government and agencies, but the people whose work is recorded there. Names and activities present in these records that never made the news will now be discoverable through search rather than the needle-in-a-haystack experience of poring over individual volumes of minutes. Making these materials accessible will provide a tool for enriching the record of city life across the 19th and 20th centuries.”
Hunters Point housing phase one map with unit totals, San Francisco Redevelopment Agency Records. Hunters Point Project Area A. Photographs (source)
The Community Webs program’s core goals are to increase the diversity of voices represented in the accessible historical record and to forge authentic partnerships between public libraries and heritage organizations that are members of Community Webs and the communities, individuals, and researchers they serve. Digitizing these collections will expand the overall amount and diversity of locally-focused community archives available online to users, and will augment the web and digital collections that are already aggregated by Community Webs. Records will also be shared with the Digital Public Library of America, further strengthening collection discovery.
Learn more about Community Webs members, projects, and collections on our blog. Get in touch with us at commwebs@archive.org to discover ways to partner to preserve local history!
Started in 2017, our Community Webs program has over 175 public libraries and local cultural organizations working to build digital archives documenting the experiences of their communities, especially those patrons often underrepresented in traditional archives. Participating public libraries have created over 1,400 collections documenting local civic life totaling nearly 100 terabytes and tens of millions of individual documents, images, audio/video files, blogs, websites, social media, and more. You can browse many of these collections at the Community Webs website. Participants have also collaborated on digitization efforts to bring minority newspapers online, held public programming and outreach events, and formed local partnerships to help preservation efforts at other mission-aligned organizations. The program has conducted numerous workshops and national symposia to help public librarians gain expertise in digital preservation and cohort members have done dozens of presentations at professional conferences showcasing their work. In the past, Community Webs has received support from the Institute of Museum and Library Services, the Mellon Foundation, the Kahle Austin Foundation, and the National Historical Publications and Records Commission.
We are excited to announce that Community Webs has received $750,000 in funding from The Mellon Foundation to continue expanding the program. The award will allow additional public libraries to join the program and will enable new and existing members to continue their web archiving collection building using our Archive-It service. In addition, the funding will also provide members access to Internet Archive’s new Vault digital preservation service, enabling them to build and preserve collections of any type of digital materials. Lastly, leveraging members’ prior success in local partnerships, Community Webs will now include an “Affiliates” program so member public libraries can nominate local nonprofit partners that can also receive access to archiving services and resources. Funding will also support the continuation of the program’s professional development training in digital preservation and community archiving and its overall cohort and community building activities of workshops, events, and symposia.
We thank The Andrew W. Mellon Foundation for their generous support of Community Webs. We are excited to continue to expand the program and empower hundreds of public librarians to build archives that document the voices, lives, and events of their communities and to ensure this material is permanently available to patrons, students, scholars, and citizens.
Last summer, Internet Archive launched ARCH (Archives Research Compute Hub), a research service that supports creation, computational analysis, sharing, and preservation of research datasets from terabytes and even petabytes of data from digital collections – with an initial focus on web archive collections. In line with Internet Archive’s mission to provide “universal access to all knowledge” we aim to make ARCH as universally accessible as possible.
Computational research and education cannot remain solely accessible to the world's most well-resourced organizations. With philanthropic support, Internet Archive is initiating Advancing Inclusive Computational Research with ARCH, a pilot program specifically designed to support an initial cohort of five less well-resourced organizations throughout the world.
Opportunity
Organizational access to ARCH for 1 year – supporting research teams, pedagogical efforts, and/or library, archive, and museum worker experimentation.
Access to thousands of curated web archive collections – abundant thematic range with potential to drive multidisciplinary research and education.
Enhanced Internet Archive training and support – expert synchronous and asynchronous support from Internet Archive staff.
Cohort experience – opportunities to share challenges and successes with a supportive group of peers.
Eligibility
Demonstrated need-based rationale for participation in Advancing Inclusive Computational Research with Archives Research Compute Hub: we will take a number of factors into consideration, including but not limited to stated organizational resources relative to peer organizations, ongoing experience contending with historic and contemporary inequities, as well as levels of national development as assessed by the United Nations Least Developed Countries effort and Human Development Index.
Organization type: universities, research institutes, libraries, archives, museums, government offices, non-governmental organizations.
This is a guest post from Teresa Soleau (Digital Preservation Manager), Anders Pollack (Software Engineer), and Neal Johnson (Senior IT Project Manager) from the J. Paul Getty Trust.
Project Background
Getty pursues its mission in Los Angeles and around the world through the work of its constituent programs (Getty Conservation Institute, Getty Foundation, J. Paul Getty Museum, and Getty Research Institute), serving the general interested public and a wide range of professional communities to promote a vital civil society through an understanding of the visual arts.
In 2019, Getty began a website redesign project, changing the technology stack and updating the way we interact with our communities online. The legacy website contained more than 19,000 web pages, and we knew many were no longer useful or relevant and should be retired, possibly after being archived. This led us to leverage the content we’d captured using the Internet Archive’s Archive-It service.
We’d been crawling our site since 2017, but had treated the results more as a record of institutional change over time than as an archival resource to be consulted after deletion of a page. We needed to direct traffic to our Wayback Machine captures, thus ensuring deleted pages remain accessible when a user requests a deprecated URL. We decided to dynamically display a link to the archived page from our site’s 404 error “Page not found” page.
Getty.edu 404 error “Page not found” message including the dynamically generated instructions and Internet Archive page link.
The project to audit all existing pages required us to educate content owners across the institution about web archiving practices and purpose. We developed processes for completing human reviews of large amounts of captured content. This work is described in more detail in a 2021 Digital Preservation Coalition blog post that mentions the Web Archives Collecting Policy we developed.
In this blog post we’ll discuss the work required to use the Internet Archive’s data API to add the necessary link on our 404 pages pointing to the most recent Wayback Machine capture of a deleted page.
Technical Underpinnings
Implementation of our Wayback Machine integration was very straightforward from a technical point of view. The first example on the Wayback Machine APIs documentation page gave us the technical guidance we needed to display a link to the most recent capture of any page deleted from our website. With no requirements for authentication, key management, or platform-specific software development kit (SDK) dependencies, our development process was simple. We chose to incorporate the Wayback API using Nuxt.js, the web framework used to build the new Getty.edu site.
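For readers who want to see the shape of such a lookup, here is a minimal sketch in Python rather than Nuxt.js. It assumes the example referred to is the Wayback Machine availability endpoint (https://archive.org/wayback/available), which needs no key. The function names and the sample payload are illustrative and are not Getty's code.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Assumption: the keyless Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(page_url: str) -> str:
    """Return the query URL asking for the most recent capture of page_url."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': page_url})}"


def closest_capture_url(payload: dict) -> Optional[str]:
    """Pull the capture link out of an availability response, if there is one."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None  # no capture exists, so the 404 page shows no archive link
    return closest.get("url")


# Illustrative response in the shape the endpoint returns (not live data).
sample_payload = json.loads("""
{
  "url": "example.com",
  "archived_snapshots": {
    "closest": {
      "status": "200",
      "available": true,
      "url": "http://web.archive.org/web/20130919044612/http://example.com/",
      "timestamp": "20130919044612"
    }
  }
}
""")

print(build_availability_url("example.com/retired-page"))
print(closest_capture_url(sample_payload))
```

When the Wayback Machine holds no capture, the endpoint returns an empty "archived_snapshots" object, which this sketch maps to None.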
Since the Wayback Machine API is highly performant for simple queries, with a typical response delay in milliseconds, we are able to query the API before rendering the page using a Nuxt route middleware module. API error handling and a request timeout were added to ensure that edge cases such as API failures or network timeouts do not block rendering of the 404 response page.
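The defensive pattern described here can be sketched as follows. This is a Python illustration, not the Nuxt route middleware itself. The two-second timeout, the function names, and the injectable fetch parameter are assumptions made for the example. The idea is that any failure collapses to "no link" so the 404 page always renders.

```python
import http.client
import json
import urllib.request
from typing import Callable, Optional
from urllib.parse import urlencode


def fetch_json(url: str, timeout_seconds: float) -> dict:
    """Fetch a URL and decode its JSON body, giving up after timeout_seconds."""
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
        return json.load(response)


def archived_link_or_none(
    page_url: str,
    fetch: Callable[[str, float], dict] = fetch_json,
    timeout_seconds: float = 2.0,  # hypothetical budget, tune to taste
) -> Optional[str]:
    """Return a capture link, or None on any API or network problem."""
    query = "https://archive.org/wayback/available?" + urlencode({"url": page_url})
    try:
        payload = fetch(query, timeout_seconds)
    except (OSError, ValueError, http.client.HTTPException):
        # URLError and timeouts are OSError subclasses; malformed JSON
        # raises ValueError. None of these should block the 404 page.
        return None
    if not isinstance(payload, dict):
        return None
    closest = payload.get("archived_snapshots", {}).get("closest") or {}
    return closest.get("url") if closest.get("available") else None
```

Passing the fetch function as a parameter keeps the error handling testable without touching the network: a test can supply a stand-in that raises TimeoutError and confirm the result is None.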
The only Internet Archive API feature missing for our initial list of requirements was access to snapshot page thumbnails in the JSON data payload received from the API. Access to these images would allow us to enhance our 404 page with a visual cue of archived page content.
Results and Next Steps
Our ability to include a link to an archived version of a deleted web page on our 404 response page helped ease the tough decisions content stakeholders were obliged to make about what content to archive and then delete from the website. We could guarantee availability of content in perpetuity without incurring the long term cost of maintaining the information ourselves.
The API brings back the most recent Wayback Machine capture by default, which is sometimes not created by us and hasn’t necessarily passed through our archive quality assurance process. We intend to develop our application further so that we privilege the display of Getty’s own page captures. This will ensure we’re delivering the highest quality capture to users.
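One way this could be approached, offered as an assumption and not as Getty's plan: the availability endpoint accepts an optional timestamp parameter (YYYYMMDDhhmmss) and returns the capture closest to it. If a site kept its own record of captures that passed review, it could request that capture specifically. The REVIEWED_CAPTURES mapping, its sample entry, and the function names below are hypothetical.

```python
from typing import Dict
from urllib.parse import urlencode

# Hypothetical record of captures that passed quality review:
# page URL -> capture timestamp (YYYYMMDDhhmmss). The entry is a placeholder.
REVIEWED_CAPTURES: Dict[str, str] = {
    "example.com/retired-page": "20210315120000",
}


def availability_query(page_url: str, reviewed: Dict[str, str]) -> str:
    """Build an availability query, pinned to a reviewed capture when known."""
    params = {"url": page_url}
    timestamp = reviewed.get(page_url)
    if timestamp:
        params["timestamp"] = timestamp  # ask for the capture closest to this time
    return "https://archive.org/wayback/available?" + urlencode(params)


def is_reviewed_capture(payload: dict, page_url: str, reviewed: Dict[str, str]) -> bool:
    """Check that the capture returned is exactly the reviewed one."""
    wanted = reviewed.get(page_url)
    closest = payload.get("archived_snapshots", {}).get("closest") or {}
    return wanted is not None and closest.get("timestamp") == wanted
```

Because the endpoint returns the closest capture and not necessarily an exact match, the second function compares timestamps before the link is treated as a reviewed capture.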
Google Analytics has been configured to report on traffic to our 404 pages and will track clicks on links pointing to Internet Archive pages, providing useful feedback on what portion of archived page traffic is referred from our 404 error page.
To work around the challenge of providing navigational affordances to legacy content and ensure web page titles of old content remain accessible to search engines, we intend to provide an up-to-date index of all archived getty.edu pages.
As we continue to retire obsolete website pages and complete this monumental content archiving and retirement effort, we’re grateful for the Internet Archive API which supports our goal of making archived content accessible in perpetuity.
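As a rough illustration of how such an index might be assembled (the post does not say how Getty will build it), the Wayback Machine CDX server can list captures for a domain as JSON rows, with the first row holding the field names. The sketch below assumes that API and keeps the latest capture per URL. The function names and sample rows are made up, and page titles would have to come from another source because CDX records do not carry them.

```python
from typing import Dict, List
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query(domain: str) -> str:
    """Build a CDX query listing successful captures under a domain."""
    params = {
        "url": domain,
        "matchType": "domain",
        "output": "json",
        "fl": "original,timestamp",
        "filter": "statuscode:200",
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def latest_capture_index(rows: List[List[str]]) -> Dict[str, str]:
    """Map each original URL to a replay link for its most recent capture."""
    if not rows:
        return {}
    header, records = rows[0], rows[1:]
    original_at = header.index("original")
    timestamp_at = header.index("timestamp")
    latest: Dict[str, str] = {}
    for record in records:
        original, timestamp = record[original_at], record[timestamp_at]
        # 14-digit timestamps sort correctly as plain strings
        if timestamp > latest.get(original, ""):
            latest[original] = timestamp
    return {
        original: f"https://web.archive.org/web/{timestamp}/{original}"
        for original, timestamp in sorted(latest.items())
    }


# Made-up rows in the CDX JSON shape, for illustration only.
sample_rows = [
    ["original", "timestamp"],
    ["http://example.com/a.html", "20180101000000"],
    ["http://example.com/a.html", "20200101000000"],
    ["http://example.com/b.html", "20190101000000"],
]
print(latest_capture_index(sample_rows))
```

A query across a large domain would likely need the CDX server's limit or paging options; they are left out here to keep the sketch short.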
In June, we announced the official launch of Archives Research Compute Hub (ARCH), our platform for supporting computational research with digital collections. The Archiving & Data Services group at IA has long provided computational research services via collaborations, dataset services, product features, and other partnerships and software development. In 2020, in partnership with our close collaborators at the Archives Unleashed project, and with funding from the Mellon Foundation, we pursued cooperative technical and community work to make text and data mining services available to any institution building, or researcher using, archival web collections. This led to the release of ARCH, with more than 35 libraries and 60 researchers and curators participating in beta testing and early product pilots. Additional work supported expanding the community of scholars doing computational research using contemporary web collections by providing technical and research support to multi-institutional research teams.
We are pleased to announce that ARCH recently received funding from the Institute of Museum and Library Services (IMLS), via their National Leadership Grants program, supporting ARCH expansion. The project, “Expanding ARCH: Equitable Access to Text and Data Mining Services,” entails two broad areas of work. First, the project will create user-informed workflows and conduct software development that enables a diverse set of partner libraries, archives, and museums to add digital collections of any format (e.g., image collections, text collections) to ARCH for users to study via computational analysis. Working with these partners will help ensure that ARCH can support the needs of organizations of any size that aim to make their digital collections available in new ways. Second, the project will work with librarians and scholars to expand the number and types of data analysis jobs and resulting datasets and data visualizations that can be created using ARCH, including allowing users to build custom research collections that are aggregated from the digital collections of multiple institutions. Expanding the ability for scholars to create aggregated collections and run new data analysis jobs, potentially including artificial intelligence tools, will enable ARCH to significantly increase the type, diversity, scope, and scale of research it supports.
Collaborators on the Expanding ARCH project include a set of institutional partners that will be closely involved in guiding functional requirements, testing designs, and using the newly-built features intended to augment researcher support. Primary institutional partners include University of Denver, University of North Carolina at Chapel Hill, Williams College Museum of Art, and Indianapolis Museum of Art, with additional institutional partners joining in the project’s second year.
Thousands of libraries, archives, museums, and memory organizations work with Internet Archive to build and make openly accessible digitized and born-digital collections. Making these collections available to as many users in as many ways as possible is critical to providing access to knowledge. We are thankful to IMLS for providing the financial support that allows us to expand the ARCH platform to empower new and emerging types of access and research.