Category Archives: Web & Data Services

Columbus Neighborhood Newspapers Showcase the City’s Diverse Communities

The following guest post from Aaron O’Donovan (aodonovan@columbuslibrary.org), Columbus Metropolitan Library Special Collections Manager, is part of a series written by members of the Internet Archive’s Community Webs program. Community Webs advances the capacity of community-focused memory organizations to build web and digital archives documenting local histories and underrepresented voices. 

As a local history and genealogy department in a public library, our materials run the gamut from books from the 1700s about the creation of our country to yearbooks of local high schools that patrons like to peruse for nostalgia’s sake. In addition to our approximately 90,000 reference books, our archives room holds approximately 2,500 linear feet of photographic material, records, and manuscript material. We are constantly seeking new opportunities to expand access to our collections for our patrons, and when the opportunity arose to digitize materials as part of the Community Webs program, I knew what I wanted to digitize first: local neighborhood newspapers of Columbus.  

Ohio Columbian, February 24, 1853

We joined the Community Webs program in 2017 to archive important cultural and local government websites of Columbus, Ohio. The catalyst for the project was the belief that we had done a good job of telling the story of Columbus in its first 150 years, but we were not telling the story of the city’s more recent evolution, and we were failing to record the present. With the aim of capturing more recent changes to our city, we focused on archiving our city government website, as well as social service websites, especially those helping new immigrants in our city. Because of the Community Webs program, we were able to take a snapshot of the diverse populations that were making their homes in Columbus, and web archiving was the only way we were able to tell the stories of these new immigrant communities, including the Somalian, Nepalese, Bhutanese, and Mexican populations. To further this focus on migration patterns into Columbus, we felt it was important to make the neighborhood newspapers that we had on microfilm accessible, because they featured stories and obituaries on immigrant populations who came to Columbus in the mid-19th century and early 20th century.

The newspapers had been preserved on microfilm for decades, but we were never able to digitize them due to the time commitment involved for a project that size. During my time in the local history field in Columbus, it has become clear to me that our library patrons crave hyper-local history material that personally connects their stories to the place they live. While general local history topics about Columbus are popular, nothing is more popular in our library than content generated from Columbus neighborhoods. The opportunity to finally digitize neighborhood newspapers and make them accessible to our patrons was one that I could not pass up.

Columbus Call and Post, May 3, 1975

The most important newspaper for the library to digitize was the Columbus Call and Post, a historic Black newspaper that served Columbus from 1962-2007. For years patrons have asked us if the newspaper was digitized, but unfortunately all the library had was microfilm starting in 1972, which was very difficult to browse and ultimately did not serve our patrons’ needs for accessibility. Because the Internet Archive performed optical character recognition (OCR) on the text of the newspapers, researchers can now use keyword searching to find an address, a business name, or a personal name and locate news stories that mention the people and places that they hold in their memory.

Digitizing the microfilm of the Call and Post also complemented another project we began several years ago when we partnered with the King Arts Complex to digitize the photograph archive of the iconic newspaper, which was donated to the organization in the mid-1990s. Many of the photographs in the collection have little to no information attached to them (information written on the back of the photographic prints, the name of the photographer, etc.). Digitization of the Call and Post provided additional information to match and apply to the photographs in the archive, adding an enhanced level of searchability and accessibility to this collection. The collections work together to preserve Black history in a way that was not possible before because much of the content from the Call and Post was unique and rare. Being able to bring this newspaper back into the public consciousness has been a thrilling experience for us.

Congressman Adam Clayton Powell Jr. in Columbus, Columbus Call and Post Photograph Collection

As the project continued to take shape, we felt it was important to represent Columbus neighborhoods geographically, which also enabled us to represent economically and ethnically diverse communities throughout Columbus history. Our most accessed newspaper thus far has been the Hilltop Record, a title that focused on a local neighborhood with strong Appalachian ties and has a long history of covering the issues of working-class citizens on the westside of Columbus. Other digitized community newspapers include:

· The Eastern Spectator and Eastern Review offer perspectives from the city’s Jewish community.

· The Southside Booster and Southside Leader share the industrial and union history of Columbus.

· The Linden NE News showcases stories from north Columbus, an area that has experienced several demographic shifts throughout its 100 years of history.

Hilltop Record, November 8, 1928

The rarest newspapers digitized for this project were also some of the oldest newspapers that were preserved on microfilm in our collection. Among those titles is the Ohio Columbian (1853-1856), an anti-slavery newspaper that reported on Underground Railroad activities as they were happening in Ohio and surrounding states. It has the potential to deepen our knowledge of individuals who were involved in assisting enslaved people seeking freedom in the 1850s. Other newspapers with great research potential include early (and shorter) runs of Black newspapers that had not been digitized before this project, including The Columbus Recorder (1927); The Columbus Voice (1929), which was edited by Florence W. Oakfield; and The Ohio Torch (1928-1930), the longest running newspaper for the Black community during the 1920s. We are excited to report that researchers are already using these resources to better understand Columbus history more objectively and completely.

With this support from the Internet Archive and the National Historical Publications and Records Commission, we have been able to help our local users find information that was not available elsewhere. Recently, we had a researcher request an obituary from June of 1964 when our two major newspapers were on strike. Thankfully, the South Side Spectator had been digitized and was available through the Internet Archive. Our librarian was able to locate the obituary that was only available in that newspaper. We also got this enthusiastic email from a regular library patron after we informed them that we had digitized the Hilltop Record and it was now keyword searchable on the Internet Archive: “OH MY GOSH! ARE YOU SERIOUS!?! THAT’S FANTASTIC! Have I told you lately how much I love you guys? You rock my world! Thank you so much for everything you do. I am so grateful for everyone in Local History & Genealogy.”

Moreover, the librarians are using the digitized newspapers in regular programming, furthering our promotion of these new digitized collections. Every month the library hosts a virtual Black Heritage Collection Spotlight on a notable person or topic from Black history in Columbus. The images and news articles from the digitized Call and Post are used frequently for the program, and we look forward to learning about more ways the digitized newspapers are used in local research to highlight and deepen our community’s connections to Columbus’ past. 

Browse the Columbus Neighborhood Newspapers Collection on archive.org.


The Internet Archive and Community Webs are thankful for the support from the National Historical Publications & Records Commission for Collaborative Access to Diverse Public Library Local History Collections, which will digitize and provide access to a diverse range of local history archives that represent the experiences of immigrant, indigenous, and African American communities throughout the United States.

Internet Archive Services Update: 2024-10-17

[Washington Post piece]

Last week, along with a DDoS attack and exposure of patron email addresses and encrypted passwords, the Internet Archive’s website JavaScript was defaced, leading us to bring the site down to assess and improve our security.

The stored data of the Internet Archive is safe and we are working on resuming services safely. This new reality requires heightened attention to cyber security and we are responding. We apologize for the impact of these library services being unavailable.

The Wayback Machine, Archive-It, scanning, and national library crawls have resumed, as well as email, blog, helpdesk, and social media communications.  Our team is working around the clock across time zones to bring other services back online. In coming days more services will resume, some starting in read-only mode as full restoration will take more time. 

We’re taking a cautious, deliberate approach to rebuild and strengthen our defenses. Our priority is ensuring the Internet Archive comes online stronger and more secure.

As a library community, we are seeing other cyber attacks—for instance the British Library, Seattle Public Library, Toronto Public Library, and now Calgary Public Library. We hope these attacks are not indicative of a trend.

For the latest updates, please check this blog and our official social media accounts: X/Twitter, Bluesky and Mastodon.

Thank you for your patience and ongoing support.

Illuminating the Stories of Brooklynites Through Digitized Directories

The following guest post from Dee Bowers (they/them), Archives Manager at the Brooklyn Public Library Center for Brooklyn History, is part of a series written by members of the Internet Archive’s Community Webs program. Community Webs advances the capacity of community-focused memory organizations to build web and digital archives documenting local histories and underrepresented voices.

Some say as many as one in seven Americans have family roots in Brooklyn, and I expect the newly digitized Brooklyn city directories now available through the Internet Archive will get heavy use from genealogists, historians, authors, journalists, students, and even artists to trace connections to the diverse and ever-changing borough.

Black and white two-page spread of directory title page including map of Brooklyn.
Title page, Spooner’s Brooklyn Directory 1822. Brooklyn Public Library, Center for Brooklyn History.

What is now the Center for Brooklyn History first joined the Internet Archive’s Community Webs program in 2017 as part of the original cohort. This program gave us the tools and training we needed to save over 2TB of web-based Brooklyn history content, including over 1,000 individual URLs. We also host our digitized high school newspapers and audiovisual material on the Internet Archive.

In addition to helping us preserve this web-based content, Community Webs has now also made it possible to increase access to our physical collections through digitization. As part of the Collaborative Access to Diverse Public Library Local History Collections project, made possible by a grant from the National Historical Publications and Records Commission, we were able to partner with the Internet Archive to digitize 236 microfiche sheets of Brooklyn city directories. 

Microfiche sheet from the Brooklyn city directories, 1822. Brooklyn Public Library, Center for Brooklyn History.

These directories show the movement, growth, and changing nature of immigrant populations in Brooklyn in the early to mid 19th century and help document the immigrant experience by providing data on the residency and, in some cases, ethnicities of Brooklynites over time. We knew that expanding digital access would be extremely useful to the many researchers who use our online resources, especially since our number one research topic is genealogy. The project is also directly in line with our mission:

Democratize access to Brooklyn’s history and be dedicated to expanding and diversifying representation of the history of the borough by unifying resources and expertise, and broadening reach and impact.

By increasing the visibility of these collections through digitization and freely available public access, researchers and historians will have a richer, more accessible view into the diversity of American history. The history of Brooklyn is extraordinarily diverse but, like many archives, our collections don’t always tell the fullness of those stories. By expanding access to our city directories, we provide insight into earlier residents of Brooklyn and enable diverse communities to trace their Brooklyn roots to a greater degree.

Screenshot of digitized directory page in Internet Archive viewer.
Screenshot of the early Brooklyn directories in the Internet Archive.

Here’s an example of how the directories look in the Internet Archive. As the screenshot above shows, they include content beyond directory listings. In this case, there’s a chronological listing of “memoranda” – notable moments in Brooklyn history – including “June 11, 1812 – News received in Brooklyn, of the declaration of war between the United States and Great Britain.”

One example of research that can be conducted with these directories is finding out more about early Black Brooklynites. Slavery was abolished in New York State in 1827, so the earliest days of post-enslavement Brooklyn are represented in the digitized directories.

Screenshot of digitized directory page in Internet Archive viewer with the purple highlighted surname “Hodges.”
Screenshot of 1857 directory on the Internet Archive with the highlighted surname “Hodges.”

By searching the text of the directories using keywords, I picked out an individual to learn more about, Rev. William J. Hodges, who lived on Broadway in Brooklyn in 1857. By cross-referencing with our digitized newspapers, I was able to find out more about him and his abolitionist activism in Brooklyn and beyond. It turns out he was not born in Brooklyn, nor did he reside there very long, but he did make an impact during his time there, as he founded the Colored Political Association of Kings County (which is the modern-day borough of Brooklyn).

Black and white newspaper clipping describing a “colored indignation meeting” in which William Hodges took part.
“Local Items,” June 5 1856, Brooklyn Times Union, page 2.

If not for the digitized city directories, I doubt I ever would have learned of Rev. Hodges and his time in Brooklyn. I hope that many more stories like these will emerge once researchers start digging into these directories.

Black and white image of buildings on a tree-lined street with information about T. Reeve, architect.
Directory advertisement for T. Reeve, Architect and Builder.

The directories also contain items like this – an advertisement showing this architect and builder’s office on Schermerhorn Street in Downtown Brooklyn. This part of Brooklyn looks very different now, and this insight into what it looked like pre-photography is invaluable, particularly for people conducting house, building, and neighborhood research.

The directories are linked on our Search Our Collections page. We also have a tutorial for using the digitized directories. Additionally, we have several related research guides which assist researchers in exploring various topics. These materials are in the public domain, and we hope they will be used for a broad spectrum of applications, from family research to demographic research to writing to artwork. We are grateful to Community Webs, the Internet Archive, and the National Historical Publications and Records Commission for making this material available and searchable online and allowing us to expand access across the borough, city, and beyond.

Browse the Brooklyn City Directories on archive.org.


The Internet Archive and Community Webs are thankful for the support from the National Historical Publications & Records Commission for Collaborative Access to Diverse Public Library Local History Collections, which will digitize and provide access to a diverse range of local history archives that represent the experiences of immigrant, indigenous, and African American communities throughout the United States.

Public Libraries Meet to Advance Community Archiving

On August 13, Community Webs members from all over the US and Canada gathered in Chicago for the 2024 Community Webs National Symposium. Launched in 2017, Internet Archive’s Community Webs program empowers public libraries and other cultural heritage organizations to document their communities. Members of the program receive access to Internet Archive’s Archive-It web archiving service and Vault digital preservation service as well as training, technical support, and opportunities for professional development.

Members of Internet Archive’s Community Webs program at the Community Webs National Symposium


This event was made possible in part by support from the Mellon Foundation. Held at the Museum of Contemporary Art Chicago, this year’s symposium was an opportunity for members to learn together and connect with each other. The day was organized around two workshops designed to support the community archiving and digital preservation work happening across Community Webs member institutions.

The first workshop, “Collective Wisdom: Collaborative Learning to Support Your Community Archiving Projects,” was taught by Natalie Milbrodt, CUNY University Archivist and co-founder of the Queens Memory Project. Attendees spent time working in small groups to create definitions of “Community Archiving” and reflect on some of the shared challenges and opportunities they were experiencing when engaging in community-centered work. This workshop  emphasized the value of the collective wisdom of Community Webs members and will inform future educational opportunities. The community archives focus of this workshop also supported  the Community Webs Affiliates Program, which encourages relationship-building among public libraries and other community-focused cultural heritage and social service organizations to broaden access to archiving tools for documenting the lives of their patrons.

Attendees work together to discuss strategies for documenting their communities

In the second half of the day, Stacey Erdman and Jaime Schumacher of Digital POWRR led a “Walk the Workflow” workshop which demonstrated a step-by-step digital preservation process using a variety of free preservation tools including  Internet Archive’s Vault digital preservation system.

A main goal for the symposium was to provide an opportunity for Community Webs members to connect and learn from each other. Throughout the day, attendees discussed projects, shared ideas, described lessons learned, and brainstormed possible avenues for future collaboration.

A digital preservation workshop provided attendees with strategies for supporting long term preservation of digital collections

The following day, Community Webs members toured the Chicago Public Library Special Collections. Johanna Russ, Unit Head for Special Collections, gave a presentation about the complex, multi-year project CPL undertook to preserve and provide access to the records of the Chicago Park District. Highlights from this collection were available for attendees to view in the reading room.

That afternoon, the Archive-It Partner Meeting provided opportunities for Community Webs members and other Archive-It users to spend some time with Internet Archive staff to discuss topics such as strategies for capturing social media and making web archives more useful. 

Community Webs members view highlights from Chicago Public Library’s special collections

In-person events like this are instrumental in achieving a key goal of the program: offering opportunities for networking and professional development for Community Webs members. Internet Archive’s support for this national network of practitioners empowers their work on a local level to preserve and provide access to digital heritage sources reflecting the unique life and culture of their communities.

Interested in learning more about Community Webs? Explore Community Webs collections, read the latest program news, or apply to join!

Diversifying Access to the Local Historical Record with Community Webs

Community Webs partners on the NEH-supported project Increasing Access to Diverse Public Library Local History Collections

Since 2017, Community Webs has partnered with public libraries and heritage organizations to document and diversify the historical record. These organizations have collectively archived over 100 terabytes of web-based community heritage materials, including more than 800 collections documenting the lives of those often underrepresented in history. In 2023, Community Webs began offering collection digitization and access with support from the  National Historical Publications and Records Commission (NHPRC). Today, Community Webs is happy to announce $345,000 in additional support from the National Endowment for the Humanities to digitize and provide open access to more than 411,000 local history collection items from seven Community Webs partners: Athens-Clarke County Library, Belen Public Library, District of Columbia Public Library, Evanston History Center, Jersey City Free Public Library, San Francisco Public Library, and William B. Harlan Memorial Library. 

Community Webs partner collections include a diverse range of content from across the country representing the life of immigrants, Black, and minority communities throughout US history. This includes records created by and for them, such as the Julius Hobson Papers from District of Columbia Public Library, the Belen Harvey House Collection from Belen Public Library, and the Local and Regional Family Histories collection from the William B. Harlan Memorial Library. 

ACE Newsletter, Vol. 1, No. 3, Julius Hobson Papers on Federal Job Discrimination

The collections also contain items that document city and municipal agencies that significantly impact minority communities. Digitization of this material will produce a deeper understanding of how systems of power and legal structures can regulate or even erase minority community histories, especially in regards to housing and economic opportunities. For example, the Athens City Engineer Records from Athens-Clarke County Library, the African American Housing and History collection from Evanston History Center, and the San Francisco Redevelopment Agency Records from San Francisco Public Library show the impact of urban redevelopment on Black and minority neighborhoods. The Municipal Records and agency scrapbooks from Jersey City Free Public Library show the ways that politics and economic changes impacted immigrant and minority communities. 

Ashley Shull, Collections Coordinator, Athens-Clarke County Library shares what this project means to the community:

“The opportunity to be involved in a project proposal like this with the Internet Archive and our other library partners is invaluable to our community. The increased access to our Athens City Engineer collection will provide, not only local citizens, but academic researchers from around the world as well as current Athens-Clarke County Government officials insight into the past planning activities of our community. This is especially important as our local government embarks on a new Comprehensive Community Plan.”

John Beekman, Chief Librarian, Jersey City Free Public Library, also emphasized the impact of access to important city records:

“The Jersey City Free Public Library is honored to work with esteemed libraries from across the country on this innovative project spearheaded by the Internet Archive’s Community Webs program. The municipal minutes and records that make up the bulk of our contribution contain a wealth of information, not only on the workings of city government and agencies, but the people whose work is recorded there. Names and activities present in these records that never made the news will now be discoverable through search rather than the needle-in-a-haystack experience of poring over individual volumes of minutes. Making these materials accessible will provide a tool for enriching the record of city life across the 19th and 20th centuries.”

Hunters Point housing phase one map with unit totals, San Francisco Redevelopment Agency Records. Hunters Point Project Area A. Photographs

The Community Webs program’s core goals are to increase the diversity of voices represented in the accessible historical record and to forge authentic partnerships between public libraries and heritage organizations that are members of Community Webs and the communities, individuals, and researchers they serve. Digitizing these collections will expand the overall amount and diversity of locally-focused community archives available online to users, and will augment the web and digital collections that are already aggregated by Community Webs. Records will also be shared with the Digital Public Library of America, further strengthening collection discovery. 


The Internet Archive and Community Webs are thankful for support from the National Endowment for the Humanities.

Learn more about Community Webs members, projects, and collections on our blog. Get in touch with us at commwebs@archive.org to discover ways to partner to preserve local history!

Community Webs Receives $750,000 Grant to Expand Community Archiving by Public Libraries

Started in 2017, our Community Webs program has over 175 public libraries and local cultural organizations working to build digital archives documenting the experiences of their communities, especially those patrons often underrepresented in traditional archives. Participating public libraries have created over 1,400 collections documenting local civic life totaling nearly 100 terabytes and tens of millions of individual documents, images, audio/video files, blogs, websites, social media, and more. You can browse many of these collections at the Community Webs website. Participants have also collaborated on digitization efforts to bring minority newspapers online, held public programming and outreach events, and formed local partnerships to help preservation efforts at other mission-aligned organizations. The program has conducted numerous workshops and national symposia to help public librarians gain expertise in digital preservation and cohort members have done dozens of presentations at professional conferences showcasing their work. In the past, Community Webs has received support from the Institute of Museum and Library Services, the Mellon Foundation, the Kahle Austin Foundation, and the National Historical Publications and Records Commission.

We are excited to announce that Community Webs has received $750,000 in funding from The Mellon Foundation to continue expanding the program. The award will allow additional public libraries to join the program and will enable new and existing members to continue their web archiving collection building using our Archive-It service. In addition, the funding will also provide members access to Internet Archive’s new Vault digital preservation service, enabling them to build and preserve collections of any type of digital materials. Lastly, leveraging members’ prior success in local partnerships, Community Webs will now include an “Affiliates” program so member public libraries can nominate local nonprofit partners that can also receive access to archiving services and resources. Funding will also support the continuation of the program’s professional development training in digital preservation and community archiving and its overall cohort and community building activities of workshops, events, and symposia.

We thank The Andrew W. Mellon Foundation for their generous support of Community Webs. We are excited to continue to expand the program and empower hundreds of public librarians to build archives that document the voices, lives, and events of their communities and to ensure this material is permanently available to patrons, students, scholars, and citizens.

Call for Proposals: Advancing Inclusive Computational Research with Archives Research Compute Hub

Last summer, Internet Archive launched ARCH (Archives Research Compute Hub), a research service that supports creation, computational analysis, sharing, and preservation of research datasets from terabytes and even petabytes of data from digital collections – with an initial focus on web archive collections. In line with Internet Archive’s mission to provide “universal access to all knowledge” we aim to make ARCH as universally accessible as possible. 

Computational research and education cannot remain solely accessible to the world’s most well-resourced organizations.  With philanthropic support, Internet Archive is initiating Advancing Inclusive Computational Research with ARCH, a pilot program specifically designed to support an initial cohort of five less well-resourced organizations throughout the world. 

Opportunity

  • Organizational access to ARCH for 1 year – supporting research teams, pedagogical efforts, and/or library, archive, and museum worker experimentation.  
  • Access to thousands of curated web archive collections – abundant thematic range with potential to drive multidisciplinary research and education. 
  • Enhanced Internet Archive training and support – expert synchronous and asynchronous support from Internet Archive staff. 
  • Cohort experience – opportunities to share challenges and successes with a supportive group of peers. 

Eligibility

  • Demonstrated need-based rationale for participation in Advancing Inclusive Computational Research with Archives Research Compute Hub: we will take a number of factors into consideration, including but not limited to stated organizational resources relative to peer organizations, ongoing experience contending with historic and contemporary inequities, as well as levels of national development as assessed by the United Nations Least Developed Countries effort and Human Development Index
  • Organization type: universities, research institutes, libraries, archives, museums, government offices, non-governmental organizations. 

Apply

Submission deadline: 2/26/2024

Decisions communicated to applicants: 3/11/2024

Program begins: 3/25/2024

Apply here. 

Moving Getty.edu “404-ward” With Help From The Internet Archive API

This is a guest post from Teresa Soleau (Digital Preservation Manager), Anders Pollack (Software Engineer), and Neal Johnson (Senior IT Project Manager) from the J. Paul Getty Trust.

Project Background

Getty pursues its mission in Los Angeles and around the world through the work of its constituent programs—Getty Conservation Institute, Getty Foundation, J. Paul Getty Museum, and Getty Research Institute—serving the general interested public and a wide range of professional communities to promote a vital civil society through an understanding of the visual arts. 

In 2019, Getty began a website redesign project, changing the technology stack and updating the way we interact with our communities online. The legacy website contained more than 19,000 web pages and we knew many were no longer useful or relevant and should be retired, possibly after being archived. This led us to leverage the content we’d captured using the Internet Archive’s Archive-It service.

We’d been crawling our site since 2017, but had treated the results more as a record of institutional change over time than as an archival resource to be consulted after deletion of a page. We needed to direct traffic to our Wayback Machine captures, thus ensuring deleted pages remain accessible when a user requests a deprecated URL. We decided to dynamically display a link to the archived page from our site’s 404 error “Page not found” page.

Getty.edu 404 page
Getty.edu 404 error “Page not found” message including the dynamically generated instructions and Internet Archive page link.

The project to audit all existing pages required us to educate content owners across the institution about web archiving practices and purpose. We developed processes for completing human reviews of large amounts of captured content. This work is described in more detail in a 2021 Digital Preservation Coalition blog post that mentions the Web Archives Collecting Policy we developed.

In this blog post we’ll discuss the work required to use the Internet Archive’s data API to add the necessary link on our 404 pages pointing to the most recent Wayback Machine capture of a deleted page.

Technical Underpinnings

getty workflow

Implementation of our Wayback Machine integration was very straightforward from a technical point of view. The first example on the Wayback Machine APIs documentation page gave us the technical guidance we needed to display a link to the most recent capture of any page deleted from our website. With no requirements for authentication, management of keys, or platform-specific software development kit (SDK) dependencies, our development process was simplified. We chose to incorporate the Wayback API using Nuxt.js, the web framework used to build the new Getty.edu site.
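As a rough illustration of the kind of request involved, the sketch below builds a query against the public Wayback Machine availability endpoint (`https://archive.org/wayback/available`) and pulls the closest capture out of the JSON payload. The function names `buildAvailabilityUrl` and `extractSnapshotUrl` are illustrative only and are not taken from the Getty.edu codebase.

```typescript
// Shape of the availability response as shown in the public
// Wayback Machine APIs documentation.
interface ClosestSnapshot {
  available: boolean;
  url: string;
  timestamp: string; // YYYYMMDDhhmmss
  status: string;
}

interface AvailabilityResponse {
  url?: string;
  archived_snapshots: { closest?: ClosestSnapshot };
}

// Build the lookup URL for a given (possibly deleted) page.
// With no timestamp supplied, the endpoint returns the most recent capture.
function buildAvailabilityUrl(pageUrl: string): string {
  const params = new URLSearchParams({ url: pageUrl });
  return `https://archive.org/wayback/available?${params.toString()}`;
}

// Return the archived page link, or null when no capture is available.
function extractSnapshotUrl(body: AvailabilityResponse): string | null {
  const closest = body.archived_snapshots.closest;
  if (!closest || !closest.available) {
    return null;
  }
  return closest.url;
}
```

A 404 page could call `extractSnapshotUrl` on the parsed response and show the link only when the result is not `null`.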

Since the Wayback Machine API is highly performant for simple queries, with a typical response delay in milliseconds, we are able to query the API before rendering the page using a Nuxt route middleware module. API error handling and a request timeout were added to ensure that edge cases such as API failures or network timeouts do not block rendering of the 404 response page.
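The error handling and timeout described above might look something like the following framework-agnostic sketch. It is not the Getty.edu middleware itself: the `findArchivedUrl` name, the injected `Fetcher` type, and the 1500 ms default are assumptions made for illustration. Any failure (network error, non-OK response, or timeout) resolves to `null` so the 404 page can still render without a link.

```typescript
// Minimal fetch-like signature so the lookup can be exercised without a network.
// The global fetch function is compatible with this type.
type Fetcher = (
  url: string,
  init: { signal: AbortSignal }
) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

interface AvailabilityBody {
  archived_snapshots?: { closest?: { available?: boolean; url?: string } };
}

async function findArchivedUrl(
  pageUrl: string,
  fetcher: Fetcher,
  timeoutMs = 1500 // assumed budget; tune to taste
): Promise<string | null> {
  const controller = new AbortController();
  // Abort the request if the API does not answer within the budget.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const endpoint =
      "https://archive.org/wayback/available?url=" + encodeURIComponent(pageUrl);
    const response = await fetcher(endpoint, { signal: controller.signal });
    if (!response.ok) {
      return null;
    }
    const body = (await response.json()) as AvailabilityBody;
    const closest = body.archived_snapshots?.closest;
    if (closest && closest.available && closest.url) {
      return closest.url;
    }
    return null;
  } catch {
    // Network failure, abort, or malformed JSON: never block the 404 page.
    return null;
  } finally {
    clearTimeout(timer);
  }
}
```

In a real page, the global `fetch` could be passed as the `fetcher` argument; injecting it keeps the function easy to test with a stub.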

The only Internet Archive API feature missing for our initial list of requirements was access to snapshot page thumbnails in the JSON data payload received from the API. Access to these images would allow us to enhance our 404 page with a visual cue of archived page content.

Results and Next Steps

Our ability to include a link to an archived version of a deleted web page on our 404 response page helped ease the tough decisions content stakeholders were obliged to make about what content to archive and then delete from the website. We could guarantee availability of content in perpetuity without incurring the long-term cost of maintaining the information ourselves.

The API brings back the most recent Wayback Machine capture by default, which is sometimes not one we created and hasn’t necessarily passed through our archive quality assurance process. We intend to develop our application further so that we privilege the display of Getty’s own page captures. This will ensure we’re delivering the highest quality capture to users.
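One possible way to favor known captures, offered only as a sketch and not as the approach Getty has chosen: the availability endpoint accepts an optional `timestamp` parameter (YYYYMMDDhhmmss) and returns the capture closest to it. A site could therefore keep a record of capture timestamps that passed review and pass the matching one along. The `ReviewedCaptures` map and the `buildPreferredLookup` name are hypothetical.

```typescript
// Hypothetical record of captures that passed an in-house review:
// page URL -> capture timestamp (YYYYMMDDhhmmss).
type ReviewedCaptures = Map<string, string>;

// Build an availability query that asks for the reviewed capture when one
// is on record, and falls back to the most recent capture otherwise.
function buildPreferredLookup(pageUrl: string, reviewed: ReviewedCaptures): string {
  const params = new URLSearchParams({ url: pageUrl });
  const timestamp = reviewed.get(pageUrl);
  if (timestamp !== undefined) {
    params.set("timestamp", timestamp);
  }
  return `https://archive.org/wayback/available?${params.toString()}`;
}
```

Because the endpoint returns the capture closest to the requested time, the response timestamp would still need to be compared against the reviewed one before treating the result as a vetted capture.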

Google Analytics has been configured to report on traffic to our 404 pages and will track clicks on links pointing to Internet Archive pages, providing useful feedback on what portion of archived page traffic is referred from our 404 error page.

To work around the challenge of providing navigational affordances to legacy content and ensure web page titles of old content remain accessible to search engines, we intend to provide an up-to-date index of all archived getty.edu pages.

As we continue to retire obsolete website pages and complete this monumental content archiving and retirement effort, we’re grateful for the Internet Archive API which supports our goal of making archived content accessible in perpetuity.

IMLS National Leadership Grant Supports Expansion of the ARCH Computational Research Platform

In June, we announced the official launch of Archives Research Compute Hub (ARCH), our platform for supporting computational research with digital collections. The Archiving & Data Services group at IA has long provided computational research services via collaborations, dataset services, product features, and other partnerships and software development. In 2020, in partnership with our close collaborators at the Archives Unleashed project, and with funding from the Mellon Foundation, we pursued cooperative technical and community work to make text and data mining services available to any institution building, or researcher using, archival web collections. This led to the release of ARCH, with more than 35 libraries and 60 researchers and curators participating in beta testing and early product pilots. Additional work supported expanding the community of scholars doing computational research using contemporary web collections by providing technical and research support to multi-institutional research teams.

We are pleased to announce that ARCH recently received funding from the Institute of Museum and Library Services (IMLS), via their National Leadership Grants program, supporting ARCH expansion. The project, “Expanding ARCH: Equitable Access to Text and Data Mining Services,” entails two broad areas of work. First, the project will create user-informed workflows and conduct software development that enables a diverse set of partner libraries, archives, and museums to add digital collections of any format (e.g., image collections, text collections) to ARCH for users to study via computational analysis. Working with these partners will help ensure that ARCH can support the needs of organizations of any size that aim to make their digital collections available in new ways. Second, the project will work with librarians and scholars to expand the number and types of data analysis jobs and resulting datasets and data visualizations that can be created using ARCH, including allowing users to build custom research collections that are aggregated from the digital collections of multiple institutions. Expanding the ability for scholars to create aggregated collections and run new data analysis jobs, potentially including artificial intelligence tools, will enable ARCH to significantly increase the type, diversity, scope, and scale of research it supports.

Collaborators on the Expanding ARCH project include a set of institutional partners that will be closely involved in guiding functional requirements, testing designs, and using the newly-built features intended to augment researcher support. Primary institutional partners include University of Denver, University of North Carolina at Chapel Hill, Williams College Museum of Art, and Indianapolis Museum of Art, with additional institutional partners joining in the project’s second year.

Thousands of libraries, archives, museums, and memory organizations work with Internet Archive to build and make openly accessible digitized and born-digital collections. Making these collections available to as many users in as many ways as possible is critical to providing access to knowledge. We are thankful to IMLS for providing the financial support that allows us to expand the ARCH platform to empower new and emerging types of access and research.

Build, Access, Analyze: Introducing ARCH (Archives Research Compute Hub)

We are excited to announce the public availability of ARCH (Archives Research Compute Hub), a new research and education service that helps users easily build, access, and analyze digital collections computationally at scale. ARCH represents a combination of the Internet Archive’s experience supporting computational research for more than a decade by providing large-scale data to researchers and dataset-oriented service integrations like ARS (Archive-It Research Services) and a collaboration with the Archives Unleashed project of the University of Waterloo and York University. Development of ARCH was generously supported by the Mellon Foundation.

ARCH Dashboard

What does ARCH do?

ARCH helps users easily conduct and support computational research with digital collections at scale – e.g., text and data mining, data science, digital scholarship, machine learning, and more. Users can build custom research collections relevant to a wide range of subjects, generate and access research-ready datasets from collections, and analyze those datasets. In line with best practices in reproducibility, ARCH supports open publication and preservation of user-generated datasets. ARCH is currently optimized for working with tens of thousands of web archive collections, covering a broad range of subjects, events, and timeframes, and the platform is actively expanding to include digitized text and image collections. ARCH also works with various portions of the overall Wayback Machine global web archive totaling 50+ PB going back to 1996, representing an extensive archive of contemporary history and communication.

ARCH, In-Browser Visualization

Who is ARCH for? 

ARCH is for any user who seeks an accessible approach to working with digital collections computationally at scale. Possible users include but are not limited to researchers exploring disciplinary questions, educators seeking to foster computational methods in the classroom, journalists tracking changes in web-based communication over time, and librarians and archivists seeking to support the development of computational literacies across disciplines. Recent research efforts making use of ARCH include but are not limited to analysis of COVID-19 crisis communications, health misinformation, Latin American women’s rights movements, and post-conflict societies during reconciliation. 

ARCH, Generate Datasets

What are core ARCH features?

Build: Leverage ARCH capabilities to build custom research collections that are well scoped for specific research and education purposes.

Access: Generate more than a dozen different research-ready datasets (e.g., full text, images, PDFs, graph data, and more) from digital collections with the click of a button. Download generated datasets directly in-browser or via API. 

Analyze: Easily work with research-ready datasets in interactive computational environments and applications like Jupyter Notebooks, Google Colab, Gephi, and Voyant, and produce in-browser visualizations.

Publish and Preserve: Openly publish datasets in line with best practices in reproducible research. All published datasets will be preserved in perpetuity. 

Support: Make use of synchronous and asynchronous technical support, online trainings, and extensive help center documentation.

How can I learn more about ARCH?

To learn more about ARCH please reach out via the following form