Category Archives: Web & Data Services

Community Webs Digitization Grant Reveals Stories of San Francisco’s Immigrant Communities

The following guest post from Christina Moretta, Photo Curator and Acting San Francisco History Center Manager at San Francisco Public Library, is part of a series written by members of Internet Archive’s Community Webs program. Community Webs advances the capacity of community-focused memory organizations to build web and digital archives documenting local histories.

San Francisco History Center (SFHC) of the San Francisco Public Library (SFPL) is the official archive for the City and County of San Francisco. SFHC serves all library users and levels of interest, from the merely curious to those engaging in scholarly research. Because of the Center’s archival function, it also administers the archival collections of the James C. Hormel LGBTQIA Center.

Internet Archive has supported our work to preserve and provide access to San Francisco’s history in many ways. Since 2007, Internet Archive has hosted SFPL digitized content, including local documents and city directories. In 2017, SFPL became one of the first members of Internet Archive’s Community Webs program. This program has provided us with the tools we need to preserve local web-based content that will be important for future researchers investigating San Francisco’s history.

In 2023, the Community Webs program was awarded a grant from the National Historical Publications and Records Commission for the “Collaborative Access to Diverse Public Library Local History Collections” project. This grant supported the digitization of local history collections from libraries across the country, including SFPL. With this support, 23 bound volumes of a Chinese/English language newspaper East/West and 4 cartons of oral histories from the Paul Radin Papers were digitized by Internet Archive.

Cover of East/West, 1968, Vol. 2, no. 23

East/West

The East/West (Dong xi bao) newspaper was acquired the easy way: through an original subscription by SFPL’s Periodical Department in the late 1960s. Only a handful of institutions hold East/West, and they have it only on microfilm. SFHC has the complete run in paper format.

In late 1966, Gordon Lew and two Chinese newspaper colleagues, Kenneth Joe and Ken Wong, conceived East/West, a bilingual weekly newspaper published out of San Francisco’s Chinatown. The inaugural issue appeared in January 1967, and the newspaper ran for over twenty-two years, with the last issue in September 1989. Lew became the publisher and editor, Joe worked in the Chinese section, and Wong was the principal writer in the English section. East/West was an important community newspaper, with extensive coverage of local Chinatown news, social activities, the work of Chinese American political figures, and international developments such as the normalization of U.S.-China relations.

East/West was published in English and Chinese, and for many years the two sections had approximately the same number of pages. The editorial, and perhaps the main news article, from the English section would be translated into Chinese. The Chinese section tended to focus more on culture, arts, and history, and it often reprinted articles from other sources. From the very beginning, advertisements for local businesses and services filled both sections. Most were community ads, as the newspaper served the nonprofit organizations that arose in the wake of the Chinese American and Asian American empowerment movements of the 1960s and 1970s.

Miss Chinatown, East/West, 1977, Vol. 11, no. 9, p. 14

Researchers and scholars of 20th-century Chinese American communities in the United States will appreciate the online availability of this unique resource. Many important issues arose in Chinese America and Asian America starting in the late 1960s, and East/West documents them from the community’s perspective. As a bilingual publication, the newspaper captured and shared the voice of the community. In addition, San Francisco Chinese Americans had limited political power in the 1960s. East/West focused on emerging Chinese American political figures and urged the community to increase its voting and general political participation.

Browse East/West on archive.org

Paul Radin Papers

In 2003, the Paul Radin Papers were donated to the SFHC by Professor Luis S. Kemnitzer of San Francisco State University on behalf of Calvin Fast Wolf and Mary Sacharoff-Fast Wolf. Mary Wolf, a would-be biographer of Radin, had acquired original papers from her friend Doris Woodward Radin, Radin’s widow, as well as from colleagues.

Dr. Paul Radin (1883-1959) is considered one of the formative influences on contemporary anthropology and ethnography in the United States and Europe. The bulk of the Paul Radin Papers consists of surveys from Radin’s supervision of over 200 workers who interviewed members of ethnic groups in the San Francisco Bay Area for the State Emergency Relief Administration of California (SERA) over a period of nine months in 1934-1935. The project was known as SERA project 2-F2-98 (3-F2-145), and its abstract was published in 1935 as The Survey of San Francisco’s Minorities: Its Purpose and Results. The stated purpose was a cultural survey to find employment for “white collar” unemployed workers on temporary relief. Radin’s focus was “to study the steps in the adjustment and assimilation of minority groups in San Francisco and Alameda counties.” Bypassing a typical questionnaire method, Radin instead had the amateur interviewers record anything and everything the interviewees wished to say. The results appear in a narrative format, sometimes in the form of poetry and short stories, and encompass all manner of immigrant experiences. Survey materials include typed and handwritten interviews and research on ethnic groups. Some interviewers identify themselves, and their reports appear in their own handwriting.

Jon Y. Lee’s notes, Paul Radin Papers

A portion of the Paul Radin Papers consists of the papers of SERA worker Jon Y. Lee, including material for The Golden Mountain. Lee was the son of Chinese immigrants who settled in Oakland, California. Radin hired Lee as a fieldworker to collect Chinatown traditions in Oakland. Today, Lee is recognized as the first Asian American to work professionally as a folklorist.

With this collection online, international scholars can now easily access narratives about the immigrant experience from their own country or region to support their diaspora studies. The typed descriptions are discoverable through OCR, allowing researchers to gather more information on the San Francisco immigrant experience in the 1910s and 1920s.

Mrs. R narrative, Paul Radin Papers

Browse the Paul Radin Papers on archive.org


Internet Archive and Community Webs are thankful for the support from the National Historical Publications and Records Commission for Collaborative Access to Diverse Public Library Local History Collections, which will digitize and provide access to a diverse range of local history archives that represent the experiences of immigrant, indigenous, and African American communities throughout the United States.

Update on the 2024/2025 End of Term Web Archive

Whitehouse.gov captures from: 2008 Sept. 15; 2013 Mar. 21; 2017 Feb. 3; and 2021 Feb. 25

Every four years, before and after the U.S. presidential election, a team of libraries and research organizations, including the Internet Archive, works together to preserve material from U.S. government websites during the transition of administrations.

These “End of Term” (EOT) Web Archive projects have been completed for term transitions in 2004, 2008, 2012, 2016, and 2020, with 2024 well underway. The effort preserves a record of the U.S. government as it changes over time for historical and research purposes.

With two-thirds of the process complete, the 2024/2025 EOT crawl has collected more than 500 terabytes of material, including more than 100 million unique web pages. All this information, produced by the U.S. government—the largest publisher in the world—is preserved and available for public access at the Internet Archive.

“Access by the people to the records and output of the government is critical,” said Mark Graham, director of the Internet Archive’s Wayback Machine and a participant in the EOT Web Archive project. “Much of the material published by the government has health, safety, security and education benefits for us all.”

The EOT Web Archive project is part of the Internet Archive’s daily routine of recording what’s happening on the web. For more than 25 years, the Internet Archive has worked to preserve material from web-based social media platforms, news sources, governments, and elsewhere across the web. Access to these preserved web pages is provided by the Wayback Machine. “It’s just part of what we do day in and day out,” Graham said. 

To support the EOT Web Archive project, the Internet Archive devotes staff and technical infrastructure to preserving U.S. government sites. The web archives are based on seed lists of government websites and nominations from the general public. Coverage includes websites in the .gov and .mil web domains, as well as government websites hosted on .org, .edu, and other top-level domains.

The Internet Archive provides a variety of discovery and access interfaces to help the public search and understand the material, including APIs and a full-text index of the collection. Researchers, journalists, students, and citizens from across the political spectrum rely on these archives to help understand changes in policy, regulations, staffing, and other dimensions of the U.S. government.

As an added layer of preservation, the 2024/2025 EOT Web Archive will be uploaded to the Filecoin network for long-term storage, where previous term archives are already stored. While separate from the EOT collaboration, this effort is part of the Internet Archive’s Democracy’s Library project. Filecoin Foundation (FF) and Filecoin Foundation for the Decentralized Web (FFDW) support Democracy’s Library to ensure public access to government research and publications worldwide.

According to Graham, the large volume of material in the 2024/2025 EOT crawl reflects two factors: the team gets better with experience every term, and the increasing use of the web as a publishing platform means more material to archive. He also credits the EOT Web Archive’s success to the support and collaboration of its partners.

Web archiving is more than just preserving history: it’s about ensuring access to information for future generations. The End of Term Web Archive serves to safeguard versions of government websites that might otherwise be lost. By preserving this information and making it accessible, the EOT Web Archive has empowered researchers, journalists and citizens to trace the evolution of government policies and decisions.

More questions? Visit https://eotarchive.org/ to learn more about the End of Term Web Archive.

Columbus Neighborhood Newspapers Showcase the City’s Diverse Communities

The following guest post from Aaron O’Donovan (aodonovan@columbuslibrary.org), Columbus Metropolitan Library Special Collections Manager, is part of a series written by members of the Internet Archive’s Community Webs program. Community Webs advances the capacity of community-focused memory organizations to build web and digital archives documenting local histories and underrepresented voices. 

As a local history and genealogy department in a public library, our materials run the gamut from books from the 1700s about the creation of our country to yearbooks of local high schools that patrons like to peruse for nostalgia’s sake. In addition to our approximately 90,000 reference books, our archives room holds approximately 2,500 linear feet of photographic material, records, and manuscript material. We are constantly seeking new opportunities to expand access to our collections for our patrons, and when the opportunity arose to digitize materials as part of the Community Webs program, I knew what I wanted to digitize first: local neighborhood newspapers of Columbus.  

Ohio Columbian, February 24, 1853

We joined the Community Webs program in 2017 to archive important cultural and local government websites of Columbus, Ohio. The catalyst for the project was the belief that we had done a good job of telling the story of Columbus in its first 150 years, but we were missing the story of the city’s evolution in the more recent past and failing to record the present. With the goal of capturing more recent changes to our city, we focused on archiving our city government website as well as social service websites, especially those helping new immigrants in our city. Because of the Community Webs program, we were able to take a snapshot of the diverse populations making their homes in Columbus, and web archiving was the only medium through which we could tell the stories of these new immigrant communities, including the Somali, Nepalese, Bhutanese, and Mexican populations. To further this focus on migration patterns into Columbus, we felt it was important to make the neighborhood newspapers we held on microfilm accessible, because they featured stories and obituaries on immigrant populations who came to Columbus in the mid-19th and early 20th centuries.

The newspapers had been preserved on microfilm for decades, but we were never able to digitize them due to the time commitment a project of that size involves. During my time in the local history field in Columbus, it has become clear to me that our library patrons crave hyper-local history material that personally connects their stories to the place they live. While general local history topics about Columbus are popular, nothing is more popular in our library than content generated from Columbus neighborhoods. The opportunity to finally digitize neighborhood newspapers and make them accessible to our patrons was one I could not pass up.

Columbus Call and Post, May 3, 1975

The most important newspaper for the library to digitize was the Columbus Call and Post, a historic Black newspaper that served Columbus from 1962 to 2007. For years patrons asked us whether the newspaper had been digitized, but unfortunately the library only had microfilm starting in 1972, which was very difficult to browse and ultimately did not meet our patrons’ needs for accessibility. Because the Internet Archive performed optical character recognition (OCR) on the text of the newspapers, researchers can now use keyword searching to find an address, a business name, or a personal name in news stories that mention the people and places they hold in their memory.

Digitizing the microfilm of the Call and Post also complemented another project we began several years ago when we partnered with the King Arts Complex to digitize the photograph archive of the iconic newspaper, which was donated to the organization in the mid-1990s. Many of the photographs in the collection have little to no information attached to them (information written on the back of the photographic prints, the name of the photographer, etc.). Digitization of the Call and Post provided additional information to match and apply to the photographs in the archive, adding an enhanced level of searchability and accessibility to this collection. The collections work together to preserve Black history in a way that was not possible before because much of the content from the Call and Post was unique and rare. Being able to bring this newspaper back into the public consciousness has been a thrilling experience for us.

Congressman Adam Clayton Powell Jr. in Columbus, Columbus Call and Post Photograph Collection

As the project continued to take shape, we felt it was important to represent Columbus neighborhoods geographically, which also enabled us to represent economically and ethnically diverse communities throughout Columbus history. Our most accessed newspaper thus far has been the Hilltop Record, which covered a neighborhood with strong Appalachian ties and has a long history of covering the issues of working-class citizens on the west side of Columbus. Other digitized community newspapers include:

· The Eastern Spectator and Eastern Review offer perspectives from the city’s Jewish community.

· The Southside Booster and Southside Leader share the industrial and union history of Columbus.

· The Linden NE News showcases stories from north Columbus, an area that has experienced several demographic shifts throughout its 100 years of history.

Hilltop Record, November 8, 1928

The rarest newspapers digitized for this project were also some of the oldest newspapers preserved on microfilm in our collection. Among those titles is the Ohio Columbian (1853-1856), an anti-slavery newspaper that reported on Underground Railroad activities as they were happening in Ohio and surrounding states. It has the potential to illuminate our understanding of individuals who were involved in assisting enslaved people seeking freedom in the 1850s. Other newspapers with great research potential include early (and shorter) runs of Black newspapers that had not been digitized before this project, including The Columbus Recorder (1927), The Columbus Voice (1929), which was edited by Florence W. Oakfield, and The Ohio Torch (1928-1930), the longest-running newspaper for the Black community during the 1920s. We are excited to report that researchers are already using these resources to understand Columbus history more objectively and completely.

With this support from the Internet Archive and the National Historical Publications and Records Commission, we have been able to help our local users find information that was not available elsewhere. Recently, a researcher requested an obituary from June 1964, when our two major newspapers were on strike. Thankfully, the South Side Spectator had been digitized and was available through the Internet Archive, and our librarian was able to locate the obituary, which appeared only in that newspaper. We also received this enthusiastic email from a regular library patron after we told them that the Hilltop Record had been digitized and was now keyword searchable on the Internet Archive: “OH MY GOSH! ARE YOU SERIOUS!?! THAT’S FANTASTIC! Have I told you lately how much I love you guys? You rock my world! Thank you so much for everything you do. I am so grateful for everyone in Local History & Genealogy.”

Moreover, the librarians are using the digitized newspapers in regular programming, furthering our promotion of these new digitized collections. Every month the library hosts a virtual Black Heritage Collection Spotlight on a notable person or topic from Black history in Columbus. The images and news articles from the digitized Call and Post are used frequently for the program, and we look forward to learning about more ways the digitized newspapers are used in local research to highlight and deepen our community’s connections to Columbus’ past. 

Browse the Columbus Neighborhood Newspapers Collection on archive.org.


The Internet Archive and Community Webs are thankful for the support from the National Historical Publications & Records Commission for Collaborative Access to Diverse Public Library Local History Collections, which will digitize and provide access to a diverse range of local history archives that represent the experiences of immigrant, indigenous, and African American communities throughout the United States.

Internet Archive Services Update: 2024-10-17


Last week, along with a DDoS attack and the exposure of patron email addresses and encrypted passwords, the Internet Archive’s website JavaScript was defaced, leading us to take the site down to assess and improve our security.

The stored data of the Internet Archive is safe, and we are working to resume services safely. This new reality requires heightened attention to cybersecurity, and we are responding. We apologize for the disruption caused by these library services being unavailable.

The Wayback Machine, Archive-It, scanning, and national library crawls have resumed, as have email, blog, helpdesk, and social media communications. Our team is working around the clock across time zones to bring other services back online. In the coming days more services will resume, some starting in read-only mode, as full restoration will take more time.

We’re taking a cautious, deliberate approach to rebuild and strengthen our defenses. Our priority is ensuring the Internet Archive comes online stronger and more secure.

As a library community, we are seeing other cyber attacks—for instance the British Library, Seattle Public Library, Toronto Public Library, and now Calgary Public Library. We hope these attacks are not indicative of a trend.

For the latest updates, please check this blog and our official social media accounts: X/Twitter, Bluesky and Mastodon.

Thank you for your patience and ongoing support.

Illuminating the Stories of Brooklynites Through Digitized Directories

The following guest post from Dee Bowers (they/them), Archives Manager at the Brooklyn Public Library Center for Brooklyn History, is part of a series written by members of the Internet Archive’s Community Webs program. Community Webs advances the capacity of community-focused memory organizations to build web and digital archives documenting local histories and underrepresented voices.

Some say as many as one in seven Americans have family roots in Brooklyn, and I expect the newly digitized Brooklyn city directories now available through the Internet Archive will get heavy use from genealogists, historians, authors, journalists, students, and even artists to trace connections to the diverse and ever-changing borough.

Title page, Spooner’s Brooklyn Directory 1822. Brooklyn Public Library, Center for Brooklyn History.

What is now the Center for Brooklyn History first joined the Internet Archive’s Community Webs program in 2017 as part of the original cohort. This program gave us the tools and training we needed to save over 2TB of web-based Brooklyn history content, including over 1,000 individual URLs. We also host our digitized high school newspapers and audiovisual material on the Internet Archive.

In addition to helping us preserve this web-based content, Community Webs has now also made it possible to increase access to our physical collections through digitization. As part of the Collaborative Access to Diverse Public Library Local History Collections project, made possible by a grant from the National Historical Publications and Records Commission, we were able to partner with the Internet Archive to digitize 236 microfiche sheets of Brooklyn city directories. 

Microfiche sheet from the Brooklyn city directories, 1822. Brooklyn Public Library, Center for Brooklyn History.

These directories show the movement, growth, and changing nature of immigrant populations in Brooklyn in the early-to-mid-19th century and help document the immigrant experience by providing data on the residency and, in some cases, ethnicities of Brooklynites over time. We knew that expanding digital access would be extremely useful to the many researchers who use our online resources, especially since our number one research topic is genealogy. The project is also directly in line with our mission:

Democratize access to Brooklyn’s history and be dedicated to expanding and diversifying representation of the history of the borough by unifying resources and expertise, and broadening reach and impact.

By increasing the visibility of these collections through digitization and free public access, we give researchers and historians a richer, more accessible view into the diversity of American history. The history of Brooklyn is extraordinarily diverse but, like many archives, our collections don’t always tell the fullness of those stories. By expanding access to our city directories, we provide insight into earlier residents of Brooklyn and enable diverse communities to trace their Brooklyn roots to a greater degree.

Screenshot of the early Brooklyn directories in the Internet Archive.

Here’s an example of how the directories look in the Internet Archive. As the screenshot above shows, they include content beyond directory listings. In this case, there’s a chronological listing of “memoranda” (notable moments in Brooklyn history), including “June 11, 1812 – News received in Brooklyn, of the declaration of war between the United States and Great Britain.”

One example of research that can be conducted with these directories is finding out more about early Black Brooklynites. Slavery was abolished in New York State in 1827, so the earliest days of post-enslavement Brooklyn are represented in the digitized directories.

Screenshot of 1857 directory on the Internet Archive with the highlighted surname “Hodges.”

By searching the text of the directories using keywords, I picked out an individual to learn more about, Rev. William J. Hodges, who lived on Broadway in Brooklyn in 1857. By cross-referencing with our digitized newspapers, I was able to find out more about him and his abolitionist activism in Brooklyn and beyond. It turns out he was not born in Brooklyn, nor did he reside there very long, but he did make an impact during his time there, as he founded the Colored Political Association of Kings County (which is the modern-day borough of Brooklyn).

Black and white newspaper clipping describing a “colored indignation meeting” in which William Hodges took part.
“Local Items,” June 5, 1856, Brooklyn Times Union, page 2.

If not for the digitized city directories, I doubt I ever would have learned of Rev. Hodges and his time in Brooklyn. I hope that many more stories like these will emerge once researchers start digging into these directories.

Directory advertisement for T. Reeve, Architect and Builder.

The directories also contain items like this – an advertisement showing this architect and builder’s office on Schermerhorn Street in Downtown Brooklyn. This part of Brooklyn looks very different now, and this insight into what it looked like pre-photography is invaluable, particularly for people conducting house, building, and neighborhood research.

The directories are linked on our Search Our Collections page. We also have a tutorial for using the digitized directories. Additionally, we have several related research guides which assist researchers in exploring various topics. These materials are in the public domain, and we hope they will be used for a broad spectrum of applications, from family research to demographic research to writing to artwork. We are grateful to Community Webs, the Internet Archive, and the National Historical Publications and Records Commission for making this material available and searchable online and allowing us to expand access across the borough, city, and beyond.

Browse the Brooklyn City Directories on archive.org.


The Internet Archive and Community Webs are thankful for the support from the National Historical Publications & Records Commission for Collaborative Access to Diverse Public Library Local History Collections, which will digitize and provide access to a diverse range of local history archives that represent the experiences of immigrant, indigenous, and African American communities throughout the United States.

Public Libraries Meet to Advance Community Archiving

On August 13, Community Webs members from all over the US and Canada gathered in Chicago for the 2024 Community Webs National Symposium. Launched in 2017, Internet Archive’s Community Webs program empowers public libraries and other cultural heritage organizations to document their communities. Members of the program receive access to Internet Archive’s Archive-It web archiving service and Vault digital preservation service as well as training, technical support, and opportunities for professional development.

Members of Internet Archive’s Community Webs program at the Community Webs National Symposium


This event was made possible in part by support from the Mellon Foundation. Held at the Museum of Contemporary Art Chicago, this year’s symposium was an opportunity for members to learn together and connect with each other. The day was organized around two workshops designed to support the community archiving and digital preservation work happening across Community Webs member institutions.

The first workshop, “Collective Wisdom: Collaborative Learning to Support Your Community Archiving Projects,” was taught by Natalie Milbrodt, CUNY University Archivist and co-founder of the Queens Memory Project. Attendees spent time working in small groups to create definitions of “Community Archiving” and reflect on some of the shared challenges and opportunities they were experiencing when engaging in community-centered work. This workshop emphasized the value of the collective wisdom of Community Webs members and will inform future educational opportunities. The community archives focus of this workshop also supported the Community Webs Affiliates Program, which encourages relationship-building among public libraries and other community-focused cultural heritage and social service organizations to broaden access to archiving tools for documenting the lives of their patrons.

Attendees work together to discuss strategies for documenting their communities

In the second half of the day, Stacey Erdman and Jaime Schumacher of Digital POWRR led a “Walk the Workflow” workshop, which demonstrated a step-by-step digital preservation process using a variety of free preservation tools, including Internet Archive’s Vault digital preservation system.

A main goal for the symposium was to provide an opportunity for Community Webs members to connect and learn from each other. Throughout the day, attendees discussed projects, shared ideas, described lessons learned, and brainstormed possible avenues for future collaboration.

A digital preservation workshop provided attendees with strategies for supporting long term preservation of digital collections

The following day, Community Webs members toured the Chicago Public Library Special Collections. Johanna Russ, Unit Head for Special Collections, gave a presentation about the complex, multi-year project CPL undertook to preserve and provide access to the records of the Chicago Park District. Highlights from this collection were available for attendees to view in the reading room.

That afternoon, the Archive-It Partner Meeting provided opportunities for Community Webs members and other Archive-It users to spend some time with Internet Archive staff to discuss topics such as strategies for capturing social media and making web archives more useful. 

Community Webs members view highlights from Chicago Public Library’s special collections

In-person events like this are instrumental in achieving a key goal of the program: offering opportunities for networking and professional development for Community Webs members. Internet Archive’s support for this national network of practitioners empowers their work on a local level to preserve and provide access to digital heritage sources reflecting the unique life and culture of their communities.

Interested in learning more about Community Webs? Explore Community Webs collections, read the latest program news, or apply to join!

Diversifying Access to the Local Historical Record with Community Webs

Community Webs partners on the NEH-supported project “Increasing Access to Diverse Public Library Local History Collections”

Since 2017, Community Webs has partnered with public libraries and heritage organizations to document and diversify the historical record. These organizations have collectively archived over 100 terabytes of web-based community heritage materials, including more than 800 collections documenting the lives of those often underrepresented in history. In 2023, Community Webs began offering collection digitization and access with support from the National Historical Publications and Records Commission (NHPRC). Today, Community Webs is happy to announce $345,000 in additional support from the National Endowment for the Humanities to digitize and provide open access to more than 411,000 local history collection items from seven Community Webs partners: Athens-Clarke County Library, Belen Public Library, District of Columbia Public Library, Evanston History Center, Jersey City Free Public Library, San Francisco Public Library, and William B. Harlan Memorial Library.

Community Webs partner collections include a diverse range of content from across the country representing the lives of immigrant, Black, and other minority communities throughout US history. This includes records created by and for them, such as the Julius Hobson Papers from District of Columbia Public Library, the Belen Harvey House Collection from Belen Public Library, and the Local and Regional Family Histories collection from the William B. Harlan Memorial Library.

ACE Newsletter, Vol. 1, No. 3, Julius Hobson Papers on Federal Job Discrimination

The collections also contain items documenting city and municipal agencies whose work significantly affects minority communities. Digitizing this material will deepen understanding of how systems of power and legal structures can regulate or even erase minority community histories, especially with regard to housing and economic opportunities. For example, the Athens City Engineer Records from Athens-Clarke County Library, the African American Housing and History collection from Evanston History Center, and the San Francisco Redevelopment Agency Records from San Francisco Public Library show the impact of urban redevelopment on Black and minority neighborhoods. The Municipal Records and agency scrapbooks from Jersey City Free Public Library show the ways that politics and economic changes affected immigrant and minority communities.

Ashley Shull, Collections Coordinator, Athens-Clarke County Library, shares what this project means to the community:

“The opportunity to be involved in a project proposal like this with the Internet Archive and our other library partners is invaluable to our community. The increased access to our Athens City Engineer collection will provide, not only local citizens, but academic researchers from around the world as well as current Athens-Clarke County Government officials insight into the past planning activities of our community. This is especially important as our local government embarks on a new Comprehensive Community Plan.”

John Beekman, Chief Librarian, Jersey City Free Public Library, also emphasized the impact of access to important city records:

“The Jersey City Free Public Library is honored to work with esteemed libraries from across the country on this innovative project spearheaded by the Internet Archive’s Community Webs program. The municipal minutes and records that make up the bulk of our contribution contain a wealth of information, not only on the workings of city government and agencies, but the people whose work is recorded there. Names and activities present in these records that never made the news will now be discoverable through search rather than the needle-in-a-haystack experience of poring over individual volumes of minutes. Making these materials accessible will provide a tool for enriching the record of city life across the 19th and 20th centuries.”

Hunters Point housing phase one map with unit totals, San Francisco Redevelopment Agency Records. Hunters Point Project Area A. Photographs

The Community Webs program’s core goals are to increase the diversity of voices represented in the accessible historical record and to forge authentic partnerships between public libraries and heritage organizations that are members of Community Webs and the communities, individuals, and researchers they serve. Digitizing these collections will expand the overall amount and diversity of locally-focused community archives available online to users, and will augment the web and digital collections that are already aggregated by Community Webs. Records will also be shared with the Digital Public Library of America, further strengthening collection discovery. 


The Internet Archive and Community Webs are thankful for support from the National Endowment for the Humanities.

Learn more about Community Webs members, projects, and collections on our blog. Get in touch with us at commwebs@archive.org to discover ways to partner to preserve local history!

Community Webs Receives $750,000 Grant to Expand Community Archiving by Public Libraries

Started in 2017, our Community Webs program has over 175 public libraries and local cultural organizations working to build digital archives documenting the experiences of their communities, especially those patrons often underrepresented in traditional archives. Participating public libraries have created over 1,400 collections documenting local civic life totaling nearly 100 terabytes and tens of millions of individual documents, images, audio/video files, blogs, websites, social media, and more. You can browse many of these collections at the Community Webs website. Participants have also collaborated on digitization efforts to bring minority newspapers online, held public programming and outreach events, and formed local partnerships to help preservation efforts at other mission-aligned organizations. The program has conducted numerous workshops and national symposia to help public librarians gain expertise in digital preservation and cohort members have done dozens of presentations at professional conferences showcasing their work. In the past, Community Webs has received support from the Institute of Museum and Library Services, the Mellon Foundation, the Kahle Austin Foundation, and the National Historical Publications and Records Commission.

We are excited to announce that Community Webs has received $750,000 in funding from The Mellon Foundation to continue expanding the program. The award will allow additional public libraries to join the program and will enable new and existing members to continue their web archiving collection building using our Archive-It service. The funding will also give members access to Internet Archive’s new Vault digital preservation service, enabling them to build and preserve collections of any type of digital materials. Lastly, leveraging members’ prior success in local partnerships, Community Webs will now include an “Affiliates” program so member public libraries can nominate local nonprofit partners that can also receive access to archiving services and resources. Funding will also support the continuation of the program’s professional development training in digital preservation and community archiving, as well as its cohort- and community-building activities, such as workshops, events, and symposia.

We thank The Andrew W. Mellon Foundation for their generous support of Community Webs. We are excited to continue to expand the program and empower hundreds of public librarians to build archives that document the voices, lives, and events of their communities and to ensure this material is permanently available to patrons, students, scholars, and citizens.

Call for Proposals: Advancing Inclusive Computational Research with Archives Research Compute Hub

Last summer, Internet Archive launched ARCH (Archives Research Compute Hub), a research service that supports the creation, computational analysis, sharing, and preservation of research datasets drawn from terabytes and even petabytes of data from digital collections, with an initial focus on web archive collections. In line with Internet Archive’s mission to provide “universal access to all knowledge,” we aim to make ARCH as universally accessible as possible.

Computational research and education cannot remain accessible solely to the world’s most well-resourced organizations. With philanthropic support, Internet Archive is initiating Advancing Inclusive Computational Research with ARCH, a pilot program specifically designed to support an initial cohort of five less well-resourced organizations throughout the world.

Opportunity

  • Organizational access to ARCH for 1 year – supporting research teams, pedagogical efforts, and/or library, archive, and museum worker experimentation.  
  • Access to thousands of curated web archive collections – abundant thematic range with potential to drive multidisciplinary research and education. 
  • Enhanced Internet Archive training and support – expert synchronous and asynchronous support from Internet Archive staff. 
  • Cohort experience – opportunities to share challenges and successes with a supportive group of peers. 

Eligibility

  • Demonstrated need-based rationale for participation in Advancing Inclusive Computational Research with Archives Research Compute Hub: we will take a number of factors into consideration, including but not limited to stated organizational resources relative to peer organizations, ongoing experience contending with historic and contemporary inequities, and levels of national development as assessed by the United Nations Least Developed Countries effort and the Human Development Index.
  • Organization type: universities, research institutes, libraries, archives, museums, government offices, non-governmental organizations. 

Apply

Submission deadline: 2/26/2024

Decisions communicated to applicants: 3/11/2024

Program begins: 3/25/2024

Apply here. 

Moving Getty.edu “404-ward” With Help From The Internet Archive API

This is a guest post from Teresa Soleau (Digital Preservation Manager), Anders Pollack (Software Engineer), and Neal Johnson (Senior IT Project Manager) from the J. Paul Getty Trust.

Project Background

Getty pursues its mission in Los Angeles and around the world through the work of its constituent programs—Getty Conservation Institute, Getty Foundation, J. Paul Getty Museum, and Getty Research Institute—serving the general interested public and a wide range of professional communities to promote a vital civil society through an understanding of the visual arts. 

In 2019, Getty began a website redesign project, changing the technology stack and updating the way we interact with our communities online. The legacy website contained more than 19,000 web pages and we knew many were no longer useful or relevant and should be retired, possibly after being archived. This led us to leverage the content we’d captured using the Internet Archive’s Archive-It service.

We’d been crawling our site since 2017, but had treated the results more as a record of institutional change over time than as an archival resource to be consulted after deletion of a page. We needed to direct traffic to our Wayback Machine captures thus ensuring deleted pages remain accessible when a user requests a deprecated URL. We decided to dynamically display a link to the archived page from our site’s 404 error “Page not found” page.

Getty.edu 404 error “Page not found” message including the dynamically generated instructions and Internet Archive page link.

The project to audit all existing pages required us to educate content owners across the institution about web archiving practices and purpose. We developed processes for completing human reviews of large amounts of captured content. This work is described in more detail in a 2021 Digital Preservation Coalition blog post that mentions the Web Archives Collecting Policy we developed.

In this blog post we’ll discuss the work required to use the Internet Archive’s data API to add the necessary link on our 404 pages pointing to the most recent Wayback Machine capture of a deleted page.

Technical Underpinnings

getty workflow

From a technical point of view, implementing our Wayback Machine integration was very straightforward. The first example on the Wayback Machine APIs documentation page gave us all the guidance we needed to display a link to the most recent capture of any page deleted from our website. With no authentication, no key management, and no platform-specific software development kit (SDK) dependencies, our development process was simplified. We chose to incorporate the Wayback API using Nuxt.js, the web framework used to build the new Getty.edu site.
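The Wayback Machine APIs page opens with the Wayback Availability JSON API. Getty’s own Nuxt.js code is not shown in this post, so the following is only a minimal Python sketch of that kind of lookup. It assumes the public endpoint `https://archive.org/wayback/available` and its documented response shape, in which `archived_snapshots.closest` carries the capture’s `url` and an `available` flag; the function names are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Public Wayback Availability JSON API endpoint (assumed from the API docs).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query_url(page_url: str) -> str:
    """Build the request URL asking for the closest capture of page_url."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': page_url})}"


def closest_snapshot_url(payload: str) -> Optional[str]:
    """Extract the replay URL of the closest capture from a JSON response.

    Returns None when no capture exists; in that case the API sends an
    empty "archived_snapshots" object.
    """
    data = json.loads(payload)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

Without a `timestamp` parameter the API returns the most recent capture, which matches the default behavior described later in this post. A `None` result is the cue to show a plain 404 message with no archive link.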

Since the Wayback Machine API is highly performant for simple queries, with a typical response delay in milliseconds, we are able to query the API before rendering the page using a Nuxt route middleware module. API error handling and a request timeout were added to ensure that edge cases such as API failures or network timeouts do not block rendering of the 404 response page.
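The safeguard described above, a request timeout plus error handling so the 404 page always renders, can be sketched as follows. This is an illustrative Python version, not Getty’s Nuxt route middleware. The 0.5-second timeout is an assumption, and the `opener` parameter is a hypothetical hook added so the function can be exercised without network access.

```python
import http.client
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Availability JSON API endpoint (assumed from the API docs).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def find_archived_link(page_url: str, timeout_s: float = 0.5,
                       opener=urlopen) -> Optional[str]:
    """Return the closest capture's replay URL, or None on any failure.

    Network errors, timeouts, HTTP errors, and malformed JSON all yield
    None so the caller can still render a plain 404 page.
    """
    query = f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': page_url})}"
    try:
        with opener(query, timeout=timeout_s) as response:
            data = json.load(response)
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError, HTTPError and socket timeouts;
        # ValueError covers json.JSONDecodeError.
        return None
    if not isinstance(data, dict):
        return None
    snapshots = data.get("archived_snapshots")
    closest = snapshots.get("closest") if isinstance(snapshots, dict) else None
    if isinstance(closest, dict) and closest.get("available"):
        return closest.get("url")
    return None
```

Because every failure collapses to `None`, the page template only has to choose between two states (link or no link), so a slow or unavailable API can never block the error page.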

The only Internet Archive API feature missing for our initial list of requirements was access to snapshot page thumbnails in the JSON data payload received from the API. Access to these images would allow us to enhance our 404 page with a visual cue of archived page content.

Results and Next Steps

Our ability to include a link to an archived version of a deleted web page on our 404 response page helped ease the tough decisions content stakeholders were obliged to make about what content to archive and then delete from the website. We could guarantee availability of content in perpetuity without incurring the long-term cost of maintaining the information ourselves.

By default, the API returns the most recent Wayback Machine capture, which is sometimes not one we created and hasn’t necessarily passed through our archive quality assurance process. We intend to develop our application further so that it privileges the display of Getty’s own page captures. This will ensure we’re delivering the highest quality capture to users.
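One way to privilege in-house captures is a simple preference rule: take the newest capture from the institution’s own crawls if one exists, otherwise fall back to the newest capture overall. The sketch below assumes a hypothetical `Capture` record with an `own_crawl` flag. How that flag would be populated (for example, by also consulting the institution’s Archive-It collection) is left open, and none of this reflects code Getty has published.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Capture:
    url: str          # replay URL of the capture
    timestamp: str    # 14-digit Wayback timestamp, e.g. "20210315120000"
    own_crawl: bool   # hypothetical flag: True if it came from our own QA'd crawls


def preferred_capture(captures: Iterable[Capture]) -> Optional[Capture]:
    """Pick the newest capture from our own crawls; otherwise the newest overall."""
    pool = list(captures)
    if not pool:
        return None
    own = [c for c in pool if c.own_crawl]
    candidates = own or pool
    # 14-digit timestamps sort lexicographically in chronological order.
    return max(candidates, key=lambda c: c.timestamp)
```

With this rule an older in-house capture wins over a newer third-party one, which trades recency for quality assurance.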

Google Analytics has been configured to report on traffic to our 404 pages and will track clicks on links pointing to Internet Archive pages, providing useful feedback on what portion of archived page traffic is referred from our 404 error page.

To work around the challenge of providing navigational affordances to legacy content, and to ensure that the web page titles of old content remain accessible to search engines, we intend to provide an up-to-date index of all archived getty.edu pages.
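Such an index could plausibly be seeded from the Wayback Machine’s CDX server, which lists captures for a URL or a whole domain. The sketch below assumes the public endpoint `https://web.archive.org/cdx/search/cdx` with its `matchType`, `output=json`, `fl`, and `filter` parameters, and keeps the newest successful capture per URL. CDX rows do not include page titles, so titles would still have to be extracted from the captured pages separately; the function names are hypothetical.

```python
import json
from typing import Dict
from urllib.parse import urlencode

# Public Wayback CDX server endpoint (assumed from the CDX API docs).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(domain: str) -> str:
    """Build a CDX query listing successful captures across a whole domain."""
    params = {
        "url": domain,
        "matchType": "domain",        # include subdomains such as www.
        "output": "json",             # first row of the result holds field names
        "fl": "original,timestamp",   # only the fields the index needs
        "filter": "statuscode:200",   # skip redirects and error captures
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def latest_capture_index(payload: str) -> Dict[str, str]:
    """Map each original URL to the replay URL of its newest capture."""
    rows = json.loads(payload) if payload.strip() else []
    if not rows:
        return {}
    header, records = rows[0], rows[1:]
    col_url, col_ts = header.index("original"), header.index("timestamp")
    newest: Dict[str, str] = {}
    for row in records:
        url, ts = row[col_url], row[col_ts]
        # 14-digit timestamps compare chronologically as strings.
        if ts > newest.get(url, ""):
            newest[url] = ts
    return {url: f"https://web.archive.org/web/{ts}/{url}"
            for url, ts in sorted(newest.items())}
```

For a site with 19,000 legacy pages the raw result can be large; the CDX server’s paging and `limit` options would be worth checking before running this against a whole domain.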

As we continue to retire obsolete website pages and complete this monumental content archiving and retirement effort, we’re grateful for the Internet Archive API which supports our goal of making archived content accessible in perpetuity.