Tag Archives: web archiving

Internet Archive and New York Art Resources Consortium Receive Grant for a National Forum to Advance Web Archiving in Art and Museum Libraries

We are pleased to announce that the Institute of Museum and Library Services (IMLS) has recently awarded a collaborative grant to the New York Art Resources Consortium and our Archive-It group to host a national forum event, along with associated workshops and stakeholder meetings, to catalyze collaboration among art libraries in the stewardship of historically valuable art-related materials published on the web. The New York Art Resources Consortium (NYARC) consists of the research libraries and archives of three leading art museums in New York City: The Brooklyn Museum, The Frick Collection, and The Museum of Modern Art. Archive-It is the web archiving service of the Internet Archive that works with hundreds of heritage organizations, including an international set of museums and art libraries, to preserve and provide access to web-published resources. Archive-It and NYARC will jointly run the project, Advancing Art Libraries and Curated Web Archives: A National Forum.

This National Leadership Grant, awarded in the Curating Collections program category to fund a National Forum and affiliated meetings, builds on NYARC’s and Archive-It’s joint work expanding web archiving among art and museum libraries and archives, including through the ARLIS/NA Web Archiving Special Interest Group, as well as their individual efforts to advance born-digital collection building. In Reframing Collections for the Digital Age, NYARC focused on web archiving program development, including technical work to integrate Archive-It and its discovery services, work that can inform similar institutions. Archive-It, with its Community Webs program, is working with dozens of public libraries on cohort building, educational resources, and network development supporting community history web archiving, a model that the national art library community can adopt to scale out its coordinated efforts. In addition, Archive-It has led, and NYARC has operationalized, collaborative research and development on API-based systems integrations to further joint services and interoperability.

By mobilizing a broad effort through an invitational forum, the project aims to achieve national scale through network building and shared infrastructure planning that the project team will foster through a program of discussion, training, and strategic roadmapping. The project will include the contribution of a diverse group of members of the art library community, lead to published outputs on strategic directions and community-specific training materials, and launch a multi-institutional effort to scale the extent of web-published, born-digital materials preserved and accessible for art scholarship and research. Thank you to IMLS for their continued support of work advancing web archiving and the overall national digital platform initiative.

Andrew W. Mellon Foundation Awards Grant to the Internet Archive for Long Tail Journal Preservation

The Andrew W. Mellon Foundation has awarded a research and development grant to the Internet Archive to address the critical need to preserve the “long tail” of open access scholarly communications. The project, Ensuring the Persistent Access of Long Tail Open Access Journal Literature, builds on prototype work identifying at-risk content held in web archives by using data provided by identifier services and registries. Furthermore, the project expands on work acquiring missing open access articles via customized web harvesting, improving discovery of and access to these materials from within extant web archives, and developing machine learning approaches, training sets, and cost models for advancing and scaling this project’s work.

The project will explore how adding automation to the already highly automated systems for archiving the web at scale can help address the need to preserve at-risk open access scholarly outputs. Instead of specialized curation and ingest systems, the project will work to identify the scholarly content already collected in general web collections, both those of the Internet Archive and collaborating partners, and implement automated systems to ensure at-risk scholarly outputs on the web are well-collected and are associated with the appropriate metadata. The proposal envisages two opposite but complementary approaches:

  • A top-down approach involves taking journal metadata and open data sets from identifier and registry sources such as ISSN, DOAJ, Unpaywall, CrossRef, and others and examining the content of large-scale web archives to ask “is this journal being collected and preserved and, if not, how can collection be improved?”
  • A bottom-up approach involves examining the content of general domain-scale and global-scale web archives to ask “is this content a journal and, if so, can it be associated with external identifier and metadata sources for enhanced discovery and access?”
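The two approaches above can be illustrated with a small sketch. It is not the project's actual tooling: the registry records, ISSNs, URLs, and function names are all hypothetical, and simple hostname matching stands in for the machine learning and metadata matching the project describes.

```python
from urllib.parse import urlparse

# Hypothetical registry records, standing in for data from sources
# such as ISSN, DOAJ, Unpaywall, or CrossRef. The ISSNs are made up.
REGISTRY = [
    {"issn": "0000-0001", "title": "Journal of Examples",
     "homepage": "https://journal.example.org/"},
    {"issn": "0000-0002", "title": "Sample Studies",
     "homepage": "https://samplestudies.example.net/"},
]

# Hypothetical URLs already captured in a general web archive.
ARCHIVED_URLS = [
    "https://journal.example.org/article/1",
    "https://journal.example.org/article/2",
    "https://unknown-press.example.com/vol1/paper.pdf",
]


def host(url):
    """Return the lowercased hostname of a URL."""
    return urlparse(url).netloc.lower()


def top_down(registry, archived_urls):
    """Start from known journals: how many captures does each one have?

    A count of zero flags a journal that is not yet being collected.
    """
    counts = {}
    for url in archived_urls:
        counts[host(url)] = counts.get(host(url), 0) + 1
    return {j["issn"]: counts.get(host(j["homepage"]), 0) for j in registry}


def bottom_up(registry, archived_urls):
    """Start from archived content: can each URL be tied to a known journal?

    None marks content with no registry match, which would need further
    analysis to decide whether it is journal literature at all.
    """
    by_host = {host(j["homepage"]): j["issn"] for j in registry}
    return {url: by_host.get(host(url)) for url in archived_urls}


if __name__ == "__main__":
    print(top_down(REGISTRY, ARCHIVED_URLS))
    print(bottom_up(REGISTRY, ARCHIVED_URLS))
```

In this toy data, the top-down pass surfaces a registered journal with no captures, while the bottom-up pass surfaces archived content with no registry match. These are the two kinds of gaps the approaches are meant to find.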

The grant will fund work to use the output of these approaches to generate training sets and test them against smaller web collections in order to estimate how effective this approach would be at identifying the long-tail content, how expensive a full-scale effort would be, and what level of computing infrastructure is needed to perform such work. The project will also build a model for better understanding the costs for other web archiving institutions to do similar analysis on their collections using the project’s algorithms and tools. Lastly, the project team, in the Web Archiving and Data Services group with Director Jefferson Bailey as Principal Investigator, will undertake a planning process to determine the resource requirements and work necessary to build a sustainable workflow that keeps the results up to date incrementally as publication continues.

In combination, these approaches will both improve the current state of preservation for long-tail journal materials and develop models for how this work can be automated and applied to existing corpora at scale. Thank you to the Mellon Foundation for their support of this work. We look forward to sharing the project’s open-source tools and outcomes with a broad community of partners.

27 Public Libraries and the Internet Archive Launch “Community Webs” for Local History Web Archiving

The lives and activities of communities are increasingly documented online; local news, events, disasters, celebrations — the experiences of citizens are now largely shared via social media and web platforms. As these primary sources about community life move to the web, the need to archive these materials becomes an increasingly important activity of the stewards of community memory. And in many communities across the nation, public libraries, as one of their many responsibilities to their patrons, serve the vital role of stewards of local history. Yet public libraries have historically been a small fraction of the growing national and international web archiving community.

With generous support from the Institute of Museum and Library Services, as well as the Kahle/Austin Foundation and the Archive-It service, the Internet Archive and 27 public library partners representing 17 different states have launched a new program: Community Webs: Empowering Public Libraries to Create Community History Web Archives. The program will provide education, applied training, cohort network development, and web archiving services for a group of public librarians to develop expertise in web archiving for the purpose of local memory collecting. Additional partners in the program include OCLC’s WebJunction training and education service, and the public libraries of Queens, Cleveland, and San Francisco will serve as “lead libraries” in the cohort. The program will result in dozens of terabytes of public library-administered local history web archives, a range of open educational resources in the form of online courses, videos, and guides, and a nationwide network of public librarians with expertise in local history web archiving and the advocacy tools to build and expand the network. A full listing of the participating public libraries is below and on the program website.

In November 2017, the cohort gathered at the Internet Archive for a kickoff meeting of brainstorming, socializing, and, of course, talking all things web archiving. Partners shared details on their existing local history programs and ideas for collection development around web materials. Attendees talked about building collections documenting their demographic diversity or focusing on local issues, such as housing availability or changes in community profile. As an example, Abbie Zeltzer from the Patagonia Public Library spoke about the changes in her community of 913 residents as the town redevelops a long-dormant mining industry. Zeltzer intends to develop a web archive documenting this transition and the related community reaction and changes.

Since the kickoff meeting, the Community Webs cohort has been actively building collections, from hyper-local media sites in Kansas City, to neighborhood blogs in Washington, D.C., to Mardi Gras in East Baton Rouge. In addition, program staff, cohort members, and WebJunction have been building out an extensive online course space with educational materials for training on web archiving for local history. The full course space and all open educational resources will be released in early 2019, and a second full in-person meeting of the cohort will take place in Fall 2018.

For further information on the Community Webs program, contact Maria Praetzellis, Program Manager, Web Archiving [maria at archive.org] or Jefferson Bailey, Director, Web Archiving [jefferson at archive.org].

Public Library | City | State
Athens Regional Library System | Athens | GA
Birmingham Public Library | Birmingham | AL
Brooklyn Public Library – Brooklyn Collection | New York City | NY
Buffalo & Erie County Public Library | Buffalo | NY
Cleveland Public Library | Cleveland | OH
Columbus Metropolitan Library | Columbus | OH
County of Los Angeles Public Library | Los Angeles | CA
DC Public Library | Washington | DC
Denver Public Library – Western History and Genealogy Department and Blair-Caldwell African American Research Library | Denver | CO
East Baton Rouge Parish Library | East Baton Rouge | LA
Forbes Library | Northampton | MA
Grand Rapids Public Library | Grand Rapids | MI
Henderson District Public Libraries | Henderson | NV
Kansas City Public Library | Kansas City | MO
Lawrence Public Library | Lawrence | KS
Marshall Lyon County Library | Marshall | MN
New Brunswick Free Public Library | New Brunswick | NJ
Schomburg Center for Research in Black Culture (NYPL) | New York City | NY
Patagonia Library | Patagonia | AZ
Pollard Memorial Library | Lowell | MA
Queens Library | New York City | NY
San Diego Public Library | San Diego | CA
San Francisco Public Library | San Francisco | CA
Sonoma County Public Library | Santa Rosa | CA
The Urbana Free Library | Urbana | IL
West Hartford Public Library | West Hartford | CT
Westborough Public Library | Westborough | MA

Military Industrial Powerpoint Complex Karaoke! — Tuesday, March 6

The Internet Archive presents the first-ever Military Powerpoint Karaoke: a night of “Powerpoint Karaoke” using presentations in the Military Industrial Powerpoint Complex collection at archive.org that were extracted by the Internet Archive from its public web archive and converted into a special collection of PDFs/epubs. The event will take place on Tuesday, March 6th at 7:30 pm at our headquarters in San Francisco. The show will be preceded by a reception at 6:30 pm, when doors will also open.

Get Free Tickets Here

Also known as “Battle Decks,” Powerpoint Karaoke is an improvisational art event where audience members give a presentation using a set of Powerpoint slides that they’ve never seen before. There are three rules: 1) The presenter cannot see the slides before presenting; 2) The presenter delivers each slide in succession without skipping slides or going back; and 3) The presentation ends when all slides are presented, or after 5 minutes (whichever comes first). We’re thrilled to have Rick Prelinger, creator of Lost Landscapes and Prelinger Archive, and Avery Trufelman of 99% Invisible, joining us to deliver headlining Powerpoint decks. The rest of the presentations will be delivered by you: the audience members who sign up.

This event will use, as its source material, a curated selection from the Internet Archive’s Military Industrial Powerpoint Complex, a special project alongside GifCities that was originally created for the Internet Archive’s 20th Anniversary in October 2016. For the project, IA staff extracted all the Powerpoint files from its archive of the government’s public .mil web domain. The collection was expanded in early 2017 to include materials collected during the End of Term project, which archived a snapshot of the .gov and .mil web domains during the administration change. The Military Industrial Powerpoint Complex collection contains over 57,000 Powerpoint decks, each charged with material that ranges from the violent to the banal, featuring attack modes, leadership styles, harness types, and modes for requesting vacation days from the US Military. The project was originally inspired by writer Paul Ford’s article, “Amazing Military Infographics,” which can be found in the Wayback Machine. As a whole, this collection offers a unique window into our government’s Military Industrial Complex.

This event is organized by artists/archivists Liat Berdugo and Charlie Macquarie in partnership with the Internet Archive.

Tuesday, March 6
6:30 pm Reception
7:30 pm Program

Internet Archive
300 Funston Avenue
San Francisco, CA 94118
