Celebrate with the Internet Archive on October 11th & 12th

Join us on October 11th & 12th to help celebrate AI @ IA: Research in the Age of Artificial Intelligence!

October 11: Tour of the physical archive

Please join us on October 11th from 6 to 8pm as we take a peek behind the doors of the physical archive in Richmond, California.

We are excited to offer a behind-the-scenes tour of our physical collections of books, music, film, and video in Richmond, California.

With this special insider event we are opening the doors to an often unseen place. See the lifecycle of physical books – donation, preservation, digitization, and access. Samples from generous donations and acquisitions of books, records, microfiche, and more will also be on display.

Register now for the physical archive tour


October 12: Join our annual celebration – in-person & virtual

Artificial Intelligence rocking your boat? Join us October 12th to see how the Internet Archive is using AI to build new capabilities into our library, and how students and scholars all over the world use the Archive’s petabytes of data to inform their own research.

This year’s event is hybrid. We will be celebrating in-person at our main library in San Francisco, and will be livestreaming the event itself from 7pm-8pm PT for those who want to celebrate with us from afar!

Register now for in-person or virtual attendance

Event details

5pm: Entertainment and food trucks
7pm: Program in our Great Room
8pm: Dancing in the streets

Location: 300 Funston Ave. at Clement St., San Francisco

Registration is required: Register now for in-person or virtual attendance.

Live Music Archive Collection Now Tops 250,000 Recordings

For fans wanting to relive an epic concert or discover upcoming bands, there are now more than 250,000 recordings in the Live Music Archive to enjoy. 

The collection has grown steadily over the past 20 years as a collaborative effort between Internet Archive staff and dedicated, music-loving volunteers. With nearly 30 items uploaded a day, the Live Music Archive reached the quarter-million-recording mark in June, and now takes up more than 250 terabytes of data on Internet Archive servers.

“It’s a huge victory for the open web,” said founder of the Internet Archive Brewster Kahle, about the Live Music Archive, which he describes as “fantastically popular” with the public. “Fans have helped build it. Bands have supported it. And the Internet Archive has continued to scale it to be able to meet the demand.”

For years, concert-goers recorded and traded tapes, but in 2002, the Internet Archive offered a reliable infrastructure to preserve performance files. In partnership with the etree music community, the Live Music Archive was established to provide ongoing, free access to lossless and MP3-encoded audio recordings.

(For more on its history, see https://blog.archive.org/2022/08/12/celebrating-20-years-of-the-live-music-archive/.)

“It shouldn’t cost to give something away,” said Kahle, lamenting fees that can be charged to host items online. “We wanted to make it possible for people to make things permanently available without having to sell their souls to a platform that is going to exploit it for advertising. That just seemed like the world that should exist, and we thought we could play a role.”

Since its launch two decades ago, more than 8,000 artists have given permission to have recordings of their shows archived on the Live Music Archive, and users from around the world have listened to files more than 600 million times. The collection includes the iconic Grateful Dead, as well as aspiring musicians trying to garner attention from the free outlet that spans jambands, folk singers, bluegrass, rock, pop, jazz, classical and experimental music.

The 250,000th item was a Dead and Company show from June 18, 2023.

Jonathan Aizen, a technology entrepreneur who helped build the Live Music Archive in 2002, said music fans embraced having a free, non-profit, forever host for concert recordings. “Until working with the Internet Archive, there were no coordinated and reliable means to preserve and distribute the recordings,” Aizen said. “The only way that these things were being preserved was by copying them – and that was very haphazard, so the music community was very excited.”

Over time, Aizen said it’s been impressive just how many artists have allowed their concerts to be recorded and the organic way the Live Music Archive has grown. “When we started, I had no sense it would last two decades,” he said. “I think it’s really compelling that these recordings are being preserved for posterity. I also didn’t expect the breadth of artists. It’s fair to say that it’s exceeded my expectations by quite a bit.”

In addition to being a resource for fans, the Live Music Archive has been a way for musicians to be discovered. “There’s no doubt in my mind that the accessibility of the recordings on the Internet Archive is exposing bands and drawing people in who then go to the show,” he said. Devoted listeners can track the progress of a band’s career and follow the way songs are played differently on different nights, noting the improvisational element of live recordings, Aizen added.

The passion of the volunteers to curate the collection has been at the heart of the Live Music Archive and is a testament to the strength of the live music community supporting bands. 

David Mallick began uploading to the Live Music Archive in the early days and then came on board as a volunteer curator for about 10 years. He helped recruit bands to participate and helped troubleshoot recordings that others had uploaded. Mallick said free unlimited bandwidth and storage are appealing to musicians, especially smaller bands just getting started and those who don’t mind sharing their unvarnished recordings.

“It’s a ‘no ego’ project for the band,” Mallick said. “These are bands that are comfortable enough with their live performances to just say ‘Yeah, put up whatever’ – even if they flubbed a note, screwed up a song, or a fan grabbed a mic.”

Every time Mallick added a recording to the Live Music Archive, he said it was rewarding to know it would always be there for others to hear. “It’s so well organized. Archivists are hosting it, making it uniform, searchable and easy to find things,” he said. 

“Music is universal – it’s cross cultural and across time,” Aizen added. “To be able to create access, in a world where everything is so commercialized, and just having music be freely accessible, with no ads – that is also something that’s really just special.”

Empowering Anthropological Research in the Digital Age

As a doctoral student in anthropology at Yale University, Spencer Kaplan often relies on the Internet Archive for his research. He is an anthropologist of technology who studies virtual communities. Kaplan said he uses the Wayback Machine to create a living archive of data that he can analyze.

Doctoral student Spencer Kaplan

Last summer, Kaplan studied the blockchain community, which is active on Twitter and constantly changing. As people were sharing their views of the market and helping one another, he needed a way to save the data before their accounts disappeared. A failed project might have prompted the users to take down the information, but Kaplan used the Wayback Machine to preserve the social media exchanges.

In his research, Kaplan said he discovered an environment of mistrust online in the blockchain community and an abundance of scams. He followed how people were navigating the scams, warning one another online to be careful, and actually building trust in some cases. While blockchain is trying to build technologies that avoid trust in social interaction, Kaplan said it was interesting to observe blockchain enthusiasts engaging in trusting connections. He takes the texts of tweets to build a corpus that he can then code and analyze to track or show trends.
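For readers curious what coding a corpus can look like in practice, here is a minimal sketch in Python. It is not Kaplan's actual method: the codebook, the keywords, and the sample posts are all invented for illustration, and real qualitative coding is usually done by hand or with far more careful rules than plain substring matching.

```python
from collections import Counter

# Hypothetical codebook (an assumption): code label -> keywords that trigger it.
CODEBOOK = {
    "scam_warning": ["scam", "rug pull", "be careful"],
    "trust_building": ["trust", "vouch", "legit"],
}


def code_text(text, codebook=CODEBOOK):
    """Return the sorted list of codes whose keywords appear in the text."""
    lowered = text.lower()
    return sorted(
        code
        for code, keywords in codebook.items()
        if any(keyword in lowered for keyword in keywords)
    )


def code_counts_by_month(posts, codebook=CODEBOOK):
    """Count how often each code appears per month.

    `posts` is an iterable of (iso_date, text) pairs such as
    ("2022-07-14", "some text"). Returns {"YYYY-MM": Counter, ...}.
    """
    counts = {}
    for iso_date, text in posts:
        month = iso_date[:7]  # "YYYY-MM" prefix of an ISO date
        counts.setdefault(month, Counter()).update(code_text(text, codebook))
    return counts


# Invented sample posts, for illustration only.
SAMPLE_POSTS = [
    ("2022-06-03", "Be careful, this mint looks like a scam."),
    ("2022-06-20", "I can vouch for this dev, totally legit."),
    ("2022-07-02", "Another rug pull today. Trust no one."),
]
```

Calling `code_counts_by_month(SAMPLE_POSTS)` groups the coded posts by month, which is the kind of table a trend chart could be drawn from. Note that substring matching is crude: the keyword "trust" also matches "mistrust", so a real codebook would need more careful rules.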

The Wayback Machine can be helpful, Kaplan said, in finding preserved discussions on Twitter, early versions of company websites or pages that have been taken down altogether—a start-up company that went out of business, for example. “It’s important to be able to hold on to that [information] because our research takes place at a very specific moment in time and we want to be able to capture that specific moment,” Kaplan said.
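For readers who want to check programmatically whether a page has been preserved, the Wayback Machine offers a public availability endpoint. The sketch below is a minimal Python example, assuming the endpoint and JSON response shape described in the Internet Archive's Wayback API help page; the function names are hypothetical, and the response fields should be checked against the current documentation before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public availability endpoint (response shape is an assumption; verify in the docs).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(page_url, timestamp=None):
    """Build the query URL; timestamp is an optional YYYYMMDD[hhmmss] string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(payload):
    """Return (snapshot_url, timestamp) from a parsed response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], closest["timestamp"]


def lookup(page_url, timestamp=None):
    """Query the live service (needs network access)."""
    with urlopen(build_availability_url(page_url, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

A call such as `lookup("example.com", "20150101")` asks for the capture closest to January 1, 2015, and returns `None` when nothing has been archived.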

The Internet Archive’s Open Library has also been essential in Kaplan’s work. When he was recently researching the invention of the “corporate culture” concept, he had trouble finding the first editions of many business books written in the late 80s and early 90s. His campus library often bought updated volumes, but Kaplan needed the originals. “I needed the first edition because I needed to know exactly what they said first and I was able to find that on the Internet Archive,” Kaplan said.

Bowling Green State University Music Library and Bill Schurk Sound Archives partners with Internet Archive

BGSU Music Library and Bill Schurk Sound Archives partners with Internet Archive to provide digital access to thousands of historic recordings 

The BGSU Music Library and Bill Schurk Sound Archives, one of the largest collections of popular music at an academic institution in the United States, has partnered with the Internet Archive’s Great 78 Project to digitize thousands of records made in the early 20th century.  

The University’s collection of over 100,000 of these discs represents one of the largest collections processed by the Internet Archive’s Great 78 Project, a program dedicated to the preservation and dissemination of these early recordings.  

The recordings from the BGSU Music Library and Bill Schurk Sound Archives’ collection trace the history of the recording industry in the United States, including many standard popular and jazz tunes, as well as more niche materials.

Browse the Bowling Green State University 78rpm Collection at Internet Archive.

Many of the less-common items, such as recordings made by and for immigrant and minoritized groups in the United States, children’s recordings and novelty records, are only available on the original recordings since they were never released on LP, CD or digital streaming services. The collection establishes a digital record of underrepresented artists. It also reflects the cultural and political atmospheres of each time period in which the original material was pressed, meaning that some of the materials included reflect stereotypes and language that may be offensive to today’s listeners. These materials do not represent the values of Bowling Green State University, University Libraries, and Music Library and Bill Schurk Sound Archives.  

Digital versions of these records created through the partnership make it easier for BGSU to provide on-campus access to the recordings while adding those digital files to the thousands already digitized by the Great 78 Project.  

Early phonograph records were made to spin at 78 revolutions per minute (rpm) and were the most common format for sound recordings in the United States from the early 1900s until the early 1950s. Because of their age and the developing practices of the early sound recording industry, these discs require specialized equipment for modern playback and, unlike modern LPs, often require the attention of a professional audio engineer to coax optimal sound quality from the aging records.   

“The pace at which George Blood LC, the vendor working with the Great 78 Project, began processing and digitizing the collection was astounding,” said Dr. David Lewis, a former sound archivist at the Music Library and Bill Schurk Sound Archives. “Within two months, they had digitized and uploaded thousands of recordings to the Bowling Green 78 rpm Collection page. That same work would have taken years longer to complete in-house at BGSU. Working with the Great 78 Project has the added benefit of contributing the University’s materials to a global network of fans, researchers and listeners alongside many other collections of 78 rpm discs.”   

The digital files created from BGSU Music Library and Bill Schurk Sound Archives’ 78 rpm discs will be preserved by the Internet Archive and made available for online listening and downloading from the Internet Archive’s BGSU collection page.  

In addition, staff at the Music Library and Bill Schurk Sound Archives will begin work to preserve copies of selected digital items from the project at BGSU, providing additional safeguards for Ohio-related content and other rare and unique recordings that align with major collecting areas in the sound archives as well as faculty research and teaching.    

The wide access provided by the Internet Archive’s Great 78 Project, combined with the rare and unique material contributed to the project by BGSU, will add to the depth and breadth of material available both on campus and through the Internet Archive.   

Reposted with permission from BGSU News.

Book Talk: Moving Theory Into Practice

Join Internet Archive’s Chris Freeland for a discussion with Oya Y. Rieger about ‘Moving Theory Into Practice,’ the landmark digitization guide & workshop that sparked a revolution in digital libraries.
Thursday, August 24 @ 10am PT / 1pm ET

REGISTER NOW

As the digital library field emerged in the mid- to late-1990s, librarians faced numerous challenges in building the skills necessary to provide digital access to their collections. That changed in the summer of 2000, when Anne R. Kenney and Oya Y. Rieger (Cornell University Library) produced “Moving Theory Into Practice,” a groundbreaking week-long workshop & digitization guide that offered hands-on, immersive training in digitization and preservation.

The purpose of “Moving Theory Into Practice” was to build the skills of librarians, archivists, curators, administrators, technologists, and other professionals who were either contemplating or already implementing digital imaging programs. Its objective was to equip participants with practical strategies that went beyond theoretical concepts and were grounded in the latest standards, best practices and informed decision-making.

In our upcoming webinar, we are delighted to talk with Oya Y. Rieger, co-author of “Moving Theory Into Practice.” During the discussion, we will explore the impacts of hosting these training sessions, shedding light on their significance within the digital library community and the broader library community at the time. We will also explore related training such as Rare Book School, and reflect on large-scale digitization projects like Making of America and state-based efforts to understand the context in which this workshop occurred. Additionally, we will touch upon the evolution of digitization training since the original workshop, providing insights into how the field has matured.

About our speakers

Oya Y. Rieger is a senior strategist on Ithaka S+R’s Libraries, Scholarly Communication, and Museums team. She spearheads projects that reexamine the nature of collections within the research library, help secure access to and preservation of the scholarly record, and explore the possibilities of open source software and open science.

Prior to joining Ithaka S+R, Oya worked at Cornell University for 25 years. For the past ten years she served as Associate University Librarian, leading strategic initiatives, building partnerships, and facilitating sustainable and user-centered projects. During her tenure at Cornell, her program areas included digital scholarship, collection development, digitization, preservation, user experience, scholarly publishing, learning technologies, research data management, digital humanities, and special collections. She spearheaded projects funded by the Institute of Museum and Library Services (IMLS), the Henry Luce Foundation, The Andrew W. Mellon Foundation, National Endowment for the Humanities (NEH), Simons Foundation, and Sloan Foundation to develop ejournal preservation strategies, conduct research on new media archiving, implement preservation programs in Asia, design digital curation curriculums, and create sustainability models for alternative publishing models to advance science communication.

Chris Freeland is the Director of Library Services at the Internet Archive, working in support of our mission to provide “Universal access to all knowledge.” Before joining the Internet Archive, Chris was an Associate University Librarian at Washington University in St. Louis, managing Washington University Libraries’ digital initiatives and related services. He holds an M.S. in Biological Sciences from Eastern Illinois University and an M.S. in Library and Information Science from University of Missouri-Columbia. His research explores the intersections of science and technology in a cultural heritage context, having published and presented on a variety of topics relating to the use of new media and emerging technologies in libraries and museums.

Preserving the Past, Empowering the Future: Unveiling the Wayback Machine’s Vital Role in Investigative Work

A precious tool. That’s how Laura Ranca describes the Wayback Machine in her work.

As a researcher at the Berlin-based organization Tactical Tech and its Exposing the Invisible Project, she helps people use technology to inform, educate and advance causes. Ranca trains journalists, human rights activists, scholars and everyday citizens to use the internet to investigate and gather evidence.

The Wayback Machine has been particularly useful in finding and retrieving lost websites, said Ranca. She also makes sure materials she produces are preserved online so future researchers can build on her work. As people try to document how the public is interacting with technology, the material stored by the Internet Archive has been essential to investigators, Ranca said.

“We face the challenge of websites and webpages being modified, altered or intentionally taken down. Sometimes it’s to hide something that was previously published, but is no longer relevant, or it now has maybe a different connotation than was intended,” Ranca said. “For us, this is very valuable to access historical records and to save different web pages and resources online using the Wayback Machine.”

When researching environmental issues, Ranca has discovered material that reflects missed early warning signs. Finding 20-year-old mining reports, video footage or other documentation of activity affecting the climate can be important evidence in making the case for climate action. These items need to be protected, Ranca said, and the Wayback Machine provides that security. Ranca and the team at Exposing the Invisible conduct workshops on how to navigate the Wayback Machine, as well as train-the-trainer sessions on investigative skills more broadly. She also created guides on how to use Internet Archive content, which are openly available under a Creative Commons license.

Build, Access, Analyze: Introducing ARCH (Archives Research Compute Hub)

We are excited to announce the public availability of ARCH (Archives Research Compute Hub), a new research and education service that helps users easily build, access, and analyze digital collections computationally at scale. ARCH brings together two things: the Internet Archive’s more than a decade of experience supporting computational research, both by providing large-scale data to researchers and through dataset-oriented service integrations like ARS (Archive-It Research Services), and a collaboration with the Archives Unleashed project of the University of Waterloo and York University. Development of ARCH was generously supported by the Mellon Foundation.

ARCH Dashboard

What does ARCH do?

ARCH helps users easily conduct and support computational research with digital collections at scale – e.g., text and data mining, data science, digital scholarship, machine learning, and more. Users can build custom research collections relevant to a wide range of subjects, generate and access research-ready datasets from collections, and analyze those datasets. In line with best practices in reproducibility, ARCH supports open publication and preservation of user-generated datasets. ARCH is currently optimized for working with tens of thousands of web archive collections, covering a broad range of subjects, events, and timeframes, and the platform is actively expanding to include digitized text and image collections. ARCH also works with various portions of the overall Wayback Machine global web archive totaling 50+ PB going back to 1996, representing an extensive archive of contemporary history and communication.

ARCH, In-Browser Visualization

Who is ARCH for? 

ARCH is for any user who seeks an accessible approach to working with digital collections computationally at scale. Possible users include but are not limited to researchers exploring disciplinary questions, educators seeking to foster computational methods in the classroom, journalists tracking changes in web-based communication over time, and librarians and archivists seeking to support the development of computational literacies across disciplines. Recent research efforts making use of ARCH include but are not limited to analysis of COVID-19 crisis communications, health misinformation, Latin American women’s rights movements, and post-conflict societies during reconciliation.

ARCH, Generate Datasets

What are core ARCH features?

Build: Leverage ARCH capabilities to build custom research collections that are well scoped for specific research and education purposes.

Access: Generate more than a dozen different research-ready datasets (e.g., full text, images, PDFs, graph data, and more) from digital collections with the click of a button. Download generated datasets directly in-browser or via API.

Analyze: Easily work with research-ready datasets in interactive computational environments and applications like Jupyter Notebooks, Google Colab, Gephi, and Voyant, and produce in-browser visualizations.

Publish and Preserve: Openly publish datasets in line with best practices in reproducible research. All published datasets will be preserved in perpetuity. 

Support: Make use of synchronous and asynchronous technical support, online trainings, and extensive help center documentation.

How can I learn more about ARCH?

To learn more about ARCH, please reach out via the following form.

Canadian Musician Relies on Wayback Machine for Immigration Documentation

This post is part of our ongoing series highlighting how our patrons and partners use the Internet Archive to further their own research and programs.

David Samuel, a Canadian-born viola player, has lived all over the world working as a professional musician. A graduate of The Juilliard School, he lived in Europe and New Zealand before settling in San Francisco two years ago.

As Samuel works through the U.S. immigration process to get his permanent residence (green) card, he has turned to the Internet Archive for help in gathering documentation. He’s applying for residency under the “extraordinary ability” category. To make the case, he needs to put together an extensive resume of his accomplishments, awards and reviews in the arts.

Samuel performs and teaches in the Bay Area, as a member of the Alexander String Quartet and a lecturer at San Francisco State University. Using the Wayback Machine, he was able to track down website postings and programs about his past concerts to use in his application. “It was quite remarkable to find the exact dates and times of past performances,” said Samuel. “It would have been really tough otherwise, because I only have a limited number of actual physical documents with me.”

The application process is grueling, Samuel said, but being able to freely search for supporting evidence on the Wayback Machine has made it easier. “It’s been an important tool for me,” said Samuel, who heard about the Internet Archive years ago. “It’s like an encyclopedia for the history of the internet.”

David Samuel
http://violistdavidsamuel.com

Permanent Residents: A Research Guest Post

This post is part of our ongoing series highlighting how our patrons and partners use the Internet Archive to further their own research and programs.

From Patricia Rose, in her own words:

Tour guide Patricia Rose

In 2019, after retiring from an administrative career at the University of Pennsylvania, I signed up to be a tour guide at Philadelphia’s historic Laurel Hill Cemetery (now Laurel Hill East), the first American cemetery to be named a National Historic Landmark.  With more than 75,000 “permanent residents”, there are lots of opportunities for tours that stop at the graves of fascinating men and women, most from the nineteenth and first half of the twentieth century, although there are still some new burials.  It was so much fun that I started leading tours at its larger sister cemetery, Laurel Hill West, itself listed on the National Register of Historic Places, and with permanent residents mostly from the twentieth century to the modern day.

In 2020, COVID made fresh-air cemetery tours quite popular, and I led specialized tours: one on spiritualism, and one on gay and lesbian residents called “Out of the Closet and into the Crypt.”

Sara Yorke Stevenson

Among the stops on some of my tours was the grave of Sara Yorke Stevenson (1847 – 1921).  She was an Egyptologist, a museum curator, co-founder and leader, author, journalist and fighter for women’s suffrage.  She led a full and eventful life that began in Paris and ended after her successful efforts to bring medical help to France during World War I, for which she raised the equivalent of $36 million in today’s dollars.

As part of the cemetery’s educational programming, my fellow tour guide Joe Lex (retired Professor of Emergency Medicine) created a wonderful podcast, All Bones Considered, focusing on both Laurel Hill East and West, and I jumped at the chance to present Stevenson on the podcast.

There is a wealth of information on Stevenson.  As a co-founder, curator, and board chair at the University of Pennsylvania Museum of Archaeology and Anthropology (the Penn Museum), Sara appears in numerous histories of the museum, and in volumes on the beginnings of archaeology in this country.  Luckily, in 2006, Sara’s private papers were discovered in the attic of a Philadelphia home that was being cleaned out for sale.  Those papers are now housed in the Special Collections of the La Salle University Library, and in the Archives of the Penn Museum.  I visited both and enjoyed reading letters Sara received, a few materials she wrote, and relevant newspaper clippings she saved.

Title page from Maximilian in Mexico (1899) by Sara Yorke Stevenson

I was still anxious to read Sara’s published writing, and who knew about the wealth of these materials at the Internet Archive?  Her book, Maximilian in Mexico: A Woman’s Reminiscences of the French Intervention, 1862-1867, is available in multiple copies.  So are her monograph, On Certain Symbols Used in the Decoration of Some Potsherds from Daphnae and Naukratis Now in the Museum of the University of Pennsylvania, and various papers Stevenson delivered to the Oriental Club of Philadelphia, such as “The Feather and the Wing in Early Mythology,” and “Early Forms of Religious Symbolism, the Stone Axe and Flying Sun Disc.”

Fortunately, also in the Internet Archive I found relevant issues of the Bulletin of the Pennsylvania Museum from the early days of the twentieth century. (The Pennsylvania Museum became the Philadelphia Museum of Art, and its School of Industrial Art became Philadelphia’s University of the Arts.)  Sara served as a curator at the Philadelphia Museum, and also as the acting director. In the April 1908 edition of the Bulletin, the following appears:

“It is proposed to establish at the School of Industrial Art of the Pennsylvania Museum…a course in the training of curators for art, archaeological and industrial museums, under the supervision of Mrs. Cornelius Stevenson, ScD.”  

Bulletin of the Pennsylvania Museum, Number 22, April 1908.

Museums were being founded throughout the country, and there was a need for trained curators. The next issue of the Bulletin details the twelve lectures in Stevenson’s course.  She begins with The History of Museums, followed by the Modern Museum.  She covers the Museum Building, with attention to light, heat, water, workshops, repair shops and store rooms.  She addresses the Art of Collecting.  In addition to lecturing, she took her students to every museum in the city, met with directors and curators, critiqued exhibits and identified problems of preservation and conservation.  This was the first course in museum studies and curatorship offered in the United States, and luckily I could read all about it on the Internet Archive.

Finally, on the Archive I found John W. Jordan’s 1911 volume, Colonial Families of Philadelphia, which contains invaluable genealogical information on the families of Stevenson and her husband (and many others).

The Internet Archive’s Sara Yorke Stevenson collection was invaluable to me as I prepared my podcast episode. Going forward, I will turn to the Archive whenever I do research for my cemetery tours.  Thank you to all who have created this marvelous resource.

Should you wish to learn more about Laurel Hill East and West, please visit https://laurelhillphl.com/.  My podcast is part of episode #48, Shattering Some Glass Ceilings, on All Bones Considered, which is available at https://www.podbean.com/pu/pbblog-kty8f-780f6a, on Apple Podcasts, or wherever you get your podcasts.

Patricia Rose 
Philadelphia, PA

The Power of Preservation: How the Internet Archive Empowers Digital Investigations and Research

Part of a series: The Internet Archive as Research Library

Written by Caralee Adams

When gathering evidence for a court case or researching human rights violations, Lili Siri Spira often found that the material she needed was preserved by the Internet Archive.

Spira is the Social Media and Campaign Marketing Manager for TechEquity Collaborative, as well as the co-manager of RatedResilient.com, a platform that promotes psycho-social resilience for digital activists. She has interned at the Center for Justice & Accountability and was an open-source investigator at the Human Rights Center at UC Berkeley during college.

In Spira’s work, the Wayback Machine has played an integral role in providing time-stamped artifacts and metadata.

For example, when researching the Bolivian coup in 2019, she wanted to learn more about the sentiment of indigenous people toward political leadership. Spira used the Wayback Machine to examine how indigenous Bolivian websites had changed since 2009. She discovered that, after initial criticism, some websites seemed to have disappeared.
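A comparison like this can also be approached programmatically. The sketch below is an illustration in Python, not a description of Spira's workflow: it assumes the Wayback Machine's CDX server endpoint and its JSON output, in which the first row names the fields and each later row is one capture. The function names and sample values are hypothetical, and the field names should be confirmed against the CDX server documentation.

```python
from urllib.parse import urlencode

# CDX server endpoint (parameters and output shape are assumptions; verify in the docs).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(page_url, year_from, year_to):
    """Build a query listing captures of page_url between two years, as JSON."""
    params = {
        "url": page_url,
        "from": str(year_from),
        "to": str(year_to),
        "output": "json",
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def rows_to_captures(rows):
    """Turn header-first rows into a list of dicts, one per capture."""
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


def content_changes(captures):
    """Return timestamps of captures whose content digest differs from the
    previous capture, i.e. the first capture plus every later change."""
    changes = []
    previous_digest = None
    for capture in sorted(captures, key=lambda c: c["timestamp"]):
        if capture["digest"] != previous_digest:
            changes.append(capture["timestamp"])
            previous_digest = capture["digest"]
    return changes
```

Fetching the URL from `build_cdx_url("example.org", 2009, 2019)`, parsing the JSON, and passing it through `rows_to_captures` and `content_changes` yields the dates on which the archived page's content changed. A long stretch with no captures at all can hint that a site went offline, though it may also just mean the crawler did not visit.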

“The great thing about the Internet Archive is that it really protects the chain of custody,” Spira said. “It’s not only that you look back, but you can even find a website now and capture it in time with the metadata.”

In 2020, the Berkeley Protocol on Digital Open Source Investigations provided global guidelines for using public digital information as evidence in international criminal and human rights investigations. Spira said this allows preserved website data to be used in court proceedings to hold parties accountable.

On other occasions, Spira has investigated companies suspected of unethical practices. Sometimes executives openly admitted to certain behaviors, only to later deny their actions. Companies may attempt to erase past communication, but Spira said she can uncover the previous versions of websites through the Wayback Machine.

“Our knowledge is not being held sacred by many people in this country and around the world,” Spira said. “It’s incredibly important for research work in any field to have access to preserved [digital] information—especially when that research is making certain allegations against powerful entities and corporations.”

We thank Lili and her colleagues for sharing the story of how they use the Internet Archive’s collections in their work.