How the Internet Archive is Digitizing LPs to Preserve Generations of Audio

Last updated January 29th, 2020.

Albums available in the Boston Public Library Vinyl LP Collection on Archive.org.

Imagine if your favorite song or nostalgic recording from childhood was lost forever. This could be the fate of hundreds of thousands of audio files stored on vinyl, except that the Internet Archive is now expanding its digitization project to include LPs. 

Earlier this year, the Internet Archive began working with the Boston Public Library (BPL) to digitize more than 100,000 audio recordings from their sound collection. The recordings exist in a variety of historical formats, including wax cylinders, 78 rpms, and LPs. They span musical genres including classical, pop, rock, and jazz, and contain obscure recordings like this album of music for baton twirlers, and this record of radio’s all-time greatest bloopers.

Unfortunately, many of these audio files were never translated into digital formats and are therefore locked in their physical recording. In order to prevent them from disappearing forever when the vinyl is broken, warped, or lost, the Internet Archive is digitizing these at-risk recordings so that they will remain accessible for future listeners.


“The LP was our primary musical medium for over a generation. From Elvis, to the Beatles, to the Clash, the LP was witness to the birth of both Rock & Roll and Punk Rock. It was integral to our culture from the 1950s to the 1980s and is important for us to preserve for future generations.”

– CR Saikley, Director of Special Projects, Internet Archive

Since all of the information about an LP is printed on the record and its packaging, the digitization process must begin by cataloging that data. High-resolution scans are taken of the cover art, the disc itself, and any inserts or accompanying materials. The record label, year recorded, track list, and other metadata are supplemented and cross-checked against various external databases.


High resolution imaging of album cover art. The boxed area is shown at high resolution at right.


“We’re really trying to capture everything about this artifact, this piece of media. As an archivist, that’s what we want to represent, the fullness of this physical object.”

– Derek Fukumori, Internet Archive Engineer

Once cataloged, the LPs are then digitized. The Internet Archive partners with Innodata Knowledge Services, an organization focused on machine learning and digital data transformation, to complete the digitization process at their facilities in Cebu, Philippines. An Innodata worker digitizes 12 LPs at a time, setting turntables to play and record by hand, then turning each record over to the next side. Since each LP is digitized in real time, it takes a full 20 minutes to record an average LP side. By operating 12 turntables simultaneously, the team expects to be able to digitize ten LPs per hour.
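
The throughput figures above can be sanity-checked with a little arithmetic. The sketch below is only an illustration: the function names are hypothetical, and it assumes two sides per LP and, for the upper bound, no handling time between sides. The gap between that upper bound and the expected ten LPs per hour presumably reflects manual work such as flipping and changing records, though the article does not break this down.

```python
def max_lps_per_hour(turntables: int, minutes_per_side: float, sides_per_lp: int = 2) -> float:
    """Upper bound on LPs per hour if every turntable records without pause."""
    minutes_per_lp = minutes_per_side * sides_per_lp
    return turntables * 60 / minutes_per_lp


def implied_minutes_per_lp(turntables: int, lps_per_hour: float) -> float:
    """Turntable time each LP occupies, given an observed or expected rate."""
    return turntables * 60 / lps_per_hour


# Figures from the article: 12 turntables, 20 minutes per side, ten LPs per hour expected.
print(max_lps_per_hour(turntables=12, minutes_per_side=20))    # 18.0
print(implied_minutes_per_lp(turntables=12, lps_per_hour=10))  # 72.0
```

Under these assumptions, each LP occupies a turntable for about 72 minutes in practice, of which 40 minutes is actual playback.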


Audio stations complete with turntables & recording equipment set up in Cebu, Philippines.

Once recorded, there is a large FLAC file for each side of the LP, which needs to be segmented so listeners can easily begin at the desired song. There are two different algorithms used for segmenting; the first one looks at images of the vinyl disc to locate gaps in its grooves, which usually line up with gaps between songs. A second algorithm listens to the audio file to find the silent spaces between songs. When these two algorithms align, our engineers have a good measure of confidence that the machine has found the proper tracks.
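
The audio-based approach and the agreement check described above can be sketched roughly as follows. This is not the Internet Archive's actual code: the function names, window size, loudness threshold, and tolerance are all assumptions chosen for illustration, and the groove-image gaps are assumed to have already been converted into timestamps in seconds.

```python
import math
from typing import List, Sequence, Tuple


def find_silent_gaps(
    samples: Sequence[float],
    sample_rate: int,
    window_s: float = 0.05,   # assumed analysis window length
    threshold: float = 0.01,  # assumed RMS level below which a window counts as silent
    min_gap_s: float = 1.0,   # assumed minimum length of a between-track gap
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) for each run of quiet windows lasting at least min_gap_s."""
    window = max(1, round(sample_rate * window_s))
    n_windows = len(samples) // window
    gaps: List[Tuple[float, float]] = []
    run_start = None
    for i in range(n_windows):
        chunk = samples[i * window:(i + 1) * window]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        t = i * window / sample_rate
        if rms < threshold:
            if run_start is None:
                run_start = t  # a quiet run begins here
        elif run_start is not None:
            if t - run_start >= min_gap_s:
                gaps.append((run_start, t))
            run_start = None
    # Handle a quiet run that extends to the end of the recording.
    if run_start is not None:
        end = n_windows * window / sample_rate
        if end - run_start >= min_gap_s:
            gaps.append((run_start, end))
    return gaps


def agreeing_boundaries(
    audio_gaps: Sequence[Tuple[float, float]],
    groove_gaps_s: Sequence[float],
    tolerance_s: float = 2.0,  # assumed allowable disagreement between the two methods
) -> List[float]:
    """Keep the midpoint of each audio gap that lies near a gap found in the groove image."""
    confirmed = []
    for start, end in audio_gaps:
        mid = (start + end) / 2
        if any(abs(mid - g) <= tolerance_s for g in groove_gaps_s):
            confirmed.append(mid)
    return confirmed


# Synthetic example: 3 s of tone, 2 s of silence, 3 s of tone, sampled at 1 kHz.
rate = 1000
tone = [0.5 * math.sin(2 * math.pi * 50 * i / rate) for i in range(3 * rate)]
side = tone + [0.0] * (2 * rate) + tone

gaps = find_silent_gaps(side, rate)
print(gaps)                                    # [(3.0, 5.0)]
print(agreeing_boundaries(gaps, [4.5, 9.0]))   # [4.0]
```

Only boundaries that both methods support are kept, which mirrors the idea that agreement between the two algorithms gives confidence in the result. Real recordings would need tuning for surface noise, since vinyl is rarely truly silent between tracks.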

These algorithms currently predict track boundaries with about 85% to 95% accuracy, but some audio files are more difficult. For example, recordings of live music fill in the spaces between songs with applause, while classical music uses silence as part of a piece. In order to account for these anomalies, digitized LP files are always checked manually before being added to the online database.

Identifying the empty spaces between songs for segmenting.

Currently, there are more than 5,800 LPs from the Boston Public Library LP collection available on Archive.org. The Internet Archive continues to digitize the remainder of the BPL collection in addition to more than 285,000 LPs that have been donated by others. The organization aims to engage a greater community of LP and 78 rpm enthusiasts by welcoming contributions and improvements to the recorded metadata. Many of the audio files online can be listened to in full, but some of the albums are only available in 30 second snippets due to rights issues.


“The complexity of properly digitizing LPs has been an evolving challenge, but thanks to the help of friends of the Archive, our in-house expertise, and the dedication of Innodata, I’m confident we’ve nailed it.”

– Merlijn Wajer, Internet Archive Developer

For decades, vinyl records were the dominant storage medium for every type of music and are ingrained in the memories and culture of several generations. Despite the challenges, the Internet Archive is determined to preserve these at-risk records so that they can be heard online by new audiences of scholars, researchers, and music lovers around the world.


ABOUT THE AUTHOR: Faye Lessler is a California-born, Brooklyn-based freelance writer and founder of the lifestyle blog Sustaining Life. She is an expert in mission-driven communications and enjoys writing while sipping black tea in a beam of sunshine.

7 thoughts on “How the Internet Archive is Digitizing LPs to Preserve Generations of Audio”

  1. Robert Glen Tischer

    My dad left me several hundred jazz albums. I would like to bequeath them to someone or some organization who can use or archive them. Or pass them on to someone who can appreciate them. At any rate, they will end up in the garbage dump when I pass on unless I do something with them now.

    What should I do with them?

  2. jon g.

    I’m a DJ here in Boston, and I have a huge vinyl collection. I digitize tracks selectively, because it’s such a hassle, but worthwhile! I can’t say that all tracks on any given album are created equal, so I pick and choose. I have a tower of vintage audio equipment next to my computer, a TEAC reel-to-reel tape deck, cassette deck, turntable, and amplifier, which I patch into the audio input of my computer. This way I can digitize any medium I come across. I use the free “Audacity” software to capture and (if absolutely necessary) clean up the recordings, and save as a digital file. Which I then bring to my radio station for airplay. Thankfully my station lets me play this wonderfully obscure trove of music! That said, many, many items have been reissued on CD, and that’s by far the way to go… if you can.

  3. Craig Silvis

    “Many of the audio files online can be listened to in full, but some of the albums are only available in 30 second snippets due to rights issues.”
    Unless I am missing something, when I went to that link only 3 of the 1,128 recordings were unlocked, the rest are 30 second snippets.

  4. Russell Miller

    Most records are off-center to some degree. How does your workflow take this into account?

  5. Mike Frisco

    What a great project to take on! Vinyl sharity blogs used to do a lot of this work, but their methods of digitizing and lining up metadata all varied quite a bit. It’s great to see this modern, uniform approach applied on a large scale!

    If you receive an album that’s full of clicks/pops/scratches, is any cleanup done to it? Or do you simply post the FLAC files as-is to preserve the spirit of the original recordings?

    I must have clicked through two dozen different recordings at random, and all of them were limited to “samples only” due to rights issues. Is there a way to filter-out these samples-only albums using the search tools?

    A lot of the albums posted have been re-released on CDs or otherwise over the years. Is that a consideration when selecting vinyl to digitize? Like would you ever say “This was remastered and re-released in 1997, so there is no need to bother digitizing this original album from 1955?”

  6. Bob Varney

    I’ve been recording and digitizing 78’s and older LPs for about a decade and contributing my work to the Archive. The 20th century saw a flood of music recordings as, for the first time in history, anyone could own a copy of a musical performance to be played back and enjoyed at any time. But the media was fragile and much will be lost if we don’t make the effort to record it now and preserve it in a manner that will be both permanent and accessible. The Archive’s LP project will make my contribution to this effort seem puny, but I’m glad to be part of this endeavor.
