Author Archives: Faye Lessler

How the Internet Archive is Digitizing LPs to Preserve Generations of Audio

Last updated January 29th, 2020.

Albums available in the Boston Public Library Vinyl LP Collection on Archive.org.

Imagine if your favorite song or nostalgic recording from childhood was lost forever. This could be the fate of hundreds of thousands of audio files stored on vinyl, except that the Internet Archive is now expanding its digitization project to include LPs. 

Earlier this year, the Internet Archive began working with the Boston Public Library (BPL) to digitize more than 100,000 audio recordings from its sound collection. The recordings exist in a variety of historical formats, including wax cylinders, 78 rpm discs, and LPs. They span musical genres including classical, pop, rock, and jazz, and contain obscure recordings like this album of music for baton twirlers, and this record of radio’s all-time greatest bloopers.

Unfortunately, many of these recordings were never transferred to digital formats and are therefore locked in their physical media. To prevent them from disappearing forever when the vinyl is broken, warped, or lost, the Internet Archive is digitizing these at-risk recordings so that they will remain accessible for future listeners.


“The LP was our primary musical medium for over a generation. From Elvis, to the Beatles, to the Clash, the LP was witness to the birth of both Rock & Roll and Punk Rock. It was integral to our culture from the 1950s to the 1980s and is important for us to preserve for future generations.”

– CR Saikley, Director of Special Projects, Internet Archive

Since all of the information on an LP is printed, the digitization process must begin by cataloging data. High-resolution scans are taken of the cover art, the disc itself and any inserts or accompanying materials. The record label, year recorded, track list and other metadata are supplemented and cross-checked against various external databases. 


High-resolution imaging of album cover art. The boxed area is shown at high resolution at right.


“We’re really trying to capture everything about this artifact, this piece of media. As an archivist, that’s what we want to represent, the fullness of this physical object.”

– Derek Fukumori, Internet Archive Engineer

Once cataloged, the LPs are then digitized. The Internet Archive partners with Innodata Knowledge Services, an organization focused on machine learning and digital data transformation, to complete the digitization process at their facilities in Cebu, Philippines. An Innodata worker digitizes 12 LPs at a time, setting turntables to play and record by hand, then turning each record over to the next side. Since each LP is digitized in real time, it takes a full 20 minutes to record an average LP side. By operating 12 turntables simultaneously, the team expects to be able to digitize ten LPs per hour.
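The figures above imply a simple throughput calculation, sketched below. The 12 turntables, 20 minutes per side, and ten LPs per hour come from the article; the function name is hypothetical, and reading the gap between the theoretical ceiling and the expected rate as handling time (flipping, setup, changeover) is an assumption rather than something the team has stated.

```python
# Back-of-the-envelope throughput using the figures quoted in the article.
TURNTABLES = 12          # turntables run in parallel by one worker
MINUTES_PER_SIDE = 20    # average real-time recording length of one LP side
SIDES_PER_LP = 2
EXPECTED_LPS_PER_HOUR = 10  # the team's stated expectation


def max_lps_per_hour(turntables: int, minutes_per_side: float, sides_per_lp: int = 2) -> float:
    """Upper bound on LPs per hour if no turntable ever sits idle."""
    sides_per_hour = turntables * (60 / minutes_per_side)
    return sides_per_hour / sides_per_lp


ceiling = max_lps_per_hour(TURNTABLES, MINUTES_PER_SIDE, SIDES_PER_LP)
# Share of the theoretical ceiling that the expected rate represents.
utilization = EXPECTED_LPS_PER_HOUR / ceiling

print(f"ceiling: {ceiling:.0f} LPs/hour, expected: {EXPECTED_LPS_PER_HOUR}, "
      f"utilization: {utilization:.0%}")
```

With these inputs the ceiling works out to 18 LPs per hour, so the expected ten LPs per hour leaves a little under half of each hour for work that is not pure recording time.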


Audio stations complete with turntables & recording equipment set up in Cebu, Philippines.

Once recorded, there is a large FLAC file for each side of the LP, which needs to be segmented so listeners can easily begin at the desired song. There are two different algorithms used for segmenting; the first one looks at images of the vinyl disc to locate gaps in its grooves, which usually line up with gaps between songs. A second algorithm listens to the audio file to find the silent spaces between songs. When these two algorithms align, our engineers have a good measure of confidence that the machine has found the proper tracks.
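The Internet Archive's actual segmentation code is not described here, so the following is only a minimal sketch of the general idea: find stretches of low audio energy, then keep the ones that agree with gap times estimated from the disc image. The function names, window size, loudness threshold, and tolerance are all hypothetical, and the image-derived gap times are simply passed in as a list of seconds.

```python
def find_silent_gaps(samples, sample_rate, window_s=0.1, threshold=0.01, min_gap_s=1.0):
    """Return (start_s, end_s) spans where windowed RMS stays below threshold.

    samples: sequence of floats in [-1.0, 1.0]; all parameters are illustrative.
    """
    window = max(1, int(sample_rate * window_s))
    n_windows = len(samples) // window
    gaps = []
    gap_start = None
    for w in range(n_windows):
        chunk = samples[w * window:(w + 1) * window]
        rms = (sum(x * x for x in chunk) / len(chunk)) ** 0.5
        t = w * window / sample_rate  # start time of this window in seconds
        if rms < threshold:
            if gap_start is None:
                gap_start = t
        elif gap_start is not None:
            if t - gap_start >= min_gap_s:
                gaps.append((gap_start, t))
            gap_start = None
    # Close a gap that runs to the end of the recording.
    if gap_start is not None:
        end = n_windows * window / sample_rate
        if end - gap_start >= min_gap_s:
            gaps.append((gap_start, end))
    return gaps


def agreeing_boundaries(audio_gaps, image_gap_times_s, tolerance_s=2.0):
    """Keep audio gap midpoints that lie within tolerance of an image-derived gap."""
    agreed = []
    for start, end in audio_gaps:
        mid = (start + end) / 2
        if any(abs(mid - g) <= tolerance_s for g in image_gap_times_s):
            agreed.append(mid)
    return agreed


# Synthetic example: 5 s of signal, 2 s of silence, 5 s of signal at 100 Hz.
rate = 100
samples = [0.5, -0.5] * 250 + [0.0] * 200 + [0.5, -0.5] * 250
audio_gaps = find_silent_gaps(samples, rate)
track_breaks = agreeing_boundaries(audio_gaps, image_gap_times_s=[6.5, 20.0])
print(audio_gaps, track_breaks)
```

A fixed energy threshold like this would struggle with exactly the cases the article goes on to mention, such as applause between live tracks or deliberate silence inside a classical piece, which is one reason a second, independent signal and a manual check are valuable.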

These algorithms currently predict segmenting with about 85% to 95% accuracy, but some audio files are more difficult. For example, recordings of live music fill in the spaces between songs with applause, while classical music utilizes silence as part of a song. In order to account for these anomalies, digitized LP files are always checked manually before being added to the online database.

Identifying the empty spaces between songs for segmenting.

Currently, there are more than 5,800 LPs from the Boston Public Library LP collection available on Archive.org. The Internet Archive continues to digitize the remainder of the BPL collection in addition to more than 285,000 LPs that have been donated by others. The organization aims to engage a greater community of LP and 78 rpm enthusiasts by welcoming contributions and improvements to the recorded metadata. Many of the audio files online can be listened to in full, but some of the albums are only available in 30-second snippets due to rights issues.


“The complexity of properly digitizing LPs has been an evolving challenge, but thanks to the help of friends of the Archive, our in-house expertise, and the dedication of Innodata, I’m confident we’ve nailed it.”

– Merlijn Wajer, Internet Archive Developer

For decades, vinyl records were the dominant storage medium for every type of music and are ingrained in the memories and culture of several generations. Despite the challenges, the Internet Archive is determined to preserve these at-risk records so that they can be heard online by new audiences of scholars, researchers, and music lovers around the world.


ABOUT THE AUTHOR: Faye Lessler is a California-born, Brooklyn-based freelance writer and founder of the lifestyle blog Sustaining Life. She is an expert in mission-driven communications and enjoys writing while sipping black tea in a beam of sunshine.

Preserving Bali’s Cultural & Literary History through the Palm Leaf Project

Lontar palm leaf book in Balinese script (Image by Tropenmuseum).

What is lost when globalization dictates modern culture? In Bali, it’s centuries of literature. The Balinese language is still commonly spoken, but the ability to read and write literary works in the Balinese script has largely been lost. Since much of Bali’s culture and history is told in written manuscripts called lontars, the Internet Archive and the linguists at PanLex are teaming up with a group of local Balinese supporters to build new technologies and tools to keep their script and literary culture alive.

Culture is made up of a million little pieces of history, ritual, and everyday life, and that’s exactly what’s written down on Bali’s lontars. These palm leaf manuscripts date back hundreds of years; their subjects include advice on how to build a temple, how to make traditional medicine, and even how to choose the best cock to bet on in a cockfight, based on the date in the Balinese calendar.

Unfortunately, these ancient teachings—which were created by etching script into dried palm leaves and blackening the words with soot—were in danger of being lost forever due to humidity and time. And although they contain vital pieces of Bali’s rich cultural heritage, the lontars are unreadable for most Balinese who conduct their modern lives more and more in Indonesian.

Photograph of lontar leaf from Carcan Meyong (a taxonomy of cats) on PalmLeaf.org

So in 2011, the Internet Archive launched a project with the Culture Office of Bali to photograph and upload to archive.org some 3,000 lontar manuscripts comprising 130,000 palm leaves, which represent “90% of Bali’s literature,” according to Bali’s Minister of Culture. The Internet Archive has preserved these texts in the Balinese Digital Library collection, but the team realized that simply digitizing the lontars was not enough, as the resulting images were not easy to share or understand.


This year, the Internet Archive began working with PanLex, an organization dedicated to keeping the world’s languages alive, to engineer methods to transcribe some 3,000 palm leaves online. The team discovered that keyboards do not easily support Balinese script and that there was no complete font for the language, so PanLex worked with a font designer to create new fonts and built a new Keyman keyboard that enables users to type Unicode Balinese script on standard keyboards. These new tools empower more people to participate in transcription and make it easier to use another tool that auto-generates a Romanized version of Balinese. Transforming the Balinese lontars from lifeless PDFs to machine-readable text means the rich cultural information contained within the lontars will be easier to read, format, and share.
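One small illustration of what machine-readable text makes possible: once a leaf is typed as Unicode, software can tell whether a transcription is actually in Balinese script or in Romanized form. The sketch below relies only on the fact that the Unicode standard assigns Balinese the block U+1B00 to U+1B7F; the function name is hypothetical, and this is not a description of the tooling PalmLeaf.org uses.

```python
# Unicode "Balinese" block: U+1B00 through U+1B7F (range end is exclusive).
BALINESE_BLOCK = range(0x1B00, 0x1B80)


def balinese_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that fall in the Balinese block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    in_block = sum(ord(c) in BALINESE_BLOCK for c in chars)
    return in_block / len(chars)


# Two arbitrary code points from the Balinese block, used only for illustration.
script_sample = "\u1B13\u1B2B"
romanized_sample = "mantra"

print(balinese_ratio(script_sample), balinese_ratio(romanized_sample))
```

A check like this cannot be run against a scanned image or an image-only PDF, which is the practical difference between a photograph of a lontar and a transcription of it.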

The word mantra in Balinese script, rendered correctly in the Vimala font (left) and incorrectly in the Noto Serif Balinese font (right). PanLex worked with font designers to make Balinese script keyable online.

Now, thanks to more than 15 local Balinese contributors who are transcribing the lontars online, the digitized texts are published to PalmLeaf.org, where they are available to all. This community-curated wiki encourages participation by anyone who wishes to contribute to transcribing and translating the lontars. By reading and transcribing the lontars, community partners have an opportunity to absorb the knowledge they contain and share it throughout the world. Through this work, young Balinese are finding more ownership of and connection to their cultural heritage.

“Part of what we’re hoping to change with PalmLeaf.org is to enable the community to take charge of the project and decide how it develops in the future,” says David Kamholz, Project Director at PanLex. “Our goal is to support their hopes of keeping traditional lontar techniques alive and exciting more people to read and be involved in Balinese literature (lontar).”

PanLex Director David Kamholz (2nd from the left) working with local Balinese partners.

In a time when so many voices around the world go unheard online due to the language they speak, community involvement in this project is imperative. By transcribing the lontars in their original script, the project team places Balinese front and center, helping to normalize the use of Balinese online. Team members in Bali say that preserving these important texts and encouraging the use of their original language supports the social, cultural, and economic well-being of the people of Bali.

“This work is very helpful to us in Bali. Not everyone has the ability yet to read lontar. This opens access for more of us to learn about and study our literature.”

–Carma Citrawati, Balinese Transcriber