Category Archives: Audio Archive

Correct Metadata is Hard: a Lesson from the Great 78 Project

We have been digitizing about 8,000 78rpm record sides each month and now have 122,000 of them done. These have been posted on the net and over a million people have explored them. We have been digitizing, typing the information on the label, and linking to other information like discographies, databases, reviews and the like.

Volunteers, users, and internal QA checkers have been pointing out typos, so we decided to go back over a couple of months’ metadata and found problems. We then contracted with professional proofreaders, and they found even more (2% of the records at this point had something to point out; some are matters of opinion or aesthetics, some lead to corrections).

We are going to pay the professional proofreaders to correct the 5 most important fields for all 122,000 records, but can use more help. We are pointing these out here in the hope of interesting volunteer proofreaders and to share our experience in continually improving our collections.

Here are some of the issues with the primary performer field that we have now corrected in the June 2019 transfers, which we hope to upload in the next couple of weeks (before | after):

Jose Melis And His Latin American Ensemble | Jose Melis And His-Latin American Ensemble
Columbia-Orchestra | Columbia-Orchester
S. Formichi and T. Chelotti | S. Formichi e T. Chelotti
Dennis Daye and The Rhythmaires | Dennis Day and The Rhythmaires
Harry James and His Orchestra | Harry James and His Orch.
Charles Hart & Elliot Shaw | Charles Hart & Elliott Shaw
Peerless Quartet | Peerless Quartette

Some of the title corrections:

O Vino Fa ‘Papla (Wine Makes You Talk) | ‘O Vino Fa ‘Papla (Wine Makes You Talk)
Masked Ball Salaction | Masked Ball Selection
Moonlight and Roses (Brings Mem’ries Of You) | Moonlight and Roses (Bring Mem’ries Of You)
Que Bonita Eres Tu (You Are Beutiful) | Que Bonita Eres Tu (You Are Beautiful)
Buttered Roll | “Buttered Roll”
Paradise | “Paradise”
Got a Right to Cry | “Got a Right to Cry”
Blue Moods | “Blue Moods”
Auf Wiederseh’n Sweerheart | Auf Wiederseh’n Sweetheart
George M. Cohan Medley – Part 1 | George M. Cohan Medley – Part 2
Dewildered | Bewildered
Lolita (Seranata) | Lolita (Serenata)
Got a Right to Cry | “Got a Right to Cry” Joe Liggins and His Honeydrippers
Blue Moods | “Blue Moods”
Body and Soul | “Body and Soul”
Mais Qui Est-Ce | Mais Qui Est-Ce?
Wail Till the Sun Shines Nellie Blues | Wait Till the Sun Shines Nellie Blues
Que Te Pasa Joe (What Happens Joe) | Que Te Pasa Jose (What Happens Joe)
SAMSON AND DELILAH Softly Awakens My Heart | SAMSON AND DELILAH Softly Awakes My Heart
I’m Gonna COO, COO, COO | (I’m Gonna) COO, COO, COO

Please Donate 78rpm Records to the Internet Archive’s Great 78 Project

Good news: we have funding to preserve at least another 250,000 sides of 78rpm records, and we are looking for donations to digitize and physically preserve. We try to do a good job of digitizing and hosting the recordings and then thousands of people listen, learn, and enjoy these fabulous recordings.  

If you have 78s (or other recordings) that you would like to find a good home for, please think of us. We are a non-profit, and your donations will be tax-deductible, digitized for all to hear, and physically preserved. If you are interested in donating recordings of any type or appropriate books, please start with this form and we will contact you immediately.

We are looking for anything we do not already have. (We sometimes find 80% duplication rates, so we are trying to find larger or more niche collections.) We will physically preserve all genres, but our current funding has directed us to prioritize digitization of non-classical and non-opera.

We can pay for packing and shipping, and are getting better at the logistics for collections of a few thousand and up.  These are fragile objects and we are having good luck avoiding damage.

Tina Argumedo Collection
Daniel McNeil
Boston Public Library

The collections get highlighted and if you submit a story we will post it prominently. For instance: Boston Public Library, Daniel McNeil and Tina Argumedo’s Argentinian Tango collection.

The reason to highlight the donors is twofold: one is to celebrate the donor and their story, but the other is to help contextualize these recordings for different generations. These stories help users find meaning in the materials and find things they want to listen to. This way we can lead new listeners to love this music as the original collectors have.

Working together we can broaden this collection to works from around the world and different cultural groups in each country.

If you are a private individual or an institution and have records to contribute, even if they are not 78s, please start with this simple form, or email info@archive.org, or call +1-415-561-6767 and we will contact you immediately. Thank you.

Boston Public Library’s 78rpm Records Come to the Internet: Reformatting the Boston Public Library Sound Archives

Following eighteen months of work, more than 50,000 78rpm record “sides” from the Boston Public Library’s sound archives have now been digitized and made freely available online by the Internet Archive.  

“This project and the very generous support and diversity of expertise that converged to make it possible, all ensure the Library’s sound collections are not only preserved but made accessible to a much broader audience than would otherwise ever have been possible, all in the spirit of Free to All,” said David Leonard, President of the Boston Public Library.

In 2017, the Boston Public Library transferred their sound archives to the Internet Archive so that the materials could be reformatted digitally and preserved physically.  Working in collaboration with George Blood LP, using their specialty turntable and expert staff, these recordings have been digitized at high standards so that others can use these materials for research.  This is now the largest collection within the Great 78 Project, which aims to bring hundreds of thousands of 78rpm recordings to the Internet.

The records within BPL’s collection represent early twentieth century music and sound recordings from both popular and obscure artists.  78s were made from shellac, a resin secreted by female lac bugs, and are incredibly brittle and delicate; records can break from simple handling.  Digitizing these records is therefore the best way to preserve not only the music on the recordings but also the original artifact itself, ensuring the continued availability of the resource into the future.

After the recordings were digitized, volunteers with the Internet Archive and the Archive of Contemporary Music linked the sides to published discographies using a mix of manual techniques and custom algorithms to find dates and context.  As a result of these activities, more than 80% of the sides now have dates or links to contemporaneous reviews. Additionally, more than 250 have been matched to sheet music and displayed alongside the music, based on the digitized collections from Connecticut College.  

The inclusion of discographies was an important component of this project, providing users the necessary historical context for the recordings. CashBox Magazine was digitized and contributed by the Earl Gregg Swem Library, located at the College of William and Mary in Williamsburg, Virginia.  David Seubert of the University of California Santa Barbara contributed database exports to aid matching against UCSB’s Discography of American Historical Recordings. University of Toronto made print discographies available for research in this project.

As a result of project activities, more than 750 different labels are represented in the collection, spanning from 1901 to 1966. Highlights of the collection include early American jazz and blues recordings, such as 11 sides from the renowned Paramount Records, originally founded by the Wisconsin Chair Company.

At an event at the Boston Public Library last month, Brewster Kahle, the Digital Librarian of the Internet Archive, presented the digital files from the 50,000 sides to David Leonard, the President of the Boston Public Library. With the return of the digital files, BPL was able to unlock  access to the materials in a form that won’t damage the originals, ensuring the long-term viability of the 78s and the music recorded on them. The project was featured on-air during the Boston Public Radio program the next day, including samples from the recordings.

How can you get involved?

The Internet Archive invites other individuals and institutions to participate in this program by:

  • Uploading your digitized recordings;
  • Contributing metadata and context to the recordings;
  • Donating 78rpm records to the Internet Archive, where they will be preserved and digitized as funding allows (and funding for mass digitization is now available);
  • Digitizing your 78s with the same careful but cost-effective technologies from George Blood LP and then contributing the digital files, while retaining the physical discs.

We would like to emphasize that “reformatting” library collections by donating the physical objects to the Internet Archive can be a model for cost effective modern access and physical preservation.  To learn more about library reformatting, please contact Chris Freeland, Director of Open Libraries.

This project was funded by the Kahle/Austin Foundation.


A Public Peek into 1923

Commercial radio broadcasting began in the 1920s, bringing entertainment, news and music into people’s homes. Now, instead of needing to play a 78rpm disc on your phonograph, you could just tune in to listen to popular songs.

And in 1923 that means you would have been listening to one of the many versions of “Yes! We Have No Bananas” written by Frank Silver and Irving Cohn.  

You could listen to the Billy Jones version (play below), the Billy Murray version, a Yiddish version, or an Italian version, among others.

Yes! We Have No Bananas by Billy Jones from the 78rpm collection

Then you could have moved on to dancing the Charleston, popularized by the song of the same name from the 1923 musical “Runnin’ Wild.”   And with the explosion of recordings by African American musicians, you could also enjoy “Baby Won’t You Please Come Home” by Bessie Smith and “Dipper Mouth Blues” by Louis Armstrong.

Autogyro (1934)

In the news of the day you saw the first flight of an autogyro (the precursor to the helicopter).

Jack Dempsey defended his World Heavyweight Championship title against Tommy Gibbons and Luis Firpo.

And Howard Carter’s team finally entered the burial chamber of King Tutankhamen, as covered in books, sheet music and song.

But why are we focusing on 1923? Because for the first time in 20 years, new works are entering the public domain in the United States. And those works were all published in, you guessed it, 1923.

Settle in with a Reese’s Peanut Butter Cup, a Butterfinger, or a refreshing Popsicle (all invented in 1923!) while you watch Cecil B. DeMille’s The Ten Commandments, The White Sister starring Lillian Gish, or The Hunchback of Notre Dame starring Lon Chaney. Or any one of 50 other films available on archive.org from that year.

After your movie marathon, you can turn to your “new” reading materials to learn about sewing the latest women’s fashions, try an old recipe from a cook book (we recommend the Marshmallow Loaf), learn about theatrical lighting, construct yourself a bungalow (um, check the latest building codes first), grab some sheet music, read up on Benito Mussolini, and learn “How You Can Keep Fit” from Rudolph Valentino (!).

Finally, settle in to read some Robert Frost, Virginia Woolf, Edith Wharton, or Kahlil Gibran. And while you’re here, take a look at the 20,000 other texts we have available from 1923. 

We look forward to introducing you to 1924 NEXT January!

Decades of music celebrating Audiovisual Heritage

In honor of World Day for Audiovisual Heritage (October 27) we’d like to take you on a brief tour through seven decades of digitized music and audio recordings from 1900 through 1970.  We’ve been working to digitize 78rpm discs for the Great 78 Project to preserve the heritage of the first half of the 20th century, and now we’re turning our eyes toward vinyl LPs that have fallen out of print in the Unlocked Recordings collection.

1905 – A Picnic For Two

1906 – Talmage on Infidelity (very judgy)


1912 – Till the Sands of the Desert Grow Cold

1916 – I’ll Take you Home Again, Kathleen


1920 – I Want a Jazzy Kiss (as opposed to a bluesy kiss)

1937 – A Cowboy Honeymoon (hint: includes yodeling)


1939 – The Red Army Chorus of the U.S.S.R. (when we were pals)

1945 – Don’t you Worry ‘Bout That Mule (spoiler alert – he ain’t goin’ blind)


1947 – Everything is Cool (so sayeth Bab’s 3 Bips & a Bop)

1950 – When both accordions and Hi-Fi were hip


1950 – “They’re all dressed up to go swinging and, Man, they’re a gas!” (Sonny Burke from the back cover)

1957 – Amongst fierce competition, this gem wins Most Nightmare Inducing Cover Image


1958 – Dance music from Israel

1959 – This intensely sleepy version of “Makin’ Whoopee” will send you to sleep in the lounge.


1960 – My next story is a little risque (and so is the one after that)

1961 – Recorded live at the Second City Cabaret Theatre, Chicago, Ill.


1961 – Easy winner for the worst song opening we’ve ever heard, enjoy Tiger Rag from The Percussive Twenties.

1962 – Significant improvement on the Tiger Rag from the Doowackadoodlers


1963 – “Adults only” saucy comedy

1966 – Organ-ized wins best pun, as well as having “Popular songs arranged for organ” by “Brazil’s #1 Organist”


1966 – The music stylings of Mrs. Miller are not to be missed – personal favorites are “Hard day’s night” and “These boots are made for walkin'”

1966 – The “You Don’t Have to be Jewish” Players are falling in love


1969 – The Begatting of the President

Don’t Click on the Llama

WE HAVE ONE SIMPLE REQUEST…. DO NOT CLICK ON THE LLAMA.

Clicking on the Llama will release Webamp, a JavaScript-based player that mimics, down to individual strangeness and bugs, the operations of the once dominant Winamp, a media player considered to be one of the classic software creations of the 1990s.

To help you avoid this llama, we’ll tell you it’s in the upper right corner of any Internet Archive item that has a music player in it. This means the Grateful Dead recordings, radio airchecks, network record labels like monotonik, and all manner of podcasts now have the capability to be turned into a Winamp-like player that becomes your new default.

(If, by mistake, you click on the Llama, clicking on it again will turn off the Webamp player and restore the default player.)

This all got started because of the skins.

As part of our celebration of all things Internet, the Archive now has a large collection of Winamp Skins, which were artistic re-imaginings of the Winamp interface, that allowed all sorts of neat creative works on what could have been a basic media player. These “skins” were contributed to over the years (and new ones are still created!) and now number in the thousands. In the collection you’ll see examples of superheroes, video games, surreal images and a pretty wide array of pop stars and celebrities.

We have added over 5,000 skins (with many more coming), and then someone had the bright idea to make the Webamp player work within the Internet Archive to show off these skins, and here we are.

Thanks to Jordan Eldredge and the Webamp programming community for this new and strange periscope into the 1990s internet past.

 

Audio / Video player updated – to jwplayer v8.2

We updated our audio/video (and TV) 3rd party JS-based player from v6.8 to v8.2 today.

The update includes some code to keep the same feature set as before, plus these new features:

  • much nicer cosmetic/look updates
  • nice “rewind 10 seconds” button
  • controls are now in an updated control bar
  • (video) ‘Related Items’ now uses the same (better) recommendations from the bottom of an archive.org /details/ page
  • Airplay (Safari) and Chromecast basic casting controls in player
  • playback speed rate control now easier to use / set
  • playback keyboard control with SPACE and the left, right, up, and down keys
  • (video) WebVTT (captions) has a much better user interface and display
  • Flash is now only used to play audio/video if HTML5 doesn’t work (Flash does not do layout or controls now)


Mass downloading 78rpm record transfers

To preserve or discover interesting 78rpm records you can download them to your own machine (rather than using our collection pages).  You can download many at once onto a Mac or Linux machine using a command-line utility.

Preparation: download the IA command-line tool, like so:

$ curl -LO https://archive.org/download/ia-pex/ia
$ chmod +x ia
$ ./ia help

Option 1: if you just want a set of MP3s to play, download them to your /tmp directory:

./ia download --search "collection:georgeblood" --no-directories --destdir /tmp -g "[!_][!7][!8]*.mp3"

or just blues (or hillbilly or other searches):

./ia download --search "collection:georgeblood AND blues" --no-directories --destdir /tmp -g "[!_][!7][!8]*.mp3"
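
The glob "[!_][!7][!8]*.mp3" is easy to misread, so here is a rough illustration of what it keeps. It assumes, as the pattern suggests, that the alternate transfers in each item have filenames starting with "_78" while the preferred transfer does not. The filenames and the keeps_file helper are made up for this example, and shell pattern matching is only a stand-in for the matching the ia tool performs.

```shell
#!/bin/sh
# Report whether a filename would be kept by the glob used above.
# Each bracket expression rejects one character position:
#   [!_] first character is not "_"
#   [!7] second character is not "7"
#   [!8] third character is not "8"
keeps_file() {
  case "$1" in
    [!_][!7][!8]*.mp3) echo "keep" ;;
    *)                 echo "skip" ;;
  esac
}

# Hypothetical filenames, for illustration only.
keeps_file "Mama Yo Quiero.mp3"         # keep
keeps_file "_78_mama-yo-quiero_01.mp3"  # skip
keeps_file "A7 Blues.mp3"               # skip
```

The last case shows a side effect worth knowing about: because the three positions are tested independently, a name whose second character is "7" (or whose third is "8") is skipped even though it does not start with "_78".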

Option 2: if you want to preserve the FLAC, MP3, and metadata files for the best version of the 78rpm record we have, first install GNU parallel. (On a Mac, install Homebrew, then type “brew install parallel”. On Linux, try “apt-get install parallel”.)

./ia search 'collection:georgeblood' --sort=publicdate\ asc --itemlist > itemlist.txt
cat itemlist.txt | parallel --joblog download.log './ia download {} --destdir /tmp -g "[!_][!7][!8]*"'

To retry any downloads that failed, rerun parallel with --retry-failed and the same job log:

parallel --retry-failed --joblog download.log './ia download {} --destdir /tmp -g "[!_][!7][!8]*"'
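
To check which downloads failed (before or after a retry), the job log can be inspected directly. This is a minimal sketch, assuming the tab-separated layout that GNU parallel writes: a header row, the exit status in the seventh column (Exitval), and the command in the ninth. The failed_jobs helper name is made up for this example.

```shell
#!/bin/sh
# List the commands in a GNU parallel job log that exited non-zero.
# Assumed columns:
#   Seq Host Starttime JobRuntime Send Receive Exitval Signal Command
failed_jobs() {
  # Skip the header row, then print the command of every failed job.
  awk -F '\t' 'NR > 1 && $7 != 0 { print $9 }' "$1"
}

# Usage: failed_jobs download.log
```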

Building Digital 78rpm Record Collections Together with Minimal Duplication

By working together, libraries that are digitizing their collections can minimize duplication of effort, saving time and money to preserve other things.  This month we made progress with 78rpm record collections.

The goal is to bring many collections online as cost effectively as possible. Ideally, we want to show each online collection as complete but only digitize any particular item once. Therefore one digitized item may belong virtually to several collections. We are now doing this with 78rpm records in the Great 78 Project.

It starts with great collections of 78s (18 contributors so far). For each record, we look up the record label, catalog number, and title/performer to see if we have already digitized it. If we have, we check the condition of the digitized copy against the new one; if the new one would improve the collection, we digitize it. If we do not need to digitize it, we add a note to the existing item that it now also belongs to another collection, and note where the duplicate physical item can be found.
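
The lookup step can be sketched as a key comparison. This is only an illustration, not the project's actual tooling: the normalization rule (lowercase, keep only letters and digits), the one-key-per-line index file, and all function names are assumptions, and the condition check that follows a match is not modeled.

```shell
#!/bin/sh
# Hypothetical normalization: lowercase, keep only letters and digits,
# so that "18-123 A" and "18123-a" compare equal.
norm() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -cd '[:alnum:]'
}

# Build a comparison key from label, catalog number, and title/performer.
record_key() {
  printf '%s|%s|%s\n' "$(norm "$1")" "$(norm "$2")" "$(norm "$3")"
}

# Succeeds if the record's key is already listed in the index file
# (one key per line, matched as a whole fixed-string line).
# Arguments: index file, label, catalog number, title/performer.
already_digitized() {
  grep -qxF -- "$(record_key "$2" "$3" "$4")" "$1"
}

# Usage (made-up values):
#   already_digitized index.txt "Example Label" "1234-A" "Some Title"
```

A real matcher would need to be more forgiving than this, since the corrections listed elsewhere in these posts show how often labels are transcribed with small differences.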

For instance, the KUSF collection we are digitizing has many fabulous records we have never seen before including sound effect records.  But about half are records we have digitized better copies of before, so we are not digitizing most of those. We still attribute the existing digital files to the KUSF collection so it will have a digital file in the online collection for each of their physical discs.

It takes about half as long to determine that a record is a duplicate as it does to fully digitize it, and given that about half of our records now do not need to be digitized, we are looking for ways to speed this up.
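
A quick back-of-the-envelope calculation shows why speed matters here. Only the 2:1 time ratio and the roughly 50% duplicate rate come from the text; the batch size is hypothetical, and the sketch assumes every record is looked up while only non-duplicates are digitized.

```shell
#!/bin/sh
# Relative time units, not real measurements.
digitize=2      # time to fully digitize one record
lookup=1        # time to check one record for a duplicate (half of digitize)
records=100     # hypothetical batch size
duplicates=50   # about half need no digitization

# Assumption: every record is looked up; only non-duplicates are digitized.
with_lookup=$(( records * lookup + (records - duplicates) * digitize ))
without_lookup=$(( records * digitize ))

echo "with lookup: $with_lookup, digitize everything: $without_lookup"
```

Under these assumptions the two totals come out equal (200 units each), so the lookup only saves time once it gets faster or the duplicate rate rises.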

OCLC has many techniques to help with deduplication of books and we are starting to work with them on this, but for 78s we are making progress in this way. Please enjoy the 78s.

Thank you to George Blood LP, Jake Johnson, B. George, and others.

The 20th Century Time Machine

by Nancy Watzman & Katie Dahl

Jason Scott

With the turn of a dial, some flashing lights, and the requisite puff of fog, emcees Tracey Jaquith, TV Architect, and Jason Scott, Free Range Archivist, cranked up the Internet Archive 20th Century Time Machine on stage before a packed house at the Internet Archive’s annual party on October 11.

Eureka! The cardboard contraption worked! The year was 1912, and out stepped Alexis Rossi, director of Media and Access, her hat adorned with a 78rpm record.

1912

D’Anna Alexander (center) with her mother (right) and grandmother (left).

“Close your eyes and listen,” Rossi asked the audience. And then, out of the speakers floated the scratchy sounds of Billy Murray singing “Low Bridge, Everybody Down” written by Thomas S. Allen. From 1898 to the 1950s, some three million recordings of about three minutes each were made on 78rpm discs. But these discs are now brittle, the music stored on them precious. The Internet Archive is working with partners on the Great 78 Project to store these recordings digitally, so that we and future generations can enjoy them and reflect on our music history. New collections include the Tina Argumedo and Lucrecia Hug 78rpm Collection of dance music collected in Argentina in the mid-1930s.

1927

Next to emerge from the Time Machine was David Leonard, president of the Boston Public Library, which was the first free, municipal library founded in the United States. The mission was and remains bold: make knowledge available to everyone. Knowledge shouldn’t be hidden behind paywalls, restricted to the wealthy but rather should operate under the principle of open access as public good, he explained. Leonard announced that the Boston Public Library would join the Internet Archive’s Great 78 Project, by authorizing the transfer of 200,000 individual 78s and LPs to preserve and make accessible to the public, “a collection that otherwise would remain in storage unavailable to anyone.”

David Leonard and Brewster Kahle

Brewster Kahle, founder and Digital Librarian of the Internet Archive, then came through the time machine to present the Internet Archive Hero Award to Leonard. “I am inspired every time I go through the doors,” said Kahle of the library, noting that the Boston Public Library was the first to digitize not just a presidential library, of John Quincy Adams, but also modern books.  Leonard was presented with a tablet imprinted with the Boston Public Library homepage by Internet Archive 2017 Artist in Residence, Jeremiah Jenkins.

1942

Kahle then set the Time Machine to 1942 to explain another new Internet Archive initiative: liberating books published between 1923 and 1941. Working with Elizabeth Townsend Gard, a copyright scholar at Tulane University, the Internet Archive is liberating these books under a little known, and perhaps never used, provision of US copyright law, Section 108h, which allows libraries to scan and make available materials published 1923 to 1941 if they are not being actively sold. The name of the new collection: the Sonny Bono Memorial Collection, named for the now deceased congressman and former representative who led the passage of the Copyright Term Extension Act of 1998, which included the 108h provision as a “gift” to libraries.

One of these books is “Your Life,” a tome written by Kahle’s grandfather, Douglas E. Lurton, a “guide to a desirable living.” “I have one copy of this book and two sons. According to the law, I can’t make one copy and give it to the other son. But now it’s available,” Kahle explained.

1944

Sab Masada

The Time Machine cranked to 1944, and out came Rick Prelinger, Internet Archive Board member, archivist, and filmmaker. Prelinger introduced a new addition to the Internet Archive’s film collection: long-forgotten footage of an Arkansas Japanese internment camp from 1944.  As the film played on the screen, Prelinger welcomed Sab Masada, 87, who lived at this very camp as a 12-year-old.

Masada talked about his experience at the camp and why it is important for people today to remember it. “Since the election I’ve heard echoes of what I heard in 1942,” Masada said. “Using fear of terrorism to target the Muslims and people south of the border.”

1972

Next to speak was Wendy Hanamura, the director of partnerships. Hanamura explained how as a sixth grader she discovered a book at the library, Executive Order 9066, published in 1972, which chronicled photos of Japanese internment camps during World War II.

“Before I was an internet archivist, I was a daughter and granddaughter of American citizens who were locked up behind barbed wire in the same kind of camps that incarcerated Sab,” said Hanamura. That one book – now out of print – helped her understand what had happened to her family.

Inspired by making it to the semi-final round of the MacArthur 100&Change initiative with a proposal that provides libraries and learners with free digital access to four million books, the Internet Archive is forging ahead with plans, despite not winning the $100 million grant. Among the books the Internet Archive is making available: Executive Order 9066.

1985

The year display turned to 1985, and Jason Scott reappeared on stage, explaining his role as a software curator. New this year to the Internet Archive are collections of early Apple software, he explained, with browser emulation allowing the user to experience just what it was like to fire up a Macintosh computer back in its heyday. This includes a collection of the then wildly popular “HyperCards,” a programmatic tool that enabled users to create programs that linked materials in creative ways, before the rise of the world wide web.

1997

After this tour through the 20th century, the Time Machine was set to 1997. Mark Graham, Director of the Wayback Machine, and Vinay Goel, Senior Data Engineer, stepped on stage. Back in 1997, when the Wayback Machine began archiving websites on the still new World Wide Web, the entire thing amounted to 2.2 terabytes of data. Now the Wayback Machine contains 20 petabytes. Graham explained how the Wayback Machine is preserving tweets, government websites, and other materials that could otherwise vanish. One example: this report from The Rachel Maddow Show, which aired on December 16, 2016, about Michael Flynn, then slated to become National Security Advisor. Flynn deleted a tweet he had made linking to a falsified story about Hillary Clinton, but the Internet Archive saved it through the Wayback Machine.

Goel took the microphone to announce new improvements to Wayback Machine Search 2.0. Now it’s possible to search for keywords, such as “climate change,” and find not just web pages from a particular time period mentioning these words, but also different format types, such as images, PDFs, or yes, even an old Internet Archive favorite, animated GIFs from the now-defunct GeoCities, including snow globes!

Thanks to all who came out to celebrate with the Internet Archive staff and volunteers, or watched online. Please join our efforts to provide Universal Access to All Knowledge, whatever century it is from.

Editor’s Note, 10/16/17: Watch the full event https://archive.org/details/youtube-j1eYfT1r0Tc