Dreaming of Semantic Audio Restoration at a Massive Scale

I believe we can do a fabulous job of bringing the music from the 78rpm era back to vibrant life if we really understand record wear and can model the instruments and voices.

In other words, I believe we could reconstruct a performance by semantically modeling the noise and distortion we want to get rid of, as well as modeling the performer’s instruments.

To follow this reasoning: what if we knew we were examining a piano piece, and knew which notes were being played on what kind of piano, and exactly when and how hard each note was struck? We could use that information to make a reconstruction by playing the piece again and recording that version. This would be similar to what optical character recognition (OCR) does with images of pages of text: it knows the language, figures out the words on the page, and then makes a new page in a perfect font. In fact, with the OCR’ed text, you can change the font, make it bigger, and reflow the page to fit on a different device.

What if we OCR’ed the music? This might work well for the instrumental accompaniment; a voice, if any, we would handle differently. We could build a model of the singer’s voice based not only on this recording and other recordings of this song, but also on all other recordings of that singer. With those models we could reconstruct the voice without any noise or distortion at all.

We would balance the reconstructed and the raw signals to maintain the subtle variations that make great performances. Some of the raw signal could also be kept for context, much as digital filmmakers sometimes add scratched-film effects.

So, there can be a wide variety of restoration tools if we make the jump into semantics and big data analysis.

The Great 78 Project will collect and digitize over 400,000 78rpm recordings and make them publicly available, creating a rich data set for large-scale analysis. Each transfer is made with four styli of different shapes and sizes at the same time, all recorded as lossless 96kHz/24-bit samples in stereo (even though the records are mono, stereo provides more information about the contours of the groove). This means each groove has 8 different high-resolution representations, sampled roughly every 11 microns at the outer rim. Furthermore, there are often multiple copies of the same recording that would have been stamped and used differently. So, modeling the wear on the record and using that to reconstruct what would have been on the master may be possible.
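
As a rough illustration of why several readings of the same groove help, here is a minimal sketch in Python. It is not the project's method. It assumes the transfers have already been time-aligned and level-matched (that is the hard part, and it is not shown), and the function name `combine_transfers` is hypothetical. Taking the per-sample median means a click that shows up in only one reading does not survive into the result.

```python
from statistics import median


def combine_transfers(transfers):
    """Per-sample median across time-aligned transfers of the same groove.

    `transfers` is a list of equal-length lists of sample values.
    Alignment and level matching are assumed to have been done already.
    """
    if not transfers:
        raise ValueError("need at least one transfer")
    length = len(transfers[0])
    if any(len(t) != length for t in transfers):
        raise ValueError("all transfers must have the same length")
    # zip(*transfers) groups the readings taken at the same instant.
    return [median(samples) for samples in zip(*transfers)]


# Toy example: the second reading has a click on the third sample.
clean = [0.0, 0.5, 1.0, 0.5]
clicked = [0.0, 0.5, 9.0, 0.5]
restored = combine_transfers([clean, clicked, clean])
```

A median only suppresses damage that differs between readings. Wear that is common to every copy would still need the kind of modeling described above.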

Many important 20th-century records, in genres such as jazz, blues, and ragtime, have only a few performers on each, so modeling those performers, instruments, and performances is quite possible. Modern computers make it easier to analyze whole corpora, which can yield insights beyond restoration, such as a better grasp of playing techniques that are not commonly understood.

If we build full semantic models of instruments, performers, and pieces of music, we could even create virtual performances that never existed.  Imagine a jazz performer virtually playing a song that had not been written in their lifetime. We could have different musician combinations, or singers performing with different cadences. Areas for experimentation abound once we cross the threshold of full corpus analysis and semantic modeling.

We hope the technical work done on this project will have a far-reaching effect on an entire media type, since the Great 78 Project will digitize and hold a large percentage of all 78rpm records produced from 1908 to 1950. Therefore, any techniques that are built upon these recordings can be used to restore a great many records.

Please dive in and have fun with a great era of music and sound.

 

(We get a sample every 11 microns when digitizing the outer rim of a 78rpm record at 96kHz. Given that we now have 8 different readings of that, each with 24-bit resolution, we can hopefully get a good idea of the groove. There are optical techniques that are very cool, but those have their own issues, I am told.

10″ diameter * 3.14 = 31.4″ circumference = 80cm/revolution

@ 78rpm: 60 seconds/minute / 78 revolutions/minute = .77 seconds/revolution

80cm/rev / (.77sec/rev) = 104cm/sec

96k samples/sec

104cm/sec / (96k samples/sec) = 11 microns)
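
The same arithmetic can be written as a small Python sketch. The function name and parameters are illustrative and not from the project. It uses a more exact value of pi and of the inch-to-centimeter conversion than the rounded figures above, so it gives roughly 10.8 microns instead of exactly 11.

```python
import math


def groove_sample_spacing_microns(diameter_inches=10.0, rpm=78.0,
                                  sample_rate_hz=96_000.0):
    """Distance the groove travels under the stylus between two samples."""
    circumference_cm = math.pi * diameter_inches * 2.54  # 1 inch = 2.54 cm
    seconds_per_rev = 60.0 / rpm
    speed_cm_per_sec = circumference_cm / seconds_per_rev
    cm_per_sample = speed_cm_per_sec / sample_rate_hz
    return cm_per_sample * 10_000.0  # 1 cm = 10,000 microns


outer_rim = groove_sample_spacing_microns()
print(f"{outer_rim:.1f} microns per sample at the outer rim")
```

Calling it with a smaller diameter shows the spacing shrinking toward the center of the record, since the groove passes under the stylus more slowly there.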

 

3 thoughts on “Dreaming of Semantic Audio Restoration at a Massive Scale”

  1. Pingback: The Internet Archive Announces Ambitious Project to Collect, Digitize, and “Semantically Restore” 78rpm Era Audio Recordings | LJ INFOdocket

  2. anonymous sustaining member

    Please more posts like this, about super interesting archival work.

    Please fewer posts about the minutiae of American partisan politics.

  3. Steven Van Impe

    Overlaying recordings of different copies of the same 78 could produce a recording with reduced noise, but adding information that is not there will never recreate the true sound. OCR is different because you’re looking at characters that are all the same, whereas sound is a wave of endless complexity that is compressed by leaving out some of that complexity. This is not a random process, it’s a gate: a part of the sound passes, another part doesn’t pass. It never ends up on any record and you’re never going to be able to retrieve it.
