Date: 2023-04-24

Hopefully we have a dataset primed for AI researchers to do something really useful and fun: taking the noise out of digitized 78rpm records.
The Internet Archive has 1,600 examples of quality human restorations of 78rpm records where the best tools were used to ‘lightly restore’ the audio files. This takes away scratchy surface noise while trying not to impair the music or speech. Each of those items also contains the unrestored original files that were used.
But then the Internet Archive has over 400,000 unrestored files that are quite scratchy and difficult to listen to.
The goal, or rather the hope, is a program that can take all or many of the 400,000 unrestored records and make them much better. How hard this is remains unknown, but hopefully it is a fun project to work on.
Many of the recordings are great and worth the effort. Please comment on this post if you are interested in diving in.
This project sounds somewhat ill-suited to AI in its current form for numerous reasons: click and noise removal are not a one-size-fits-all problem, and each recording needs to be addressed on its own merits. Even the best click removal algorithms will still mistake loud brass notes for clicks because they strongly resemble clicks on a spectrogram. Anything more than moderate click cleanup inevitably screws up the music unless it’s closely monitored and done in manageable chunks. It’s far better to use current algorithms and some form of batch processing. You might be better off acquiring one or more CEDAR Cambridge computers with the server pack and doing batch processing on the archive. Using somewhat conservative settings would do much better than most of the “cleaned up” examples. Frankly, most of those are not very good exemplars.
While batch processing could be very good, having a well-informed engineer work on them all individually will always be *much* better. There are no shortcuts.
Surface noise manifests in a different pattern on every single record. AI therefore would have a very hard time drawing noise removal conclusions from other recordings. The best solution we currently have is CEDAR’s Auto-Dehiss and NR5 modules. I’m sure those could be improved, but it’s probably better to collaborate with them than to reinvent the wheel.
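To make the point above about musical transients being mistaken for clicks more concrete, here is a toy sketch. It is not CEDAR's algorithm and does not work on a spectrogram; it is a deliberately naive time-domain detector, and the function name `detect_clicks`, the threshold factor `k`, and the synthetic signal are all assumptions made up for illustration. It flags any sample whose jump from the previous sample is much larger than the typical jump, and it ends up flagging both a real click and the sudden onset of a loud tone.

```python
import math
from statistics import median

def detect_clicks(samples, k=20.0):
    """Flag indices where the sample-to-sample jump exceeds k times the median jump.

    Hypothetical toy detector for illustration only.
    """
    jumps = [abs(samples[i] - samples[i - 1]) for i in range(1, len(samples))]
    typical = median(jumps)
    if typical == 0:
        return []  # flat signal: nothing sensible to compare against
    threshold = k * typical
    # jumps[j] is the jump into sample j + 1
    return [j + 1 for j, jump in enumerate(jumps) if jump > threshold]

# Synthetic signal: a quiet, slow sine wave standing in for background music.
n = 500
signal = [0.1 * math.sin(2 * math.pi * 5 * i / n) for i in range(n)]

# A genuine click: one isolated spike.
signal[100] += 0.8

# A sudden loud, bright tone starting at sample 300 (a stand-in for a brass attack).
for i in range(300, n):
    signal[i] += 0.8 * math.sin(2 * math.pi * 0.25 * (i - 300))

flagged = detect_clicks(signal)
print("click flagged:", 100 in flagged)
print("tone onset flagged:", 301 in flagged)
```

A detector this simple cannot tell the spike from the loud attack, because both look like abrupt jumps. Real tools are far more sophisticated, but the same ambiguity is why aggressive settings tend to damage the music.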
Yes, it is true, people will be better at this. We just have so many recordings that it would be helpful to automate. If you would be up for trying the CEDAR tools, we would love the help.
Am interested in helping
Excellent. Please see some of the other replies in this thread for examples.
I would definitely be interested!
Thank you! We would love the help. The restored versions have “-restored.flac” at the end of their filenames, for instance in:
https://archive.org/download/78_goodnight-irene_hudie-ledbetter-john-lomax-gordon-jenkins-and-his-orchestra-the-wea_gbia0000900a
The original has the same filename without the “-restored”
(many of these have their originals in a zip file, but they are there).
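For anyone building training pairs from this naming convention, a minimal sketch of the matching step might look like the following. The function name `pair_restored_with_originals` is hypothetical, and the tolerance for a leading prefix such as “_01 ” on the original is an assumption based on the example links in this thread rather than a documented rule. It only looks at a list of filenames you already have, so originals stored inside a zip file would need to be listed separately first.

```python
from pathlib import PurePosixPath

RESTORED_SUFFIX = "-restored"

def pair_restored_with_originals(filenames):
    """Map each '-restored.flac' file to its likely original, or None if not found."""
    flacs = [f for f in filenames if f.lower().endswith(".flac")]
    pairs = {}
    for name in flacs:
        stem = PurePosixPath(name).stem
        if not stem.endswith(RESTORED_SUFFIX):
            continue  # not a restored file
        base = stem[: -len(RESTORED_SUFFIX)]
        # Assumption: the original ends with the same base name, possibly
        # preceded by a track prefix such as "_01 ".
        candidates = [
            other
            for other in flacs
            if other != name and PurePosixPath(other).stem.endswith(base)
        ]
        # Prefer an exact filename match when one exists.
        exact = [c for c in candidates if PurePosixPath(c).stem == base]
        pairs[name] = (exact or candidates or [None])[0]
    return pairs

files = [
    "_01 Aba Daba Honeymoon - Bea Abbott & Jerry Keller.flac",
    "Aba Daba Honeymoon - Bea Abbott & Jerry Keller-restored.flac",
    "cover.jpg",
]
pairs = pair_restored_with_originals(files)
for restored, original in pairs.items():
    print(restored, "<-", original)
```

Returning `None` for unmatched files, rather than raising, makes it easy to collect the items whose originals still need to be pulled out of a zip.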
I’m asking around to see what we’ve got here at Stability. Some of our generative audio models have proven useful for denoising, so I think we can probably do something here. Will poke at it and see what comes out.
The collection of high-quality ones is useful. Any chance there’s some way to find a sampling of songs that are particularly bad? It might be nice to start off with the really noisy / damaged ones, and then see how that generalizes to noise reduction on the rest of the collection.
This would be spectacular. Each of those that are “restored” has a bad one it came from.
This original is not too bad, but the restored version is better: https://archive.org/download/78_aba-daba-honeymoon_bea-abbott–jerry-keller-bud-dinwiddie–his-orch.-arthur-field_gbia0011476b/Aba%20Daba%20Honeymoon%20-%20Bea%20Abbott%20%26%20Jerry%20Keller-restored.flac
vs.
https://archive.org/download/78_aba-daba-honeymoon_bea-abbott–jerry-keller-bud-dinwiddie–his-orch.-arthur-field_gbia0011476b/_01%20Aba%20Daba%20Honeymoon%20-%20Bea%20Abbott%20%26%20Jerry%20Keller.flac
in https://archive.org/details/78_aba-daba-honeymoon_bea-abbott–jerry-keller-bud-dinwiddie–his-orch.-arthur-field_gbia0011476b
Another example: https://archive.org/details/78_goodnight-irene_hudie-ledbetter-john-lomax-gordon-jenkins-and-his-orchestra-the-wea_gbia0000900a/_01+Goodnight+Irene+-+Hudie+Ledbetter+-+John+Lomax.flac
vs.
https://archive.org/details/78_goodnight-irene_hudie-ledbetter-john-lomax-gordon-jenkins-and-his-orchestra-the-wea_gbia0000900a/Goodnight+Irene+-+Hudie+Ledbetter+-+John+Lomax-restored.flac
We can also help get dumps of these if that would be helpful.
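Until a dump is available, the download links above suggest a simple pattern: `https://archive.org/download/`, then the item identifier, then the percent-encoded filename. A small sketch that builds such a link is below. The helper name `download_url` and the identifier `example_item` are hypothetical, and the pattern is inferred from the links in this thread, so treat it as an assumption and check it against a real item before relying on it.

```python
from urllib.parse import quote

def download_url(identifier, filename):
    """Build a download link from an item identifier and a plain filename.

    The identifier is used as given; only the filename is percent-encoded.
    """
    return f"https://archive.org/download/{identifier}/{quote(filename)}"

# "example_item" is a placeholder, not a real item identifier.
url = download_url(
    "example_item",
    "Aba Daba Honeymoon - Bea Abbott & Jerry Keller-restored.flac",
)
print(url)
```

Spaces become `%20` and `&` becomes `%26`, which matches the encoding seen in the Aba Daba Honeymoon links above.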