The audio archive contains recordings ranging from alternative news programming, to Grateful Dead concerts, to Old Time Radio shows, to book and poetry readings, to original music uploaded by our users.
Founded in 2005, Librivox is a community of volunteers from all over the world who record audio versions of public domain texts: poetry, short stories, whole books, even dramatic works, in many different languages.
The Live Music Archive is a community committed to providing the highest quality live concerts in a lossless, downloadable format, along with the convenience of on-demand streaming.
The Internet Arcade is a web-based library of arcade (coin-operated) video games from the 1970s through to the 1990s, emulated in JSMAME, part of the JSMESS software package. Containing hundreds of games ranging through many different genres and styles, the Arcade provides research, comparison, and entertainment in the realm of the Video Game Arcade.
Many items are added to the Internet Archive’s collections every month, by us and by our patrons. Here’s a roundup of some of the new media you might want to check out. Logging in might be required to borrow certain items.
Notable new collections:
Shakemore Festival (2007-present) Maryland: The Shakemore Music Festival is an annual weekend event made up of 20 to 30 acts, featuring frequent appearances from what has become a large Shakemore family of bands (including Bang Bang Lulu, Caching Behavior, Cigarbox Planetarium, Go Pills, Weird Paul Rock Band, and many others).
Quantum Leap Podcast: The Quantum Leap Podcast talks about every episode of the cult hit time travel program, as well as the novels and comic books it inspired.
This month we’ve added books on varied subjects in more than 20 languages. Click through to explore, but here are a few interesting items to start with:
Founded in 2005, Librivox is a community of volunteers from all over the world who record audiobooks of public domain texts in many different languages.
The Live Music Archive is a community committed to providing the highest quality live concerts in a lossless, downloadable format, along with the convenience of on-demand streaming (all with artist permission).
This collection hosts complete, freely downloadable/streamable, often Creative Commons-licensed catalogs of ‘virtual record labels’. These ‘netlabels’ are non-profit, community-built entities dedicated to providing high quality, non-commercial, freely distributable MP3/OGG-format music for online download in a multitude of genres.
By 1922 we were solidly in the Jazz Age – F. Scott Fitzgerald’s Tales of the Jazz Age was published in 1922, and the term was already in popular usage. Jazz migrated from Black American communities in New Orleans into the rest of the United States, having evolved from its roots in ragtime, blues and Creole music. In fact, 1922 was the year Louis Armstrong left New Orleans to join King Oliver’s Creole Jazz Band in Chicago.
Alexander’s Ragtime Band (1911) written by Irving Berlin and performed by Collins and Harlan
Early recordings by Bert Williams (the first Black American on Broadway and the first Black man to star in a film), Fanny Brice (the real-life ‘Funny Girl’), Enrico Caruso (the legendary Italian operatic tenor), and so many others give life and flavor to our imaginings of the early 20th century.
Here are some of the top songs from 1922, to give you a taste:
But personally when I “flip through” these records I’m always drawn to the novelty songs.
There’s a whole genre of sound imitations, like Violin Mimicry where a violin is used to imitate people talking, Jingles from the Marsh Birds with a man imitating birds imitating popular songs (just as confusing as it sounds), and A Cat-astrophe with people imitating rather catastrophic cats to music.
As usual, we are also welcoming some new books, movies, journals, and sheet music – this time from 1926! (Read about 1925, 1924, and 1923 in previous posts.)
The Clothes We Wear (1926) by Frank and Frances Carpenter
Other interesting books from 1926 that you might want to explore include Show Boat by Edna Ferber, which was made into the musical Show Boat in 1927 with music by Jerome Kern; The Clothes We Wear by Frank and Frances Carpenter, a child-friendly exploration of how clothes are made, all the way from the field through weaving and into sewing; and The Art of Kissing by Clement Wood, which is pretty self-explanatory.
We invite you to explore some of the other items dated 1926 in our collections to find your own fun items that may now be in the public domain.
Virtual Party for the Public Domain
Please join us for a virtual party on January 20, 2022 at 1pm Pacific/4pm Eastern time with a keynote from Senator Ron Wyden, champion of the Music Modernization Act, and a bunch of musical acts, dancers, historians, librarians, academics, activists and other leaders from the Open world! (And yes, we DO have a book from 1926 about how to throw the world’s best party.)
When professor Jason Luther wants students in his Intro to Writing Arts class to learn about multimodal composition, he has them go to the Internet Archive for inspiration.
Students peruse 78rpm records going back to the early 20th century to find just the right one for their assignment. There is no lack of material with more than 300,000 recordings from 1898 through the 1950s preserved. They are available to the public because of the collaborative Great 78 Project.
Although the students are enrolled at Rowan University in New Jersey, many are participating remotely from their homes this year because of the pandemic, and the materials are conveniently available digitally to them from anywhere.
“If the [Great 78 Project] didn’t exist, I don’t think I would have this curriculum at all,” said Luther, assistant professor for Writing Arts in the Ric Edelman College of Communication & Creative Arts at Rowan. “What I really like is the research challenge. It’s really powerful. So many times students have recovered the lost histories of these songs.”
For The Phono Project, Rowan students create podcasts and social media posts about recordings in the Archive’s 78s collection. They also tap into primary sources on the Archive to write the history of the songs. They can write about the stories behind songs like the Billie Holiday classic “God Bless the Child,” or John Lee Hooker’s “Boogie Chillen” from 1948. Many gravitate to artists like Elvis Presley or Frank Sinatra, but Luther tries to get them to branch out—especially now that there are more than 200 stories in the project’s collection.
Luther developed the project in 2018 as part of the “Technologies and Future of Writing” module in the writing course. Students have just eight classes to complete the 1-3 minute podcasts, in which they learn to master a mix of audio tools and editing skills using Audacity and WordPress. The course covers issues of compatibility and ownership, along with instruction on the economy of writing like a critic about lyrics and culture. For one recent class session, he invited Liz Rosenberg of the Archive to be a guest speaker and talk about the organization’s work and the Great 78 Project.
In the future, Luther said he would like to find more ways to incorporate some of the Archive’s collection into his curriculum. For instance, he may have students use primary source documents from independent publishers over time to craft something tangible, such as an actual history from those materials that could be passed along. “That’s one of the neat things about accretion,” he said. “We have the creativity, but then there’s also documents on the Archive that are helping us understand the 78s themselves. It’s such a vast resource.”
Incorporating materials from the Internet Archive into your course curriculum is easy. Each semester we hear from instructors doing so worldwide. Let us know how you are weaving Internet Archive media into your classes by writing to us at info@archive.org.
Radio remains one of the most-consumed forms of traditional media today, with 89% of Americans listening to radio at least once a week as of 2018, a number that is actually increasing during the pandemic. News is the most popular radio format and 60% of Americans trust radio news to “deliver timely information about the current COVID-19 outbreak.”
Local talk radio is home to a diverse assortment of personality-driven programming that offers unique insights into the concerns and interests of citizens across the nation. Yet radio has remained stubbornly inaccessible to scholars due to the technical challenges of monitoring and transcribing broadcast speech at scale.
Debuting this past July, the Internet Archive’s Radio Archive uses automatic speech recognition technology to transcribe this vast collection of daily news and talk radio programming into searchable text dating back to 2016, and continues to archive and transcribe a selection of stations through the present, making them browsable and keyword searchable.
Ngrams data set
Building on this incredible archive, the GDELT Project and I have transformed it into a research dataset of radio news ngrams spanning 26 billion English language words across portions of 550 stations, from 2016 to the present.
You can keyword search all 3 million shows, but for researchers interested in diving into the deeper linguistic patterns of radio news, the new ngrams dataset includes 1-5grams at 10 minute resolution covering all four years and updated every 30 minutes. For those less familiar with the concept of “ngrams,” they are word frequency tables: the transcript of each broadcast is broken into words, and for each 10 minute block of airtime on each station, a list is compiled of all of the words spoken and how many times each was mentioned.
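To make the idea of a word frequency table concrete, here is a minimal sketch in Python. It is not the code used to build the dataset. The input shape (tuples of station, seconds since the start of the day, and word), the station name, and the function names are all assumptions made for illustration, and in this sketch multi-word ngrams never cross a 10 minute block boundary.

```python
from collections import Counter, defaultdict

BLOCK_SECONDS = 600  # one block = 10 minutes of airtime


def ngrams(words, n):
    """Return every run of n consecutive words as a space-joined string."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def count_ngrams(timed_words, n=1):
    """Build one frequency table per (station, 10 minute block).

    timed_words is a hypothetical input: an iterable of
    (station, seconds_since_start, word) tuples in spoken order.
    """
    # Collect the words of each station/block pair, keeping spoken order.
    blocks = defaultdict(list)
    for station, seconds, word in timed_words:
        block = int(seconds // BLOCK_SECONDS)
        blocks[(station, block)].append(word.lower())

    # Count the ngrams inside each block separately.
    return {key: Counter(ngrams(words, n)) for key, words in blocks.items()}


# Made-up sample: three words in the first block, two in the second.
sample = [
    ("STATION_A", 5, "the"),
    ("STATION_A", 6, "news"),
    ("STATION_A", 7, "the"),
    ("STATION_A", 610, "weather"),
    ("STATION_A", 611, "news"),
]
unigrams = count_ngrams(sample, n=1)
bigrams = count_ngrams(sample, n=2)
```

In this sample, `unigrams[("STATION_A", 0)]` records “the” twice and “news” once, while the second block gets its own separate table.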
Some initial research using these ngrams
How can researchers use this kind of data to gain new insights into radio news?
The graph below looks at pronoun usage on BBC Radio 4 FM, comparing the percentage of words spoken each day that were either (“we”, “us”, “our”, “ours”, “ourselves”) or (“i”, “me”, “i’m”). “Me” words are used more than twice as often as “we” words, but look closely at February of 2020: as the pandemic began sweeping the world, “we” words start increasing as governments began adopting language to emphasize togetherness.
“We” (orange) vs. “Me” (blue) words on BBC Radio 4 FM, showing increase of “we” words beginning in February 2020 as Covid-19 progresses
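A comparison like this one boils down to simple arithmetic on daily word counts. The sketch below shows one way it could be computed. It is an illustration, not the code behind the graph: the input shape (a dictionary of per-day word counts), the function name, and the sample numbers are all assumptions, and both straight and curly apostrophes are accepted for “i’m” because it is not known which form the transcripts use.

```python
WE_WORDS = {"we", "us", "our", "ours", "ourselves"}
ME_WORDS = {"i", "me", "i'm", "i’m"}  # both apostrophe styles, as a precaution


def pronoun_shares(daily_counts):
    """Map each day to (we_percent, me_percent) of all words spoken.

    daily_counts is a hypothetical input: {day: {word: count}}.
    """
    shares = {}
    for day, counts in daily_counts.items():
        total = sum(counts.values())
        if total == 0:
            continue  # nothing was transcribed for this day
        we = sum(counts.get(word, 0) for word in WE_WORDS)
        me = sum(counts.get(word, 0) for word in ME_WORDS)
        shares[day] = (100.0 * we / total, 100.0 * me / total)
    return shares


# Made-up counts for a single day: 200 words in total.
example = {
    "2020-02-01": {"we": 3, "our": 1, "i": 6, "me": 2, "news": 188},
}
shares = pronoun_shares(example)
```

With these made-up counts, 4 of 200 words are “we” words (2%) and 8 of 200 are “me” words (4%), which matches the “more than twice as often” pattern described above only by construction.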
TV vs. Radio
Combined with the television news ngrams that I previously created, it is possible to compare how topics are being covered across television and radio.
The graph below compares the percentage of spoken words that mentioned Covid-19 since the start of this year across BBC News London (television) versus radio programming on BBC World Service (international focus) and BBC Radio 4 FM (domestic focus).
All three show double surges at the start of the year as the pandemic swept across the world, a peak in early April and then a decrease since. Yet BBC Radio 4 appears to have mentioned the pandemic far less than the internationally-focused BBC World Service, though the two are now roughly equal even as the pandemic has continued to spread. Overall, television news has emphasized Covid-19 more than radio.
Covid-19 mentions on Television vs. Radio. The chart compares BBC News London (TV) in blue, versus BBC World Service (Radio) in orange and BBC Radio 4 FM (Radio) in grey.
For now, you can download the entire dataset to explore on your own computer but there will also be an interactive visualization and analysis interface available sometime in mid-Spring.
It is important to remember that these transcripts are generated through computer speech recognition, so they are imperfect and do not properly recognize all words or names, especially rare or novel terms like “Covid-19.” Experimentation may be required to yield the best results.
Researchers can ask questions that for the first time simultaneously look across audio, video, imagery and text to understand how ideas, narratives, beliefs and emotions diffuse across mediums and through the global news ecosystem. Helping to seed the future of such at-scale research, the Internet Archive and GDELT are collaborating with a growing number of media archives and researchers through the newly formed Media Data Research Consortium to better understand how critical public health messaging is meeting the challenges of our current global pandemic.
About Kalev Leetaru
For more than 25 years, GDELT’s creator, Dr. Kalev H. Leetaru, has been studying the web and building systems to interact with and understand the way it is reshaping our global society. One of Foreign Policy Magazine’s Top 100 Global Thinkers of 2013, his work has been featured in the presses of over 100 nations and fundamentally changed how we think about information at scale and how the “big data” revolution is changing our ability to understand our global collective consciousness.
On January 1st, 2021, many books, movies and other media from 1925 will enter the public domain in the United States. Some of them are quite famous — jump ahead to see lists of those well known books and movies that you can enjoy on the Internet Archive — or take the scenic route with me.
What does this all mean? Essentially, many items created in 1925 in the US that are still under copyright will become free and open for people to use in any way they see fit in the new year. But check out Duke Law’s Center for the Study of the Public Domain article for a more in-depth explanation.
As part of this yearly ritual, I explore our collections to unearth these newly freed items, and I invariably run across a few things that hit a nerve. This year, it started with this intertitle in “Isn’t Life Terrible?” Less than 20 seconds into this 1925 film, and suddenly I’m dumped back into 2020.
Rude, right? I don’t even have a front yard to enjoy during shelter in place.
Gondolas still glide under the Bridge of Sighs, and the Tower of Pisa is still leaning, but the 1925 version of the Colosseum certainly lacks today’s fake gladiator photo ops.
Looking at the past with the eyes of today
Every toe dipped into the past has the potential to surprise or shock. The story of a pantry shelf, an outline history of grocery specialties is only mildly interesting on the surface. Essentially, it’s a sales pitch to food manufacturers encouraging them to advertise in a set of women’s magazines. The book contains short case histories of successful food brands like Maxwell House Coffee, Campbell Soup, Coca Cola, etc. (all of whom advertise with them, naturally).
The book gives you a glimpse of why people were so enthusiastic about mass-produced, packaged foods. Unsanitary conditions, bugs in your sugar, milk going bad overnight: things modern shoppers never think about.
It puts this glowing praise of Kraft Cheese into perspective: “…a pasteurized product, blended to obtain a uniformity of quality and flavor, a thing greatly lacking in ordinary types of cheese.” (page 149)
That’s pretty entertaining if you’re a cheese lover. I think most people would agree that Kraft cheese is no longer on the cutting edge.
But keep poking around and you find a much deeper cultural divergence. While The story of a pantry shelf is extolling the virtues of the home economics training available at Cornell, you stumble across this horrifying sentence (page 12).
I was not expecting to read about orphaned babies being used as “learning aids” while flipping through stories about Jell-O. Intellectually, I know that attitudes towards children have changed over the years — the Fair Labor Standards Act, which set federal standards for child labor, wasn’t even passed until 1938. But this casual aside tossed in amongst the marketing hype still packs an emotional punch. It’s important to remember how far we have come.
Even writing that was forward-thinking for the time, like the booklet Homo-sexual life, is terribly backward according to today’s standards. It’s from the Little Blue Book series — we have many that were published in 1925, and the publisher was quite prolific for many years. The series provided working class people with inexpensive access to all kinds of topics including philosophy, sexuality, science, religion, law, and government. Post WWII, they published criticism of J. Edgar Hoover and the founder was subsequently targeted by the FBI for tax evasion. But in 1925, they were going strong and one of their prolific writers was Clarence Darrow.
Controversies of the Age
Darrow was writing about prohibition for the Little Blue Book series in 1925, but that is also the year he defended John T. Scopes for teaching evolution in his Tennessee classroom. The Scopes Trial generated a huge amount of publicity, pitting religion against science, and even giving rise to popular songs like these two 78rpm recordings from 1925.
Like the Scopes trial, prohibition had its passionate adherents and detractors. This was the “Roaring 20s” — the year The Great Gatsby was published — with speakeasies and flappers and iconic cocktails. And yet the pro-prohibition silent film Episodes in the Life of a Gin Bottle follows a bottle around as it lures people into a state of dissolution.
And the most unchanging part of this particular season, of course — children still anticipate the arrival of Santa Claus with questions, wishes and schemes.
The silent film Santa Claus features two children who want to know where Saint Nick lives and how he spends his time. We follow him to the North Pole (Alaska in disguise) to see Santa’s workshop, snow castle, reindeer, and friends and neighbors. Jack Frost, introduced around 14:20, appears to be wearing the prototype for Ralphie’s bunny suit in “A Christmas Story” (but with a magic wand). Stick around for the sleigh crash at 20:45, and right around 22:20 Santa wipes out on the ice.
And just in case you’re still doing your holiday shopping, I feel like I should pass on a recommendation from this ad in a 1925 The Billboard magazine: Armadillo Baskets make beautiful Christmas gifts. And you can still buy vintage versions online – trust me, I looked. You’re welcome.
Discogs has cracked the nut, struck the right balance, and is therefore an absolute Internet treasure. Thank you.
If you don’t know them, Discogs is a central resource for the LP/78/CD music communities, and as Wikipedia said “As of 28 August 2019 Discogs contained over 11.6 million releases, by over 6 million artists, across over 1.3 million labels, contributed from over 456,000 contributor user accounts—with these figures constantly growing…”
When I met the founder, Kevin Lewandowski, a year ago, he said the Portland-based company supports 80 employees and is growing. They make money by being a marketplace for buyers and sellers of discs. An LP dealer I met in Oklahoma sells most of his discs through Discogs as well as at record fairs.
The data about records is spectacularly clean. Compare it to eBay, where the data is scattershot, and you have something quite different and reusable. It combines the best parts of MusicBrainz, CDDB, and eBay: users can catalog their collections and buy and sell records. Kevin said that by starting with the community function, the data quality was really good from the start, and adding the marketplace later led to its success.
But there is something else Discogs does that sets it apart from many other commercial websites, and this makes All The Difference:
The Great 78 Project has leveraged this bulk database to help find the date of release for 78s. Just yesterday, I downloaded the new dataset and added it to our 78rpm date database. In the last year, tens of thousands more 78s were added to Discogs, and we found 1,500 more dates for our existing 78s. Thank you!
The Internet Archive Lost Vinyl Project leverages the APIs by looking up records we will be digitizing to find track listings.
A donor to our CD project used the public price information to appraise the CDs he donated for a tax write-off.
We want to add links back from Discogs to the Internet Archive and they have not allowed that yet (please please), but there is always something more to do.
I hope other sites, even commercial ones, would allow bulk access to their data (an API is not enough).
You could listen to multiple people recite the first 50 digits of pi in various styles, including to the tune of the Battle Hymn of the Republic (my personal favorite), in the voice of Bullwinkle, as an infomercial, in Latin, while laughing, in Morse Code, and while eating actual pie.
We have been digitizing about 8,000 78rpm record sides each month and now have 122,000 of them done. These have been posted on the net and over a million people have explored them. We have been digitizing, typing the information on the label, and linking to other information like discographies, databases, reviews and the like.
Volunteers, users, and internal QA checkers have been pointing out typos, so we decided to go back over a couple of months’ metadata and found problems. We then contracted with professional proofreaders, and they found even more (2% of the records at this point had something to point out; some are matters of opinion or aesthetics, some lead to corrections).
We are going to pay the professional proofreaders to correct the 5 most important fields for all 122,000 records, but can use more help. We are pointing these out here in hopes of interesting volunteer proofreaders and to share our experience in continually improving our collections.
Here are some of the issues with the primary performer field that we have now corrected in the June 2019 transfers, which we hope to upload in the next couple of weeks (before | after):
Jose Melis And His Latin American Ensemble | Jose Melis And His-Latin American Ensemble
Columbia-Orchestra | Columbia-Orchester
S. Formichi and T. Chelotti | S. Formichi e T. Chelotti
Dennis Daye and The Rhythmaires | Dennis Day and The Rhythmaires
Harry James and His Orchestra | Harry James and His Orch.
Charles Hart & Elliot Shaw | Charles Hart & Elliott Shaw
Peerless Quartet | Peerless Quartette
Some of the title corrections:
O Vino Fa ‘Papla (Wine Makes You Talk) | ‘O Vino Fa ‘Papla (Wine Makes You Talk)
Masked Ball Salaction | Masked Ball Selection
Moonlight and Roses (Brings Mem’ries Of You) | Moonlight and Roses (Bring Mem’ries Of You)
Que Bonita Eres Tu (You Are Beutiful) | Que Bonita Eres Tu (You Are Beautiful)
Buttered Roll | “Buttered Roll”
Paradise | “Paradise”
Got a Right to Cry | “Got a Right to Cry”
Blue Moods | “Blue Moods”
Auf Wiederseh’n Sweerheart | Auf Wiederseh’n Sweetheart
George M. Cohan Medley – Part 1 | George M. Cohan Medley – Part 2
Dewildered | Bewildered
Lolita (Seranata) | Lolita (Serenata)
Got a Right to Cry | “Got a Right to Cry” Joe Liggins and His Honeydrippers
Blue Moods | “Blue Moods”
Body and Soul | “Body and Soul”
Mais Qui Est-Ce | Mais Qui Est-Ce?
Wail Till the Sun Shines Nellie Blues | Wait Till the Sun Shines Nellie Blues
Que Te Pasa Joe (What Happens Joe) | Que Te Pasa Jose (What Happens Joe)
SAMSON AND DELILAH Softly Awakens My Heart | SAMSON AND DELILAH Softly Awakes My Heart
I’m Gonna COO, COO, COO | (I’m Gonna) COO, COO, COO
Good news: we have funding to preserve at least another 250,000 sides of 78rpm records, and we are looking for donations to digitize and physically preserve. We try to do a good job of digitizing and hosting the recordings and then thousands of people listen, learn, and enjoy these fabulous recordings.
If you have 78s (or other recordings) that you would like to find a good home for, please think of us. We are a non-profit, and your donations will be tax-deductible, digitized for all to hear, and physically preserved. If you are interested in donating recordings of any type or appropriate books, please start with this form and we will contact you immediately.
We are looking for anything we do not already have. (We are finding 80% duplication rates sometimes, so we are trying to find larger or more niche collections.) We will physically preserve all genres, but our current funding has directed us to prioritize digitization of non-classical and non-opera.
We can pay for packing and shipping, and are getting better at the logistics for collections of a few thousand and up. These are fragile objects and we are having good luck avoiding damage.
The reason to highlight the donors is twofold: one is to celebrate the donor and their story, but the other is to help contextualize these recordings for different generations. These stories help users find meaning in the materials and find things they want to listen to. This way we can lead new listeners to love this music as the original collectors have.
Working together we can broaden this collection to works from around the world and different cultural groups in each country.
If you are a private individual or an institution and have records to contribute, even if they are not 78s, please start with this simple form, or email info@archive.org, or call +1-415-561-6767 and we will contact you immediately. Thank you.