The COVID-19 pandemic has been life-changing for people around the globe. As efforts to slow the progress of the virus unfolded in early 2020, librarians, archivists and others with interest in preserving cultural heritage began considering ways to document the personal, societal, and systemic impacts of the global pandemic. These collections included preserving physical, digital and web-based information and artifacts for posterity and future research use.
In response, the Internet Archive’s Archive-It service launched a COVID-19 Web Archiving Special Campaign starting in April 2020 to allow existing Archive-It partners to increase their web archiving capacity or new partners to join to collect COVID-19 related content. In all, more than 100 organizations took advantage of the COVID-19 Web Archiving Special Campaign and more than 200 Archive-It partner organizations built more than 300 new collections specifically about the global pandemic and its effects on their regions, institutions, and local communities. From colleges, universities, and governments documenting their own responses to community-driven initiatives like Sonoma County Library’s Sonoma Responds Community Memory Archive, a variety of information has been preserved and made available. These collections are critical historical records in and of themselves, and when taken in aggregate will allow researchers a comprehensive view into life during the pandemic.
We have been exploring with partners ways to provide unified access to hundreds of individual COVID-related web collections created by Archive-It users. When the Institute of Museum and Library Services launched its American Rescue Plan grant program, part of the broader American Rescue Plan, a $1.9 trillion stimulus package signed into law on March 11, 2021, we applied and were awarded funding to build a COVID-19 Web Archive access portal: a dedicated search and discovery platform for COVID-19 web collections from hundreds of institutions. The COVID-19 Web Archive will allow for browsing and full text search across diverse institutional collections and enable other access methods, including making datasets and code notebooks available for data analysis of the aggregate collections by scholars. This work will support scholars, public health officials, and the general public in fully understanding the scope and magnitude of our historical moment now and into the future. The COVID-19 Web Archive is unique in that it will provide a unified discovery mechanism to hundreds of aggregated web archive collections built by a diverse group of over 200 libraries from over 40 US states and several other nations, from large research libraries to small public libraries to government agencies. If you would like your Archive-It collection or a portion of it included in the COVID-19 Web Archive, please fill out this interest form by Friday, April 29, 2022. If you are an institution in the United States that has COVID-related web archives collected outside of Archive-It or Internet Archive services that you are interested in having included in the COVID-19 Web Archive, please contact email@example.com.
Radio remains one of the most-consumed forms of traditional media today, with 89% of Americans listening to radio at least once a week as of 2018, a number that is actually increasing during the pandemic. News is the most popular radio format and 60% of Americans trust radio news to “deliver timely information about the current COVID-19 outbreak.”
Local talk radio is home to a diverse assortment of personality-driven programming that offers unique insights into the concerns and interests of citizens across the nation. Yet radio has remained stubbornly inaccessible to scholars due to the technical challenges of monitoring and transcribing broadcast speech at scale.
Debuting this past July, the Internet Archive’s Radio Archive uses automatic speech recognition technology to transcribe this vast collection of daily news and talk radio programming into searchable text dating back to 2016, and continues to archive and transcribe a selection of stations through the present, making them browsable and keyword searchable.
Ngrams data set
Building on this incredible archive, the GDELT Project and I have transformed it into a research dataset of radio news ngrams spanning 26 billion English language words across portions of 550 stations, from 2016 to the present.
You can keyword search all 3 million shows, but for researchers interested in diving into the deeper linguistic patterns of radio news, the new ngrams dataset includes 1- to 5-grams at 10-minute resolution, covering all four years and updated every 30 minutes. For those less familiar with the concept of “ngrams,” they are word frequency tables: the transcript of each broadcast is broken into words, and for each 10-minute block of airtime on each station a list is compiled of every word (or sequence of up to five words) spoken in those 10 minutes and how many times it was mentioned.
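To make the idea concrete, here is a minimal Python sketch of how such a frequency table could be built for one station. It is an illustration only, not the pipeline behind the dataset: the `(offset_seconds, word)` input shape, the function names, and the choice not to let an ngram span two blocks are all assumptions made for the example.

```python
from collections import Counter

BLOCK_SECONDS = 10 * 60  # 10-minute blocks, matching the resolution described above


def ngrams(words, n):
    """Return every run of n consecutive words as a space-joined string."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def block_ngram_counts(timed_words, n=1):
    """Map each 10-minute block index to a Counter of the n-grams spoken in it.

    timed_words: (offset_seconds, word) pairs for ONE station, in spoken order.
    This input shape is hypothetical. N-grams are counted within a block only,
    which is a simplifying assumption.
    """
    blocks = {}
    for offset, word in timed_words:
        blocks.setdefault(int(offset // BLOCK_SECONDS), []).append(word.lower())
    return {block: Counter(ngrams(words, n)) for block, words in blocks.items()}


# Made-up transcript fragment: seven words in the first block, three in the second.
transcript = [
    (5, "The"), (6, "pandemic"), (7, "is"), (8, "spreading"),
    (30, "the"), (31, "pandemic"), (32, "continues"),
    (605, "The"), (606, "weather"), (607, "today"),
]

unigrams = block_ngram_counts(transcript, n=1)
bigrams = block_ngram_counts(transcript, n=2)
print(unigrams[0]["pandemic"])     # 2
print(bigrams[0]["the pandemic"])  # 2
print(unigrams[1]["weather"])      # 1
```

Each block's counter is one row group of the frequency table: the block index stands in for the 10-minute slice of airtime, and the counter holds each ngram with its number of mentions.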
Some initial research using these ngrams
How can researchers use this kind of data to understand new insights into radio news?
The graph below looks at pronoun usage on BBC Radio 4 FM, comparing the percentage of words spoken each day that were “we” words (“we”, “us”, “our”, “ours”, “ourselves”) with the percentage that were “me” words (“i”, “me”, “i’m”). “Me” words are used more than twice as often as “we” words, but look closely at February of 2020: as the pandemic began sweeping the world, “we” words start increasing as governments began adopting language to emphasize togetherness.
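A comparison like this one comes down to dividing each word group's daily count by the total number of words spoken that day. The sketch below shows that arithmetic in Python. The `{date: {word: count}}` input shape is hypothetical, the sample numbers are invented for illustration and are not taken from the dataset, and the apostrophe in "i'm" may need adjusting to match the form used in the transcripts.

```python
WE_WORDS = {"we", "us", "our", "ours", "ourselves"}
ME_WORDS = {"i", "me", "i'm"}  # apostrophe form is an assumption


def pronoun_shares(daily_counts):
    """Return {date: (we_percent, me_percent)} from daily unigram counts.

    daily_counts: {date_string: {word: count}} for one station (hypothetical shape).
    """
    shares = {}
    for day, counts in daily_counts.items():
        total = sum(counts.values())
        if total == 0:
            continue  # no airtime transcribed that day
        we = sum(c for w, c in counts.items() if w in WE_WORDS)
        me = sum(c for w, c in counts.items() if w in ME_WORDS)
        shares[day] = (100.0 * we / total, 100.0 * me / total)
    return shares


# Invented sample counts, 1,000 words per day, purely for illustration.
sample = {
    "2020-01-15": {"i": 30, "me": 10, "we": 12, "our": 3, "news": 945},
    "2020-03-15": {"i": 28, "i'm": 4, "we": 20, "us": 5, "news": 943},
}

for day, (we_pct, me_pct) in sorted(pronoun_shares(sample).items()):
    print(f"{day}  we: {we_pct:.1f}%  me: {me_pct:.1f}%")
```

Plotting the two percentages per day gives a chart of the kind described above. The same division works for any word list, such as spelling variants of a term the speech recognition may transcribe inconsistently.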
TV vs. Radio
Combined with the television news ngrams that I previously created, it is possible to compare how topics are being covered across television and radio.
The graph below compares the percentage of spoken words that mentioned Covid-19 since the start of this year across BBC News London (television) versus radio programming on BBC World Service (international focus) and BBC Radio 4 FM (domestic focus).
All three show double surges at the start of the year as the pandemic swept across the world, a peak in early April and then a decrease since. Yet BBC Radio 4 appears to have mentioned the pandemic far less than the internationally-focused BBC World Service, though the two are now roughly equal even as the pandemic has continued to spread. Overall, television news has emphasized Covid-19 more than radio.
For now, you can download the entire dataset to explore on your own computer but there will also be an interactive visualization and analysis interface available sometime in mid-Spring.
It is important to remember that these transcripts are generated through computer speech recognition, so are imperfect transcriptions that do not properly recognize all words or names, especially rare or novel terms like “Covid-19,” so experimentation may be required to yield the best results.
Researchers can ask questions that for the first time simultaneously look across audio, video, imagery and text to understand how ideas, narratives, beliefs and emotions diffuse across mediums and through the global news ecosystem. Helping to seed the future of such at-scale research, the Internet Archive and GDELT are collaborating with a growing number of media archives and researchers through the newly formed Media Data Research Consortium to better understand how critical public health messaging is meeting the challenges of our current global pandemic.
About Kalev Leetaru
For more than 25 years, GDELT’s creator, Dr. Kalev H. Leetaru, has been studying the web and building systems to interact with and understand the way it is reshaping our global society. One of Foreign Policy Magazine’s Top 100 Global Thinkers of 2013, his work has been featured in the presses of over 100 nations and fundamentally changed how we think about information at scale and how the “big data” revolution is changing our ability to understand our global collective consciousness.
“That’s how I think of it now: listening as intimacy. My shoulders dropped. The muscles in my neck and face relaxed. I breathed more deeply.”
—Donald Antrim, “How Music Can Bring Relief During These Anxious Times,” The New Yorker
Every Monday at 9:55 a.m., the concerts begin. Lap steel guitarists. Feminist indie folk bands. Improvisational cellists. For the Internet Archive staff, spread across many continents, these ten-minute concerts that begin and end our work week create an aural bubble where listening together feels intimate, uplifting. For us this music has somehow become, yes, essential. For the artists, zooming in from makeshift home stages, the chance to perform live for our staff of 100+ creates a connection with an audience that has been severed during the pandemic. “It was so nourishing to be supported, not only emotionally, but also financially at a time when musicians are being hit incredibly hard,” said singer-songwriter, Annie Hart. “It made my art feel valued and appreciated and helped me continue to make more.”
The idea to create this impromptu concert series originated with Alexis Rossi, who heads the Internet Archive’s Collections team. Five minutes before our Monday morning and Friday lunch staff Zoom meetings, Alexis and Web Archiving Program Manager Peggy Lee act as virtual stage managers, getting performers dialed in, audio levels tweaked. The Internet Archive pays performers a small fee and staffers “tip” the artists through PayPal or Venmo. “I have several friends who are full time performers, and shelter in place completely destroyed their ability to work and make a living,” explains Rossi. “So I jumped at the chance to help book acts because I knew that even a little bit of income would help.”
“The music series has been a way we bring people into our house, the place where we come together as a community, and have this shared experience together. I love those ten minutes.”
—Peggy Lee, Co-producer, Essential Music
What started as a fun idea has solidified into ESSENTIAL MUSIC: Concerts From Home, a program that we believe could be replicated anywhere, offering organizations many intangible benefits. “The Internet Archive’s live performances have been such a bright spot in my week,” says engineer, Jason Buckner. “They bring such a positive energy to our meetings and you can see it in the faces of everyone watching on Zoom.” Just ten minutes of music seems to have a magical effect on staff: inspiration. “Seeing other creatives excel at what they do helps bind me closer to my work,” agrees Isa Herico Velasco, Internet Archive engineer. “It affirms what we are actually stewarding: the preservation and celebration of humanity.”
Here are some of the Essential Music concerts, recorded and shared by permission of the artists:
Ainsley Wagoner / Silverware (6/16/20)
Ainsley Wagoner creates ethereal music as the artist, Silverware. Ainsley is also a product designer who co-created the super cool OAM project — an experiment in mixing sound, colours, and geometries on the web. “Performing is one of my favorite parts about being a musician,” says Wagoner. “Even though I have recorded music online, nothing beats playing a song live. For now, I don’t have an in-person performance outlet, so it felt really good to do that virtually with the Internet Archive.”
Alex Spoto (6/22/20)
Alex is a multi-instrumentalist who has performed and recorded with Last Good Tooth, Benjamin Booker, and many others. He is a longtime contributor for Aquarium Drunkard and the co-author of Fowre 2: Gone Country, a book of interviews with contemporary Country musicians. He got his start playing classical violin, then ‘old-time’ folk music, and then improvised “free” music. He is currently musically obsessed with cajun fiddling, old cumbia, the jazzier side of Merle Haggard, the polyrhythmic foundation of Saharan folk music, the sly and sensitive folkways of Michael Hurley, and the Internet Archive’s 78 project!
Vickie Vértiz (6/26/20)
The oldest child of an immigrant Mexican family, Vickie Vértiz was born and raised in Bell Gardens, a city in southeast Los Angeles County. Her writing is featured in the New York Times magazine, the San Francisco Chronicle, Huizache, Nepantla, the Los Angeles Review of Books, KCET Departures, and the anthologies: Open the Door (from McSweeney’s and the Poetry Foundation), and The Coiled Serpent (from Tia Chucha Press), among many others.
Vértiz’s first full collection of poetry, Palm Frond with Its Throat Cut, published in the Camino del Sol Series by The University of Arizona Press, won a 2018 PEN America literary prize. Vickie is a proud member of Colectivo Miresa, a feminist cooperative speaker’s bureau. Her first poetry collection, Swallows, is available from Finishing Line Press. She teaches at the University of California, Santa Barbara.
Jess Sylvester / Marinero (6/29/20)
Jess Sylvester is a Bay Area Chicanx songwriter/composer, also known as Marinero. Marinero is known for his dreamy, cholo-fi signature style of taking samples of 60s Latin music and adding spacey pop flavors. His newest album, Trópico de Cáncer, is rooted in bossa nova and Tropicália sounds. Watch his profile in Content Magazine.
Sylvester says: “I was actually touched by the introduction given to me right before playing a song for their team. It was a shock to hear the level of research they had done referencing my life and past projects, and in retrospect made sense considering it was the Internet Archive just living up to their name. Thank you for listening to my music and making me feel heard and supported.”
Ivan Forde (7/6/20)
Ivan Forde is a Guyanese-born, Harlem artist. Forde (b. 1990) works across sound performance, printmaking, digital animation and installation. Using a wide variety of photo-based and print-making processes (and more recently music and performance), Ivan Forde retells stories from epic poetry casting himself as every character. His non-linear versions of these time-worn tales open the possibility of new archetypes and alternative endings. By crafting his own unique mythology and inserting himself in historical narratives, he connects the personal to the universal and offers a transformative view of prevailing narratives in the broader culture.
Zachary James Watkins (7/10/20)
Zachary James Watkins is an Oakland-based sound artist. He was one-half of the defunct duo Black Spirituals and is now part of the current duo Watkins/Peacock. Zachary has received commissions from Cornish, The Microscores Project, The Beam Foundation, Somnubutone, the sfSoundGroup and the Seattle Chamber Players. He has shared bills with Earth, the Sun Ra Arkestra, and designed the sound and composed music for the plays “I Have Loved Strangers.” His 2006 composition Suite for String Quartet was awarded the Paul Merritt Henry Prize for Composition and has subsequently been performed at the Labs 25th Anniversary Celebration, the Labor Sonor Series at Kule in Berlin, Germany, and in Seattle, WA, as part of the 2nd Annual Town Hall New Music Marathon. Zachary has been an artist in residence at the Espy Foundation, Djerassi and the Headlands Center for The Arts.
Bill Walker (7/13/20)
A gifted composer and instrumentalist, Bill Walker’s music has been described as cinematic, adventurous, and innovative. His solo performances create a rich tapestry of layered sounds, blending electric and acoustic guitars, lap steel guitars, and percussive guit-boxing with state of the art live looping techniques and sound design.
This Santa Cruz, CA-based musician was featured in Guitar Player Magazine for his collaboration with Erdem Helvacioglu on the critically acclaimed CD, “Fields and Fences.” To hear more, tune in to his YouTube channel.
Jennifer Cheng (7/17/20)
Jennifer S. Cheng’s work includes poetry, lyric essay, and image-text forms exploring immigrant home-building, shadow poetics, and the feminine monstrous. She is the author of MOON: LETTERS, MAPS, POEMS, selected by Bhanu Kapil for the Tarpaulin Sky Award, and HOUSE A, selected by Claudia Rankine for the Omnidawn Poetry Prize. She is a 2019 NEA Literature Fellow and graduated from Brown University, the University of Iowa, and San Francisco State University.
Jess Sah Bi (7/17/20)
Jess Sah Bi, with his musical partner, Peter One, is one of the most popular musical acts in West Africa, performing to stadium-sized audiences at home in the Ivory Coast and throughout Benin, Burkina Faso, and Togo. Their album, Our Garden Needs Its Flowers, originally thrust them into stardom in the late ’80s. The album was inspired both by classic American country and folk music and the traditions of Ivorian village songs, but it focused thematically on the political turmoil of the region. Songs are sung in French, Gouro, and English.
Theresa Wong (7/24/20)
Theresa Wong is a Berkeley-based composer, cellist and vocalist active at the intersection of music, experimentation, improvisation and the synergy of multiple disciplines. Bridging sound, movement, theater and visual art, her primary interest lies in finding the potential for transformation for both the artist and receiver alike. Her works include The Unlearning (Tzadik), 21 songs for violin, cello and 2 voices inspired by Goya’s Disasters of War etchings, and O Sleep, an improvised opera for an 8 piece ensemble exploring the conundrum of sleep and dream life. In 2018, Theresa founded fo’c’sle, a record label dedicated to adventurous music from the Bay Area and beyond. Theresa has shared her work internationally at venues including Fondation Cartier in Paris, Yerba Buena Center for the Arts in San Francisco, Cafe Oto in London, Festival de Arte y Ópera Contemporánea in Morelia, Mexico and The Stone and Roulette in New York City.
After her performance, the artist wrote, “I could sense the spirit reaching out beyond glass and pixels, sparking back to life that basic need of connecting with others.”
I wanted to share my thoughts in response to the lawsuit against the Internet Archive filed on June 1 by the publishers Hachette, HarperCollins, Wiley, and Penguin Random House.
I founded the Internet Archive, a non-profit library, 24 years ago as we brought the world digital. As a library we collect and preserve books, music, video and webpages to make a great Internet library.
We have had the honor to partner with over 1,000 different libraries, such as the Library of Congress and the Boston Public Library, to accomplish this by scanning books and collecting webpages and more. In short, the Internet Archive does what libraries have always done: we buy, collect, preserve, and share our common culture.
But remember March of this year—we went home on a Friday and were told our schools were not reopening on Monday. We got cries for help from teachers and librarians who needed to teach without physical access to the books they had purchased.
Over 130 libraries endorsed lending books from our collections, and we used Controlled Digital Lending technology to do it in a controlled, respectful way. We lent books that we own—at the Internet Archive and also the other endorsing libraries. These books were purchased and we knew they were not circulating physically. They were all locked up. In total, 650 million books were locked up just in public libraries alone. Because of that, we felt we could, and should, and needed to make the digitized versions of those books available to students in a controlled way to help during a global emergency. As the emergency receded, we knew libraries could return to loaning physical books and the books would be withdrawn from digital circulation. It was a lending system that we could scale up immediately and then shut back down again by June 30th.
And then, on June 1st, we were sued by four publishers and they demanded we stop lending digitized books in general and then they also demanded we permanently destroy millions of digital books. Even though the temporary National Emergency Library was closed before June 30th, the planned end date, and we are back to traditional controlled digital lending, the publishers have not backed down.
Schools and libraries are now preparing for a “Digital Fall Semester” for students all over the world, and the publishers are still suing.
Please remember that what libraries do is Buy, Preserve, and Lend books.
Controlled Digital Lending is a respectful and balanced way to bring our print collections to digital learners. A physical book, once digital, is available to only one reader at a time. Going on for nine years and now practiced by hundreds of libraries, Controlled Digital Lending is a longstanding, widespread library practice.
What is at stake with this suit may sound insignificant, that it is just Controlled Digital Lending, but please remember: this is fundamental to what libraries do: buy, preserve, and lend.
With this suit, the publishers are saying that in the digital world, we cannot buy books anymore, we can only license and on their terms; we can only preserve in ways for which they have granted explicit permission, and for only as long as they grant permission; and we cannot lend what we have paid for because we do not own it. This is not a rule of law, this is the rule by license. This does not make sense.
We say that libraries have the right to buy books, preserve them, and lend them even in the digital world. This is particularly important with the books that we own physically, because learners now need them digitally.
This lawsuit is already having a chilling impact on the Digital Fall Semester we’re about to embark on. The stakes are high for so many students who will be forced to learn at home via the Internet or not learn at all.
Librarians, publishers, authors—all of us—should be working together during this pandemic to help teachers, parents and especially the students.
I call on the executives at Hachette, HarperCollins, Wiley, and Penguin Random House to come together with us to help solve the pressing challenges to access to knowledge during this pandemic.
Put simply, the campaign asks Congress to clarify libraries’ right to buy and lend books today as they have done for centuries.
Today, amidst a skyrocketing demand for digital books, many books are not available on digital shelves at any price because there are no commercially available digital versions of older titles. This gap limits how libraries can serve their patrons.
“Many libraries are currently closed, and sadly it looks like they may be for months to come,” said John Bergmayer, Legal Director of Public Knowledge. “We need to make sure that libraries can continue serving their communities, not just during the pandemic, but after, as tightened budgets put the squeeze on library services and limit the scope of their collections.”
Filling the Gap with Controlled Digital Lending
Libraries have begun making and lending out digital versions of physical works in their collections based on current legal protections—a practice called Controlled Digital Lending, or CDL. As Public Knowledge’s Let Libraries Fight Back campaign explains:
CDL is a powerful tool to bridge the gap between print and electronic resources. Under CDL, a digital copy of a physical book can only be read and used by one person at a time. Only one person can “borrow” an electronic book at once, and while it is being lent electronically, the library takes the physical book out of circulation.
CDL allows libraries to reach their patrons even when those patrons can’t make it to the physical library — a problem that’s been more prevalent than ever during the pandemic. Without programs like this, library patrons are prevented from accessing a world of content and information — and low-income, rural, and other marginalized communities are hit the hardest.
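The one-reader-at-a-time rule in the passage above can be expressed as a small bookkeeping model. The Python sketch below is a toy illustration, not the software any library or the Internet Archive uses. The class and method names are hypothetical, and extending the rule to "no more digital loans than physical copies owned" is an assumption for libraries holding several copies of a title.

```python
class ControlledDigitalLending:
    """Toy model: at most one digital loan per physical copy the library owns."""

    def __init__(self):
        self.owned = {}    # title -> number of physical copies owned
        self.on_loan = {}  # title -> set of patrons currently borrowing digitally

    def add_copies(self, title, copies=1):
        self.owned[title] = self.owned.get(title, 0) + copies

    def borrow(self, title, patron):
        """Lend the digital copy only if an owned copy is not already lent out."""
        borrowers = self.on_loan.setdefault(title, set())
        if patron in borrowers or len(borrowers) >= self.owned.get(title, 0):
            return False
        borrowers.add(patron)
        return True

    def give_back(self, title, patron):
        self.on_loan.get(title, set()).discard(patron)

    def physical_copies_available(self, title):
        """Copies still on the shelf: each digital loan takes one out of circulation."""
        return self.owned.get(title, 0) - len(self.on_loan.get(title, set()))


library = ControlledDigitalLending()
library.add_copies("Example Title")                        # the library owns one copy
print(library.borrow("Example Title", "patron_a"))         # True
print(library.borrow("Example Title", "patron_b"))         # False: already lent
print(library.physical_copies_available("Example Title"))  # 0
library.give_back("Example Title", "patron_a")
print(library.borrow("Example Title", "patron_b"))         # True
```

The second borrower is refused until the first returns the book, and the physical copy counts as unavailable for as long as the digital loan is active.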
However, Public Knowledge acknowledges that the challenge extends beyond print materials. “Controlled Digital Lending makes it so that a library’s existing print collection is more useful, and can be accessed remotely,” explained Bergmayer. “But we also need to make sure that libraries can acquire digital-native books and other media under the same terms they have always operated under.”
For six weeks, Internet Archive book scanner, Mandy Weiler, was unable to digitize art history books inside the now shuttered Getty Research Institute. Furloughed and stuck inside her 500-square-foot apartment in Los Angeles, she spent a lot of time staring out the window. She took up bird watching and hung out with her cat. Then, the Internet Archive had an idea: bring scanners back from furlough and hire experts to teach Mandy and other book scanners new skills, including dating 78 rpm records and performing custom audio restoration on these recordings from a century ago. “Before I was doing the whole Doom Scrolling all day long,” Mandy recalls. “When we came back from furlough…I was really glad to get assigned to the 78 project. It has been such a nice distraction to get lost in these old records.”
Across the country, in Philadelphia, music metadata researcher Liz Rosenberg was also in need of work. A specialist in 78 rpm records trained by the legendary George Blood, Liz agreed to lead a new project cataloging 38,000 discs in our 78s collection for which the date of publication is unknown. Her team of a dozen former book scanners starts with the data on the record labels, but that is often just the beginning. Almost every 78 recording was originally assigned a matrix number, usually etched into the shellac itself. It is the only sure way of identifying a performance, but certain record labels such as Victor kept their matrix numbers secret. Sometimes only abbreviated versions of the matrix number are printed on the label itself.
Enter Mandy Weiler, who has a bachelor’s degree in Public History and has worked for libraries for the last decade. Turns out that digging up music metadata is right in her wheelhouse. “It’s a lot of web sleuthing. I’m one of those people who have 100 tabs open all the time on their computer,” she explained. “My partner calls me a story hoarder. I’m glad we are putting it to good use.”
Take for instance this 78 disc, Reigen (La Ronde de L’amour) by Adolf Wohlbruck, publication date unknown. “I started in all the normal places, but I wasn’t getting any hits,” Mandy recounted. “I couldn’t narrow down the dates and then I started finding out more about the performer.”
Mandy’s research uncovered a fascinating story behind this disc. The performer, Adolf Wohlbruck, was better known as Anton Walbrook, the queer, Jewish son of a circus performer who fled Nazi Germany in 1936. Walbrook went on to become a celebrated actor in Hollywood and England.
In La Ronde, the 1950 film for which he recorded this performance, Walbrook plays the master of ceremonies in a cycle of stories about love. The film was nominated for two Academy Awards and was later remade as a 1964 film starring Jane Fonda.
Perhaps Walbrook’s best known films are the 49th Parallel and The Red Shoes, both of which you can still watch in the Internet Archive. All of this rich backstory is now recorded in the Internet Archive Review Section for this 78 rpm record.
“It can take you down some great rabbit holes,” Mandy said. “This is the stuff I’ve been interested in my adult life. Digging in and finding information that is not readily available and sharing it. Universal Access to Knowledge. That’s important to me because it helps so many people.”
Oh, and the publication date of the 78 Reigen? Mandy says it is likely 1951. Can anyone help us further?
The Internet Archive’s Archive-It service is collaborating with the International Internet Preservation Consortium’s (IIPC) Content Development Group (CDG) to archive web-published resources related to the ongoing Novel Coronavirus (Covid-19) outbreak. The IIPC Content Development Group consists of curators and professionals from dozens of libraries and archives from around the world that are preserving and providing access to the archived web. The Internet Archive is a co-founder and longtime member of the IIPC. The project will include both subject-expert curation by IIPC members as well as the inclusion of websites nominated by the public (see the nomination form link below).
Due to the urgency of the outbreak, archiving of nominated web content will commence immediately and continue as needed depending on the course of the outbreak and its containment. Web content from all countries and in any language is in scope. Possible topics to guide nominations and collections:
A special thanks to Alex Thurman of Columbia University and Nicola Bingham of the British Library, the co-chairs of the IIPC CDG, and to other IIPC members participating in the project. Thanks as well to any and all public nominators assisting with identifying and archiving records about this significant global event.