The following guest post from curator and amateur radio enthusiast Kay Savetz is part of our Vanishing Culture series, highlighting the power and importance of preservation in our digital age. Read more essays online or download the full report now.
Amateur Radio has been a hobby for well over 100 years. For as long as there has been an understanding of electricity and radio waves, people have been experimenting with these technologies and advancing the state of the art. As a result, the world has moved from wired telegraphy to tube radios to telephones—fast forward a century—to GPS and high-speed digital communication devices that fit in your pocket.
Advances made by amateur radio experimenters have propelled the work of NASA, satellites, television, the internet, and every communications company in existence today. Time and time again, people fiddling with radios have pushed technology forward around the world.
And yet, the people making these efforts and accomplishing these feats aren’t always the best at documenting and preserving their work for the future. That’s where the Internet Archive comes in.
I’m the curator of the Digital Library of Amateur Radio and Communications. DLARC is a project of the Internet Archive, and my job is to find and preserve this rich history of radio and communications. DLARC collects resources related to amateur radio, satellite communications, television, shortwave radio, pirate radio, experimental communications, and other forms of communication.
In the two years since the project launched, DLARC has preserved thousands of magazines and journals, manuals, product catalogs, radio programs, and conference proceedings. These materials were scattered worldwide, often inaccessible and in obsolete formats. We’ve digitized material that was on paper, cassette tape, reel-to-reel tape, CD-ROMs, and DVDs. We’ve digitized video from 16mm film, VHS, U-Matic, Betacam, and even more obscure video formats.
We’ve built a collection of more than 140,000 items and made them available to the world. Researchers, academics, and hobbyists use the library to learn from the rich history of this 100-year-old hobby.
One reason this preservation is necessary is that the people creating history don’t always realize at the time that they’re creating history. In 1977, the creators of Amateur Radio Newsline, a weekly audio news bulletin, probably didn’t realize that their project would still be going in 2024, 47 years later. For all of their amazing work, had they realized they were documenting history, they might have made more effort to save those recordings: the first 20 years of their work are missing. (DLARC has found some recordings from 1996, and most of them from 2012 onward.)
Sometimes creators do recognize the importance of their effort. For more than six years, Len Winkler hosted Ham Radio & More, a radio show about amateur radio. Winkler recorded every episode on cassette tape and managed to digitize many of the shows himself. However, digitizing hundreds of episodes is tedious, and he wasn’t able to complete the job. With his approval, DLARC stepped in to finish it. All of the episodes are now online: more than 300 of them, including interviews with many notable names in the radio community.
There have been other huge successes: the entire 43-year run of 73 Magazine is digitized and online thanks to its publisher, Wayne Green, who donated the collection to the Internet Archive before his death. Most issues of The W5YI Report, a ham radio newsletter published for 25 years, are online as well.
Attempting to preserve material years, or sometimes decades, after the fact makes systematic preservation nearly impossible. For every success story of content saved and archived, there is a heartbreaking story of loss. When amateur radio enthusiasts die, their media collections are often disposed of by survivors who don’t have any connection to amateur radio. File cabinets and bookcases full of (sometimes irreplaceable) materials are emptied into recycle bins.
Another challenge to preservation and access comes from membership organizations that keep their material behind paywalls. Some prevent any of their information from being lent in an online library, which is their right. However, while they actively thwart efforts at preservation, it remains unclear whether those groups are adequately preserving their own history.
Some material is preserved intentionally, but a good amount was saved purely by accident. The material we recover and digitize has come from attics and basements, from libraries discarding obsolete material, from long-forgotten FTP sites, from scratched CD-ROMs, and from the estates of people who have passed.
So we float where the radio waves take us, trying to preserve the past as much as possible, while encouraging today’s content creators to consider how to make their material accessible to future generations.
The following guest post from media historian Taylor Cole Miller is part of our Vanishing Culture series, highlighting the power and importance of preservation in our digital age. Read more essays online or download the full report now.
The nesting material of my university office is blank VHS tapes. A few of these tapes were well-worn security blankets holding comforting shows I watched over and over to propel myself through childhood and adolescence. Where normal people might have held onto a cherished dolly or baseball glove as nostalgic trinkets of their youth, I kept my jumpy copy of CBS’ live-action Alice in Wonderland along with episodes of The Golden Girls, The Oprah Winfrey Show, and Xena: Warrior Princess. These artifacts, the ones I clung to growing up, eventually became the foundation of my research as a media historian. While I was writing my master’s thesis, a media ethnography of rural gay men, you’d find me at garage and estate sales every month asking if there were any old VHS tapes of Oprah lying around. And just to access episodes of All That Glitters, Norman Lear’s short-lived show, for my doctoral dissertation, I had to befriend and visit Lear himself to watch them in his personal archive. Television culture is inextricably linked with American culture, but most early television is lost forever, a vanishing era of our culture with few traces.
As a scholar, my specific area of interest is television syndication: the practice of selling content directly to local stations and station ownership groups without going through a network. The stations can air these shows at whatever time and with whatever frequency they desire. There are two primary types of syndication. The first is first-run syndication, which includes talk shows like The Oprah Winfrey Show or Ricki Lake; game shows like Jeopardy or Wheel of Fortune; court shows like Judge Judy; and scripted originals like Xena: Warrior Princess or Star Trek: The Next Generation. The second is second-run syndication, most often referred to as reruns of popular shows. This means my objects of study are often limited by what is available and in what form. Many television shows from the last 50 or 60 years have been officially released on physical media like VHS, Betamax, LaserDisc, or DVD, or made available via streaming or on-demand services, but these are primarily primetime network or cable programs, not daily syndicated talk shows, game shows, public affairs programs, or kids’ TV. Despite its own ephemerality, syndication remains television’s best archivist: it preserves shows that can still turn a profit in reruns, even if it doesn’t always ensure their accessibility or proper care. While syndication keeps certain programs alive in archives, without enough demand they often go unaired or are improperly preserved. Those that no longer generate revenue, no matter how innovative, tend to disappear, left to decay on shelves or locked away in obsolete formats under the weight of copyright restrictions, or worse. One of the most tragic examples of this vanishing culture, allegedly twenty feet below the surface of the Upper New York Bay, is the lost archive of the DuMont Television Network.
In television’s beginning, three familiar companies expanded their operations from radio: NBC, CBS, and ABC. But a fourth company also competed in these fledgling television efforts: DuMont, a television and equipment manufacturer that contributed numerous innovations to the technology of TV itself. Although big commercial television was still years away, DuMont was selling television sets by the 1930s. Its 1938 DuMont 180, for example, featured a massive 14-inch screen and retailed for $395 to $445. To help sell his sets, Allen B. DuMont opened an experimental television station (W2XVT), which broadcast programming that showroom models could display to demonstrate picture quality, a practice that continued with the launch of the commercial DuMont Network in 1946.
That year, DuMont gave the green light to the half-hour show Faraway Hill. Although “firsts” are hard to claim given that much of early TV history is lost, Faraway Hill is often thought to be the first network television soap opera. The show was created by David P. Lewis, who adapted it from his unfinished novel. According to Elana Levine’s history of soaps, Her Stories, the show, like radio soaps before it, included “stream-of-consciousness” style voice-overs that allowed women to look away as needed, given the social expectations of household duties. As reported in his obituary, Lewis said DuMont was desperate for programming, particularly during the nine hours of weekly programming it aired in competition with NBC. The show aired only ten episodes and reportedly made no money, with Lewis claiming he did it to “test the mind of the viewer.” Through Faraway Hill, Levine argues, DuMont “experimented with visuals, including set changes, establishing shots, and some visual effects while, narratively, it tried a recapping strategy that would become a fixture of daytime TV soaps, repeating the last scene of the previous episode as the start of the next.” A second soap effort, A Woman to Remember, ran daily for five months in 1949, with half of that run appearing in daytime. Although Faraway Hill is recognized as the first primetime television serial (a format that would define every Primetime Emmy winner for Outstanding Drama Series in the 21st century), it has vanished because DuMont broadcast it live and, as far as we know, never recorded it.
Faraway Hill wasn’t DuMont’s only first. The network also aired Captain Video and His Video Rangers from 1949 to 1955, considered the first popular sci-fi television show and DuMont’s longest-running program. If you’re a fan of television comedy, you can thank Mary Kay and Johnny, a multi-camera comedy that premiered on DuMont in 1947 and is often thought to be the first network sitcom. DuMont was also the first network to broadcast the NFL championship game, in 1951; it launched Jackie Gleason’s career and aired the Army-McCarthy hearings in 1954.
While television was predominantly white at the time, DuMont produced pioneering shows led by women of color. In 1950, the phenomenally talented Hazel Scott likely became the first Black woman to host her own television show, decades before Oprah Winfrey’s debut in national syndication. The Hazel Scott Show, which aired thrice weekly on DuMont, showcased Scott, a piano prodigy and accomplished musician who had won an early civil rights case: a racial discrimination lawsuit against restaurateurs Harry and Blanche Utz in February 1949. However, after she was named in Red Channels (a publication that accused entertainers of communist sympathies during the McCarthy era), a smear campaign led to the show’s cancellation, and Scott’s groundbreaking contributions to early television history have largely been forgotten.
Also lost to history is DuMont’s The Gallery of Madame Liu-Tsong (1951), starring legendary actor Anna May Wong in what was probably the first American television series with an Asian-American lead. Wong’s character was an art dealer whose investigative art history skills also helped her solve crimes. There are no known recordings or even scripts of the show still in existence. The only information we have on these programs is what remains in schedules and TV listings. For this article, I audited several TV history textbooks by respected scholars, and I could find no mention of either The Hazel Scott Show or The Gallery of Madame Liu-Tsong.
DuMont Television collapsed in 1955 after clunky UHF (Ultra High Frequency) regulations hammered the final nail into its coffin. These rules limited the reach of UHF stations, putting DuMont at a disadvantage compared to the more accessible VHF (Very High Frequency) channels. Still, before its demise, DuMont produced a rich schedule of innovative programs, many of which may never be seen again. According to testimony in a report for the Library of Congress, DuMont’s television archive was intentionally destroyed as a result of sale negotiations in the 1970s. Reportedly, the parties were concerned about who would be responsible for the sensitive archival needs, like temperature control, of such a massive collection. In the report, Edie Adams, a talented performer and key figure at DuMont alongside her husband Ernie Kovacs (who hosted his own show on the network), shared what she had heard about the archive’s fate while trying to preserve her husband’s career. “At 2 a.m., [one of the lawyers] had three huge semis back up to the loading dock […] filled them all with stored kinescopes and 2” videotapes, drove them to a waiting barge in New Jersey, took them out on the water, made a right at the Statue of Liberty, and dumped them in the Upper New York Bay. Very neat. No problem.” While this is the commonly reported lore of DuMont’s demise, no one really knows for sure what happened. Could some materials still exist? True or not, DuMont’s metaphorical watery grave nevertheless serves as a poignant reminder of how easily traces of our past can vanish.
DuMont Network and the Internet Archive
The Internet Archive is an important repository where surviving DuMont programs have been collected and made available to the public. Many of these programs survive from the personal collections of performers or producers who kept copies in their own files. The Internet Archive houses a few surviving examples of DuMont programming, including clips from Cavalcade of Stars, where The Honeymooners and Jackie Gleason made their first appearances in sketches. The archive also includes Okay, Mother, a game show that premiered in 1948 as one of the earliest daytime network TV shows; one episode survives and is available to watch.
Also in the Internet Archive are one or a few episodes each of DuMont shows now in the public domain, including The Adventures of Ellery Queen, The Arthur Murray Show, Flash Gordon, Front Page Detective, The Goldbergs, Hold That Camera, The Johns Hopkins Science Review, Kids and Company, Life is Worth Living, Man Against Crime, Miss U.S. Television Grand Finals, The Morey Amsterdam Show, The Old American Barn Dance, On Your Way, Public Prosecutor, Rocky King – Inside Detective, The School House, Sense and Nonsense, Steve Randall, They Stand Accused, Tom Corbett – Space Cadet, Twenty Questions, and You Asked for It.
Beneath the surface of the Upper New York Bay may rest DuMont’s legacy, forgotten by most but not entirely lost. But while its kinescopes may have succumbed to a watery grave, the efforts of open-access archives like the Internet Archive, storing the personal collections of those who saw value in preserving their histories, offer glimmers of hope. Perhaps, like my cherished collection of VHS tapes, some forgotten episode, script, or production material is still out there, waiting to be discovered, languishing in an old filing cabinet, on a neglected shelf, or in a dusty attic. Or maybe we’ll unearth some other unknown broadcast treasure in the search. With the ongoing work of archivists, collectors, and historians, maybe we can piece together the remnants of America’s vanishing early television history and preserve them for future generations. I want to believe.
About the author
Taylor Cole Miller is an Assistant Professor of Media Studies at the University of Wisconsin-La Crosse and a media history content creator under the handle tvdoc. His research focuses on television histories, syndication, and queer media studies and can be found in journals like Camera Obscura and Television and New Media as well as numerous anthologies and popular press outlets. He is co-editor of the forthcoming collection The Golden Girls: Essays from the Lanai from Rutgers University Press.
The Wayback Machine, Archive.org, Archive-it.org, and OpenLibrary.org came back online in stages over the week following the cyberattacks, and some contributor features have returned over the last couple of weeks. A few remain to be restored. Much of the development during this time has focused on securing the services so they can keep running while attacks continue.
The Internet Archive is adapting to a more hostile world, where DDoS attacks recur periodically (as they did yesterday and today) and more severe attacks might happen. Our response has been to harden our services and learn from friends. This note shares some high-level findings without being so detailed as to help those who are still attacking archive.org.
By tightening our firewall technologies, we have changed how data flows through our systems to improve monitoring and control. The downside is that these upgrades have forced changes to software, some of it quite old.
The bright side is that this is forcing upgrades we have long planned or hoped for. We are greatly helped by the free and open source community’s ever-improving tools, which, because they are freely available, can be used by non-profit libraries as well as large corporations.
Also, some commercial companies have offered assistance that would generally be prohibitively expensive. We are grateful for the support.
While the Internet Archive has always focused on building collections and preserving them, we have been starkly reminded how important reliable access is to researchers, journalists, and readers. This is leading us to install technical defenses and increase staff to improve service availability.
Libraries in general, and the Internet Archive in particular, have been under attack for many years now. For us, it started with book publishers suing over book lending, and now the recording industry is suing over 78rpm records; these lawsuits drain our staff and financial resources. Now recurring DDoS attacks distract us from our goals of preservation and access to our digital heritage.
We don’t know why these attacks started recently or whether they are coordinated, but we are building defenses.
We are grateful for the support from our patrons, through social media, through donations, and through offers of help, which, frankly, make it worthwhile to keep building a library for all of us.
The following Q&A between writer Caralee Adams and journalist Philip Bump of The Washington Post is part of our Vanishing Culture series, highlighting the power and importance of preservation in our digital age. Read more essays online or download the full report now.
Philip Bump is a columnist for The Washington Post based in New York. He writes the weekly newsletter How To Read This Chart. He’s also the author of The Aftermath: The Last Days of the Baby Boom and the Future of Power in America.
Caralee Adams: What does it mean for an individual journalist to have their work preserved? Why is it important to have easy access to news stories from the past?
Philip Bump: One of the nice things about my career has been that I’ve worked for outlets that I feel confident are doing their own preservation, like The Washington Post. I’m not particularly worried about losing access to my writing, though unfortunately that’s less true for writers at other outlets. It is unquestionably the case that I find the Internet Archive useful and use it regularly for a variety of things, both for its preservation of online content and for its collection of closed captioning for news programs.
Any recent examples of when you’ve found the Internet Archive particularly useful?
I use the search tool on closed captioning more than anything else. The other day I was trying to find an old copy of a webpage. I was writing about Donald Trump’s comments on Medal of Honor recipients. As it turns out, there is no immediately accessible resource showing when Medals of Honor were granted to members of the military. You can see aggregate figures, meaning how many there are, but you can’t see who was given a medal and when they served. So I used the Internet Archive to see how those figures changed between the beginning and the end of Trump’s presidency. I was able to see that medals were awarded to about 11 people who served during the War on Terror, three who served in Vietnam, and one who served during World War II. Then I was able to go back and double-check against the Trump White House archive, which is maintained by the National Archives, and see the people to whom he had given this award. That’s a good example of being able to take two snapshots in time and compare them to see what the difference was and solve the problem.
Why is it important for the public to have free public access to an archive of the news for television or print?
It’s the same reason that it’s important, in general, to have any sort of archive: it increases accountability and historical accuracy. The Internet Archive is essential to ensuring that we understand what was happening on the internet at a given point in time. That is not constantly useful, but it is occasionally extremely useful. I do a lot of work in politics, and being able to see what people were saying at certain points in time provides important checks on, and accountability for, elected officials. The public can know what they were saying when they were running in the primary as compared with the general [election]. The Archive allows anyone to get information from websites that are no longer active. If you’re looking for something and you have the old link to Gawker or to a tweet, you can often [find] it archived. The Internet Archive doesn’t capture everything; it couldn’t possibly do so. But it captures enough to generally answer the questions that need answering. There’s nowhere else that does that. There are other archiving sites, but none that do so as comprehensively, or with an archive that goes back that far.
Has any of your journalism vanished from the public? Do you have any examples where you’ve been looking for something and it’s been missing?
Yes. One of the challenges is that multimedia content has often been overlooked in the past. There are old news reports I’ve been unable to find because they were on video, from the era before accessibility features and transcripts were common. So yes, things like that come up with some regularity. Also, particularly from 2005 to 2015, there were a lot of independent sites that had useful news reports, especially about the cast of political characters who were in the public eye at that time. Those reports are often hard to track down. Or if you’re trying to find the original source or verify a rumor, you might need to dip into the Archive. The Internet Archive has captured a lot of sites from that era of “bespoke” blogs.
How does limited access to historical data or previous coverage impact you as a journalist?
It is hard to say, because relatively speaking, I am advantaged by living in this era. If I were doing this in 1990, [I’d use] basically whatever was at the New York Public Library and on microfiche. It is far better than it used to be, but the amount of content being produced is also far larger. It is both a positive and a negative that it is far easier to do that sort of research from my desk at home than it would have been 30 years ago. In fact, I was working on a project where I relied heavily on a local newspaper in a small town in Pennsylvania that wasn’t available online. I literally had to hire someone in the town to go to the library, find [coverage from] the particular date in the local paper, and get the scans done. It cost me hundreds of dollars, but that was the only way to do it. You can see how getting these things done is problematic and challenging.
When Paramount deleted the MTV News Archive in June, there was a lot of dismay, but some say it was frivolous, disposable, and kind of meant to be thrown away. How do you feel about that?
My first writing gig online was at MTV News in college, so that actually had a personal resonance for me. I was at Ohio State in the early to mid 1990s, and I got this little internship with MTV News. I wrote one piece about this band called The Hairy Patt Band. It ended up on the MTV News website. I was very excited. I haven’t seen it in 30 years. It’s one of those things where I wondered whatever happened to that story, or whether it exists anywhere, in any form. So, that [news] actually had resonance. It’s a bummer. Is it as important to maintain the archives of MTV News as those of The Washington Post? I’m biased, but I would say no. But it is still a loss of culture, and a unique one. This was a novel form of information that emerged in the 1990s and is now lost. In the moment, its very existence captured the culture in a way that is worth preserving.
How do you feel about the future of digital preservation of news, data, and information?
I’m more pessimistic than I used to be. I came of age with the internet. When it was new, I used to describe it as the emergence from a new dark age. We had all this information and there was no going back. All this existed. Everything was online, and we had archives. Now we see that, in part because the scale has increased so quickly, economic considerations come into play, and all of a sudden… the internet isn’t an endless archive anymore. There are very few places doing what libraries do to capture these things on microfiche or store books for the public’s benefit. There is so much of it, and that becomes the problem.
Why is it important to pay attention to this issue and preserve journalism for future reporters?
It is obviously the case that we are creating information, culture, and benchmarks for society faster than we can figure out how to preserve them. I think that’s probably always been the case; what’s different now is that we are more cognizant of the process and the challenges of preservation. We expect there to be this thing that exists forever. We don’t yet know how to balance the interest in having as few things as possible be ephemeral against the value of doing that… maybe it’s not even possible to preserve everything the way we would want to at scale. We have created a process by which it is possible to record and observe nearly everything, and now we’re realizing that this is potentially in conflict with our desire to also store and preserve all this information indefinitely.
Anything you’d like to add?
I think it’s worth noting that preservation is one of the few areas in which artificial intelligence offers some potential benefit. One of the things that I’ve long found frustrating is that The New York Times, The Washington Post, and other major news outlets have enormous storehouses of information, not all of it textual. The New York Times must have, in its archives, photos of every square inch of New York City at some point in time over the past 100 years. Artificial intelligence is a great tool for indexing and documenting. We now have tools that allow us to go deeper into our archives and extract more information from them, which I think is a positive development, and something I’ve advocated for publicly for a long time. Only with the advent of artificial intelligence does large-scale preservation become something that seems feasible. One can go through the National Archives and extract, in an accessible form, an enormous amount of the information currently stored there, which saves someone from having to stumble upon a particular image. I think that is beneficial. I don’t think that necessarily solves the issue of storage at scale, but it does address the fact that so much information is currently locked away and inaccessible, which is another facet of the challenge.
About the author
Caralee Adams is a journalist based in Bethesda, Maryland. She is a graduate of Iowa State University and received her master’s in political science at the University of New Orleans. After working at newspapers and magazines, she has been a freelancer covering education, science, tech and health for a variety of publications for more than 30 years.
Last week, along with a DDoS attack and the exposure of patron email addresses and encrypted passwords, the Internet Archive’s website JavaScript was defaced, leading us to bring the site down to assess and improve our security.
The stored data of the Internet Archive is safe, and we are working on resuming services safely. This new reality requires heightened attention to cyber security, and we are responding. We apologize for the disruption caused by these library services being unavailable.
The Wayback Machine, Archive-It, scanning, and national library crawls have resumed, as well as email, blog, helpdesk, and social media communications. Our team is working around the clock across time zones to bring other services back online. In the coming days more services will resume, some starting in read-only mode, as full restoration will take more time.
We’re taking a cautious, deliberate approach to rebuild and strengthen our defenses. Our priority is ensuring the Internet Archive comes online stronger and more secure.
This October, we are publishing The Vanishing Culture Report, a new open access report examining the power and importance of preservation in our digital age.
As more content is created digitally and provided to individuals and memory institutions through temporary licensing deals rather than ownership, materials such as sound recordings, books, television shows, and films are at constant risk of being removed from streaming platforms. This means they are vanishing from our culture without ever being archived or preserved by libraries.
But the threat of vanishing is not exclusive to digital content. As time marches on, analog materials on obsolete formats—VHS tapes, 78rpm recordings, floppy disks—are deteriorating and require urgent attention to ensure their survival. Without proper archiving, digitization, and access, the cultural artifacts stored in these formats are in danger of being lost forever.
By highlighting the importance of ownership and preservation in the digital age, The Vanishing Culture Report aims to inform individuals, institutions, and policymakers about the breadth and scale of cultural loss thus far, and inspire them to take proactive steps in ensuring that our cultural record remains accessible for future generations.
Share Your Story!
As part of the Vanishing Culture report, we’d like to hear from you. We invite you to share your stories about why preservation is important for the media you use on our site. Whether it’s a website crawl in the Wayback Machine, a rare book that shaped your perspective, a vintage film that captured your imagination, or a collection that you revisit often, we want to know why preserving these items is important to you. Share your story now!
At this year’s annual celebration in San Francisco, the Internet Archive team showcased its innovative projects and rallied supporters around its mission of “Universal Access to All Knowledge.”
“People need libraries more than ever,” said Brewster Kahle, founder of the Internet Archive, at the October 12 event. “We have a set of forces that are making libraries harder and harder to happen—so we have to do something more about it.”
Efforts to ban books and defund libraries are worrisome trends, Kahle said, but there are hopeful signs and emerging champions.
Watch the full live stream of the celebration
Among the headliners of the program was Connie Chan, Supervisor of San Francisco’s District 1, who was honored with the 2023 Internet Archive Hero Award. In April, she authored a resolution, passed unanimously by the San Francisco Board of Supervisors, backing the Internet Archive and the digital rights of all libraries.
Chan spoke at the event about her experience as a first-generation, low-income immigrant who relied on books in Chinese and English at the public library in Chinatown.
Watch Supervisor Chan’s acceptance speech
“Having free access to information was a critical part of my education—and I know I was not alone,” said Chan, who is a supporter of the Internet Archive’s role as a digital, online library. “The Internet Archive is a hidden gem…It is very critical to humanity, to freedom of information, diversity of information and access to truth…We aren’t just fighting for libraries, we are fighting for our humanity.”
Several users shared testimonials about how resources from the Internet Archive have enabled them to advance their research, fact-check politicians’ claims, and inspire their creative works. Content in the collection is helping improve machine translation of languages. It is preserving international television news coverage and Ukrainian memes on social media during the war with Russia.
Technology is changing things—some for the worse, but a lot for the better, said David McRaney, speaking via video to the audience in the auditorium at 300 Funston Ave. “And when [technology] changes things for the better, it’s going to expand the limited capabilities of human beings. It’s going to extend the reach of those capabilities, both in speed and scope,” he said. “It’s about a newfound freedom of mind, and time, and democratizing that freedom so everyone has access to it.”
Open Library developer Drini Cami explained how the Internet Archive is using artificial intelligence to improve access to its collections.
When a book was digitized, scanning operators used to crop the page photographs manually. The Internet Archive recently trained a custom machine learning model to suggest page boundaries automatically, allowing staff to double the processing rate. In addition, an open-source machine learning tool converts page images into text. This makes books searchable, makes the collection available for bulk research, cross-referencing, and text analysis, and allows books to be read aloud to people with print disabilities.
“Since 2021, we’ve made 14 million books, documents, microfiche, records—you name it—discoverable and accessible in over 100 languages,” Cami said.
As AI technology advanced this year, Internet Archive engineers piloted a metadata extractor, a tool that automatically pulls key data elements from digitized books. This extra information helps librarians match the digitized book to other cataloged records, beginning to resolve the backlog of books with limited metadata in the Archive’s collection. AI is also being leveraged to assist in writing descriptions of magazines and newspapers—reducing the time from 40 to 10 minutes per item.
“Because of AI, we’ve been able to create new tools to streamline the workflows of our librarians and the data staff, and make our materials easier to discover, and work with patrons and researchers,” Cami said. “With new AI capabilities being announced and made available at a breakneck rate, new ideas of projects are constantly being added.”
A recent Internet Archive hackathon explored the risks and opportunities of AI by using the technology itself to generate content, said Jamie Joyce, project lead with the organization’s Democracy’s Library project. One of the hackathon volunteers created an autonomous research agent to crawl the web and identify claims related to AI. With a prompt-based model, the machine was able to generate nearly 23,000 claims from 500 references. The information could be the basis for creating economic, environmental and other arguments about the use of AI technology. Joyce invited others to get involved in future hackathons as the Internet Archive continues to expand its AI potential.
Peter Wang, CEO and co-founder at Anaconda, said interesting kinds of people and communities have emerged around cultures of sharing. For example, those who participate in the DWeb community are often both humanists and technologists, he said, with an understanding about the importance of reducing barriers to information for the future of humanity. Wang said rather than a scarcity mindset, he embraces an abundant approach to knowledge sharing and applying community values to technology solutions.
“With information, knowledge and open-source software, if I make a project, I share it with someone else, they’re more likely to find a bug,” he said. “They might improve the documentation a little bit. They might adapt it for a novel use case that I can then benefit from. Sharing increases value.”
The Internet Archive’s Joy Chesbrough, director of philanthropy, closed the program by expressing appreciation for those who have supported the digital library, especially in these precarious times.
“We are one community tied together by the internet, this connected web of knowledge sharing. We have a commitment to an inclusive and open internet, where there are many winners, and where ethical approaches to genuine AI research are supported,” she said. “The real solution lies in our deep human connection. It inspires the most amazing acts of generosity and humanity.”
***
If you value the Internet Archive and our mission to provide “Universal Access to All Knowledge,” please consider making a donation today.
For more than 20 years, the Internet Archive’s Television News Archive has monitored television news, preserving more than 9.5 million broadcasts totaling more than 6.6 million hours from across the world, with a continuous archive spanning the past decade. Today just a small sliver of that archive is accessible to journalists and scholars due to the inaccessibility of video at this scale: making sense of that much television news by fast forwarding through it is simply beyond the ability of any human. The small fraction of programs that contain closed captioning, speech recognition transcripts, or OCR’d onscreen text can be keyword-searched through the TV Explorer and TV AI Explorer. For the majority of this global, multi-decade archive, however, there has until now been no way for researchers to assess and understand the narratives of television news at scale, especially the visual landscape that distinguishes television from other forms of media and which is so central to understanding many of the world’s biggest stories, from war to pandemics to the economy.
As the TV News Archive enters its third decade, it is increasingly exploring the ways in which it can preserve the domestic and international response to global events as it did with 9/11 two decades ago. As a first step towards this vision, over the last few months the Archive has preserved more than 46,000 broadcasts from domestic Belarusian, Russian and Ukrainian television news channels, including (in the order they were added to the Archive) Russia Today (part of the Archive since July 2010 but included in this collection starting January 1), Russian channels 1TV, NTV and Russia 1 (from March 26) and Russia 24 (from April 25), Ukrainian channel Espreso (from April 25) and Belarusian channel Belarus 24 (from May 16).
Why preserve television news coverage in a time of war? For journalists today it makes it possible to digest and report on how the war is being framed and narrated, with an eye towards how these narratives influence and shape popular support for the conflict and its potential future trajectory. For future generations of scholars, it makes it possible to look back at the contemporary information environment and prevailing public information, perspectives, and narratives.
While there are myriad options for the general public to watch these channels today in realtime, there is no research-oriented archival interface designed for journalists and scholars to understand their coverage at the scale of days to months, to scan for key visuals and events and to comment, discuss and illustrate how nations are portraying major stories.
To address this critical need, today we are tremendously excited to unveil the Television News Visual Explorer, a collaboration of the GDELT Project, the Internet Archive’s Television News Archive and the Media-Data Research Consortium to explore new approaches to enabling rapid exploration and understanding of the visual landscape of television news.
The Visual Explorer converts each broadcast into a grid of thumbnails, one every 4 seconds, arranged six frames wide and scrolling vertically through the entire program, making it possible to skim an hour-long broadcast in a matter of seconds. Clicking on any thumbnail plays a brief 30-second clip of the broadcast at that point, making it trivial to triage a broadcast rapidly for key moments. The underlying thumbnails can even be downloaded as a ZIP file to enable non-consumptive computational analysis, from OCR to augmented search.
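The arithmetic behind the grid is simple enough to sketch. The snippet below is a minimal Python illustration, not the Visual Explorer’s actual code: only the 4-second interval, six-column width, and roughly 30-second clip length come from the description above, while the function names and the assumption that a clip starts at its thumbnail’s timestamp are hypothetical. For an hour-long broadcast, the layout works out to 900 thumbnails in 150 rows.

```python
import math

# Parameters taken from the description above; everything else is assumed.
THUMB_INTERVAL_S = 4   # one thumbnail every 4 seconds
GRID_COLUMNS = 6       # the grid is six frames wide
CLIP_LENGTH_S = 30     # clicking a thumbnail plays a roughly 30-second clip


def grid_shape(duration_s: int) -> tuple[int, int]:
    """Return (thumbnail_count, row_count) for a broadcast lasting duration_s seconds."""
    thumbs = math.ceil(duration_s / THUMB_INTERVAL_S)
    rows = math.ceil(thumbs / GRID_COLUMNS)
    return thumbs, rows


def thumbnail_position(index: int) -> tuple[int, int, int]:
    """Map a 0-based thumbnail index to (row, column, offset_seconds)."""
    row, col = divmod(index, GRID_COLUMNS)
    return row, col, index * THUMB_INTERVAL_S


def clip_window(index: int, duration_s: int) -> tuple[int, int]:
    """Hypothetical clip bounds: start at the thumbnail's timestamp, clamp to program end."""
    start = index * THUMB_INTERVAL_S
    return start, min(start + CLIP_LENGTH_S, duration_s)


if __name__ == "__main__":
    print(grid_shape(3600))         # (900, 150) for an hour-long broadcast
    print(thumbnail_position(7))    # (1, 1, 28): second row, second column, 28 s in
    print(clip_window(899, 3600))   # (3596, 3600): the last clip is cut short
```

Scrolling 150 rows is what makes an hour of television “skimmable”; the same index-to-offset mapping is what lets a click on any cell jump to the matching moment.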
Machines today can catalog the basic objects and activities they see in video and generate transcripts of their spoken and written words, but the ability to contextualize and understand the meaning of all that coverage remains a uniquely human capability. No person could watch the entirety of the Archive’s 6.6 million hours of broadcasts, and even the 46,000 broadcasts in this new collection alone would be difficult for a single researcher to watch, or even fast forward through, in their entirety. Television’s linear format means coverage has historically been consumed a single moment at a time, like a flashlight in a darkened warehouse. In contrast, this new interface makes it possible to see an entire broadcast all at once in a single display, making television news “skimmable” for the first time.
The Visual Explorer and this new research collection of Belarusian, Russian and Ukrainian television news coverage represent early glimpses into a new initiative reimagining how memory institutions like the Archive can make their vast television news archives more accessible to scholars, journalists and informed citizens. Beneath the simple and intuitive interface lies an immensely complex and highly experimental set of workflows prototyping both an entirely new scholarly and journalistic interface to television news and entirely new approaches to rapidly archiving international television coverage of global events.
Over the coming weeks, additional channels from the TV News Archive will become available through the new Visual Explorer, as well as a variety of experiments with the new lenses that tools like automatic transcription and translation can offer in helping journalists and scholars make sense of such vast realtime archives.
For more than 25 years, GDELT’s creator, Dr. Kalev H. Leetaru, has been studying the web and building systems to interact with and understand the way it is reshaping our global society. One of Foreign Policy Magazine’s Top 100 Global Thinkers of 2013, his work has been featured in the presses of over 100 nations and fundamentally changed how we think about information at scale and how the “big data” revolution is changing our ability to understand our global collective consciousness.
Watching a single episode of the evening news can be informative. Tracking trends in broadcasts over time can be fascinating.
The Internet Archive has preserved nearly 3 million hours of U.S. local and national TV news shows and made the material open to researchers for exploration and non-consumptive computational analysis. At an April 13 webinar, TV News Archive experts shared how they’ve curated the massive collection and leveraged technology so that scholars, journalists, and the general public can make use of the vast repository.
Roger Macdonald, founder of the TV News Archive, and Kalev Leetaru, collaborating data scientist and GDELT Project founder, spoke at the session. Chris Freeland, director of Open Libraries, served as moderator and Internet Archive founder Brewster Kahle offered opening remarks.
Watch video
“Growing up in the television age, [television] is such an influential, important medium—persuasive, yet not something you can really quote,” Kahle said. “We wanted to make it so that you could quote, compare and contrast.”
The Internet Archive built on the work of the Vanderbilt Television Archive and the UCLA Library Broadcast NewsScape to give the public a broader “macro view,” said Kahle. The trends seen in at-scale computational analyses of news broadcasts can be used to understand the bigger picture of what is happening in the world and the lenses through which we see the world around us.
In 2012, with donations from individuals and philanthropies such as the Knight Foundation, the Archive started repurposing the closed captioning data stream required of all U.S. broadcasters into a search index. “This simple approach transformed the antiquated experience of searching for specific topics within video,” said Macdonald, who helped lead the effort. “The TV caption search enabled discovery at internet speed with the ability to simultaneously search millions of programs and have your results plotted over time, down to individual broadcasters and programs.”
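To illustrate the general idea of turning caption text into a searchable index whose hits can be plotted over time, here is a minimal Python sketch of an inverted index. It is a toy under stated assumptions, not the Archive’s implementation: the `CaptionIndex` class, its methods, and the (date, station, show) metadata layout are all hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase caption text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class CaptionIndex:
    """Toy inverted index: term -> IDs of programs whose captions contain it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)
        self.meta: dict[str, tuple[str, str, str]] = {}  # id -> (date, station, show)

    def add(self, program_id: str, date: str, station: str, show: str, captions: str) -> None:
        """Record program metadata and index every token of its captions."""
        self.meta[program_id] = (date, station, show)
        for term in tokenize(captions):
            self.postings[term].add(program_id)

    def search(self, term: str) -> list[str]:
        """Return IDs of programs mentioning the term, sorted."""
        return sorted(self.postings.get(term.lower(), set()))

    def timeline(self, term: str, station: str | None = None) -> dict[str, int]:
        """Count matching programs per date, optionally for a single station."""
        counts: Counter[str] = Counter()
        for pid in self.search(term):
            date, st, _show = self.meta[pid]
            if station is None or st == station:
                counts[date] += 1
        return dict(sorted(counts.items()))
```

Counting matching programs per date, optionally filtered by station, yields the kind of series that can be plotted down to individual broadcasters. A production system would also need phrase queries, ranking, and far more scalable storage.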
Scholars and journalists were quick to embrace this opportunity, but the team kept experimenting with deeper indexing. Techniques like audio fingerprinting, Optical Character Recognition (OCR) and Computer Vision made it possible to capture visual elements of the news and improve access, Macdonald said.
Sub-collections of political leaders’ speeches and interviews have been created, including an extensive Donald Trump Archive. Some of the Archive’s most productive advances have come from collaborating with outsiders who have requested more access to the collection than is available through the public interface, Macdonald said. With appropriate restrictions to maintain respect for broadcasters and distribution platforms, the Archive has worked with select scientists and journalists as partners to use data in the collection for more complex analyses.
Treating television as data
Treating television news as data creates vast opportunities for computational analysis, said Leetaru. Researchers can track how frequently words are used in the news and how that has changed over time. For instance, it’s possible to look at mentions of COVID-related words across selected news programs and see when they surged and leveled off with each wave before plummeting, as shown in the graph below.
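Raw mention counts can mislead when the amount of captioned airtime varies from day to day, so one common approach is to normalize by total words. The Python sketch below assumes captions arrive as hypothetical (date, caption_text) pairs and uses an illustrative term list; it computes mentions per 10,000 caption words for each date and is not necessarily how the COVID graph was produced.

```python
import re
from collections import Counter

# Illustrative term list; a real study would choose and validate its own.
COVID_TERMS = {"covid", "coronavirus", "pandemic"}


def term_rate_by_date(captions, terms, per=10_000):
    """Mentions of any term in `terms` per `per` caption words, for each date.

    captions: iterable of (date, caption_text) pairs (hypothetical layout).
    """
    terms = {t.lower() for t in terms}
    words = Counter()  # date -> total caption tokens
    hits = Counter()   # date -> tokens matching one of the terms
    for date, text in captions:
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        words[date] += len(tokens)
        hits[date] += sum(1 for tok in tokens if tok in terms)
    # Skip dates with no caption words to avoid dividing by zero.
    return {d: hits[d] * per / words[d] for d in sorted(words) if words[d] > 0}
```

Plotting the returned rates by date gives a curve whose rises and falls reflect coverage emphasis rather than simply how much captioned programming happened to air that day.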
From television news to digitized books and periodicals, dozens of projects rely on the collections available at archive.org for computational and bibliographic research across a large digital corpus. Data scientists, or anyone with questions about the TV News Archive, can contact info@archive.org.
Up Next
This webinar was the fourth in a series of six sessions highlighting how researchers in the humanities use the Internet Archive. The next, on April 27, will be about Analyzing Biodiversity Literature at Scale. Register here.
A round up on what’s happening at the TV News Archive by Katie Dahl and Nancy Watzman.
This week we release new data generated by our Face-o-Matic tool, developed in collaboration with Matroid, adding to our list of public figures detected by facial-recognition on major cable news stations on the TV News Archive.
In addition to President Donald Trump and the four congressional leaders, the expanded list now includes most living former presidents and recent major party presidential contenders, including Hillary Clinton and Barack Obama. (For the full list of public officials tracked, as well as methodological notes, see the bottom of the post.)
Detecting faces on TV news and turning them into data provides a new quantitative path for journalists and researchers to explore how news is presented to the public and compare and contrast editorial choices that individual networks make. This new measure shows us the duration that politicians’ faces are actually shown on screen, whether it’s a clip of that person speaking, muted footage, or a still photo shown in the background to illustrate a point.
Combined with the Television Explorer, fueled by closed captions, and our Third Eye chyron-reading tool, this data makes a wealth of information available to analyze. (See the TV News Archive home page for examples of visualizations created by journalists and researchers using TV News Archive data.)
Here are six quick takeaways using Face-o-Matic for an analysis covering roughly six months, from November 2017 through May 2018, looking at four cable TV news networks: BBC News, CNN, Fox News, and MSNBC.
As we’ve seen in past analyses with Face-o-Matic data, President Donald Trump is the major political star on cable TV news as compared to other top political figures examined. To put this in perspective: over a six-month period stretching from November 2017 to May 2018, the president’s face appeared on cable TV news the equivalent of a full 13.5 days, counting every second of face-time. The next closest political figure we analyzed was House Speaker Paul Ryan, R., Wis., whose visage appeared the equivalent of one day.
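“Counting every second of face-time” amounts to summing detection durations and dividing by 86,400 seconds per day: 13.5 days corresponds to 13.5 × 86,400 = 1,166,400 seconds on screen. The Python sketch below assumes a hypothetical record layout of (person, network, seconds_on_screen) tuples; the published Face-o-Matic data set’s actual schema may differ.

```python
from collections import defaultdict

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def face_time_days(detections):
    """Total on-screen seconds per person, expressed as equivalent days.

    detections: iterable of (person, network, seconds_on_screen) tuples
    (a hypothetical layout, not the data set's documented schema).
    """
    totals = defaultdict(float)
    for person, _network, seconds in detections:
        totals[person] += seconds  # every second counts, across all networks
    return {person: secs / SECONDS_PER_DAY for person, secs in totals.items()}
```

Summing across networks before converting keeps the comparison between people on the same footing; grouping by (person, network) instead would give the per-network breakdowns discussed below.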
After Trump, GOP leaders in Congress are the most popular faces on TV cable news.
The two GOP leaders in Congress, Ryan and Senate Majority Leader Mitch McConnell, R., Ky., are the next most popular faces on cable TV news networks. Between the two, Ryan ranks first on the cable TV news networks we examined: BBC News, CNN, Fox News, and MSNBC. McConnell is the next most shown face on these networks, with the exception of BBC News.
Hillary Clinton and Barack Obama figure prominently on Fox News.
Fox News airs proportionately more images of failed 2016 presidential candidate Hillary Clinton and former President Barack Obama than other cable TV news networks do. Fox News showed Clinton’s face 7.6 times more than CNN did, and Obama’s 3.6 times more. Fox News also showed Clinton 3.6 times more than MSNBC did, and Obama 2.3 times more.
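A figure such as “7.6 times more” is a ratio of face-time on one network to face-time on another. The post does not spell out its normalization, so the Python sketch below offers two hypothetical variants: a raw ratio of on-screen seconds, and a ratio of shares of each network’s monitored airtime, which adjusts for networks being monitored for different amounts of time. The dictionary layouts and function names are assumptions for illustration only.

```python
def times_more(face_seconds, person, network_a, network_b):
    """Raw ratio: person's on-screen seconds on network_a divided by those on network_b.

    face_seconds maps (network, person) -> seconds on screen (hypothetical layout).
    """
    a = face_seconds[(network_a, person)]
    b = face_seconds[(network_b, person)]
    if b == 0:
        raise ValueError(f"{person} has no face-time on {network_b}; ratio undefined")
    return a / b


def share_ratio(face_seconds, airtime_seconds, person, network_a, network_b):
    """Normalized ratio: compare face-time as a share of each network's monitored airtime.

    airtime_seconds maps network -> total seconds monitored (hypothetical layout).
    """
    share_a = face_seconds[(network_a, person)] / airtime_seconds[network_a]
    share_b = face_seconds[(network_b, person)] / airtime_seconds[network_b]
    if share_b == 0:
        raise ValueError(f"{person} has no face-time on {network_b}; ratio undefined")
    return share_a / share_b
```

When every network is monitored for the same length of time, the two functions agree; they diverge only when monitored airtime differs, which is why stating the normalization matters when comparing networks.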
Hannity shows more Hillary Clinton face-time than any other top-rated Fox News show.
Not only does the Fox News “Hannity” program air proportionately more images of Hillary Clinton than any other top-rated Fox News show, but, with just one exception, it is also the Fox News show that shows her face more than those of current congressional leaders Ryan, McConnell, Schumer, or Pelosi. “Hannity” also shows more images of Obama than other top-rated Fox News shows.
Ryan face-time spikes on news shows aired during morning hours.
All three U.S. cable news networks examined showed high rates of face-time for Ryan on shows airing during morning hours, from 9 am to 11 am. This may be linked to his leadership role in Congress and to morning hours being a prime slot for major announcements. For example, Fox News’ “America’s Newsroom” and “Happening Now” show spikes of face-time for Ryan. On MSNBC, “Live with Hallie Jackson” and “Live with Velshi and Ruhle” show high rates of images of Ryan. And on CNN, “At This Hour with Kate Bolduan” shows high rates of Ryan as well.
Interactive charts are available for top-rated news shows; the view can be adjusted to exclude specific politicians. Top-rated shows are those with the highest 2017 viewership as measured by Nielsen.
BBC News provides a window into how news is presented to a major foreign audience. Like U.S. cable news networks, BBC News features more face-time for Trump than for the other political figures examined. Ryan ranks a distant second. Overall, however, BBC News shows much lower rates of images of U.S. political figures than U.S. cable news networks do.
The Face-o-Matic data set, available for download on the Internet Archive, uses facial recognition to track the faces of prominent public officials as they appear on major cable TV news networks: BBC News, CNN, Fox News, and MSNBC. The list of public officials tracked, along with the date that detection began, is here:
President & current congressional leaders
President Donald Trump, 7/13/17
Speaker Paul Ryan, R., Wis., 7/13/17
House Minority Leader Nancy Pelosi, D., Calif., 7/13/17
Senate Majority Leader Mitch McConnell, R., Ky., 7/13/17
Senate Minority Leader Chuck Schumer, D., N.Y., 7/13/17
Former living presidents and recent major party presidential candidates*
George H.W. Bush, 10/5/17
George W. Bush, 11/1/17
Jimmy Carter, 10/21/17
Bill Clinton, 9/12/17
Hillary Clinton, 9/12/17
Barack Obama, 7/13/17
Mitt Romney, 10/4/17
*Note: Our data set does not include Sen. John McCain, R., Ariz., who ran for president against Obama in 2008. Sample testing of facial detection for the senator revealed somewhat frequent false positives – instances where the identified face was not the senator’s but rather one of a number of lookalikes. While we make no claim that all of the detections in the Face-o-Matic data set are error free, we did test faces to minimize such errors. Please be sure to notify us if you find errors in the data.