The tech powering the Political TV Ad Archive

Ever wonder how we built the Political TV Ad Archive? This post explains what happens backstage: how we use audio fingerprinting to count how many times a particular ad has aired on television, where, and when, in the markets that we track.

There are three pieces to the Political TV Ad Archive:

  • The Internet Archive collects, prepares, and serves the TV content in markets where we have feeds. Collection of TV is part of a much larger effort to meet the organization’s mission of providing “Universal Access to All Knowledge.” The Internet Archive is the online home to millions of free books, movies, software, music, images, web pages and more.
  • The Duplitron 5000 is our whimsical name for an open source system responsible for taking video and creating unique, compressed versions of the audio tracks. These are known as audio fingerprints. We create an audio fingerprint for each political ad that we discover, which we then match against our incoming stream of broadcast television to find each new copy, or airing, of that ad. These results are reported back to the Internet Archive.
  • The Political TV Ad Archive is a WordPress site that presents our data and our videos to the rest of the world. On this website, for the sake of posterity, we also archive copies of political ads that may be airing in markets we don’t track, or exclusively on social media. But for the ads that show up in areas where we’re collecting TV, we are able to present the added information about airings.


Step 1: recording television

We have a whole bunch of hardware spread around the country to record television. That content is then pieced together to form the programs that get stored on the Internet Archive’s servers. We have a few ways to collect TV content. In some cases, such as the San Francisco market, we own and manage the hardware that records local cable. In other cases, such as markets in Ohio and Iowa, the content is provided to us by third party services.

Regardless of how we get the data, the pipeline takes it to the same place. We record in minute-long chunks of video and stitch them together into programs based on what we know about the station’s schedule. This results in video segments of anywhere from 30 minutes to 12 hours. Those programs are then turned into a variety of file formats for archival purposes.
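As a rough illustration of the stitching step, the sketch below groups minute-long chunks into programs using a station schedule. The `Chunk` and `ScheduleEntry` types, the file names, the program titles, and the `stitch` function are all hypothetical; this post does not describe the actual pipeline code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ScheduleEntry:
    title: str
    start: datetime
    end: datetime  # exclusive


@dataclass(frozen=True)
class Chunk:
    start: datetime  # each chunk covers one minute from this time
    path: str


def stitch(chunks, schedule):
    """Group minute-long chunks into programs by schedule slot.

    Returns a dict mapping program title to its chunk paths in air order.
    Chunks that fall outside every slot are skipped.
    """
    programs = {entry.title: [] for entry in schedule}
    for chunk in sorted(chunks, key=lambda c: c.start):
        for entry in schedule:
            if entry.start <= chunk.start < entry.end:
                programs[entry.title].append(chunk.path)
                break
    return programs


# Hypothetical data: one hour of recording with the 18:05 chunk lost in transit.
base = datetime(2016, 2, 1, 18, 0)
schedule = [
    ScheduleEntry("Evening News", base, base + timedelta(minutes=30)),
    ScheduleEntry("Game Show", base + timedelta(minutes=30), base + timedelta(minutes=60)),
]
chunks = [
    Chunk(base + timedelta(minutes=m), f"chunk_{m:03d}.ts")
    for m in range(60)
    if m != 5
]

programs = stitch(chunks, schedule)
for title, paths in programs.items():
    print(title, len(paths), "chunks")
```

The missing chunk simply leaves a gap in its program, which is one way the missing content described in this section could arise.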

The ad counts we publish are based on actual airings, as opposed to reported airings. This means that we are not estimating counts by analyzing Federal Election Commission (FEC) reports on spending by campaigns. Nor are we digitizing reports filed by broadcasting stations with the Federal Communications Commission (FCC) about political ads, though that is a worthy goal. Instead we generate counts by looking at what actually has been broadcast to the public.

Because we are working from the source, we know we aren’t being misled. On the flip side, this means that we can only report counts for the channels we actively track and record. In the first phase of our project, we tracked more than 20 markets in 11 key primary states (details here.) We’re now in the process of planning which markets we’ll track for the general elections. Our main constraint is simple: money. Capturing TV comes at a cost.

A lot can go wrong here. Storms can affect reception, and packets can be lost or corrupted before they reach our servers. The result can be time shifts or missing content. But most of the time the data winds up sitting comfortably on our hard drives unscathed.

Step 2: searching television

Video is terrible when you’re trying to look for a specific piece of it. It’s slow, it’s heavy, and it is far better suited for watching than for working with. But sometimes you need to find a way.

There are a few things to try. One is transcription: with a time-coded transcript you can do almost anything, such as build a text editor for video or search for key phrases like “I approve this message.”

The problem is that most television is not precisely transcribed. Closed captions are required for most U.S. TV programs, but not for advertisements. Shockingly, most political ads are not captioned. There are a few open source tools out there for automated transcript generation, but the results leave much to be desired.

Introducing audio fingerprinting

We use a free and open tool called audfprint to convert our audio files into audio fingerprints.

An audio fingerprint is a summarized version of an audio file, one that has removed everything except the most interesting pieces of every few milliseconds. The trick is that the summaries are formed in a way that makes it easy to compare them, and because they are summaries, the resulting fingerprint is a lot smaller and faster to work with than the original.

The audio fingerprints we use are based on frequency. Sounds are made up of waves, and each wave repeats (oscillates) at its own rate. Faster repetitions correspond to higher-pitched sounds, and slower repetitions to lower-pitched sounds.

An audio file contains instructions that tell a computer how to generate these waves. Audfprint breaks the audio files into tiny chunks (around 20 chunks per second) and runs a mathematical function on each fragment to identify the most prominent waves and their corresponding frequencies.

The rest is thrown out, the summaries are stored, and the result is an audio fingerprint.
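To make the idea concrete, here is a minimal sketch of picking the dominant frequency in each chunk. It is not audfprint's code: it uses a naive discrete Fourier transform built from the Python standard library, keeps only a single peak per chunk, and runs on a synthetic tone. The sample rate, chunk size, and function names are assumptions chosen for illustration.

```python
import cmath
import math

SAMPLE_RATE = 8000   # samples per second (assumed for this example)
CHUNK_SIZE = 400     # 8000 / 400 = 20 chunks per second


def dominant_frequency(chunk, sample_rate):
    """Return the frequency (Hz) with the largest DFT magnitude in chunk."""
    n = len(chunk)
    best_bin, best_mag = 0, 0.0
    # For a real-valued signal only the lower half of the bins is needed;
    # bin 0 (the constant offset) is skipped.
    for k in range(1, n // 2):
        coeff = sum(
            sample * cmath.exp(-2j * math.pi * k * i / n)
            for i, sample in enumerate(chunk)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n


def fingerprint(samples, sample_rate=SAMPLE_RATE, chunk_size=CHUNK_SIZE):
    """Summarize audio as one dominant frequency per chunk; discard the rest."""
    return [
        dominant_frequency(samples[start:start + chunk_size], sample_rate)
        for start in range(0, len(samples) - chunk_size + 1, chunk_size)
    ]


def tone(freq, count):
    """Generate `count` samples of a pure sine wave at `freq` Hz."""
    return [math.sin(2 * math.pi * freq * i / SAMPLE_RATE) for i in range(count)]


# Synthetic audio: two chunks of a 440 Hz tone, then two chunks of an 880 Hz tone.
samples = tone(440, 800) + tone(880, 800)
summary = fingerprint(samples)
print(summary)
```

Here 1,600 samples shrink to four numbers, one per chunk, which is why a fingerprint is so much smaller and faster to compare than the audio it came from. A real fingerprinter keeps several peaks per chunk and uses a fast Fourier transform rather than this slow loop.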

If the same sound exists across two files, a common set of dominant frequencies will be seen in both fingerprints. Audfprint makes it possible to compare the chunks between two sound files, count how many they have in common, and count how many appear at roughly the same distance from one another.
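A minimal sketch of that comparison follows, assuming each fingerprint is a list of (chunk index, dominant frequency) pairs. It illustrates the general idea of counting shared values that line up at a consistent time offset; it is not audfprint's actual matching code, and the data and the `best_alignment` name are made up.

```python
from collections import Counter


def best_alignment(ad, program):
    """Find the time offset at which the most ad chunks match the program.

    Both arguments are lists of (chunk_index, frequency) pairs.
    Returns (offset, match_count), where offset is the program chunk index
    at which the ad appears to start, or (None, 0) if nothing matches.
    """
    # Index the program by frequency for quick lookup.
    positions = {}
    for index, freq in program:
        positions.setdefault(freq, []).append(index)

    # Every shared frequency votes for the offset that would align the two chunks.
    votes = Counter()
    for ad_index, freq in ad:
        for program_index in positions.get(freq, []):
            votes[program_index - ad_index] += 1

    if not votes:
        return None, 0
    return votes.most_common(1)[0]


# Hypothetical fingerprints: the ad's four chunks appear in the program
# starting at chunk 4, and one stray 440 Hz chunk appears earlier.
ad = [(0, 440), (1, 660), (2, 880), (3, 550)]
program = list(enumerate([300, 440, 310, 320, 440, 660, 880, 550, 330]))

offset, count = best_alignment(ad, program)
print(offset, count)
```

The stray 440 Hz chunk earns a single vote for the wrong offset, while the true position collects a vote from every chunk of the ad. Requiring many votes at one offset is what separates a real copy from a coincidental shared sound.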

This is what we use to find copies of political ads.

Step 3: cataloguing political ads

When we discover a new political ad the first thing we do is register it on the Internet Archive, kicking off the ingestion process. The person who found it types in some basic information such as who the ad mentions, who paid for it, and what topics are discussed.

The ad is then sent to the system we built to manage our fingerprinting workflow, which we whimsically call the Duplitron 5000, or the “DT5k.” It uses audfprint to generate fingerprints, organizes how the fingerprints are stored, processes the comparison results, and allows us to scale across millions of minutes of television.

DT5k generates a fingerprint for the ad, stores it, and then compares that fingerprint with hundreds of thousands of existing fingerprints for the shows that had been previously ingested into the system. It takes a few hours for all of the results to come in. When they do, the Duplitron makes sense of the numbers and tells the archive which programs contain copies of the ad and what time the ad aired.
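The last step, turning raw matches into airings, can be pictured roughly as below. This is a sketch under stated assumptions, not the Duplitron's code: the record layout, the market labels, and the `airing_time` helper are hypothetical, and the chunk rate of 20 per second is taken from the fingerprinting description earlier in this post.

```python
from collections import Counter
from datetime import datetime, timedelta

CHUNKS_PER_SECOND = 20  # the approximate chunk rate described earlier in the post


def airing_time(program_start, chunk_offset):
    """Convert a match offset (in chunks) into a wall-clock air time."""
    return program_start + timedelta(seconds=chunk_offset / CHUNKS_PER_SECOND)


# Hypothetical match results: (market, program start, chunk offset of the match).
matches = [
    ("Market A", datetime(2016, 2, 1, 18, 0), 12000),
    ("Market A", datetime(2016, 2, 1, 23, 0), 36000),
    ("Market B", datetime(2016, 2, 1, 17, 30), 600),
]

# One airing per match, then a count of airings per market.
airings = [(market, airing_time(start, offset)) for market, start, offset in matches]
counts = Counter(market for market, _ in airings)

for market, aired_at in airings:
    print(market, aired_at.isoformat())
print(dict(counts))
```

A match 12,000 chunks into a program that began at 18:00 is reported as an airing at 18:10, and the per-market totals are simply the number of such airings.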

These results end up being fairly accurate, but not perfect. The matches are based on audio, not video, which means we run into trouble when a political ad uses the same soundtrack as something else, such as an infomercial.

We are working on improving the system to filter out these kinds of false positives, but even with no changes these fingerprints have provided solid data across the markets we track.


The Duplitron 5000, counting political ads. Credit: Lyla Duey.

Step 4: enjoying the results

And so you understand a little bit more about our system. You can download our data and watch the ads at the Political TV Ad Archive. (For more on our metadata, including what’s in it and what you can do with it, read here.)

Over the coming months we are working to make the system more accurate. We are also exploring ways to identify newly released political ads without any need for manual entry.

P.S. We’re also working to make it as easy as possible for any researchers to download all of our fingerprints to use in their own local copies of the Duplitron 5000. Would you like to experiment with this capability? If so, contact me on Twitter at @slifty.

A Dream to Preserve TV News, on the Road to Realization… with Your Help

On The Media’s TLDR
Huffington Post
Fast Company

We are about to receive a remarkable private collection of videotaped U.S. television news that spans 35 years. We welcome contributions of TV news recorded before the year 2001 to help broaden our research library.

Marion Marguerite Stokes, a librarian, social justice advocate and TV interview program host, believed that it was vital to preserve television news.

Mrs. Stokes started recording news at home in 1977 — and never stopped. Before her death in December 2012 she recorded 140,000 video cassettes. Her family searched for a home for her unique collection and found us in June.

It is a unique collection of local news from Boston (1977-1986) and Philadelphia (1986-2012), as well as all the national news. The Boston era is particularly notable for the busing/desegregation strife that raged throughout.

Marion Stokes’ amazing commitment to preserve television news, a passion that few at the time entirely understood, shaped the daily lives of her children growing up and, later, visits of her grandchildren. Her dream of using this collection for the public good can now be fulfilled.

In just a few days, four large shipping containers on trucks will be winding their way across the country to our Richmond, California physical archive. The digitization of such a huge collection will take a number of years and funding we have yet to raise.

Join us in helping to realize Marion Stokes’ gift to the future and make it available to all, forever, for free.  Please consider making a contribution, right now!

LEARNING FROM RECORDED MEMORY: 9/11 TV News Archive Conference

Co-sponsored by Internet Archive and New York University’s Moving Image Archiving and Preservation Program, Tisch School of the Arts

Wednesday, August 24, 4:00-6:00 pm; reception follows

New York University, Tisch School of the Arts, 721 Broadway, 6th Floor, Michelson Theater, New York, NY 10003

This conference highlights work by scholars using television news materials to help us understand how TV news presented the events of 9/11/2001 and the international response. Our collective recollection of 9/11 and the following days has become inseparable from the televised images we have all seen. But while TV news is inarguably the most vivid and pervasive information medium of our time, it has not been a medium of record. As the number of news outlets increases, research and scholarly access to the thousands of hours of TV news aired each day grows increasingly difficult. Scholars face great challenges in identifying, locating and adequately citing television news broadcasts in their research.

The 9/11 Television News Archive (http://archive.org/details/911) contains 3,000 hours of national and international news coverage from 20 channels over the seven days beginning September 11, plus select analysis by scholars. It is designed to assist scholars and journalists researching relationships between news events and coverage, engaging in comparative and longitudinal studies, and investigating “who said what when.” What kinds of research and scholarship will be enabled by access to an online database of TV news broadcasts? How will emerging TV news studies make use of this service? This conference offers contemporary insights and predictions on new directions in television news studies.


4:00:  Welcome: Richard Allen, Chair, Department of Cinema Studies, Tisch School of the Arts, NYU
4:05:  Brewster Kahle, Founder and Digital Librarian at the Internet Archive
4:15:  Brian A. Monahan, Iowa State University
4:25:  Deborah Jaramillo, Boston University
4:35:  Marshall Breeding, Vanderbilt Television News Archive
4:45:  Mark J. Williams, Department of Film and Media Studies, Dartmouth College
4:55:  Carolyn Brown, American University
5:05:  Michael Lesk, Rutgers University
5:15:  Beatrice Choi, New York University
5:25:  Scott Blake, Artist
5:35:  Discussion
6:00:  Reception (Remarks by Dennis Swanson, President of Station Operations, Fox Television)


Welcome: Richard Allen, Chair, Department of Cinema Studies, Tisch School of the Arts, New York University


Brewster Kahle, Internet Archive

“Introducing the 9/11 TV News Archive”

Brewster Kahle founded the Internet Archive in 1996 and serves as its Digital Librarian. An entrepreneur and Internet pioneer, Brewster invented the first Internet publishing system and helped put newspapers and publishers online in the 1990s.


Brian A. Monahan, Iowa State University

“Mediated Meanings and Symbolic Politics: Exploring the Continued Significance of 9/11 News Coverage”

In-depth analysis of television news coverage of the September 11 attacks and their aftermath reveals how these events were fashioned into “9/11,” the politically and morally charged signifier that has profoundly shaped public perception, policy and practice in the last decade.  The central argument is that patterned representations of 9/11 in news media and other arenas fueled the transformation of September 11 into a morality tale centered on patriotism, victimization and heroes.  The resulting narrow and oversimplified public understanding of 9/11 has dominated public discourse, obscured other interpretations and marginalized debate about the contextual complexities of these events. Understanding how and why the coverage took shape as it did yields new insights into the social, cultural and political consequences of the attacks, while also highlighting the role of news media in the creation, affirmation and dissemination of meanings in modern life.

Brian Monahan has extensively researched news coverage of 9/11, resulting in a number of scholarly presentations and a book, The Shock of the News: Media Coverage and the Making of 9/11 (2010, NYU Press).



Deborah Jaramillo, Boston University

“Fighting Ephemerality: Seeing TV News through the Lens of the Archive”

The experience of watching the news on TV as events unfold is often complicated by the space of exhibition — typically, the domestic space. When hour upon hour of news is catalogued and archived — placed in a space of focused study — the news and the experience become altogether different. What was meant to be ephemeral acquires permanence, and what is usually a short-term viewing experience becomes a rigorous, frame-by-frame examination. In this presentation I will discuss how the archive challenges researchers to adopt new ways of seeing and explaining TV news.

Deborah L. Jaramillo is Assistant Professor in the Department of Film and Television, Boston University.


Marshall Breeding, Vanderbilt Television News Archive

“An Overview of the Vanderbilt Television News Archive”

Marshall Breeding will give a brief overview of the Vanderbilt Television News Archive and how it carries out its mission to preserve and provide access to US national television news.   He will relate the incredibly diverse kinds of use that the archive receives, including: academic scholarly research; individuals seeking coverage of themselves or family members that may have appeared on the news in life-changing events; those needing historic footage for current journalism, documentaries or other creative works; or corporations or non-profits researching news coverage of their vested topics.  Breeding will also outline some of the constraints it faces in how it provides access to its collection.

Marshall Breeding is the Executive Director of the Vanderbilt Television News Archive and the Director for Innovative Technology and Research for the Vanderbilt University Library.


Mark J. Williams, Department of Film and Media Studies, Dartmouth College

“Media Ecology and Online News Archives”

Online TV news archives are a crucial digital resource to facilitate the awareness of and critical study of Media Ecology. The 9/11 TV News Archive will fundamentally enhance our capacity for the study of historical TV newscasts. Two significant research and teaching outcomes for this area of study are A) to better understand the role of television news regarding the mediation of society and its popular memory, and B) to underscore the significance of television news to the goal of an informed citizenry. The 9/11 TV News Archive will enhance and ensure the continued study of the indelible tragic events and aftermath of 9/11, and make possible new interventions within journalism history and media history, via online capacities for access and collaboration.

Mark J. Williams is Associate Professor in the Department of Film and Media Studies, Dartmouth College.


Carolyn Brown, American University

“Documentation and Access: A Latino/a Studies Perspective on Using Video Archives”

This talk will explore the possibilities and potential of using accessible video news archives in two areas: immigration research in the field of communication and documentary journalism. I will speak of the significance of video news archives in my current film, The Salinas Project, and discuss my continuing research on Latino/as and immigration in the news.

Carolyn Brown is Assistant Professor in the School of Communication and Journalism at American University. She produced daily news shows for MSNBC News and Fox News Channel, and has worked as a producer and senior producer in local news in San Francisco, Washington, D.C., and Phoenix.


Michael Lesk, Rutgers University

“Image Analysis for Media Study”

Focusing on television news coverage of the 9/11 attacks, this talk will outline strategies for automatic quantitative analysis of television news imagery.

After receiving a PhD in Chemical Physics in 1969, Michael Lesk joined the computer science research group at Bell Laboratories, where he worked until 1984. From 1984 to 1995 he managed the computer science research group at Bellcore, then joined the National Science Foundation as head of the Division of Information and Intelligent Systems. Since 2003 he has been Professor of Library and Information Science at Rutgers University, and he chaired that department from 2005 to 2008. He is best known for work in electronic libraries; his book “Practical Digital Libraries” was published in 1997 by Morgan Kaufmann, and the revision, “Understanding Digital Libraries,” appeared in 2004. He is a Fellow of the Association for Computing Machinery, received the Flame award from the Usenix association, and in 2005 was elected to the National Academy of Engineering. He chairs the NRC Board on Research Data and Information.


Beatrice Choi, New York University

“Live Dispatch: The Ethics of Audio Vision Media Coverage in Trauma and the Legacy of Sound from Shell Shock to 9/11”

What experiential narratives (sensory, aesthetic and political) are invisible to those exposed to traumatic events? Considering September 11, 2001, the media coverage of the event is predominantly visual. People drift in and out of news footage, covered in dust and ash as they exclaim that witnessing the attacks was like watching a movie. In contrast, the wailing of sirens, the staccato thud of feet running from the stricken towers, and the chaotic overlap of voices break through, and sometimes even swallow, the visual narratives spun for 9/11. For contemporary American traumatic events, this raises the question of how porous the sensory modalities are in experiencing and remembering shock. How, after all, do sensory representations of traumatic events leave in/visible marks on documentation? I address these questions by exploring sound as an alternate modality, evoking a different level of traumatic indexicality. First, I draw attention to the sensory discrepancy between audio and visual content dispersed for American traumatic events, taking 9/11 as the focal event. By investigating the most highly represented media vehicles in the event, television and radio, I delve into a critical visual-acoustic analysis, looking specifically at FDNY radio transmissions and NY1 Aircheck news footage. Finally, I examine the discursive legacy sound imparts in moments of American crisis, from shell shock accounts in the late 19th to 20th century to post-9/11 narratives of post-traumatic symptoms. In delineating this legacy, I hope to reveal the ways in which these documented discourses evolve past preconceived sensory boundaries in the experience of trauma.

Beatrice Choi is an NYU MA Graduate from the Media Culture Communication program. She has worked with the 9/11 archives for a year as a Moving Imagery Exhibitions Intern at the National September 11 Memorial & Museum, and recently completed a thesis on Post-Traumatic Landscapes, focusing primarily on post-Katrina New Orleans.


Scott Blake, artist

“9/11 Flipbook and Quantitative Media Study”

Scott Blake has created a flipbook consisting of images of United Airlines Flight 175 crashing into the south tower of the World Trade Center. Accompanying the images are essays written by a wide range of participants, each expressing their personal experience of the September 11th attacks. In addition, the authors of the essays were asked to reflect on, and respond to, the flipbook itself. Not surprisingly, the majority of the essayists experienced the events through news network footage. Blake is distributing his 9/11 Flipbooks to encourage a constructive dialog regarding the media’s participation in sensationalizing the tragedy. To further illustrate his point, Blake conducted a media study using the 9/11 TV News Archive to count the number of times major news networks showed the plane crashes, building collapses and people falling from the towers on September 11, 2001.

While best known for his Barcode Art, Scott Blake has created new works that are scandalous, witty, fun, pornographic, humorous and about a thousand other adjectives viewers might use when seeing them for the first time. A self-described “frivolous artist,” he mows over conceptual and visual boundaries to make work that is as thought provoking as it is entertainingly tongue-in-cheek.


Remarks by Dennis Swanson, President of Station Operations, Fox Television


We thank the many people at New York University and Internet Archive who have helped to make this conference possible.