Tag Archives: television news archive

Library as Laboratory Recap: Opening Television News for Deep Analysis and New Forms of Interactive Search

Watching a single episode of the evening news can be informative. Tracking trends in broadcasts over time can be fascinating. 

The Internet Archive has preserved nearly 3 million hours of U.S. local and national TV news shows and made the material open to researchers for exploration and non-consumptive computational analysis. At a webinar April 13, TV News Archive experts shared how they’ve curated the massive collection and leveraged technology so scholars, journalists and the general public can make use of the vast repository.

Roger Macdonald, founder of the TV News Archive, and Kalev Leetaru, collaborating data scientist and GDELT Project founder, spoke at the session. Chris Freeland, director of Open Libraries, served as moderator and Internet Archive founder Brewster Kahle offered opening remarks.

“Growing up in the television age, [television] is such an influential, important medium—persuasive, yet not something you can really quote,” Kahle said. “We wanted to make it so that you could quote, compare and contrast.” 

The Internet Archive built on the work of the Vanderbilt Television Archive and the UCLA Library Broadcast NewsScape to give the public a broader “macro view,” said Kahle. The trends seen in at-scale computational analyses of news broadcasts can be used to understand the bigger picture of what is happening in the world and the lenses through which we see the world around us.

In 2012, with donations from individuals and philanthropies such as the Knight Foundation, the Archive started repurposing the closed captioning data stream required of all U.S. broadcasters into a search index. “This simple approach transformed the antiquated experience of searching for specific topics within video,” said Macdonald, who helped lead the effort. “The TV caption search enabled discovery at internet speed with the ability to simultaneously search millions of programs and have your results plotted over time, down to individual broadcasters and programs.”

Scholars and journalists were quick to embrace this opportunity, but the team kept experimenting with deeper indexing. Techniques like audio fingerprinting, Optical Character Recognition (OCR) and Computer Vision made it possible to capture visual elements of the news and improve access, Macdonald said. 

Sub-collections of political leaders’ speeches and interviews have been created, including an extensive Donald Trump Archive. Some of the Archive’s most productive advances have come from collaborating with outsiders who have requested more access to the collection than is available through the public interface, Macdonald said. With appropriate restrictions to maintain respect for broadcasters and distribution platforms, the Archive has worked with select scientists and journalists as partners to use data in the collection for more complex analyses.

Treating television as data

Treating television news as data creates vast opportunities for computational analysis, said Leetaru. Researchers can track how often words are used in the news and how that has changed over time. For instance, it’s possible to look at mentions of COVID-related words across selected news programs and see when they surged and leveled off with each wave before plummeting, as shown in the graph below.
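As a rough illustration of the word-frequency tracking described above, here is a minimal sketch in Python. The caption snippets, the keyword list and the function name are all hypothetical, made up for this example; it does not use the Archive's data or reflect how its tools are implemented.

```python
from collections import Counter
from datetime import date
import re

# Hypothetical caption snippets as (air date, caption text); not real Archive data.
CAPTIONS = [
    (date(2020, 3, 2), "Officials confirm new coronavirus cases as COVID testing expands"),
    (date(2020, 3, 15), "The pandemic closes schools; covid cases rise"),
    (date(2020, 6, 1), "Markets rally as reopening begins"),
    (date(2020, 6, 20), "COVID hospitalizations level off"),
]

# Assumed keyword list, chosen only for illustration.
KEYWORDS = {"covid", "coronavirus", "pandemic"}


def monthly_mentions(captions, keywords):
    """Count keyword mentions per (year, month) across caption text."""
    counts = Counter()
    for aired, text in captions:
        # Lowercase and split into simple alphanumeric tokens.
        words = re.findall(r"[a-z0-9]+", text.lower())
        counts[(aired.year, aired.month)] += sum(1 for w in words if w in keywords)
    return dict(sorted(counts.items()))


print(monthly_mentions(CAPTIONS, KEYWORDS))
```

Plotting the resulting monthly counts would give the kind of surge-and-decline curve the paragraph describes.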

The newly computed metadata can help provide context and assist with fact-checking efforts to combat misinformation. It can allow researchers to map the geography of television news, showing how certain parts of the world are covered more than others, Leetaru said. Through the collections, researchers have explored which presidential tweets challenging election integrity got the most exposure on the news. OCR of every frame has been used to create models of how to identify names of every “Dr.” depicted on cable TV after the outbreak of COVID-19 and calculate air time devoted to the medical doctors commenting on one of the virus variants. Reverse image lookup of images in TV news has been used to determine the source of photos and videos. Visual entity search tools can even reveal the increasing prevalence of bookshelves as backdrops during home interviews in the pandemic, as well as appearances of books by specific authors or titles. Open datasets of computed TV news metadata are available that include all visual entity and OCR detections, 10-minute interval captioning ngrams and second-by-second inventories of each broadcast cataloging whether it was “News” programming, “Advertising” programming or “Uncaptioned” (in the case of television news this is almost exclusively advertising).

From television news to digitized books and periodicals, dozens of projects rely on the collections available at archive.org for computational and bibliographic research across a large digital corpus. Data scientists or anyone with questions about the TV News Archive can contact info@archive.org.

Up Next

This webinar was the fourth in a series of six sessions highlighting how researchers in the humanities use the Internet Archive. The next will be about Analyzing Biodiversity Literature at Scale on April 27. Register here.

Pro-Airbnb advertising dominated recent political TV ads in San Francisco

Based on algorithmic analysis, pro-Airbnb advertising dominated political TV ads in San Francisco in the weeks leading up to Election Day. Two thirds of the minutes devoted to political ads on several initiatives and races before voters focused on arguments against a proposal to curb the company’s operations in the city, according to a review of the Internet Archive television archive. Voters ended up rejecting Proposition F, whose opponents claimed it would encourage neighbors to spy on each other and increase lawsuits, by a margin of 55 to 45 percent.

Minutes of TV Political Ads in San Francisco

The Archive identified a total of 1,959 minutes of ads (4,591 plays) opposing Proposition F, out of 2,895 minutes devoted to all political TV ads, or roughly two thirds of the air-time.
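The "roughly two thirds" figure can be checked directly from the numbers quoted above. The short sketch below does that arithmetic; the function and constant names are illustrative only.

```python
# Figures quoted in the paragraph above.
OPPOSING_MINUTES = 1959
OPPOSING_PLAYS = 4591
ALL_POLITICAL_MINUTES = 2895


def share_percent(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


def average_seconds_per_play(minutes, plays):
    """Return the mean length of one airing in seconds, rounded to one decimal place."""
    return round(minutes * 60 / plays, 1)


print(share_percent(OPPOSING_MINUTES, ALL_POLITICAL_MINUTES))      # 67.7
print(average_seconds_per_play(OPPOSING_MINUTES, OPPOSING_PLAYS))  # 25.6
```

The share comes out at about 67.7 percent, and the average airing at about 25.6 seconds, which would be consistent with a mix of mostly 30-second spots and some shorter ones.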

To put that in perspective, Mayor Ed Lee, who won his reelection easily, was the subject of only 55 minutes of ads. Though he appeared in and narrated hundreds of ads supporting Propositions A and D, the only ads that mentioned his mayoral race were airings of a support ad paid for not by his own campaign, but rather by an independent expenditure from Clint Reilly, a local real estate developer and former professional political consultant.

Samples of all ads found to be related to 2015 San Francisco elections can be viewed here, and metadata about those that occurred in archived television can be downloaded from this page.

The only political ad that aired on television in support of Proposition F was this one, which was observed for a total of 16 minutes between October 16th and 25th. The ad, which features a parody of the Eagles’ song “Hotel California,” was pulled from YouTube and the ShareBetterSF campaign website because of claims of copyright infringement. Dale Carlson, a spokesman for the campaign who contacted the Archive, wrote, “We believe the ad is parody and did not constitute a copyright violation. But it had already run its course and we weren’t going to spend money on legal bills to defend an ad that was already off the air.”

In all, the Archive identified 14 unique ads opposing Proposition F that aired on TV. In the final days of the campaign, the opponents devoted airtime to this ad that calls the proposal “too extreme,” quotes from the San Francisco Chronicle, and cites high-profile opponents such as Lt. Gov. Gavin Newsom and Mayor Lee. This 30-second ad aired 423 times on 10 channels in San Francisco (CNBC, CNN, FOXNEWS, KGO, KNTV, KOFY, KPIX, KRON, KTVU, MSNBC).

This review updates an earlier one issued last week focused exclusively on Airbnb ads, broadening the analysis to include all political TV ads aired from August 25th through November 3.  The Archive identified ads through a number of sources, including SFGov’s Summary of Third Party Expenditures Regarding San Francisco Candidates hosted by the City of San Francisco. An audio fingerprint was created for each ad and used to find matches in some 35,000 hours of archived local station programming and cable news network shows available in the San Francisco region.  The Internet Archive’s television news research library presents public opportunities to search, compare and contrast news programs in its archive.  Entertainment programming is only available for select algorithmic study within its server environment.
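To give a feel for how a fingerprint of an ad can be matched against hours of programming, here is a toy sketch. It is not the Archive's method: real audio fingerprinting works on spectral features of the sound, whereas this example assumes the audio has already been reduced to a sequence of small integers, and every name and value in it is hypothetical. It uses a common matching idea, in which short windows of the ad are looked up in an index of the broadcast and each hit votes for the start position it implies.

```python
from collections import Counter, defaultdict


def windows(seq, width):
    """Return every run of `width` consecutive values as a tuple."""
    return [tuple(seq[i:i + width]) for i in range(len(seq) - width + 1)]


def find_airings(ad, broadcast, width=3, min_fraction=0.8):
    """Return start offsets in `broadcast` where `ad` appears to air.

    Each ad window found in the broadcast votes for the start offset it
    implies; offsets backed by at least `min_fraction` of the ad's windows
    are reported as matches.
    """
    # Index the broadcast: window -> positions where it occurs.
    index = defaultdict(list)
    for pos, win in enumerate(windows(broadcast, width)):
        index[win].append(pos)

    ad_windows = windows(ad, width)
    votes = Counter()
    for offset, win in enumerate(ad_windows):
        for pos in index.get(win, []):
            votes[pos - offset] += 1

    needed = min_fraction * len(ad_windows)
    return sorted(start for start, n in votes.items() if n >= needed)


# Hypothetical feature sequences standing in for real audio.
AD = [5, 1, 9, 2, 7, 3]
BROADCAST = [0, 4, 5, 1, 9, 2, 7, 3, 8, 8, 6, 5, 1, 9, 2, 7, 3, 0]

print(find_airings(AD, BROADCAST))  # [2, 11]
```

Counting the matches and multiplying by the ad's length gives the plays and minutes of airtime reported in reviews like this one.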

The Internet Archive’s review of political TV ads relating to Proposition F is part of experimentation in preparation for our new Knight Foundation funded project to track political TV ads in key primary states. Stay tuned for news about our December launch.

Research by Trevor von Stein

Pro-Airbnb political TV ads air at rate of 100:1 as San Franciscans head to polls

For every one minute of political ads aired in favor of a contentious ballot initiative intended to further regulate Airbnb’s growing presence in the city where it is headquartered, more than 100 minutes of ads urging voters to vote “no” have aired on local San Francisco area TV stations, according to an assessment of the Internet Archive’s television archive.

Audio fingerprinting of YouTube-hosted advertising was used to identify the same ads in local station programming and cable news networks available in the region, from August 25th through October 26th.  Sample ads can be viewed here, and metadata about their occurrences can be downloaded from this page.

Proposition F, which is backed by a coalition of unions, land owners, housing advocates, and neighborhood groups, would restrict private rentals to 75 nights per year as well as enact rules that would ensure that hotel taxes are paid and city code followed. It would also allow private party lawsuits by neighbors against private renters suspected of violating the law.

The Internet Archive found just one TV ad favoring the initiative, which also appeared on the Proposition F campaign website. The Archive discovered 32 instances of this ad airing on local TV stations, for a total of 16 minutes of airplay. However, the ad, which features a parody of the song “Hotel California” by the Eagles (the lyrics were replaced with “Hotel San Francisco”), was recently removed from the official website because of a claim of copyright infringement.

In contrast, in our sample range, Airbnb supporters aired more than 26 hours of ads against the initiative. One example ad, which is below, claims that the initiative would “encourage neighbors to spy on each other,” and “create thousands of new lawsuits.” This ad played at least 358 times in recent weeks, for a total of 179 minutes of airtime.
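The airtime totals quoted above follow from multiplying the number of plays by the length of the ad. The sketch below assumes 30-second spots, which is what the quoted figures imply (32 plays for 16 minutes, 358 plays for 179 minutes); the names are illustrative only.

```python
AD_LENGTH_SECONDS = 30  # assumption implied by the quoted plays and minutes


def airtime_minutes(plays, ad_length_seconds=AD_LENGTH_SECONDS):
    """Return total airtime in minutes for a number of plays of one ad."""
    return plays * ad_length_seconds / 60


def ratio_to_one(opposing_minutes, supporting_minutes):
    """Return how many opposing minutes aired per supporting minute."""
    return opposing_minutes / supporting_minutes


print(airtime_minutes(32))        # 16.0 minutes for the one supporting ad
print(airtime_minutes(358))       # 179.0 minutes for the example opposing ad
print(ratio_to_one(26 * 60, 16))  # 97.5 at exactly 26 hours of opposing ads
```

Exactly 26 hours against 16 minutes works out to 97.5 to 1, so the headline's figure of more than 100 to 1 implies the full opposing total was at least 1,600 minutes (26 hours 40 minutes), consistent with "more than 26 hours."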

Over all, according to reports filed with the San Francisco Ethics Commission, opponents of Proposition F have reported spending $6.5 million compared to $256,000 from organizations supporting the initiative.

Of course, the ad campaigns are not limited to television. Airbnb apologized last week after it caught flak for a series of controversial bus station and billboard ads that critics called “passive aggressive” and “whiny” for complaining about how public institutions, such as libraries, spent their tax revenue-derived budgets.

But TV remains a key way that political operators try to influence voters. As Nate Ballard, a Democratic strategist recently said on a local newscast: “That’s how you win campaigns in California, on TV.”

The Internet Archive’s review of political TV ads relating to Proposition F is part of experimentation in preparation for our new Knight Foundation funded project to track political TV ads in key primary states. Stay tuned for news about our December launch.

Research by Trevor von Stein

Get your Dem debate visualizations here

Hot off the internet presses, here is media analyst Kalev Leetaru’s visualization tool, fueled by Internet Archive data, which enables users to trace particular phrases used in broadcast news coverage in the first 24 hours after would-be presidential nominees appeared in the first Democratic debate of the 2016 election.

Scroll down and what sticks out immediately are the two subjects that captured most of the news broadcasters’ attention: Bernie Sanders’ “damn emails” quote and guns.

When the subject came up of the controversy over Clinton’s decision to do public work from a private email server, rather than attack Clinton, Sanders defended her:

“Let me say — let me say something that may not be great politics. But I think the secretary is right, and that is that the American people are sick and tired of hearing about your damn e-mails.”

According to Internet Archive data, that sound bite aired 496 times across stations.

The other issue that grabbed attention was gun violence: Sanders, who hails from gun-friendly rural Vermont, was taken to task for his vote to make it tougher to hold gun manufacturers liable when the guns they make are used in a crime. Answering a question from CNN moderator Anderson Cooper on whether Sanders is tough enough on guns, Clinton said:

“No, not at all. I think that we have to look at the fact that we lose 90 people a day from gun violence. This has gone on too long and it’s time the entire country stood up against the NRA. The majority of our country…(APPLAUSE)… supports background checks, and even the majority of gun owners do.”

This clip aired 260 times across stations.

However, these are just the top take-aways from this massive data crunching tool. It provides a search mechanism for the user to do deeper dives into the data and discover trends across and within certain types of news broadcasts.

Leetaru’s own analysis is here, on the Washington Post’s Monkey Cage. Among his observations:

There was also variation in how much attention each network paid to each candidate (you can see for yourself using the interactive visualization). Telemundo favored Sanders with 41 percent, followed by O’Malley with 24 percent and Clinton at just 21 percent, though admittedly, they broadcast a relatively small number of excerpts. FOX Business also favored Sanders 50 percent to Clinton’s 38 percent, as did CSPAN with Sanders at 52 percent to Clinton’s 44 percent. All other networks favored Clinton, though sometimes by a relatively close margin — like CNBC (50 percent Clinton to 43 percent Sanders) or PBS affiliates (41 percent Clinton to 38 percent Sanders).

This tool is also part of the Internet Archive’s testing of technology that we’ll use in our new Knight Foundation funded project to track political TV ads in key primary states, which will launch in early December.

Dig in and have fun.

As Democratic candidates debate, Internet Archive will be gathering data

When Hillary Clinton and Bernie Sanders take the podium tonight along with other contenders for the Democratic presidential nomination in 2016, their debate will be televised. The Television Archive will be tracking the news coverage surrounding the debate, viewable and searchable, here.

And this tool, developed by political scientist Kalev Leetaru and fueled by Internet Archive data, allows users to see how many times a particular candidate’s name is mentioned in news coverage. Going into the debate, Hillary Clinton is getting more than twice as many mentions as Sen. Bernie Sanders.

We take for granted that candidates will debate on screen, but it wasn’t always so. The faceoff between Republican Vice President Richard Nixon and Democrat U.S. Senator Jack Kennedy in 1960, 55 years ago last month, marked the first time that Americans were able to watch candidates for the nation’s highest office from the comfort of their living rooms. You can see part one of the debate here, preserved on the Archive’s servers:

The received wisdom about this famous debate was that, from this point on, candidates had to think not just about what they said on the campaign stump, but how they looked. This could make a huge difference in how the public and the media perceived who “won” the debate. Nixon looked tired and like he needed a shave. Kennedy looked healthy and vibrant. Those who listened on the radio thought Nixon won.

“It’s one of those unusual points in the timeline of history where you say things changed very dramatically–in this case, in a single night,” Alan Schroeder, a media historian and associate professor at Northeastern University, told Time Magazine in 2010.

Here’s part II of the Kennedy-Nixon 1960 debate:

We don’t know yet who the perceived winner of tonight’s debate will be. The Internet Archive’s data will provide one way to evaluate this. Stay tuned.

Who’s Really Winning the Media Wars in the 2016 Campaign?

When it comes to media coverage, it seems as if Donald Trump is “trumping” all his rivals, Republicans and Democrats alike.  But is that true?  And how does it vary by print, digital and television media?  Using the Internet Archive’s Television Archive and the GDELT Project, researcher Kalev Leetaru is able to analyze daily data to see who is winning the media wars of 2016.  Today we are excited to announce three new visualizations that explore American politics through the lens of television: a live campaign tracker hosted by The Atlantic that offers a running tally of all mentions of the 2016 presidential candidates across national television monitored by the Archive, and two visualizations that show which statements from the first Republican debate went viral on television.  Finally, an analysis published in The Guardian shows just how unique television coverage of the campaign is and how much it differs from print and online coverage.  Candidates live and die by their ability to capture media attention.  Now, thanks to Leetaru, citizens have the tools to examine the election media data daily.

A Live 2016 Campaign Tracker

Media coverage of the 2016 presidential candidates has been dominating the news cycle for the last few months, with article after article asking which candidate is dominating the headlines at the moment.   Working with The Atlantic, we created the visualization above that tallies how many times each candidate has been mentioned on domestic national television networks thus far in 2015.  The list updates each morning, providing an incredibly unique peek into who is pulling ahead at the moment.  For those interested in drilling further into the data, an interactive explorer dashboard allows you to drill down by candidate and network.

Who Won the First Republican Debate?

This past July we used audio fingerprinting technology from the Laboratory for the Recognition and Organization of Speech and Audio at Columbia University to scan the audio of all monitored television shows for two weeks after the President’s January 2015 State of the Union address and identified every time an excerpted clip of his speech was broadcast on another television show.  In this way we were able to create an interactive timeline of which portions of his speech went “viral”.

We’ve repeated that process for the first Republican debate, both the “prime” and “undercard” events, exploring which soundbites made the rounds across television news shows in the week following the debate.

For the undercard debate, Carly Fiorina was the clear winner, accounting for 45% of the soundbites from the debate that subsequently aired elsewhere in the following week, followed by Rick Perry at 15.7%.  Both of the most-excerpted responses from the undercard debate belonged to her, with her quote “Hillary Clinton lies about Benghazi, she lies about emails. She is still defending Planned Parenthood, and she is still her party’s frontrunner” appearing 53 times and her quote “Did any of you get a phone call from Bill Clinton? I didn’t. Maybe it’s because I hadn’t given money to the foundation or donated to his wife’s Senate campaign.” appearing 47 times.

For the prime debate, Trump was the overall winner, with 30.7% of the subsequently aired soundbites being his, followed by Rand Paul at 14.1% and Chris Christie at 13.7%.  The two most-excerpted statements of the debate were both by Trump, one regarding his refusal to pledge not to run as an Independent, which aired 199 times, and the second about his past misogynistic Twitter comments, which aired 337 times.  Rand Paul and Chris Christie’s exchange about the Fourth Amendment and government surveillance aired 190 times, culminating in Rand Paul’s now-famous “I know you gave [President Obama] a big hug, and if you want to give him a big hug again, go right ahead.”  Ben Carson’s closing remarks about his work as a surgeon were the most-repeated of any of the candidates, with 86 rebroadcasts over the following week.

How Much Coverage is Trump Really Getting?

Finally, with all of the hyperbole swirling about Trump’s utter domination of media coverage of the Republican race, a key question is just how much his lead differs across media modalities.  Is online news coverage of the 2016 campaign cycle identical to print coverage, and is print coverage identical to television coverage?  In a piece for The Guardian’s Data Blog, I explored election coverage across these different forms of media and found that Trump’s lead is entirely dependent on where you look, emphasizing just how important it is to be able to analyze television coverage directly.

As the 2016 political season begins to shift into high gear, stay tuned for much more to come as we explore television and politics!

A Dream to Preserve TV News, on the Road to Realization… with Your Help

On The Media’s TLDR
CNN
Huffington Post
Philly.com
Fast Company

We are about to receive a remarkable private collection of videotaped U.S. television news that spans 35 years.  We welcome contributions of TV news recorded before the year 2001 to help broaden our research library.

Marion Marguerite Stokes, a librarian, social justice advocate and TV interview program host, believed that it was vital to preserve television news.

Mrs. Stokes started recording news at home in 1977 and never stopped. Before her death in December 2012, she recorded 140,000 video cassettes. Her family searched for a home for her unique collection and found us in June.

It is a unique collection of local news from Boston (1977-1986) and Philadelphia (1986-2012), as well as all the national news. The Boston era is particularly notable for the busing/desegregation strife that raged throughout.

Marion Stokes’ amazing commitment to preserve television news, a passion that few at the time entirely understood, shaped the daily lives of her children growing up and, later, visits of her grandchildren. Her dream of using this collection for the public good can now be fulfilled.

In just a few days, four large shipping containers on trucks will be winding their way across the country to our Richmond, California physical archive. The digitization of such a huge collection will take a number of years and funding we have yet to raise.

Join us in helping to realize Marion Stokes’ gift to the future and make it available to all, forever, for free.  Please consider making a contribution, right now!

UCLA Brings Light to the Undiscovered Country of Television News

The UCLA Library recently launched a remarkable broadcast news  research and education platform, Broadcast NewsScape.   The service is accessible online to users on the UCLA campus.  Platform managers hope to expand access throughout the UC system later this year.  NewsScape captures closed captioning, in a manner similar to our TV News Search & Borrow, to facilitate deep search and discovery of relevant segments of over 200,000 U.S. and international news program episodes.

We are excited that the UCLA Library has joined Vanderbilt University and the Internet Archive in offering tailored research and public interest access to television news.  These successful demonstrations of responsibly providing public benefit access to television news are helping to enrich conversations regarding mutual benefits among media and library stakeholders.

UCLA has a storied history in archiving television news, starting with the 1974 Senate Watergate hearings.  Between 1979 and 2003, UCLA recorded off-air more than 100,000 news programs, preserving and making them accessible in UCLA’s Film & Television Archive’s News and Public Affairs Collection.  In 2005, Communication Studies department professors Francis F. Steen and Tim Groeling brought UCLA’s television news archiving into the digital age, recording direct to disks and, most transformationally, preserving available closed captioning.  Their collection has enabled researchers to experiment with new digital processes for analyzing attributes of broadcast news.

Last year, the UCLA Library started making provisions to take the digital news archive under its wing, devoting considerable server resources and relieving Francis and Tim from their 8-year labor of love maintaining their modest, sometimes cantankerous, hardware and ever-growing data stores.

Thanks to the leadership of associate university librarians Todd Grappone and Sharon Farb, the UCLA Library’s newly launched Broadcast NewsScape tool is welcoming scholars, educators and students from throughout the university to delve deeply and derive new insights from the undiscovered country that is television news.

UCLA’s announcement: http://newsroom.ucla.edu/portal/ucla/ucla-library-launches-transformative-243873.aspx