Tag Archives: FiveThirtyEight

Expanding the Television Archive

When we started archiving television in 2000, people shrugged and asked, “Why? Isn’t it all junk anyway?” As the saying goes, one person’s junk is another person’s gold. From 2010 to 2018, scholars, pundits and, above all, reporters have spun journalistic gold from the data captured in our 1.5 million hours of television news recordings. Our work has been fueled by visionary funders(1) who saw the potential impact of turning television – from news reports to political ads – into data that can be analyzed at scale. Now the Internet Archive is taking its Television Archive in new directions. In 2018 our goals for television will be: better curation of what we collect; broader collection across the globe; and working with computer scientists interested in exploring our huge data sets. Simply put, our mission is to build and preserve comprehensive collections of the world’s most important television programming and make them as accessible as possible to researchers and the general public. We will need your help.

“Preserving TV news is critical, and at the Internet Archive we’ve decided to rededicate ourselves to growing our collection,” explained Roger MacDonald, Director of Television at the Internet Archive. “We plan to go wide, expanding our archives of global TV news from every continent. We also plan to go deep, gathering content from local markets around the country. And we plan to do so in a sustainable way that ensures that this TV will be available to generations to come.”

Libraries, museums and memory institutions have long played a critical role in preserving the cultural output of our creators. Television falls within that mandate. Indeed some of the most comprehensive US television collections are held by the Library of Congress, Vanderbilt University and UCLA. Now we’d like to engage with a broad range of libraries and memory institutions in the television collecting and curation process. If your organization has a mandate to collect television or researcher demand for this media, we would like to understand your needs and interests. The Internet Archive will undertake collection trials with interested institutions, with the eventual goal of making this work self-sustaining.

Simultaneously, we are looking to engage researchers interested in the non-consumptive analysis of television at scale, in ways that continue to respect the interests of rights holders. The tools we’ve created may be useful. For instance, we hope the tools the Internet Archive used to detect TV campaign ads can be applied by researchers in new and different ways. If your organization is interested in computing with television as data at scale, we would like to work with you.

This groundbreaking interface for searching television news, based on the closed captions associated with US broadcasts, was developed between 2009 and 2012.

A brief history of the Internet Archive’s Television collection:

2000 Working with pioneering engineer Rod Hewitt, IA begins archiving 20 channels originating from many nations.

Oct. 2001 September 11, 2001 Collection established, and enhanced in 2011.

2009-2012 With funding from the Knight Foundation and many others, we built a service to allow public searching, citation and borrowing of US television news programs on DVD.

2012-2014 Public TV news library launched with tools to search, quote and share streamed snippets from television news.

2014 Pilot launched to detect political advertisements broadcast in the Philadelphia region, leading to the development of open-source audio fingerprinting techniques.

2016 Political ad detection, curation, and access expanded to 28 battleground regions for 2016 elections, enabling journalists to fact check the ads and analyze the data at scale. The same tools helped reporters analyze presidential debates.  This resulted in front-page data visualizations in The New York Times, as well as 150+ analyses by news outlets from Fox News to The Economist to FiveThirtyEight.

2017-present Experiments with artificial intelligence techniques, employing facial identification and on-screen optical character recognition to aid searching and data mining of television. Special curated collections of top political leaders and fact-check integrations.

In the run-up to the 2016 presidential elections, journalists at the NYT and elsewhere began analyzing television as data, in this case looking at the different sound bites each network chose to replay.

Embarking on a new direction also means shifting away from some of our current services. Our dedicated television team has been focusing on metadata enhancement and assisting journalists and scholars to use our data. We will be wrapping up some of these free services in the next three to four months.  We hope others will take up where we left off and build the tools that will make our collection even more valuable to the public.

Now more than ever in this era of disinformation, our world needs an open, reliable, canonical reference source of television news. This cannot exist without the diligent efforts of technologists, journalists, researchers, and television companies all working together to create a television archive open for all. We hope you will join us!

To learn more about the work of the TV News Archive outreach and metadata innovation team over the last few years, please see our blog posts.

(1) Funding for the Television Archive has come from diverse donors, including the John S. and James L. Knight Foundation, Democracy Fund, Rita Allen Foundation, craigslist Charitable Fund and The Buck Foundation.

TV News Record: The year in TV news visualizations

Thanks for being part of our community at the TV News Archive. As 2017 draws to a close, we’ve chosen six of our favorite visualizations using TV News Archive data. We look forward to assisting many more journalists and researchers in what will likely be an even more tumultuous news year. 

The New York Times: Mueller indictments

The New York Times editorial page used our Third Eye chyron collection to produce an analysis of TV news coverage of major indictments of Trump campaign officials by special counsel Robert Mueller: “The way each network covered the story – or avoided it – is a sign of how the media landscape has become ever more politicized in the Trump era.”

credit: Taylor Adams, Jessia Ma, and Stuart A. Thompson, The New York Times, “Trump Loves Fox & Friends,” November 1, 2017.

FiveThirtyEight: hurricane coverage

Writing for FiveThirtyEight.com, Dhrumil Mehta demonstrated that TV news broadcasters paid less attention to Puerto Rico’s Hurricane Maria than to Hurricanes Harvey and Irma, which hit the mainland U.S., primarily in Texas and Florida. Mehta used TV News Archive data via Television Explorer.

credit: Dhrumil Mehta, “The Media Really Has Neglected Puerto Rico,” FiveThirtyEight, September 28, 2017.

TV News Archive: face-time for lawmakers

Using our Face-o-Matic data set, we found that Senate Majority Leader Mitch McConnell, R., Ky., gets the most face-time on cable TV news, and MSNBC features his visage more than the other networks examined. Fox News features the face of House Minority Leader Nancy Pelosi, D., Calif., more than any other cable network.

Vox:  Mueller’s credibility

Vox’s Alvin Chang used Television Explorer to explore how Fox News reports on Mueller’s credibility. This included showing how often Fox News mentioned Mueller in the context of former presidential candidate Hillary Clinton.

Alvin Chang, “A week of Fox News transcripts shows how they began questioning Mueller’s credibility,” Vox, October 31, 2017.

The Trace: coverage of shootings

Writing for The Trace, Jennifer Mascia presented findings from Television Explorer showing how coverage of shootings declines rapidly: “Two days after 26 people were massacred in a Texas church, the incident — one of the worst mass shootings in American history — had nearly vanished from the major cable news networks.”

Credit: Jennifer Mascia, “Data Shows Shrinking Cable News Cycles for This Fall’s Mass Shootings,” The Trace, December 5, 2017.

The Washington Post: What TV news networks covered in 2017

Philip Bump of The Washington Post crunched Television Explorer data to look at coverage of eleven major news stories by five national news networks. Here’s his visualization of TV news coverage of “sexual assault,” which shows how coverage increased at the end of the year as dozens of prominent men in media, politics, and entertainment were accused of sexual harassment or assault.

Philip Bump, “What national news networks were talking about during 2017,” The Washington Post, December 15, 2017.

Follow us @tvnewsarchive, and subscribe to our biweekly newsletter here.

History is happening, and we’re not just watching

  1. Which recent hurricane got the least amount of attention from TV news broadcasters?
    1. Irma
    2. Maria
    3. Harvey
  2. Thomas Jefferson said, “Government that governs least governs best.”
    1. True
    2. False
  3. Mitch McConnell shows up most on which cable TV news channel?
    1. CNN
    2. Fox News
    3. MSNBC

Answers at end of post.

The Internet Archive’s TV News Archive, our constantly growing online, free library of TV news broadcasts, contains 1.4 million shows, some dating back to 2009, searchable by closed captioning. History is happening, and we preserve how broadcast news filters it to us, the audience, whether it’s through CNN’s Jake Tapper, Fox’s Bill O’Reilly, MSNBC’s Rachel Maddow or others. This archive becomes a rich resource for journalists, academics, and the general public to explore the biases embedded in news coverage and to hold public officials accountable.

Last October we wrote how the Internet Archive’s TV News Archive was “hacking the election,” then 13 days away. In the year since, we’ve been applying our experience using machine learning to track political ads and TV news coverage in the 2016 elections to experiment with new collaborations and tools to create more ways to analyze the news.

Helping fact-checkers

Since we launched our Trump Archive in January 2017, and followed in August with the four congressional leaders, Democrat and Republican, as well as key executive branch figures, we’ve collected some 4,534 hours of curated programming and more than 1,300 fact-checks of material on subjects ranging from immigration to the environment to elections.

 

The 1,340 fact-checks–and counting–represent a subset of the work of partners FactCheck.org, PolitiFact and The Washington Post’s Fact Checker, as we link only to fact-checks that correspond to statements that appear on TV news. The largest share of the fact-checks–524–come from PolitiFact; 492 are by FactCheck.org, and 324 from The Washington Post’s Fact Checker.

We’re also proud to be part of the Duke Reporter’s Lab’s new Tech & Check collaborative, where we’re working with journalists and computer scientists to develop ways to automate parts of the fact-checking process.  For example, we’re creating processes to help identify important factual claims within TV news broadcasts to help guide fact-checkers where to concentrate their efforts. The initiative received $1.2 million from the John S. and James L. Knight Foundation, the Facebook Journalism Project and the Craig Newmark Foundation.

See the Trump, US Congress, and executive branch archives and collected fact-checks.

TV News Kitchen

We’re collaborating with data scientists, private companies and nonprofit organizations, journalists, and others to cook up new experiments available in our TV News Kitchen, providing new ways to analyze TV news content and understand ourselves.

Dan Schultz, our senior creative technologist, worked with the start-up Matroid to develop Face-o-Matic, which tracks faces of selected high level elected officials on major TV cable news channels: CNN, Fox News, MSNBC, and BBC News. The underlying data are available for download here. Unlike caption-based searches, Face-o-Matic uses facial recognition algorithms to recognize individuals on TV news screens. It is sensitive enough to catch this tiny, dark image of House Minority Leader Nancy Pelosi, D., Calif., within a graphic, and this quick flash of Senate Minority Leader Chuck Schumer, D., N.Y., and Senate Majority Leader Mitch McConnell, R., Ky.

The work of TV Architect Tracey Jaquith, our Third Eye project scans the lower thirds of TV screens, using OCR, or optical character recognition, to turn these fleeting missives into downloadable data ripe for analysis. Launched in September 2017, Third Eye tracks BBC News, CNN, Fox News, and MSNBC, and collected more than four million chyrons in just over two weeks, and counting.

Download Third Eye data. API and TSV options available.

Follow Third Eye on Twitter.

Vox news reporter Alvin Chang used the Third Eye chyron data to report how Fox News paid less attention to Hurricane Maria’s destruction in Puerto Rico than it did to Hurricanes Irma and Harvey, which battered Florida and Texas. Chang’s work followed a similar piece by Dhrumil Mehta for FiveThirtyEight, which used Television Explorer, a tool developed by data scientist Kalev Leetaru to search and visualize closed captioning on the TV News Archive.

 

FiveThirtyEight used TV News Archive captions to create this look at how cable networks covered recent hurricanes.

CNN’s Brian Stelter followed up with a similar analysis on “Reliable Sources” October 1.

We’re also working with academics who are using our tools to unlock new insights. For example, Schultz and Jaquith are working with Bryce Dietrich at the University of Iowa to apply the Duplitron, the audio fingerprinting tool that fueled our political ad airing data, to analyze floor speeches of members of Congress. The study identifies which floor speeches were aired on cable news programs and explores the reasons why those particular clips were selected for airing. A draft of the paper was presented at the 2017 Polinfomatics Workshop in Seattle and will begin review for publication in the coming months.

What’s next? Our plans include making more than a million hours of TV news available to researchers from both private and public institutions via a digital public library branch of the Internet Archive’s TV News Archive. These branches would be housed in computing environments, where networked computers provide the processing power needed to analyze large amounts of data. Researchers will be able to conduct their own experiments using machine learning to extract metadata from TV news. Such metadata could include, for example, speaker identification–a way to identify not just when a speaker appears on a screen, but when she or he is talking. Metadata generated through these experiments would then be used to enrich the TV News Archive, so that any member of the public could do increasingly sophisticated searches.

Going global

We live in an interdependent world, but we often lack understanding about how other cultures perceive us. Collecting global TV could open a new window for journalists and researchers seeking to understand how political and policy messages are reported and spread across the globe. The same tools we’ve developed to track political ads, faces, chyrons, and captions can help us put news coverage from around the globe into perspective.

We’re beginning work to expand our TV collection to include more channels from around the globe. We’ve added the BBC and recently began collecting Deutsche Welle from Germany and the English-language Al Jazeera. We’re talking to potential partners and developing strategy about where it’s important to collect TV and how we can do so efficiently.

History is happening, but we’re not just watching. We’re collecting, making it accessible, and working with others to find new ways to understand it. Stay tuned. Email us at tvnews@archive.org. Follow us @tvnewsarchive, and subscribe to our weekly newsletter here.

Answer Key

  1. b. Maria. (See: “The Media Really Has Neglected Puerto Rico,” FiveThirtyEight.)
  2. b. False. (See: Vice President Mike Pence statement and linked PolitiFact fact-check.)
  3. c. MSNBC. (See: Face-o-Matic blog post.)

Members of the TV News Archive team: Roger Macdonald, director; Robin Chin, Katie Dahl, Tracey Jaquith, Dan Schultz, and Nancy Watzman.

New Research Tool for Visualizing Two Million Hours of Television News

Guest post by Kalev Leetaru

Today the Internet Archive announces a new interactive timeline visualization–the Television Explorer–that lets you trace how any keyword–think “emails”, “tax returns”, “alt-right”–has been covered on U.S. television news over the past half-decade.

See the Television Explorer, a new tool for exploring TV News.


Over the past year and a half, the GDELT Project and the Internet Archive’s Television News Archive have worked closely together to visualize how U.S. television news has covered the contentious 2016 political campaign.

One of the tools we created was the 2016 Candidate Television Tracker, which used closed captioning to count how many times each of the presidential candidates was mentioned on television and offered a day-by-day timeline showing the ebbs and flows of who was “winning” the free media wars. (Answer: President-elect Donald Trump.) This tool was used by such media outlets as The Atlantic, The Washington Post, FiveThirtyEight, Politico and The Guardian, among many others.

Now we are adapting this tool to allow more sophisticated searches: rather than just the presidential candidates, now you can trace television news coverage of any keyword of your choosing. You can even run advanced searches that find words in conjunction with other words or phrases, such as finding mentions of Hillary Clinton that also discuss her email server. All search results are available for download via CSV and JSON export, making it possible for data journalists, researchers, and advocates to fine-tune their analysis of the data.

When searching, you get back a visual timeline showing how often that word or phrase has appeared on American television news over the past half-decade. Nearly two million hours of television news totaling more than 5.7 billion words from over 150 distinct stations spanning July 2009 to present (though not all stations were monitored for the entire period) are searchable in this interface.

Unlike the Internet Archive’s Television News Archive interface, which returns results at the level of an hour or half-hour “show,” the interface here reaches inside of those six and a half years of programming and breaks the more than one million shows into individual sentences and counts how many of those sentences contain your keyword of interest. Instead of reporting that CNN had 24 hour-long shows yesterday that mentioned Donald Trump one or more times, the interface here will count how many sentences uttered on CNN yesterday mentioned his name–a vastly more accurate metric for assessing media attention.
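The difference between show-level and sentence-level counting can be illustrated with a minimal Python sketch. This is not the Television Explorer's actual code: the function names, the naive punctuation-based sentence splitter, and the sample captions are all assumptions made up for illustration.

```python
import re

def split_sentences(caption_text):
    """Naively split caption text into sentences at ., ! or ? followed by whitespace.
    (Assumption: real closed captions would need more careful segmentation.)"""
    parts = re.split(r"(?<=[.!?])\s+", caption_text.strip())
    return [p for p in parts if p]

def count_shows(shows, keyword):
    """Show-level metric: how many shows mention the keyword at least once."""
    kw = keyword.lower()
    return sum(1 for text in shows if kw in text.lower())

def count_sentences(shows, keyword):
    """Sentence-level metric: how many individual sentences mention the keyword."""
    kw = keyword.lower()
    return sum(
        1
        for text in shows
        for sentence in split_sentences(text)
        if kw in sentence.lower()
    )

# Hypothetical caption text for three shows (invented for this example).
shows = [
    "Donald Trump spoke today. Donald Trump also tweeted. The weather is mild.",
    "Markets rose. Analysts mentioned Donald Trump once.",
    "Sports scores are in. No politics here.",
]

print(count_shows(shows, "Donald Trump"))      # shows containing the keyword
print(count_sentences(shows, "Donald Trump"))  # sentences containing the keyword
```

In this toy data, two shows mention the keyword but three sentences do, so the sentence-level count separates a show that dwells on a topic from one that mentions it in passing.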

Explore how CNN covered the presidential campaign of 2012 versus 2016 and understand just how big of a media event this year’s election really was. See precisely when Edward Snowden burst onto the scene and how Wikileaks got more coverage during the 2016 presidential election than during its debut in 2010. Watch the seasonal spikes of Thanksgiving, or see how Ebola received little attention, even as thousands died in Africa, becoming a topic only after the first Americans became infected.

Using the “near” search feature, plot coverage of Wikileaks that also mentioned either “Podesta,” “email,” or “emails” nearby and discover that FOX paid far more attention to the DNC and Podesta email hacks than CNN, MSNBC, CNBC or Bloomberg. In contrast, CNN focused more intensely on the Trayvon Martin shooting (Aljazeera America and Bloomberg were not yet being monitored by the Archive), while Aljazeera led coverage of the Michael Brown and Eric Garner deaths.


Search of term “Wikileaks” near Podesta, emails, Clinton
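One way to think about a “near” search is as a word-window test: a keyword counts only if one of the companion terms appears within a few words of it. The Python sketch below is an illustration under that assumption. The post does not say how the tool defines “nearby,” so the `window` parameter, the tokenizer, and the function names are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and pull out simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def mentions_near(text, keyword, nearby_terms, window=10):
    """Return True if `keyword` occurs within `window` words of any term in
    `nearby_terms`. The window size is an assumed, illustrative default."""
    tokens = tokenize(text)
    nearby = {term.lower() for term in nearby_terms}
    kw = keyword.lower()
    for i, token in enumerate(tokens):
        if token != kw:
            continue
        start = max(0, i - window)
        # Words on either side of the keyword, excluding the keyword itself.
        neighbours = tokens[start:i] + tokens[i + 1:i + window + 1]
        if any(word in nearby for word in neighbours):
            return True
    return False

terms = ["Podesta", "email", "emails"]
print(mentions_near("Wikileaks released more Podesta emails today.", "Wikileaks", terms))
print(mentions_near("Wikileaks was founded in 2006.", "Wikileaks", terms))
```

Applying a test like this to each sentence before counting would give a timeline that includes only the Wikileaks coverage that also touches on the email story.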

Search for “ivory” to see that Aljazeera America (which ceased operation in April 2016) devoted vastly more of its coverage to elephant poaching in Africa than any other monitored national network. It also paid the most attention to “Africa” and to the “refugee” crisis. On the other hand, Bloomberg has devoted much more of its time to “China” and to the economic crisis in “Greece” last year.

We look forward to seeing what people do with this new tool. Please share your favorite searches on Twitter with the hashtag “#internetarchivetvsearch”. If you have any questions, please email kalev.leetaru5@gmail.com or nancyw@archive.org.

Kalev Leetaru is an independent data journalist. 

Internet Archive Canada and National Security Letter in the news: roundup

The Internet Archive garnered major media attention over the past week: first, for our plan to create a Canadian copy, and second, for the news that we received a National Security Letter (NSL) requesting personal information about a user, the second in our history.

Canadian copy

Brewster Kahle’s post explaining why, in light of the new administration, the Internet Archive is raising money to build a copy of its collections in Canada hit a nerve.  More details were in a FAQ.

On November 29, Rachel Maddow led her MSNBC show with a segment about how the Internet Archive’s Wayback Machine helps reporters by preserving a record of what politicians say online, even when they later delete it.

One of her main examples: how, soon after he won the election, President-elect Donald Trump’s official federal transition web page included a “rundown ….of all of the ‘world’s top properties that Donald Trump owns.’”

The website has since been deleted, Maddow noted.

Maddow also called the Internet Archive a “national treasure…an international treasure.” (We’re blushing.)

Meanwhile, Paul Sawers noted in Venture Beat:

 Given that lies and fake news played a crucial part in the 2016 U.S. presidential election narrative, it is somewhat notable that the Internet Archive had launched the Political TV Ad Archive back in January to help journalists fact-check claims made during political campaigning.

In The Washington Times, Andrew Blake wrote about the Internet Archive’s plans to create a Canadian copy and also reported:

Mr. Trump’s office did not immediately respond to a request for comment Wednesday. Prior to being elected president, however, the Republican businessman suggested taking action to prevent Americans from becoming radicalized online by the Islamic State terror group’s social media recruitment efforts.

Here’s a link to Trump’s speech referenced by The Washington Times.

Sam Thielman reported in The Guardian on challenges facing libraries generally, including the Internet Archive’s decision to create a Canadian copy of data. The piece also discusses how the New York Public Library has changed its privacy policies to assure readers that it will not keep user data longer than expected.

Other media outlets reporting on the Internet Archive’s news include NBC News, the BBC, the New Republic, Recode Daily, and Newsweek.

Increasing transparency on National Security Letters

Last week the Internet Archive also revealed we received a National Security Letter (NSL), requesting we turn over personal information about a particular user, the second in our history. We worked with the Electronic Frontier Foundation (EFF) to challenge the letter and gain the right to release it in redacted form; in the process, we also highlighted an error in the NSL about the right to appeal, which may have affected thousands of other letters.

Kim Zetter, a reporter for The Intercept, reported at length about how the Internet Archive took the unusual step of challenging the NSL–and won:

Now, Kahle and the archive are notching another victory, one that underlines the progress their original fight helped set in motion. The archive, a nonprofit online library, has disclosed that it received another NSL in August, its first since the one it received and fought in 2007. Once again it pushed back, but this time events unfolded differently: The archive was able to challenge the NSL and gag order directly in a letter to the FBI, rather than through a secretive lawsuit. In November, the bureau again backed down and, without a protracted battle, has now allowed the archive to publish the NSL in redacted form.

Dhrumil Mehta of FiveThirtyEight.com reported on the error exposed by the Internet Archive and the EFF–namely, that the NSL incorrectly described the means for appealing the gag order that prevents an organization that has received such a letter from publicizing it. Mehta has filed a Freedom of Information Act (FOIA) request to find out how many letters sent out by the Federal Bureau of Investigation (FBI) contain this error:

This letter was particularly troublesome to privacy advocates because it contained misinformation about the rights of a letter recipient to challenge the nondisclosure requirement. The letter stated that the Internet Archive could “make an annual challenge to the nondisclosure requirement.” The Electronic Frontier Foundation, an advocacy organization that is legally representing the Internet Archive, pointed out in a press release that the passage of the USA Freedom Act in June of 2015 changed the law to allow letter recipients to challenge the National Security Letter at any time, not just once annually. In response to the EFF’s claim, the FBI withdrew its National Security Letter, allowed the Internet Archive to publish a redacted version of the letter containing the error and promised to correct the mistake by informing everyone else who got the same erroneous language.

It’s not just us

Tim Johnson of McClatchyDC drew all the themes together, linking the Internet Archive’s Canada announcement, the news on the NSL, and actions other library organizations are taking, all in one piece.

It turns out the nonprofit Internet Archive isn’t alone in taking action.

The New York Public Library announced a change this week to its privacy policy, informing users that it would retain less information about their activities.

The American Library Association, headquartered in Chicago, embraced that move and encourages others, including telling public libraries to encrypt all communications and lock up stored data to protect it from a prying government.