Tag Archives: The New York Times

TV News Record: Recognizing Trump’s voice on TV, NYT & Axios coverage, + Ryan fact-check

A round up on what’s happening at the TV News Archive by Katie Dahl and Nancy Watzman.

This week we explore cutting-edge work by Joostware that moves us closer to solving the challenge of searching vast archives of video by speaker, note the use of TV News Archive data by The New York Times and Axios, and share a fact-checked interview with exiting House Speaker Paul Ryan about his legacy.

Joostware trained model to recognize Trump’s voice

What if you wanted to search the TV News Archive to find every instance where President Donald Trump is talking?

That’s the research question that the San Francisco-based firm Joostware concentrated on for its Who Said What project, which won a $50,000 prototype grant from the John S. and James L. Knight Foundation. Last week Joostware’s founder, Delip Rao, presented the project’s progress at a gathering in Austin, Texas. (The Internet Archive’s own Dan Schultz, in his Bad Idea Factory incarnation, also presented on Contextubot, which we recently profiled here.)

“Audio and video today is viewed as an opaque object and it’s meant for linear consumption,” Rao said in his presentation. “But truly any audio and video especially in the context of news has a lot of structure to it. There are speakers of interest, and these speakers take turns, and then within each turn something was communicated. So our goal is to identify these speakers who are of interest and also the content that was spoken in that turn and indexing that.”

Anyone can search the TV News Archive already via closed captions at the Internet Archive or via Television Explorer. Our experiments with facial detection and chyron extraction are another way to find and analyze news clips. But searching a video archive by “speaker id” – finding all the video where a person is actually talking – is a tough technical challenge. Our Trump Archive and congressional, executive branch, and administration archives are all manually curated video collections designed to demonstrate what it would be like to have automated speaker id search.

Joostware researchers have made progress toward this goal. They took material from the Trump Archive and used it to train a model that recognizes the president’s voice from properties of the voice signal. They created prototype search software that is more than 95% accurate on a human-annotated dataset in returning video clips where Trump is actually speaking.
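To make the idea of indexing speaker turns concrete, here is a minimal sketch in Python. It is not Joostware's implementation, which this post does not describe. The `Turn` record, its field names, the speaker labels, and the sample text are all invented for illustration, and the speaker labels are assumed to come from some upstream speaker-identification model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    """One uninterrupted stretch of speech by a single speaker (hypothetical record)."""
    clip_id: str      # made-up clip identifier
    speaker: str      # label assumed to come from a speaker-identification model
    start_sec: float  # where the turn begins within the clip
    end_sec: float    # where the turn ends within the clip
    text: str         # what was said during the turn, e.g. from captions


def build_speaker_index(turns):
    """Group turns by speaker so all of one person's speech can be looked up."""
    index = defaultdict(list)
    for turn in turns:
        index[turn.speaker].append(turn)
    return index


def search(index, speaker, keyword=None):
    """Return a speaker's turns, optionally only those mentioning a keyword."""
    hits = index.get(speaker, [])
    if keyword is None:
        return list(hits)
    return [t for t in hits if keyword.lower() in t.text.lower()]


# Invented sample data: the anchor mentions the same topic, but only the
# speaker of interest's own turns should come back from a speaker search.
turns = [
    Turn("clip-001", "speaker_of_interest", 0.0, 12.5, "Sample remark about taxes."),
    Turn("clip-001", "anchor", 12.5, 20.0, "Recap of the remark about taxes."),
    Turn("clip-002", "speaker_of_interest", 3.0, 9.0, "Sample remark about jobs."),
]
index = build_speaker_index(turns)
```

Searching by speaker rather than by caption text is what separates clips where a person is talking from clips where that person is merely mentioned.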

What’s next? With more resources, Joostware hopes to give this technology back to the Internet Archive to improve search within the TV News Archive. And Rao and others continue to work within the larger community of researchers working to crack the code of video to help fact-checkers and journalists hold power accountable.

No one is talking about tax law on cable TV news

Jim Tankersley and Karl Russell, reporters for The New York Times, used TV News Archive captions via GDELT’s Television Explorer to demonstrate how little coverage there is on cable TV news for the newly minted $1.5 trillion tax overhaul:

“Consider one of Mr. Trump’s preferred yardsticks: cable news coverage. Throughout the fall, as Republicans rushed their tax bill through Congress in two breakneck months, CNN, Fox News and MSNBC routinely devoted 10 percent of their daily coverage to tax issues, according to data from the Gdelt Project. Interest spiked as Mr. Trump signed the bill in late December, and then it fell precipitously.”

“Stormy Daniels wins TV war: overshadows taxes, health care”

For Axios, Caitlin Owens used TV News Archive data with GDELT’s Television Explorer to shed light on whether the TV networks are paying attention to the priorities of the political parties: “Tax cuts and the Affordable Care Act are supposed to be big issues in the midterm elections, but both have faded from the attention of the cable news networks now that they’re no longer front and center in Congress.” Owens thinks it matters because “Democrats are campaigning hard on the GOP’s unpopular attempt to repeal and replace the ACA, and Republicans are pushing the financial benefits of their tax law.”


Fact-Check: Corporate tax revenues are rising (misleading)

House Speaker Paul Ryan, R., Wisc., announced last week he would not be seeking reelection, prompting television interviews that reflected on his legacy. In a “Meet the Press” interview Sunday on NBC, host Chuck Todd asked Ryan to respond to a statement by Sen. Bob Corker, R., Tenn.:

“’This Congress and this administration likely will go down as one of the most fiscally irresponsible administrations and Congresses that we ever had.’ And he’s referring to the fact that this tax bill spiked the deficit. It’s higher than even what was projected.” Ryan responded, “That was going to happen. The baby boomers’ retiring was going to do that. These deficit trillion-dollar projections have been out there for a long, long time. Why? Because of mandatory spending, which we call entitlements. Discretionary spending under the CBO baseline is going up about $300 billion over the next 10 years. Tax revenues are still rising. Income tax revenues are still rising. Corporate income tax revenues. Corporate rate got dropped 40 percent, still rising.”

Eugene Kiely reported for FactCheck.org that “Ryan is right that $1 trillion deficit projections ‘have been out there for a long, long time.’ … But corporate tax revenues are down for the first six months of the fiscal year, and they are projected to be less over the next 10 years than they otherwise would have been because of the law.”

Salvador Rizzo and Meg Kelly reported for The Washington Post’s Fact Checker, “The baby-boom generation is retiring, and Congress at best has taken only modest steps to rein in spending on old-age programs, largely because any serious effort is met with hostility and often-misleading attack ads…But the revenue side of the picture cannot be ignored.” “Congress has not been able to grapple with the spending — and keeps taking steps to undermine the revenue flow as well.”

Follow us @tvnewsarchive, and subscribe to our biweekly newsletter here.

Expanding the Television Archive

When we started archiving television in 2000, people shrugged and asked, “Why? Isn’t it all junk anyway?” As the saying goes, one person’s junk is another person’s gold. From 2010 to 2018, scholars, pundits and, above all, reporters have spun journalistic gold from the data captured in our 1.5 million hours of television news recordings. Our work has been fueled by visionary funders(1) who saw the potential impact of turning television – from news reports to political ads – into data that can be analyzed at scale. Now the Internet Archive is taking its Television Archive in new directions. In 2018 our goals for television will be: better curation in what we collect; broader collection across the globe; and working with computer scientists interested in exploring our huge data sets. Simply put, our mission is to build and preserve comprehensive collections of the world’s most important television programming and make them as accessible as possible to researchers and the general public. We will need your help.

“Preserving TV news is critical, and at the Internet Archive we’ve decided to rededicate ourselves to growing our collection,” explained Roger Macdonald, Director of Television at the Internet Archive. “We plan to go wide, expanding our archives of global TV news from every continent. We also plan to go deep, gathering content from local markets around the country. And we plan to do so in a sustainable way that ensures that this TV will be available to generations to come.”

Libraries, museums and memory institutions have long played a critical role in preserving the cultural output of our creators. Television falls within that mandate. Indeed some of the most comprehensive US television collections are held by the Library of Congress, Vanderbilt University and UCLA. Now we’d like to engage with a broad range of libraries and memory institutions in the television collecting and curation process. If your organization has a mandate to collect television or researcher demand for this media, we would like to understand your needs and interests. The Internet Archive will undertake collection trials with interested institutions, with the eventual goal of making this work self-sustaining.

Simultaneously, we are looking to engage researchers interested in the non-consumptive analysis of television at scale, in ways that continue to respect the interests of rights holders. The tools we’ve created may be useful. For instance, we hope the tools the Internet Archive used to detect TV campaign ads can be applied by researchers in new and different ways. If your organization has interest in computing with television as data at large, we are interested in working with you.

This groundbreaking interface for searching television news, based on the closed captions associated with US broadcasts, was developed between 2009 and 2012.

A brief history of the Internet Archive’s Television collection:

2000 Working with pioneering engineer Rod Hewitt, IA begins archiving 20 channels originating from many nations.

Oct. 2001 September 11, 2001 Collection established, and enhanced in 2011.

2009-2012 With funding from the Knight Foundation and many others, we built a service to allow public searching, citation and borrowing of US television news programs on DVD.

2012-2014 Public TV news library launched with tools to search, quote and share streamed snippets from television news.

2014 Pilot launched to detect political advertisements broadcast in the Philadelphia region, which led to the development of open source audio fingerprinting techniques.

2016 Political ad detection, curation, and access expanded to 28 battleground regions for 2016 elections, enabling journalists to fact check the ads and analyze the data at scale. The same tools helped reporters analyze presidential debates. This resulted in front-page data visualizations in The New York Times, as well as 150+ analyses by news outlets from Fox News to The Economist to FiveThirtyEight.

2017-present Experiments with artificial intelligence techniques to employ facial identification and on-screen optical character recognition to aid searching and data mining of television. Special curated collections of top political leaders and fact-check integrations.

In the run-up to the 2016 presidential elections, journalists at the NYT and elsewhere began analyzing television as data, in this case looking at the different sound bites each network chose to replay.

Embarking on a new direction also means shifting away from some of our current services. Our dedicated television team has been focusing on metadata enhancement and assisting journalists and scholars to use our data. We will be wrapping up some of these free services in the next three to four months. We hope others will take up where we left off and build the tools that will make our collection even more valuable to the public.

Now more than ever in this era of disinformation, our world needs an open, reliable, canonical reference source of television news. This cannot exist without the diligent efforts of technologists, journalists, researchers, and television companies all working together to create a television archive open for all. We hope you will join us!

To learn more about the work of the TV News Archive outreach and metadata innovation team over the last few years, please see our blog posts.

(1) Funding for the Television Archive has come from diverse donors, including the John S. and James L. Knight Foundation, Democracy Fund, Rita Allen Foundation, craigslist Charitable Fund and The Buck Foundation.

TV News Record: The year in TV news visualizations

Thanks for being part of our community at the TV News Archive. As 2017 draws to a close, we’ve chosen six of our favorite visualizations using TV News Archive data. We look forward to assisting many more journalists and researchers in what will likely be an even more tumultuous news year. 

The New York Times: Mueller indictments

The New York Times editorial page used our Third Eye chyron collection to produce an analysis of TV news coverage of major indictments of Trump campaign officials by special counsel Robert Mueller: “The way each network covered the story – or avoided it – is a sign of how the media landscape has become ever more politicized in the Trump era.”

Credit: Taylor Adams, Jessia Ma, and Stuart A. Thompson, The New York Times, “Trump Loves Fox & Friends,” November 1, 2017.

FiveThirtyEight: hurricane coverage

Writing for FiveThirtyEight.com, Dhrumil Mehta demonstrated that TV news broadcasters paid less attention to Puerto Rico’s Hurricane Maria than to Hurricanes Harvey and Irma, which hit the mainland U.S., primarily in Texas and Florida. Mehta used TV News Archive data via Television Explorer.

Credit: Dhrumil Mehta, “The Media Really Has Neglected Puerto Rico,” FiveThirtyEight, September 28, 2017.

TV News Archive: face-time for lawmakers

Using our Face-o-Matic data set, we found that Senate Majority Leader Mitch McConnell, R., Ky., gets the most face-time on cable TV news, and MSNBC features his visage more than the other networks examined. Fox News features the face of House Minority Leader Nancy Pelosi, D., Calif., more than any other cable network.

Vox: Mueller’s credibility

Vox’s Alvin Chang used Television Explorer to explore how Fox News reports on Mueller’s credibility. This included showing how often Fox News mentioned Mueller in the context of former presidential candidate Hillary Clinton.

Credit: Alvin Chang, “A week of Fox News transcripts shows how they began questioning Mueller’s credibility,” Vox, October 31, 2017.

The Trace: coverage of shootings

Writing for The Trace, Jennifer Mascia presented findings from Television Explorer showing how coverage of shootings declines rapidly: “Two days after 26 people were massacred in a Texas church, the incident — one of the worst mass shootings in American history — had nearly vanished from the major cable news networks.”

Credit: Jennifer Mascia, “Data Shows Shrinking Cable News Cycles for This Fall’s Mass Shootings,” The Trace, December 5, 2017.

The Washington Post: What TV news networks covered in 2017

Philip Bump of The Washington Post crunched Television Explorer data to look at coverage of eleven major news stories by five national news networks. Here’s his visualization of TV news coverage of “sexual assault,” which shows how coverage increased at the end of the year as dozens of prominent men in media, politics, and entertainment were accused of sexual harassment or assault.

Credit: Philip Bump, “What national news networks were talking about during 2017,” The Washington Post, December 15, 2017.

TV News Record: With indictment, chyrons & captions get a graphic workout

A biweekly round up on what’s happening at the TV News Archive by Katie Dahl and Nancy Watzman

Fox News downplayed Mueller indictment, according to NYT editorial chyron analysis

In the most intensive use of the Internet Archive’s Third Eye data to date, The New York Times editorial page analyzed chyron data to show how Fox News downplayed this week’s news of the indictment of a former Trump campaign manager and other legal developments. The graphic-heavy opinion piece was featured at the top of the online homepage much of the day on Wednesday, Nov. 1:

Though it is far from the only possible way to evaluate news coverage, the chyron has become something of a touchstone for media analysts, being both the most obvious visual example of spin or distraction and the most shareable. Any negative coverage of the president usually prompts a flurry of tweets cataloguing the differences among networks in their chyron text. While CNN, MSNBC and the BBC are typically in alignment, Monday morning was a particularly stark example of how Fox News pushes its own version of reality.

Read The New York Times opinion piece, and dig into the data yourself.

Captions yield insights on Mueller investigation, shooting coverage

Fox News actively tried to “plant doubt in viewers’ minds” as Mueller brought charges against former Trump campaign officials, according to an analysis of a week’s worth of closed captions by Alvin Chang of Vox. Chang used Television Explorer, fueled by TV News Archive data, to crunch the numbers behind charts such as the one below.

And The Trace, an independent, nonprofit news organization that focuses on gun violence, used TV News Archive caption data via Television Explorer to show how TV news coverage of mass shootings declines quickly.

Face-o-Matic captures congressional leaders’ reactions on indictments

In the 24 hours following news breaking about the indictments, our Face-o-Matic data feed captured cable news networks’ editorial choices on how much face-time to allot to congressional leaders’ reactions. The answer: not much.

Altogether, the four congressional leaders’ faces were shown for a total of 2.5 minutes on indictment-related reporting on screen by CNN, Fox News, and MSNBC. Ryan got the lion’s share of the attention. Much of this was devoted to airings of his photo in connection with his official statement, “[N]othing is going to derail what we are doing in Congress, because we are working on solving people’s problems.”

The image of Senate Majority Leader Mitch McConnell, R., Ky., was not featured by any network. House Minority Leader Nancy Pelosi, D., Calif., got attention only from Fox News, which featured her photo with discussion of her statement, in which she said despite the news, “we still need an outside fully independent investigation.”


Fact-check: Papadopoulos had a limited role in Trump campaign (had seat at table/not the whole story)

One of the most parsed statements this week was White House press secretary Sarah Huckabee Sanders’ claim that George Papadopoulos, who pleaded guilty to lying to the FBI, had an “extremely limited” role in the campaign. “It was a volunteer position,” she said. “And again, no activity was ever done in an official capacity on behalf of the campaign.”

“Determining how important Papadopoulos was on the Trump team is open to interpretation, so we won’t put this argument to the Truth-O-Meter,” wrote Louis Jacobson, reporting for PolitiFact. Jacobson, however, laid out the known facts. For example, in March 2016, then-presidential candidate Donald Trump tweeted out a photo of himself and advisors sitting at a table, saying it was a “national security meeting.” Papadopoulos is seen at the table sitting near future Attorney General Jeff Sessions. However, Jacobson also writes, “There is some evidence to support the argument that Papadopoulos was freelancing by pushing the Russia connection.”

Reviewing Sanders’ claim, as well as a Trump tweet along similar lines, Robert Farley and Eugene Kiely took a similar tack for FactCheck.org, concluding that Papadopoulos had a “seat at the table” in the campaign and that his role went beyond licking envelopes and posting lawn signs: “What we do know is that during this time — from late March to mid-August — Papadopoulos was in regular contact with senior Trump campaign officials and attended a national security meeting with Trump. We will let readers decide if this constitutes a ‘low-level volunteer.’”


Embed TV News Archive clips on web annotations

Now you can embed TV News Archive news clips when commenting and annotating the web, thanks to a new integration from Hypothes.is. From the Hypothes.is blog:

This integration makes it easy for journalists, fact-checkers, educators, scholars and anyone that wants to relate specific text in a webpage, PDF, or EPUB to a particular snippet of video news coverage. All you need to do to use it is copy the URL of a TV News Archive video page, paste it into the Hypothesis annotation editor and save your annotation. You can adjust the start and end of the video to include any exact snippet. The video will then automatically be available to view in your annotation alongside the annotated text.

See a live example of the integration in this annotation with an embedded news video of Senator Charles Schumer at a news conference over a post that checks the facts in one of his statements.
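For readers who want to prepare snippet links in bulk before pasting them into an annotation, here is a small Python sketch. It assumes that a TV News Archive video page URL can carry the clip boundaries as `/start/<seconds>/end/<seconds>` path segments. That URL shape and the sample identifier are assumptions made for illustration, so compare the result against a real link copied from a video page before relying on it.

```python
def clip_url(identifier, start_sec, end_sec):
    """Build a link to a snippet of a TV News Archive video page.

    The /start/<seconds>/end/<seconds> path shape is an assumption for
    illustration, not something confirmed by this post.
    """
    if start_sec < 0 or end_sec <= start_sec:
        raise ValueError("end must come after a non-negative start")
    return (
        f"https://archive.org/details/{identifier}"
        f"/start/{int(start_sec)}/end/{int(end_sec)}"
    )


# Hypothetical identifier; real identifiers come from the video page itself.
url = clip_url("EXAMPLE_20170101_000000_Sample_Program", 60, 120)
```

The resulting string is what you would paste into the Hypothesis annotation editor in place of a hand-copied URL.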

“This integration means that one of the world’s most valuable resources — the news that the Internet Archive captures across the world everyday — will be able to be brought into close context with pages and documents across the web,” said Hypothesis CEO Dan Whaley. “For instance, a video of a politician making an actual statement next to an excerpt that claims the opposite, or a video of a newsworthy event next to a deeper analysis of it.”

Please take Hypothes.is for a spin and let us know what you think: tvnews@archive.org.

In the news: Trump Archive, end-of-term preservation, & link rot

News outlets have been getting the word out on Internet Archive efforts to preserve President-elect Donald Trump’s statements; the outgoing Obama Administration’s web page and government data; as well as preventing that nasty experience of encountering a “404” when you click on a link online, aka “link rot.”

Trump Archive 

A number of journalists have been exploring the riches contained within the newly launched Trump Archive, a collection of TV news clips of the president-elect speaking, peppered with links to more than 500 fact checks by national fact-checking groups.

Anna Wiener, writing for The New Yorker, immerses herself in Trump statements and discovers 56 mentions of the escalator in Trump Tower, and that Trump:

“is a fan of the word “sleaze,” and of the phrase “tough cookie,” which he has used to describe policemen, his opponents’ political donors, Paul LePage, “real-estate guys in New York and elsewhere,” an unnamed friend who is a “great financial guy,” ISIS, three professional football players, Reince Priebus, Lyndon Johnson, and Trump’s father, Fred.” After watching long stretches of video, she writes, “It occurred to me that spending time online in the Trump Archive could be a form of immersion therapy: a means of overcoming shock through prolonged exposure.”

Geoffrey Fowler, tech columnist for The Wall Street Journal, bemoans the lack of easy-to-use tech tools to help people be responsible citizens overall, but also notes the promise–and challenge–of a curated collection like the Trump Archive:

“The Trump Archive shows what’s hard about using tech to hold officials accountable. It’s assembled and hand-curated by humans. Yet even using the transcripts, it can be hard to tell the difference between a spoken name and a person who’s actually speaking. Archive officials say making their database applicable to hundreds or thousands more politicians would require help from tech firms with capabilities in machine learning and voice and facial recognition.”

Fowler also published this video, featuring plenty of Trump; an interview with Roger Macdonald, director of the TV News Archive; and ample footage of the Internet Archive’s San Francisco headquarters.

The Trump Archive also was featured in Marketplace Tech®, The Hill, Forbes, Newsweek, BuzzFeed News, TechPlz, VentureBeat, Engadget, and more.

Preserving Obama Administration websites, social media

The Internet Archive’s efforts to help preserve government websites via the Wayback Machine during and after the transition have continued to garner attention. Wired reports on a group of climate scientists working against the clock to archive government websites related to global warming:

One half was setting web crawlers upon NOAA web pages that could be easily copied and sent to the Internet Archive. The other was working their way through the harder-to-crack data sets—the ones that fuel pages like the EPA’s incredibly detailed interactive map of greenhouse gas emissions, zoomable down to each high-emitting factory and power plant.

The New Scientist also writes on efforts to archive climate data:

Fears that data could be misused or altered have prompted crowd-sourcing to back up federal climate and environmental data, including Climate Mirror, a distributed volunteer effort supported by the Internet Archive and the Universities of Pennsylvania and Toronto.

The Los Angeles Times and Quartz offer reports on archiving climate data.

Internet Archive works against link rot

Tech publications were quick to inform their readers about the Internet Archive’s new Chrome extension that fights link rot by directing users to archived web pages. Here is Mashable:

Now Internet Archive has built a Wayback Machine Chrome extension. It works like this: If you click on a link that would normally lead to an error page (think 404), the extension will instead give users the option to load an archived version of the page. The link is no longer simply gone.
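The behavior Mashable describes can be sketched in a few lines of Python. This is not the extension's code, which is not shown here. It is a sketch under the assumption that a lookup against the Wayback Machine availability endpoint (`https://archive.org/wayback/available`) returns JSON with an `archived_snapshots.closest` entry, as in the sample below. The helper names are invented for illustration, and no network call is made.

```python
import json
from urllib.parse import urlencode

# Wayback Machine availability lookup; the response handling below assumes
# the JSON shape shown in the sample.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url):
    """Build the lookup URL that asks whether an archived copy exists."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': url})}"


def archived_fallback(status_code, availability_json):
    """Return an archived URL to offer when a page is missing, else None."""
    if status_code != 404:
        return None  # the live page loaded, so there is nothing to replace
    data = json.loads(availability_json)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None  # no archived copy is known


# Sample response body, written by hand for illustration.
sample = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20130919044612/http://example.com/",
            "timestamp": "20130919044612",
            "status": "200",
        }
    }
})
```

Keeping the decision logic separate from the HTTP request makes it easy to check each case (live page, missing page with a snapshot, missing page without one) without touching the network.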

Also writing on the fight against link rot: Network World, VentureBeat, The Tech Portal, Bleeping Computer, and ZDNet.

A Year-end Message from the TV News Archive

by Katie Donnelly

Over the past extremely unpredictable election year, the Internet Archive invented new methods and tools to give journalists, researchers, and the public the power to access, scrutinize, share, and thoroughly fact-check political ads, presidential debates, and TV news broadcasts.

Our efforts were designed to help citizens better understand the patterns of political messages designed to persuade them and find factual, reliable information in what is disturbingly being seen as a “post-truth” world.

The Political TV Ad Archive project proved to be highly useful to our high-profile fact-checking partners, as well as reporters at an array of outlets including The New York Times, The Washington Post, FOX News, The Economist, The Atlantic, and more. By providing data about when, where, and how many times political ads aired on TV in key markets, the project unlocked new creative potential for data reporters to analyze how campaigns and outside groups were targeting messages to voters in different locations.

Breaking events, like political debates and speeches, also offered a chance for archived TV content to shine, allowing reporters to isolate and share clips in near-real time, and fact-checkers to harvest dubious statements for further exploration. In addition, the project’s experience with developing audio fingerprinting (through a new invention we call the Duplitron) for identifying instances of ads inspired a new use: tracking candidate debate sound bites in subsequent TV news shows.

In this way, reporters and researchers were able to analyze and report on which political statements were trending across different TV programs and networks, revealing the ideological, agenda-setting, and other editorial choices made by news producers about which issues to highlight and which to overlook.

As Roger Macdonald, director of the TV News Archive, wrote to project partners: “Citizens will increasingly hunger for sound information to inform wise electoral decisions. With our Republic being riven by increasing socio-political chaos and infectious divisions, whose magnitude has not been seen since before our Civil War, we think there are uncommon opportunities to serve citizens with the information for which they will increasingly yearn. We have an historic opportunity to thoughtfully place some grains of sand on the balance pan of reason.”

The project was supported by a generous grant from the Knight News Challenge, funded in partnership with the Knight Foundation, the Democracy Fund, the Hewlett Foundation and the Rita Allen Foundation, and received additional support from the Rita Allen Foundation, the Democracy Fund, PLCB Foundation, Craig Newmark, Christopher Buck, and others.

Here is a quick look at project accomplishments:

Political TV Ad Archive

  • Total number of archived ad views, most embedded in partner sites: 2,036,063
  • Number of ads collected: 2,991
  • Political ads broadcast 364,822 times over 26 markets
  • Number of fact and source checks: 131
  • Press coverage: 156 articles

Katie Donnelly is associate director at Dot Connectors Studio, a Philadelphia-based strategy firm that has worked with the Political TV Ad Archive.

Internet Archive data fuels journalists’ analyses of how TV news shows covered prez debate

The presidential debate between Hillary Clinton and Donald Trump on September 26 drew an audience of 84 million, shattering records. It was also a first for the Internet Archive, which made data publicly available, for free, on how TV news shows covered the debate. These data were generated by the Duplitron, the open source tool used to generate counts of ad airings for the Political TV Ad Archive, which is also able to track coverage of specific video clips by TV news shows.

Download TV News Archive presidential debate data here.

Journalists took these data and crunched away, creating novel visualizations to help the public understand how TV news presented the debates.

The New York Times created a visual timeline of TV cable news coverage in the 24 hours following the presidential debate, with separate lines for CNN, MSNBC, and Fox News. Below the timeline were short explanations of the peaks and how the different networks varied in their presentations even when they all covered roughly the same ground. The project was the work of Jasmine C. Lee, Alicia Parlapiano, Adam Pearce, and Karen Yourish. For much of the day on Sept. 29, it was featured at the top of the New York Times website.

To see more visualizations created by journalists using TV News Archive data following the first presidential debate, visit the Political TV Ad Archive.

The Internet Archive will make similar data available on the upcoming vice presidential debate, as well as the remaining presidential debates. This effort is part of a collaboration with the Annenberg Public Policy Center to study how voters learn about candidates from debates.