Category Archives: Television Archive

Internet Archive Celebrates Research and Research Libraries at Annual Gathering

At this year’s annual celebration in San Francisco, the Internet Archive team showcased its innovative projects and rallied supporters around its mission of “Universal Access to All Knowledge.”

Brewster Kahle, Internet Archive’s founder and digital librarian, welcomes hundreds of guests to the annual celebration on October 12, 2023.

“People need libraries more than ever,” said Brewster Kahle, founder of the Internet Archive, at the October 12 event. “We have a set of forces that are making libraries harder and harder to happen—so we have to do something more about it.”

Efforts to ban books and defund libraries are worrisome trends, Kahle said, but there are hopeful signs and emerging champions.

Watch the full live stream of the celebration

Among the headliners of the program was Connie Chan, Supervisor of San Francisco’s District 1, who was honored with the 2023 Internet Archive Hero Award. In April, she authored a resolution, passed unanimously by the San Francisco Board of Supervisors, backing the Internet Archive and the digital rights of all libraries.

Chan spoke at the event about her experience as a first-generation, low-income immigrant who relied on books in Chinese and English at the public library in Chinatown.  

Watch Supervisor Chan’s acceptance speech

“Having free access to information was a critical part of my education—and I know I was not alone,” said Chan, who is a supporter of the Internet Archive’s role as a digital, online library. “The Internet Archive is a hidden gem…It is very critical to humanity, to freedom of information, diversity of information and access to truth…We aren’t just fighting for libraries, we are fighting for our humanity.”

Several users shared testimonials about how resources from the Internet Archive have enabled them to advance their research, fact-check politicians’ claims, and inspire their creative works. Content in the collection is helping improve machine translation of languages. It is preserving international television news coverage and Ukrainian memes on social media during the war with Russia.  

Quinn Dombrowski, of the Saving Ukrainian Cultural Heritage Online project, shows off Ukrainian memes preserved by the project.

Technology is changing things—some for the worse, but a lot for the better, said David McRaney, speaking via video to the audience in the auditorium at 300 Funston Ave. “And when [technology] changes things for the better, it’s going to expand the limited capabilities of human beings. It’s going to extend the reach of those capabilities, both in speed and scope,” he said. “It’s about a newfound freedom of mind, and time, and democratizing that freedom so everyone has access to it.”

Open Library developer Drini Cami explained how the Internet Archive is using artificial intelligence to improve access to its collections.

When a book is digitized, scanning operators used to have to crop the photographs of its pages manually. The Internet Archive recently trained a custom machine learning model to automatically suggest page boundaries, allowing staff to double their processing rate. In addition, an open-source machine learning tool converts images into text, making books searchable and making the collection available for bulk research, cross-referencing, and text analysis, as well as for reading aloud to people with print disabilities.

Open Library developer Drini Cami.

“Since 2021, we’ve made 14 million books, documents, microfiche, records—you name it—discoverable and accessible in over 100 languages,” Cami said.

As AI technology advanced this year, Internet Archive  engineers piloted a metadata extractor, a tool that automatically pulls key data elements from digitized books. This extra information helps librarians match the digitized book to other cataloged records, beginning to resolve the backlog of books with limited metadata in the Archive’s collection. AI is also being leveraged to assist in writing descriptions of magazines and newspapers—reducing the time from 40 to 10 minutes per item.

“Because of AI, we’ve been able to create new tools to streamline the workflows of our librarians and the data staff, and make our materials easier to discover, and work with patrons and researchers,” Cami said. “With new AI capabilities being announced and made available at a breakneck rate, new ideas of projects are constantly being added.”

Jamie Joyce & AI hackathon participants.

A recent Internet Archive hackathon explored the risks and opportunities of AI by using the technology itself to generate content, said Jamie Joyce, project lead with the organization’s Democracy’s Library project. One of the hackathon volunteers created an autonomous research agent to crawl the web and identify claims related to AI. With a prompt-based model, the machine was able to generate nearly 23,000 claims from 500 references. The information could be the basis for creating economic, environmental and other arguments about the use of AI technology. Joyce invited others to get involved in future hackathons as the Internet Archive continues to expand its AI potential.

Peter Wang, CEO and co-founder at Anaconda, said interesting kinds of people and communities have emerged around cultures of sharing. For example, those who participate in the DWeb community are often both humanists and technologists, he said, with an understanding about the importance of reducing barriers to information for the future of humanity. Wang said rather than a scarcity mindset, he embraces an abundant approach to knowledge sharing and applying community values to technology solutions.

Peter Wang, CEO and co-founder at Anaconda.

“With information, knowledge and open-source software, if I make a project, I share it with someone else, they’re more likely to find a bug,” he said. “They might improve the documentation a little bit. They might adapt it for a novel use case that I can then benefit from. Sharing increases value.”

The Internet Archive’s Joy Chesbrough, director of philanthropy, closed the program by expressing appreciation for those who have supported the digital library, especially in these precarious times.

“We are one community tied together by the internet, this connected web of knowledge sharing. We have a commitment to an inclusive and open internet, where there are many winners, and where ethical approaches to genuine AI research are supported,” she said. “The real solution lies in our deep human connection. It inspires the most amazing acts of generosity and humanity.”

***

If you value the Internet Archive and our mission to provide “Universal Access to All Knowledge,” please consider making a donation today.

A New Approach To Understanding War Through Television News: Introducing The TV News Visual Explorer & The Belarusian, Russian & Ukrainian TV News Archive

For more than 20 years, the Internet Archive’s Television News Archive has monitored television news, preserving more than 9.5 million broadcasts totaling more than 6.6 million hours from across the world, with a continuous archive spanning the past decade. Today just a small sliver of that archive is accessible to journalists and scholars because video at this scale is so hard to work with: no human could make sense of that much television news, even by fast forwarding through it. The small fraction of programs that contain closed captioning, speech recognition transcripts or OCR’d onscreen text can be keyword searched through the TV Explorer and TV AI Explorer. For the majority of this global multi-decade archive, however, there has until now been no way for researchers to assess and understand the narratives of television news at scale, especially the visual landscape that distinguishes television from other forms of media and that is so central to understanding many of the world’s biggest stories, from war to pandemics to the economy.

As the TV News Archive enters its third decade, it is increasingly exploring the ways in which it can preserve the domestic and international response to global events as it did with 9/11 two decades ago. As a first step towards this vision, over the last few months the Archive has preserved more than 46,000 broadcasts from domestic Belarusian, Russian and Ukrainian television news channels, including (in the order they were added to the Archive) Russia Today (part of the Archive since July 2010 but included in this collection starting January 1), Russian channels 1TV, NTV and Russia 1 (from March 26) and Russia 24 (from April 25), Ukrainian channel Espreso (from April 25) and Belarusian channel Belarus 24 (from May 16).

Why preserve television news coverage in a time of war? For journalists today it makes it possible to digest and report on how the war is being framed and narrated, with an eye towards how these narratives influence and shape popular support for the conflict and its potential future trajectory. For future generations of scholars, it makes it possible to look back at the contemporary information environment and prevailing public information, perspectives, and narratives.

While there are myriad options for the general public to watch these channels today in realtime, there is no research-oriented archival interface designed for journalists and scholars to understand their coverage at the scale of days to months, to scan for key visuals and events and to comment, discuss and illustrate how nations are portraying major stories.

To address this critical need, today we are tremendously excited to unveil the Television News Visual Explorer, a collaboration of the GDELT Project, the Internet Archive’s Television News Archive and the Media-Data Research Consortium to explore new approaches to enabling rapid exploration and understanding of the visual landscape of television news.

The Visual Explorer converts each broadcast into a set of thumbnails, one every 4 seconds, displayed in a grid six frames wide that scrolls vertically through the entire program, making it possible to skim an hour-long broadcast in a matter of seconds. Clicking on any thumbnail plays a brief 30-second clip of the broadcast at that point, making it trivial to rapidly triage a broadcast for key moments. The underlying thumbnails can even be downloaded as a ZIP file to enable non-consumptive computational analysis, from OCR to augmented search.
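To make the arithmetic concrete, here is a minimal Python sketch of the grid layout described above. It assumes the first thumbnail sits at offset zero and that rows and columns are counted from zero; the function names are illustrative and are not part of the Visual Explorer itself.

```python
from math import ceil

SECONDS_PER_THUMBNAIL = 4  # from the post: one thumbnail every 4 seconds
GRID_WIDTH = 6             # from the post: the grid is six frames wide


def grid_shape(duration_seconds: int) -> tuple[int, int]:
    """Return (thumbnail_count, row_count) for a broadcast of the given length."""
    thumbnails = ceil(duration_seconds / SECONDS_PER_THUMBNAIL)
    rows = ceil(thumbnails / GRID_WIDTH)
    return thumbnails, rows


def thumbnail_offset(row: int, column: int) -> int:
    """Return the offset in seconds of the thumbnail at a zero-based grid position.

    Assumption: the first thumbnail (row 0, column 0) is taken at second 0.
    """
    if not 0 <= column < GRID_WIDTH:
        raise ValueError("column must be between 0 and GRID_WIDTH - 1")
    return (row * GRID_WIDTH + column) * SECONDS_PER_THUMBNAIL


if __name__ == "__main__":
    print(grid_shape(60 * 60))     # hour-long broadcast
    print(thumbnail_offset(2, 3))  # third row, fourth column
```

Under these assumptions an hour-long broadcast yields 900 thumbnails in 150 rows, which is why it can be skimmed by scrolling rather than watching.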

Machines today can catalog the basic objects and activities they see in video and generate transcripts of their spoken and written words, but the ability to contextualize and understand the meaning of all that coverage remains a uniquely human capability. No person could watch the entirety of the Archive’s 6.6 million hours of broadcasts, and even just the 46,000 broadcasts in this new collection would be difficult for a single researcher to watch or even fast forward through in their entirety. Television’s linear format means coverage has historically been consumed a single moment at a time, like a flashlight in a darkened warehouse. In contrast, this new interface makes it possible to see an entire broadcast all at once in a single display, making television news “skimmable” for the first time.

The Visual Explorer and this new research collection of Belarusian, Russian and Ukrainian television news coverage represent early glimpses into a new initiative reimagining how memory institutions like the Archive can make their vast television news archives more accessible to scholars, journalists and informed citizens. Beneath the simple and intuitive interface lies an immensely complex and highly experimental set of workflows prototyping both an entirely new scholarly and journalistic interface to television news and entirely new approaches to rapidly archiving international television coverage of global events.

Over the coming weeks, additional channels from the TV News Archive will become available through the new Visual Explorer, as well as a variety of experiments with the new lenses that tools like automatic transcription and translation can offer in helping journalists and scholars make sense of such vast realtime archives.

Get Started With The Television News Visual Explorer!

About Kalev Leetaru

For more than 25 years, GDELT’s creator, Dr. Kalev H. Leetaru, has been studying the web and building systems to interact with and understand the way it is reshaping our global society. One of Foreign Policy Magazine’s Top 100 Global Thinkers of 2013, his work has been featured in the presses of over 100 nations and fundamentally changed how we think about information at scale and how the “big data” revolution is changing our ability to understand our global collective consciousness.

Library as Laboratory Recap: Opening Television News for Deep Analysis and New Forms of Interactive Search

Watching a single episode of the evening news can be informative. Tracking trends in broadcasts over time can be fascinating. 

The Internet Archive has preserved nearly 3 million hours of U.S. local and national TV news shows and made the material open to researchers for exploration and non-consumptive computational analysis. At a webinar April 13, TV News Archive experts shared how they’ve curated the massive collection and leveraged technology so scholars, journalists and the general public can make use of the vast repository.

Roger Macdonald, founder of the TV News Archive, and Kalev Leetaru, collaborating data scientist and GDELT Project founder, spoke at the session. Chris Freeland, director of Open Libraries, served as moderator and Internet Archive founder Brewster Kahle offered opening remarks.

Watch video

“Growing up in the television age, [television] is such an influential, important medium—persuasive, yet not something you can really quote,” Kahle said. “We wanted to make it so that you could quote, compare and contrast.” 

The Internet Archive built on the work of the Vanderbilt Television Archive and the UCLA Library Broadcast NewsScape to give the public a broader “macro view,” said Kahle. The trends seen in at-scale computational analyses of news broadcasts can be used to understand the bigger picture of what is happening in the world and the lenses through which we see the world around us.

In 2012, with donations from individuals and philanthropies such as the Knight Foundation, the Archive started repurposing the closed captioning data stream required of all U.S. broadcasters into a search index. “This simple approach transformed the antiquated experience of searching for specific topics within video,” said Macdonald, who helped lead the effort. “The TV caption search enabled discovery at internet speed with the ability to simultaneously search millions of programs and have your results plotted over time, down to individual broadcasters and programs.”

Scholars and journalists were quick to embrace this opportunity, but the team kept experimenting with deeper indexing. Techniques like audio fingerprinting, Optical Character Recognition (OCR) and Computer Vision made it possible to capture visual elements of the news and improve access, Macdonald said. 

Sub-collections of political leaders’ speeches and interviews have been created, including an extensive Donald Trump Archive. Some of the Archive’s most productive advances have come from collaborating with outsiders who have requested more access to the collection than is available through the public interface, Macdonald said. With appropriate restrictions to maintain respect for broadcasters and distribution platforms, the Archive has worked with select scientists and journalists as partners to use data in the collection for more complex analyses.

Treating television as data

Treating television news as data creates vast opportunities for computational analysis, said Leetaru. Researchers can track word frequency use in the news and how that has changed over time. For instance, it’s possible to look at mentions of COVID-related words across selected news programs and see when they surged and leveled off with each wave before plummeting downward, as shown in the graph below.
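As a rough illustration of this kind of analysis, the sketch below counts keyword mentions per day in caption text. The input format (pairs of date and caption text), the sample captions, and the function name are assumptions made for the example; they are not the Archive’s or GDELT’s actual data format or code.

```python
import re
from collections import Counter
from datetime import date


def daily_mentions(captions, keywords):
    """Count whole-word, case-insensitive keyword matches per day.

    captions: iterable of (date, caption_text) pairs (hypothetical format).
    keywords: list of words to look for.
    """
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    counts = Counter()
    for day, text in captions:
        # Days with no matches still get an entry, so gaps show up as zeros.
        counts[day] += len(pattern.findall(text))
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Invented sample captions, for illustration only.
    sample = [
        (date(2020, 3, 1), "Officials confirmed new coronavirus cases."),
        (date(2020, 3, 1), "COVID testing expands; covid wards fill."),
        (date(2020, 3, 2), "Markets rallied today."),
    ]
    for day, n in daily_mentions(sample, ["covid", "coronavirus"]).items():
        print(day.isoformat(), n)
```

Plotting the resulting counts over time would give a timeline of the sort described above.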

The newly computed metadata can help provide context and assist with fact-checking efforts to combat misinformation. It can allow researchers to map the geography of television news: how certain parts of the world are covered more than others, Leetaru said. Through the collections, researchers have explored which presidential tweets challenging election integrity got the most exposure on the news. OCR of every frame has been used to build models that identify the name of every “Dr.” depicted on cable TV after the outbreak of COVID-19 and calculate the air time devoted to medical doctors commenting on one of the virus variants. Reverse image lookup of images in TV news has been used to determine the source of photos and videos. Visual entity search tools can even reveal the increasing prevalence of bookshelves as backdrops during home interviews in the pandemic, as well as appearances of books by specific authors or titles. Open datasets of computed TV news metadata are available that include all visual entity and OCR detections, 10-minute interval captioning ngrams and second-by-second inventories of each broadcast cataloging whether it was “News” programming, “Advertising” programming or “Uncaptioned” (in the case of television news this is almost exclusively advertising).
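The 10-minute interval ngrams mentioned above can be pictured with a short sketch like the following. It assumes each caption word arrives with an offset in whole seconds from the start of the broadcast, and that n-grams do not span interval boundaries. Both are assumptions for illustration, and the published datasets may be built differently.

```python
from collections import Counter, defaultdict

INTERVAL_SECONDS = 600  # 10-minute buckets, as described in the post


def interval_ngrams(words, n=2):
    """Group timestamped words into 10-minute buckets and count n-grams in each.

    words: iterable of (offset_seconds, word) pairs (hypothetical format).
    Returns {bucket_start_seconds: Counter mapping n-gram tuples to counts}.
    """
    buckets = defaultdict(list)
    for offset, word in words:
        bucket_start = (offset // INTERVAL_SECONDS) * INTERVAL_SECONDS
        buckets[bucket_start].append(word.lower())

    result = {}
    for start, tokens in sorted(buckets.items()):
        # zip over n shifted copies of the token list yields consecutive n-grams.
        result[start] = Counter(zip(*(tokens[i:] for i in range(n))))
    return result


if __name__ == "__main__":
    # Invented sample words, for illustration only.
    sample = [
        (0, "the"), (5, "tax"), (10, "bill"),
        (20, "the"), (25, "tax"),
        (610, "bill"), (615, "passed"),
    ]
    for start, grams in interval_ngrams(sample).items():
        print(start, grams.most_common(2))
```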

From television news to digitized books and periodicals, dozens of projects rely on the collections available at archive.org for computational and bibliographic research across a large digital corpus. Data scientists or anyone with questions about the TV News Archive can contact info@archive.org.

Up Next

This webinar was the fourth in a series of six sessions highlighting how researchers in the humanities use the Internet Archive. The next will be about Analyzing Biodiversity Literature at Scale on April 27. Register here.

TV News Record: Six takeaways from adding Hillary Clinton, Barack Obama & more to Face-o-Matic facial detection

A round up on what’s happening at the TV News Archive by Katie Dahl and Nancy Watzman.

This week we release new data generated by our Face-o-Matic tool, developed in collaboration with Matroid, adding to our list of public figures detected by facial recognition on major cable news stations on the TV News Archive.

In addition to President Donald Trump and the four congressional leaders, the expanded list now includes most living former presidents and recent major party presidential contenders, including Hillary Clinton and Barack Obama. (For the full list of public officials tracked, as well as methodological notes, see bottom of the post.)

Detecting faces on TV news and turning them into data provides a new quantitative path for journalists and researchers to explore how news is presented to the public and compare and contrast editorial choices that individual networks make. This new measure shows us the duration that politicians’ faces are actually shown on screen, whether it’s a clip of that person speaking, muted footage, or a still photo shown in the background to illustrate a point.

With Face-o-Matic data added to the Television Explorer, fueled by closed captions, and our Third Eye chyron reading tool, a wealth of information is now available to analyze. (See the TV News Archive home page for examples of visualizations created by journalists and researchers using TV News Archive data.)

Here are six quick takeaways using Face-o-Matic for an analysis covering roughly six months, from November 2017 through May 2018, looking at four cable TV news networks: BBC News, CNN, Fox News, and MSNBC.

Download Face-o-Matic data to explore your own research questions.

1. Trump trumps every other political figure in face-time on cable TV news, all the time, every day, in every way, on every network and program.

As we’ve seen in past analyses with Face-o-Matic data, President Donald Trump is the major political star on cable TV news as compared to other top political figures examined. To put this in perspective: over a six-month period stretching from November 2017 to May 2018, the president’s face appeared on cable TV news for the equivalent of a full 13.5 days, counting every second of face-time. The next closest political figure we analyzed was House Speaker Paul Ryan, R., Wis., whose visage appeared for the equivalent of one day.
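The conversion from seconds of face-time to days is simple to reproduce. The sketch below assumes each detection row carries a person, a network, and a number of seconds on screen. That row layout, the function name, and the sample numbers are invented for illustration and do not reflect the actual Face-o-Matic file format.

```python
from collections import defaultdict

SECONDS_PER_DAY = 24 * 60 * 60


def face_time_days(detections):
    """Sum on-screen seconds per person across networks and express them in days.

    detections: iterable of (person, network, seconds_on_screen) rows
    (hypothetical layout).
    """
    totals = defaultdict(float)
    for person, _network, seconds in detections:
        totals[person] += seconds
    return {person: secs / SECONDS_PER_DAY for person, secs in totals.items()}


if __name__ == "__main__":
    # Invented sample rows, for illustration only.
    sample = [
        ("Person A", "Network X", 43_200),
        ("Person A", "Network Y", 43_200),
        ("Person B", "Network X", 21_600),
    ]
    for person, days in face_time_days(sample).items():
        print(f"{person}: {days:.2f} days")
```

By the same arithmetic, 13.5 days of face-time corresponds to 1,166,400 seconds on screen.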


2. After Trump, GOP leaders in Congress are the most popular faces on cable TV news.

The two GOP leaders in Congress, Ryan and Senate Majority Leader Mitch McConnell, R., Ky., are the next most popular faces on cable TV news networks. Between the two, Ryan ranks first on the cable TV news networks we examined: BBC News, CNN, Fox News, and MSNBC. McConnell is the next most shown face on these networks, with the exception of BBC News.

Link to interactive version of above chart, where view can be changed to exclude specific politicians.

3. Hillary Clinton and Barack Obama figure prominently on Fox News.

Fox News airs proportionately more images of failed 2016 presidential candidate Hillary Clinton and former president Barack Obama than other cable TV news networks. Fox News showed Clinton’s face 7.6 times more than CNN did, and Obama’s 3.6 times more. Fox News also showed Clinton 3.6 times more than MSNBC did, and Obama 2.3 times more.


4. Hannity shows more Hillary Clinton face-time than any other top-rated Fox News show.

Not only does the Fox News “Hannity” program air proportionately more images of Hillary Clinton than any other top-rated Fox News show; with just one exception, it is also the Fox News show that shows her face more than those of the current congressional leaders (Ryan, McConnell, Schumer or Pelosi). “Hannity” also shows more images of Obama than other top-rated Fox News shows.

Link to interactive version of above chart, where view can be changed to exclude specific politicians.

5. Ryan face-time spikes on news shows aired during morning hours.

All three U.S. cable news networks examined showed high rates of face-time for Ryan on shows airing during morning hours, ranging from 9 am to 11 am. This may be linked to his leadership role in Congress and to the fact that morning hours are prime time for large announcements. For example, Fox News’ “America’s Newsroom” and “Happening Now” show spikes of face-time for Ryan. On MSNBC, “Live with Hallie Jackson” and “Live with Velshi and Ruhle” show high rates of images of Ryan. And on CNN, “At This Hour with Kate Bolduan” shows high rates for Ryan as well.

Links to interactive charts for top-rated news shows; view can be adjusted to exclude specific politicians. Top-rated shows are those with the highest 2017 viewership according to Nielsen.

Top-rated Fox News shows.

Top-rated MSNBC news shows.

Top-rated CNN shows.

6. BBC News just isn’t that into us.

BBC News provides a window into how news is presented to a major foreign audience. Like U.S. cable news networks, BBC News features more face-time for Trump than other political figures examined. Ryan ranks a distant second. Overall, however, BBC News shows much lower rates of images of U.S. political figures than U.S. cable news shows do.

Link to interactive version of above chart, where view can be changed to exclude specific politicians.

Methodological notes

The Face-o-Matic data set, available for download on the Internet Archive, uses facial recognition to track the faces of prominent public officials as they appear on major cable TV news networks: BBC News, CNN, Fox News, and MSNBC. The list of public officials tracked, along with the date that detection began, is here:

President & current congressional leaders

President Donald Trump, 7/13/17

Speaker Paul Ryan, R., Wis., 7/13/17

House Minority Leader Nancy Pelosi, D., Calif., 7/13/17

Senate Majority Leader Mitch McConnell, R., Ky., 7/13/17

Senate Minority Leader Chuck Schumer, D., N.Y., 7/13/17

Living former presidents and recent major party presidential candidates*

George H.W. Bush, 10/5/17

George W. Bush, 11/1/17

Jimmy Carter, 10/21/17

Bill Clinton, 9/12/17

Hillary Clinton, 9/12/17

Barack Obama, 7/13/17

Mitt Romney, 10/4/17

*Note: Our data set does not include Sen. John McCain, R., Ariz., who ran for president opposite Obama in 2008. Sample testing of facial detection for the senator revealed a somewhat frequent rate of false positives – instances where the identified face was not the senator’s, but rather one of a number of lookalikes. While we make no claim that all of the detections in the Face-o-Matic data set are error free, we did test faces to minimize errors. Please be sure to notify us if you find errors in the data.

TV News Record: Recognizing Trump’s voice on TV, NYT & Axios coverage, + Ryan fact-check

A round up on what’s happening at the TV News Archive by Katie Dahl and Nancy Watzman.

This week we explore cutting edge work by Joostware that moves us closer to solving the challenge of searching vast archives of video by speaker, note the use of TV News Archive data by The New York Times and Axios, and share a fact-checked interview by exiting House Speaker Paul Ryan about his legacy.

Joostware trained model to recognize Trump’s voice

What if you wanted to search the TV News Archive to find every instance where President Donald Trump is talking?

That’s the research question that the San Francisco-based firm Joostware concentrated on for its Who Said What project, which won a $50,000 prototype grant from the John S. and James L. Knight Foundation. Last week Joostware’s founder, Delip Rao, presented the project’s progress at a gathering in Austin, Texas. (The Internet Archive’s own Dan Schultz, in his Bad Idea Factory incarnation, also presented on Contextubot, which we recently profiled here.)

“Audio and video today is viewed as an opaque object and it’s meant for linear consumption,” Rao said in his presentation. “But truly any audio and video especially in the context of news has a lot of structure to it. There are speakers of interest, and these speakers take turns, and then within each turn something was communicated. So our goal is to identify these speakers who are of interest and also the content that was spoken in that turn and indexing that.”

Anyone can search the TV News Archive already via closed captions at the Internet Archive or via Television Explorer. Our experiments with facial detection and chyron extraction are another way to find and analyze news clips. But searching a video archive by “speaker id” – finding all the video where a person is actually talking – is a tough technical challenge. Our Trump Archive and congressional, executive branch, and administration archives are all manually curated video collections designed to demonstrate what it would be like to have automated speaker id search.

Joostware researchers have made progress toward this goal. They took material from the Trump Archive, and used it to train a model that recognizes the president’s voice, by using properties of the voice signal. They created a prototype search software that is more than 95% accurate on a human annotated dataset in returning video clips where Trump is actually speaking.

What’s next? With more resources, Joostware hopes to give this technology back to the Internet Archive to improve search within the TV News Archive. And Rao and others continue to work within the larger community of researchers working to crack the code of video to help fact-checkers and journalists hold power accountable.

No one is talking about tax law on cable TV news

Jim Tankersley and Karl Russell, reporters for The New York Times, used TV News Archive captions via GDELT’s Television Explorer to demonstrate how little coverage there is on cable TV news for the newly minted $1.5 trillion tax overhaul:

“Consider one of Mr. Trump’s preferred yardsticks: cable news coverage. Throughout the fall, as Republicans rushed their tax bill through Congress in two breakneck months, CNN, Fox News and MSNBC routinely devoted 10 percent of their daily coverage to tax issues, according to data from the Gdelt Project. Interest spiked as Mr. Trump signed the bill in late December, and then it fell precipitously.”

“Stormy Daniels wins TV war: overshadows taxes, health care”

For Axios, Caitlin Owens used TV News Archive data with GDELT’s Television Explorer to shed light on whether the TV networks are paying attention to the priorities of the political parties: “Tax cuts and the Affordable Care Act are supposed to be big issues in the midterm elections, but both have faded from the attention of the cable news networks now that they’re no longer front and center in Congress.” Owens thinks it matters because “Democrats are campaigning hard on the GOP’s unpopular attempt to repeal and replace the ACA, and Republicans are pushing the financial benefits of their tax law.”


Fact-Check: Corporate tax revenues are rising (misleading)

House Speaker Paul Ryan, R., Wis., announced last week he would not be seeking reelection, prompting television interviews that reflected on his legacy. In a “Meet the Press” interview Sunday on NBC, host Chuck Todd asked Ryan to respond to a statement by Sen. Bob Corker, R., Tenn.:

“‘This Congress and this administration likely will go down as one of the most fiscally irresponsible administrations and Congresses that we ever had.’ And he’s referring to the fact that this tax bill spiked the deficit. It’s higher than even what was projected.”

Ryan responded: “That was going to happen. The baby boomers’ retiring was going to do that. These deficit trillion-dollar projections have been out there for a long, long time. Why? Because of mandatory spending, which we call entitlements. Discretionary spending under the CBO baseline is going up about $300 billion over the next 10 years. Tax revenues are still rising. Income tax revenues are still rising. Corporate income tax revenues. Corporate rate got dropped 40 percent, still rising.”

Eugene Kiely reported for FactCheck.org that “Ryan is right that $1 trillion deficit projections ‘have been out there for a long, long time’…But corporate tax revenues are down for the first six months of the fiscal year, and they are projected to be less over the next 10 years than they otherwise would have been because of the law.”

Salvador Rizzo and Meg Kelly reported for The Washington Post’s Fact Checker, “The baby-boom generation is retiring, and Congress at best has taken only modest steps to rein in spending on old-age programs, largely because any serious effort is met with hostility and often-misleading attack ads…But the revenue side of the picture cannot be ignored.” “Congress has not been able to grapple with the spending — and  keeps taking steps to undermine the revenue flow as well.”

Follow us @tvnewsarchive, and subscribe to our biweekly newsletter here.

Audio / Video player updated – to jwplayer v8.2

We updated our audio/video (and TV) third-party JS-based player from v6.8 to v8.2 today.

The update includes some code to keep the same feature set as before, and adds:

  • much nicer cosmetic/look updates
  • nice “rewind 10 seconds” button
  • controls are now in an updated control bar
  • (video) ‘Related Items’ now uses the same (better) recommendations from the bottom of an archive.org /details/ page
  • Airplay (Safari) and Chromecast basic casting controls in player
  • playback speed rate control now easier to use / set
  • playback keyboard control with SPACE and left, right, up and down keys
  • (video) Web VTT (captions) has much better user interface and display
  • flash is now only used to play audio/video if html5 doesn’t work (flash no longer does layout or controls)

Here’s some before / after screenshots:

TV News Record: Caption analyses, plus fact-checks on wall & immigrants

A roundup of what’s happening at the TV News Archive by Katie Dahl and Nancy Watzman.

This week we bring you analyses of cable TV news coverage and fact-checks of recent statements by President Donald Trump on immigration and his proposed wall on the border with Mexico.

Vox & Post turn TV news captions into media analysis

Vox’s Alvin Chang and The Washington Post’s Philip Bump continue to turn TV News Archive caption data, via Television Explorer, into analyses of current news. Chang analyzes cable TV network coverage of the March for Our Lives, an anti-gun violence demonstration, reporting that on Fox News, “There was a massive spike in mentions of the “Second Amendment” or “Constitution” during the peak of the march, and most of those mentions came from pundits and guests on the network.”

Source: Vox

Bump’s piece examines mentions of Hillary Clinton on cable TV news networks compared to those of Stormy Daniels, the adult entertainer involved in a legal dispute with the president. He finds that Fox News mentions Clinton the most, while CNN features more coverage of Daniels.

Source: The Washington Post

Fact-Check: We’ve started building the wall (Mostly False/Three Pinocchios)

During a press conference with the presidents of Estonia, Latvia, and Lithuania, President Donald Trump talked about his proposed border wall between the United States and Mexico: “We have to have strong borders. We need the wall. We’ve started building the wall, as you know, we have a $1.6 billion toward building the wall and fixing existing wall that’s falling down, it was never appropriate in the first place.”

The funding the president references comes from a spending bill recently passed by Congress. The omnibus “bill included $1.6 billion for some projects at the border, but none of that can be used toward the border wall promised during the presidential campaign.” For PolitiFact, Miriam Valverde rates the president’s claim “Mostly False.”

At The Washington Post’s Fact-Checker, Glenn Kessler gives the same claim “three Pinocchios”:

The White House failed miserably to achieve its objectives on funding for a border wall, receiving relative peanuts. It sought $25 billion, but ended up with just 5 percent of that. Moreover, the money came with strings attached so that it could only be used for fencing, not the “great” and “beautiful wall” promised by Trump.

In Orwellian fashion, fences have now become walls. Even then, the president has only secured enough money to pay for one-tenth of the new fence/wall he has sought.


Fact-Check: Caravans of people are coming to cross the U.S.-Mexico border (Half True)

Just after Fox News aired a segment on a caravan of people from Central America making its way through Mexico toward the United States, the president wrote on Twitter:

“Half True,” writes W. Gardner Shelby for PolitiFact: “President Trump tweeted that caravans of immigrants are coming to the Mexico-U.S. border… We confirmed that a caravan of 1,200 to 1,500 people from Central America–not caravans–was in southern Mexico, about 900 miles from the Rio Grande, when Trump tweeted. Also, accounts vary on whether all participants are bound to enter the U.S. An organizer estimated that most of the people intend to remain in Mexico.”

Reporting for FactCheck.org, Robert Farley, Eugene Kiely, and Lori Robertson write “Trump’s messages included muddled and inaccurate claims.” They summarize with the following bullet points:

  • Contrary to Trump’s assertion, there is no “liberal (Democrat)” law requiring the “Catch & Release” of people caught illegally crossing the border. There are court cases and laws that require some unaccompanied children, families and asylum-seekers to be released in the U.S., pending an immigration hearing. But it’s a stretch to blame those entirely on Democrats.

  • Trump said “big flows of people” are illegally entering the U.S. from Mexico “to take advantage of DACA.” In fact, current border-crossers are not eligible for the Deferred Action for Childhood Arrivals program.

  • Trump said that “caravans” of people were coming to the Southwest border and that Mexico “must stop them.” The caravan, a yearly demonstration, was organized by the activist group Pueblo Sin Fronteras, which says the people walking in the caravan have “a lot of intentions,” with some wanting to stay in Mexico. The caravan is now in southern Mexico, more than 800 miles from the U.S. border.

Follow us @tvnewsarchive, and subscribe to our biweekly newsletter here.

TV News Record: How cable TV news reports news, fact-checks on banking, trade, and public lands

A roundup of what’s happening at the TV News Archive by Katie Dahl and Nancy Watzman.

This week, we present a Washington Post analysis of coverage of an alleged affair by the president; a Vox piece examining coverage of Andrew McCabe, the former deputy FBI director; and The Toronto Star’s use of a salient clip to illustrate a point about a presidential appointment. We also show fact-checks from FactCheck.org, PolitiFact, and The Washington Post’s Fact-Checker on claims related to banking, public lands, and trade policy.

Chicken-egg question on cable news coverage of alleged affair

CNN and MSNBC hosts and guests are talking a lot more about the alleged past affair between President Donald Trump and Stormy Daniels than Fox News is, according to Philip Bump’s latest analysis for The Washington Post using TV News Archive data via Television Explorer. 

Bump used the analysis as context to dig into a poll released by Suffolk University earlier this month: “One-fifth of Americans said that Fox News was the news or commentary source they trusted the most, a group that was primarily made up of Republicans… There’s a chicken-egg question here. Does Fox give the Stormy Daniels story a light touch because its audience is largely supportive of Trump or is Fox’s audience largely supportive of Trump because of the coverage they see on Fox? Or is it both?”

Did Fox News reporting contribute to perception of fired FBI official?

Vox’s Alvin Chang connects the firing of Andrew McCabe, former FBI deputy director, to a narrative built up over the course of months by Fox News. Using TV News Archive data via Television Explorer, Chang reports that “long before he was fired, Fox News… constantly referred to McCabe as the quintessential example of the FBI’s corruption and anti-Trump bias. They hinted that he was plotting several schemes against Trump during the election, leaking information to the press, and was bought and paid for by Hillary Clinton and Democrats.” This, he writes, allowed Fox News viewers to think it made “perfect sense for Attorney General Jeff Sessions (perhaps directed by Trump) to fire McCabe.” Chang goes on to warn, “This alternate reality is being fed into the president’s mind.”


What new presidential economic pick had to say about Canadian PM

The Toronto Star embedded a TV news clip in a piece on Trump’s pick to replace his economic advisor. Larry Kudlow, who is taking over from Gary Cohn as economic advisor, had said of U.S. trade policy:  “NAFTA is the key. And unfortunately we’re going after a major NAFTA ally, and perhaps America’s greatest ally, namely Canada. Even with this left-wing crazy guy Trudeau, they’re still our pals. They’re still our pals. Why are we going after them?” The clip has been viewed more than 112,000 times and counting.


Fact-Check: Senate banking bill a big win for Wall Street (Yes and No)

In a floor speech, Sen. Elizabeth Warren, D., Mass., said of the latest proposal to make changes to Dodd-Frank, “This bill is about goosing the bottom line and executive bonuses at the banks that make up the top one half of 1 percent of banks in this country by size. The very tippy-top.”

Manuela Tobias reported for PolitiFact: “The bill raises the bar of what is considered a big bank five-fold, which effectively relaxes the standards for large regional banks. Experts warn this also could open a door for bigger Wall Street bank giveaways.

The bill also has a few provisions affecting banks above $250 billion in assets. However, the effects would largely depend on the Federal Reserve’s interpretation of the law. The biggest banks might be able to get relaxed regulations, but then again, they might not.”


Fact-Check: Public lands proposal largest in history (False)

In a Senate hearing on the budget for the Dept. of the Interior, Interior Secretary Ryan Zinke said the president’s proposal “is the largest investment in our public lands infrastructure in our nation’s history. Let me repeat that, this is the largest investment in our public lands infrastructure in the history of this country.”

PolitiFact rates the claim false. Louis Jacobson reported: “It’s far from assured that the maximum figure of $18 billion in the proposal will ever be reached if enacted. Beyond that, though, Roosevelt’s $3 billion investment in the Civilian Conservation Corps would amount to $53 billion today, and it accounted for vastly more than the Trump proposal as a percentage of federal spending at the time.”

Fact-Check: U.S. has trade deficit with Canada (Four Pinocchios)

After a private meeting with Canadian Prime Minister Justin Trudeau, Trump defended his view about U.S.-Canada trade, tweeting, “We do have a Trade Deficit with Canada, as we do with almost all countries (some of them massive). P.M. Justin Trudeau of Canada, a very good guy, doesn’t like saying that Canada has a Surplus vs. the U.S.(negotiating), but they do … they almost all do … and that’s how I know!”

Glenn Kessler reports for The Washington Post’s Fact Checker that the president is not including services in his analysis of the trade relationship with Canada. He adds: “The president frequently suggests the United States is losing money with these deficits, but countries do not ‘lose’ money on trade deficits. A trade deficit simply means that people in one country are buying more goods from another country than people in the second country are buying from the first.” Kessler gives the claim four Pinocchios.

Eugene Kiely reports for FactCheck.org that the president’s claim that figures giving the U.S. a trade surplus with Canada are not including timber and energy is “not accurate. The Census Bureau, which is within the U.S. Department of Commerce, said its trade figures do include timber and energy and referred us to two publications that show that the agency does include timber and energy for imports and exports.”

Follow us @tvnewsarchive, and subscribe to our biweekly newsletter here.

TV News Record: Glorious ContextuBot making progress

A roundup of what’s happening at the TV News Archive by Katie Dahl and Nancy Watzman.

This week, we present an update on the video context project Glorious ContextuBot, two recent news reports that use TV News Archive data, and fact-checks of TV appearances by the DNC chair and the president.

Fueled by TV News Archive, the Glorious ContextuBot is making progress

Let’s say a friend posts a YouTube video link to a politician’s statement on Facebook, but you have a feeling it’s taken out of context. The clip is tightly edited, and you’re curious to see the rest of the statement. Was the politician answering a question? Was the statement part of a larger discussion?

Enter the Glorious ContextuBot. For the past nine months, veteran media innovators Mark Boas and Laurian Gridinoc of Hyperaudio and Trint, led by the Internet Archive’s own Dan Schultz, senior creative technologist of the TV News Archive, have been building a prototype of the ContextuBot, fueled by the TV News Archive. The ContextuBot is one of 20 winners of the Knight Prototype Fund’s $1 million challenge, announced in June 2017.

With the ContextuBot, it’s possible to use video to search video. Just paste a link to a video snippet into an interface and then pull up a transcript that puts things in context of what came before and after. Built from the Duplitron 5000, an audio fingerprinting tool Schultz developed to track political ads for the Political TV Ad Archive, the ContextuBot demonstrates how open technology built by the TV team can be repurposed and improved by motivated technologists. It has already captured the attention of the University of Iowa Informatics department, which is considering adopting it for researchers.

To date, the team has:

  • Made it easier to scale audio search. It’s now possible to scale audio fingerprint search within a corpus of TV news up or down by adding or removing individual computers or compute clusters. Our Duplitron would take eight hours to search a year of television, but the ContextuBot makes it much easier to spread that computing across multiple machines.
  • Built a demo interface. You can see a clip in context with a transcript of what comes before and after. Click on a word in the transcript, and you’ll be able to jump to that point in the video stream.
  • Begun to explore a “comic view.”  The team’s biggest goal is to explore ways to communicate the essence of a longer clip in a short amount of time.  One approach: converting video into a comic. This would set the groundwork for automatically extracting (and rendering) a storyboard from a video clip.
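
The post does not describe how the ContextuBot actually distributes its search, so the following is only a minimal sketch of one way the "spread that computing across multiple machines" idea could work: split a year of broadcasts into contiguous date ranges, one per machine. The function names `shard_days` and `ideal_hours` are hypothetical, and the time estimate assumes perfect linear scaling, which real systems rarely achieve.

```python
from datetime import date, timedelta


def shard_days(start, end, workers):
    """Split the inclusive date range [start, end] into contiguous shards.

    Hypothetical helper: partitioning by air date is an assumption, not
    the ContextuBot's documented strategy.
    """
    total = (end - start).days + 1
    shards = []
    cursor = start
    for i in range(workers):
        # Spread the remainder over the first shards so that shard
        # sizes differ by at most one day.
        size = total // workers + (1 if i < total % workers else 0)
        if size == 0:
            continue  # more workers than days: leave the extras idle
        shard_end = cursor + timedelta(days=size - 1)
        shards.append((cursor, shard_end))
        cursor = shard_end + timedelta(days=1)
    return shards


def ideal_hours(single_machine_hours, workers):
    """Best-case wall-clock hours, assuming the search parallelises perfectly."""
    return single_machine_hours / workers


if __name__ == "__main__":
    for first, last in shard_days(date(2017, 1, 1), date(2017, 12, 31), 4):
        print(first, "to", last)
    print("ideal hours on 4 machines:", ideal_hours(8, 4))
```

Under those assumptions, the eight-hour single-machine search quoted above would take about two hours on four machines; coordination overhead and uneven shard sizes would push the real figure higher.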

The team will present the prototype shortly before the International Symposium on Online Journalism in Austin in April 2018.


The Washington Post finds stark differences in cable TV coverage of Jared Kushner

After a heavy news week of developments related to Jared Kushner, President Trump’s son-in-law and a senior adviser, The Washington Post’s Philip Bump dug into the TV News Archive and found that while MSNBC and CNN had numerous mentions of Kushner’s name, Fox News had just ten.


The Washington Post examines coverage of Parkland shooting

Rachel Siegel used the TV News Archive to compare coverage of the Parkland shooting with several other high-profile shootings, and found that this time cable TV attention spans are a bit longer.


Fact-Check: the DNC raised record-making amounts in January. (Two Pinocchios)

In a recent interview, Democratic National Committee Chairman Tom Perez said, “We raised more money in January… of 2018 than any January in our history. So if the question is, ‘Do we have enough money to implement our game plan?’ Absolutely.”

This claim earned “two Pinocchios” from Salvador Rizzo, reporting for The Washington Post’s Fact Checker: the “DNC raised $6 million in January 2018… That was below what it raised in January 2014 ($6.6 million), January 2012 ($13.2 million), January 2011 ($7.1 million) and January 2010 ($9.1 million).” A spokesman for Perez “backed off from those comments when we reached out with FEC figures that told a different story.”


Fact-Check: Congressman fears NRA downgrade for gun legislation (misleading)

In a meeting with lawmakers to talk gun legislation, President Donald Trump suggested that an age requirement increase for purchasing guns was not included in a 2013 reform effort by Sen. Pat Toomey, R., Pa., “because you’re afraid of the NRA, right?”

Reporting by FactCheck.org’s Eugene Kiely, Lori Robertson, and Robert Farley calls this statement misleading. “As a result of the legislation, Toomey’s rating with the NRA dropped from an “A” to a “C,” and the endorsements and contributions Toomey got from the NRA in previous House and Senate races disappeared. In 2016, the NRA stayed out of Toomey’s Senate race altogether; his Democratic opponent, Katie McGinty, had an “F” grade from the NRA. In that race, Toomey got the endorsement of a gun-control group, Everytown for Gun Safety, which ran ads supporting him.”


Follow us @tvnewsarchive, and subscribe to our biweekly newsletter here.

Archive video now supports WebVTT for captions

We now support .vtt (Web Video Text Tracks) files for captioning your videos, in addition to the .srt (SubRip) files we have supported for years.

It’s as simple as uploading a caption file with a “parallel filename”, meaning the same base name as your video file(s).

Examples:

  • myvid.mp4
  • myvid.srt
  • myvid.vtt

Multi-lang support:

  • myvid.webm
  • myvid.en.vtt
  • myvid.en.srt
  • myvid.es.vtt

Here’s a nice example item:
https://archive.org/details/cruz-test

VTT with caption picker (and upcoming A/V player too!)

(We will have an updated A/V player with a better “picker” for so many language tracks in days, have no fear 😎)

Enjoy!