McConnell, Schumer, Ryan, Pelosi fact-checked clips featured in new TV News Archive collections

Today the Internet Archive’s TV News Archive unveils growing TV news collections focused on congressional leadership and top Trump administration officials, expanding our experimental Trump Archive to other newsworthy government officials. Together, the collections include links to more than 1,200 fact-checked clips, and counting, by our national fact-checking partners: FactCheck.org, PolitiFact, and The Washington Post’s Fact Checker.

These experimental video clip collections, which contain more than 3,500 hours of video, include archives focused on Senate Majority Leader Mitch McConnell, R., Ky.; Senate Minority Leader Charles (“Chuck”) Schumer, D., N.Y.; House Speaker Paul Ryan, R., Wis.; and House Minority Leader Nancy Pelosi, D., Calif., as well as top Trump officials past and present, such as Secretary of State Rex Tillerson and former White House Press Secretary Sean Spicer.

Download a csv of fact-checked video statements or see all the fact-checked clips.

Visit the U.S. Congress archive.

Visit the Executive Branch archive.

Visit the Trump Archive.

We created these largely hand-curated collections as part of our experimentation in demonstrating how Artificial Intelligence (AI) algorithms could be harnessed to create useful, ethical, public resources for journalists and researchers in the months and years ahead. Other experiments include:

  • the Political TV Ad Archive, which tracked airings of political ads in the 2016 elections by using the Duplitron, an open source audio fingerprinting tool;
  • the Trump Archive, launched in January;
  • Face-O-Matic, an experimental Slack app created in partnership with Matroid that uses facial detection to find congressional leaders’ faces on TV news. Face-O-Matic has quickly proved its mettle by helping our researchers find clips suitable for inclusion in the U.S. Congress Archive; future plans include making data available in CSV and JSON formats.
  • in the works: TV Architect Tracey Jaquith is experimenting with detection of text in the chyrons that run on the bottom third of cable TV news channels. Stay tuned.

Red check mark shows there’s a fact-check in this footage featuring House Minority Leader Nancy Pelosi, D., Calif. Follow the link below the clip to see the fact-check, in this case by The Washington Post’s Fact Checker.

At present, our vast collection of TV news, 1.4 million shows collected since 2009, is searchable via closed captioning. But closed captions, while helpful, can’t help a user find clips of a particular person speaking; instead, a search for a name such as “Charles Schumer” returns a mix of news stories about the senator, as well as clips where he speaks at news conferences, on the Senate floor, or in other venues.
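To illustrate this limitation, here is a minimal sketch of keyword search over caption text. The caption snippets, show names, and the `search_captions` helper are hypothetical, invented for illustration; this is not the TV News Archive's actual search implementation.

```python
# Hypothetical caption snippets. Caption text carries no reliable speaker label.
CAPTIONS = [
    {"show": "Example News Hour",
     "text": "Critics say Charles Schumer faces a tough vote this week."},
    {"show": "Example Morning Show",
     "text": "Joining us now from the Capitol is Charles Schumer. Senator, welcome."},
    {"show": "Example Evening Report",
     "text": "Markets closed higher today on strong earnings."},
]


def search_captions(captions, query):
    """Return every caption whose text contains the query, ignoring case."""
    needle = query.lower()
    return [c for c in captions if needle in c["text"].lower()]


hits = search_captions(CAPTIONS, "Charles Schumer")
for hit in hits:
    print(hit["show"], "->", hit["text"])
```

Both of the first two snippets match, although only the second introduces a segment in which the senator might actually speak. A text match alone cannot tell the two cases apart.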

We are working towards a future in which AI enrichment of video metadata will more precisely identify for fact-checkers and researchers when a public official is actually speaking, or when there is some other televised record of that official making an assertion of fact. This could include, for example, camera footage of tweets.

Such clips become a part of the historical record, with online links that don’t rot, a central part of the Internet Archive’s mission to preserve knowledge. And they can help fact-checkers decide where to concentrate their efforts, by finding on-the-record assertions of fact by public officials. Finally, these collections could prove useful for teachers, documentary makers, or anybody interested in exploring on-the-record statements by public officials.

For example, here are two dueling views of the minimum wage, brought to the public by McConnell and Schumer.

In this interview on Fox News in January 2014, McConnell says, “The minimum wage is mostly an entry-level wage for young people.” PolitiFact’s Steve Contorno rated this claim as “mostly true.” While government statistics do show that half of the people making the minimum wage are young, 20 percent are in their late 20s or early 30s and another 30 percent are 35 or older. Contorno also points out that it’s a stretch to call these jobs “entry-level”; rather, they are “in the food or retail businesses or similar industries with little hope for career advancement.”

Schumer presents a different assertion on the minimum wage, saying on “Morning Joe” in May 2014 that with a rate of $10.10/hour “you get out of poverty.” PolitiFact’s Louis Jacobson rated this claim as “half true”: “Since the households helped by the $10.10 wage account for 46 percent of all impoverished households, Schumer is right slightly less than half the time.”

These new collections reflect the hard work of many at the Internet Archive, including Robin Chin, Katie Dahl, Tracey Jaquith, Roger MacDonald, Dan Schultz, and Nancy Watzman.

As we move forward, we would love to hear from you. Contact us with questions, ideas, and concerns at tvnews@archive.org. And to keep up-to-date with our experiments, sign up for our weekly TV News Archive newsletter.

 

Posted in News, Television Archive | 3 Comments

Canadian Library Consortia OCUL and COPPUL Join Forces with Archive-It to Expand Web Archiving in Canada

The Council of Prairie and Pacific University Libraries (COPPUL) and the Ontario Council of University Libraries (OCUL) have joined forces in a multi-consortial offering of Archive-It, the web archiving service of the Internet Archive. Working together, COPPUL and OCUL are considering ways that they can significantly expand web archiving in Canada.

A coordinated subscription to Archive-It builds on the efforts of Canadian universities that have developed web archiving programs over the years, and the past work of Archive-It with both COPPUL and OCUL members.  With 12 COPPUL members and 12 OCUL members (more than half the total membership) now subscribing to Archive-It, there is an opportunity to build a foundation for further collaboration supporting research services and other digital library initiatives. In addition, participation by so many libraries helps lower the barrier of entry for additional member institutions to join in web archiving efforts across Canada.

“OCUL is very pleased to be able to offer Archive-It to our members,” said Ken Hernden, University Librarian at Algoma University and OCUL Chair. “Preservation of information and research is an important aspect of what libraries do to benefit scholars and communities. Preserving information for the future was challenging in a paper-and-print environment. It has become even more so in the digital information environment. We hope that enabling access to this tool will help build capacity for web archiving across Ontario, and beyond.”

“Tools like Archive-It enable libraries and archives of all sizes to build new kinds of collections to support their communities in an environment where more and more of our cultural memory has moved online. We’re absolutely thrilled to be working with our OCUL colleagues in this critically important area,” said Corey Davis, COPPUL Digital Preservation Network Coordinator.

“Archive-It is excited to ramp up its support for web archiving in Canada. The joint subscription is a strategic and cost-effective way to expand web archiving among Canadian universities and to encourage participation from smaller universities who may not have felt they had the institutional resources to develop a web archiving program without the support of the consortiums,” said Lori Donovan, Senior Program Manager for Archive-It.

OCUL is a consortium of Ontario’s 21 university libraries. OCUL provides a range of services to its members, including collection purchasing and a shared digital information infrastructure, in order to support high quality education and research in Ontario’s universities. In 2017, OCUL commemorates its 50th anniversary.

Working together, COPPUL members leverage their collective expertise, resources, and influence, increasing capacity and infrastructure, to enhance learning, teaching, student experiences and research at our institutions. The consortium comprises 22 university libraries located in Manitoba, Saskatchewan, Alberta and British Columbia, as well as 15 affiliate members across Canada.

First deployed in 2006, Archive-It is a subscription web archiving service from the Internet Archive that helps organizations to harvest, build, and preserve collections of web-published digital content.

Additionally, the recently created Canadian Web Archiving Coalition (CWAC) will help build a community of practice for Canadian organizations engaging in web archiving and create a network for collaboration, support, and knowledge sharing. Under the auspices of the Canadian Association of Research Libraries (CARL) and in collaboration with Library and Archives Canada (LAC), the CWAC plans to hold an inaugural meeting in conjunction with the Internet Preservation Coalition General Assembly this September at LAC’s Preservation Centre in Gatineau, QC. For more information about the CWAC, including how to join, please contact corey@coppul.ca.

For more information on the consortial subscription, contact carol@coppul.ca or jacqueline.cato@ocul.on.ca or lori@archive.org.

Posted in Announcements, News | 1 Comment

Using Kakadu JPEG2000 Compression to Meet FADGI Standards

The Internet Archive is grateful to the folks at Kakadu Software for contributing to Universal Access to Knowledge by providing the world’s leading implementation of the JPEG2000 standard, used in the Archive’s image processing systems.

Here at the Archive, we digitize over a thousand books a day. JPEG2000, an image coding system that uses compression techniques based on wavelet technology, is a preferred file format for storing these images efficiently, while also providing advantages for presentation quality and metadata richness. The Library of Congress has documented its adoption of the JPEG2000 file format for a number of digitization projects, including its text collections on archive.org.
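As a rough illustration of what “wavelet-based” means, the sketch below applies one level of an integer Haar-style transform to a row of pixel values, splitting it into coarse averages and fine differences, and then inverts it exactly. This is a deliberately simplified stand-in: JPEG2000 uses different wavelet filters, and nothing here reflects Kakadu's SDK. The function names and pixel values are hypothetical.

```python
def haar_forward(row):
    """Split an even-length row of integers into (averages, differences)."""
    if len(row) % 2 != 0:
        raise ValueError("row length must be even")
    averages, differences = [], []
    for a, b in zip(row[0::2], row[1::2]):
        averages.append((a + b) // 2)   # coarse, half-resolution signal
        differences.append(a - b)       # detail needed to undo the averaging
    return averages, differences


def haar_inverse(averages, differences):
    """Rebuild the original row exactly from averages and differences."""
    row = []
    for s, d in zip(averages, differences):
        a = s + (d + 1) // 2            # recovers the first value of each pair
        b = a - d                       # recovers the second value
        row.extend([a, b])
    return row


pixels = [100, 102, 101, 99, 50, 52, 200, 200]  # made-up sample values
avg, diff = haar_forward(pixels)
restored = haar_inverse(avg, diff)
print(avg, diff, restored == pixels)
```

In smooth regions of an image the differences stay close to zero, which is what makes the transformed data easier to compress than the raw pixel values.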

Recently we started using Kakadu’s SDK to apply some color corrections to the images coming from our cameras. This has helped us achieve FADGI standards in our work with the Library of Congress.

Thank you, Kakadu, for helping make it possible for millions of books to be digitized, stored, and made available with high quality on archive.org!

If you are interested in finding out more about Kakadu Software’s powerful software toolkit for JPEG2000 developers, visit kakadusoftware.com or email info@kakadusoftware.com.

Posted in Technical | 1 Comment

TV News Record: McCain returns to vote, Spicer departs

A weekly roundup of what’s happening and what we’re seeing at the TV News Archive, by Katie Dahl and Nancy Watzman. Additional research by Robin Chin.

Last week, Sean Spicer left his White House post and Anthony Scaramucci, the new communications director, made his mark; Sen. John McCain, R., Ariz., returned to the Senate floor to debate–and cast a deciding vote on–health care reform; and fact-checkers examined claims about Trump’s off-the-record meeting with Russian President Vladimir Putin, and more.

McCain shows up in D.C. – and on Face-O-Matic

Last week, after we launched Face-O-Matic, an experimental Slack app that recognizes the faces of top public officials when they appear on TV news, we received a request from an Arizona-based journalism organization to track Sen. John McCain, R., Ariz. Soon after we added the senator’s visage to Face-O-Matic, we started getting the alerts.

News anchors talked about how McCain’s possible absence because of his brain cancer diagnosis could affect upcoming debates and votes on health care.

Reporters gave background on how the Senate has dealt with absences due to illness in the past.

Pundits discussed McCain’s character, and his daughter provided a “loving portrait.” Then coverage shifted to report the senator’s return to Washington, and late last night his key no vote on the “skinny” health care repeal.



White House: Spicer out, Scaramucci in 

After Sean Spicer resigned as White House communications director, Fox News and MSNBC offered reviews of his time at the podium.

On Fox News, Howard Kurtz introduced Spicer as someone “long known to reporters as an affable spokesman; he became the president’s pit bull,” and went on to give a run-down of his controversial relationship with the press. His conclusion: “He lasted exactly six months.”

MSNBC offered a mashup of some of Spicer’s most famous statements. These include: “This was the largest audience to ever witness an inauguration, period, both in person and around the globe,” and “But you had a – you know, someone who is as despicable as Hitler who didn’t even sink to using chemical weapons.”

Late this week, Ryan Lizza published an article in The New Yorker based on a phone call he received from the new White House communications director, Anthony Scaramucci, in which Scaramucci used profanity to describe other members of the White House staff he accused of leaking information. That article soon became fodder for cable TV.



Schumer, Ryan weigh in on Mueller

As Special Counsel Robert Mueller widens his investigation into Russian interference in U.S. elections, speculation is running high on TV news that President Donald Trump might fire him.

Fox News ran a clip of Senate Minority Leader Chuck Schumer, D., N.Y., saying, “I think it would cause a cataclysm in Washington.”

MSNBC ran a radio clip from House Speaker Paul Ryan, R., Wis.:  “I don’t think many people are saying Bob Mueller is a person who is a biased partisan. We have an investigation in the House, an investigation in the Senate, and a special counsel which sort of depoliticizes this stuff and gets it out of the political theater.”



Fact-check: Transgender people in the military would lead to tremendous medical costs and disruption (lacks context)

In a series of tweets this week, President Trump wrote, “After consultation with my Generals and military experts, please be advised that the United States Government will not accept or allow… Transgender individuals to serve in any capacity in the U.S. Military. Our military must be focused on decisive and overwhelming… victory and cannot be burdened with the tremendous medical costs and disruption that transgender in the military would entail. Thank you.”

For FactCheck.org, Eugene Kiely reported, “Although Trump described the cost as ‘tremendous,’ RAND estimated that providing transition-related health care would increase the military’s health care costs for active-duty members ‘by between $2.4 million and $8.4 million annually.’ That represents an increase of no more than 0.13 percent of the $6.27 billion spent on the health of active-duty members in fiscal 2014.”
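The percentage in Kiely's report can be checked with a few lines of arithmetic. The dollar figures below are the ones quoted above; the variable names are illustrative.

```python
# Figures quoted in the FactCheck.org report above (U.S. dollars).
low_estimate = 2.4e6          # RAND lower bound, added annual cost
high_estimate = 8.4e6         # RAND upper bound, added annual cost
active_duty_health = 6.27e9   # fiscal 2014 active-duty health spending

# Express each estimate as a percentage of total active-duty health spending.
low_share = low_estimate / active_duty_health * 100
high_share = high_estimate / active_duty_health * 100
print(f"{low_share:.2f}% to {high_share:.2f}%")
```

The upper bound works out to roughly 0.13 percent, which matches the “no more than 0.13 percent” figure in the report.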



Fact-check: Nixon held meetings with heads of state without an American interpreter (true)

Speaking on “The Rachel Maddow Show,” Ian Bremmer, president of the Eurasia Group, said:  “Apparently, President Nixon used to do it because he felt, didn’t really trust the State Department, at that point, providing the translators and didn’t necessarily want information getting out, leaking, that he would want to keep private.”

“True,” wrote Joshua Gillin for PolitiFact: “Presidential historians, historical accounts and Nixon’s own memoir show this was the case. But it’s notable that even in the example most comparable to Trump’s meeting with Putin, when Nixon used only a Soviet translator during two meetings with Brezhnev, official records of the meeting exist.”



Fact-check: Allowing insurers to sell plans across state lines will mean premiums go down 60-70% (no evidence)

Not long before the Senate took up health care reform, President Donald Trump said, “We’re putting it [allowing insurers to sell plans across state lines] in a popular bill, and that will come. And that will come, and your premiums will be down 60 and 70 percent.”

FactCheck.org’s Lori Robertson reported the “National Association of Insurance Commissioners — a support organization established by the country’s state insurance regulators — said the idea that cross-state sales would bring about lower premiums was a ‘myth.’”



Fact-check: When the price for oil goes up, it goes up, and never goes down (false)

In an interview Sunday about the new Democratic Party national agenda, Senate Minority Leader Chuck Schumer, D., N.Y., said, “We have these huge companies buying up other big companies. It hurts workers and it hurts prices. The old Adam Smith idea of competition, it’s gone. So people hate it when their cable bills go up, their airline fees. They know that gas prices are sticky. You know … when the price for oil goes up on the markets, it goes right up, but it never goes down.”

For PolitiFact, Louis Jacobson reported, “This comment takes a well-known phenomenon and exaggerates it beyond recognition. While experts agree that prices tend to go up quickly after a market shock but usually come down more slowly once the shock is resolved, this phenomenon only occurs on a short-term basis – a couple of weeks in most cases.”

To receive the TV News Archive’s email newsletter, subscribe here.

Posted in Announcements, News, Television Archive | 1 Comment

You’re Invited to a Community Screening of PBS series, AMERICAN EPIC: Sunday July 30 & Aug 6

In celebration of the launch of the “Great 78 Project,” the Internet Archive is sponsoring a Community Screening of the PBS documentary series “American Epic,” an inside look at one of the greatest-ever untold stories: how the ordinary people of America were given the opportunity to make 78 records for the first time.

“Without the recording lathe, Willie Nelson would have never heard the Carter Family sing. Neither would Merle Haggard or Johnny Cash. These portable machines toured the country in the 1920s, visiting rural communities like Poor Valley, West Virginia, and introducing musicians like the Carter Family to new audiences. This remarkable technology forever changed how people discover and share music, yet it was almost lost to history until music legend T Bone Burnett and a few friends decided to bring it back.” Charlie Locke – WIRED

The program will be introduced by Brewster Kahle of the Internet Archive.

Please RSVP on our free Eventbrite page.

Date: Sunday July 30th  – “The Big Bang” (:54 min) & “Blood and Soil” (:54 min)

Date: Sunday August 6th – “Out of the Many, the One” (1:24 min) & “Sessions” (1:57 min)

Time: Doors Open at 6:30 pm – Screening(s) at 7:00 pm

Cost: FREE and open to the public

Where: Internet Archive Headquarters, 300 Funston Avenue, San Francisco, CA

“American Epic” Teaser: https://youtu.be/jcbATyomETw

Posted in 78rpm, Announcements, Event | 5 Comments

Internet Archive Artist in Residence Exhibition — August 5–26

By Amir Esfahani

Ever Gold [Projects] is pleased to present The Internet Archive’s 2017 Artist in Residence Exhibition, an exhibition conceived in collaboration with the Internet Archive presenting the culmination of the first year of the Internet Archive’s visual arts residency program, featuring work by artists Laura Hyunjhee Kim, Jeremiah Jenkins, and Jenny Odell.

The Internet Archive visual arts residency is organized by Amir Saber Esfahani and is designed to connect emerging and mid-career artists with the archive’s collections and to show what is possible when open access to information meets the arts. The residency lasts one year, during which each artist draws on the archive’s collections in their own practice to develop a body of work that culminates in an exhibition.

During the residency Kim, Jenkins, and Odell worked with specific aspects of the Internet Archive, both at its Bay Area facilities and remotely in their studios, producing multi-media responses that employ various new media as well as more traditional materials and practices.

Public Programming: Saturday, August 5th, 4-5pm. Brewster Kahle, Founder & Digital Librarian, Internet Archive, in conversation with Laura Hyunjhee Kim and Jeremiah Jenkins. Moderated by Andrew McClintock, Owner/Director of Ever Gold [Projects].
Opening Reception: Saturday, August 5th, 5-8pm
Location: Ever Gold [Projects], 1275 Minnesota St
Exhibit Dates: Aug 5–26, 2017

Jenny Odell: “For my projects, I’m extracting “specimens” from 1980s Byte magazines and animation demo reels—specimens being objects or scenes that are intentionally or unintentionally surreal. These collected and isolated images inadvertently speak volumes about some of the stranger and more sinister aspects that technology has come to embody.”

Jeremiah Jenkins: “Browser History is a project about preserving the Internet for the very distant future. I will be transferring webpages from the Internet Archive and elsewhere onto clay tablets by creating stamps with the text and images, then pressing them into wet clay. After being fired, the slabs will be hidden in caves, buried strategically, and submerged in the sea to await discovery in the distant future. The oldest known clay tablet is a little over 4,000 years old. The cave paintings in Lascaux are around 14,000 years old. The oldest known petroglyphs are near 46,000 years old. It’s conceivable that these fired clay tablets could last for 50,000 years or more. The tablets will be pages from websites that document trade, lifestyle, art, government, and other aspects of our society that are similar to the kinds of information we have about ancient civilizations.”

Laura Hyunjhee Kim: “The Hyper Future Wave Machine is a project that positions the years 2017 and beyond as a speculative future based on audiovisual ephemera published in the years 1987 to 1991. Born in the late ’80s, I wanted to explore the technological advancements and innovations that were popularized during the nascent years of the World Wide Web. Utilizing the Internet Archive as a time machine, I searched through the archived commercial and educational media representations of networked technology, personalized computers, and information systems. Often hyperbolic with a heightened emphasis on speed, power, and the future, slogans from those past years are still relevant and surface aspirations that continue to introduce the “next big thing” to the present generation: “REALIZE THE FUTURE, YOU ALREADY LIVE IN.” As the title of the project suggests, the work revolves around an imaginary media access system, namely the Hyper Future Wave Machine (HFWM). Described as a three-way-cross-hybrid existing/nonexistent/and-yet-to-exist metaphysical machine, the concept came from contemplating data portability and the trajectory of human-machine interface technology that seamlessly minimizes physical interaction. From buttons to touchscreens to speech, would the next ubiquitously applied interface operate using some sort of nonverbal neural command?”

Posted in Announcements, Event | 1 Comment

TV News Record: adventures with Face-O-Matic

A weekly roundup of what’s happening at the TV News Archive, by Katie Dahl and Nancy Watzman.

This week we bring you adventures with Face-O-Matic; fact-checks on President Donald Trump’s legislative record and on health care reform; and we follow the TV on use of the terms “lies” and “lying.”

Here’s a face, there’s a face, everywhere a face…

Face-O-Matic, our new experimental Slack app that finds faces of political leaders on major national cable networks, has given us a whole new perspective on how imagery is used in news production. Sure, Face-O-Matic picks up clips of President Donald Trump, Senate Majority Leader Mitch McConnell, R., Ky., and others speaking, whether on the floor, at press conferences, or at a luncheon.

However, often these elected officials’ faces are used to illustrate a point a news anchor is making, or in footage without audio, sometimes as a floating head somewhere on the screen, or as part of a tweet. Face-O-Matic even picks up faces in a crowd.

Face-O-Matic can help find frequently re-aired clips.

Face-O-Matic can find even images that are only briefly displayed on the screen.

Face-O-Matic finds both still and video of Trump in a single clip.

Please take Face-O-Matic for a spin and share your feedback with us at tvnews@archive.org. This blog post explains how it fits into our overall plan to turn TV news into data. To install it, for now you’ll need to ask your Slack team administrator or owner to set it up. The administrator can click on the button below to get started. Visit Slack to learn how to set up or join a Slack team. Questions? Contact Dan Schultz, dan.schultz@archive.org.

Fact-check: Trump has signed more bills than any president ever (wrong)

News cameras captured Trump saying, “We’ve signed more bills — and I’m talking about through the legislature — than any president ever.” (A moment later, he commented that he doesn’t “like Pinocchios,” referring to The Washington Post’s Fact Checker rating system.)

Glenn Kessler, reporting for that same fact-checking site, explained why Trump is not, in fact, besting his predecessors in the White House when it comes to bill signing. But he refrains from stating how many Pinocchios the president has earned: “Tempted as we are to give the president Pinocchios for his statement, he seemed to be speaking off the cuff and was operating on outdated information from his first 100 days. We don’t play gotcha here at The Fact Checker, and we appreciate that he added a caveat. He certainly appeared to pause for a moment and wonder if he was right. For Trump, that’s a step in the right direction…But he’s way off the mark and actually falling behind in legislative output.”



Fact-check: “bushel” of Pence claims on health care reform (range from “twists the facts” to “false”)

Also writing for The Washington Post’s Fact Checker, Michelle Ye Hee Lee checked a number of statements Vice President Mike Pence made about the Senate health care reform bill during an appearance at the National Governors Association.

These included, for example, the claim, “I know Governor Kasich isn’t with us, but I suspect that he’s very troubled to know that in Ohio alone, nearly 60,000 disabled citizens are stuck on waiting lists, leaving them without the care they need for months or even years.”

Lee wrote that this claim is false: “[T]here’s no evidence the wait lists are tied to Medicaid expansion. We previously gave four Pinocchios to a similar claim….The expansion and wait list populations are separate, and expansion doesn’t necessarily affect the wait list population….Whether people move off the wait list depends on many factors, such as how urgent their needs are, how long they’ll need services and whether the states have money to pay for them. Many times, a slot opens up only if someone receiving services moves out of the state or dies.”



Follow the TV

There’s been much controversy in news gathering circles about when, whether, and how to invoke the word “lie” when reporting on public officials. One of our archivists, Robin Chin, has noticed a number of prominent uses of the term by commentators in recent TV news coverage.

For example, here’s Shepard Smith on Fox News on July 14 saying, “Jared Kushner filled out his form. I think it’s an F-86 saying who he met with and what he had done… He went back and added 100 names and places. None of these people made it… Why is it lie after lie after lie? … My grandmother used to say when first we practice to — oh, what a tangled web we weave when first we practice to deceive. The deception, Chris, is mind boggling.”

And here’s Tom Brokaw on July 16 on NBC’s “Meet the Press,” saying: “Certainly there are atmospherics here that call to mind Watergate, the kind of denial of the obvious and the petty lying that is going on. But at the same time, Watergate, I like to think, was there by itself and this president is entangling himself in that kind of discussion that we’re having here today when it’s not in the interest of anyone, most of all this country, when we have so many issues before us. It’s got to get cleaned up.”

On July 17, on “CNN Tonight With Don Lemon,” here is David Gergen saying: “Other presidents succeed at this by just being straightforward about the facts. And it’s gone on for so long and so duplicitous and so much double speak that you begin to wonder, this is quite intentional. This may be quite intentional. You create a fog bank of lies and uncertainties and vagueness and create so many different details that people just sort of say, the hell with that, I don’t want to watch this… My sense is that a lot of Americans are starting to tune out…”

Search captions for terms you are interested in at the TV News Archive. For trends, try the Television Explorer, built by data scientist Kalev Leetaru, and powered by TV News Archive data, which can provide quick visualizations of terms broken down by network.

To receive the TV News Archive’s email newsletter, subscribe here.

Posted in News, Television Archive | 1 Comment

Internet Archive TV News Lab: Introducing Face-O-Matic, experimental Slack alert system tracking Trump & congressional leaders on TV news

Working with Matroid, a California-based startup specializing in identifying people and objects in images and video, the Internet Archive’s TV News Archive today releases Face-O-Matic, an experimental public service that alerts users via a Slack app whenever the faces of President Donald Trump and congressional leaders appear on major cable TV news channels: CNN, Fox News, MSNBC, and the BBC. The alerts include hyperlinks to the actual TV news footage on the TV News Archive website, where the viewer can see the appearances in the context of the entire broadcast, including what comes before and after.

The new public Slack app, which can be installed on any Slack account by the team’s administrator, marks a milestone in our experiments using machine learning to create prototypes of ways to turn our public, free, searchable library of 1.3 million+ TV news broadcasts into data that will be useful for journalists, researchers, and the public in understanding the messages that bombard all of us day-to-day and even minute-to-minute on TV news broadcasts. This information could provide a way to quantify “face time”–literally–on TV news broadcasts. Researchers could use it to show how TV material is recycled online and on social media, and how editorial decisions by networks help set the terms of public debate.
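As a rough sketch of what quantifying “face time” could look like, the example below totals detection durations per person. The record layout (`person`, plus `start` and `end` in seconds) and the sample values are hypothetical; they are not Face-O-Matic's actual output format.

```python
from collections import defaultdict

# Hypothetical detection records: who was seen, and from when to when (seconds).
detections = [
    {"person": "Donald Trump", "start": 10.0, "end": 34.0},
    {"person": "Charles Schumer", "start": 40.0, "end": 41.0},
    {"person": "Mitch McConnell", "start": 40.0, "end": 41.0},
    {"person": "Donald Trump", "start": 90.0, "end": 95.0},
]


def face_time(records):
    """Sum on-screen seconds per person across all detection records."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["person"]] += rec["end"] - rec["start"]
    return dict(totals)


totals = face_time(detections)
# List people from most to least on-screen time.
for person, seconds in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{person}: {seconds:.0f} s")
```

This simple sum assumes that detections of the same person do not overlap in time; overlapping records would need to be merged first to avoid double counting.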

If you want Face-O-Matic to post to a channel on your team’s Slack, ask an administrator or owner to set it up. The administrator can click on the button below to get started. Visit Slack to learn how to set up or join a Slack team. Questions? Contact Dan Schultz, dan.schultz@archive.org.

Add to Slack

To begin, Dan Schultz, senior creative technologist for the TV News Archive, trained Matroid’s facial detection system to recognize the president; Senate Majority Leader Mitch McConnell, R., Ky., and Senate Minority Leader Charles Schumer, D., N.Y.; and House Speaker Paul Ryan, R., Wis., and House Minority Leader Nancy Pelosi, D., Calif. All are high-ranking elected officials who make news and appear often on TV screens. The alerts appear in a constantly updating stream as soon as the TV shows appear in the TV News Archive.

For example, on July 15, 2017, Face-O-Matic detected all five elected officials in an airing of MSNBC Live.

As can be seen, the detections in this case last as little as a second – for example, this flash of Schumer’s and McConnell’s faces alongside each other is a match for both politicians. The moment is from a promotion for “Morning Joe,” the MSNBC show that made headlines in late June when co-hosts Mika Brzezinski and Joe Scarborough were the targets of angry tweets from the president.  

The longest detected segment in this example is 24 seconds and features Trump saying, “we are very very close to ending this health care nightmare. We are so close. It’s a common sense approach that restores the sacred doctor-patient relationship. And you’re going to have great health care at a lower price.”

Why detect faces of public officials?

First, our concentration on public officials is purposeful; in experimenting with this technology, we strive to respect individual privacy and harvest only information for which there is a compelling public interest, such as the role of elected officials in public life. The TV News Archive is committed to these principles developed by leading artificial intelligence researchers, ethicists, and others at a January 2017 conference organized by the Future of Life Institute.

Second, developing the technology to recognize the faces of public officials in the TV News Archive, and turning those detections into data, opens a whole new dimension for journalists and researchers exploring patterns and trends in how news is reported.

For example, it will eventually be possible to trace the origin of specific video clips found online; to determine how often the president’s face appears on TV networks and programs compared to other public officials; to see how often certain video clips are repeated over time; to determine the gender ratio of people appearing on TV news; and more. It will become useful not just in explaining how media messages travel, but also as a way to counter misinformation, by providing a path to verify source material that appears on TV news.

This capability adds to the toolbox we’ve already begun with the Duplitron, the open source audio fingerprinting tool developed by Schultz that the TV News Archive used to track political ads and debate coverage in the 2016 elections for the Political TV Ad Archive. The Duplitron is also the basis for The Glorious ContextuBot, which was recently awarded a Knight Prototype Fund grant.

All of these lines of exploration should help journalists and researchers, who currently can conduct such analyses only by watching thousands of hours of television and hand-coding it or by using an expensive private service. Because we are a public library, we make such information available free of charge.

What’s next?

The TV News Archive will continue to work with partners such as Matroid to develop methods of extracting metadata from the TV News Archive and make it available to the public. We will develop ways to deliver such experimental data in structured formats such as JSON and CSV to augment Face-O-Matic’s Slack alert stream. Such data could help researchers conduct analyses of the different amounts of “face-time” public officials enjoy on TV news.
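
As a rough illustration of the kind of “face-time” analysis such structured data could support, the sketch below sums detected on-screen seconds per person. The record layout (`name`, `start`, and `end`, in seconds) and the sample values are purely hypothetical; no schema for this data has been published.

```python
import json
from collections import defaultdict

# Hypothetical detection records. The field names are assumptions made for
# this example, not a published TV News Archive schema.
SAMPLE_JSON = """
[
  {"name": "Donald Trump", "start": 100.0, "end": 124.0},
  {"name": "Chuck Schumer", "start": 300.0, "end": 301.0},
  {"name": "Mitch McConnell", "start": 300.0, "end": 301.0},
  {"name": "Donald Trump", "start": 500.0, "end": 503.5}
]
"""


def face_time_seconds(records):
    """Return the total detected seconds on screen for each person."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["name"]] += rec["end"] - rec["start"]
    return dict(totals)


totals = face_time_seconds(json.loads(SAMPLE_JSON))

# Print people from most to least detected screen time.
for name, seconds in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {seconds:.1f} s")
```

The same function would work on rows read from a CSV file, provided the start and end values are converted to numbers first.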

Schultz also hopes to develop ways to augment the facial detection data with closed captioning, for example with OpenedCaptions, another open source tool he created that provides a constant stream of data from TV for any service set up to listen. This will make it simpler to search such data sets to find a particular moment that a researcher is looking for. (Accurate captioning presents its own technological challenges: see this post on Hyper.Audio’s work.)

Beyond this experimental facial detection, we have big plans for the future. We are planning to make more than a million hours of TV news available to researchers from both private and public institutions via digital public library branches of the Internet Archive’s TV News Archive. These branches would be housed in computing environments where networked computers provide the processing power needed to analyze large amounts of data.

Researchers will be able to conduct their own experiments using machine learning to extract metadata from TV news. Such metadata could include, for example, speaker identification–a way to identify not just when a speaker appears on a screen, but when she or he is talking. Researchers could create ways to do complex topic analysis, making it possible to trace how certain themes and talking points travel across the TV news universe and perhaps beyond. Metadata generated through these experiments would then be used to enrich the TV News Archive, so that any member of the public could do increasingly sophisticated searches.

Feedback! We want it 

We are eager to hear feedback from people using the Face-O-Matic Slack app.

  • Is the Face-O-Matic Slack app useful? What would make it more useful?
  • Would a structured data stream delivered via JSON, csv, and/or other means be helpful? What sort of information would you like to be included in such a data set?
  • Who is it important for us to track?
  • What else?

Please reach us by email at tvnews@archive.org or via Twitter @tvnewsarchive. Also please consider signing up for our weekly TV News Archive newsletter. Or, comment or make contributions over here, where Schultz is documenting his progress; all the code developed is open source. (One observer already provided images for a training set to track Mario, the cartoon character.)

The weeds

The TV News Archive, our collection of 1.3 million+ TV news broadcasts dating back to 2009, is already searchable through closed captions.

But captions don’t always get you everything you want. If you search, for example, on the words “Donald Trump” you get back a hodge-podge of clips in which Trump is speaking and clips where reporters are talking about Trump. His image may not appear on the screen at all. The same is true for “Barack Obama,” “Mitch McConnell,” “Chuck Schumer,” or any name.

Search “Barack Obama” and the result is a hodge-podge of clips.

Developing the ability to search the TV News Archive by recognizing the faces of public officials requires applying algorithms such as those developed by Matroid. In the future we hope to work with a variety of firms and researchers; for example, Schultz is also working on a separate facial detection experiment with the firm Datmo.

Facial detection requires a number of related steps: first, training the system to recognize where a face appears on a TV screen; second, extracting that image so it can be analyzed; and third, comparing that face to a set known to be a particular person to discover matches.

In general, facial recognition algorithms tend to rely on the work of FaceNet, described in this 2015 paper, in which researchers describe creating a way of “mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity.” In other words, it’s a way of turning a face into a pattern of data, and it’s sophisticated enough to describe faces from various vantage points – straight ahead, three-quarter view, side view, etc. To develop Face-O-Matic, TV News Archive staff collected public images of elected officials from different vantage points to use as training sets for the algorithm.
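
To make the idea of distances that “directly correspond to a measure of face similarity” more concrete, here is a minimal sketch of the comparison step, assuming each face has already been turned into a fixed-length vector of numbers. The three-number vectors, the placeholder names, and the 0.6 threshold are invented for illustration; real embeddings have many more dimensions, and this is not the code used by Matroid or Face-O-Matic.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_match(face, known_faces, threshold=0.6):
    """Return the closest known name, or None if no one is within threshold.

    `threshold` is an arbitrary illustrative value, not a tuned setting.
    """
    name, dist = min(
        ((n, euclidean_distance(face, v)) for n, v in known_faces.items()),
        key=lambda pair: pair[1],
    )
    return name if dist <= threshold else None


# Made-up reference vectors standing in for faces learned from training images.
KNOWN = {
    "Person A": [0.1, 0.9, 0.2],
    "Person B": [0.8, 0.1, 0.5],
}

print(best_match([0.15, 0.85, 0.25], KNOWN))  # close to Person A's vector
print(best_match([5.0, 5.0, 5.0], KNOWN))     # far from everyone
```

The threshold is what keeps an unknown face from being labeled as whichever known person happens to be nearest.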

The Face-O-Matic Slack app is meant to be a demonstration project that lets the TV News Archive experiment in two ways: first, by creating pipelines that run the TV News Archive video streams through Artificial Intelligence models to explore whether the resulting information is useful; second, by using a new way to distribute TV News Archive information through the popular Slack service, used widely in journalistic and academic settings.

We know some ways it can be improved, but we also want to hear from you, the user, with your ideas. In the words of Thomas the Tank Engine, we aspire to be a “really useful engine.”

Face-O-Matic on GitHub

Follow TV News Archive progress in recognizing faces on TV on the following GitHub pages:

Tvarchive-faceomatic. The Face-o-Matic 2000 finds known faces on TV.

Tvarchive-ai_suite. A suite of tools for exploring AI research against video

This post is part of a blog series, TV News Lab, in which we demonstrate how the Internet Archive is partnering with technology, journalism, and academic organizations to experiment with and improve the TV News Archive, our free, public, online library of TV news shows.

Posted in Announcements, News, Television Archive | Comments Off on Internet Archive TV News Lab: Introducing Face-O-Matic, experimental Slack alert system tracking Trump & congressional leaders on TV news

IMLS Grant to Advance Web Archiving in Public Libraries

We are excited to announce that the Institute of Museum and Library Services (IMLS) has recently awarded our Archive-It service a Laura Bush 21st Century Librarian grant from its Continuing Education in Curating Collections program for the project Community Webs: Empowering Public Librarians to Create Community History Web Archives.

Working with partners from Queens Public Library, Cleveland Public Library, and San Francisco Public Library, and with OCLC’s WebJunction, which offers education and training to public libraries nationwide, the “Community Webs” project will provide training, cohort support, and services for a group of librarians at 15 different public libraries to develop expertise in creating collections of historically valuable web materials documenting their local communities. Project outputs will include over 30 terabytes of community history web archives and a suite of open educational resources, from guides to videos, for use by any librarian, archivist, or heritage professional working to preserve collections of local history composed of online materials.

We are now accepting applications from public libraries to participate in the program! Please help us spread the word about this opportunity to the entire public library community. You can also visit the program’s webpage for more information, and the project’s grant materials are available through the IMLS award page.

Curating web archives documenting the lives of their patrons offers public librarians a unique opportunity to position themselves as the natural stewards of web-published local history and solidifies their role as information custodians and community anchors in the era of the web. We owe a debt of thanks to IMLS for supporting innovative tools and training for librarians and look forward to working with our public library friends and colleagues to advance web archiving within their profession and for the benefit of their local communities.

Posted in News | 1 Comment

Film Screening: Lost Landscapes of LA on August 7

By Rick Prelinger

Lost Landscapes of Los Angeles (2016, 83 minutes) is an experimental documentary tracing the changing city of Los Angeles (1920s-1960s), showing how its landscape expresses an almost infinite collection of mythologies. Made from home movies and studio-produced “process plates” (background images of the city shot by studio cinematographers for rear projection in feature films), Lost Landscapes depicts places, people, work and daily life during a period of rapid urban development. While audience members are encouraged to comment, discuss and ask questions during the screening of this silent film, it is also a contemplative film that shows the life and growth of the U.S.’s preeminent Western metropolis as the sum of countless individual acts.

Lost Landscapes of Los Angeles is the latest of Rick Prelinger’s “urban history film events,” featuring rediscovered and largely unseen archival film footage arranged into feature-length programs. Unlike most screenings, the audience makes the soundtrack: viewers are encouraged to identify places, people and events; ask questions; and engage with fellow audience members. While the films show Los Angeles as it was, the event encourages viewers to think about (and share) their ideas for the city’s future. What kind of a city do we want to live in?

Rick Prelinger is an archivist, filmmaker, and educator. He teaches at UC Santa Cruz and is a board member of Internet Archive. His films made from archival material have played at festivals, museums, theaters, and educational institutions around the world. Lost Landscapes of San Francisco (11 episodes, 2006-2016) plays every autumn in San Francisco. He has also made urban history films in Oakland and Detroit, and is currently producing a New York film for an autumn premiere. He thanks Internet Archive and its staff for making this film possible.

Get Tickets Here

Monday, August 7th, 2017
6:30 pm Reception
7:30 pm Interactive Film Program

Internet Archive
300 Funston Ave.
San Francisco, CA 94118

Posted in Announcements, Event, Movie Archive | 1 Comment