Working with Matroid, a California-based startup specializing in identifying people and objects in images and video, the Internet Archive’s TV News Archive today releases Face-O-Matic, an experimental public service that alerts users via a Slack app whenever the faces of President Donald Trump and congressional leaders appear on major cable TV news channels: CNN, Fox News, MSNBC, and the BBC. The alerts include hyperlinks to the actual TV news footage on the TV News Archive website, where the viewer can see each appearance in the context of the entire broadcast, including what comes before and after.
The new public Slack app, which can be installed on any Slack account by the team’s administrator, marks a milestone in our machine learning experiments. We are prototyping ways to turn our public, free, searchable library of 1.3 million+ TV news broadcasts into data that will help journalists, researchers, and the public understand the messages that bombard all of us day-to-day, and even minute-to-minute, on TV news broadcasts. This information could provide a way to quantify “face time,” quite literally, on TV news broadcasts. Researchers could use it to show how TV material is recycled online and on social media, and how editorial decisions by networks help set the terms of public debate.
If you want Face-O-Matic to post to a channel on your team’s Slack, ask an administrator or owner to set it up. The administrator can click on the button below to get started. Visit Slack to learn how to set up or join a Slack team. Questions? Contact Dan Schultz, firstname.lastname@example.org.
To begin, Dan Schultz, senior creative technologist for the TV News Archive, trained Matroid’s facial detection system to recognize the president; Senate Majority Leader Mitch McConnell, R-Ky., and Senate Minority Leader Charles Schumer, D-N.Y.; and House Speaker Paul Ryan, R-Wis., and House Minority Leader Nancy Pelosi, D-Calif. All are high-ranking elected officials who make news and appear often on TV screens. The alerts appear in a constantly updating stream as soon as the TV shows appear in the TV News Archive.
For example, on July 15, 2017, Face-O-Matic detected all five elected officials in an airing of MSNBC Live.
As can be seen, the detections in this case last as little as a second – for example, this flash of Schumer’s and McConnell’s faces alongside each other is a match for both politicians. The moment is from a promotion for “Morning Joe,” the MSNBC show that made headlines in late June when co-hosts Mika Brzezinski and Joe Scarborough were the targets of angry tweets from the president.
The longest detected segment in this example is 24 seconds long and features Trump saying, “we are very very close to ending this health care nightmare. We are so close. It’s a common sense approach that restores the sacred doctor-patient relationship. And you’re going to have great health care at a lower price.”
Why detect faces of public officials?
First, our concentration on public officials is purposeful; in experimenting with this technology, we strive to respect individual privacy and harvest only information for which there is a compelling public interest, such as the role of elected officials in public life. The TV News Archive is committed to these principles developed by leading artificial intelligence researchers, ethicists, and others at a January 2017 conference organized by the Future of Life Institute.
Second, developing the technology to recognize faces of public officials contained within the TV News Archive and turning it into data opens a whole new dimension for journalists and researchers to explore for patterns and trends in how news is reported.
For example, it will eventually be possible to trace the origin of specific video clips found online; to determine how often the president’s face appears on TV networks and programs compared to other public officials; to see how often certain video clips are repeated over time; to determine the gender ratio of people appearing on TV news; and more. It will become useful not just in explaining how media messages travel, but also as a way to counter misinformation, by providing a path to verify source material that appears on TV news.
This capability adds to the toolbox we’ve already begun with the Duplitron, the open source audio fingerprinting tool developed by Schultz that the TV News Archive used to track political ads and debate coverage in the 2016 elections for the Political TV Ad Archive. The Duplitron is also the basis for The Glorious ContextuBot, which was recently awarded a Knight Prototype Fund grant.
All of these lines of exploration should help journalists and researchers who currently can only conduct such analyses by watching thousands of hours of television and hand coding it or by using an expensive private service. Because we are a public library, we make such information available free of charge.
The TV News Archive will continue to work with partners such as Matroid to develop methods of extracting metadata from the TV News Archive and make it available to the public. We will develop ways to deliver such experimental data in structured formats (such as JSON and CSV) to augment Face-O-Matic’s Slack alert stream. Such data could help researchers conduct analyses of the different amounts of “face-time” public officials enjoy on TV news.
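To make the idea of a structured “face-time” data set concrete, here is a minimal sketch of how a researcher might total on-screen seconds per person from JSON records. No schema has been published, so the field names (person, channel, start, end) and the sample values are invented for illustration only.

```python
import json
from collections import defaultdict

# Hypothetical records: field names and values are illustrative only,
# not a published Face-O-Matic schema or real detection data.
SAMPLE = """
[
  {"person": "Donald Trump", "channel": "MSNBC", "start": 100.0, "end": 124.0},
  {"person": "Charles Schumer", "channel": "MSNBC", "start": 300.0, "end": 301.0},
  {"person": "Mitch McConnell", "channel": "MSNBC", "start": 300.0, "end": 301.0},
  {"person": "Donald Trump", "channel": "CNN", "start": 50.0, "end": 62.5}
]
"""


def face_time_by_person(records):
    """Sum the duration in seconds of every detection, grouped by person."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["person"]] += rec["end"] - rec["start"]
    return dict(totals)


totals = face_time_by_person(json.loads(SAMPLE))

# Print the people with the most face time first.
for person, seconds in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{person}: {seconds:.1f} s")
```

The same grouping could be keyed on channel or program instead of person to compare networks.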
Schultz also hopes to develop ways to augment the facial detection data with closed captioning, for example by using OpenedCaptions, another open source tool he created that provides a constant stream of data from TV for any service set up to listen. This will make it simpler to search such data sets to find a particular moment that a researcher is looking for. (Accurate captioning presents its own technological challenges: see this post on Hyper.Audio’s work.)
Beyond this experimental facial detection, we have big plans for the future. We are planning to make more than a million hours of TV news available to researchers from both private and public institutions via digital public library branches of the Internet Archive’s TV News Archive. These branches would be housed in computing environments, where networked computers provide the processing power needed to analyze large amounts of data.
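One way to picture combining the two data sources is to attach to each face detection the caption lines that overlap it in time. The sketch below assumes both detections and captions carry start and end times in seconds. That is an assumption made for illustration; it does not describe the actual format of OpenedCaptions or Face-O-Matic output.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True when two half-open time intervals [start, end) share any time."""
    return a_start < b_end and b_start < a_end


def captions_during(detection, captions):
    """Return the text of every caption that overlaps the detection in time."""
    return [
        c["text"]
        for c in captions
        if overlaps(detection["start"], detection["end"], c["start"], c["end"])
    ]


# Hypothetical sample data, invented for illustration.
detection = {"person": "example official", "start": 10.0, "end": 34.0}
captions = [
    {"start": 5.0, "end": 9.0, "text": "before the face appears"},
    {"start": 9.0, "end": 12.0, "text": "as the face appears"},
    {"start": 30.0, "end": 40.0, "text": "as the face leaves"},
    {"start": 34.0, "end": 36.0, "text": "after the face is gone"},
]

matched = captions_during(detection, captions)
print(matched)
```

With caption text attached to each detection, a search could require both that a person is on screen and that particular words are being spoken.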
Researchers will be able to conduct their own experiments using machine learning to extract metadata from TV news. Such metadata could include, for example, speaker identification–a way to identify not just when a speaker appears on a screen, but when she or he is talking. Researchers could create ways to do complex topic analysis, making it possible to trace how certain themes and talking points travel across the TV news universe and perhaps beyond. Metadata generated through these experiments would then be used to enrich the TV News Archive, so that any member of the public could do increasingly sophisticated searches.
Feedback! We want it
We are eager to hear from people using the Face-O-Matic Slack app and get your feedback.
- Is the Face-O-Matic Slack app useful? What would make it more useful?
- Would a structured data stream delivered via JSON, CSV, and/or other means be helpful? What sort of information would you like to be included in such a data set?
- Who is it important for us to track?
- What else?
Please reach us by email at: email@example.com, or via Twitter @tvnewsarchive. Also please consider signing up for our weekly TV News Archive newsletter. Or, comment or make contributions over here, where Schultz is documenting his progress; all the code developed is open source. (One observer already provided images for a training set to track Mario, the cartoon character.)
The TV News Archive, our collection of 1.3 million+ TV news broadcasts dating back to 2009, is already searchable through closed captions.
But captions don’t always get you everything you want. If you search, for example, on the words “Donald Trump” you get back a hodge-podge of clips in which Trump is speaking and clips where reporters are talking about Trump. His image may not appear on the screen at all. The same is true for “Barack Obama,” “Mitch McConnell,” “Chuck Schumer,” or any name.
Search “Barack Obama” and the result is a hodge-podge of clips.
Developing the ability to search the TV News Archive by recognizing the faces of public officials requires applying algorithms such as those developed by Matroid. In the future we hope to work with a variety of firms and researchers; for example, Schultz is also working on a separate facial detection experiment with the firm Datmo.
Facial detection requires a number of related steps: first, training the system to recognize where a face appears on a TV screen; second, extracting that image so it can be analyzed; and third, comparing that face to a set known to be a particular person to discover matches.
In general, facial recognition algorithms tend to build on FaceNet, described in this 2015 paper, in which researchers describe creating a way of “mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity.” In other words, it’s a way of turning a face into a pattern of data, and it’s sophisticated enough to describe faces from various vantage points – straight ahead, three-quarter view, side view, etc. To develop Face-O-Matic, TV News Archive staff collected public images of elected officials from different vantage points to use as training sets for the algorithm.
The Face-O-Matic Slack app is meant to be a demonstration project that lets the TV News Archive experiment in two ways: first, by creating pipelines that run the TV News Archive video streams through artificial intelligence models to explore whether the resulting information is useful; second, by distributing TV News Archive information in a new way, through the popular Slack service, which is widely used in journalistic and academic settings.
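The comparison step can be illustrated with a small sketch: once faces are represented as vectors, matching comes down to measuring Euclidean distance against reference vectors for known people. This is not Matroid’s implementation. The three-number vectors, the person labels, and the threshold below are made-up toy values; real systems use much longer vectors produced by a trained model.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_match(embedding, gallery, threshold):
    """Find the closest reference face in the gallery.

    Returns (name, distance) when the closest reference is within the
    threshold, otherwise (None, distance).
    """
    best_name, best_dist = None, float("inf")
    for name, references in gallery.items():
        for ref in references:
            dist = euclidean(embedding, ref)
            if dist < best_dist:
                best_name, best_dist = name, dist
    if best_dist <= threshold:
        return best_name, best_dist
    return None, best_dist


# Toy gallery: several reference vectors per person stand in for
# training images taken from different vantage points.
gallery = {
    "person_a": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]],
    "person_b": [[1.0, 0.0, 0.0]],
}

name, dist = best_match([0.1, 1.0, 0.0], gallery, threshold=0.5)
print(name, round(dist, 3))

unknown, far = best_match([0.5, 0.5, 0.9], gallery, threshold=0.5)
print(unknown, round(far, 3))
```

The threshold controls the trade-off between missed appearances and false matches, which is one reason brief, one-second detections deserve a second look before being counted.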
We know some ways it can be improved, but we also want to hear from you, the user, with your ideas. In the words of Thomas the Tank Engine, we aspire to be a “really useful engine.”
Face-O-Matic on GitHub
Follow TV News Archive progress in recognizing faces on TV on the following GitHub pages:
Tvarchive-faceomatic. The Face-o-Matic 2000 finds known faces on TV.
Tvarchive-ai_suite. A suite of tools for exploring AI research against video.
This post is part of a blog series, TV News Lab, in which we demonstrate how the Internet Archive is partnering with technology, journalism, and academic organizations to experiment with and improve the TV News Archive, our free, public, online library of TV news shows.