Today the Internet Archive’s TV News Archive announces a new way to plumb our TV news collections to see how news stories are reported: data feeds for the news that appears as chyrons on the lower thirds of TV screens. Our Third Eye project scans the lower thirds of TV screens, using OCR, or optical character recognition, to turn these fleeting missives into downloadable data ripe for analysis. At launch, Third Eye tracks BBC News, CNN, Fox News, and MSNBC, and contains more than four million chyrons captured in just over two weeks.
Download Third Eye data. API and TSV options available.
Third Eye joins a growing suite of TV News Archive tools that help researchers, journalists, and the public analyze how news is filtered through TV and presented to the public. These include Face-o-Matic, created through a partnership with Matroid, which uses facial recognition to find top political leaders on TV news shows; and Television Explorer, an interface created by data scientist Kalev Leetaru that allows easy searching and visualization of TV News Archive closed captioning. The Political TV Ad Archive used audio fingerprinting to find airings of political ads in the 2016 elections, and the Trump and U.S. Congress archives provide a quick way to see news clips featuring top political figures, alongside associated fact checks by FactCheck.org, PolitiFact, and The Washington Post’s Fact Checker.
Breaking news often appears as chyrons on TV before newscasters begin reporting or video is available, whether the subject is a hurricane or a breaking political story. Which chyrons a TV news network chooses to display often reveals editorial decisions that can demonstrate a particular slant on the news. With Third Eye data, journalists, fact-checkers, and researchers can explore how messages are delivered to the public in near real-time.
Third Eye on Twitter tweets the clearest, most representative chyron from a one-minute period on a particular TV news channel. This can serve as an alert system, showing how TV networks are reporting news.
For example, on September 6, 2017, in the midst of a heavy news day featuring Hurricane Irma, the debate over a deal on immigration, and other stories, cable TV news networks began to show the breaking news that Facebook had turned over to Robert S. Mueller III, the special counsel investigating ties between the Trump campaign and Russia, information about $100,000 in ads purchased by Russian sources during the 2016 elections. Our Third Eye CNN Twitter bot tweeted out this chyron, recorded at 2:38 p.m. Pacific Time.
CNN 2:38pm FACEBOOK: SOLD ADS TO FAKE RUSSIAN ACCOUNTS DURING ELECTION CAMPAIGN
— The Third Eye (@tvThirdEye) September 6, 2017
Here is the corresponding clip as it appears on the TV News Archive.
At 2:51 p.m., MSNBC ran this chyron: “FACEBOOK: WE SOLD POLITICAL ADS DURING ELECTION TO COMPANY LIKELY OPERATED IN RUSSIA.” The corresponding clip is below.
However, our data do not show Fox News running any chyrons on the Facebook ad news that day. To cross-check, we used Television Explorer, a tool for searching TV News Archive closed captions. (Captions differ from chyrons; captions capture what news anchors are actually saying, as opposed to chyrons, which feature text chosen by the TV channel to run at the bottom of the screen.) Television Explorer shows CNN and MSNBC covering the story on September 6, but not Fox News.
The Facebook ad story did, however, make it onto the Fox News website during the 2 p.m. hour, as this search on the Wayback Machine shows.
This is just one example of the way that researchers might use Third Eye chyron data in conjunction with other tools to explore how a particular story is portrayed on TV news. We’d love for others to dig in, explore, and give us feedback on this new public data source.
More on Third Eye data
The work of the Internet Archive’s TV architect Tracey Jaquith, the Third Eye project applies OCR to the “lower thirds” of TV cable news screens to capture the text that appears there. The chyrons are not captions, which provide the text for what people are saying on screen, but rather are narrative display text that accompanies news broadcasts.
Created in real time by TV news editors, chyrons sometimes include misspellings. The OCR process frequently adds further errors where text is not rendered correctly, so some entries may be garbled. To make sense of the noise, Jaquith applies algorithms that choose the most representative chyrons from each channel collected over 60-second increments. This cleaned-up feed is what fuels the Twitter bots that post which chyrons are appearing on TV news screens.
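To give a feel for what "most representative" could mean, here is a minimal Python sketch. It is not the algorithm Third Eye uses, which this post does not describe; it assumes one simple heuristic, namely picking the OCR reading that is most similar, on average, to all the other readings in the same window. The function name `most_representative` and the sample window are made up for illustration.

```python
from difflib import SequenceMatcher


def most_representative(chyrons):
    """Return the reading most similar, on average, to the others in the window.

    Illustrative heuristic only; not the Third Eye project's actual filter.
    """
    # Drop empty readings and surrounding whitespace.
    texts = [c.strip() for c in chyrons if c.strip()]
    if not texts:
        return None
    if len(texts) == 1:
        return texts[0]

    def mean_similarity(i):
        # Average character-level similarity between reading i and every other reading.
        scores = [
            SequenceMatcher(None, texts[i], other).ratio()
            for j, other in enumerate(texts)
            if j != i
        ]
        return sum(scores) / len(scores)

    best = max(range(len(texts)), key=mean_similarity)
    return texts[best]


# Hypothetical one-minute window of raw OCR readings, some with typical OCR slips.
window = [
    "FACEBOOK: SOLD ADS TO FAKE RUSSIAN ACCOUNTS",
    "FACEB00K: S0LD ADS T0 FAKE RUSSIAN ACC0UNTS",
    "FACEBOOK: SOLD ADS TO FAKE RUSSIAN ACCOUNTS",
    "FACEBOOK: SOLD AD5 TO FAKE RUSS1AN ACCOUNTS",
]
print(most_representative(window))
```

In this made-up window the clean reading appears twice, so it agrees with the rest of the window more than either garbled variant does and is the one selected. A real filter would likely also need to handle windows in which the chyron changes partway through.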
We provide options to download this filtered feed and/or the raw feed nearly as soon as it appears on the TV screen. Both may be useful depending on the type of project. In addition, the Twitter feed itself is a good source to see what the filtered feed looks like.
- Chyrons are derived in near real-time from the TV News Archive’s collection of TV news. The constantly updating public collection contains 1.4 million TV news shows, some dating back to 2009.
- At launch, Third Eye captures four TV cable news channels: BBC News, CNN, Fox News, and MSNBC.
- Data can be affected by temporary collection outages, which typically can last minutes or hours, but rarely more. If you are concerned about a specific time gap in a feed and would like to know if it’s the result of an outage, please inquire at email@example.com.
- The “raw feed” option provides all of the OCR’ed text from chyrons at the rate of approximately one entry per second. The “filtered tweets feed” provides the data that fuels our Twitter bots; this has been filtered to find the most representative, clearest chyrons from a 60-second period, with no more than one entry/tweet per minute (though the duration may be shorter than 60 seconds). The filtered feed relies on algorithms that are a work in progress; we invite you to share your ideas on how to effectively filter the noise from the raw data.
- Dates/times are in UTC (Coordinated Universal Time) in API feeds and in Pacific Time in tweets.
- Because the size of the raw data is so large (about 20 megabytes per day), we limit results to seven days per request.
- We began collecting raw data on August 25, 2017; the filtered feed begins on September 7, 2017.
- The “Duration” column is in seconds: the amount of time that a particular chyron appeared on the screen.
- To view clips in context on the TV News Archive, paste “https://archive.org/details/” before the field that begins with a channel name. For example, “FOXNEWSW_20170919_100000_FOX__Friends/start/792” becomes “https://archive.org/details/FOXNEWSW_20170919_100000_FOX__Friends/start/792”
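Two of the notes above lend themselves to a short Python sketch: building a clip link from the channel-name field, and splitting a longer date range into chunks that respect the seven-day limit. The helper names `clip_url` and `seven_day_windows` are hypothetical, and the sketch assumes "seven days" means seven calendar days counted inclusively. It does not call the download API, whose request parameters are not described here.

```python
from datetime import date, timedelta

DETAILS_PREFIX = "https://archive.org/details/"


def clip_url(clip_field):
    """Prepend the TV News Archive details prefix to a clip field."""
    return DETAILS_PREFIX + clip_field.strip()


def seven_day_windows(start, end, max_days=7):
    """Split the inclusive range [start, end] into windows of at most max_days days."""
    windows = []
    cursor = start
    while cursor <= end:
        # Each window covers max_days calendar days, or fewer at the end of the range.
        window_end = min(cursor + timedelta(days=max_days - 1), end)
        windows.append((cursor, window_end))
        cursor = window_end + timedelta(days=1)
    return windows


print(clip_url("FOXNEWSW_20170919_100000_FOX__Friends/start/792"))

# Example: the span between the start of raw collection and the start of the filtered feed.
for first, last in seven_day_windows(date(2017, 8, 25), date(2017, 9, 7)):
    print(first.isoformat(), "to", last.isoformat())
```

Each resulting pair of dates could then be used as the bounds of one download request.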
We want to hear from you! Please contact us with questions, feedback, concerns – and also to tell us what project you’ve done with the TV News Archive’s Third Eye project: firstname.lastname@example.org. Follow us @tvnewsarchive, and subscribe to our weekly newsletter here.
Thanks to Robin Chin, Katie Dahl, Dan Schultz, and the TV News Archive director, Roger Macdonald, for contributing to this project.