Hyperaudio | Internet Archive Blogs

We are delighted to report that two of the 20 winners recently announced in the Knight Prototype Fund’s $1 million challenge to combat misinformation will draw directly on the TV News Archive.

One of these is the Bad Idea Factory’s “Glorious ContextuBot.” Let’s say you come across a tweet that is brandishing a TV news clip to bolster a strong statement on a controversial issue – but you have no idea where or when or whether that clip was aired, by whom, and whether or not it’s legitimate.

The ContextuBot will take the Duplitron 5000, an audio fingerprinting tool developed to track political ads for the Political TV Ad Archive, and build upon it to help users find any relevant TV news coverage of that video snippet. If the video was aired, users will be able to see what came before and after the report.
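The Duplitron 5000's matching method isn't described here. To give a rough sense of what audio fingerprint matching involves, the Python sketch below uses a swapped-in textbook technique in the style of the Haitsma-Kalker (Philips) fingerprint. Each bit records whether the energy difference between two neighboring frequency bands grew since the previous frame. A clip is then located by sliding its bits along a broadcast's bits and picking the alignment with the fewest mismatches. The sketch assumes band energies have already been computed for every audio frame. The function names are hypothetical, and this is not the Duplitron's actual method.

```python
from typing import List, Tuple

# One frame = energies in consecutive frequency bands (assumed precomputed,
# e.g. from a short-time spectrum of the audio; that step is not shown).
Frame = List[float]

def fingerprint(frames: List[Frame]) -> List[List[int]]:
    """One sub-fingerprint per frame transition: bit m is 1 when the energy
    difference between bands m and m+1 grew since the previous frame."""
    prints = []
    for n in range(1, len(frames)):
        cur, prev = frames[n], frames[n - 1]
        prints.append([
            1 if (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1]) > 0 else 0
            for m in range(len(cur) - 1)
        ])
    return prints

def best_offset(snippet: List[Frame], broadcast: List[Frame]) -> Tuple[int, float]:
    """Slide the snippet's fingerprint along the broadcast's and return
    (frame offset, bit error rate) of the closest alignment."""
    query, ref = fingerprint(snippet), fingerprint(broadcast)
    total_bits = sum(len(bits) for bits in query)
    if total_bits == 0:
        raise ValueError("snippet too short to fingerprint")
    best_off, best_ber = -1, float("inf")
    for off in range(len(ref) - len(query) + 1):
        # Count mismatched bits between the query and this window of the reference.
        errors = sum(
            qb != rb
            for q_bits, r_bits in zip(query, ref[off:off + len(query)])
            for qb, rb in zip(q_bits, r_bits)
        )
        ber = errors / total_bits
        if ber < best_ber:
            best_off, best_ber = off, ber
    return best_off, best_ber
```

Once the offset is known, the frames on either side of it point to what aired before and after the clip. A real matcher would also accept a match only when the bit error rate falls below some threshold, because re-encoded or noisy copies never match exactly.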

The team, led by Dan Schultz, senior creative technologist of the TV News Archive, will bring in veteran media innovators Mark Boas and Laurian Gridinoc of Hyperaudio and Trint. Together they will not only build the ContextuBot but also work to improve the speed and accuracy of the Duplitron.

From sticky notes to Glorious ContextuBot: on the right, Dan Schultz, TV News Archive senior creative technologist, plots prototype plans.

Second, Joostware’s “Who Said What” project will use deep learning algorithms to annotate TV news clips, identifying speakers and what they are talking about. This capability will help fact checkers sort through claims by public officials, pundits, and others and pick out the ones that warrant examination.
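The post doesn't say how Who Said What identifies speakers. One common approach, sketched below as an assumption, works on voice embeddings: vectors produced for each segment by some deep learning model, which is not shown. Each segment's embedding is compared against reference embeddings for known speakers using cosine similarity. A segment stays unlabeled when no reference is similar enough. The speaker names, embeddings, and 0.75 threshold are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def label_speaker(embedding: Sequence[float],
                  enrolled: Dict[str, List[float]],
                  threshold: float = 0.75) -> Optional[str]:
    """Return the enrolled speaker most similar to `embedding`, or None if no
    similarity reaches `threshold` (a hypothetical cutoff)."""
    best_name: Optional[str] = None
    best_score = threshold
    for name, reference in enrolled.items():
        score = cosine(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

def annotate(segments: List[Dict], enrolled: Dict[str, List[float]],
             threshold: float = 0.75) -> List[Dict]:
    """Attach a speaker label to each segment (dicts with start, end, text, embedding)."""
    return [
        {
            "start": seg["start"],
            "end": seg["end"],
            "text": seg["text"],
            "speaker": label_speaker(seg["embedding"], enrolled, threshold),
        }
        for seg in segments
    ]
```

With segments labeled, a fact checker could filter for everything attributed to one speaker before reviewing the claims.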

We are delighted to work with Joostware as part of our ongoing goal to collaborate with researchers, companies, and others who are exploring how to use Artificial Intelligence tools to draw on the Internet Archive’s collections to enhance journalism and research.

The Knight Prototype Fund will award $50,000 to each of the 20 winners, who now have nine months to develop their ideas. The John S. and James L. Knight Foundation, the Democracy Fund, and the Rita Allen Foundation all support the effort. As part of the support offered by the foundations, winners attended a human-centered design training workshop last week in Phoenix, Arizona.

In a new blog series, TV News Lab, we’ll show how the Internet Archive is partnering with technology, journalism, and academic organizations to experiment with and improve the TV News Archive, our free, public, online library of TV news shows. Here we interview Mark Boas, founder of the Hyperaudio project, which works to make audio and video more accessible and shareable on the web. Hyperaudio provides an easy-to-use interface for copying and pasting bits of transcripts to create shareable video mashups. You can find the open source code powering Hyperaudio on GitHub.
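A transcript-driven mashup can be thought of as an ordered list of clips. Each clip is a source video plus the start and end times of the words the user selected. The sketch below lays such clips end to end on one timeline. The `Clip` type, its fields, and the example URLs are hypothetical and do not reflect Hyperaudio's actual data format.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Clip:
    """One selected transcript excerpt and its time span in the source media."""
    source: str   # URL of the source video (hypothetical)
    start: float  # start time in the source, seconds
    end: float    # end time in the source, seconds
    text: str     # the transcript words the user copied

    def duration(self) -> float:
        return self.end - self.start

def build_mix(clips: List[Clip]) -> List[Dict]:
    """Place clips back to back on a single timeline, recording where each
    one starts in the mix and where it comes from in its source."""
    timeline: List[Dict] = []
    position = 0.0
    for clip in clips:
        if clip.end <= clip.start:
            raise ValueError(f"clip has no duration: {clip}")
        timeline.append({
            "source": clip.source,
            "src_start": clip.start,
            "src_end": clip.end,
            "mix_start": position,
            "text": clip.text,
        })
        position += clip.duration()
    return timeline
```

Keeping the source times alongside each clip means anyone watching the mix can jump back to the original broadcast and check the surrounding context.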

Mark Boas talks to the Internet Archive about Hyperaudio.

NW: What is the problem you’re trying to solve by applying Hyperaudio technology to the TV News Archive?

MB: People find TV news credible. It’s very hard to fake TV news. I’d love to see people using TV news to back up any sort of political or other point a public official is trying to make, by showing the source material and also the arguments about those statements. I think this also has implications for improving media literacy.

(An example mix made at Chattanooga Public Library.)

NW: What stands in the way of people sharing TV news video right now?

MB: One of the problems is that audio and video on the web have been a black box, in a way. They have not been very well integrated into the web because that is difficult to do. If you see a big block of text, it’s easy to highlight, copy, paste, and send it off. But if you have an interesting piece of audio to share, how do you do that? There are ways to do it, but they’re not intuitive.

Coupled with that, it’s also hard to find audio on the internet. If you search by keyword, you’ll find what you want only if someone has added enough metadata to make it discoverable. Transcripts let you search, and they also give you a way to share. The key is that you need not just the transcript but also a match between each word in the transcript and the right time in the audio.
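Once every word carries a start and end time, searching and sharing become simple lookups. The sketch below assumes a word-timed transcript stored as `(word, start, end)` tuples, a hypothetical format. It finds a phrase and turns it into a link using the W3C Media Fragments temporal syntax (`#t=start,end`), which many browsers honor for HTML5 audio and video.

```python
import re
from typing import List, Optional, Tuple

Word = Tuple[str, float, float]  # (word, start_seconds, end_seconds)

def normalize(token: str) -> str:
    """Lowercase and strip punctuation so 'Tonight,' matches 'tonight'."""
    return re.sub(r"[^\w']", "", token.lower())

def find_phrase(words: List[Word], phrase: str) -> Optional[Tuple[float, float]]:
    """Return (start, end) of the first occurrence of `phrase`, or None."""
    target = [normalize(t) for t in phrase.split()]
    if not target:
        return None
    tokens = [normalize(w) for w, _, _ in words]
    n = len(target)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            # Span runs from the first matched word's start to the last one's end.
            return words[i][1], words[i + n - 1][2]
    return None

def share_url(media_url: str, span: Tuple[float, float]) -> str:
    """Build a Media Fragments link that plays only the matched span."""
    start, end = span
    return f"{media_url}#t={start:.2f},{end:.2f}"
```

This is also why alignment quality matters: if the word timings drift, the shared link starts and stops in the wrong place.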

NW: Why is it hard to match the transcript to the audio in a video?

MB: The first step is getting a good-quality transcript. It’s great that the TV News Archive uses closed captions, but they’re not perfect. (Note: the TV News Archive is searchable via closed captioning, but there’s often a several-second lag between the captions and the video, as well as other quality issues.) Closed captions are done in real time by humans who make mistakes, so the transcript usually needs to be cleaned up. The better the transcript, the better the match.
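If the lag between captions and audio is roughly constant, it can be estimated from a handful of words whose true audio times are known (for example, from a forced-alignment pass) and then subtracted from every caption. The sketch below takes the median difference, so a few bad matches don't skew the estimate. The constant-lag assumption and the anchor pairs are hypothetical; real caption lag can drift over a broadcast.

```python
from statistics import median
from typing import List, Tuple

Cue = Tuple[float, float, str]  # (start_s, end_s, caption text)

def estimate_lag(anchors: List[Tuple[float, float]]) -> float:
    """anchors: (caption_time, audio_time) pairs for words whose true audio
    time is known. The median difference resists a few bad matches."""
    if not anchors:
        raise ValueError("need at least one anchor pair")
    return median(caption - audio for caption, audio in anchors)

def shift_cues(cues: List[Cue], lag: float) -> List[Cue]:
    """Move every cue earlier by `lag` seconds, clamping at zero."""
    return [(max(0.0, s - lag), max(0.0, e - lag), text) for s, e, text in cues]
```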

The next challenge is to minimize the time it takes to match the words in the transcript to the audio. If we want to automate the process, we need to figure out how to do that more quickly, because it’s very intensive on the computing side. I’m experimenting with chunking up the video to speed up the process. I think we’ll see that the matching time grows faster than linearly: a one-hour transcript might take 30 minutes, and a three-hour transcript might take more than three times that. But if we split it up into smaller chunks, the processing might become more efficient.
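One way to chunk, sketched below under stated assumptions, is to group caption cues into windows of a few minutes. Each window's audio range is padded by a few seconds so that words near a boundary, or words shifted by caption lag, still fall inside it. Each chunk could then be aligned on its own, and in parallel. The 300-second window and 5-second padding are illustrative defaults, not values from the interview.

```python
from typing import Dict, List, Tuple

Cue = Tuple[float, float, str]  # (start_s, end_s, caption text)

def chunk_cues(cues: List[Cue], target_len: float = 300.0,
               pad: float = 5.0) -> List[Dict]:
    """Group consecutive cues into chunks spanning at most about target_len
    seconds, each with a padded audio window to align against."""
    chunks: List[List[Cue]] = []
    current: List[Cue] = []
    for cue in cues:
        # Start a new chunk once adding this cue would exceed the target span.
        if current and cue[1] - current[0][0] > target_len:
            chunks.append(current)
            current = []
        current.append(cue)
    if current:
        chunks.append(current)
    return [
        {
            "audio_start": max(0.0, c[0][0] - pad),
            "audio_end": c[-1][1] + pad,
            "text": " ".join(text for _, _, text in c),
        }
        for c in chunks
    ]
```

Why chunking can help: if aligning n words costs roughly f(n) for some superlinear f, then k chunks cost about k · f(n/k). If f were quadratic (an assumption), that works out to f(n)/k, so ten chunks would cut the work about tenfold, before any gains from running them in parallel.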

NW: How do people try out Hyperaudio?

MB: Hyperaudio is not a commercial software-as-a-service product. It’s more of a demo of the underlying technology. We work with groups like the Studs Terkel Radio Archive (WFMT Chicago) to help them make the most of their content and data; whatever we make flows back into our open source code on GitHub. What we do is very experimental, but it will give you an idea of what’s possible. If you want to experiment with the TV News Archive, you can do that at http://newsarchive.hyperaud.io/. More information on our experiments and collaborations can be found on our blog.

A blog from the team at archive.org

Knight Prototype Fund winners use TV News Archive to fight misinformation

TV News Lab: Hyperaudio improving TV news video captioning and sharing