Volunteers needed: We have a fabulous TV collection, and the US is going into an election period. We would like to pull out the TV Commercials, including the political ads, and match them with the other occurrences, and then put names on them. Then we and others can datamine and surface this information.
We hope to find all the ads so we can know when and where they ran. We do not want to limit this to political ads, because sometimes the ads are the best parts of shows, and many ads are stealthy-political.
To help in this process, we have closed caption transcripts of what is said on US TV as well as full resolution TV recordings. We also often have a rebroadcast of the same program, which would likely have different commercials. We do have to be careful with this data, so we would like to run this locally in our virtual machine “virtual reading room”.
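One way the rebroadcast idea might be sketched: compare the caption lines of two airings of the same program and flag the stretches that differ as candidate commercial breaks. This is only an illustration using Python's standard `difflib`; the function name `differing_segments` and the sample caption lines are made up, and real transcripts would need timestamps and fuzzier matching.

```python
from difflib import SequenceMatcher

def differing_segments(airing_a, airing_b):
    """Return (start, end) index ranges in airing_a whose caption lines
    are replaced or missing in airing_b (candidate commercial breaks)."""
    matcher = SequenceMatcher(a=airing_a, b=airing_b, autojunk=False)
    segments = []
    for tag, a_start, a_end, _b_start, _b_end in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            segments.append((a_start, a_end))
    return segments

# Hypothetical caption lines for a first airing and its rebroadcast.
first_airing = [
    "WELCOME BACK TO THE NEWS",
    "MORE AFTER THE BREAK",
    "BUY SUPER SOAP TODAY",
    "VOTE FOR CANDIDATE X",
    "AND NOW THE WEATHER",
]
rebroadcast = [
    "WELCOME BACK TO THE NEWS",
    "MORE AFTER THE BREAK",
    "NEW CARS AT LOW PRICES",
    "AND NOW THE WEATHER",
]

# Lines 2 and 3 of the first airing have no counterpart in the rebroadcast.
candidates = differing_segments(first_airing, rebroadcast)
print(candidates)
```

The program content lines up between the two airings, so whatever is left over in between is where the commercials most likely sit.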
We tried the open source commercial detector included in MythTV, but it seemed to leave all the commercials in a commercial break in a block. Also it was not that reliable. It needs more work.
This is not an easy project, and unfortunately we do not have a budget (yet) to pay for it, so the reward may be fame and helping the open world. If you can help with this project, we would appreciate it.
Please leave a comment on this post or send a note to Roger Macdonald, the leader of the TV News project.
I need experience in the software development world. I only have 1 year's worth so far and I am unemployed. If you get some volunteers on board whom I could work under and learn from, I will be more than willing to help out, and I would be willing to work on documentation for the project as well.
I don’t normally work with computer vision but you may run something like BRISK on a bunch of videos to find patterns:
You could run it every 20 frames, for example, to save disk space and speed up computation.
If the sets of points are very similar over 15 seconds or so, they may be the same video clip.
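A rough sketch of that matching step, under some loud assumptions: each sampled frame has already been reduced to a single integer fingerprint (a stand-in for real BRISK keypoint descriptors, which this code does not compute), two frames count as "similar" when their fingerprints differ in only a few bits, and the video runs at about 30 frames per second. The names `hamming`, `min_run_for`, and `find_repeats` are invented for this example.

```python
def hamming(a, b):
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")

def min_run_for(seconds, fps, step):
    """How many sampled frames cover `seconds` when sampling every `step` frames."""
    return int(seconds * fps / step)

def find_repeats(fp_a, fp_b, min_run, max_bits=4):
    """Return (start_a, start_b, length) for every run of at least `min_run`
    consecutive samples whose fingerprints differ by at most `max_bits` bits."""
    repeats = []
    # Each offset is one alignment of the two videos: fp_a[i] vs fp_b[i - offset].
    for offset in range(-(len(fp_b) - 1), len(fp_a)):
        start = max(offset, 0)
        stop = min(len(fp_a), len(fp_b) + offset)
        run_start = None
        for i in range(start, stop + 1):  # the extra step flushes a trailing run
            close = i < stop and hamming(fp_a[i], fp_b[i - offset]) <= max_bits
            if close and run_start is None:
                run_start = i
            elif not close and run_start is not None:
                if i - run_start >= min_run:
                    repeats.append((run_start, run_start - offset, i - run_start))
                run_start = None
    return repeats

# Toy fingerprints: the same four-sample "ad" appears in both recordings.
ad = [0x0000FF00, 0x00FF0000, 0xFF000000, 0x0000FFFF]
noisy_ad = [ad[0] ^ 0b1] + ad[1:]  # one flipped bit simulates encoding noise
show_a = [0x00000000, 0x000000FF] + ad + [0xFFFFFFFF]
show_b = [0x00FFFF00] + noisy_ad + [0xFFFF0000, 0xFF0000FF]

repeats = find_repeats(show_a, show_b, min_run=4)
print(repeats)
```

With real footage, `min_run_for(15, 30, 20)` gives 22 sampled frames for a 15-second window. Comparing every alignment of every pair of recordings is slow, so a real system would probably index the fingerprints first.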
Check out http://en.wikipedia.org/wiki/Bag-of-words_model_in_computer_vision
You might even be able to do some logo recognition using partial matches.
You could contact a bunch of people who have implemented cool things in this area and mention that Archive.org needs help.
Can you provide more technical information? In what format is the source material stored, or do you need something that will grab it live? What type of development tools do you need this program written in? Where will it run?
I need to know if a project like this falls within my skill set.
Would you please tell us more about the API? I couldn’t seem to find a link to it in the article, and a Google search turned up an API for books, but not for television programs.
How would I access the closed-captioned text?