Category Archives: Video Archive

Archive video now supports WebVTT for captions

We now support .vtt files (Web Video Text Tracks) in addition to .srt (SubRip) (.srt we have supported for years) files for captioning your videos.

It’s as simple as uploading a “parallel filename” to your video file(s).

Examples:

  • myvid.mp4
  • myvid.srt
  • myvid.vtt

Multi-lang support:

  • myvid.webm
  • myvid.en.vtt
  • myvid.en.srt
  • myvid.es.vtt

Here’s a nice example item:
https://archive.org/details/cruz-test

VTT with caption picker (and upcoming A/V player too!)

(We will have an updated A/V player with a better “picker” for so many language tracks in days, have no fear 😎

Enjoy!

 

30 Days of Stuff

Jason Scott, free-range archivist, reporting in as 2017 draws to a close.

As part of our end-of-year fundraising drive, I thought it might be fun to tweet highlighted parts of the vast stacks of content that the Internet Archive makes available for free to millions. A lot of folks know about our Wayback Machine and its 20+ years of website history, but there’s petabytes of media and works available to see throughout the site. I called it “30 Days of Stuff”, and for the last 30 days I’ve been pointing out great items at the Archive, once a day.

You won’t have to swim upstream through my tweets; here on the last day, I’ve compiled the highlighted items in this entry. Enjoy these jewels in the Archive’s collection, a small sample of the wide range of items we provide.

Books and Texts

  • The Latch Key of my Bookhouse was one of the first books scanned by the Internet Archive in its book scanner tests, and it’s a 1921 directory of Children’s Literature that is filled with really nice illustrations that came out great.
  • As part of our ever-growing set of Defense Technical Information Center collection, we have The Role of the Citizens Band Radio Service and Travelers Information Stations In Civil Preparedness Emergencies Final Report, a 1978 overview of CB Radio and what role it might play in civil emergencies. Many thousands of taxpayer-funded educational and defense items are mirrored in this collection.
  • Also in the DTIC collection is The Battalion Commander’s Handbook 1980, which besides the crazy front page of stamps, approvals and sign-offs, is basically a manager’s handbook written from the point of view of the US Army.
  • There are hundreds of tractor manuals at the Archive. Hundreds! Of all types, languages (a lot of them Russian) and level of information. Tractors are one of those tools that can last generations and keeping the maintenance on them in the field can make a huge difference in livelihood.
  • A lovely 1904 catalog for plums called The Maynard Plum Catalogue was scanned in with one of our partner organizations and it’s a breathless and inspiring declaration of the future wonder of the plums this wizard of plum-growing, Luther Burbank, was bringing to the world.
  • Xerox Corporation released “A Metamorphosis of Creative Copying” in 1964, which seems to function as both promotion for Xerox and a weird gift to give to your kids to color in.
  • In 2014, a short zine called The Tao of Bitcoin was released, telling people the dream of $10,000 bitcoin would be real.
  • The 1888 chapbook Goody Two-Shoes has lovely illustrations, and a fine short story.
  • Working with a lovely couple who brought in a 1942 black-owned-businesses directory, I scanned the pages by hand and put them up into this item.
  • Inside that directory was an ad for a school of whistling that said it taught using the methods of Agnes Woodward, and a quick scan of the Archive’s stacks showed that we had an entire copy of her book Whistling as an Art!
  • The medical treatise Sleep and Its Derangements, from 1869, is William A. Hammond, MD’s overview of sleep, and what can go wrong. Scanned from the Francis A. Countway Library of Medicine, it’s one of many thousands of books we’ve scanned with partners.
  • Let Hartman Feather Your Nest could be described as “A furniture catalog” in the same way the Sistine Chapel could be described as “a place of worship”. The catalog is a thundering, fist-pounding declaration of the superiority of the Hartman enterprise and the quality and breadth of furniture and service that will arrive at your door and be backed up to the far reaches of time.

Magazines

  • Photoplay considered itself the magazine for the motion picture industry in the first part of the 20th century, and this multi-volume compilation of photos, articles and advertisements is a truly lovely overview.
  • There’s over 140 issues of the classic Maximum RockNRoll zine, truly the king of music zines for a very long time. On its newsprint pages are howls and screeches of all manner of punk, rock and the needs of musicians.
  • A magazine created by the Walt Disney Company to trumpet various parts of Disneyland and its attractions was called Vacationland, and this Fall 1965 issue covers all sorts of stuff about the park’s first decade.

Movies

  • Rescued from a warehouse years ago, a collection of Hollywood movie “B-Roll”, unused secondary scenes often filmed by different crew, has been digitized. My personal favorite is [Western Film Scenes], which is circa 1950s footage of a Western Town, all of it utterly fake but feeling weirdly real, to be used in a western. Don’t miss everyone standing around looking right at you and looking like they agree quite energetically with you!
  • No compilation could be complete without the legendary Duck and Cover, a cartoon/PSA that explained the simple ways to avoid injury in a nuclear blast. Just lie down! It’ll be fine. Please note: This Probably Won’t Work. But the song is very catchy.
  • The very weird Electric Film Format Acid Test from 1990 has a semi-interested model holding up a color bar plate in a wide, wide variety of film and video formats. Filmed just a few blocks away from the Internet Archive’s current headquarters.
  • I snuck in a 1992 interview with the Archive’s founder, Brewster Kahle, back when he was 33 and working at WAIS, a company or two before the Archive and where he is asked about his thoughts on information and gathering of data. It’s quite interesting to hear the consistency of thought.
  • The Office of War Information worked with Disney to create “Dental Health“, a film to show to troops about proper dental care. It’s a combination of straightforward animation and industrial film-making worth enjoying.

Audio

  • We have a collection of hours of the radio show The Shadow from 1938-1939, starring  Orson Welles at 23, at the height of his performance powers, playing the dual main role.
  • For Christmas Eve, we pointed to “Christmas Chopsticks”, a 1953 78rpm record of “Twas the Night Before Christmas” performed to the tune of the classic piano piece “Chopsticks”; one of tens of thousands of 78rpm records the Archive has been adding this year.
  • On Christmas, a user of the Archive uploaded two obscure albums he’d purchased on eBay – remnants of the S. S. Kresge Company, which became K-Mart, and which were played over the PA system for shoppers. He got his hands on Albums #261 and #294.
  • Earlier in the month before the user uploaded those Christmas albums, I linked to a different holiday collection of K-Mart items, a 1974 Reel-to-Reel that started with a K-Mart jingle and went full holiday from there.
  • Before he was a (retired) talk show host, and before he was a stand-up comedian, David Letterman worked and trained in radio. Happily, we have recordings of Dave Letterman, DJ, from when he was 22, at Ball State University.
  • Ron “Boogiemonster” Gerber has been hosting his weekly pop music recycling radio show, “Crap from the Past”, for over 25 years, and he’s been uploading and cataloging his show to the Archive for well over 10 of those years, including all the way back to the beginning of his show. The full Crap From The Past archive is up and is hundreds of hours of fun.
  • The truly weird “Conquer the Video Craze” is a 1982 record album with straightforward descriptions of how to beat games like Centipede, Defender, Stargate, Dig Dug, and more. This album has been sampled from by multiple DJs to bring that extra spice to a track.
  • Over 3,000 shows at the DNA Lounge are at the archive, including “Bootie: Gamer Night“, which combines mash-up tracks and video games. Bootie has been playing at DNA Lounge for years, and puts the audio from one song with the singing from another, and… it’s quite addicting, like games. This night was for the nearby Game Developers’ Conference being held the same week.

Software

  • In 2011, as part of a “retrocomputing” competition, we saw the release of “Paku-Paku”, a pac-clone program which ran in an obscure early PC-Compatible graphics mode that was very colorful and very small (160×100) and was built perfectly for it. You can play the game in your browser by clicking here.
  • Psion Chess is a game for the Macintosh that can play both you and itself with pretty high levels of skill and really sharp and crisp black and white graphics.  It makes a really great screensaver in self-playing mode.

People often overuse a phrase like “Barely scratched the surface”, but I assure you there are millions of amazing items in the archive, and it’s been a pleasure to bring some to light. While the 30 Days of Stuff was a fun way to stretch out a month of fundraising with stuff to see every day, we’re here 24/7 to bring you all these items, and welcome you finding jewels, gems and clunkers throughout our hard drives whenever you want.

Thanks for another year!

archive.org download counts of collections of items updates and fixes

Every month, we look over the total download counts for all public items at archive.org.  We sum item counts into their collections.  At year end 2014, we found various source reliability issues, as well as overcounting for “top collections” and many other issues.

archive.org public items tracked over time

archive.org public items tracked over time

To address the problems we did:

  • Rebuilt a new system to use our database (DB) for item download counts, instead of our less reliable (and more prone to “drift”) SOLR search engine (SE).
  • Changed monthly saved data from JSON and PHP serialized flatfiles to new DB table — much easier to use now!
  • Fixed overcounting issues for collections: texts, audio, etree, movies
  • Fixed various overcounting issues related to not unique-ing <collection> and <contributor> tags (more below)
  • Fixes to character encoding issues on <contributor> tags

Bonus points!

  • We now track *all collections*.  Previously, we only tracked items tagged:
    • <mediatype> texts
    • <mediatype> etree
    • <mediatype> audio
    • <mediatype> movies
  • For items we are tracking <contributor> tags (texts items), we now have a “Contributor page” that shows a table of historical data.
  • Graphs are now “responsive” (scale in width based on browser/mobile width)

 

The Overcount Issue for top collection/mediatypes

  • In the below graph, mediatypes and collections are shown horizontally, with a sample “collection hierarchy” today.
  • For each collection/mediatype, we show 1 example item, A B C and D, with a downloads/streams/views count next to it parenthetically.   So these are four items, spanning four collections, that happen to be in a collection hierarchy (a single item can belong to multiple collections at archive.org)
  • The Old Way had a critical flaw — it summed all sub-collection counts — when really it should have just summed all *direct child* sub-collection counts (or gone with our New Way instead)

overcount

So we now treat <mediatype> tags like <collection> tags, in terms of counting, and unique all <collection> tags to avoid items w/ minor nonideal data tags and another kind of overcounting.

 

… and one more update from Feb/1:

We graph the “difference” between absolute downloads counts for the current month minus the prior month, for each month we have data for.  This gives us graphs that show downloads/month over time.  However, values can easily go *negative* with various scenarios (which is *wickedly* confusing to our poor users!)

Here’s that situation:

A collection has a really *hot* item one month, racking up downloads in a given collection.  The next month, a DMCA takedown or otherwise removes the item from being available (and thus counted in the future).  The downloads for that collection can plummet the next month’s run when the counts are summed over public items for that collection again.  So that collection would have a negative (net) downloads count change for this next month!

Here’s our fix:

Use the current month’s collection “item membership” list for current month *and* prior month.  Sum counts for all those items for both months, and make the graphed difference be that difference.  In just about every situation that remains, graphed monthly download counts will be monotonic (nonnegative and increasing or zero).

 

 

Create playlists with CratePlayer

rudolphI find great stuff on the Internet Archive all the time, and now I can use a tool called CratePlayer to create playlists from archive.org movie and audio files.  For example, I want to play a bunch of old Christmas movies at my holiday party this year so I found some cartoons and added them to a Crate.  Now all I have to do is hook my computer up to the TV, press play, and poof!  Instant entertainment!

crateplayerCratePlayer is a curation tool that lets you gather audio and video content from online sources into collections that can be played and shared.  When they approached us about incorporating Internet Archive items into their platform, we said “yes!” and gave them some pointers about accessing archive.org content.  Off they went, and in short order they had it all working.

Try using their bookmarklet as you’re poking around among archive.org audio and video content.  It’s easy to use and might help you keep track of all the great things you find.

NSA TV Clip Library


When the American people find out how their government has secretly interpreted the Patriot Act, they are going to be stunned and they are going to be angry.  Senator Ron Wyden May 26, 2011

Recent revelations of the extent of National Security Agency surveillance and weakening of our digital infrastructure give substance to the warnings of Senator Wyden and others. To assist journalists and other concerned citizens in reflecting on these issues, the Internet Archive has created a curated library of short television news clips presenting key statements and other representations.

NSA-issues TV News Quote Library

The experimental, Chrome and Safari only, library launches today with more than 700 chronologically ordered television citations drawn from the Archive’s television news research service. The TV quotes can be browsed by rolling over clip thumbnails, queried via transcripts and sorted for specific speakers. Citation links, context, links to source broadcasters and options to borrow can be explored by following the More/Borrow links on each thumbnail.

new mp4 (h.264) derivative technique — simpler and easy!

Greetings video geeks!  😎

We’ve updated the process and way we create our .mp4 files that are shown on video pages on archive.org

It’s a much cleaner/clearer process, namely:

  • We opted to ditch ffpreset files in favor of command-line argument 100% equivalents.  It seems a bit easier for someone reading the task log of their item, trying to see what we did.
  • We no longer need qt-faststart step and dropped it.  we use the cmd-line modern ffmpeg “-movflags faststart”
  • Entire processing is now done 100% with ffmpeg, in the standard “2-pass” mode
  • As before, this derivative plays in modern html5 video tag compatible browsers, plays in flash plugin within browsers, and works on all iOS devices.   it also makes sure the “moov atom” is at the front of the file, so browsers can playback before downloading the entire file, etc.)
Here is an example (you would tailor especially the “scale=640:480” depending on source aspect ratio and desired output size;  change or drop altogether the “-r 20” option (the source was 20 fps, so we make the dest 20 fps);  tailor the bitrate args to taste):
  • ffmpeg -y -i stairs.avi -vcodec libx264 -pix_fmt yuv420p -vf yadif,scale=640:480 -profile:v baseline -x264opts cabac=0:bframes=0:ref=1:weightp=0:level=30:bitrate=700:vbv_maxrate=768:vbv_bufsize=1400 -movflags faststart -ac 2 -b:a 128k -ar 44100 -r 20 -threads 2 -map_metadata -1,g:0,g -pass 1 -map 0:0 -map 0:1 -acodec aac -strict experimental stairs.mp4;
  • ffmpeg -y -i stairs.avi -vcodec libx264 -pix_fmt yuv420p -vf yadif,scale=640:480 -profile:v baseline -x264opts cabac=0:bframes=0:ref=1:weightp=0:level=30:bitrate=700:vbv_maxrate=768:vbv_bufsize=1400 -movflags faststart -ac 2 -b:a 128k -ar 44100 -r 20 -threads 2 -map_metadata -1,g:0,g -pass 2 -map 0:0 -map 0:1 -acodec aac -strict experimental -metadata title='”Stairs where i work” – lame test item, bear with us – http://archive.org/details/stairs’ -metadata year=’2004′ -metadata comment=license:’http://creativecommons.org/licenses/publicdomain/’ stairs.mp4;

Happy hacking and creating!

PS: here is the way we compile ffmpeg (we use ubuntu linux, but works on macosx, too).

new video and audio player — video multiple qualities, related videos, and more!

Many of you have already noticed that since the New Year, we have migrated our new “beta” player to be the primary/default player, then to be the only player.

We are excited about this new player!
It features the very latest release of jwplayer from longtailvideo.com.

Here’s some new features/improvements worth mentioning:

  • html5 is now the default — flash is a fallback option.  a final fallback option for most items is a “file download” link from the “click to play” image
  • videos have a nice new “Related Videos” pane that shows at the end of playback
  • should be much more reliable — I had previously hacked up a lot of the JS and flash from the jwplayer release version to accommodate our various wants and looks — now we use mostly the stock player with minimal JS alterations/customizations around the player.
  • better HD video and other quality options — uploaders can now offer multiple video size and bitrate qualities.  If you know how to code web playable (see my next post!) h.264 mp4 videos especially, you can upload different qualities of our source video and the viewer will have to option to pick any of them (see more on that below).
  • more consistent UI and look and feel.  The longtailvideo team *really* cleaned up and improved their UI, giving everything a clean, consistent, and aesthetically pleasing look.  Their default “skin” is also greatly improved, so we can use that now directly too
  • lots of cleaned up performance and more likely to play in more mobile, browsers, and and OS combinations under the hood.

Please give it a try!

-tracey

 

For those of you interested in trying multiple qualities, here’s a sample video showing it:

http://archive.org/details/kittehs

To make that work, I made sure that my original/source file was:

  • h.264 video
  • AAC audio
  • had the “moov atom” at the front (to allow instant playback without waiting to download entire file first) (search web for “qt-faststart” or ffmpeg’s “-movflags faststart” option, or see my next post for how we make our .mp4 here at archive.org)
  • has a > 480P style HD width/height
  • has filename ending with one of:   .HD.mov   .HD.mp4   .HD.mpeg4    .HD.m4v

When all of those are true, our system will automatically take:

  • filename.HD.mov

and create:

  • filename.mp4

that is our normal ~1000 kb/sec “derivative” video, as well as “filename.ogv”

The /details/ page will then see two playable mpeg-4 h.264 videos, and offer them both with the [HD] toggle button (seen once video is playing) allowing users to pick between the two quality levels.

If you wanted to offer a *third* quality, you could do that with another ending like above but with otherwise the same requirements.  So you could upload:

  • filename.HD.mp4       (as, say, a 960 x 540 resolution video)
  • filename.HD.mpeg4   (as, say, a 1920 x 1080 resolution video)

and the toggle would show the three options:   1080P, 540P, 480P

You can update existing items if you like, and re-derive your items, to get multiple qualities present.

Happy hacking!

 

 

 

A/V Geeks fundraiser to digitize 100 miles of film

A/V Geeks has done a lot of digitization of old film for The Internet Archive. They are trying to raise funds to digitize many more hours of footage to put up on archive.org which will be free to view and use by the public. If you would like to contribute here is some information:

http://www.indiegogo.com/avgeeks100miles

WHAT? The A/V Geeks have over 24,000 old 16mm educational films that we’ve rescued from landfills, dumpsters, closets, school libraries. These films cover topics from Atomic Bombs to Zoo Babies and provide an entertaining yet insightful glimpse into our past. We’ve kept these materials from being thrown out and we want to continue our mission by giving them a new life and sharing them you!

Traditionally, this means that we would have to either travel to you or you would have to travel to us to watch a film on a projector. By digitizing these films we can give you and the world access to these materials! When a film from the A/V Geeks archive is digitized and uploaded to the internet, it can be easily accessed, watched, downloaded, researched and repurposed for music videos, class projects, documentaries and more.

Help us digitize 100 Miles of Film! Instead of traveling miles to give access to these films we’d rather digitize miles of 16mm film. Film length is generally measured in feet (in the US) to where 2100 feet of film roughly equals one hour of content. With your support we can digitize and make available 100 miles of film – over 240 hours of (around 1,000 individual titles) from the archive. So we envision this as sort of a road rally. We have to go 100 miles in a short period of time. We’ll have a small team to keep the machines humming along. We aren’t sure what we’ll find with some of the films – we haven’t seen them yet!

For every $500 we raise, we digitize one mile of film (nearly 2.5 hours of material)!

Improved theora/ogg video derivatives!

We’ve made our ogg video derivatives slightly better via:

  • minor bump up to “thusnelda” release
  • “upgrade” from 1-pass video encoding to 2-pass video encoding
  • direct ffmpeg creation of the video (you’ll need to re/compile ffmpeg minimally with “–enable-libtheora –enable-libvorbis” configure flags)

ffmpeg -y -i ‘camels.avi’ -q:vscale 3 -b:v 512k -vcodec libtheora -pix_fmt yuv420p -vf yadif,scale=400:300 -r 20 -threads 2 -map_metadata -1,g:0,g -pass 1 -an -f null /dev/null;

ffmpeg -y -i ‘camels.avi’ -q:vscale 3 -b:v 512k -vcodec libtheora -pix_fmt yuv420p -vf yadif,scale=400:300 -r 20 -threads 2 -map_metadata -1,g:0,g -pass 2 -map 0:0 -map 0:1 -acodec libvorbis -ac 2 -ab 128k -ar 44100 -metadata TITLE=’Camels at a Zoo’ -metadata LICENSE=’http://creativecommons.org/licenses/by-nc/3.0/’ -metadata DATE=’2004′ -metadata ORGANIZATION=’Dumb Bunny Productions’ -metadata LOCATION=http://archive.org/details/camels camels.ogv

some notes:

  • You’d want to adjust the “scale=WIDTH:HEIGHT” accordingly, as well as the “-r FRAMES-PER-SECOND” related args, to your source video.
  • I made a small patch to allow *both* bitrate target *and* quality level for theora in ffmpeg, after comparing the other popular tool “ffmpeg2theora” code with the libtheoraenc.c inside ffmpeg.  It may not be necessary, but I believe I saw *slightly* better quality coming out of theora/thusnelda ogg video.  For what it’s worth, my minor patch is here:  http://archive.org/~tracey/downloads/patches/ffmpeg-theora.patch
  • The way we compile ffmpeg (ubuntu/linux) is here.  (Alt MacOS version here )
  • (Edited post above after I removed this step) It’s *quite* odd, I realize to have ffmpeg transcode both the audio/video together, only to split/demux them back out temporarily.  However, for some videos, the “oggz-comment” step would wipe out the first video keyframe and cause unplayability in chrome (and the expected visual artifacts for things that could play it).   So, we split, comment the audio track, then re-stitch it back together.