Category Archives: Technical

new off-site video/audio embed codes

We are about to roll out a “new new” video/audio player 😎

You can see it in action now with our upcoming embed codes to go with this new player.

It will allow for additional much-wanted features like:
– off-site playlists
– fullscreen in many cases
– subtitles/captions

as well as the standard arbitrary width/height and “autoplay” options.

You can see some examples here:

http://www.archive.org/help/video.php

The rest is coming soon. If you are eager, you can even “opt in” now by clicking here:

http://www.archive.org/details/movies&newplayer=jw

(then take a look at one of your favorite items).

Now relax, sit back, and enjoy an archive video!

Cheers!
–tracey

Open Hardware: Inexpensive Enclosures From Junction Boxes.

I had a need for a cheap, standard enclosure for a humidity and temperature monitoring project. While there are many, many options for enclosures out there, few are cheap AND locally available. It occurred to me that electrical junction boxes are widely available, inexpensive, and consistently dimensioned.

So, off to Home Depot I went, wallet and calipers in tow. There were a few attractive junction boxes, each around $1:

Raco 1-Gang Drawn Square Box
Model # 8190 Home Depot SKU # 587799

Raco 1-Gang Welded Square Box
Model # 8189 Home Depot SKU # 201863

Carlon 2-Gang 20 cu. in. Switch and Outlet Box
Model # A521DE-CARR Home Depot SKU # 271612

There was even a blue plastic cover!

But, on closer inspection, the cover turned out to be unsuitable. It’s made of PVC, which cannot be cut or marked on the laser. Etching or cutting PVC on the laser releases hydrogen chloride gas (hydrochloric acid once it meets moisture), which is toxic, corrosive, and voids the warranty on your laser cutter. Don’t cut PVC/Vinyl on the laser if you value your health, safety, and/or warranty. Incidentally, if you are buying a used laser, always look for signs of rust around the optics/cutting area. Rust is a good indicator that the laser was abused in this particular way.

After some iteration on cheap 1/4″ import Baltic Birch plywood…

I came up with this: a simple, Open Hardware cover and liner system for junction boxes. If you have access to a laser cutter, you can now make custom project boxes, suitable for holding an Arduino AND a shield, in minutes. It’s as simple as a top plate and a bottom plate – the bottom plate is designed to insulate the Arduino or other electronics from the metal box. Of course, as pictured above, you can also use the blue PVC boxes while retaining the laserability of this cover.

Here’s a nice shot showing some of the better features of this setup. First, by knocking out one of the knock-outs on the side, it is possible to feed in ethernet, USB, and sensor cables with room to spare. Second, even with the insulating plate in place, there is enough room for an Arduino with a shield and header pins sticking up. Third, the box comes with screws suitable for fixing the cover in place. Pretty slick, and very cheap.

This is Open Hardware.

The Internet Archive is pretty excited about Open Hardware, and most or all of my work here will be released as such. This is release number 1 of many. Here is the artwork (this link will be updated shortly).

improved h.264 derivatives!

We have thoroughly tested a newer and simpler way to create h.264 derivatives!

Changes you’ll notice:

  • More pixels! Frame size goes from 320 x 240 to 640 x 480 pixels
  • Slightly higher video bitrate: from about 512kb/s to about 700kb/s
  • Switching from the mp4creator container maker to the ffmpeg container + qt-faststart
  • Fewer back-end commands to make a high-quality derivative

Nice things about this derivative (similar to prior derivative):

  • Plays in the Adobe Flash plugin
  • Plays on all versions of iPhone and iPad
  • Starts quickly, nearly instant seeking even to unbuffered areas of the video

Here’s a sample of how we do it with just 3 simple commands. (We adjust, and you should too, the “-r” argument to match your video’s frames per second. We also adjust the “640” in the “-vf scale” argument to suit the video’s *actual* aspect ratio. So, for example, the 640 might become 852 for 16:9 widescreen video. For our .mp4 derivative specifically, though, we would actually downrez that to 640×360 so that it plays on the 1st-gen iPhone, and thus on all versions.)

ffmpeg -deinterlace -y -i 'camels.avi' -vcodec libx264 -fpre libx264-IA.ffpreset -vf scale=640:480 -r 20 -threads 2 -map_meta_data -1:0 -pass 1 -an tmp.mp4


ffmpeg -deinterlace -y -i 'camels.avi' -vcodec libx264 -fpre libx264-IA.ffpreset -vf scale=640:480 -r 20 -threads 2 -map_meta_data -1:0 -pass 2 -acodec aac -strict experimental -ab 128k -ac 2 -ar 44100 -metadata title='Camels at a Zoo - http://www.archive.org/details/camels' -metadata year='2004' -metadata comment=license:'http://creativecommons.org/licenses/by-nc/3.0/' tmp.mp4

qt-faststart tmp.mp4 'camels.mp4'

Our preset file:
http://www.archive.org/~tracey/downloads/libx264-IA.ffpreset
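
The aspect-ratio adjustment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Archive's deriver code: the function name `scaled_size` is made up for the example, and rounding down to an even number is an assumption (it is consistent with the 852 figure above, and H.264 with 4:2:0 chroma needs even dimensions).

```python
from fractions import Fraction

def scaled_size(aspect, height=480, max_width=None):
    """Return (width, height) for a display aspect ratio, rounded down to even numbers."""
    width = int(height * aspect) // 2 * 2
    if max_width is not None and width > max_width:
        # Too wide for the target: pin the width and derive the height instead.
        width = max_width
        height = int(width / aspect) // 2 * 2
    return width, height

print(scaled_size(Fraction(4, 3)))                  # (640, 480)
print(scaled_size(Fraction(16, 9)))                 # (852, 480)
print(scaled_size(Fraction(16, 9), max_width=640))  # (640, 360)
```

The two numbers returned are what would go into the “-vf scale=W:H” argument.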

For the adventurous out there, you can create this same setup by building ffmpeg on Mac, Linux, or Windows. Linux is easy, but personally, I’m a Mac gal. So here are some ffmpeg build tips for the Mac.

Happy viewing!

How Archive.org items are structured

What is an item?

An item is a logical “thing” that we present on one web page on archive.org. An item may be one video file along with scans of the DVD cover, one book, one audio file, or a set of audio files that represent a CD, etc.

How do you know whether your files should be in one item or separate items?  You get one metadata file per item.  If the same metadata describes ALL of the files (like a CD), then that’s one item.  If the files are too different to have the same metadata (title, creator, description, etc.), they should be in different items.

How Items Are Structured

All archive.org items have a URL in this format:
http://archive.org/details/[identifier]
(where [identifier] is unique within our system).

Example: For this item
http://www.archive.org/details/popeye_taxi-turvey
the identifier is popeye_taxi-turvey
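
As a small illustration, the identifier can be pulled out of a details URL with Python's standard library. The helper name `identifier_from_details_url` is hypothetical; the only thing taken from the text above is the /details/[identifier] URL format.

```python
from urllib.parse import urlparse

def identifier_from_details_url(url):
    """Return the [identifier] part of an archive.org details URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2 or parts[0] != "details":
        raise ValueError(f"not a details URL: {url}")
    return parts[1]

print(identifier_from_details_url("http://www.archive.org/details/popeye_taxi-turvey"))
# popeye_taxi-turvey
```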

An item is just a directory or folder of files that includes the originally uploaded content file(s) – audio, video, text, etc. – along with any derivative files we create from the originals and the metadata that describes the item.  To see all files in an item, click the HTTP link in the upper left box on the item page (circled in red below).

That link takes you to a directory listing showing all original, derived, and metadata files for the item.

You can view information about every file in this directory by viewing the file ending in _files.xml (in this example, popeye_taxi-turvey_files.xml). Each file in the item is listed here, along with whether its source is “original” (uploaded by the user), “derivative” (derived by archive.org), or “metadata”.  You will also find a format designation, various checksums, and sometimes titles for the files.
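
Here is a rough sketch of reading such a listing in Python. The XML below is a hand-written sample, not a copy of the real popeye_taxi-turvey_files.xml, and the exact element and attribute names (a `file` element with `name` and `source` attributes and a `format` child) are assumptions about the layout; check them against an actual _files.xml before relying on this.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample for illustration only (values are made up).
SAMPLE_FILES_XML = """<files>
  <file name="popeye_taxi-turvey.mpg" source="original">
    <format>MPEG2</format>
    <md5>00000000000000000000000000000000</md5>
  </file>
  <file name="popeye_taxi-turvey_512kb.mp4" source="derivative">
    <format>512Kb MPEG4</format>
  </file>
  <file name="popeye_taxi-turvey_meta.xml" source="metadata">
    <format>Metadata</format>
  </file>
</files>"""

def files_by_source(files_xml):
    """Group (name, format) pairs by their source: original, derivative, or metadata."""
    groups = defaultdict(list)
    for f in ET.fromstring(files_xml).findall("file"):
        groups[f.get("source", "unknown")].append((f.get("name"), f.findtext("format")))
    return dict(groups)

for source, entries in files_by_source(SAMPLE_FILES_XML).items():
    print(source, entries)
```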

To see all of the metadata for the item, view the file ending in _meta.xml (in this example, popeye_taxi-turvey_meta.xml). This file should list all of the pertinent information about the item, such as title, creator, description, etc.  IA’s metadata schema is based on Dublin Core, but it is extremely flexible.  You can add any key=value pair to this file and we will store it and make it searchable in the IA search engine.  (However, it may not automatically show up on the item page.)
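
To make the key=value idea concrete, here is a minimal sketch that reads a metadata file into a Python dict. The sample XML is invented for the example (including the title text and the `my_custom_key` field), and the flat one-element-per-key layout under a `metadata` root is an assumption; compare it with a real _meta.xml before using it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample for illustration only (values are made up).
SAMPLE_META_XML = """<metadata>
  <identifier>popeye_taxi-turvey</identifier>
  <title>Popeye: Taxi-Turvey</title>
  <subject>cartoon</subject>
  <subject>animation</subject>
  <my_custom_key>my custom value</my_custom_key>
</metadata>"""

def read_metadata(meta_xml):
    """Collect child elements into a dict; keys that repeat become lists."""
    collected = {}
    for element in ET.fromstring(meta_xml):
        collected.setdefault(element.tag, []).append(element.text or "")
    return {key: vals[0] if len(vals) == 1 else vals for key, vals in collected.items()}

meta = read_metadata(SAMPLE_META_XML)
print(meta["identifier"])
print(meta["subject"])
print(meta["my_custom_key"])
```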

Reviews, if there are any, are contained in the _reviews.xml file.

One thing to note: Many “display” characteristics on archive.org, among other things, work better if your item’s identifier matches your file name.  So if you’re uploading a file called popeye_taxi-turvey.mpg, it’s best to use the identifier popeye_taxi-turvey (just remove the file extension).  If you’re using the upload button on archive.org, put your desired identifier in the Title field of the upload form.  We turn that into the identifier automatically, and then after upload you can go back into the item and change the title to something more readable.
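
The "just remove the file extension" rule is easy to apply in a script. A tiny sketch follows; the function name is hypothetical, and it does nothing beyond stripping the last extension (it does not check that the identifier is unique or otherwise acceptable).

```python
from pathlib import PurePath

def identifier_for_file(filename):
    """Suggest an identifier by dropping the final extension from a file name."""
    return PurePath(filename).stem

print(identifier_for_file("popeye_taxi-turvey.mpg"))  # popeye_taxi-turvey
```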

Archival URLs

An item’s “details” page will always be available at
http://archive.org/details/[identifier]

The item directory is always available at
http://archive.org/download/[identifier]

A particular file can always be downloaded from
http://archive.org/download/[identifier]/[filename]

Please Note: Archival URLs may redirect to an actual server that contains the content.  For example
http://www.archive.org/download/popeye_taxi-turvey
currently redirects to
http://ia600204.us.archive.org/14/items/popeye_taxi-turvey/
DO NOT LINK to any archive.org URL whose server name contains numbers like this (ia600204 in this example).  This refers to the particular machine that we’re serving the file from right now, but we move items to new servers all the time.  If you link to this sort of URL, instead of the archival URL, your link WILL break at some point.
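
The three archival URL patterns above, plus a check for the server-specific kind of URL, can be sketched like this. The helper names are made up, and the host pattern (a name starting with "ia" followed by digits) is an assumption generalized from the single example above, so treat it as a heuristic rather than a rule.

```python
import re
from urllib.parse import quote, urlparse

BASE = "http://archive.org"

def details_url(identifier):
    return f"{BASE}/details/{identifier}"

def download_url(identifier, filename=None):
    """Item directory URL, or a single file's URL when filename is given."""
    url = f"{BASE}/download/{identifier}"
    return f"{url}/{quote(filename)}" if filename else url

# Assumption: server-specific hosts look like "ia600204.us.archive.org".
_SERVER_HOST = re.compile(r"^ia\d+\.")

def is_server_specific(url):
    """True if the URL points at one particular storage machine."""
    host = urlparse(url).hostname or ""
    return bool(_SERVER_HOST.match(host))

print(download_url("popeye_taxi-turvey", "popeye_taxi-turvey_files.xml"))
print(is_server_specific("http://ia600204.us.archive.org/14/items/popeye_taxi-turvey/"))  # True
print(is_server_specific("http://www.archive.org/download/popeye_taxi-turvey"))           # False
```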

How Montana State Library Uploaded Batches of Digital Objects to the Internet Archive

by Chris Stockwell for Montana State Library, 12/29/2010

Introduction

The Montana State Library (MSL) last year moved a copy of its collection of 3000 born digital state publications to the Internet Archive (IA). Since MSL will be continuing to upload and integrate born digital publications to the Internet Archive, we encourage constructive comment. Also, MSL would be happy to answer questions about what we did. Contact the Library Information Services division.

It was a natural progression for MSL to upload and integrate its born digital state publications to the Internet Archive. The Internet Archive already is digitizing Montana’s print state publications under contract. After the items are digitized, IA provides public access to them through its free digital library with an MSL logo. IA is officially recognized as a library by California. Also, IA’s Archive-It team archives Montana state agency web sites under contract. Montana State Library considers IA to be its institutional repository for its primary state publications collection…

…Read the entire post on archive.org

Visit the Montana State Library collection

New BookReader!

By mang

We’re pleased to announce the release of our freshly re-designed BookReader on the Internet Archive.

The updated BookReader has these great new features (links will take you to a live example):

  • Redesigned user interface that maximizes the amount of space given to the book. Click the down arrow on the navigation bar to hide the user interface. (The Origin of Species)
  • Navigation bar that helps show your location in the book and navigate through it. Search results and chapter markers (if available) show up on the navigation bar.
  • New Read Aloud feature reads the book as audio in most browsers.  No special software is needed – just click the speaker icon and go!
  • Tables of contents are being automatically generated for most books and can be edited or added manually through the Open Library site.  The chapter markers appear in the new navigation bar. (Launching Out Into The Deep in Wake of the War Canoe)
  • Vastly improved full-text search.  Search results are shown on the navigation bar and include a snippet of text near the matched search term. (Search results for “hawk” in book of birds)
  • More sharing options – the new Share dialog gives you the option to choose how to link to the book and set options when embedding the BookReader on a blog or website.  As always, you can just copy and paste the address in your browser address bar to get a shareable link to the current page. (Page 65 of Aviation in Canada, 1-page mode)
  • Touch gesture support – swipe to flip pages in two-page mode, pinch to zoom on iOS.
  • Improved support for tablet devices like the iPad.
  • Updated UI for the embedded BookReader – now includes “expando” button to view the book in a new browser window.
  • Integration with Open Library – books that have an Open Library record can have their title and table of contents edited through the Open Library site. The chapter headings on Open Library link directly into the BookReader. (Flatland table of contents on Open Library)

Here’s an embedded book for you to play with.  For any of our publicly accessible books you can embed it on your blog too by getting the embed code from the Share dialog!

Incredible thanks to our fantastic team for making it happen:

  • Raj Kumar – Read Aloud
  • Mike McCabe – table of contents
  • Peter Brantley – BookServer wrangler
  • Edward Betts – full-text search
  • George Oates – new user interface
  • Lance Arthur – markup and CSS
  • Alexis Rossi – QA
  • Jeff Kaplan – QA
  • Michael Ang (yours truly) – Putting It All Together(tm)
  • All of the Archive staff and contributors that make putting the books online possible!

As always, the BookReader remains open source and you can look at our developer documentation for information on reusing it on other sites. We’d like to thank user yankl on GitHub for contributing a patch related to using the BookReader with right-to-left languages.

The Fourth Generation Petabox

Waiting for your upload or download….

Behind all the cool stuff users see on archive.org is some serious hardware. I was curious about the ongoing development of data storage here at Internet Archive. I spent a little time with Mario, Master of the Machines, while he gave me a tour of the newest generation of our staff-designed and staff-built Petabox storage units.

Here are some of the specs he gave me for the newest version.
• each has 480 terabytes of raw storage
• each Petabox contains: 240 2-terabyte disks in 4U high rack mounts
• each computer has: 2 – 4 core Xeon processors at 2 GHz and 12 gigs of RAM
• each machine has a pair of 1Gbit interfaces that are bonded, so it’s effectively 2Gbit
• the rack has a switch with uplink of 10Gbit
• Ubuntu OS is stored on a pair of mirrored internal hard drives separate from the data disks
• each has IPMI management interface (allows remote control power cycling and remote console)
• in all there will be a total of 8 units (that’s about 4 million gigabytes).
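
For anyone who wants to check the arithmetic behind those figures, here is a quick sketch. It only multiplies the numbers listed above; the one assumption is decimal units (1 terabyte = 1000 gigabytes).

```python
DISKS_PER_PETABOX = 240
TB_PER_DISK = 2
UNITS = 8
GB_PER_TB = 1000  # assumption: decimal units

raw_tb_per_petabox = DISKS_PER_PETABOX * TB_PER_DISK  # 480 TB of raw storage
total_tb = raw_tb_per_petabox * UNITS                  # 3840 TB across 8 units
total_gb = total_tb * GB_PER_TB                        # 3,840,000 GB
bonded_gbit = 2 * 1                                    # two bonded 1Gbit interfaces

print(raw_tb_per_petabox, total_tb, total_gb, bonded_gbit)
# 480 3840 3840000 2
```

So "about 4 million gigabytes" is 3.84 million gigabytes of raw storage, rounded up.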

-Jeff Kaplan

New Support for HTML5 audio tag!

We just rolled out the <audio> tag support option for our audio files (which is similar to the <video> tag support that we have had as an option for a bit).

So patrons can now opt to not use our Flash plugin for audio file playback with relatively modern browsers (Safari v4+, Firefox v3.5+, Chrome, etc.) that support the new audio/video HTML5 tags.  For such browsers, you can visit an item and then look below where the normal player would be and click “Would you like to try the new audio tag?”  If you prefer this way of listening, we give you the option to set a cookie to make archive audio/video items always use this (non-Flash) option.

Enjoy!

-Tracey Jaquith

better mp4 (h.264) derivatives at archive.org!

Late last week, we pushed live a new video deriving technique and, in the process, updated ffmpeg, our audio/video file reader.

New items will benefit from this newer method, and prior items can be re-derived by users if they desire (probably by the end of the year, we will rederive all our movies automatically).

The video will have significantly less “noise”, a higher PSNR (Peak Signal-to-Noise Ratio), and less “blocking”, all at a similar or faster deriving speed and the same bitrate and filesize!
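
For readers unfamiliar with PSNR, the standard definition is 10 · log10(MAX² / MSE), where MSE is the mean squared error between the original and the encoded pixels and MAX is the largest pixel value (255 for 8-bit video). The sketch below is a plain-Python illustration of that formula on flat lists of pixel values; it is not how the Archive measured its derivatives, and the function name is made up.

```python
import math

def psnr(reference, encoded, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(encoded) or not reference:
        raise ValueError("need two non-empty sequences of the same length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, encoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames: no noise at all
    return 10 * math.log10(max_value ** 2 / mse)

print(psnr([10, 20, 30, 40], [10, 20, 30, 40]))            # inf
print(round(psnr([10, 20, 30, 40], [11, 21, 31, 41]), 2))  # 48.13
```

Higher is better: a larger PSNR means the encoded frame is closer to the source.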

example new derivative frame

example old derivative frame

We now open the source video file up with ffmpeg, resize and convert it to raw video, and pipe it to the most recent build of the “x264” tool (opting for baseline profile for iPhone, etc. compatibility).
For the very curious (and the very geeky 😉), here is how we make our h.264 MPEG4 video files now:

• ffmpeg -i camels.avi -vn -acodec libfaac -ab 64k -ac 2 temp.aac
• ffmpeg -an -deinterlace -i camels.avi -s 320x240 -r 20 -vcodec rawvideo -pix_fmt yuv420p -f rawvideo - 2>/dev/null | ffmpeg -an -f rawvideo -s 320x240 -r 20 -i - -f yuv4mpegpipe - 2>/dev/null | x264 --bitrate 512 --vbv-maxrate 768 --vbv-bufsize 1024 --profile baseline --pass 1 /dev/stdin --demuxer y4m -o temp.h264
• ffmpeg -an -deinterlace -i camels.avi -s 320x240 -r 20 -vcodec rawvideo -pix_fmt yuv420p -f rawvideo - 2>/dev/null | ffmpeg -an -f rawvideo -s 320x240 -r 20 -i - -f yuv4mpegpipe - 2>/dev/null | x264 --bitrate 512 --vbv-maxrate 768 --vbv-bufsize 1024 --profile baseline --pass 2 /dev/stdin --demuxer y4m -o temp.h264
• mp4creator -c temp.h264 -r 20 t2.mp4
• mp4creator -c temp.aac -interleave t2.mp4
• ffmpeg -i t2.mp4 -acodec copy -vcodec copy -metadata title="Camels at a Zoo - http://www.archive.org/details/camels" -metadata year="2004" -metadata comment="license:http://creativecommons.org/licenses/by-nc/3.0/" camels_512kb.mp4
• mp4creator -optimize camels_512kb.mp4
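
As a back-of-the-envelope check on what those bitrates mean for file size, here is a short sketch. It takes the 512 kbit/s video target and 64 kbit/s audio setting from the commands above and assumes 1 kbit = 1000 bits; it ignores container overhead and the fact that a two-pass target is only an average, so the result is an estimate, not what the deriver will produce. The function name is made up.

```python
def estimated_size_mb(video_kbps, audio_kbps, seconds):
    """Rough file size in megabytes (10**6 bytes) for the given average bitrates."""
    total_bits = (video_kbps + audio_kbps) * 1000 * seconds
    return total_bits / 8 / 1_000_000

# A ten-minute video at the settings used in the commands above.
print(estimated_size_mb(512, 64, 600))  # 43.2
```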

–Tracey Jaquith

Project funded to add features to a Million books

The Center for Intelligent Information Retrieval at UMass Amherst, the Perseus Digital Library Project at Tufts, and the Internet Archive are investigating large-scale information extraction and retrieval technologies for digitized book collections. The NSF has awarded a grant of $2.7 million for a project to apply advanced OCR, topic modeling and metadata extraction techniques to over one million books at the Internet Archive.

Thank you, NSF.