The Fourth Generation Petabox

Waiting for your upload or download….

Behind all the cool stuff users see is some serious hardware. I was curious about the ongoing development of data storage here at the Internet Archive, so I spent a little time with Mario, Master of the Machines, while he gave me a tour of the newest generation of our staff-designed and staff-built Petabox storage units.

Here are some of the specs he gave me for the newest version.
• each has 480 terabytes of raw storage
• each Petabox contains: 240 2-terabyte disks in 4U high rack mounts
• each computer has: two 4-core Xeon processors, 12 gigs of RAM each, speed 2 GHz
• each machine has a pair of 1Gbit interfaces that are bonded, so it’s effectively 2Gbit
• the rack has a switch with uplink of 10Gbit
• Ubuntu OS is stored on a pair of mirrored internal hard drives separate from the data disks
• each has an IPMI management interface (allows remote power cycling and remote console access)
• in all there will be 8 units (that’s 3,840 terabytes, or about 4 million gigabytes).
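
As a quick sanity check, the capacity figures in the list can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the constant and function names are made up for illustration, and it assumes decimal units (1 terabyte = 1,000 gigabytes), which is how drive vendors usually count.

```python
# Back-of-the-envelope check of the raw capacity figures quoted in the specs.
# All names here are illustrative; only the numbers come from the list above.

DISKS_PER_PETABOX = 240   # disks in one Petabox
TB_PER_DISK = 2           # terabytes per disk
PETABOX_COUNT = 8         # planned number of units
GB_PER_TB = 1000          # assumption: decimal (vendor-style) units


def raw_capacity_tb(disks: int, tb_per_disk: int) -> int:
    """Raw (unformatted, no redundancy) capacity in terabytes."""
    return disks * tb_per_disk


per_box_tb = raw_capacity_tb(DISKS_PER_PETABOX, TB_PER_DISK)
total_tb = per_box_tb * PETABOX_COUNT
total_gb = total_tb * GB_PER_TB

print(f"Per Petabox: {per_box_tb} TB raw")
print(f"All {PETABOX_COUNT} units: {total_tb} TB raw, or {total_gb:,} GB")
```

Under those assumptions the 8 units come to 3,840 TB, which is where the "about 4 million gigabytes" comes from. Usable space would be lower once filesystem overhead and any redundancy are taken into account.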

-Jeff Kaplan

14 thoughts on “The Fourth Generation Petabox”

  1. georgie62

    I like to believe that there’s a small inscription in this machine: “In memory of Alan Mathison Turing (1912 – 1954), one of the 20th Century’s greatest intellects.” Moreover, I propose that this work of art, math, and science be showcased at the University of Manchester in England to celebrate A.M. Turing’s 100th birthday on June 23, 2012. Preferably, I would point to the School of Mathematics, housed in the Alan Turing Building; Turing worked at the University of Manchester from 1948 to 1954. The demonstration would be a great honor for this organization, and would serve as a reminder to Americans of the power of our young intellects and of what a “jobs” program can produce.

  2. Patrick

    Any info on whether those newer systems are still designed / built by Capricorn Technologies or if it’s an entirely in-house work? Their website hasn’t been updated since the first or second generation Petaboxes came out and there is definite interest in that kind of high density storage.

  3. gb_lucas

    First time at your blog, really nice place to be.

    How much could one of these cost?
    Just in case I want one at my home 🙂

    Saludos from Madrid

  4. Dez Blanchfield

    Why aren’t you using something like the BackBlaze box?

    They now have an open design you can a) build yourself or b) just buy for around $7,400 per 4RU 135 TB “node”.

    Based on the physical design on a per rack basis, if you are using a standard 44 RU rack, take the top 4 RU and have 2 x 1RU switches with spacers between them for cable space (facing forward), you could have 10 x 4 RU BackBlaze boxes, which would be roughly 10 x 135 TB, or let’s round that out to around 1.3 PB per rack.

    Surely that simplifies your build if you get the 3rd party guys to build and ship you the BackBlaze units prebuilt and configured for $7,400 per unit, or let’s say $74,000 for 10 units (discounts most likely apply at that sort of volume). Throw in the cost of a rack, 2 x switches, and some power and ethernet cables and you’ve got a petabyte for around $80,000 USD.

    Happy to discuss design and architecture if anyone is interested.


    1. brewster

      The backblaze guys have a great design and are wonderful about sharing it. We talked with them when we were planning this generation.

      We decided to go with our current design, which is more expensive, for a couple of reasons. We wanted to go with a more standard case (we bent our own metal the last time) and thought it would give us more flexibility and lower ongoing design costs. Also, buying cases built in the Bay Area took forever on lead time. The US really has slipped on manufacturing. It turns out one of the disadvantages of the cases we have used is that the power supply fans are noisy. Since most customers do not care, we think they will be unlikely to fix this flaw.

      Another reason is we wanted replaceable disks. This request came from the system administrators. This has turned out to be very helpful.


  5. Nick

    Awesome! Would love to have one of these racks for my home server 😉

    Thanks to some intrepid finagling and thinking outside the box, I managed to get a hold of quite a few “old” servers from which I Frankenstein’d my own home server which functions as a media storage center, direct-access media center, file server, and game server. It consists of:
    – 4x Intel Xeon E7-8870 2.4GHz (133MHz BCLK + 18x Multiplier) 10-Core/20-Thread CPUs @ 2.99GHz (166MHz BCLK + 18x Multi)
    – SuperMicro X8QB6-F Motherboard with Integrated LSI 8x SAS RAID Controller w 512MB Cache
    – 128GB (32x4GB) DDR3-1333RE @ DDR3-1666 8-8-8-18
    – Nvidia Tesla K20 6GB GDDR5 Compute Card (Folding@Home)
    – LSI 927x-16i SAS/SATA6Gbps PCI-E 3.0 x8 RAID Card with 4GB DDR3 Cache, CacheCade, and CacheVault (NAND Flash-based Backup instead of Battery-based)
    – Intel 24-Port SAS Expander
    – Intel Dual-10GbEthernet PCI-Express NIC
    – 6x Intel X25-E SLC NAND 240GB Solid State Drives in RAID0 (Intel ICH10R)
    – 8x Western Digital VelociRaptor 1TB HDDs in RAID0 (Onboard LSI)
    – 12x Hitachi Ultrastar 4TB HDDs in RAID5 (LSI 927x)
    – 14x Western Digital RE 4TB HDDs in RAID6 (LSI 927x)
    – 8x Seagate 600GB 15KRPM SAS HDDs in RAID0
