Wayback Machine Hits 400,000,000,000!

The Wayback Machine, a digital archive of the World Wide Web, has reached a landmark with 400 billion webpages indexed. This makes it possible to surf the web as it looked at any time from late 1996 up until a few hours ago.

Let’s take a trip back in time and visit some sites.
Yahoo (Captured way back on November 28, 1996)

Geocities (Captured December 12, 1998)

There were even places to start your very own web diary way back in 1999.
Diaryland.com (Captured November 27, 1999)

Mumbleboy was using Flash to push the creative limits of Web Animation (Captured August 1, 2001)

Before there was Borat, there was Mahir Cagri.  This site and the track it inspired on mp3.com created quite a stir in the IDM world, with people claiming that “Mahir Cagri” was Turkish for “Effects Twin” and that the whole thing was an elaborate ruse by Richard D. James (Aphex Twin). (Captured December 29, 2004 and December 7, 2000)

Have you ever wondered what happens when the Wayback Machine archives itself?  Will we fall into a search window of recursion, never to find our way out of the mirror maze again? (Captured October 22, 2008)

I guess we don’t want to break our brains.  Oh, well.
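For anyone who wants to retrace the trip above, a capture can be requested by putting a date in the address. Here is a minimal sketch in Python, assuming the familiar `https://web.archive.org/web/<timestamp>/<site>` address pattern. The helper name `wayback_url` is hypothetical, and the `yahoo.com` and `geocities.com` domains are assumptions based on the site names above. The Wayback Machine serves the capture closest to the requested date, which may not be the exact day.

```python
from datetime import date

# Assumed address prefix for replaying archived pages.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(site: str, day: date) -> str:
    """Build an address asking for `site` as captured around `day`.

    Hypothetical helper: it only formats a string and makes no network call.
    """
    timestamp = day.strftime("%Y%m%d")  # e.g. 19961128
    return f"{WAYBACK_PREFIX}/{timestamp}/{site}"


# Capture dates taken from the post; domains are assumptions.
captures = {
    "yahoo.com": date(1996, 11, 28),
    "geocities.com": date(1998, 12, 12),
    "diaryland.com": date(1999, 11, 27),
}

for site, day in captures.items():
    print(wayback_url(site, day))
```

Pasting any of the printed addresses into a browser should land on or near the capture described above.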

The Wayback Machine has had some exciting adventures over the years as it grew. Here are some highlights:

2001 – The Wayback Machine is launched.  Woo hoo.

2006 – Archive-It is launched, allowing libraries that subscribe to the service to create curated collections of valuable web content.

March 25, 2009 – The Internet Archive and Sun Microsystems launch a new data center that stores the whole web archive and serves the Wayback Machine. This 3-petabyte data center handled 500 requests per second from its home in a shipping container.

June 15, 2011 – The HTTP Archive becomes part of the Internet Archive, adding data about the performance of websites to our collection of website content.

May 28, 2012 – The Wayback Machine is available in China again, after being blocked for a few years without notice.

October 26, 2012 – Internet Archive makes 80 terabytes of archived web crawl data from 2011 available for researchers, to explore how others might be able to interact with or learn from this content.

October 2013 – New features for the Wayback Machine are launched, including the ability to see newly crawled content an hour after we get it, a “Save Page” feature so that anyone can archive a page on demand, and an effort to fix broken links on the web starting with WordPress.com and Wikipedia.org.

Also in October 2013 – The Wayback Machine provides access to important Federal Government sites that go dark during the Federal Government Shutdown.

We’re proud of you, Wayback Machine!  You’ve grown so big on a healthy diet of web captures, and you’re growing more every day.

 

21 thoughts on “Wayback Machine Hits 400,000,000,000!”

    1. Jack Kinnerly

      The article says that in 2009, it held 3 petabytes (3000 terabytes or 3 million gigabytes).

      The FAQ on the Wayback Machine itself (https://archive.org/about/faqs.php#9) says:

      “The Internet Archive Wayback Machine contains almost 2 petabytes of data and is currently growing at a rate of 20 terabytes per month. This eclipses the amount of text contained in the world’s largest libraries, including the Library of Congress.”

    2. michelle

      We recently surpassed 8.9 petabytes in storage for the Wayback Machine alone, and of course that’s a number that’s constantly growing! That’s 9,332,326.4 gigabytes.

  1. Roberta Westerberg

    If only I had enough time to really get into all this information! All this knowledge is going to dispel a lot, if not most, of our cherished assumptions and prejudices. What will happen to the human race then??? All the more reason not to have nuclear weapons lying around. All the more reason for war to be just a remembrance in our DNA. It is said that those who don’t know their history are bound to repeat it. Well, there is NO EXCUSE NOW!

  2. jwjb

    Thanks so much for all that you do and it would be great to see an amazing infographic encompassing these last eighteen years to visually mark this impressive and significant milestone.

  3. Tzvi Katzburg

    Thanks and thanks again for your amazing efforts.
    The logo of my site is based upon the old logo of my first blog ever.
    I didn’t believe I would find it in the time machine, but I did.
    Your work is one of the most important things in this modern age.
    Thank you!

  4. LegendSome

    In 2,000 years, people of the future will think this “internet” was our holy book and that everything on the internet was real. -Bible

  5. Gabrielle

    You really make it seem so easy with your presentation but I
    find this topic to be really something which I think I would never
    understand. It seems too complex and very broad for me.
    I am looking forward to your next post; I’ll try to get
    the hang of it!

  6. PsychedBe

    All I know is… though a very nice way to archive the net, it is a pain to get something removed from the internets…

    (especially for websites gone rogue due to several take-overs. thinking geocities here, geocities to reocities to …)

    Other than that, great way to follow up on the evolution of design in general 😉

Comments are closed.