Category Archives: Wayback Machine – Web Archive

Military Industrial Powerpoint Complex Karaoke! — Tuesday, March 6

The Internet Archive presents the first ever Military Powerpoint Karaoke: a night of “Powerpoint Karaoke” using presentations in the Military Industrial Powerpoint Complex collection at archive.org that were extracted by the Internet Archive from its public web archive and converted into a special collection of PDFs/epubs. The event will take place on Tuesday, March 6th at 7:30pm at our headquarters in San Francisco. The show will be preceded by a reception at 6:30 pm, when doors will also open.

Get Free Tickets Here

Also known as “Battle Decks,” Powerpoint Karaoke is an improvisational art event where audience members give a presentation using a set of Powerpoint slides that they’ve never seen before. There are three rules: 1) The presenter cannot see the slides before presenting; 2) The presenter delivers each slide in succession without skipping slides or going back; and 3) The presentation ends when all slides are presented, or after 5 minutes (whichever comes first). We’re thrilled to have Rick Prelinger, creator of Lost Landscapes and Prelinger Archive, and Avery Trufelman of 99% Invisible, joining us to deliver headlining Powerpoint decks. The rest of the presentations will be delivered by you, the audience members who sign up.

This event will use, as its source material, a curated collection of the Internet Archive’s Military Industrial Powerpoint Complex, a special project alongside GifCities that was originally created for the Internet Archive’s 20th Anniversary in October 2016. For the project, IA staff extracted all the Powerpoint files from its archive of the government’s public .mil web domain. The collection was expanded in early 2017 to include materials collected during the End of Term project, which archived a snapshot of the .gov and .mil web domains during the administration change. The Military Industrial Powerpoint Complex collection contains over 57,000 Powerpoint decks, each charged with material that ranges from the violent to the banal, featuring attack modes, leadership styles, harness types, and modes for requesting vacation days from the US Military. The project was originally inspired by writer Paul Ford’s article, “Amazing Military Infographics” which can be found in the Wayback Machine. As a whole, this collection forms a unique snapshot into our government’s Military Industrial Complex.

This event is organized by artists/archivists Liat Berdugo and Charlie Macquarie in partnership with the Internet Archive.

Tuesday, March 6
6:30 pm Reception
7:30 pm Program

Internet Archive
300 Funston Avenue
San Francisco, CA 94118

30 Days of Stuff

Jason Scott, free-range archivist, reporting in as 2017 draws to a close.

As part of our end-of-year fundraising drive, I thought it might be fun to tweet highlighted parts of the vast stacks of content that the Internet Archive makes available for free to millions. A lot of folks know about our Wayback Machine and its 20+ years of website history, but there’s petabytes of media and works available to see throughout the site. I called it “30 Days of Stuff”, and for the last 30 days I’ve been pointing out great items at the Archive, once a day.

You won’t have to swim upstream through my tweets; here on the last day, I’ve compiled the highlighted items in this entry. Enjoy these jewels in the Archive’s collection, a small sample of the wide range of items we provide.

Books and Texts

  • The Latch Key of my Bookhouse was one of the first books scanned by the Internet Archive in its book scanner tests, and it’s a 1921 directory of Children’s Literature that is filled with really nice illustrations that came out great.
  • As part of our ever-growing Defense Technical Information Center collection, we have The Role of the Citizens Band Radio Service and Travelers Information Stations In Civil Preparedness Emergencies Final Report, a 1978 overview of CB Radio and what role it might play in civil emergencies. Many thousands of taxpayer-funded educational and defense items are mirrored in this collection.
  • Also in the DTIC collection is The Battalion Commander’s Handbook 1980, which besides the crazy front page of stamps, approvals and sign-offs, is basically a manager’s handbook written from the point of view of the US Army.
  • There are hundreds of tractor manuals at the Archive. Hundreds! Of all types, languages (a lot of them Russian) and level of information. Tractors are one of those tools that can last generations and keeping the maintenance on them in the field can make a huge difference in livelihood.
  • A lovely 1904 catalog for plums called The Maynard Plum Catalogue was scanned in with one of our partner organizations and it’s a breathless and inspiring declaration of the future wonder of the plums this wizard of plum-growing, Luther Burbank, was bringing to the world.
  • Xerox Corporation released “A Metamorphosis of Creative Copying” in 1964, which seems to function as both promotion for Xerox and a weird gift to give to your kids to color in.
  • In 2014, a short zine called The Tao of Bitcoin was released, telling people the dream of $10,000 bitcoin would be real.
  • The 1888 chapbook Goody Two-Shoes has lovely illustrations, and a fine short story.
  • Working with a lovely couple who brought in a 1942 black-owned-businesses directory, I scanned the pages by hand and put them up into this item.
  • Inside that directory was an ad for a school of whistling that said it taught using the methods of Agnes Woodward, and a quick scan of the Archive’s stacks showed that we had an entire copy of her book Whistling as an Art!
  • The medical treatise Sleep and Its Derangements, from 1869, is William A. Hammond, MD’s overview of sleep, and what can go wrong. Scanned from the Francis A. Countway Library of Medicine, it’s one of many thousands of books we’ve scanned with partners.
  • Let Hartman Feather Your Nest could be described as “A furniture catalog” in the same way the Sistine Chapel could be described as “a place of worship”. The catalog is a thundering, fist-pounding declaration of the superiority of the Hartman enterprise and the quality and breadth of furniture and service that will arrive at your door and be backed up to the far reaches of time.

Magazines

  • Photoplay considered itself the magazine for the motion picture industry in the first part of the 20th century, and this multi-volume compilation of photos, articles and advertisements is a truly lovely overview.
  • There’s over 140 issues of the classic Maximum RockNRoll zine, truly the king of music zines for a very long time. On its newsprint pages are howls and screeches of all manner of punk, rock and the needs of musicians.
  • A magazine created by the Walt Disney Company to trumpet various parts of Disneyland and its attractions was called Vacationland, and this Fall 1965 issue covers all sorts of stuff about the park’s first decade.

Movies

  • Rescued from a warehouse years ago, a collection of Hollywood movie “B-Roll”, unused secondary scenes often filmed by different crew, has been digitized. My personal favorite is [Western Film Scenes], which is circa 1950s footage of a Western Town, all of it utterly fake but feeling weirdly real, to be used in a western. Don’t miss everyone standing around looking right at you and looking like they agree quite energetically with you!
  • No compilation could be complete without the legendary Duck and Cover, a cartoon/PSA that explained the simple ways to avoid injury in a nuclear blast. Just lie down! It’ll be fine. Please note: This Probably Won’t Work. But the song is very catchy.
  • The very weird Electric Film Format Acid Test from 1990 has a semi-interested model holding up a color bar plate in a wide, wide variety of film and video formats. Filmed just a few blocks away from the Internet Archive’s current headquarters.
  • I snuck in a 1992 interview with the Archive’s founder, Brewster Kahle, back when he was 33 and working at WAIS, a company or two before the Archive and where he is asked about his thoughts on information and gathering of data. It’s quite interesting to hear the consistency of thought.
  • The Office of War Information worked with Disney to create “Dental Health“, a film to show to troops about proper dental care. It’s a combination of straightforward animation and industrial film-making worth enjoying.

Audio

  • We have a collection of hours of the radio show The Shadow from 1938-1939, starring  Orson Welles at 23, at the height of his performance powers, playing the dual main role.
  • For Christmas Eve, we pointed to “Christmas Chopsticks”, a 1953 78rpm record of “Twas the Night Before Christmas” performed to the tune of the classic piano piece “Chopsticks”; one of tens of thousands of 78rpm records the Archive has been adding this year.
  • On Christmas, a user of the Archive uploaded two obscure albums he’d purchased on eBay – remnants of the S. S. Kresge Company, which became K-Mart, and which were played over the PA system for shoppers. He got his hands on Albums #261 and #294.
  • Earlier in the month before the user uploaded those Christmas albums, I linked to a different holiday collection of K-Mart items, a 1974 Reel-to-Reel that started with a K-Mart jingle and went full holiday from there.
  • Before he was a (retired) talk show host, and before he was a stand-up comedian, David Letterman worked and trained in radio. Happily, we have recordings of Dave Letterman, DJ, from when he was 22, at Ball State University.
  • Ron “Boogiemonster” Gerber has been hosting his weekly pop music recycling radio show, “Crap from the Past”, for over 25 years, and he’s been uploading and cataloging his show to the Archive for well over 10 of those years, including all the way back to the beginning of his show. The full Crap From The Past archive is up and is hundreds of hours of fun.
  • The truly weird “Conquer the Video Craze” is a 1982 record album with straightforward descriptions of how to beat games like Centipede, Defender, Stargate, Dig Dug, and more. This album has been sampled from by multiple DJs to bring that extra spice to a track.
  • Over 3,000 shows at the DNA Lounge are at the archive, including “Bootie: Gamer Night“, which combines mash-up tracks and video games. Bootie has been playing at DNA Lounge for years, and puts the audio from one song with the singing from another, and… it’s quite addicting, like games. This night was for the nearby Game Developers’ Conference being held the same week.

Software

  • In 2011, as part of a “retrocomputing” competition, we saw the release of “Paku-Paku”, a pac-clone program which ran in an obscure early PC-Compatible graphics mode that was very colorful and very small (160×100) and was built perfectly for it. You can play the game in your browser by clicking here.
  • Psion Chess is a game for the Macintosh that can play both you and itself with pretty high levels of skill and really sharp and crisp black and white graphics.  It makes a really great screensaver in self-playing mode.

People often overuse a phrase like “Barely scratched the surface”, but I assure you there are millions of amazing items in the archive, and it’s been a pleasure to bring some to light. While the 30 Days of Stuff was a fun way to stretch out a month of fundraising with stuff to see every day, we’re here 24/7 to bring you all these items, and welcome you finding jewels, gems and clunkers throughout our hard drives whenever you want.

Thanks for another year!

The 20th Century Time Machine

by Nancy Watzman & Katie Dahl

Jason Scott

With the turn of a dial, some flashing lights, and the requisite puff of fog, emcees Tracey Jaquith, TV Architect, and Jason Scott, Free Range Archivist, cranked up the Internet Archive 20th Century Time Machine on stage before a packed house at the Internet Archive’s annual party on October 11.

Eureka! The cardboard contraption worked! The year was 1912, and out stepped Alexis Rossi, director of Media and Access, her hat adorned with a 78rpm record.

1912

D’Anna Alexander (center) with her mother (right) and grandmother (left).

“Close your eyes and listen,” Rossi asked the audience. And then, out of the speakers floated the scratchy sounds of Billy Murray singing “Low Bridge, Everybody Down” written by Thomas S. Allen. From 1898 to the 1950s, some three million recordings of about three minutes each were made on 78rpm discs. But these discs are now brittle, the music stored on them precious. The Internet Archive is working with partners on the Great 78 Project to store these recordings digitally, so that we and future generations can enjoy them and reflect on our music history. New collections include the Tina Argumedo and Lucrecia Hug 78rpm Collection of dance music collected in Argentina in the mid-1930s.

1927

Next to emerge from the Time Machine was David Leonard, president of the Boston Public Library, which was the first free, municipal library founded in the United States. The mission was and remains bold: make knowledge available to everyone. Knowledge shouldn’t be hidden behind paywalls, restricted to the wealthy but rather should operate under the principle of open access as public good, he explained. Leonard announced that the Boston Public Library would join the Internet Archive’s Great 78 Project, by authorizing the transfer of 200,000 individual 78s and LPs to preserve and make accessible to the public, “a collection that otherwise would remain in storage unavailable to anyone.”

David Leonard and Brewster Kahle

Brewster Kahle, founder and Digital Librarian of the Internet Archive, then came through the time machine to present the Internet Archive Hero Award to Leonard. “I am inspired every time I go through the doors,” said Kahle of the library, noting that the Boston Public Library was the first to digitize not just a presidential library, of John Quincy Adams, but also modern books.  Leonard was presented with a tablet imprinted with the Boston Public Library homepage by Internet Archive 2017 Artist in Residence, Jeremiah Jenkins.

1942

Kahle then set the Time Machine to 1942 to explain another new Internet Archive initiative: liberating books published between 1923 and 1941. Working with Elizabeth Townsend Gard, a copyright scholar at Tulane University, the Internet Archive is liberating these books under a little known, and perhaps never used, provision of US copyright law, Section 108h, which allows libraries to scan and make available materials published 1923 to 1941 if they are not being actively sold. The name of the new collection: the Sonny Bono Memorial Collection, named for the late congressman who led the passage of the Copyright Term Extension Act of 1998, which included the 108h provision as a “gift” to libraries.

One of these books is “Your Life,” a tome written by Kahle’s grandfather, Douglas E. Lurton, a “guide to a desirable living.” “I have one copy of this book and two sons. According to the law, I can’t make one copy and give it to the other son. But now it’s available,” Kahle explained.

1944

Sab Masada

With the Time Machine cranked to 1944, out came Rick Prelinger, Internet Archive Board member, archivist, and filmmaker. Prelinger introduced a new addition to the Internet Archive’s film collection: long-forgotten footage of an Arkansas Japanese internment camp from 1944. As the film played on the screen, Prelinger welcomed Sab Masada, 87, who lived at this very camp as a 12-year-old.

Masada talked about his experience at the camp and why it is important for people today to remember it. “Since the election I’ve heard echoes of what I heard in 1942,” Masada said. “Using fear of terrorism to target the Muslims and people south of the border.”

1972

Next to speak was Wendy Hanamura, the director of partnerships. Hanamura explained how as a sixth grader she discovered a book at the library, Executive Order 9066, published in 1972, which chronicled photos of Japanese internment camps during World War II.

“Before I was an internet archivist, I was a daughter and granddaughter of American citizens who were locked up behind barbed wire in the same kind of camps that incarcerated Sab,” said Hanamura. That one book – now out of print – helped her understand what had happened to her family.

Inspired by making it to the semi-final round of the MacArthur 100&Change initiative with a proposal that provides libraries and learners with free digital access to four million books, the Internet Archive is forging ahead with plans, despite not winning the $100 million grant. Among the books the Internet Archive is making available: Executive Order 9066.

1985

When the year display turned to 1985, Jason Scott reappeared on stage, explaining his role as a software curator. New this year to the Internet Archive are collections of early Apple software, he explained, with browser emulation allowing the user to experience just what it was like to fire up a Macintosh computer back in its heyday. This includes a collection of the then wildly popular “HyperCards,” a programmatic tool that enabled users to create programs that linked materials in creative ways, before the rise of the world wide web.

1997

After this tour through the 20th century, the Time Machine was set to 1997. Mark Graham, Director of the Wayback Machine, and Vinay Goel, Senior Data Engineer, stepped on stage. Back in 1997, when the Wayback Machine began archiving websites on the still new World Wide Web, the entire thing amounted to 2.2 terabytes of data. Now the Wayback Machine contains 20 petabytes. Graham explained how the Wayback Machine is preserving tweets, government websites, and other materials that could otherwise vanish. One example: this report from The Rachel Maddow Show, which aired on December 16, 2016, about Michael Flynn, then slated to become National Security Advisor. Flynn deleted a tweet he had made linking to a falsified story about Hillary Clinton, but the Internet Archive saved it through the Wayback Machine.

Goel took the microphone to announce new improvements to Wayback Machine Search 2.0. Now it’s possible to search for keywords, such as “climate change,” and find not just web pages from a particular time period mentioning these words, but also different format types — such as images, pdfs, or yes, even an old Internet Archive favorite, animated gifs from the now-defunct GeoCities–including snow globes!

Thanks to all who came out to celebrate with the Internet Archive staff and volunteers, or watched online. Please join our efforts to provide Universal Access to All Knowledge, whatever century it is from.

Editor’s Note, 10/16/17: Watch the full event https://archive.org/details/youtube-j1eYfT1r0Tc  

 

Internet Archive to help First Draft News debunk fake news

We are delighted to announce a new partnership with First Draft News, a nonpartisan organization dedicated to ferreting out misinformation online.

In its short existence–it was founded in June 2015–First Draft News has already spearheaded innovative projects that bring together news organizations, social technology companies, and human rights organizations to verify the information that flows to online audiences. First Draft also helps define the problem: in February, Claire Wardle, the group’s research director, published a helpful taxonomy of the different types of fake news and misinformation that proliferate online.

Example: with French elections fast approaching on April 23, 2017, First Draft News launched CrossCheck, a project combining the efforts of more than 37 newsroom partners, as well as journalism students across France and beyond. They’ve been working together to debunk false rumors and news reports in a much-watched contest pitting the far-right National Front leader Marine Le Pen against centrist Emmanuel Macron, defender of the European Union, as well as other candidates.

This partnership has quashed reports that 30 percent of Macron’s campaign funding comes from Saudi Arabia, that France is spending 100 million euros to buy hotels to house immigrants, and that the country is planning to replace Christian public holidays with Muslim and Jewish holidays, plus many more. These false stories had been shared thousands of times on social media.

When the elections are over, First Draft News will research whether CrossCheck’s efforts were effective, or how they may be modified to become more so. “CrossCheck is a living laboratory,” says Aimee Rinehart, manager of First Draft’s Partner Network. Wardle will lead the efforts to determine whether the CrossCheck model, where several news organizations sign off on a fact-check or verification, builds public trust in the media, an increasing problem worldwide.

Already, First Draft News partners rely heavily on the Internet Archive’s Wayback Machine to verify information online. With our new collaboration, we hope to increase use of other Internet Archive resources, including our searchable collection of TV news and curated archives such as the Trump Archive, with its linked fact-checks by national fact checking organizations. We also hope the collaboration provides valuable input for our plans to apply more tools of machine learning to the TV News Archive that could help inform reliable news reporting in the future.

If You See Something, Save Something – 6 Ways to Save Pages In the Wayback Machine

In recent days many people have shown interest in making sure the Wayback Machine has copies of the web pages they care about most. These saved pages can be cited, shared, linked to – and they will continue to exist even after the original page changes or is removed from the web.

There are several ways to save pages and whole sites so that they appear in the Wayback Machine.  Here are 6 of them.

1. Save Page Now

Put a URL into the form, press the button, and we save the page.  You will instantly have a permanent URL for your page.

At the moment, there are a few exceptions for this method – some sites prohibit crawling, a few have SSL (security) settings that make it break – but this method will work for most pages.  The feature saves the page you enter including the images and CSS.  It does not save any of the outlinks, and can’t be used to initiate a crawl of an entire web site. We do not keep your IP address, so your submission is anonymous.
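
For readers who would rather script this step than type into the form, the request can be sketched as a pair of URLs. This is a minimal sketch, assuming the `https://web.archive.org/save/` prefix behind Save Page Now and the `https://web.archive.org/web/<timestamp>/<url>` pattern that saved pages use; the helper names are hypothetical, and the code only builds the addresses without contacting the Archive.

```python
from datetime import datetime, timezone

# Assumed URL prefixes; the helper names below are illustrative only.
SAVE_PREFIX = "https://web.archive.org/save/"
REPLAY_PREFIX = "https://web.archive.org/web/"


def save_request_url(page_url: str) -> str:
    """Return the address that asks Save Page Now to capture page_url."""
    return SAVE_PREFIX + page_url


def replay_url(page_url: str, captured_at: datetime) -> str:
    """Return the permanent address of a capture made at captured_at."""
    # Wayback timestamps are 14 digits (YYYYMMDDhhmmss) in UTC.
    stamp = captured_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{REPLAY_PREFIX}{stamp}/{page_url}"


if __name__ == "__main__":
    when = datetime(2017, 1, 25, 12, 0, 0, tzinfo=timezone.utc)
    print(save_request_url("https://example.com/"))
    print(replay_url("https://example.com/", when))
```

Opening the first address in a browser has the same effect as submitting the form, subject to the same exceptions described above.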

2. Chrome extension

Install the Wayback Machine Chrome extension in your browser.  Go to a page you want to archive, click the icon in your toolbar, and select Save Page Now. We will save the page and give you a permanent URL.

The same provisos from “Save Page Now” apply – there are some pages where it won’t work, and it only saves one page at a time.  One plus to installing the extension though is that now as you surf around, when you run into a missing page we will alert you if we have a saved copy.

We also have a “Wayback Machine” Firefox add-on

A “Wayback Machine” Safari Extension

A “Wayback Machine” iOS app

And a “Wayback Machine” Android app

3. Wikipedia JavaScript Bookmarklet

Nobody loves a primary source more than a Wikipedia editor.  To that end, they offer a Wayback Machine JavaScript Bookmarklet that allows you to quickly save a web page from any browser.

4. Volunteer for Archive Team

Archive Team is an entirely volunteer driven group who are interested in saving Internet history.  Many of the sites and pages they save end up in the Wayback Machine.  Visit the Archive Team site to learn more about how to volunteer with them.

5. Sign up for an Archive-It Account

Archive-It is a subscription service provided by Internet Archive that allows you to run your own crawling projects without any technical expertise.  Tell us what to crawl and how often to crawl it, and we execute the crawl and put the results in the Wayback Machine.

Archive-It is a paid subscription service with technical and web archivist support. This option is most appropriate for organizations that have a mandate to save certain types or categories of web content on a regular basis. If your institution is a current Archive-It partner, contact them for how you can contribute.

6. End of Term Archive

Every time the US government administration changes, Internet Archive works with partners to make a copy of government-related sites and web presences.  We call it the End of Term Archive.  You can help us discover new government sites by using the Nomination Tool to suggest pages or sites.  These nominations are added to the crawl and end up in the Wayback Machine.

 

The Internet Archive has been saving web pages for 20 years.  This archive has been built by thousands of people, and we would like you to help.  Use one of the methods above to make sure we have the pages you care about.

 

Robots.txt Files and Archiving .gov and .mil Websites


The Internet Archive is collecting webpages from over 6,000 government domains, over 200,000 hosts, and feeds from around 10,000 official federal social media accounts. Some have asked if we ignore URL exclusions expressed in robots.txt files.

The answer is a bit complicated.  Historically, sometimes yes and sometimes no; but going forward the answer is “even less so.”

Robots.txt files live on the top level of a website at a URL like this: https://example.com/robots.txt. This standard was developed in 1994 to guide search engine crawlers in a variety of ways, including some areas to avoid crawling. This standard is used by Google, for instance.
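
To make the idea of a URL exclusion concrete, here is a minimal sketch of how a crawler that honors robots.txt might consult one. It uses Python's standard `urllib.robotparser` module; the sample file contents, the `ExampleBot` user agent, and the `is_allowed` helper are hypothetical and are not taken from any real site or from the Archive's own crawlers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for an imaginary site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/index.html",
                "https://example.com/private/report.html"):
        print(url, is_allowed(SAMPLE_ROBOTS_TXT, "ExampleBot", url))
```

A crawl that ignores exclusion directives simply skips this check and fetches both URLs.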

These files were useful 20 years ago for the Internet Archive’s crawlers, but have become less and less so over the years because many sites have not actively maintained the files from the point of view of archiving. Also, large websites or hosted websites often do not make it easy for their users to edit these files, and large websites increasingly guide or block crawlers with technological measures. Another problem is that domain names change hands, so a current robots.txt file may not be relevant to an earlier era. Over time, we have come to encourage webmasters who want to exclude their sites to send exclusion requests to info@archive.org and to specify what time period the requests apply to.

Our end-of-term crawls of .gov and .mil websites in 2008, 2012, and 2016 have ignored exclusion directives in robots.txt in order to get more complete snapshots. Other crawls done by the Internet Archive and other entities have had different policies.  We have had little or no negative feedback on this, and little or no positive feedback — in fact little feedback at all. The Wayback Machine has also been replaying the captured .gov and .mil webpages for some time in the beta wayback, regardless of robots.txt.   

Overall, we hope to capture government and military websites well, and hope to keep this valuable information available to users in the future.

Please: Help Build the 2016 U.S. Presidential Election Web Archive

Help us build a web archive documenting reactions to the 2016 Presidential Election. You can submit websites and other online materials, and provide relevant descriptive information, via this simple submission form. We will archive and provide ongoing access to these materials as part of the Internet Archive Global Events collection.

Since its beginning, the Internet Archive has worked with a global partner community of cultural heritage institutions, researchers and scholars, and citizens to build crowdsourced topical web archives that preserve primary sources documenting significant global events. Past collections include the Occupy Movement, the 2013 US Government Shutdown, the Jasmine Revolution in Tunisia, and the Charlie Hebdo attacks. These collections leverage the power of individual curators and motivated citizens to help expand our collective efforts to diversify and augment the historical record. Any webpages, sites, or other online resources about the 2016 Presidential Election are in scope. This web archive will build upon our affiliated efforts, such as the Political TV Ad Archive, and other collecting strategies, to provide permanent access to current political events.

As we noted in a recent blog post, the Internet Archive is “well positioned, with our mission of Universal Access to All Knowledge, to help inform the public in turbulent times, to demonstrate the power in sharing and openness.” You can help us in this mission by submitting websites that preserve the online record of this unique historical moment.

I CAN HAZ MEME HISTORY??


Jason Scott presents Internet Memes of the last 20 Years at the Internet Archive’s 20th anniversary celebration.

——–

It’s always going to be an open question as to what parts of culture will survive beyond each generation, but there’s very little doubt that one of them is going to be memes.

Memes are, after all, their own successful transmission of entertainment. A photo, an image that you might have seen before, comes to you with a new context. A turn of phrase, used by a politician or celebrity and in some way ridiculous or unique, comes back to you in all sorts of new ways (Imma let you finish) and ultimately gets put back into your emails, instant messages, or even back into mass media itself.

However, there are some pretty obvious questions as to what memes even are or what qualifies as a meme. Everyone has an opinion (and a meme) to back up their position.

One can say that image macros, those combinations of an expressive image with big bold text, are memes; but it’s best to think of them as one (very prominent) kind of a whole spectrum of Meme.

Image Macros rule the roost because they’re platform independent. They slip into our lives from e-mails, texts, websites and even when posted on walls and doors. The chosen image (in this example, from the Baz Luhrmann-directed Great Gatsby) portrays an independent idea (Here’s to you) and the text complements or contrasts it. The smallest, atomic level of an idea. And it gets into your mind, like a piece of candy (or a piece of grit).

It can get way more complicated, however. This 1980s “Internet Archive” logo was automatically generated by an online script which does the hard work of layout, fonts and blending for you. When news of this tool broke in September of 2016 (it had been around a long time before that), this exact template showed up everywhere, from nightclub flyers to endless tweets. Within a short time, the ideas of both “using a computer to do art” and “the 1980s” became part of the payload of this image, as well as the inevitable feeling it was even more cliche and tired as hundreds piled on to using it. The long-term prospects of this “1980s art” meme are unknown.

And let’s not forget that “memes” (a term coined by Richard Dawkins in his 1976 book The Selfish Gene) themselves go back decades before the internet made its first carefully engineered cross-continental connections. Office photocopies ran rampant with passed along motivational (or de-motivational) posters, telling you that you didn’t need to be crazy to work here… but it helps! Suffering the pains of analog transfer, the endless remixing and hand touchups of these posters gave them a weathered look, as if aged by their very (relative) longevity. To many others, this whole grandparent of the internet meme had a more familiar name: Folklore.

Memes are therefore rich in history and a fundamental part of the online experience, passed along by the thousands every single day as a part of communicating with each other. They deserve study, and they’ve gotten it.

Websites have been created to describe both the contributing factors and the available examples of memes throughout the years. The most prominent has been Know Your Meme, which through several rounds of ownership and contributors has consistently provided access to the surprisingly deep dive of research a supposedly shallow “meme” has behind it.

But the very fluidity and flexibility of memes can be a huge weakness: a single webpage or a single version of an image will be the main reference point for knowing why a meme came to be, and the lifespan of these references is short indeed. Even when hosted at prominent hosting sites or as part of a larger established site, one good housecleaning or consolidation will shut off access to the information, possibly forever.

This is where the Internet Archive comes in. With hundreds of billions of saved URLs from 20 years stored in the Wayback Machine, we are a neutral storehouse where not just the inspirations for memes but examples of the memes themselves are kept safe for retrieval beyond the fleeting fads and whims of the present.

The metaphor of “the web” turns out to be more and more apt as time goes on: like a spider web, it is surprisingly strong, but can also be unexpectedly lost in an instant. Connections that seemed immutable and everlasting will drop off the face of the earth at the drop of a hat (or a server, or an unpaid hosting bill).

Memes are, as I said, compressed culture. And when you lose culture, you lose the context and meaning of the words and thoughts that came before. The Wayback Machine will be a part of ensuring they stick around for a long time to come.

Defining Web pages, Web sites and Web captures

The Internet Archive has been archiving the web for 20 years and has preserved billions of webpages from millions of websites. These webpages are often made up of, and link to, many images, videos, style sheets, scripts and other web objects. Over the years, the Archive has saved over 510 billion such time-stamped web objects, which we term web captures.

We define a webpage as a valid web capture that is an HTML document, a plain text document, or a PDF.
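As a rough illustration of that definition, here is a minimal sketch of a check over a single capture. The function name, the choice of MIME types, and the reading of “valid” as an HTTP 200 response are all assumptions made for the example; the Archive’s actual criteria are not spelled out here.

```python
# Assumed mapping of "HTML document, plain text document, or PDF"
# onto MIME types; not the Archive's actual list.
WEBPAGE_MIME_TYPES = {"text/html", "text/plain", "application/pdf"}


def is_webpage(status_code, content_type):
    """Return True if a capture looks like a webpage under the definition above.

    Assumption: a "valid" capture is one recorded with HTTP status 200.
    """
    # Drop parameters such as "; charset=UTF-8" before comparing.
    mime = content_type.split(";", 1)[0].strip().lower()
    return status_code == 200 and mime in WEBPAGE_MIME_TYPES


print(is_webpage(200, "text/html; charset=UTF-8"))
print(is_webpage(200, "image/png"))
```

Under these assumptions an image or script capture counts as a web capture but not as a webpage.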

A domain on the web is an owned section of the internet namespace, such as google.com or archive.org or bbc.co.uk. A host on the web is identified by a fully qualified domain name or FQDN that specifies its exact location in the tree hierarchy of the Domain Name System. The FQDN consists of two parts: the hostname and the domain name. As an example, in the case of the host blog.archive.org, its hostname is blog and the host is located within the domain archive.org.
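The hostname/domain split described above can be sketched in a few lines. This is an illustration only: `split_fqdn` and the tiny `MULTI_LABEL_SUFFIXES` set are hypothetical, and a real system would need the full Public Suffix List to know that something like co.uk is a suffix rather than an owned domain.

```python
# Hypothetical, hand-picked suffix list for illustration only; a real
# system would consult the full Public Suffix List.
MULTI_LABEL_SUFFIXES = {"co.uk"}


def split_fqdn(fqdn):
    """Split a host's FQDN into (hostname, domain)."""
    labels = fqdn.lower().rstrip(".").split(".")
    # The owned domain is the public suffix plus one more label.
    suffix_len = 2 if ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES else 1
    domain_len = suffix_len + 1
    domain = ".".join(labels[-domain_len:])
    hostname = ".".join(labels[:-domain_len])
    return hostname, domain


print(split_fqdn("blog.archive.org"))  # ('blog', 'archive.org')
print(split_fqdn("www.bbc.co.uk"))     # ('www', 'bbc.co.uk')
```

A bare domain such as archive.org comes back with an empty hostname, since nothing sits in front of the domain.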

We define a website to be a host that has served webpages and has at least one incoming link from a webpage belonging to a different domain.
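To make the two conditions in that definition concrete, here is a small sketch. The `HostRecord` structure and its field names are invented for the example and say nothing about how the Archive actually stores this information.

```python
from dataclasses import dataclass, field


@dataclass
class HostRecord:
    """Hypothetical summary of what has been captured for one host."""
    fqdn: str
    domain: str
    webpage_count: int = 0
    # Domains of the webpages that link to this host.
    inlink_domains: set = field(default_factory=set)


def is_website(host):
    """Apply the definition: served webpages AND an inlink from another domain."""
    has_pages = host.webpage_count > 0
    has_external_inlink = any(d != host.domain for d in host.inlink_domains)
    return has_pages and has_external_inlink


blog = HostRecord("blog.archive.org", "archive.org", 1200,
                  {"archive.org", "bbc.co.uk"})
orphan = HostRecord("test.example.org", "example.org", 5, {"example.org"})
print(is_website(blog), is_website(orphan))  # True False
```

The second host has pages but is only linked from within its own domain, so it does not count as a website under this definition.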

As of today, the Internet Archive officially holds 273 billion webpages from over 361 million websites, taking up 15 petabytes of storage.

The Hidden Shifting Lens of Browsers


Some time ago, I wrote about the interesting situation we had with emulation and Version 51 of the Chrome browser – that is, our emulations stopped working in a very strange way and many people came to the Archive’s inboxes asking what had broken. The resulting fix took a lot of effort and collaboration with groups and volunteers to track down, but it was successful and ever since, every version of Chrome has worked as expected.

But besides the interesting situation with this bug (it actually made us perfectly emulate a broken machine!), it also brought into a very sharp focus the hidden, fundamental aspect of Browsers that can easily be forgotten: Each browser is an opinion, a lens of design and construction that allows its user a very specific facet of how to address the Internet and the Web. And these lenses are something that can shift and turn on a dime, and change the nature of this online world in doing so.

An eternal debate rages on what the Web is “for” and how the Internet should function in providing information and connectivity. For the now-quite-embedded millions of users around the world who have only known a world with this Internet and WWW-provided landscape, the nature of existence centers around the interconnected world we have, and the browsers that we use to communicate with it.


Avoiding too much of a history lesson at this point, let’s instead just say that when Browsers entered the landscape of computer usage in a big way circa 1995, after being one of several resource-intensive experimental programs, the effect on computing experience and acceptance was unparalleled since the plastic-and-dreams home computer revolution of the 1980s. Suddenly, in one program came basically all the functions of what a computer might possibly do for an end user, all of it linked and described and seemingly infinite. The more technically-oriented among us can point out the gaps in the dream and the real-world efforts behind the scenes to make things do what they promised, of course. But the fundamental message was: Get a Browser, Get the Universe. Throughout the late 1990s, access came in the form of mailed CD-ROMs, or built-in packaging, or Internet Service Providers sending along the details on how to get your machine connected, and get that browser up and running.

As I’ve hinted at, though, this shellac of a browser interface was the rectangular window to a very deep, almost Brazil-like series of ad-hoc infrastructure, clumsily-cobbled standards and almost-standards, and ever-shifting priorities in what this whole “WWW” experience could even possibly be. It’s absolutely great, but it’s also been absolutely arbitrary.

With web anniversaries aplenty now coming into the news, it’ll be very easy to forget how utterly arbitrary a lot of what we think the “Web” is, happens to be.

There’s no question that commercial interests have driven a lot of browser features – the ability to transact financially, and to guarantee the prices or offers you are being shown, is of primary interest to vendors. Encryption, password protection, multi-factor authentication and so on are sometimes given lip service for private communications, but they’ve historically been presented for the store to ensure the cash register works. From the early days of a small padlock icon being shown locked or unlocked to indicate “safe”, to official “badges” or “certifications” being part of a webpage, the browsers have frequently shifted their character to promise commercial continuity. (The addition of “black box” code to browsers to satisfy the ability to stream entertainment is a subject for another time.)

Flowing from this same thinking has been the overriding need for design control, where the visual or interactive aspects of webpages are the same for everyone, no matter what browser they happen to be using. Since this was fundamentally impossible in the early days (different browsers have different “looks” no matter what), the solutions became more and more involved:

  • Use very large image-based mapping to control every visual aspect
  • Add a variety of specific binary “plugins” or “runtimes” by third parties
  • Insist on adoption of a number of extra-web standards to control the look/action
  • Demand all users use the same browser to access the site

Evidence of all these methods pops up across the years, with varying success.

Some of the more well-adopted methods include the Flash runtime for visuals and interactivity, and the use of Java plugins for running programs within the confines of the browser’s rectangle. Others, such as the wide use of Rich Text Format (.RTF) for reading documents, or the RealAudio/RealVideo plugins, gained followers or critics along the way, and ultimately faded into obscurity.

And as for demanding all users use the same browser… well, that still happens, but not with the same panache as the old Netscape Now! buttons.


This puts the Internet Archive into a very interesting position.

With 20 years of the World Wide Web saved in the Wayback Machine, and URLs by the billions, we’ve seen the moving targets move, and how fast they move. Where a site previously might be a simple set of documents and instructions that could be arranged however one might like, there is now a whole family of sites with much more complicated inner workings than any external party can capture, much as you cannot capture a museum by photographing its paintings through a window from the courtyard.

When you visit the Wayback and pull up that old site and find things look different, or are rendered oddly, that’s a lot of what’s going on: weird internal requirements, experimental programming, or tricks and traps that only worked in one brand of browser and one version of that browser from 1998. The lens shifted; the mirror has cracked since then.

This is a lot of philosophy and stray thoughts, but what am I bringing this up for?

The browsers that we use today, the Firefoxes and the Chromes and the Edges and the Braves and the mobile white-label affairs, are ever-shifting in their own right, more than ever before, and should be recognized as such.

It was inevitable that constant-update paradigms would become dominant on the Web: you start a program and it does something and suddenly you’re using version 54.01 instead of version 53.85. If you’re lucky, there might be a “changes” list, but that luck varies, because many simply write “bug fixes”. In these updates are the closing of serious performance or security issues – and as someone who remembers the days when you might have to mail in for a floppy disk to be sent in a few weeks to make your program work, I can totally get behind the new “we fixed it before you knew it was broken” world we live in. Everything does this: phones, game consoles, laptops, even routers and medical equipment.

But along with this shifting of versions comes the occasional fundamental change in what browsers do, along with making some aspect of the Web obsolete in a very hard-lined way.

Take, for example, Gopher, a (for lack of an easier description) proto-web that allowed machines to be “browsed” for information that would be easy for users to find. The ability to search, to grab files or writings, and to share your own pools of knowledge were all part of the “Gopherspace”. It was also rather non-graphical by nature and technically oriented at the time, and the graphical “WWW” utterly flattened it when the time came.

But since Gopher had been a not-insignificant part of the Internet when web browsers were new, many of them would wrap in support for Gopher as an option. You’d use the gopher:// URI, and much like the ftp:// or file:// URIs, it co-existed with http:// as a method for reaching the world.
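To illustrate how the scheme at the front of a URI decides which protocol handler gets used, here is a small sketch using Python’s standard `urllib.parse`. The host gopher.example.org and the “before” and “after” sets of supported schemes are made up for the example; they do not describe any particular browser’s internals.

```python
from urllib.parse import urlsplit


def describe(url):
    """Return the (scheme, host, path) pieces of a URI."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.path


def can_open(url, supported_schemes):
    """A client can only open a URI whose scheme it still has a handler for."""
    return urlsplit(url).scheme in supported_schemes


# Hypothetical handler sets for one client, before and after an update
# that removes its Gopher support.
before = {"http", "ftp", "file", "gopher"}
after = before - {"gopher"}

url = "gopher://gopher.example.org/1/docs"
print(describe(url))
print(can_open(url, before), can_open(url, after))
```

The address is identical in both cases; the only thing that changes is whether the program in front of you is still willing to speak that protocol.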

Until it didn’t.

Microsoft, citing security concerns, dropped Gopher support out of its Internet Explorer browser in 2002. Mozilla, after a years-long debate, did so in 2010. Here’s the Mozilla Firefox debate that raged over Gopher Protocol removal. The functionality was later brought back externally in the form of a Gopher plugin. Chrome never had Gopher support. (Many other browsers have Gopher support, even today, but they have very, very small audiences.)

The Archive has an assembled collection of Gopherspace material here. From this material, as well as other sources, there are web-enabled versions of Gopherspace (basically, http:// versions of the gopher:// experience) that bring back some aspects of Gopher, if only to allow for a nostalgic stroll. But nobody would dream of making something brand new in that protocol, except to prove a point or for the technical exercise. The lens has refocused.

In the present, Flash is beginning a slow, harsh exile into the web pages of history – browser support dropping, and even Adobe whittling away support and upkeep of all of Flash’s forward-facing projects. Flash was a very big deal in its heyday – animation, menu interface, games, and a whole other host of what we think of as “The Web” depended utterly on Flash, and even specific versions and variations of Flash. As the sun sets on this technology, attempts to keep it viewable, like the Shumway project, will hopefully allow the lens a few more years to be capable of seeing this body of work.

As we move forward in this business of “saving the web”, we’re going to experience “save the browsers”, “save the network”, and “save the experience” as well. Browsers themselves drop or add entire components or functions, and being able to touch older material becomes successively more difficult, especially when you might have to use an older browser with security issues. Our in-browser emulation might be a solution, or special “filters” on the Wayback for seeing items as they were back then, but it’s not an easy task at all – and it’s a lot of effort to see information that is just a decade or two old. It’s going to be very, very difficult.

But maybe recognizing these browsers for what they are, and coming up with ways to keep these lenses polished and flexible, is a good way to start.