SHOWCASE: the GIF Collider at Berkeley Art Museum

Image from the GIF Collider, by Greg Niemeyer and Olya Dubatova

by Greg Niemeyer, Director, Berkeley Center for New Media

Have you ever wondered what happened to all the GIF animations that sparkled in the dawn of the internet? According to artists Greg Niemeyer and Olya Dubatova, they have become part of the digital subconscious, and the Berkeley Art Museum & Pacific Film Archive (BAMPFA) is presenting what that subconscious might look like, in an exhibit called GIF Collider.

Niemeyer studied both the Internet Archive’s collections of GIF animations and the Prelinger Film Archives from the 1950s. He noticed how the film archives, which include ads, educational films and propaganda, show a heavy gender and racial bias. In comparison, the GIF animations from forty years later reflect less gender and racial bias, but we can’t help but wonder: with more historical distance, what kinds of bias will become apparent in the future? What about these GIFs do we not see now that will be obvious in 50 years?

For three days, from Wednesday, October 26 to Friday, October 28, from dawn to dusk, BAMPFA invites you to ponder these questions as thousands of GIF animations emerge and collide on the huge public outdoor screen in a ballet of memory and erasure. Call it an outstallation, and it’s free for the walking to the intersection of Addison Avenue and Oxford Avenue in Berkeley. The GIFs will be presented in several chapters, playing for 30 minutes of every hour.

A special showcase with music made for the GIF Collider by Paz Lenchantin (Pixies) and with live music performances by Trevor Bajus and Space Town is planned for Friday, Oct 28, from 6-8 pm.

For more information:

Posted in Announcements, News | Comments Off on SHOWCASE: the GIF Collider at Berkeley Art Museum

Defining Web pages, Web sites and Web captures

The Internet Archive has been archiving the web for 20 years and has preserved billions of webpages from millions of websites. These webpages are often made up of, and link to, many images, videos, style sheets, scripts and other web objects. Over the years, the Archive has saved over 510 billion such time-stamped web objects, which we term web captures.

We define a webpage as a valid web capture that is an HTML document, a plain text document, or a PDF.

A domain on the web is an owned section of the internet namespace, such as or or A host on the web is identified by a fully qualified domain name or FQDN that specifies its exact location in the tree hierarchy of the Domain Name System. The FQDN consists of the following parts: hostname and domain name.  As an example, in case of the host, its hostname is blog and the host is located within the domain
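The hostname/domain split described above can be sketched in a few lines. This is only an illustration: the function name `split_fqdn`, the example host `blog.example.org`, and the rule that the domain is the last two labels are all assumptions made here. Real registrable domains (for example under `.co.uk`) need something like the Public Suffix List.

```python
def split_fqdn(fqdn: str, domain_labels: int = 2) -> tuple[str, str]:
    """Split an FQDN into (hostname, domain name).

    Assumption: the domain is the last `domain_labels` labels. This is a
    simplification and does not hold for every public suffix.
    """
    # Normalize case and drop the optional trailing root dot.
    labels = fqdn.rstrip(".").lower().split(".")
    if len(labels) <= domain_labels:
        # The name is a bare domain with no separate hostname part.
        return "", ".".join(labels)
    hostname = ".".join(labels[:-domain_labels])
    domain = ".".join(labels[-domain_labels:])
    return hostname, domain


print(split_fqdn("blog.example.org"))  # ('blog', 'example.org')
```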

We define a website to be a host that has served webpages and has at least one incoming link from a webpage belonging to a different domain.
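As a rough illustration of how the two definitions fit together, here is a minimal sketch. It is not the Archive's actual pipeline. The `Capture` record, the treatment of HTTP status 200 as "valid", the MIME type strings, and the two-label `domain_of` shortcut are all assumptions made for the example.

```python
from dataclasses import dataclass

# MIME types standing in for "HTML document, plain text document, or PDF".
WEBPAGE_TYPES = {"text/html", "text/plain", "application/pdf"}


@dataclass(frozen=True)
class Capture:
    host: str        # fully qualified domain name that served the capture
    mime_type: str
    status: int      # HTTP status code recorded at capture time


def domain_of(host: str) -> str:
    # Simplification: treat the last two labels as the domain.
    return ".".join(host.lower().split(".")[-2:])


def is_webpage(capture: Capture) -> bool:
    # Assumption: a "valid" capture is one that returned HTTP 200.
    return capture.status == 200 and capture.mime_type in WEBPAGE_TYPES


def find_websites(captures, links):
    """Return hosts that count as websites under the definition above.

    `links` is an iterable of (source_host, target_host) pairs, assumed to
    have been extracted from webpages.
    """
    served_pages = {c.host for c in captures if is_webpage(c)}
    linked_from_other_domain = {
        dst for src, dst in links if domain_of(src) != domain_of(dst)
    }
    return served_pages & linked_from_other_domain
```

A host that has served only images, or whose only incoming links come from its own domain, is left out of the result.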

As of today, the Internet Archive officially holds 273 billion webpages from over 361 million websites, taking up 15 petabytes of storage.

Posted in Announcements, News, Web Archive | 4 Comments

Authors Alliance and Internet Archive Team Up to Make Books Available

by Michael Wolfe, Executive Director, Authors Alliance

To write a book takes time, effort, and, more often than not, love. Happily, books are built to last, and with the proper stewardship remain relevant, provide insight and information, or entertain for generations. So why is it that, when the internet provides more avenues than ever for making work accessible, the vast majority of books written in the last 100 years are out of print and largely unavailable?

Authors Alliance has been working with its members to help recover their unavailable books and give them another public life. Since the release of our guide to Understanding Rights Reversion in 2015, we have provided information, assistance, and know-how to authors on the topic of recovering rights in order to bring back works that have fallen out of view. While many authors choose to make these recovered titles available commercially, a growing contingent has instead committed to ensuring their works endure in the public eye by making them available under Creative Commons licenses or dedicating them to the public domain. Many of our members’ titles are already discoverable through the HathiTrust digital library, and we are now partnering with the Internet Archive to make these works available in full on our new Authors Alliance Collection Page.

Authors Alliance members Robert Darnton, Joseph Nye, and Thomas Leonard are just some of the authors whose books are now freely available in full-text digital versions under Creative Commons licenses. Join them to rescue your previously published work from obscurity, safeguard your intellectual legacy, and help us build a robust Internet Archive collection. If you have regained rights to your previously published book(s) and would like to feature them in the Internet Archive and on Open Library, this guide to sharing your work is a good place to start. If you have any trouble, contact us! We can help take care of the details and will even handle the scanning and ingest of pre-digital works. And, if you have a backlist but haven’t yet begun the process of regaining rights, we can help with that too. Check out our guide to Understanding Rights Reversion and our guide to crafting a reversion letter to get started. You can always reach out to us directly to help get you on track to unlock your books, regain your rights, and give your work new life online. Contact us to get started, and help us build the Authors Alliance collection page in the Internet Archive!

Posted in Announcements, News | 4 Comments

Dewey Defeats Truman, Pence Defeats Kaine



Physicist Niels Bohr may or may not have been thinking about The Chicago Daily Tribune’s famously erroneous 1948 “Dewey Defeats Truman” headline when he wrote, “Prediction is very difficult, especially about the future.” In a case of déjà vu all over again, history repeated itself last week.


No one expects a political party to provide an objective analysis of a debate, so the Republican party’s verdict on the Vice Presidential debate came as no surprise:

“Americans from all across the country tuned in to watch the one and only Vice Presidential debate. During the debate we helped fact check and monitor the conversation in real time @GOP. The consensus was clear after the dust settled, Mike Pence was the clear winner of the debate.”

What did raise more than a few eyebrows was the timing: the party announced the “results” of the debate over two hours before the debate actually began. Although it’s not unusual for party officials to prepare article outlines in advance, it is unusual to publish them before the event has taken place. A staffer noticed the mistake and took down the pages touting Pence’s accomplishment, but not before the Internet Archive’s Wayback Machine preserved a web capture of the site. It’s a fine example of how our archives serve as timely and unedited historical records.

Posted in Announcements, News | 3 Comments

Access to Knowledge in Canada

The Internet Archive Canada asked Lila Bailey to report on the policy landscape for digital libraries in Canada. This is a summary of her report: Looking good.

On September 30th, the Canadian National Institute for the Blind transferred accessible books in audio format to Australia through the book service of the Accessible Books Consortium (ABC). This transfer occurred without the legal obligation to request permission from the copyright owners. This effort was made possible by the Marrakesh Treaty, which creates exceptions in copyright law for the print-disabled. As we previously noted, Canada was the 20th signatory to the treaty, triggering it to enter into force.

Canada has made great strides towards increasing access to human knowledge in recent years. Judicial and legislative developments have brought balance into the law, ushering in more opportunities for public access and use of copyright protected works. And now, with the Marrakesh Treaty entering into effect, it seems a good time to highlight Canada’s contributions to the world’s accessible digital heritage.

Our sister organization, Internet Archive Canada, has digitized more than 530,000 books, microreproductions, archival fonds, and maps. Libraries and institutions that have collaborated with, financially supported, and contributed material to IAC stretch across the entire country, from Memorial University in Newfoundland to University of Victoria in British Columbia. Internet Archive Canada has been working on accessibility projects, and has digitized more than 10,000 texts in partnership with the Accessible Content E-Portal. To date, this material has only been available to students and scholars within Ontario’s university system. Joining the Marrakesh Treaty now makes it possible for accessible versions of works to be shared more broadly within Canada, and with the other countries that have ratified the treaty.

Canadiana is another group that has helped to advance access to knowledge in Canada. Initially created by Canadian universities in 1978 to microform National Library collections, Canadiana has more recently worked to digitize Canadian heritage with a focus mainly on public domain printed materials. The University of Toronto Library has also developed full-text digital collections, primarily consisting of public domain materials. These special collections contain a wide variety of items, including over 200,000 books, over 600 archived versions of local government websites, Canadian pamphlets and broadsides, and a fine art repository, among many other materials. Similarly, the University of Alberta has developed an open access digital portal called Peel’s Prairie Provinces – a collection containing both an online bibliography of books, pamphlets and other materials related to the settlement and development of the Canadian West, as well as a searchable full-text collection of digital versions of many of these materials. The portal allows access to a diverse collection that includes approximately 7,500 digitized books, over 66,000 newspaper issues, 16,000 postcards and 1,000 maps.

The above are just a few examples of Canadian efforts to bring analog materials into digital form to allow increased access to knowledge. Many more such projects can be found via the Canadian National Digital Heritage Index (CNDHI). Supported by funding from Library and Archives Canada and the Canadian Research Knowledge Network, CNDHI is designed to increase awareness of, and access to digital heritage collections in Canada, to support the academic research enterprise and to facilitate information sharing within the Canadian documentary heritage community.

These digitization activities have made significant strides towards opening access to human knowledge in Canada; to date, however, these efforts have been piecemeal. In June of 2016, Library and Archives Canada (LAC) announced a National Heritage Digitization Strategy in order “to bring Canada’s cultural and scientific heritage into the digital era to ensure that we continue to understand the past and document the present as guides to future action.” The goal of the strategy is to provide a cohesive path toward the digitization of Canadian memory institutions’ collections, thus ensuring the institutions remain relevant in the digital age by making their collections easily accessible. LAC wishes to complement the current efforts of Canadian memory institutions such as those described above by ensuring that a national plan of action is in place.

The public policy landscape in Canada has been generally supportive of access to knowledge efforts. For example, the Canadian Supreme Court has interpreted certain legal provisions, called “fair dealing,” as expansive user rights that cannot be unduly constrained. In a case called CCH Canadian Ltd. v. Law Society of Upper Canada, the Court held that it was fair dealing for the Great Library of the Law Society of Upper Canada to make photocopies of court decisions on behalf of attorneys. In Alberta v. Access Copyright, the Supreme Court held that it is fair dealing for teachers to copy short excerpts of copyrighted works for students in their classes. The Court found that such copying was done for the acceptable purpose of research and private study because, as a user right, the relevant perspective from which to consider the purpose was the user/student whose research and private study was furthered by the teacher’s copying. The Court also held that the “amount of the dealing” factor should not be assessed in the aggregate. Instead, the court must look at the amount copied in proportion to the length of the whole work.

In SOCAN v. Bell Canada, the Supreme Court reaffirmed the principles articulated in the Access Copyright case. Here, the Court held that a commercial platform allowing users to stream 30-second preview clips of musical works before they decided whether to purchase the work was also considered fair dealing for the purpose of research. The Court reiterated that the purpose must be assessed from the perspective of the user and not the commercial entity that was trying to sell the music. In each of these cases, the Supreme Court of Canada acknowledged fair dealing as the exercise of users’ rights that must be broadly interpreted.

As a result of these decisions, many Canadian educational institutions developed reasonable fair dealing guidelines which provide educators with a set of criteria for determining whether a particular instance of copying requires permission, or whether it is protected by fair dealing. For example, the University of Toronto’s Fair Dealing Guidelines provide a step-by-step analysis of whether a given use of a copyright protected work may be fair dealing, as well as a few more specific guidelines about what constitutes fair dealing, allowing more uses of copyrighted works without permission.

Additionally, the Canadian legislature passed the Copyright Modernization Act (CMA). The CMA added several important user-oriented provisions, including the addition of education, parody, and satire as acceptable fair dealing purposes. Taken together with the recent Supreme Court decisions discussed above, Canadian law now allows quite a bit more flexibility in using copyrighted works without permission.

The CMA allows private individuals to do more with copyright protected works without legal liability. For example, the CMA created the so-called “YouTube exception” which allows for non-commercial sharing of user-generated content that contains copyrighted material. The provision is designed to permit activity that many ordinary Internet users engage in regularly, such as creating mashups, or using a popular song in the background of a personal home video. This provision is subject to conditions (i.e., identification of the source and author, legality of the original work or the copy used, and absence of a substantial adverse effect on the exploitation of the original work).

A series of additional provisions protect consumers from liability for other “ordinary activities that are commonly accepted,” but which had previously remained illegal under Canadian copyright law. For example, the CMA now permits format shifting of personal copies of works, such as transferring a song from CD to an MP3 player. Similarly, the CMA permits time shifting of copyrighted materials for later listening, reading or viewing. Finally, the law permits individuals to make backup copies of copyrighted works, provided that, among other things, the individual does not give any of the reproductions away to others. However, each of these expansions of user rights to permit format shifting, time shifting, and the creation of backup copies is subject to the condition that the creation of the reproduction not circumvent a “technological protection measure.” As such, they may not be as user-friendly in practice as they may appear on paper.

The CMA also expanded the use rights of libraries, museums, and archives. For example, the law now allows libraries, museums, and archives to format shift a work in its permanent collection if the original is in a format that is obsolete or the technology required to use the original is unavailable or is becoming unavailable. Further, libraries, museums, and archives can distribute certain materials digitally, provided that they take certain measures to protect the copyright owner’s rights. There is a similar allowance for unpublished works deposited in archives. The CMA also allows the use of publicly accessible online materials for educational purposes, provided that the source and author are attributed, and unless the works are protected by “digital locks.”

The CMA also revised the statutory damages provisions in a user-friendly manner. The law now distinguishes between commercial and non-commercial infringements for the purposes of statutory damages awards. Specifically, where the “infringements are for non-commercial purposes”, the court may order between $100 and $5,000 in damages “with respect to all infringements involved in the proceedings for all works.” In other words, statutory damages in a proceeding for non-commercial infringement are now limited to $5,000, no matter how many works were infringed. Furthermore, in exercising its discretion to award statutory damages for non-commercial infringements, the court is to consider “the need for an award [of damages] to be proportionate to the infringements, in consideration of the hardship the award may cause to the defendant, whether the infringement was for private purposes or not, and the impact of the infringements on the plaintiff.”

These recent developments in Canadian law, in conjunction with its ratification of the Marrakesh Treaty, make the landscape ripe for further expansions of digital access to knowledge in the future. Internet Archive Canada will be exploring opportunities for partnerships and projects to bring Canada digital and help the nation to become an international leader in access to knowledge.


Posted in News | 2 Comments

Oct 26th Event: Celebrating 20 Years of Archiving the Web


The Web dwells in a never-ending present. It is—elementally—ethereal, ephemeral, unstable, and unreliable.

–Jill Lepore, from “The Cobweb: Can the Internet be Archived?” in the New Yorker, January 26, 2015

For twenty years, here at the Internet Archive, we’ve been trying to capture lightning in a bottle. How do you archive the “ethereal, ephemeral, unstable and unreliable” Web? Since 1996, that has been part of our daily work. We crawl the Web, preserve it, and try to make it play back, as if you were back in 1999 on your own GeoCities page, delighting in that animated Under Construction GIF you just posted.

On October 26, 2016, we will be celebrating our 20th Anniversary, and we hope you will join us. We’ve been grappling with how to convey the enormity of our task. How do you visualize the universe of the Web—the audio, images, Web pages, and software that we’ve been archiving for the last 20 years? When you come to our celebration, we’ll be presenting the work of media innovators, each trying to capture the ephemeral Web:


One view from Cyberscape, Owen Cornec and Vinay Goel’s visualization of the top 800,000 Web sites

  • Cyberscape—Data visualization engineer, Owen Cornec and Internet Archive Data Scientist, Vinay Goel team up to create an interactive exploration of the top sites on the Web, as captured by the Wayback Machine as early as 1996.


  • Deleted Cities—Artist Richard Vijgen’s interactive visualization of GeoCities, once the Web’s largest online community. When Yahoo decided GeoCities was obsolete in 2009, the Internet Archive and Archive Team rushed to preserve tens of millions of GeoCities “homestead” pages before they were erased. Vijgen’s work takes you back to the neighborhoods and virtual cities where a vibrant society once lived online.

Paul D. Miller aka DJ Spooky will perform a newly commissioned piece on October 26.

  • DJ Spooky aka Paul D. Miller & media innovator, Greg Niemeyer join forces to create an audio and video composition, drawn completely from media preserved in the Internet Archive. DJ Spooky’s work ranges from producing 14 albums to the DVD anthology, “Pioneers of African American Cinema,” about which the New York Times wrote “there has never been a more significant video release.”
  • How Media & Messaging are Shaping the 2016 Election—journalist and former Managing Editor of the Sunlight Foundation, Kathy Kiely, explains how short snippets—of debates, political ads, cable news—are altering the Presidential landscape. This analysis is made possible in part by the Internet Archive’s Political Ad Archive, preserving key ads and debates and monitoring how they are used in swing states.


  • Defining Memes & Memories—perhaps the world’s only Free Range Archivist, Jason Scott, takes you on a wild ride through 20 years of memes that captured the global imagination.  From the original keyboard cat to Three Wolf Moon, Scott explores the Archive items and collections that rocked the world.

And to round out the evening, Internet Archive Founder Brewster Kahle will reflect upon his lifelong obsession: backing up the Web and making it more reliable and secure. Our work is just beginning, but if we are successful, new generations of learners will be able to access the amazing universe of the Web, learn from it, and build societies that are even better.

GET YOUR FREE TICKET TO “How to Build an Archive—20 Years in the Making.” Wednesday, October 26, 2016 from 5-9:30 p.m. at the Internet Archive, 300 Funston Avenue, San Francisco.


Posted in Announcements, News | 7 Comments

Internet Archive data fuels journalists’ analyses of how TV news shows covered prez debate

The presidential debate between Hillary Clinton and Donald Trump on September 26 drew an audience of 84 million, shattering records. It was also a first for the Internet Archive, which made data publicly available, for free, on how TV news shows covered the debate. These data were generated by the Duplitron, the open source tool used to generate counts of ad airings for the Political TV Ad Archive, which is also able to track coverage of specific video clips by TV news shows.

Download TV News Archive presidential debate data here.

Journalists took these data and crunched away, creating novel visualizations to help the public understand how TV news presented the debates.

The New York Times created a visual timeline of TV cable news coverage in the 24 hours following the presidential debate, with separate lines for CNN, MSNBC, and Fox News. Below the timeline were short explanations of the peaks and how the different networks varied in their presentations even when they all covered roughly the same ground. The project was the work of Jasmine C. Lee, Alicia Parlapiano, Adam Pearce, and Karen Yourish. For much of the day on Sept. 29, it was featured at the top of the New York Times website.


To see more visualizations created by journalists using TV News Archive data following the first presidential debate, visit the Political TV Ad Archive.

The Internet Archive will make similar data available on the upcoming vice presidential debate, as well as the remaining presidential debates. This effort is part of a collaboration with the Annenberg Public Policy Center to study how voters learn about candidates from debates.



Posted in Announcements, News | 2 Comments

Guest Post: Preserving Digital Music – Why Netlabel Archive Matters

The following entry is by Simon Carless, who worked for the Internet Archive in the early 2000s before moving on to work in media and conferences, while simultaneously maintaining collections at the Internet Archive and running the for-free game information site MobyGames.

It’s fascinating that early Internet-era (digital) data can sometimes be trickier to preserve & access than pre-Internet (analog) data. A prime example is the amazing work of the Netlabel Archive, which I wanted to both laud and highlight as ‘digital archiving done right’.

Created in 2016 by the amazing Zach Bridier, the Netlabel Archive has preserved the catalogs of 11 early ‘netlabels’ and counting, a number of which involve music that was either completely unavailable online, or difficult to listen to online. One of these netlabels is the one that I ran from 1996 to 2009, Mono/Monotonik. So obviously, I’m particularly delighted by that project. But a number of the other netlabels are also great and previously tricky to access, and I’m even more excited for those. (Reminder: all these netlabels freely distributed their music at the time, which makes it a great thing to archive and bring back.)

The nub of the problem around early netlabels – particularly from 1996 to 2003 – is that PCs & the Internet (& pre-Internet BBSes!) were just not fast enough, and did not have enough storage, to support MP3 downloads at that time.

So this early netlabel music – on PCs and even other computers like Commodore Amigas – was composed in smaller (in kB!) module files, which were composed and played on computers by using sample data and MIDI-style ‘note triggering’ with rudimentary real-time effects. This allows 5-minute long songs to be just 30kB-300kB in size, versus the 5MB or more that an MP3 takes.
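The size gap is easy to check with a bit of arithmetic. The sketch below assumes a constant-bitrate MP3 at 128 kbps, which is an assumption made here rather than a figure from the post, and compares it with the 30kB-300kB module sizes quoted above.

```python
def mp3_size_bytes(duration_s: float, bitrate_kbps: int = 128) -> float:
    """Approximate size of a constant-bitrate MP3 (headers and tags ignored)."""
    return duration_s * bitrate_kbps * 1000 / 8  # kilobits/s -> bytes


song_seconds = 5 * 60
mp3_bytes = mp3_size_bytes(song_seconds)
print(f"5-minute MP3 at 128 kbps: {mp3_bytes / 1e6:.1f} MB")

# Compare against the small and large ends of the module size range.
for module_kb in (30, 300):
    ratio = mp3_bytes / (module_kb * 1000)
    print(f"{module_kb} kB module is about {ratio:.0f}x smaller")
```

Under that assumption the MP3 comes to roughly 4.8 MB, so a module file is somewhere between about 16 and 160 times smaller.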

For the more recent history of netlabels, I founded the Netlabels collection at the Internet Archive back in 2003, and that’s grown to hold over 65,000 individual music releases – and hundreds of thousands of tracks – by 2016. But the Internet Archive’s collection was largely designed to hold MP3 and OGG files, and so the early .MODs, .XMs and .ITs were not always preserved as part of this collection – and they were certainly not listenable to in-browser.

Additionally, there were a number of netlabels that used their own storage instead of the Internet Archive’s, even after 2003. But if that storage disappeared, their data disappeared with it, and music files are generally large enough not to be archived by the saintly Wayback Machine.

So if early netlabel archives exist, it was as ZIP/LHA archives on or other relevant demoscene FTP sites. (Netlabels were spawned from the demoscene to some extent, since demo soundtracks use the same format of .MODs and .XMs.) And tracker music is annoyingly hard to play on today’s PCs and Macs – there are programs (such as VLC & more specialist apps) which do it, but it’s not remotely mainstream & not web browser-streamable.

So what Zach has done is keep the original .ZIP/.LHA files, which often had additional ASCII art & release info in them, save the .MODs and .XMs, convert everything to .MP3, painstakingly catalog all of the releases, and then upload the entire caboodle (both original and converted files) to both the Internet Archive and additionally to YouTube, where there are gigantic playlists for each label. So there’s now multiple opportunities for in-browser listening & the original files are also properly preserved.
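The cataloguing step of a workflow like this could be sketched as follows. This is not Zach’s actual tooling, just a minimal illustration using Python’s standard `zipfile` module. The function name `catalog_release` and the extension list are assumptions. It reads an archive without modifying it and separates the module files from the extras (ASCII art, release info). LHA archives and the MP3 conversion itself are outside what the standard library offers and are not covered here.

```python
import zipfile
from pathlib import Path

# Module formats named in the post; other tracker formats exist.
MODULE_EXTS = {".mod", ".xm", ".it"}


def catalog_release(zip_path):
    """Describe one release archive, leaving the original file untouched."""
    modules, extras = [], []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip directory entries
            suffix = Path(info.filename).suffix.lower()
            if suffix in MODULE_EXTS:
                modules.append(info.filename)
            else:
                extras.append(info.filename)  # e.g. .nfo or .diz text files
    return {
        "release": Path(zip_path).stem,
        "modules": sorted(modules),
        "extras": sorted(extras),
    }
```

Calling `catalog_release` on each archive in a directory would give a list of records that can be written out as a simple release index alongside the preserved originals.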

This means we can now all easily browse and listen to the complete catalog of Five Musicians, a seminal early global PC tracker group/netlabel, as well as the super-neat Finnish electronic music netlabel Milk, the aggressive chiptune/noise label mp3death, and a host of others. And I recently uploaded a rare FTP backup from 1998 which allowed him to put up the 10 releases (that we know about!) from funky electronic netlabel Cutoff. These may have been partially online in databases like Modland, but certainly weren’t this accessible, complete, or well-collected.

What’s somewhat crazy about this is that we’re not even talking about ancient history here – at most, these digital files are 20 years old. And they’re already becoming difficult to access, listen to, or in a few cases even find.

For example, I had to dig deep into backup CD-ROMs to find some of the secret bootleg No’Mo releases that we deliberately _didn’t_ put on the Mono website back in 1996 – opting to distribute them via BBSes instead. These files literally didn’t exist on the Internet any more, despite being small and digital-native.

I think that’s – hopefully – the exception rather than the rule. But without diligent work by Zach (much kudos to him!) & similar work by other citizen digital activists like the 4am Apple II archiver, Jason Scott (obviously!) and a host of others, we’d have issues. And we may need more help still – some of these digital-first materials may disappear permanently, as the CD-ROMs or other media they are on become unreadable.

But we’re still doing a PRETTY good job on preservation, especially with CD-ROMs being ingested in massive amounts onto the Internet Archive regularly. (I’m working with MobyGames & another to-be-announced organization on preserving video game press CD-ROMs on, for example, and Jason Scott’s CD-ROM work is many magnitudes larger than mine.)

Yet I actually think contextualization and access to these materials is just as big a problem, if not bigger. Once we’ve got this raw data, who’s available to look through it, pick out the relevant stuff, and make it easily viewable or streamable to anyone who wants to see it? That’s why the game art/screenshots on those press CD-ROMs is also being extracted and uploaded to MobyGames for easy Google Images access, and why Netlabel Archive’s work to put streamable versions of the music on and YouTube is so vital. (And why playable-in-browser emulation work is SO very important!)

In the end, you can preserve as much data as you want, but if nobody can find it or understand it, well – it’s not for naught, but it’s also not the reason you went to all the trouble of archiving it in the first place. And the fact the Netlabel Archive does both – the preserving AND the accessibility – makes it a gem worth celebrating. Thanks again for all your work, Zach.


Posted in News | 1 Comment

Persistent URL Service Now Run by the Internet Archive


OCLC and the Internet Archive today announced the results of a year-long cooperation to ensure the future of The organizations have worked together to build a new service hosted by the Internet Archive that will manage the persistent URLs and sub-domain redirections for, and

Since its introduction by OCLC Research in 1995, has provided a source of Persistent URLs (PURLs) that redirect users to the correct hosting location for documents, data, and websites as they change over time.
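The idea behind a persistent URL can be shown with a small lookup table. The sketch below is purely illustrative and is not the service’s implementation. The class name `PurlTable`, the example paths and hosts, and the choice of a 302 status are all assumptions. The point is that the published path never changes, while the owner can repoint it whenever the document moves.

```python
class PurlTable:
    """Toy in-memory mapping from a stable path to its current location."""

    def __init__(self):
        self._targets = {}

    def set_target(self, purl_path, target_url):
        # Owners call this to create a PURL or to repoint an existing one.
        self._targets[purl_path] = target_url

    def resolve(self, purl_path):
        """Return (status_code, location) as an HTTP redirect would."""
        target = self._targets.get(purl_path)
        if target is None:
            return 404, None
        return 302, target


table = PurlTable()
table.set_target("/docs/spec", "https://old-host.example/spec.html")
print(table.resolve("/docs/spec"))

# The document moves; the persistent path stays the same for every citer.
table.set_target("/docs/spec", "https://new-host.example/standards/spec")
print(table.resolve("/docs/spec"))
```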

With more than 2,500 users including publishing and metadata organizations such as Dublin Core, has become important to the smooth functioning of the Web, data on the Web, and the Semantic Web in particular.

Brewster Kahle of the Internet Archive said, “We share a common belief with OCLC that what is shared on the Web should be preserved, so it makes perfect sense for us to add this important service to our set of tools and services including the Wayback Machine as part of our mission to promote universal access to all knowledge.”

Lorcan Dempsey of OCLC welcomed the announcement as “a major step in the future sustainability and independence of this key part of the Web and linked data architectures. OCLC is proud to have introduced persistent URLs and in the early days of the Web and we have continued to host and support it for the last twenty years. We welcome the move of to the Internet Archive which will help them continue to archive and preserve the World’s knowledge as it evolves.”

All previous PURL definitions have been transferred to the Internet Archive and can continue to be maintained by their owners through a new web-based interface located here.

About OCLC:
OCLC is a nonprofit global library cooperative providing shared technology services, original research and community programs so that libraries can better fuel learning, research and innovation. Through OCLC, member libraries cooperatively produce and maintain WorldCat, the most comprehensive global network of data about library collections and services. Libraries gain efficiencies through OCLC’s WorldShare, a complete set of library management applications and services built on an open, cloud-based platform. It is through collaboration and sharing of the world’s collected knowledge that libraries can help people find answers they need to solve problems. Together as OCLC, member libraries, staff and partners make breakthroughs possible.

About Internet Archive:
The Internet Archive is a 501(c)(3) non-profit that was founded to build an Internet library, with the purpose of offering permanent access for researchers, historians, and scholars to historical collections that exist in digital format.

Posted in Announcements, News | 6 Comments

Tales from the TV News Archive presidential debate near real-time livestream

During last night’s presidential debate, the Internet Archive’s TV News Archive experimented with something new: a near real-time live stream of the first presidential debate. This online video stream is editable, embeddable, and shareable on social media. We were the only public library of the debate capturing these clips within minutes, while the candidates were still duking it out. The debate is preserved on the TV News Archive site for posterity. And when the vice presidential candidates, Tim Kaine and Mike Pence, meet for their debate on October 4, the TV News Archive will be making this live stream available to journalists and the general public.

During the debate, we matched up TV debate video with fact checks from our Political TV Ad Archive partners, including PolitiFact. Here are some representative tweets and links from last night’s debate:

Minute 15: Hillary Clinton said, “Donald thinks that climate change is a hoax perpetrated by the Chinese.” “I do not say that,” said Trump. “Mostly True,” read the fact check posted by PolitiFact reporters. Jessica Clark, founder of Dot Connector Studio and a consultant to the TV News Archive, was able to link the two here:

Minute 20: Donald Trump said, “I was against the war in Iraq.” One of our fact-checking partners posted a timeline of Trump’s statements about the Iraq war, pointing out that Trump had voiced support for the war in 2002 in an interview with “shock jock” Howard Stern. I tweeted that here:

Minute 36: Donald Trump said, “You learn a lot from financial disclosures” as opposed to tax returns. “False,” posted PolitiFact, “Trump has not released his tax returns, which experts say would offer valuable details on his effective tax rate, the types of taxes he paid, and how much he gave to charity, as well as a more detailed picture of his income-producing assets.” This sort of information is not included on financial disclosure forms. I linked to the fact check in this tweet:


Minute 44: Hillary Clinton said: “The gun epidemic is the leading cause of death of young African American men, more than the next nine causes put together.” “True,” posted PolitiFact. Roger Macdonald, TV News Archive director, tweeted the following link to the TV debate clip, along with the fact check.

Overall, fact checking was a crucial part of last night’s debates, as Clark noted:

The near real-time live stream experiment was part of our collaboration around the debates with the Annenberg Public Policy Center, to bring context to the 2016 presidential debates. Stay tuned: today we are drilling down on how TV news is covering the debates. Which video clips are they picking up from the debates in post-debate analyses? We’ll be making that information available to the public, as well as to academic researchers at the Annenberg Public Policy Center for integration into their post-debate surveys.

Posted in News | Comments Off on Tales from the TV News Archive presidential debate near real-time livestream

The Internet Archive Turns 20!

For 20 years, the Internet Archive has been capturing the Web, that amazing universe of images, audio, text and software that forms our shared digital culture. Now it’s time to celebrate and we’re throwing a party! Please join us for our 20th Anniversary celebration on Wednesday, October 26th, 2016, from 5-9:30 pm.


Get your free tickets here.

We’ll kick off the evening with cocktails, taco trucks and hands-on demos of our coolest tools. Come scan a book, play in a virtual reality arcade, or try out the brand new search feature in the Wayback Machine. When you arrive, be sure to get your library card. “Check out” all the stations on your card and we’ll reward you with a special gift commemorating our 20th anniversary.


Starting at 7 p.m., we’ve commissioned Paul D. Miller aka DJ Spooky — composer, author and multimedia artist — to create a short musical montage drawn from the Internet Archive’s audio collections. We’ll look back on some of the defining digital moments of the past 20 years, and explore how media and messaging captured in our Political TV Ad Archive is impacting the 2016 Election.

And to keep you dancing into the evening, DJ Phast Phreddie the Boogaloo Omnibus will be spinning 45rpm records from 8-9:30. We hope you can join our celebration!

Event Info:

Wednesday, October 26th
5pm: Cocktails, tacos, and hands-on demos
7pm: Program
8pm: Dessert, Dancing and more Demo stations

Location:  Internet Archive, 300 Funston Avenue, San Francisco

Be sure to reserve your ticket today!


Posted in Announcements, News | Comments Off on The Internet Archive Turns 20!

Dear Congress: Please Don’t Make It More Difficult And Dangerous To Be A Library

Last Friday, the Internet Archive and several of our library, archive, and museum partners sent a letter to House Judiciary Committee Chairman Bob Goodlatte (R-VA) urging him not to make it more difficult and dangerous to be a library.

As we wrote about over the summer, the U.S. Copyright Office is proposing to completely rewrite Section 108, the part of the law that is designed to support traditional library functions such as preservation and inter-library loans. Although the proposal has not been made public yet, we understand from our meeting with them that the Copyright Office wants to redefine who gets to be a library, making it harder for small players and virtual libraries to be protected under the law. The proposal is also likely to be damaging to fair use and may add new, burdensome regulations on libraries who archive the web (among other things).

Thankfully, the Copyright Office does not write the law–that is up to Congress. Our letter explains that now is not the time to scrap the old law, which is working well. The Copyright Office’s proposal is not only unnecessary, but potentially harmful to library efforts to increase access to information. We hope Congress will take the strong objections of the library community seriously when considering the Copyright Office’s proposal to rewrite the law that applies to libraries.

Posted in News | 3 Comments

SAVE THE DATE — The Internet Archive Turns 20!

Our 20th anniversary is coming up and we’re throwing a party! Please save the date and join us for our annual celebration on Wednesday, October 26th, 2016.

We’ll kick off the evening with cocktails, tacos and hands-on demo stations. Come scan a book, play in a virtual reality arcade, search billions of Web pages in our Wayback Machine and so much more! Then check out the interactive new media projects by talented artists working with our collections.

Starting at 7 p.m., Paul D. Miller aka DJ Spooky — composer, author, teacher, electronics DJ and multi-media artist — will perform a short, original musical retrospective of the Internet Archive’s audio collections. We’ll look back on some of the defining moments of the past 20 years, and explore how media and messaging is impacting the 2016 Election.

And to keep you dancing into the evening, DJ Phast Phreddie the Boogaloo Omnibus will be spinning 45rpm records from 8-9:30. We hope you can join our celebration!
Event Info:

Wednesday, October 26th
5pm: Cocktails, tacos, and hands-on demos
7pm: Program
8pm: Dessert and Dancing

Location: Internet Archive, 300 Funston Avenue, San Francisco, CA 94118

Posted in Event | 5 Comments

Rock Against the TPP is Coming to San Francisco…TOMORROW!

On Friday, September 9th, hip hop icons Dead Prez, actress Evangeline Lilly, punk legend Jello Biafra, Grammy winners La Santa Cecilia, and others will play a free concert at the Regency Ballroom in San Francisco to protest the Trans-Pacific Partnership (TPP).

The TPP is a contentious trade agreement that is getting quite a bit of negative press in the 2016 U.S. election cycle. Among many other issues, the TPP would govern how signatory countries protect and enforce intellectual property rights. The TPP could have a large negative impact on libraries by increasing copyright term limits and neglecting the essential limitations on copyright law that libraries around the world rely on. Many different groups have vocally opposed the TPP, both for its substance and for the secrecy of the negotiations process.

Organized by Fight for the Future and Rage Against the Machine guitarist Tom Morello, the Rock Against the TPP tour is designed to pull new audiences into the fight against the TPP. More details and a full lineup are available on the tour’s website.

The concert will be followed by a teach-in on “How to Fight the TPP” on Saturday, Sept. 10th from 1pm – 3pm at 1999 Bryant Street, hosted by experts from a wide range of organizations opposing the TPP.

Posted in Event, Music, News | 2 Comments

Saving the 78s

Written by B. George, the Director of ARChive of Contemporary Music in NYC, and Curator of Sound Collections at the Internet Archive in San Francisco.

While audio CDs whiz by at about 500 revolutions per minute, the earliest flat disks offering music whirled at 78rpm. They were mostly made from shellac, i.e., beetle (the bug, not The Beatles) resin and were the brittle predecessors to the LP (microgroove) era. The format is obsolete, the surface noise is often unbearable, and just picking them up can break your heart as they break apart in your hands. So why does the Internet Archive have more than 200,000 in our physical possession?

A little over a year ago New York’s ARChive of Contemporary Music (ARC) partnered with the Internet Archive to focus on preserving and digitizing audio-visual materials. ARC is the largest independent collection of popular music in the world. When we began in 1985 our mandate was microgroove recordings – meaning vinyl – LPs and forty-fives. CDs were pretty much rumors then, and we thought that other major institutions were doing a swell job of collecting earlier formats, mainly 78rpm discs. But donations and major research projects like making scans for The Grammy Museum and The Ertegun Jazz Hall of Fame placed about 12,000 78s in our collection.

For years we had been getting calls offering 78 collections that we were unable to accept. But when space and shipping became available through the Internet Archive, it was now possible to begin preserving 78s. Here’s a short history of how in only a few years ARC and the Internet Archive have created one of the largest collections in America.

Our first major donation came from the Batavia Public Library in Illinois, part of the Barrie H. Thorp Collection of 48,000 78s.

We’re always a tad suspicious of large collections like these. First thought is, “Must be junk.” Secondly, “It’s been cherrypicked.” But the Thorp Collection was screened by former ARC Board member Tom Cvikota, who found the donor, helped negotiate the gift and stored it. That was in 2007. Between then and our 2015 pickup Tom arranged for some of the recordings to be part of an exhibition at the Greengrassi Gallery, London, (UK, Mar-Apr, 2014) by artist Allen Ruppersberg, titled, For Collectors Only (Everyone is a Collector).

What makes the Thorp collection unique is the obsessive typewritten card catalog featured in a short film hosted on the exhibition’s webpage. Understanding why you collect and how you give your interests meaning is a part of Allen’s work – artworks that focus on the collector’s mentality. One nice quote by Allen referenced in Greil Marcus’ book, The History of Rock n’ Roll in Ten Songs is, “In some cases, if you live long enough, you begin to see the endings of things in which you saw the beginnings.”

Philosophical musings aside, there are 48,000 discs to deal with. That meant taking poorly packed boxes — many of them open for 20 years — and re-boxing them for proper storage. The picture below shows an example of how they arrived (on the right), and how they were palletized (on the left.)

The trick to repacking in a timely fashion is to not look at the records. It’s a trick that is never performed successfully. Handling fragile 78s requires grabbing one or just a few at a time. So we’re endlessly reading the labels, sleeving and resleeving, all the time checking for rarities, breakage and dirt.

Now we didn’t do all this work on our own. Working another part of the warehouse was two-and-a-half month old Zinnia Dupler — the youngest volunteer ever to give us a hand. Mom also helped a bit.


A few minutes after the snap I found this gem in the Thorp collection. Coincidence? I don’t think so…

“Burpin” is a country novelty tune from out of Texas by Austin broadcaster and humorist Richard “Cactus” Pryor (1923 – 2011). It came from a box jam-packed with country and hillbilly discs. This was a pleasant surprise, as we expected the collection to be like most we encounter – big band and bland pop. But here was box-after-box of hillbilly, country, and Western swing records. Now, I use’ta think I knew a bit about music. But with this collection, it was back to school for me. Just so many artists I’ve never heard of or held a record by. As we did a bit of sorting, in the ‘G’s alone there’s Curly Gribbs, Lonnie Glosson and the Georgians. Geeez! Did you know that Hank Snow had a recordin’ kid, Jimmy, and he cut “Rocky Mountain Boogie” on 4 Star records, or that Cass Daley, star of stage and screen, was the “Queen of Musical Mayhem”? Me neither. The Davis Sisters, turns out, included a young Skeeter Davis(!) and are not to be confused with the Davis Sisters gospel group, also in this collection. Then there’s them Koen Kobblers, Bill Mooney and his Cactus Twisters, and Ozie Waters and the Colorado Hillbillies. No matter they should be named the Colorado Mountaineers, they’re new to me.

For us this donation is a dream: it allows us to preserve material that was otherwise going to be thrown away; it has a larger cultural value beyond the music; and it contained a mountain of unfamiliar music, much of it quite rare. And most of it is not available online.

It was a second large donation that prompted the Internet Archive to move toward the idea that we should digitize all of our 78s. The Joe Terino Collection came to us through a cold call, the collection professionally appraised at $500,000. The 70,000 plus 78s were stored in a warehouse for more than 40 years, originally deposited by a distributor. Here’s the kicker: they said that we could have it all, but we had to move it – NOW! Internet Archive did and it came in on 72 pallets, in three semis, from Rhode Island to San Francisco, looking like this…

So Fred Patterson and the crackerjack staff out in our Richmond warehouses (Marc Wendt, Mark Graves, Sean Fagan, Lotu Tii, Tracey Gutierrez, Kelly Ransom, and Matthew Soper) pulled everything off the ramshackle pallets and carefully reboxed this valuable material.


How valuable? Well, we’re really not so sure yet, despite the appraisal, as just receiving and reboxing was such a chore. One hint is this sweet blues 78 that we managed to skim off the top of a pile.


The next step is curating this material, acquiring more collections and moving towards preservation through digitization. Already we have a pilot project in the works with master preservationist George Blood to develop workflow and best digitization practices.

We’re doing all this because there’s just no way to predict if the digital will outlast the physical, so preserving both will ensure the survival of cultural materials for future generations to study and enjoy. And, it’s fun.


Posted in Announcements, Audio Archive, Music | 8 Comments

Hacking Web Archives

The awkward teenage years of the web archive are over. It is now 27 years since Tim Berners-Lee created the web and 20 years since we at Internet Archive set out to systematically archive web content. As the web gains ever more “historicity” (i.e., it’s old and getting older, just like you!), it is increasingly recognized as a valuable historical record of interest to researchers and others working to study it at scale.

Thus, it has been exciting to see, and for us to support and participate in, a number of recent efforts in the scholarly and library/archives communities to hold hackathons and datathons focused on getting web archives into the hands of researchers and users. The events have served to help build a collaborative framework to encourage more use, more exploration, more tools and services, and more hacking (and similar levels of the sometimes-maligned-but-ever-valuable yacking) to support research use of web archives. Get the data to the people!

First, in May, in partnership with the Alexandria Project of L3S at University of Hannover in Germany, we helped sponsor “Exploring the Past of the Web: Alexandria & Archive-It Hackathon” alongside the Web Science 2016 conference. Over 15 researchers came together to analyze almost two dozen subject-based web archives created by institutions using our Archive-It service. Universities, archives, museums, and others contributed web archive collections on topics ranging from the Occupy Movement to Human Rights to Contemporary Women Artists on the Web. Hackathon teams geo-located IP addresses, analyzed sentiments and entities in webpage text, and studied mime type distributions.
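For readers curious what a MIME type distribution study might look like in practice, here is a minimal sketch. It does not reproduce the hackathon teams' tools; the record shape (a list of URL and MIME type pairs) and the sample values are assumptions made purely for illustration.

```python
from collections import Counter


def mime_distribution(records):
    """Return each MIME type's share of the captures, most common first.

    `records` is an iterable of (url, mime_type) pairs (a hypothetical shape).
    """
    counts = Counter(mime for _url, mime in records)
    total = sum(counts.values())
    return {mime: n / total for mime, n in counts.most_common()}


# Illustrative capture records, not real collection data.
sample_records = [
    ("http://example.org/", "text/html"),
    ("http://example.org/about", "text/html"),
    ("http://example.org/logo.gif", "image/gif"),
    ("http://example.org/report.pdf", "application/pdf"),
]

for mime, share in mime_distribution(sample_records).items():
    print(f"{mime}: {share:.0%}")
```

Comparing these shares across collections or across crawl years is one simple way to see how the makeup of archived content shifts over time.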

Similarly, in June, our friends at Library of Congress hosted the second Archives Unleashed datathon, a follow-on to a previous event held at University of Toronto in March 2016. The fantastic team organizing these two Archives Unleashed hackathons has created an excellent model for bringing together transdisciplinary researchers and librarians/archivists to foster work with web data. In both Archives Unleashed events, attendees formed into self-selecting teams to work together on specific analytical approaches and with specific web archive collections and datasets provided by Library of Congress, Internet Archive, University of Toronto, GWU’s Social Feed Manager, and others. The #hackarchives tweet stream gives some insight into the hacktivities, and the top projects were presented at the Save The Web symposium held at LC’s Kluge Center the day after the event.

Both events show a bright future for expanding new access models, scholarship, and collaborations around building and using web archives. Plus, nobody crashed the wi-fi at any of these events! Yay!

Special thanks go to Altiscale (and Start Smart Labs) and ComputeCanada for providing cluster computing services to support these events. Thanks also go to the multiple funding agencies, including NSF and SSHRC, that provided funding, and to the many co-sponsoring and hosting institutions. Super special thanks go to key organizers, Helge Holzman and Avishek Anand at L3S and Matt Weber, Ian Milligan, and Jimmy Lin at Archives Unleashed, who made these events a rollicking success.

For those interested in participating in a web archives hackathon/datathon, more are in the works, so stay tuned to the usual social media channels. If you are interested in helping host an event, please let us know. Lastly, for those that can’t make an event, but are interested in working with web archives data, check out our Archives Research Services Workshop.


Here’s to more happy web archives hacking in the future!

Posted in Archive-It, News | 3 Comments

The Hidden Shifting Lens of Browsers


Some time ago, I wrote about the interesting situation we had with emulation and Version 51 of the Chrome browser – that is, our emulations stopped working in a very strange way and many people came to the Archive’s inboxes asking what had broken. The resulting fix took a lot of effort and collaboration with groups and volunteers to track down, but it was successful and ever since, every version of Chrome has worked as expected.

But besides the interesting situation with this bug (it actually made us perfectly emulate a broken machine!), it also brought into a very sharp focus the hidden, fundamental aspect of Browsers that can easily be forgotten: Each browser is an opinion, a lens of design and construction that allows its user a very specific facet of how to address the Internet and the Web. And these lenses are something that can shift and turn on a dime, and change the nature of this online world in doing so.

An eternal debate rages on what the Web is “for” and how the Internet should function in providing information and connectivity. For the now-quite-embedded millions of users around the world who have only known a world with this Internet and WWW-provided landscape, the nature of existence centers around the interconnected world we have, and the browsers that we use to communicate with it.


Avoiding too much of a history lesson at this point, let’s instead just say that when browsers entered the landscape of computer usage in a big way circa 1995, after being one of several resource-intensive experimental programs, the effect on computing experience and acceptance was unparalleled since the plastic-and-dreams home computer revolution of the 1980s. Suddenly, in one program came basically all the functions of what a computer might possibly do for an end user, all of it linked and described and seemingly infinite. The more technically-oriented among us can point out the gaps in the dream and the real-world efforts behind the scenes to make things do what they promised, of course. But the fundamental message was: Get a Browser, Get the Universe. Throughout the late 1990s, access came in the form of mailed CD-ROMs, or built-in packaging, or Internet Service Providers sending along the details on how to get your machine connected, and get that browser up and running.

As I’ve hinted at, though, this shellac of a browser interface was the rectangular window to a very deep, almost Brazil-like series of ad-hoc infrastructure, clumsily-cobbled standards and almost-standards, and ever-shifting priorities in what this whole “WWW” experience could even possibly be. It’s absolutely great, but it’s also been absolutely arbitrary.

With web anniversaries aplenty now coming into the news, it’ll be very easy to forget how utterly arbitrary a lot of what we think the “Web” is, happens to be.

There’s no question that commercial interests have driven a lot of browser features – the ability to transact financially and to guarantee the prices or offers you are being shown is of primary interest to vendors. Encryption, password protection, multi-factor authentication and so on are sometimes given lip service for private communications, but they’ve historically been presented for the store to ensure the cash register works. From the early days of a small padlock icon being shown locked or unlocked to indicate “safe”, to official “badges” or “certifications” being part of a webpage, the browsers have frequently shifted their character to promise commercial continuity. (The addition of “black box” code to browsers to satisfy the ability to stream entertainment is a subject for another time.)

Flowing from this same thinking has been the overriding need for design control, where the visual or interactive aspects of webpages are the same for everyone, no matter what browser they happen to be using. Since this was fundamentally impossible in the early days (different browsers have different “looks” no matter what), the solutions became more and more involved:

  • Use very large image-based mapping to control every visual aspect
  • Add a variety of specific binary “plugins” or “runtimes” by third parties
  • Insist on adoption of a number of extra-web standards to control the look/action
  • Demand all users use the same browser to access the site

Evidence of all these methods pops up across the years, with varying success.

Some of the more well-adopted methods include the Flash runtime for visuals and interactivity, and the use of Java plugins for running programs within the confines of the browser’s rectangle. Others, such as the wide use of Rich Text Format (.RTF) for reading documents, or the RealAudio/video plugins, gained followers or critics along the way, and ultimately faded into obscurity.

And as for demanding all users use the same browser… well, that still happens, but not with the same panache as the old Netscape Now! buttons.


This puts the Internet Archive into a very interesting position.

With 20 years of the World Wide Web saved in the Wayback Machine, and URLs by the billions, we’ve seen the moving targets move, and how fast they move. Where a site previously might be a simple set of documents and instructions that could be arranged however one might like, there is now a whole family of sites with much more complicated inner workings than any external party can capture; archiving them from outside is like capturing a museum by photographing its paintings through a window from the courtyard.

When you visit the Wayback and pull up that old site and find things look different, or are rendered oddly, that’s a lot of what’s going on: weird internal requirements, experimental programming, or tricks and traps that only worked in one brand of browser and one version of that browser from 1998. The lens shifted; the mirror has cracked since then.

This is a lot of philosophy and stray thoughts, but what am I bringing this up for?

The browsers that we use today, the Firefoxes and the Chromes and the Edges and the Braves and the mobile white-label affairs, are ever-shifting in their own right, more than ever before, and should be recognized as such.

It was inevitable that constant-update paradigms would become dominant on the Web: you start a program and it does something and suddenly you’re using version 54.01 instead of version 53.85. If you’re lucky, there might be a “changes” list, but that luck varies, because many simply write “bug fixes”. These updates close serious performance or security issues – and as someone who remembers the days when you might have to mail in for a floppy disk to be sent in a few weeks to make your program work, I can totally get behind the new “we fixed it before you knew it was broken” world we live in. Everything does this: phones, game consoles, laptops, even routers and medical equipment.

But along with this shifting of versions comes the occasional fundamental change in what browsers do, along with making some aspect of the Web obsolete in a very hard-lined way.

Take, for example, Gopher, a (for lack of an easier description) proto-web that allowed machines to be “browsed” for information that would be easy for users to find. The ability to search, to grab files or writings, and to share your own pools of knowledge were all part of the “Gopherspace”. It was also rather non-graphical by nature and technically oriented at the time, and the graphical “WWW” utterly flattened it when the time came.

But since Gopher had been a not-insignificant part of the Internet when web browsers were new, many of them would wrap in support for Gopher as an option. You’d use the gopher:// URI, and much like the ftp:// or file:// URIs, it co-existed with http:// as a method for reaching the world.

Until it didn’t.

Microsoft, citing security concerns, dropped Gopher support out of its Internet Explorer browser in 2002. Mozilla, after a years-long debate, did so in 2010. Here’s the Mozilla Firefox debate that raged over Gopher Protocol removal. The functionality was later brought back externally in the form of a Gopher plugin. Chrome never had Gopher support. (Many other browsers have Gopher support, even today, but they have very, very small audiences.)
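The effect of a browser dropping a protocol can be sketched in a few lines. This is an illustration only, not any real browser's code: the `can_open` helper, the scheme sets, and the example URLs are hypothetical, though `urllib.parse.urlsplit` is the standard Python call for pulling the scheme out of a URI.

```python
from urllib.parse import urlsplit

# Hypothetical scheme tables for an imaginary browser, before and after
# Gopher support is removed.
SCHEMES_BEFORE = {"http", "https", "ftp", "file", "gopher"}
SCHEMES_AFTER = SCHEMES_BEFORE - {"gopher"}


def can_open(url: str, supported: set) -> bool:
    """Return True if the URL's scheme is one this browser version handles."""
    return urlsplit(url).scheme in supported


links = ["http://example.com/", "gopher://gopher.example/1/docs"]
for link in links:
    print(link, can_open(link, SCHEMES_BEFORE), can_open(link, SCHEMES_AFTER))
```

Nothing about the gopher:// link or the server behind it changes between the two checks; only the lens does.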

The Archive has an assembled collection of Gopherspace material here.  From this material, as well as other sources, there are web-enabled versions of Gopherspace (basically, http:// versions of the gopher:// experience) that bring back some aspects of Gopher, if only to allow for a nostalgic stroll. But nobody would dream of making something brand new in that protocol, except to prove a point or for the technical exercise. The lens has refocused.

In the present, Flash is beginning a slow, harsh exile into the web pages of history – browser support dropping, and even Adobe whittling away support and upkeep of all of Flash’s forward-facing projects. Flash was a very big deal in its heyday – animation, menu interface, games, and a whole other host of what we think of as “The Web” depended utterly on Flash, and even specific versions and variations of Flash. As the sun sets on this technology, efforts to keep it viewable, like the Shumway project, will hopefully allow the lens a few more years to be capable of seeing this body of work.

As we move forward in this business of “saving the web”, we’re going to experience “save the browsers”, “save the network”, and “save the experience” as well. Browsers themselves drop or add entire components or functions, and being able to touch older material becomes successively more difficult, especially when you might have to use an older browser with security issues. Our in-browser emulation might be a solution, or special “filters” on the Wayback for seeing items as they were back then, but it’s not an easy task at all – and it’s a lot of effort to see information that is just a decade or two old. It’s going to be very, very difficult.

But maybe recognizing these browsers for what they are, and coming up with ways to keep these lenses polished and flexible, is a good way to start.

Posted in Emulation, Technical, Wayback Machine | 2 Comments

No More 404s! Resurrect dead web pages with our new Firefox add-on.

Have you ever clicked on a web link only to get the dreaded “404 Document not found” (dead page) message? Have you wanted to see what that page looked like when it was alive? Well, now you’re in luck.

Recently the Internet Archive and Mozilla announced “No More 404s”, an experiment to help you to see archived versions of dead web pages in your Firefox browser. Using the “No More 404s” Firefox add-on you are given the option to retrieve archived versions of web pages from the Internet Archive’s 20-year store of more than 490 billion web captures available via the Wayback Machine.
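The general flow (notice a 404, then ask the Wayback Machine for an archived copy) can be sketched as follows. This is not the add-on's actual code, which is not described here. The sketch assumes the public Wayback Machine availability endpoint and a JSON response with an `archived_snapshots.closest` entry; the `on_page_load` helper, the injected `fetch` function, and the canned sample response are hypothetical.

```python
import json
from urllib.parse import urlencode

# Assumed public availability endpoint of the Wayback Machine.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(dead_url: str) -> str:
    """Build the lookup URL for the closest archived capture of dead_url."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": dead_url})


def closest_snapshot(response_text: str):
    """Return the archived URL from an availability response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def on_page_load(status_code: int, url: str, fetch):
    """If the page is a 404, look up an archived copy via fetch(query_url)."""
    if status_code != 404:
        return None
    return closest_snapshot(fetch(availability_query(url)))


# Canned response with illustrative values, standing in for a network call.
sample_response = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "status": "200",
            "timestamp": "20160801000000",
            "url": "http://web.archive.org/web/20160801000000/http://example.com/gone",
        }
    }
})

print(on_page_load(404, "http://example.com/gone", lambda _query: sample_response))
```

Passing `fetch` in as a parameter keeps the sketch runnable offline; a real client would replace the lambda with an HTTP request to the query URL.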


To try this free service, and begin to enjoy a more reliable web, view this page with Firefox (version 48 or newer) then:

  1. Install the Firefox “Test Pilot”:
  2. Enable the “No More 404s” add-on:
  3. Try viewing this dead page:

See the banner that came down from the top of the window offering you the opportunity to view an archived version of this page?  Success!

For 20 years, the Internet Archive has been crawling the web, and is currently preserving web captures at the rate of one billion per week. With support from the Laura and John Arnold Foundation, we are making improvements, including weaving the Wayback Machine into the fabric of the web itself.

“We’d like the Wayback Machine to be a standard feature in every web browser,” said Brewster Kahle, founder of the Internet Archive. “Let’s fix the web — it’s too important to allow it to decay with rotten links.”

“The Internet Archive came to us with an idea for helping users see parts of the web that have disappeared over the last couple of decades,” explained Nick Nguyen, Vice President, Product, Firefox.

The Internet Archive started with a big goal — to archive the web and preserve it for history. Now, please help us. Test our latest experiment and email any feedback to

Posted in Announcements, Wayback Machine | 10 Comments

Microphone Check: Thousands of Hip-Hop Mixtapes at the Archive

The Internet Archive has been growing an interesting sub-collection of music for the past few months: Hip-Hop Mixtapes. The resulting collection still has a way to go before it’s anywhere near what is out there (limited by bandwidth and a few other technical factors), but now that it holds more than 150 solid days of music, there is quite enough to browse and “get the idea”, should you be so inclined.

Note: Hip-Hop tends to be for a mature audience, both in subject matter and language.

I’m sure this is entirely old knowledge for some people, but it was new to me, so I’ll describe the situation and the thinking.


There are some excellent introductions and writeups about mixtapes in Hip-Hop culture at these external articles:

So, in quick summary, there have been mixtapes of many varieties for many years, going back to the 1970s and the dawn of what we call Hip-Hop. In the time since, the “tapes” have become CDs and ZIP files, and they are still being released out into “the internet” to be spread around. The goal is to gain traction and attention for your musical act, or for your skills as a DJ, or any of a dozen reasons related to getting music to the masses.

There is an entire ecosystem of mixtape distribution and access. There are easily tens of thousands of known mixtapes that have existed. This is a huge, already-extant environment out there, one that is established, culturally critical, and born-digital.

It only made sense for a library like the Internet Archive to provide it as well.

There’s a lot coded into the covers of these mixtapes (not to even mention the stuff coded into the lyrics themselves) – there’s stressing of riches, drug use, power, and oppression. There’s commentary on government, on social issues, and on the meaning of entertainment and celebrity. There’s parody, there’s aggrandizement, and there’s every attempt to draw in the listeners in what is a pretty large pile of material floating around. It’s not about this song or that grandiose portrait, though – it’s about the fact this whole set of material has meaning, reality and relevance to many, many people.

How do I know this has relevance? Within 24 hours of the first set of mixtapes going onto the Archive, many of the albums already had hundreds of listeners, and one of them broke a thousand views. Since then, a good number have had tens of thousands of listens. Somebody wants this stuff, that’s for sure. And that’s fundamentally what the Archive is about – bringing access to the world.

The end goal here is simple: Providing free access to huge amounts of culture, so people can reference, contextualize, enjoy and delight over material in an easy-to-reach, linkable, usable manner. Apparently it’s already taken off, but here you go too.

Get your drank on here.

Posted in Announcements, Music, News | 2 Comments