Declaration to be ‘Defensive’ for the Defensive Patent License

The Internet Archive hereby declares itself ‘Defensive’ by committing to offer a Defensive Patent License, version 1.1 or any later version, for any of its patents, to any DPL User.   The Internet Archive does not have any patents at this time.

Our contact address is:


Founder, Digital Librarian
Internet Archive

 Birthday and Announcement about DPL.

Posted in News | 1 Comment

Defensive Patent License: Troll Proofed. Innovation Protected.

Today the Defensive Patent License is officially released.   It is designed to bring free software ideas to the patent arena by encouraging patent owners to declare themselves “defensive,” and share their patents with others that have declared themselves defensive.


This way a large number of patents can be used to help create new products and services without fear of being sued. As more organizations become defensive, the set of patents gets larger and the incentive to become defensive grows.

The Internet Archive hosted the “birthday party” as the license was refined, and declared itself defensive.  Brewster Kahle helped spur this generation of the idea by collaborating with lawyers who worked for years to get this to happen.

In celebration of this release, today John Gilmore is dedicating an important portfolio of patents from Pixel Qi to be defensive.   Pixel Qi was a company run by Mary Lou Jepsen of OLPC fame, and partially funded by Brewster Kahle and John Gilmore.

Please consider joining in by declaring your organization defensive, whether you have patents or not.  The Internet Archive has declared itself defensive to support this effort.




Posted in Announcements, News | 3 Comments

430 Billion Web Pages Saved….Help Us Do More!

Dear Friends,

Today we launch our End-of-Year Campaign.  Once a year, I ask all of you to keep the Internet Archive going and growing stronger.   Please help us reach our goal of raising $1.5 million by the end of the year.  Your support will help pay for servers, bandwidth and our dedicated staff.

I founded the Internet Archive as a non-profit with a huge goal:  to give everyone access to all knowledge—the books, web pages, audio, television and software of our shared human culture. Forever.

Book Scanning with Table Top Scribe

Lan Zhu, a scanner at Internet Archive, at the Table Top Scribe. Zhu can scan a 300-page book in thirty minutes. Since 2005, the Internet Archive has digitized over 2.4 million books.

Together we are building the digital library of the future. A place where we can all go to learn and explore.

At the Internet Archive, we’ve preserved 430 billion web pages. People download 20 million books on our site each month. We get more visitors in a year than most libraries do in a lifetime. The key is to keep improving—and to keep it free. That’s where you can help us.

For the cost of buying a book, you can make a book permanently available for the next generation. Please consider donating $10, $25, $50 or whatever you can afford to support the Internet Archive before the end of this year. It’s a small amount to inform millions. Help us do more. I promise you, it’s money well spent.

Thank you,

Brewster Kahle
Founder, Digital Librarian
Internet Archive

Photos by David Rinehart/Internet Archive

Posted in Announcements, News | 14 Comments

Partnership Promotes Jobs and Builds Free Global Library

As part of its Building Libraries Together initiative, the Internet Archive is testing a new socially responsible jobs model with Bay Area Rescue Mission (BARM) of Richmond, California.

The Internet Archive has been digitizing books for nearly 10 years, but needed help reaching a goal of 10 million eBooks. “We had so much high value content that needed to be digitized, but not enough staff to do the work,” explains Robert Miller, Director of Digital Books and Media. “We wondered how we could make our problem someone else’s solution.” BARM offers a ‘Healthy Living’ addiction recovery program, where over 350 men and women work in a residential setting designed to move them towards self-sufficiency and independent living. The challenge for the staff at BARM is that most of their graduating clients lack the job skills and professional résumé required for securing a job. The Internet Archive can offer job skills and a work history. A conversation between Miller and Tim Hammack, Vice-President of the Bay Area Rescue Mission, ensued and the Work Transition Program was born.

Candidates for the Internet Archive Work Transition Program are men and women from BARM who have completed a 12-month sober living, drug counseling or domestic abuse crisis program and are ready to re-enter the job market. This group often lacks relevant job skills, recent work experience, interpersonal and work relationship skills, self-confidence, and a résumé that a national or local employer would find compelling enough to grant an interview. The curriculum for the Internet Archive Work Transition Program lasts 9 months and focuses on ‘Learning-to-Work’. This three-phase program was based on lessons learned from the 600+ staff that the Archive has hired over the past 8 years. From these lessons, a program of progressive responsibility, constant feedback and a merit badging system was built to meet this challenge. Miller notes that this is not a make-work program. The work is substantive and needs to be completed to help get content online to share with the global community. “The Internet Archive Texts collections have over 20 million downloads each month and the material digitized by the team maintains our high standard of quality.”


To ‘grease the skids’ for the Work Transition Program graduates, Hammack and Miller contacted local companies, explaining that the program was not a handout and they weren’t looking for charity. They simply asked for a commitment from employers to grant the graduate an interview. Upon reviewing the program goals and expectations, local businesses including UPS, San Francisco Public Library, Costco and others signed on. The first class graduates in February 2015, but already two of the candidates have secured part-time employment.

Hammack is thrilled with the program, adding that “We take people on the worst day of their lives and help them achieve dignity, learn healthy living habits, while getting clean and sober. The Work Transition Program continues this path to recovery by helping them earn a job; a huge accomplishment!”


Special thanks to the teams at Internet Archive: Jesse Bell, Digitization Coordinator, and Antoine McGrath, Work Transition Supervisor; and at Bay Area Rescue Mission, headed by Tim Hammack, Vice-President of Operations. For more information about the program, contact Robert Miller.

Posted in Announcements, Books Archive, News | Comments Off

Music Analysis Beginnings

As mentioned in our recent Building Music Libraries post, we are working with researchers at Columbia University and UPF in Barcelona to run their code on the music collection to help their research and to provide new analyses that could help with exploration and understanding.

We are doing some pilot runs to generate files, which some close observers may see in the music item directories. Audio fingerprints from audfprint are stored in .afpt files, and music attributes from Essentia are stored in _esslow.json.gz and _esshigh.json.gz files.
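For readers who want to peek inside one of the attribute files, a gzipped JSON file can be read with Python's standard library alone. This is only a small sketch: the helper names are made up for illustration, and the code deliberately makes no assumption about which keys Essentia writes, it simply lists whatever is at the top level.

```python
import gzip
import json


def load_gzipped_json(path):
    """Read a gzip-compressed JSON file and return the parsed object."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


def top_level_keys(path):
    """Return the sorted top-level keys, or [] if the file is not a JSON object."""
    data = load_gzipped_json(path)
    return sorted(data) if isinstance(data, dict) else []
```

Calling `top_level_keys` on a downloaded `_esslow.json.gz` file is a quick way to see what kinds of attributes were computed before writing any analysis code against them.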

Spectrogram of a Grateful Dead track

We are also creating image files showing the audio spectrum used.  We hope this is useful for those that want to see if files have been compressed in the past (even if they are posted as flac files now).  There is also a .png for each audio file of a basic waveform that is being used in the archive’s beta site as eye candy.

More as it happens, but we wanted you to know there is some progress and you will see some new files. If you have proposed other analyses that would benefit from being run over a large corpus, please let us know by contacting info at archive dot org.

Thank you to the researchers and the Archive programmers who are working together to make this happen.


Posted in Audio Archive, Live Music Archive, Music | Comments Off

Using Docker to Encapsulate a Complicated Program Is Successful

The Internet Archive has been using docker in a useful way that is a bit out of the mainstream: to package a command-line binary and its dependencies so we can deploy it on a cluster and use it in the same way we would a static binary.

Columbia University’s Daniel Ellis created an audio fingerprinting program that was used in a competition. It was not packaged as a Debian package or through any other distribution approach. It took a while for our staff to work out how to install it and its many dependencies consistently on Ubuntu, and it seemed pretty heavy-handed to install all of that on our worker cluster. So we explored using docker, and it has been successful. While this is old hat for some, I thought it might be interesting to explain what we did.

1) Created a docker file to make a docker container that held all of the code needed to run the system.

2) Worked with our systems group to figure out how to install docker on our cluster with a security profile we felt comfortable with.   This included running the binary in the container as user nobody.

3) Ramped up slowly to test the downloading and running of this container.   In general it would take 10-25 minutes to download the container the first time. Once cached on a worker node, it was very fast to start up.    This cache is persistent between many jobs, so this is efficient.

4) Used the container as we would a shell command, passing files into the container by mounting a sub-filesystem for it to read from and write to. This also helped with signaling errors.

5) Starting production use now.
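The pattern in steps 1 through 4 can be sketched roughly as follows. This is a minimal illustration in Python and not the Archive's actual tooling: the image name, the `/data` mount point inside the container, and both helper names are hypothetical. The only docker features it relies on are the standard `docker run` flags `--rm`, `--user`, and `-v`.

```python
import subprocess
from pathlib import Path


def build_docker_command(image, host_dir, program_args, user="nobody"):
    """Build a `docker run` argument list that behaves like a shell command.

    image, host_dir and program_args are illustrative placeholders.
    """
    host_dir = Path(host_dir).resolve()
    return [
        "docker", "run",
        "--rm",                     # remove the container once the program exits
        "--user", user,             # run the binary as an unprivileged user
        "-v", f"{host_dir}:/data",  # mount a host directory for input and output
        image,
        *program_args,
    ]


def run_in_container(image, host_dir, program_args):
    """Run the containerized program and return its exit status."""
    cmd = build_docker_command(image, host_dir, program_args)
    # `docker run` exits with the contained program's exit status, so
    # failures can be detected the same way as for an ordinary command.
    return subprocess.run(cmd).returncode
```

For example, `run_in_container("example/fingerprinter", "/tmp/job42", ["/data/track.flac"])` would start the (hypothetical) image with `/tmp/job42` visible as `/data`. Note that the user named by `--user` must exist inside the image, and the mounted directory must be writable by that user if the program writes output there.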

We hope that docker can help us with other programs that require complicated or legacy environments to run.

Congratulations to Raj Kumar, Aaron Ximm, and Andy Bezella for the creative solution to a problem that could have made it difficult for us to use some complicated academic code in our production environment.

Go docker!

Posted in Music, Technical | 3 Comments

SEEKING: Visual Studies PostDoc for an Exciting New Opportunity at Internet Archive!

Council on Library and Information Resources

Today, the Internet Archive and the Council on Library and Information Resources (CLIR) announced a new position:

Visual Data Curation Fellow

Do you know a recent Ph.D. in Visual Studies (film, photography, information sciences, fine art) who would like to work at the Internet Archive? We’re looking for a talented post-doc to come work with our growing Film Archive. This two-year position is based at the Internet Archive offices in San Francisco and runs from July 1, 2015 through June 30, 2017. We want to thank CLIR and the Andrew W. Mellon Foundation for a generous grant to support this position. For more information visit CLIR. Applications are open here through December 29.

Posted in News | Comments Off

Lost Landscapes of San Francisco: Fundraiser Benefitting Internet Archive — Friday, December 19, 2014

Rick Prelinger’s Lost Landscapes of San Francisco is back for one final performance this year! Now you can catch this perennially sold-out show and your ticket donation will benefit the Internet Archive, a nonprofit digital library which hosts the Prelinger Collection. Please give generously to support the effort.

Friday, December 19, 2014
6 pm Reception
7:30 pm Film

300 Funston Ave.
San Francisco, CA 94118

Get tickets here!

This year’s LOST LANDSCAPES brings together familiar and unseen archival film clips showing San Francisco as it was and is no more. Blanketing the 20th-century city from the Bay to Ocean Beach and the Presidio to Bayview, this screening includes San Franciscans at work and play; early hippies in the Haight; a highly privileged walk on the unfinished Golden Gate Bridge; newly-discovered images of Playland and the waterfront; families living and playing in their neighborhoods; detail-rich streetscapes of the late 1960s; peace rallies in Golden Gate Park; 1930s color images of a busy Market Street; a selected reprise of greatest hits from years 1-8; and much, much more.

As usual, the viewers make the soundtrack — audience members are asked to identify places and events, ask questions, share their thoughts, and create an unruly interactive symphony of speculation about the city we’ve lost and the city we’d like to live in.

The film begins at 7:30 pm and is preceded by an informal reception that begins at 6:00 pm.

Posted in Announcements, Event, News | 2 Comments

Inviting the Internet Over to Play


At our Annual Event last week, the Archive announced a variety of new projects and plans, including our new beta interface, our compact book scanner, and our progress in tracking political ads on television. The event (full video is here) went very well, with lots of activities and social gathering before and afterwards, and included the first public unveiling of our newest project, the Internet Arcade.

Photo by Kyle Way

It was obvious we were on to something – the smallish room with the two stations set up to play emulated arcade games from the collection was constantly packed. Players young and old tried out classic video games, including parents showing their children games they’d played in their own teenage years. All of it was running off the Archive’s own web pages through standard web browsers, with no special plug-ins – and it held up well. We even tracked high scores.


The party, of course, was just the beginning – over the weekend, we quietly announced that the Internet Arcade was available through the main site. With over 900 arcade machines in the collection, most every major machine released between 1976 and 1988 was included. (The emulation system we use, JSMESS, is a Javascript port of a long-running emulation project called MESS/MAME, which has had hundreds of contributors over the years – we salute them.)

After an initial tweet or two, the Arcade’s existence went from a mention by Waxy and Laughing Squid, to sites like Hacker News and Mashable, and from there it hit larger and larger audiences. Within a few hours news had spread to a whole range of sites, including Joystiq, The Verge, Engadget, CNN, PC World, Gizmodo, Ars Technica… and, well, let’s just say a very large number of sites were reporting on this story.

And that’s when the world showed up.

We’re still counting, but we know hundreds of thousands of people came, many of them all at once, to play.

And as these thousands of curious visitors and first-time callers came to the Archive to try out our collection, minor inefficiencies became showstoppers and the site was temporarily crushed. Our brave administration team persevered, repairs were made, and the site settled in for the new reality:

Everything’s fine and normal… then we crash and fix things… and WOW that’s a lot of new visitors!

This crush of new visitors is coming to the Internet Archive, possibly for the first time ever, and we welcome them with open arms. After all, that’s what we were founded for: our stated purpose is to function as the Internet’s Library, with stored websites, digitized texts, music, movies and software. It’s our mission as a non-profit library: make as much of culture and information available to as many people as possible. You can lose a workday or a whole winter in our virtual stacks, and our users often do.

Meanwhile, the story continues to have legs, appearing in newspapers, on radio shows, video podcasts, and message boards around the world.

And then we made it to TV news:


So now that we have (apparently) the world’s attention… ahem ahem..

Even we don’t know where this story is going to lead. But one thing is sure – video games and software are as important a part of history and culture as books, movies and music have been in the past.  And we’re dedicated to bringing all of this to you, the Internet. Sure, it can be a bit surprising when the entire internet comes over to play, but we wouldn’t have put out the welcome mat if we didn’t want you to visit.

As a non-profit, we depend heavily on user donations to stay afloat – we even take Bitcoin and subscriptions. Keeping 20 petabytes of information flowing, fast and free, is what we’re working on day and night and the positive messages and feedback we’ve gotten this past week (and over the years) tell us we’re doing the right thing.

The JSMESS emulation project is one of many open-source projects the Internet Archive is involved with, and while a lot of it is fun and games, we’ve got a serious side too, gathering up disappearing web resources and important historical events into our archives to preserve for the next generations. We hope that after you relive your childhood or live out a second new one, you’ll stick around and see what else we have here. It’s quite a place.

Game on!






Posted in Announcements, News, Software Archive | 3 Comments


Last week we announced a new beta version of the site.  The beta is the first step toward inviting people to participate in building libraries together.













2014 beta site

Why redesign the site?

The Wayback Machine was launched in 2001, and the current look of the site debuted in 2002 when we added movies, texts, software, and music. There have been minor design changes and we’ve added features over the years to make the library materials more usable, but the current interface has just accumulated over time. We have not “rethought” the site in a holistic way in the past 12 years.

A lot has changed since 2002, for the Internet Archive and on the web.  In 2002 the archive contained 5,000 non-Wayback items, about half movies from the Prelinger Archive and half live music concerts from the community with a few books and pieces of software sprinkled in. Those 5,000 files added up to about 3 terabytes of data.  Today we have more than 20 million media items that add up to about 10,000 terabytes of data (that’s not including 435 billion saved web pages that take up an additional 10,000 terabytes of space).

As we added more stuff to the archive, people came to visit.  We ended 2002 with about 9,000 registered users.  Today we have just a hair under 2 million registered users, and around 2.5 million individuals use the library materials every day.

Having thousands of movies available on the Internet in 2002 was actually pretty rare (remember, YouTube didn’t exist until 2005). Those 5,000 media items couldn’t be played on our site; you had to download them to your own computer to watch or listen. It was very difficult to add your own files to the Internet Archive, and who would have had the bandwidth to do it anyway? In 2002 only 21% of U.S. homes had “high speed” internet connections. High speed back then meant 200 kb per second. [1]

And of course, we can’t forget mobile. About 20-30% of our users today are on mobile devices, and the current web site is not serving them well.

Over the years the archive has grown immensely in terms of material and patrons. Our mission is Universal Access to All Knowledge.  And we think we can do better both with Access and with gathering All Knowledge if we have new tools and a better interface for the site.

Why this interface?

We started talking about the redesign in January of this year.  (Well, honestly we’ve been talking about it since 2006, but this was the first serious, archive-wide project.)

First we found a wonderful Creative Director, David Merkoski, and hired a great designer, Kristen Schlott.  We interviewed people, both users of the archive and people who had never heard of us, and asked them questions about how they use media. We examined how our site was being used, and talked about the intricacies and complications that come with archiving 20 million disparate things. We researched how other sites deal with large amounts of media. We used our current collections and use cases to understand how different designs would perform. Our lead developer, Tracey Jaquith, built prototypes and we user tested them. We talked to some of our power users and partners about our plans and showed them the prototype to get feedback. We had a LOT of meetings.

Idea clustering after user interviews

During this process we realized that we needed to find a way to open the archive up to more participation.  The Internet Archive has built some important and useful collections, both with partners and on our own.  We digitize 1,000 books per day.  We archive 1 billion URLs every week.  We capture television 24 hours per day, every single day.  But there is a lot of media out there in the world, and we can’t save all of it for the future without the help of experts.

Who are the experts?  You!  There are some amazing collections of media in the archive, out on the web, and sitting around on shelves and in basements that have been created by the people who know and care the most about saving those things and making sure their collections are complete and well described.  We want to create a place for those people to build communities around their interests where they can safely store these amazing collections and show them to as many people as possible.  If we all work together, we can create the most useful library the world has ever seen.


Today the beta has the same basic functions as the current site, with some great additions: more visual cues to help you find things, facets on collections to quickly get you where you want to go, easy searching within collections, user pages, and many more.  We think it’s already an improvement over the current site – otherwise, we wouldn’t be showing it to you yet!

But the tools that will allow you to create your own collections and collaborate with others are still being built.  These features will be released in stages so that we can test them out in the beta and see how they work for people.  We will use feedback from patrons – both what you tell us, and the usage logs for the beta – to make decisions about how things will evolve. (Don’t worry, we aren’t keeping IP addresses — the beta respects user privacy.) When you’re in the beta, you’re going to run into things that might not work quite the way you expected, or that have suddenly changed since you used them yesterday. Sometimes it will be slow or you’ll find bugs. New things will appear, and other things may disappear. New tools will suddenly start working. We hope that for our intrepid beta users, this will be part of the fun. (Because we certainly think it’s fun!)


What new things are coming?

To some extent, this remains to be seen.  We will in part make decisions based on how the beta is used, so please use it!

Our current ideas include: speeding up the site; allowing patrons to create their own collections; improving accessibility for the print disabled; and adding ways for patrons to collaborate around collections and items.

There’s a lot more to come.  We hope you will explore all of these new options with us, and help us build the library.  If you would like to give us feedback, please write to us at info at archive dot org, or leave comments here.



Posted in News | 6 Comments

New York Times: The Internet Archive, Trying to Encompass All Creation

Thanks to the New York Times for doing a great write-up of our annual celebration.  Check it out!



Posted in News | Comments Off

NYtimes Readers: Try our beta website

The NYtimes article on us has our beta website address incorrect as, but please visit instead.


Posted in News | Comments Off

Invitation to Aaron Swartz Day Nov. 8 in SF

Saturday, November 8, 2014
Internet Archive
300 Funston Ave
San Francisco, CA 94118


The Internet Archive is hosting an Aaron Swartz Day Celebration on what would have been Aaron’s 28th birthday: November 8, 2014, from 6-10:30 pm.



Although we are looking ahead, rather than dwelling on the past, this year’s theme is “Setting the record straight.”

Now that we have brought people together and shared information with each other, the smoke has cleared a bit, and we can clearly explain to the world exactly what Aaron actually did and did not do.

Reception: 6pm-7pm – Come mingle with the speakers and celebrate Aaron’s accomplishments.

Speakers: 7pm-8pm – The Year in Aaron 2014: A comprehensive update.

Movie: 8-9:45 pm – Watch The Internet’s Own Boy with Director Brian Knappenberger.

Q&A: 9:45 – Audience Q & A with Brian Knappenberger and Trevor Timm (co-founder and Executive Director of the Freedom of the Press Foundation) after the movie!


April Glaser (EFF, Freedom to Innovate Summit)
The Freedom to Innovate Summit is a collaboration between EFF and the Center for Civic Media at MIT that calls upon Universities to protect students who innovate at the boundaries of the law.

Yan Zhu (Yahoo, SF Hackathon Organizer)
Yan will explain the history, and evolution to the present day, of the Aaron Swartz International Hackathon.

Brewster Kahle (Digital Librarian, Internet Archive)
Internet Archive has just launched a new set of tools for building collaborative libraries online that were inspired by Aaron’s dreams and visions.

Cindy Cohn (EFF Legal Director – CFAA Reform)
A short and simple update on a very complicated subject: Why most attempts to reform the Computer Fraud and Abuse Act have largely stalled in Congress.

Kevin Poulsen (Journalist – FOIA case that MIT intervened in)
An update on the most recent batch of documents and video from Aaron’s FBI and Secret Service files that have finally trickled out of the U.S. government over this last year, after undergoing further redactions by MIT.

Garrett Robinson and James Dolan (SecureDrop)
2014 was a big year for Aaron’s whistleblowing submission platform, with 15 new instances including:  Forbes, Greenpeace New Zealand, The Guardian, The Intercept, The New Yorker, BayLeaks, and The Washington Post.

Daniel Purcell (Keker & Van Nest, one of Aaron’s lawyers)
Along with Elliot Peters, Dan Purcell was hired by Aaron and his family in September 2012 to defend Aaron at his criminal trial, set for March 2013. Dan will talk about Aaron’s defenses to the criminal charges and the expert testimony the legal team planned to present.

The event will take place following this year’s San Francisco-based Aaron Swartz International Hackathon, which is going on Saturday and Sunday from 11am-6pm at the Internet Archive. Confirmed 2014 cities include: Berlin, Boston, Buenos Aires, Houston, Kathmandu, Los Angeles, Magdeburg, New York, Oakland, Oxford, and San Francisco.


On November 8, Pivot is airing The Internet’s Own Boy: The Story of Aaron Swartz. Check local listings.

For more information, contact:
Lisa Rein, Coordinator, Aaron Swartz Day

Posted in Announcements, News | 1 Comment

Building Libraries Together: New Tools for a New Direction


(NYtimes on this announcement, video of talks)

Let’s work together to save all human knowledge.  Today the Internet Archive is announcing a new beta site and new tools to encourage everyone to lend a hand.

Prototype Table Top Scribe for scanning books

We were founded in 1996 as an archive OF the Internet; we saved web pages and made them available through the Wayback Machine starting in 2001. In 2002 we became an archive ON the internet when we began digitizing and hosting movies, books, TV, music and software by working closely with libraries and online communities. Much of the work of building the current archive has been done by us and a relatively small number of selected partners.

Today marks a change in direction.

Listening Room

We are creating new tools to help every media-based community build their own collections on a long term platform that is available to the entire world for free. Collectors will be able to upload media, reference media from other collections, use tools to coordinate the activities of their community, and create a distinct Internet presence while also offering users the chance to explore diverse collections of other content.

In this future, communities and libraries will take the central role in building collections, leveraging the tools and storage of the Internet Archive.

Political campaign ads

Political campaign pilot interface

This effort is still in early development, and the Internet Archive is looking for feedback and help in this new direction. Shaping these tools will be a joint process with our library and community partners.

Introducing new tools today, with further developments to come:

    • Table-top book scanner that works with back-end Archive technology and staff to create beautiful online books
Beta preview of

The Internet Archive needs your help to create and use these tools.   Your donations of time, money, digital and physical materials can help us Build Libraries Together.

Posted in Announcements, News | 4 Comments

Building Music Libraries

The Internet Archive is working with partners to preserve our musical heritage. The music collections started 8 years ago with the live music recordings and grew when we started hosting netlabels.

Scanning an LP cover

Now through new efforts and partnerships we have begun to expand and explore the music collections further.  We are working with researchers, record labels, collectors, internet communities and other archives to gather music media, build tools for preservation and expand metadata for exploration.

We have already made tremendous progress. We have archived millions of tracks, we are working with the Archive of Contemporary Music to digitize portions of their extensive collections of physical media, the community has provided meticulous metadata, and researchers from university programs have begun to analyze the music.

Listening Room

A prototype “listening room” in the Internet Archive’s building in San Francisco is available free to the public to listen to the full musical holdings.  Access to these collections will also be provided to select computer science researchers via a secure “virtual reading room” in our data center.  As tools and the collections grow, we will offer everyone access to the metadata to help them explore, and then offer links to commercial sites for listening or purchasing.

We invite interested people to participate:

Archives. The Internet Archive and the Archive of Contemporary Music in New York have started digitizing ACM’s holdings with consistent, high quality, standards-based methods to build a scalable workflow.  We welcome other archives with similar projects, or who would like to help.  “Digitizing our large physical collections is an important step for our archive to allow others to learn from this deep legacy,” said Bob George, Director of the Archive of Contemporary Music, NYC.


Digitizing CDs at the Archive of Contemporary Music

Collectors.  Digitize, donate, or lend material for digitization.  Improve metadata or provide context to help others understand the depth and cultural relevance of these collections.  “Recycled Records is happy to have directed the donation of many thousands of LPs to the Internet Archive to help with their projects and for the love of music,” said Bruce Lyall, proprietor of Recycled Records.

Labels.  Preserving a complete collection of everything published by a label is best done by or with the record label.  We would like to work with labels to get their releases archived and properly cataloged.  “The upcoming Music Libraries program continues the very work that enables our label, and the musicians who record for us, to bring the music of earlier times to audiences today. We are proud to participate in a tradition of preservation that has brought joy to so many through music,” said David Fox, Co-founder of Musica Omnia.

Cataloging services.  Commercial and non-commercial cataloging services can participate by making sure there are proper links from and to these collections.  The open, community-created catalog has already been very helpful.


Commercial vendors and streaming services.  Links from these collections to commercial services can help users buy and listen to full tracks.  These services might have valuable metadata as well that can help users navigate.

Musicians and bands.  Please create more great works that libraries can preserve and provide access to.  We would like to hear your ideas about making the site useful for both musicians and the general public.

Researchers, historians, and music lovers.  Annotate, organize, datamine, and surface music in the collections, and help us preserve those works not yet in the collections.  “Access to a comprehensive archive of commercial music audio is the key missing link for research relating signal processing to listener behavior,” said Daniel Ellis, professor at Columbia University.  By analyzing the rhythms, keys, instruments, and genres, researchers will help create more complete metadata and aid discovery.

Looking to the future, we hope to expand these shared music collections by uniting the work done by other archives and collectors.  By bringing all of this music and its metadata into a shared library, we hope to bring the richness of our musical heritage to people all over the world.

Visit the Listening Room

Internet Archive
300 Funston Ave
San Francisco, CA 94118
Hours: Fridays from 1-4pm, or by appointment.

If you would like to participate in any way, please email us.

Posted in Announcements, Live Music Archive, Music, News | 4 Comments

Archive of Contemporary Music and the Internet Archive Team up to Create a Music Library

When the personal record collection of music producer Bob George hit 47,000 discs, he knew something had to be done.  “I wanted to give them away, but they were mostly punk, reggae and hip-hop,” he recalled, “and no established library or archive was interested.” The only thing to do, it would seem, was to turn his collection into a non-profit archive in New York called the ARChive of Contemporary Music.  29 years later, the ARC is one of the largest popular music collections in the world, with some three million sound recordings, 19,000 music-related books, and millions of photos, press kits and artifacts.  Now this rich musical resource, used primarily by musicologists and the entertainment industry, is teaming up with one of the largest digital libraries in the world, the San Francisco-based Internet Archive, to create a music library that will preserve and provide researcher access to a wide range of music and the rich materials that surround it.

Powered by teams of volunteers, the two archives are partnering to digitize CDs and LPs and then use audio fingerprinting to match tracks with metadata from catalogs and other services.  Using Internet Archive scanners, the ARC is digitizing its books and photographs at its New York facility.  When complete, this music library will be a rich resource for historians, musicologists and the general public.

Listening Room

Starting today, the public can listen to millions of tracks for free, including many that are not available in Spotify or iTunes, at the Internet Archive’s new listening room in San Francisco.  “The Internet Archive has allowed us to move forward at unprecedented speed, originally with book scanning and now with the digitization of a wide range of audio formats,” said Bob George.  “The physical records from around the world that the ARC has archived are a unique treasure,” said Brewster Kahle, founder and digital librarian of the Internet Archive. “Soon these records will be studied in new ways because they will be digital as well.”

Since 1985, George, the ARC’s co-founder and director, has run the organization in Tribeca, New York City, supported by friends in the music industry including Paul Simon, David Bowie and Nile Rodgers.  The Rolling Stones guitarist Keith Richards endows a collection of blues and R&B recordings there. Filmmakers Martin Scorsese and Jonathan Demme stop by when trying to track down hard-to-find songs.  Yet for most of its almost three decades, the ARC has been a decidedly “analog” experience:  records, CDs and cassette tapes line its walls; to experience a song you usually have to drop a needle into a pristine vinyl groove.  The collaboration with the web-based Internet Archive represents a new direction.  “We feel that our primary mission, to collect and preserve this material, is near completion,” said Bob George. “Now we are seeking ways to allow greater access to this incredible collection.”

Scanning an LP cover

The Internet Archive may be best known for the 435 billion web pages in its Wayback Machine, but this digital library has always been a place where live music collectors go to preserve concerts on the web.  Its audio collections include some 130,000 live concerts by bands such as the Grateful Dead, Jack Johnson and Smashing Pumpkins, many with more than a million plays. Recently, the ARC shipped 46,000 seventy-eight rpm recordings to the San Francisco-based non-profit, and has donated tens of thousands of long-playing records. Music labels Musica Omnia and Other Minds are making their entire collections searchable on, in part because the Internet Archive is one of the few online platforms that preserves audio, texts, musical manuscripts, photos and films and makes them accessible forever, for free.

The Internet Archive listening room is now open to the public for free on Fridays from 1-4 pm, holidays excepted, and by appointment at 300 Funston Avenue, San Francisco, CA.  Those interested in donating physical music collections to the ARC or Internet Archive should contact or


Posted in Announcements, News | Comments Off

Archive-It: Crawling the Web Together

A post by the Archive-It team

Today Phase 1 of the 5.0 release of the Archive-It web application was released for use by the 326 partners using the Archive-It service.

In 1996 when the Internet Archive was founded, we used automated crawlers to capture the web, snapping up millions of web pages and preserving them for history. Ironically, our digital record of humankind was being driven by computer algorithms.

As the years went by, it became clear that we needed people and communities to capture and save what is really and truly important. So in February 2006 we launched the Archive-It service, 1.0, which allowed traditional librarians and archivists to become web archivists by initiating focused, curated crawls of the live web using a simple web application with partner/tech support. Launching Archive-It meant we could help our colleagues create their own web collections for their own libraries and also foster a community around web archiving to work together to build a global digital public library at

Now, as we expand to the next generation of Archive-It with our 5.0 release, we hope to provide even greater tools for collection development. Released this week, 5.0 phase 1 highlights a shiny new user interface and significantly enhanced post-crawl reports that include infographics with visual representations of the data.

Figure 1: Screenshot from the Reports section of the new Archive-It 5.0 user interface

Back in 2006 there was little understanding of web archiving and many organizations were questioning whether this was a valid activity that could or should be a part of their larger institutional collecting strategies. After all, the challenges were staggering: the quality of web content was all over the map; conflicting policies and organizational structures posed challenges; no one had yet established best practices for selecting the content, how to handle metadata, or how to integrate this new type of content into other holdings and existing catalogs at the institution.   Also, back then we could not have predicted the extent to which material that once existed in physical form would now only appear on the web in digital form.

We launched the Archive-It service with a small band of believers and supporters, among them librarians and archivists from Indiana University, University of Texas at Austin, Library of Virginia, Montana State Library, and North Carolina State Archives and State Library. Partners were very patient with us and with Archive-It 1.0, which was bare bones. Collaborating and working with the library and archive community has always been a top priority for the Internet Archive, and a defining characteristic of the Archive-It service. There have been many times during the past 8+ years when we have not known the answer to a question and we say: “Let’s ask the community and see what they think!” And the community has always gotten back to us with supportive answers – both illustrative and specific.

Figure 2: Screenshot from the North Carolina State Government Web Site Archive of the North Carolina State Archives and State Library of North Carolina.

As time went on, the community of web archivists grew and we were able to produce some compelling answers to the question: why web archive? Here are just a few:

  • To create a thematic or topical web archive
  • To fulfill a mandate to preserve institutional memory and history
  • To archive state or local agency publications no longer being deposited in print form
  • To archive records to meet university or government retention policies
  • To preserve an historical record of an institution’s web and/or social media presence
  • To capture a website before it is redesigned or taken offline
  • To archive online art, exhibitions, and artists’ materials


Figure 3: Screenshot from the Latin American Government Documents Archive, LAGDA of the University of Texas at Austin.

Figure 4: Screenshot from the Catalogues Raisonnés collection of the New York Art Resources Consortium (NYARC).

To date in 2014, 326 Archive-It partners have created 2700 public collections on a diverse range of topics, subjects, events and domains. These collections have become integral to these organizations’ collecting strategies and have helped to raise awareness and understanding about why web archiving is so important.

We like to say that the Archive-It service is both a partner and a vendor. We are a service provider and we strive to consistently deliver a high level of customer support, which we believe partners notice and appreciate. We also strive to be a partner to our community and work collaboratively on shared initiatives, a few of which are: a) collaborative efforts around archiving spontaneous events (like the 2011 Japanese Earthquake collection), b) teaching web archiving in graduate-level MLIS programs and professional development workshops and c) the K12 Web Archiving program (now in its 7th year), where we work with 3rd to 12th graders around the country and ask them what they would like to archive for future generations. As one of the student archivists put it, “500 years from now, kids will think we were really cool.”

Many of the features and functionality that we see in the Archive-It service today are a direct result of a partner making a suggestion or request. Through face to face brainstorming sessions, online surveys, webinars, and support tickets, partners have expressed their ideas as well as offered constructive criticism. And we have listened.   We hope that as the service continues to grow and we launch Archive-It 5.0 that many of our partners will see themselves in Archive-It. Their collections will continue to be valuable to researchers, historians, scholars and the general public for many years to come.

Here are some links to just a few of those collections on the Archive-It website:

Columbia University’s collection on Human Rights:

National Museum of Women in the Arts’s collection on Contemporary Women Artists on the Web:

University of Alberta’s Circumpolar Collection:

Brigham Young University’s Mormon Missionary Collection:

Stanford University’s collection on Freedom of Information (FOIA):

As we continue down this road – excited for the future and what comes next – we know that it takes a community to archive the web and we look forward to working with our partners to build libraries together.

Posted in Announcements, Archive-It, News | 3 Comments

Media, Money & Elections: 2014 Philly Political Media Ad Watch

Philadelphia-region Political Media Ad Watch is a pilot project that allows citizens and journalists to go online to search every political message in the Philly television market, compare all the ads from a single sponsor (sample: Tom Wolf for Governor) —positive and negative—and trace back who is paying for those ads.

She’s Dishonest!
He’s in Bed with an Accused Mobster!
This is what television audiences in Pennsylvania and Southern New Jersey are hearing a lot of this season. And it’s not Judge Judy or the Jerry Springer Show. Nope. It’s the deeply disturbing reality television show of our nation’s mid-term elections.

Dark accusations run back-to-back with heartwarming assurances of compassion.  All financed by increasingly unfettered flows of cash from ever more veiled donors.

Voters have a right to know who’s paying for these messages. And this flood of commercials raises a few critical questions for our democracy:

  • With so much heat, where can citizens find the light they need to make thoughtful choices?
  • Are the local media, many of whom make big bucks on election advertising, doing a good job giving voters the information and context they need to make sound decisions on Election Day?
  • Can we establish a baseline of metrics to evaluate the performance of local media during elections?

The project is a collaboration between the Internet Archive, Sunlight Foundation, Philadelphia’s Committee of Seventy (a non-partisan government watchdog), University of Delaware’s Center for Community Research & Service and the Linguistic Data Consortium at the University of Pennsylvania. It immediately enables local media to do a better job sifting between fact and fiction in political messaging and revealing financial sources of political influence.

In the coming year, University of Delaware researchers will sift project data to answer some basic questions about how local media is serving the public:

  • To what extent, if any, do local television news broadcasts examine the claims that are made in the political ads that appear on the newscasts?
  • Do the broadcasts cover the same issues that are the subject of political ads? If so, which issues are covered, which issues are not covered?
  • How much time is devoted to that coverage? Where does that coverage appear in the newscasts?

And in the long term, our pioneering work in the Philadelphia-region will help us create an affordable and technically scalable model to answer these questions in local markets nationwide leading up to the 2016 elections.

One of the exciting features of this project is that it brings cutting-edge technology together with campaign finance expertise and grassroots good-government advocates in Philadelphia to potentially provide a vastly greater understanding of who funds our political system and how they influence campaigns on the ground. Each of these organizations by itself has a strong potential impact; together, we have the ability to amplify the rich, revealing information that can move voters and sway debate toward better outcomes.

What We’re Doing

The Internet Archive is recording, indexing for search and presenting online Philadelphia TV Market Area television news (the market includes 22 counties in Pennsylvania and southern New Jersey); indexing for search all political ads therein; creating an interface for trained volunteers to identify and tag political advertising; joining indexed ads with sponsor information databases; making news and ads searchable, quotable and embeddable; and capturing and presenting, in a full-text searchable database, much of the region’s Web media ecosystem.

The Sunlight Foundation is training volunteer political ad sponsorship coders; creating adaptations of the Influence Explorer interface and database to include real-time Pennsylvania state campaign data; developing specialized optical character recognition algorithms for extracting Public Inspection File disclosures on sponsorship for TV political ad buys on its Political Ad Sleuth database; conducting outreach to journalists and others for their collaboration and use of resources for stories; integrating ad sponsor data into related Sunlight Foundation data tools and APIs; and working with the Internet Archive to sync up sponsorship data with the actual ads in the same interface.

The Committee of Seventy is organizing a team of volunteers; acting as liaison with Philadelphia-region civic organizations; conducting outreach to area press; and providing guidance on issues and political candidates to track.

The University of Delaware’s Center for Community Research & Service at the School of Public Policy & Administration will conduct an analysis of the broadcast news programs in the Philadelphia television market, aired September 1 through Election Day, November 4.  After Election Day, the University team will conduct content analysis to address the research questions above and publish findings next year.

The Linguistic Data Consortium at the University of Pennsylvania is providing technical support and advice regarding the Internet Archive’s broadcast monitoring in the Philadelphia area.

Project Resources

View all identified political TV ads
Watch video tour guide to using Philly-region TV news search
Search just Philadelphia content from the TV News Archive
Philadelphia stations’ political ad sponsor reports to FCC
Archived Philadelphia web media ecosystem sites (key word searchable)

Project Advisors

Kathleen Hall Jamieson, the Elizabeth Ware Packard Professor of Communication at the Annenberg School for Communication; and Walter and Leonore Annenberg Director of the Annenberg Public Policy Center at the University of Pennsylvania.

Travis N. Ridout, the Thomas S. Foley Distinguished Professor of Government and Public Policy and Associate Professor in the School of Politics, Philosophy and Public Affairs at Washington State University; and co-director of the Wesleyan Media Project.

David Westin, former president of ABC News, Founding CEO of NewsRight, a digital start-up spun off from the AP; and now Principal of Witherbee Holdings, LLC

Supported in part by grants and other contributions from:

David Glassco
Democracy Fund
Rita Allen Foundation
Hawthorn Family Fund
Buck Foundation (NYC)
Kahle/Austin Foundation
John S. and James L. Knight Foundation
Philadelphia Foundation, from an anonymous contributor to their donor-advised funds

Project Collaborator Contacts

Internet Archive – Roger Macdonald
Sunlight Foundation – Kathy Kiely
University of Delaware – Danilo Yanich
Committee of Seventy – Ellen Kaplan
Linguistic Data Consortium – Denise DiPersio





Posted in Announcements, News | 6 Comments

Invitation to the Internet Archive Annual Event


Posted in Announcements, News | 1 Comment

Please Help Protect Net Neutrality

Please stand with the Internet Archive to protect Net Neutrality by writing to your congressperson.  Today, many organizations are putting “Internet Loading” symbols on their sites to raise awareness of what is at stake: without Net Neutrality, we would be at the mercy of cable and phone companies that could selectively slow down our sites for profit, or simply because they do not like our policies.

China started blocking the Internet Archive again a couple of months ago, we believe because they do not like our open access policies.  This has shown us how much power lies in the hands of Internet service providers.  Let’s keep our access to Internet sites “Neutral” and not at the discretion of companies and governments.

Please write to your congressperson.

Posted in Announcements, News | 2 Comments