Author Archives: Jeff Kaplan

Help Free PACER: Cast Your Vote for Free Court Records at the Internet Archive This Friday!

Internet activist and Public.Resource.Org founder Carl Malamud is launching a national campaign to free millions of court documents in PACER (Public Access to Court Electronic Records), the technologically backward federal electronic system that charges Americans 10 cents per page to access court files in the public domain. This Friday, you can come by the Internet Archive “polling place” at 300 Funston Avenue, San Francisco, from 8 a.m. to 5 p.m. to “cast your vote” for free court records. Carl will be on hand with inspiring postcards addressed to Chief Judge Thomas of the Ninth Circuit Court of Appeals. By sending His Honor hundreds of handwritten postcards asking him to grant a PACER fee exemption, we can save taxpayers millions of dollars while freeing court documents crucial to understanding and interpreting the law.

This is just one prong of a multifaceted campaign to free PACER. Carl outlines Friday’s strategy in a memorandum of law called “Yo, Your Honor.” His request of us:

May 1 is Law Day, and I’m asking people to come in and write a brief postcard about why you think that access to PACER is important. More specifically, you’ll be writing a postcard to Chief Judge Thomas of the Ninth Circuit of the U.S. Court of Appeals in support of my request that the Court grant us free access to PACER for several courts in the Ninth Circuit. It would be a really big deal if the Court said yes, we’re trying to show public support in a way the judges can relate to.

You can also send your postcard directly if you can’t make it to the Internet Archive on Friday:

Clerk of the Court
Attn: Docket 15-80056
United States Court of Appeals
James Browning Courthouse
95 7th Street
San Francisco, CA 94103


In 2008, Aaron Swartz downloaded millions of PACER documents and worked with Malamud to make them freely accessible on the Internet Archive through the RECAP Project. This is one more step toward providing everyone with free access to all knowledge: the great promise of the Internet and our mission at the Internet Archive.



Internet Archive Supports Critical Updates to Electronic Privacy Law in California

The California Electronic Communications Privacy Act (CalECPA), a newly introduced bill in California, would help bring state law up to date and require law enforcement to get a warrant before searching private online accounts or personal electronic devices. The Internet Archive is pleased to join a long and diverse list of organizations and companies supporting CalECPA. To learn more, see write-ups by State Senator Mark Leno’s office, the ACLU of California, and the Electronic Frontier Foundation.

University of California Libraries to partner with Archive-It

This week, the University of California’s California Digital Library (CDL) and the UC Libraries announced a partnership with the Internet Archive’s Archive-It service.

In the coming year, CDL’s Web Archiving Service (WAS) collections and all core infrastructure activities (crawling, indexing, search, display, and storage) will be transferred to Archive-It. WAS partners have captured close to 80 terabytes of archived content, most of which will be added to the 450 terabytes Archive-It partners have collected.

We are excited to work with CDL as we transition the UC (and other) libraries to the Archive-It service. These UC libraries have unique and compelling collections (some dating back to 2006), including their Grateful Dead Web Archive (http://webarchives.cdlib.orggdarchive/a/gratefuldead), which of course fits in nicely with the Internet Archive’s large collection of downloadable and streamed Grateful Dead shows in our Live Music Archive.

By collaborating with CDL, Archive-It can continue to expand the core functionality of web archiving and work with CDL and other colleagues to develop new tools that advance the use of web archives. Such collaboration is sorely needed at this juncture, and we welcome the opportunity. By working together as a community, we can create useful and sustainable web archives and ensure growth in the field of web archiving.

Be sure to check out some of the CDL collections:

Archiving the LGBT Web: Eastern Europe and Eurasia – UCB:
Federal Regional Agencies in California Web Archive – UC Davis:
Salvadoran Presidential Election March 2009 – Web Archive – UC Irvine:
2009 H1N1 Influenza A (Swine Flu) Outbreak – UC San Diego:
California Tobacco Control Web Archive – UCSF

Partnership Promotes Jobs and Builds Free Global Library

As part of its Building Libraries Together initiative, the Internet Archive is testing a new socially responsible jobs model with the Bay Area Rescue Mission (BARM) of Richmond, California.

The Internet Archive has been digitizing books for nearly 10 years but needed help reaching a goal of 10 million eBooks. “We had so much high-value content that needed to be digitized, but not enough staff to do the work,” explains Robert Miller, Director of Digital Books and Media. “We wondered how we could make our problem someone else’s solution.” BARM offers a ‘Healthy Living’ addiction recovery program, in which over 350 men and women work in a residential setting designed to move them toward self-sufficiency and independent living. The challenge for the staff at BARM is that most of their graduating clients lack the job skills and professional résumé needed to secure a job. The Internet Archive can offer both job skills and a work history. A conversation between Miller and Tim Hammack, Vice President of the Bay Area Rescue Mission, ensued, and the Work Transition Program was born.

Candidates for the Internet Archive Work Transition Program are men and women from BARM who have completed a 12-month sober living, drug counseling, or domestic abuse crisis program and are ready to re-enter the job market. This group often lacks relevant job skills, recent work experience, interpersonal and work relationship skills, self-confidence, and a résumé that a national or local employer would find compelling enough to grant an interview. The curriculum for the Work Transition Program lasts 9 months and focuses on ‘Learning-to-Work’. This three-phase program is based on lessons learned from the 600+ staff the Archive has hired over the past 8 years. From these lessons, a program of progressive responsibility, constant feedback, and merit badging was built to meet this challenge. Miller notes that this is not a make-work program: the work is substantive and must be completed to get content online to share with the global community. “The Internet Archive Texts collections have over 20 million downloads each month, and the material digitized by the team maintains our high standard of quality,” he says.


To ‘grease the skids’ for Work Transition Program graduates, Hammack and Miller contacted local companies, explaining that the program was not a handout and that they weren’t looking for charity. They simply asked employers to commit to granting graduates an interview. After reviewing the program’s goals and expectations, local businesses including UPS, the San Francisco Public Library, Costco, and others signed on. The first class graduates in February 2015, but two of the candidates have already secured part-time employment.

Hammack is thrilled with the program, adding, “We take people on the worst day of their lives and help them achieve dignity, learn healthy living habits, while getting clean and sober. The Work Transition Program continues this path to recovery by helping them earn a job; a huge accomplishment!”


Special thanks to the teams at the Internet Archive, including Jesse Bell, Digitization Coordinator, and Antoine McGrath, Work Transition Supervisor, and at the Bay Area Rescue Mission, headed by Tim Hammack, Vice President of Operations. For more information about the program, contact Robert Miller.

Lost Landscapes of San Francisco: Fundraiser Benefiting the Internet Archive, Friday, December 19, 2014

Rick Prelinger’s Lost Landscapes of San Francisco is back for one final performance this year! Now you can catch this perennially sold-out show, and your ticket donation will benefit the Internet Archive, the nonprofit digital library that hosts the Prelinger Collection. Please give generously to support the effort.

Friday, December 19, 2014
6 pm Reception
7:30 pm Film

300 Funston Ave.
San Francisco, CA 94118

Get tickets here!

This year’s LOST LANDSCAPES brings together familiar and unseen archival film clips showing San Francisco as it was and is no more. Blanketing the 20th-century city from the Bay to Ocean Beach and the Presidio to Bayview, this screening includes San Franciscans at work and play; early hippies in the Haight; a highly privileged walk on the unfinished Golden Gate Bridge; newly discovered images of Playland and the waterfront; families living and playing in their neighborhoods; detail-rich streetscapes of the late 1960s; peace rallies in Golden Gate Park; 1930s color images of a busy Market Street; a selected reprise of greatest hits from years 1–8; and much, much more.

As usual, the viewers make the soundtrack: audience members are asked to identify places and events, ask questions, share their thoughts, and create an unruly interactive symphony of speculation about the city we’ve lost and the city we’d like to live in.

The film begins at 7:30 pm and is preceded by an informal reception that begins at 6:00 pm.

Archive of Contemporary Music and the Internet Archive Team Up to Create a Music Library

When the personal record collection of music producer Bob George hit 47,000 discs, he knew something had to be done. “I wanted to give them away, but they were mostly punk, reggae and hip-hop,” he recalled, “and no established library or archive was interested.” The only thing to do, it seemed, was to turn his collection into a non-profit archive in New York called the ARChive of Contemporary Music. Twenty-nine years later, the ARC is one of the largest popular music collections in the world, with some three million sound recordings, 19,000 music-related books, and millions of photos, press kits and artifacts. Now this rich musical resource, used primarily by musicologists and the entertainment industry, is teaming up with one of the largest digital libraries in the world, the San Francisco-based Internet Archive, to create a music library that will preserve a wide range of music and the rich materials that surround it, and give researchers access to them.

Powered by teams of volunteers, the two archives are partnering to digitize CDs and LPs and then use audio fingerprinting to match tracks with metadata from catalogs and other services. Using Internet Archive scanners, the ARC is digitizing its books and photographs at its New York facility. When complete, this music library will be a rich resource for historians, musicologists and the general public.

Listening Room


Starting today, the public can listen to millions of tracks for free, including many that are not available on Spotify or iTunes, at the Internet Archive’s new listening room in San Francisco. “The Internet Archive has allowed us to move forward at unprecedented speed, originally with book scanning and now with the digitization of a wide range of audio formats,” said Bob George. “The physical records from around the world that the ARC has archived are a unique treasure,” said Brewster Kahle, founder and digital librarian of the Internet Archive. “Soon these records will be studied in new ways because they will be digital as well.”

Since 1985, George, the ARC’s co-founder and director, has run the organization in Tribeca, New York City, supported by friends in the music industry including Paul Simon, David Bowie and Nile Rodgers. Rolling Stones guitarist Keith Richards endows a collection of blues and R&B recordings there. Filmmakers Martin Scorsese and Jonathan Demme stop by when trying to track down hard-to-find songs. Yet for most of its almost three decades, the ARC has been a decidedly “analog” experience: records, CDs and cassette tapes line its walls, and to experience a song you usually have to drop a needle into a pristine vinyl groove. The collaboration with the web-based Internet Archive represents a new direction. “We feel that our primary mission, to collect and preserve this material, is near completion,” said Bob George. “Now we are seeking ways to allow greater access to this incredible collection.”

Scanning an LP cover


The Internet Archive may be best known for the 435 billion web pages in its Wayback Machine, but this digital library has always been a place where live music collectors go to preserve concerts on the web. Its audio collections include some 130,000 live concerts by artists such as the Grateful Dead, Jack Johnson and Smashing Pumpkins, many with more than a million plays. Recently, the ARC shipped 46,000 seventy-eight rpm recordings to the San Francisco-based non-profit, and has donated tens of thousands of long-playing records. Music labels Music Omnia and Other Minds are making their entire collections searchable on the Internet Archive, in part because it is one of the few online platforms that preserves audio, texts, musical manuscripts, photos and films and makes them accessible forever, for free.

The Internet Archive listening room at 300 Funston Avenue, San Francisco, CA, is now open to the public for free on Fridays from 1–4 pm (holidays excepted) and by appointment. Those interested in donating physical music collections to the ARC or Internet Archive should contact or


Archive-It: Crawling the Web Together

A post by the Archive-It team

Today, Phase 1 of the Archive-It 5.0 web application was released to the 326 partners using the Archive-It service.

In 1996 when the Internet Archive was founded, we used automated crawlers to capture the web, snapping up millions of web pages and preserving them for history. Ironically, our digital record of humankind was being driven by computer algorithms.

As the years went by, it became clear that we needed people and communities to capture and save what is truly important. So in February 2006 we launched Archive-It 1.0, which allowed traditional librarians and archivists to become web archivists by initiating focused, curated crawls of the live web using a simple web application backed by partner and technical support. Launching Archive-It meant we could help our colleagues create web collections for their own libraries and also foster a community of web archivists working together to build a global digital public library at

Now, as we expand to the next generation of Archive-It with our 5.0 release, we hope to provide even better tools for collection development. Released this week, 5.0 Phase 1 features a shiny new user interface and significantly enhanced post-crawl reports that include infographics visualizing the data.


Figure 1: Screenshot from the Reports section of the new Archive-It 5.0 user interface

Back in 2006 there was little understanding of web archiving, and many organizations were questioning whether it was a valid activity that could or should be part of their larger institutional collecting strategies. After all, the challenges were staggering: the quality of web content was all over the map; conflicting policies and organizational structures got in the way; and no one had yet established best practices for selecting content, handling metadata, or integrating this new type of content into the institution’s other holdings and existing catalogs. Also, back then we could not have predicted the extent to which material that once existed in physical form would come to appear only on the web, in digital form.

We launched the Archive-It service with a small band of believers and supporters, among them librarians and archivists from Indiana University, University of Texas at Austin, Library of Virginia, Montana State Library, and North Carolina State Archives and State Library. Partners were very patient with us and with Archive-It 1.0, which was bare bones. Collaborating with the library and archive community has always been a top priority for the Internet Archive, and a defining characteristic of the Archive-It service. There have been many times during the past 8+ years when we have not known the answer to a question and have said: “Let’s ask the community and see what they think!” And the community has always gotten back to us with supportive answers, both illustrative and specific.

Figure 2: Screenshot from the North Carolina State Government Web Site Archive of the North Carolina State Archives and State Library of North Carolina.

As time went on, the community of web archivists grew and we were able to produce some compelling answers to the question: why web archive? Here are just a few:

  • To create a thematic or topical web archive
  • To fulfill a mandate to preserve institutional memory and history
  • To archive state or local agency publications no longer being deposited in print form
  • To archive records to meet university or government retention policies
  • To preserve a historical record of an institution’s web and/or social media presence
  • To capture a website before it is redesigned or taken offline
  • To archive online art, exhibitions, and artists’ materials


Figure 3: Screenshot from the Latin American Government Documents Archive (LAGDA) of the University of Texas at Austin.

Figure 4: Screenshot from the Catalogues Raisonnés collection of the New York Art Resources Consortium (NYARC).

As of 2014, 326 Archive-It partners have created 2,700 public collections on a wide range of topics, subjects, events and domains. These collections have become integral to these organizations’ collecting strategies and have helped raise awareness and understanding of why web archiving is so important.

We like to say that the Archive-It service is both a partner and a vendor. As a service provider, we strive to consistently deliver a high level of customer support, which we believe partners notice and appreciate. We also strive to be a partner to our community and to work collaboratively on shared initiatives, among them: a) collaborative efforts around archiving spontaneous events (like the 2011 Japanese Earthquake collection), b) teaching web archiving in graduate-level MLIS programs and professional development workshops, and c) the K12 Web Archiving program (now in its 7th year), where we work with 3rd- through 12th-graders around the country and ask them what they would like to archive for future generations. As one of the student archivists put it, “500 years from now, kids will think we were really cool.”

Many of the features and functions in the Archive-It service today are a direct result of a partner making a suggestion or request. Through face-to-face brainstorming sessions, online surveys, webinars, and support tickets, partners have shared their ideas as well as constructive criticism. And we have listened. We hope that as the service continues to grow with Archive-It 5.0, many of our partners will see themselves in it. Their collections will continue to be valuable to researchers, historians, scholars and the general public for many years to come.

Here are some links to just a few of those collections on the Archive-It website:

Columbia University’s collection on Human Rights:

National Museum of Women in the Arts’s collection on Contemporary Women Artists on the Web:

University of Alberta’s Circumpolar Collection:

Brigham Young University’s Mormon Missionary Collection:

Stanford University’s collection on Freedom of Information (FOIA):

As we continue down this road – excited for the future and what comes next – we know that it takes a community to archive the web and we look forward to working with our partners to build libraries together.

Millions of historic images posted to Flickr

by Robert Miller, Global Director of Books, Internet Archive


“Reading a book from the inside out!” Well, not quite, but a new way to read our eBooks has just been launched. Check out this great BBC article:

Here is the fabulous Flickr Commons collection:

And here is the Flickr Commons welcome post:

What is it and how did it get done?
A Yahoo research fellow at Georgetown University, Kalev Leetaru, extracted over 14 million images from 2 million Internet Archive public domain eBooks that span over 500 years of content. Because we have OCR’d the books, we have been able to attach the roughly 500 words of text that appear before and after each image. This means you can now see, click on and read about each image in the collection. Think full-text search of images!

How many images are there?
As of today, 2.6 million of the 14 million images have been uploaded to Flickr Commons. Soon we will be able to add continuously to this collection from the 1,000+ new eBooks we scan each day. Dr. Simon Chaplin, Head of the Wellcome Library, says, “This way of discovering and reading a book will help transform our medical heritage collection as it goes up online. This is a big step forward and will bring digitized book collections to new audiences.”

What is fun to do with this collection?
Try typing in the word “telephone” and enjoy the images that appear. Curious about how death has been depicted over 500 years of images? Type in “mordis”. Feeling good about health care? Type in “medicine” and prepare to be amazed. Remember, all of these images are in the public domain!

Future plans?
We will be working with our wonderful friends at Flickr and our great library partners to make this collection even more interesting: more images, more sub-collections, and some very interesting ideas about how to use image recognition tools to help us learn more about, well, anything!

Questions about this collection, projects or things to come?
Email me at