Author Archives: jeff kaplan

Celebrate at the Internet Archive — 1024 — Thursday Oct. 24th

Internet Archive invites you to a fun evening in San Francisco on October 24th for our once-a-year celebration and announcements of new services. (And it just so happens to fall on 1024, which our fellow geeks will recognize as 2^10.)  We will drink and be merry with our friends, then gather together to tell you about the new steps we’re taking to guarantee permanent, free access to the world’s knowledge.  

No More 404s

October 24, 2013
Free Admission, Donations Welcome
6pm – 7pm : Cocktails and Reception
7pm – 8pm : Announcements
300 Funston Ave., San Francisco CA 94118
415-561-6767


Please RSVP – we don’t want to run out of wine!

Some of the things we’ll share include:

  • No more broken links. Help wipe out dead links on the Internet with new tools and APIs to replace dead links with archived versions.  Down with 404s!
  • Quotable Television News.  A new interface for the TV News Research Service will help journalists, bloggers and your news-addicted relatives search, quote short clips and borrow from a massive, searchable archive of U.S. television news programs.
  • Reader Privacy for All.  We are helping to protect the reading habits of our users from prying eyes by increasing encryption and keeping less user data.
  • Bringing Old Software Back to Life.  First steps to bring the software for Apple IIs, Commodore 64s, etc. back from cassette and to the web.
  • Petabytes, Gigabits, and More.  Come see for yourself!

Over 7,000 Free Audio Books: Librivox and its New Look!

In 2005, Hugh McGuire asked:
“Can the net harness a bunch of volunteers to help bring books in the public domain to life through podcasting?”

The answer is yes. Thanks to the help of many, LibriVox, the nonprofit organization he leads, has made tremendous progress in producing and distributing free audiobooks of public domain work.

The LibriVox site has recently undergone a major facelift, making it far easier to browse and find great public domain audiobooks. In addition, the underlying software that helps thousands of volunteers contribute to LibriVox has been completely rebuilt. This rebuild project was funded by the Andrew W. Mellon Foundation and donations from the public. LibriVox continues to use the Internet Archive to host all its audio and web infrastructure.

Thanks to:

The thousands of volunteer readers who bring over 100 new books a month, originally from Project Gutenberg and other public domain sources (including, of course, the Internet Archive), to the listening public.

With over 7,000 audio books, LibriVox is one of the largest publishers of audiobooks in the world, and certainly the largest publisher of free public domain audiobooks.

The Millions of Listeners who download over three million LibriVox audiobooks every month.

The Andrew W. Mellon Foundation, and Don Waters at their Scholarly Communications and Information Technology programme, for providing funding for the revamp of the LibriVox website, and underlying technology that runs the project.

Free Hosting by the Internet Archive.

Pro bono Legal services from Diana Szego of Orrick, Herrington & Sutcliffe.

And the relentless good cheer of Hugh McGuire, who over the last eight years has created this fabulous service and continued to make contributions to open (e)book publishing with PressBooks.com (@hughmcguire).

Please donate!
This project needs ongoing support for servers and software upgrades.

Scheduled outage tonight, Sept 3

Our backup data center in Richmond, CA will experience an Internet outage some time between 10pm and 6am PST tonight. We should be able to keep the site up during this outage (though the site will be read only), but please be aware that there might be unforeseen issues that could affect accessibility. You can check our twitter feed @internetarchive for updates in case of problems.

Job Posting: Web Application/Software Developer for Archive-It

The Internet Archive is looking for a smart, collaborative and resourceful engineer to lead and carry out the development of the next generation of the Archive-It service, a web-based application used by libraries and archives around the world. The Internet Archive is a digital public library founded in 1996. Archive-It is a self-sustaining, revenue-generating subscription service first launched in 2006.

Primary responsibilities would be to extend the success of Archive-It, which librarians and archivists use to create collections of digital content and then make them accessible to researchers, scholars and the general public. Widely considered to be the market leader since its inception, Archive-It’s partner base has archived over five billion web pages and over 260 terabytes of data. http://archive-it.org

Working for the Archive-It program’s director, this position has technical responsibility for evolving the service while keeping it straightforward enough to be operated by 300+ partner organizations and their users with minimal technical skills. Our current system is primarily Java based, and we are looking to build the next generation of Archive-It using the latest web technologies. The ideal candidate will possess a desire to work collaboratively with a small internal team and a large, vocal and active user community, demonstrating independence, creativity, initiative and technological savvy, in addition to being a great programmer/architect.

The ideal candidate will have:


  • 5+ years work experience in Java and Python web application development
  • Experience with Hadoop, specifically HBase and Pig
  • Experience developing web application database back-end (SQL or NoSQL).
  • Good understanding of latest web framework technologies, both JVM and non-JVM based, and trade-offs between them.
  • Strong familiarity with all aspects of web technology and protocols, including HTTP, HTML, and JavaScript
  • Experience with a variety of web applications, machine clusters, distributed systems, and high-volume data services.
  • Flexibility and a sense of humor
  • BS Computer Science, or equivalent work experience

Bonus points for:

  • Experience with web crawlers and/or applications designed to display [archived] web content (especially server-side apps)
  • Open source practices experience
  • Experience and/or interest in user interface design and information architecture
  • Familiarity with Apache Solr or similar facet-based search technologies
  • Experience with the building/architecture of social media sites
  • Experience building out a mobile platform

To apply:

Please send your resume and cover letter to kristine at archive dot org with the subject line “Web App Developer Archive-It”.

The Archive thanks all applicants for their interest, but advises that only those selected for an interview will be contacted. No phone calls please!

We are an equal opportunity employer.

Open Call for tumblr Collaborators

When it comes to collaborative culture, tumblr is where it’s at – and we’re ready to jump in. We’re not going to just redirect this blog, though; we’re opening up our tumblr URL to anyone interested in messing around with our content.

We’re looking at this as an opportunity to show the world some of the amazing stuff we’ve collected – over 10 petabytes of information just waiting to be juxtaposed, made into macros, remixed, glitched, written on, moshed, analyzed, sequenced and combined in ways we haven’t dreamed of.

We will be accepting 52 people. We’ll be here to offer support and guide them in their exploration with content and code, then we’ll feature their finished work for a week on the official tumblr. Each person’s residency will also be archived, of course. That’s what we do!

Check out http://internetarchive.tumblr.com for more details and an application form.

Memorial for Aaron Swartz in SF at Internet Archive Thurs 7pm

 

Dear Friends,

Please join us as we gather to remember Aaron Swartz on the evening of Thursday, January 24th.

Reception at 7:00pm
Memorial at 8:00pm
at the Internet Archive
300 Funston Avenue
San Francisco 94118

Speakers will include Danny O’Brien, Lisa Rein, Peter Eckersley, Molly Shaffer Van Houweling, Cindy Cohn, Brewster Kahle, Tim O’Reilly, Elliot Peters, Alex Stamos, and Carl Malamud; there will be an opportunity for brief remembrances.

Please consider RSVPing so that we know how many people to expect. If you are unable to join us, you can watch a live stream of the event.

From Aaron’s friends at: Creative Commons, Electronic Frontier Foundation, Noisebridge, Internet Archive, Wikimedia Foundation, Stanford Center for Internet and Society, O’Reilly and Blurryedge.

Internet Archive & EFF successfully block Washington State law

Earlier this year the Internet Archive, with EFF’s help, joined a suit to challenge the enforcement of a new Washington state law, SB 6251. While the law was intended to curb advertising for underage sex workers, its language was overly broad and made online service providers and libraries criminally liable for providing access to third parties’ offensive materials, which conflicts with federal law.

We have learned today that the challenge was successful: the law is permanently blocked. You can read more about the case on EFF’s site.

Associated Press article on this.

Launch of the DigiBaeck Project


The Internet Archive, working with the Leo Baeck Institute, is pleased to be a part of the Oct 16, 2012 launch of their DigiBaeck project, a massive (formerly print) archival collection of history pertaining to German-speaking Jewry.

Robert Miller, Global Director of Books for the Internet Archive states that “digitizing over 4,000 linear feet of material whose scope ran the gamut of post cards from Berlin to letters from Auschwitz was both empowering and humbling at the same time.” He continues, “One of my staff, who worked on the collection, family was from Poland and suffered terribly during the Holocaust. Being able to assist in putting these original documents online was cathartic for her.”

The Leo Baeck Institute helped teach Miller’s teams in Princeton, NJ and San Francisco, CA how to work with and handle unique and high-value archival materials. In turn, he and his staff helped teach Leo Baeck how to move from print to on-line pixels. It was a true partnership in every sense of the word.

Brewster Kahle, founder of the Internet Archive, states, “it is collections going public like Leo Baeck’s that remind us of the adage that collections that remain private or not digital are for all intents and purposes extinct. I applaud Leo Baeck for the direction they have taken.”

Links to the Internet Archive’s copy of the Leo Baeck material may be found at archive.org/details/LeoBaeckInstitute, and details about the Leo Baeck collection may be found on their site at www.lbi.org/digibaeck.

The New York Times piece may be found at http://artsbeat.blogs.nytimes.com/2012/10/09/archive-of-jewish-life-in-central-europe-going-online/.

Our Ten Petabyte Party: Live Streamed or In Person! Thurs Oct 25th 6-7:30PT

Please join us for a free reception and short presentations, Thursday, October 25th from 6 to 7:30pm, in person, or live streamed at http://toc.oreilly.com/:

  • Television News Broadcasts are now Searchable (350,000 of them!)
  • All of Balinese Literature now online and more books in the Lending Library
  • Digital Archive of Japan 2011 disaster
  • Hundreds of newly digitized Home movies and other ephemeral films

and, drum roll,

  • Ten Petabytes (10,000,000,000,000,000 bytes) of cultural material saved!

** this just in**  Don Knuth will be playing the organ as we start the event!

This will be a fun party that celebrates the community that is building and supporting this astonishing library.

Let’s bring millions of books, music, movies, software and web pages online to over 2 million people every day and celebrate the 10,000,000,000,000,000th byte being added to the Archive.

Invite anyone and everyone.

Thursday, October 25th
Cocktail Reception at 6PM
Presentations 6:30-7:15PM

Location: Internet Archive
300 Funston Ave, San Francisco, CA 94118
415.561.6767

Please RSVP to June at RSVP@archive.org
