Author Archives: Brewster Kahle

Building Libraries Together: New Tools for a New Direction


(NYtimes on this announcement, video of talks)

Let’s work together to save all human knowledge.  Today the Internet Archive is announcing a new beta site and new tools to encourage everyone to lend a hand.

Prototype Table Top Scribe for scanning books

We were founded in 1996 as an archive OF the Internet; we saved web pages and made them available through the Wayback Machine starting in 2001. In 2002 we became an archive ON the internet when we began digitizing and hosting movies, books, TV, music and software by working closely with libraries and online communities. Much of the work of building the current archive has been done by us and a relatively small number of selected partners.

Today marks a change in direction.

Listening Room

We are creating new tools to help every media-based community build their own collections on a long term platform that is available to the entire world for free. Collectors will be able to upload media, reference media from other collections, use tools to coordinate the activities of their community, and create a distinct Internet presence while also offering users the chance to explore diverse collections of other content.

In this future, communities and libraries will take the central role in building collections, leveraging the tools and storage of the Internet Archive.

Political campaign pilot interface

This effort is still in early development, and the Internet Archive is looking for feedback and help with this new direction.  Shaping these tools will be a joint process with our library and community partners.

Introducing new tools today, with further developments to come:

    • Table-top book scanner that works with back-end Archive technology and staff to create beautiful online books
Beta preview of archive.org

The Internet Archive needs your help to create and use these tools.   Your donations of time, money, digital and physical materials can help us Build Libraries Together.

Building Music Libraries

The Internet Archive is working with partners to preserve our musical heritage. The music collections started 8 years ago with the etree.org live music recordings and grew when we started hosting netlabels.

Scanning an LP cover

Now through new efforts and partnerships we have begun to expand and explore the music collections further.  We are working with researchers, record labels, collectors, internet communities and other archives to gather music media, build tools for preservation and expand metadata for exploration.

We have already made tremendous progress. We have archived millions of tracks; we are working with the Archive of Contemporary Music to digitize portions of their extensive collections of physical media; the MusicBrainz.org community has provided meticulous metadata; and researchers from university programs have begun to analyze the music.

Listening Room

A prototype “listening room” in the Internet Archive’s building in San Francisco is available free to the public to listen to the full musical holdings.  Access to these collections will also be provided to select computer science researchers via a secure “virtual reading room” in our data center.  As tools and the collections grow, we will offer everyone access to the metadata to help them explore, and then offer links to commercial sites for listening or purchasing.

We invite interested people to participate:

Archives. The Internet Archive and the Archive of Contemporary Music in New York have started digitizing ACM’s holdings with consistent, high quality, standards-based methods to build a scalable workflow.  We welcome other archives with similar projects, or who would like to help.  “Digitizing our large physical collections is an important step for our archive to allow others to learn from this deep legacy,” said Bob George, Director of the Archive of Contemporary Music, NYC.


Digitizing CDs at the Archive of Contemporary Music

Collectors.  Digitize, donate, or lend material for digitization.  Improve metadata or provide context to help others understand the depth and cultural relevance of these collections.  “Recycled Records is happy to have directed the donation of many thousands of LPs to the Internet Archive to help with their projects and for the love of music,” said Bruce Lyall, proprietor of Recycled Records.

Labels.  Preserving a complete collection of everything published by a label is best done by or with the record label.  We would like to work with labels to get their releases archived and properly cataloged.  “The upcoming Music Libraries program continues the very work that enables our label, and the musicians who record for us, to bring the music of earlier times to audiences today. We are proud to participate in a tradition of preservation that has brought joy to so many through music,” said David Fox, Co-founder of Musica Omnia.

Cataloging services.  Commercial and non-commercial cataloging services can participate by making sure there are proper links from and to these collections.  The musicbrainz.org open, community-created catalog has already been very helpful.


Commercial vendors and streaming services.  Links from these collections to commercial services can help users buy and listen to full tracks.  These services might have valuable metadata as well that can help users navigate.

Musicians and bands.  Please create more great works that libraries can preserve and provide access to.  We would like to hear your ideas about making the site useful for both musicians and the general public.

Researchers, historians, and music lovers.  Annotate, organize, datamine, and surface music in the collections, and help us preserve those works not yet in the collections.  “Access to a comprehensive archive of commercial music audio is the key missing link for research relating signal processing to listener behavior,” said Daniel Ellis, professor at Columbia University.  By analyzing the rhythms, keys, instruments, and genres, researchers will help create more complete metadata and aid discovery.

Looking to the future, we hope to expand these shared music collections by uniting the work done by other archives and collectors.  By bringing all of this music and its metadata into a shared library, we hope to bring the richness of our musical heritage to people all over the world.

Visit the Listening Room

Internet Archive
300 Funston Ave
San Francisco, CA 94118
Hours: Fridays from 1-4pm, or by appointment.

If you would like to participate in any way, please email us.

Please Help Protect Net Neutrality

Please stand with the Internet Archive to Protect Net Neutrality by writing to your congressperson.  Today, many organizations are putting “Internet Loading” symbols on their sites to raise awareness of what is at stake for those of us who would be at the mercy of cable and phone companies, which could selectively slow down our sites for profit or simply because they do not like our policies.

China started blocking the Internet Archive again a couple of months ago, we believe because they do not like our open access policies.  In this way, we have started to understand the power in the hands of Internet service providers.  Let’s keep our access to Internet sites “Neutral” and not at the discretion of companies and governments.

Please write to your congressperson.

Working to Stop Rewriting Copyright Laws via TPP Treaty

The Internet Archive joined Our Fair Deal along with EFF and Public Knowledge to stop the US from using the Trans-Pacific Partnership treaty to change our copyright laws.  The coalition sent two open letters to TPP negotiators today on critical issues that you can learn about here. Let’s foster open debate and proper process before further changes to copyright laws restrict public access even more.

Please consider joining this coalition.

Bitcoin and the Internet Archive Swag Store

San Francisco Weekly said we are the best Bitcoin Evangelists in their BestOf section.  Fun.

We now accept bitcoin at our Archive swag store.  We continue to offer bitcoins to our employees as salary and to eat sushi for bitcoin next door. We supported bitcoin as well as we could at our credit union, have a cool honor-based bitcoin ATM (please come and use it), accept bitcoin at movies, and graciously accept bitcoins as donations to keep our servers humming.  (We get a few bits every day, thank you!)

Go Bitcoin!

Sushi for Bitcoins

Heartbleed bug and the Archive

Bottom line: The Internet Archive is safe to use.

Internet Archive has always been interested in protecting the privacy of our patrons.  We try not to record IP addresses, and when Edward Snowden showed that traffic going over the open Internet was not safe from government spying we turned on encryption by default on our web services.   Unfortunately, some of the encryption software we use (along with more than half the sites on the internet) was vulnerable due to the “Heartbleed” bug; we have upgraded our software to fix this issue.

A bit more detail:  A common piece of code, OpenSSL, was revealed to have a security bug that allowed anyone on the Internet to probe a vulnerable server and read a set of information that happens to be in RAM in that remote process.   This could be used to read a site’s “private key” which would allow a bad actor that could intercept traffic to impersonate a website via what is called a “man in the middle” attack.   If a site’s past encrypted traffic had been recorded, then it might be possible to go back now with the private key and see what happened in those past web sessions.  If you would like a more thorough explanation of “Heartbleed” you can watch a video overview.

Some of the Internet Archive’s web services did use the vulnerable version of OpenSSL up until yesterday.    At this point the Internet Archive’s services have been upgraded and we will be renewing our private key in case that was compromised.   On some of our services we have used “perfect forward secrecy” so even if our private key had been taken, and someone had recorded past traffic, and if they cared enough to try to then discover what had been read, they would still not be able to get it.   We will be implementing this on all services in the future.   Qualys SSL Labs has a useful report on our site.
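For readers who want to see what “perfect forward secrecy” looks like from the client side, here is a minimal Python sketch. It is not how the Internet Archive checks its own servers; the function names `is_forward_secret` and `negotiated_cipher` are made up for illustration. It assumes cipher suite names in the OpenSSL style that Python's standard `ssl` module reports, and it uses a simple name-based heuristic: TLS 1.3 suites always use an ephemeral key exchange, while earlier versions do so only when the suite name contains ECDHE or DHE.

```python
import socket
import ssl


def is_forward_secret(cipher_name: str, protocol: str) -> bool:
    """Heuristic check for an ephemeral (forward-secret) key exchange.

    Assumes OpenSSL-style suite names such as "ECDHE-RSA-AES128-GCM-SHA256".
    Very old aliases (for example names starting with "EDH-") are not handled.
    """
    if protocol == "TLSv1.3":
        # Every TLS 1.3 cipher suite uses an ephemeral key exchange.
        return True
    # "DHE" also matches "ECDHE"; static RSA suites such as "AES128-SHA" do not match.
    return "DHE" in cipher_name.upper()


def negotiated_cipher(host: str, port: int = 443):
    """Connect to host and return (cipher_name, protocol) for the session."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            name, _version, _bits = tls.cipher()
            return name, tls.version()
```

Calling `is_forward_secret(*negotiated_cipher("archive.org"))` would make a live connection and report on whatever suite the server and your client agree on, so the answer can differ between clients.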

Never a dull day!

Software Wanted: Political TV Commercial Detection and Naming

Volunteers needed:   We have a fabulous TV collection, and the US is going into an election period.    We would like to pull out the TV Commercials, including the political ads, and match them with the other occurrences, and then put names on them.    Then we and others can datamine and surface this information.

We hope to find all ads so we can know when and where they ran. We would like to not just limit this to political ads because sometimes the ads are the best parts of shows, and many ads are stealthy-political.

To help in this process, we have closed caption transcripts of what is said on US TV as well as full resolution TV recordings.  We also often have a rebroadcast of the same program, which would likely have different commercials.  We do have to be careful with this data, so we would like to run this locally in our virtual machine “virtual reading room”.

We tried the open source commercial detector included in MythTV, but it seemed to leave all the commercials in a commercial break in a block.  Also it was not that reliable.   It needs more work.
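To make the matching idea concrete, here is a small Python sketch of one possible starting point. It is only an illustration, not an existing Archive tool. It assumes each broadcast's closed captions are available as one plain string keyed by a broadcast id; the ids, sample text, and function names are invented. It looks for short word sequences that show up in more than one broadcast, since an ad that airs repeatedly should leave the same caption text in each place.

```python
import re
from collections import defaultdict


def shingles(text: str, k: int = 5):
    """Yield every run of k consecutive lowercased words in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    for i in range(len(words) - k + 1):
        yield tuple(words[i:i + k])


def repeated_phrases(transcripts: dict[str, str], k: int = 5, min_broadcasts: int = 2):
    """Map each k-word phrase to the broadcasts containing it, keeping only
    phrases found in at least min_broadcasts different broadcasts."""
    seen = defaultdict(set)
    for broadcast_id, text in transcripts.items():
        for phrase in shingles(text, k):
            seen[phrase].add(broadcast_id)
    return {p: ids for p, ids in seen.items() if len(ids) >= min_broadcasts}


# Invented sample captions: the same ad text appears inside two different shows.
transcripts = {
    "news_6pm": "tonight the council voted on the budget vote smith for a stronger city back to the weather",
    "game_show": "welcome back contestants vote smith for a stronger city now spin the wheel",
    "late_movie": "the detective opened the door slowly and looked inside",
}
hits = repeated_phrases(transcripts, k=5)
for phrase, ids in sorted(hits.items()):
    print(" ".join(phrase), sorted(ids))
```

Real captions are noisy, and promos or rebroadcast program segments repeat too, so matches like these would only be candidates for a person or a second pass (for example on the audio or video) to confirm and name.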

This is not an easy project, and we do not have a budget (yet) to pay for it, unfortunately, so the reward may be fame and helping the open world.  If you can help in this project, we would appreciate it.

Please leave a comment on this post or send a note to Roger Macdonald, the leader of the TV News project.

Thank you.

Public Access to the Public Domain: Copyright Week

It’s Copyright Week, and many organizations are highlighting the need to make works in the public domain readily accessible. One of the many challenges we face sounds almost paradoxical: works in the public domain are often not publicly available. The Internet Archive hosts several projects to address that concern.

RECAP:  Created by Aaron Swartz and automated by a group at Princeton University, RECAP brings free access to some two million court documents from a million cases.

Google Books: Aaron Swartz collected 900,000 public domain books on Google’s site; we’re currently adding more.

FOIA and Government Documents: The Internet Archive hosts over 160,000 documents from DocumentCloud, including Freedom of Information Act and other government documents.

Digitization of Public Domain Books: The Internet Archive works with over 500 libraries to digitize public domain books to offer them to the world for free with no restrictions at all. We’re grateful to the libraries that are funding this amazing resource.

Fedflix: This joint venture between the National Technical Information Service and Public.Resource.Org provides free access to 8,700 U.S. government training and historical films such as the film below, Blast Measurement Group in Operation Sandstone.

Servers for the New Year: Thank you!

 


Year-end donations went past our goal of $1 million (almost $1.3 million!). Thank you all for donating.  With this money we can buy the ten racks (10 petabytes, or 10,000,000,000,000,000 bytes) of server space to store the upcoming books, music, video, and web pages we expect for this year.  (Since we serve from a duplicate as well, we have space for about 5PB of data.)  We were greatly helped by a generous 3-to-1 match for the contributions made.

A few stats:  We received thousands of individual donations. The vast majority were under $100, and we received 20 that were $1000 or more.  We received 16 bitcoins, which translates to $48k including the match.
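For anyone who wants to check the numbers above, here is a short Python sketch of the arithmetic. Two things are assumptions rather than statements from the post: a petabyte is taken as the decimal 10^15 bytes (which matches the byte count quoted), and the 3-to-1 match is read as three matching dollars added to every donated dollar. The function names are made up for illustration.

```python
PETABYTE = 10**15  # decimal petabyte (assumption; matches the byte count quoted)


def usable_bytes(raw_bytes: int, copies: int = 2) -> int:
    """Space left for unique data when every item is stored `copies` times."""
    return raw_bytes // copies


def implied_unit_value(total_with_match: float, units: float, match_ratio: int = 3) -> float:
    """Value of one donated unit, assuming the total is the gift plus
    match_ratio matching dollars per donated dollar (an assumed reading)."""
    return total_with_match / (1 + match_ratio) / units


raw = 10 * PETABYTE
print(raw)                              # total bytes across the ten racks
print(usable_bytes(raw) / PETABYTE)     # petabytes available when serving from a duplicate
print(implied_unit_value(48_000, 16))   # dollars per bitcoin implied by the totals
```

Under that reading of the match, the 16 bitcoins were worth about $12k on their own, or roughly $750 each.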

The notes you left with your donations were heartwarming and motivating.  It is wonderful to see how many people want the full breadth of information available to everyone in the world and are willing to put their effort and money behind it.  There is still lots to do, and we are glad there is such a strong community to make it actually happen.

Universal Access to All Knowledge.

Thank you, and let’s rock in 2014!