What’s new with v2

As many of you have already seen, we are working on the next generation of the archive.org web site, which we call Version 2.0 (v2). It’s in beta right now, so go check it out!


Version 1 (v1) showing the banner to try the BETA Version 2 (v2)

We get a lot of feedback from the people who have elected to try out v2, and we read ALL of it. As themes emerge about what people are having trouble with, we make changes to the design and then we pay attention to subsequent feedback to try to gauge whether we solved the problem (or not).


Volume prepended to title

The goal of this redesign is to make the site more inviting and easier to use. Right now our work is focused on how the site looks and how things are organized on the page. For the most part, everything that is available to you in Version 1 (v1) of the site is available to you in v2 – but those things may be in different places!

Rights information displayed in About tab


We have a lot of long-time users of the site, and we know that any major changes will cause them to have to relearn where things are and how to accomplish the things they already know how to do on v1. This kind of major change can be very annoying, so we’re working hard to make sure you only need to relearn things once. While we will be adding more features as time goes by, we expect those changes to be incremental and not to affect the basic layout of pages.

If you’ve been using v2, you’ve probably noticed some changes over the last few weeks. I’ll discuss some of those changes here, and some of them are highlighted in the included images.


The collection About tab contains a longer description, info about contributors, and stats for reviews, forums, views and items

Volume information.  We have a lot of journals and books with Volume information that was not showing in search, collection or account pages. The volume information is now prepended to the title for easier visual scanning within a collection.

Live Music. Rights information for a collection is now displayed on the About tab. We also changed the way shows are described in band collections to list the date and venue before the band name, making it easier to visually scan the items in a collection.

Mobile. On most mobile devices we decreased the initial number of search results from 50 to 25 in order to lighten the page load time.

Collections Page

Go to list view for a collection and click the "Show details" checkbox


Collection description. The description area for the collection at the top of the page has been shortened. We encourage collection builders to add useful descriptions, and you can see the additional information in the new About tab.

Click to see additional collections for an item


About tab. The About tab replaces the Contributors tab. We wanted to have a place for all of the information about a collection, and “Contributors” didn’t cover it. The new About tab contains the longer description for a collection, rights information (when it exists), data about how many reviews and forum posts are in that collection, and the content from the previous Contributors tab – the collection creator, people who have added to the collection, and charts for Views and Items over time.  You will also find related collections listed on the About tab below the graphs. Parent collections and subcollections still show up in the Collections tab, since they are part of a collection’s direct hierarchy.

The See All Files page


Collection tab. The Collection tab has a few changes as well. In list view, you can now “show details” for each item if you want to see more information.

Item Pages

Additional collections. If an item belongs to more than one collection, you can choose to view those additional collections.

Upload tile on user account page


Stream only. When an item is not available for download, you will see a “Stream Only” notification where the “Download” button normally appears. We made some visual changes to this notification to make it seem less button-like.

Favorites list sorted by Date Favorited


See All Files. In the “see all files” view, “playable” media files are pushed to the top, just under the “all files” options for torrent and zip. Files are grouped logically, with the original first and bolded and the derivative files listed below.

User Account Page

Uploads. Your Uploads tab has a new “Upload” tile in it, just to make uploading easier to find. You can still upload from anywhere on the site by clicking the upload icon at the top of the page, of course.

Favorites. Your Favorites list (called bookmarks in v1) will now display your favorites sorted by “date favorited” so that you can see your most recently favorited items first.

Tell Us!

As always, please use the Beta feedback link in the top right corner to let us know what you think.  Is everything awesome?  Are you confused about where to find something?  Tell us!

If you’re interested in a more detailed running log of changes from our lead developer, Tracey Jaquith, you can get the “nerd version” here: https://archive.org/CHANGELOG.txt

This project receives support from the John S. and James L. Knight Foundation’s Knight News Challenge.

Posted in Archive Version 2 | Comments Off on What’s new with v2

Locking the Web Open, a Call for a Distributed Web

Presentation by Brewster Kahle, Internet Archive Digital Librarian, at the Ford Foundation NetGain gathering, a call from 5 top foundations to think big about prospects for our digital future.  (More detailed version)

Hi, I’m Brewster Kahle, Founder of the Internet Archive. For 25 years we’ve been building this fabulous thing: the Web. I want to talk to you today about how we can Lock the Web Open.

One of my heroes, Larry Lessig, famously said that “Code is Law.” The way we code the Web will determine the way we live online. So we need to bake our values into our code.

Freedom of expression needs to be baked into our code. Privacy should be baked into our code. Universal access to all knowledge. But right now, those values are not embedded in the Web.

It turns out that the World Wide Web is very fragile. But it is huge. At the Internet Archive we collect 1 billion pages a week. We now know that Web pages only last about 100 days on average before they change or disappear. They blink on and off in their servers.

And the Web is massively accessible, unless you live in China. The Chinese government has blocked the Internet Archive, the New York Times, and other sites from its citizens. And so do other countries every once in a while.

So the Web is not reliable. And the Web isn’t private. People, corporations, countries can spy on what you are reading. And they do. We now know that WikiLeaks readers were targeted by the NSA and the UK’s equivalent. We, in the library world, know the value of reader privacy.

But the Web is fun. We got one of the three things right. So we need a Web that is Reliable, Private but is still Fun. I believe it is time to take that next step. And it’s within our reach.

Imagine “Distributed Web” sites that are as functional as WordPress blogs, Wikimedia sites, or even Facebook. But how?

Contrast the current Web to the internet, the network of pipes that the World Wide Web sits on top of. The internet was designed so that if any one piece goes out, it will still function. The internet is a truly distributed system. What we need is a Next Generation Web; a truly distributed Web.

Here’s a way of thinking about it: Take the Amazon Cloud. The Amazon Cloud works by distributing your data, moving it from computer to computer: shifting machines in case things go down, getting it closer to users, and replicating it as it is used more. That’s a great idea. What if we could make the Next Generation Web work like that, but across the entire internet, like an enormous Amazon Cloud?

In part, it would be based on Peer-to-peer technology—systems that aren’t dependent on a central host or the policies of one particular country. In peer-to-peer models, those who are using the distributed Web are also providing some of the bandwidth and storage to run it.

Instead of one web server per website we would have many. The more people or organizations that are involved in the distributed Web, the safer and faster it will become. The next generation Web also needs a distributed authentication system without centralized log-in and passwords. That’s where encryption comes in.

And it also needs to be Private, so no one knows what you are reading. The bits will be distributed across the Net, so no one can track you from a central portal.

And this time the Web should have a memory. We’d build in a form of versioning, so the Web is archived through time. The Web would no longer exist in a land of the perpetual present.

Plus it still needs to be Fun: malleable enough to spur the imaginations of millions of inventors. How do we know that it can work? There have been many advances since the birth of the Web in 1992.

We have computers that are 1000 times faster. We have JavaScript that allows us to run sophisticated code in the browser. So now readers of the distributed web could help build it. Public key encryption is now legal, so we can use it for authentication and privacy. And we have blockchain technology that enables the Bitcoin community to have a global database with no central point of control.

I’ve seen each of these pieces work independently, but never pulled together into a new Web. That is what I am challenging us to do.

Funders, and leaders, and visionaries: this can be a Big Deal. And it’s not being done yet! By understanding where we are headed, we can pave the path.

Larry Lessig’s equation was Code = Law. We could bake the First Amendment into the code of a next generation Web.

We can lock the web open.
Making openness irrevocable.
We can build this.
We can do it together.

Delivered February 11, 2015 at the Ford Foundation-hosted gathering: NetGain, Working Together for a Stronger Digital Society

Posted in Announcements, News | 14 Comments

Internet Archive Supports Critical Updates to Electronic Privacy Law in California

The California Electronic Communications Privacy Act (CalECPA), a newly introduced bill in California, would help bring state law up to date and require law enforcement to get a warrant before searching private online accounts or personal electronic devices. The Internet Archive is pleased to join a long and diverse list of organizations and companies supporting CalECPA. To learn more, see write-ups by State Senator Mark Leno’s office, the ACLU of California, and the Electronic Frontier Foundation.

Posted in News | Comments Off on Internet Archive Supports Critical Updates to Electronic Privacy Law in California

$4 Million Available for Digitization in 2015. Application Deadline is April 30th. Let’s Apply Together!

Internet Archive wants to partner with you to bring your ‘Hidden Collections’ into the public domain and make them part of a global digital library!

The Council on Library and Information Resources (CLIR), with generous support from the Andrew W. Mellon Foundation, has launched Digitizing Hidden Special Collections and Archives: Enabling New Scholarship through Increasing Access to Unique Materials.

This competition will award up to $4 Million to institutions, consortia and collaborative groups to digitize and provide access to collections of rare and ephemeral material with high scholarly value.

CLIR states that “Digitizing Hidden Collections will enhance the emerging global digital research environment in ways that support new kinds of scholarship for the long term, ensuring that the full wealth of resources held by institutions of cultural memory becomes integrated with the open Web” (http://www.clir.org/hiddencollections/about-the-program). The focus of these grants is to bring entire collections into the public domain, while promoting strategic partnerships and best practices for ensuring preservation and accessibility that is both stable and enduring.

Grants of between $50,000 and $250,000 for a single-institution project, or between $50,000 and $500,000 for a collaborative project, may be sought for work that begins between January 1st and June 1st, 2016 and is completed by May 31st, 2019. (http://www.clir.org/hiddencollections/applicants)

How Can the Internet Archive Digitization Team Help?


Let’s Cooperate on Your Grant Together – marry your great content with our end-to-end digitization skills to get your content up online safely and inexpensively.

We offer a Total Digitization Solution. Starting with non-destructive image capture, to storage and preservation, and ending with online discovery and access, our digitization solution saves you from having to worry about these details.

Translatable Metadata. Our existing relationship with Digital Public Library of America provides a possible route for your materials to join DPLA’s growing national collection.

Our Global Team Digitizes over 1000 eBooks and items every day. No need to reinvent the wheel. With our experience, training and engineering skills, we supply an end-to-end solution that allows our library partners and content contributors to focus on developing their collections, not on the back end details. For those new to digitization, we have the skills to help you avoid the common and costly mistakes of starting up a project.

We Don’t Just Digitize Books! Over the last decade, our format capabilities have expanded to: archival finds/ephemera; microfilm and microfiche; audio; film and video; TV News; software and web. Let’s also apply together for grants to digitize other formats!

Many of Our Partnerships Have Been Consortial. We are proud to have driven projects for the Boston Library Consortium (BLC), LYRASIS, Consortium of Academic Libraries in Illinois (CARLI), Biodiversity Heritage Library and Ontario Council of University Libraries (OCUL), among others. This means collections can be contributed by more than one institution, with funding issued centrally and distributed locally.

Far-flung Collections Come Together With Internet Archive. Our collections gather material from international contributors in one place; in the public domain. In some cases this has meant repatriating material digitally across great distances. Highlights include collections from the Medical Heritage Library, Biodiversity Library and Genealogy (in collaboration with FamilySearch).

Preparing Your Grant—What can Internet Archive Do?


Large and Small-Scale Digitization Capabilities. Take advantage of our experience working with collection sizes – ranging from hundreds of thousands of items to unique collections with only dozens of one-of-a-kind monographs.

We Can Tailor The Project to Your Needs. We have worked with over 1275 content providers during the last decade, and our processes can be adjusted to meet your requirements.

Our Equipment and Software Have Been Tested and Proven. Our non-destructive digitization process can be done inside your library by IA staff, or in one of our regional centers. The images can even be captured by you! We have a new Table Top Scribe system that can be purchased if your institution wishes to do the image capture in-house. It is portable, easy to use, and uploads material directly to archive.org. Our service package provides the technical back-end processes, including preserving and ‘future-proofing’ your digital data for 25 years, AND organizing your collections online so they can be discovered and used for scholarly research.

Our Digitization Specifications Have Become the De Facto Library Standard. Over 1,500 global libraries have used our services to digitally preserve their material and, importantly, make it accessible. Our partners include 25 of the top 30 largest research and national libraries in North America.

Our Staff is Located in 33 Locations, Including 26 Sites in North America. With this geographic footprint, your materials don’t have to travel far if you choose to have them digitized in one of our specialized digitization centers. This also provides opportunities to submit a grant proposal where the content might be located in 2 or 3 different libraries.

Let’s think big and make collections vital for scholarship and cultural heritage available to the world!

Want to know more? Attend the upcoming webinars for applicants on February 4th and March 4th, 2015 from 2-3pm Eastern Time (https://clir.adobeconnect.com/_a960001693/hiddencollections/). We look forward to the resulting conversations, and we hope to see you there!

For more information about working with Internet Archive, contact Robert Miller.

Posted in Books Archive, Hardware, News | 3 Comments

Knight Foundation to Support Toolsets for Building Libraries Together


Last September, the John S. and James L. Knight Foundation issued this challenge:  “How might we leverage libraries as a platform to build more knowledgeable communities?” Today we are proud to announce that the Internet Archive’s plan for “Building Libraries Together” will make archive.org more community-driven, with a major grant from the Knight Foundation.  The Knight Foundation is the leading funder of journalism and media innovation, seeking to promote informed and engaged communities.

At the Internet Archive, we know we can’t preserve the world’s knowledge alone.  We will need the public’s help to curate our shared human culture.  So we are embarking on a two-year project to build a toolset and user interface that allow communities outside the Archive to save, manage and share their cultural treasures, further democratizing access to all knowledge. Citizen-archivists will be able to build collections, enhance metadata and join like-minded communities in deciding what of our history gets archived and made accessible to everyone, forever, for free.

A look at the Internet Archive's software library in the new user interface.


What Wikimedia did for encyclopedia articles, the Internet Archive hopes to do for collections of media:  give people the tools to build library collections together and make them accessible to everyone.

Please try out the new beta version of our site here.

Posted in Announcements, Archive Version 2, News | 1 Comment

archive.org download counts of collections of items updates and fixes

Every month, we look over the total download counts for all public items at archive.org.  We sum item counts into their collections.  At year end 2014, we found various source reliability issues, as well as overcounting for “top collections” and many other issues.

archive.org public items tracked over time


To address the problems, we:

  • Built a new system that uses our database (DB) for item download counts, instead of our less reliable (and more prone to “drift”) SOLR search engine (SE).
  • Changed monthly saved data from JSON and PHP serialized flatfiles to a new DB table (much easier to use now!)
  • Fixed overcounting issues for collections: texts, audio, etree, movies
  • Fixed various overcounting issues related to not unique-ing <collection> and <contributor> tags (more below)
  • Fixed character encoding issues on <contributor> tags

Bonus points!

  • We now track *all collections*.  Previously, we only tracked items tagged:
    • <mediatype> texts
    • <mediatype> etree
    • <mediatype> audio
    • <mediatype> movies
  • For items where we track <contributor> tags (texts items), we now have a “Contributor page” that shows a table of historical data.
  • Graphs are now “responsive” (scale in width based on browser/mobile width)


The Overcount Issue for top collection/mediatypes

  • In the below graph, mediatypes and collections are shown horizontally, with a sample “collection hierarchy” today.
  • For each collection/mediatype, we show 1 example item, A B C and D, with a downloads/streams/views count next to it parenthetically.   So these are four items, spanning four collections, that happen to be in a collection hierarchy (a single item can belong to multiple collections at archive.org)
  • The Old Way had a critical flaw — it summed all sub-collection counts — when really it should have just summed all *direct child* sub-collection counts (or gone with our New Way instead)


So we now treat <mediatype> tags like <collection> tags for counting purposes, and we unique all <collection> tags on each item, which avoids another kind of overcounting on items with slightly nonideal (repeated) tags.
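The difference between the two counting approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the production code: the four-level hierarchy (`top`, `mid`, `low`, `leaf`), the item counts, and every function name are hypothetical, and the "Old Way" is modeled as described above (each collection adds the totals of all of its sub-collections at any depth, even though those totals already contain each other).

```python
from collections import defaultdict

# Hypothetical hierarchy: top > mid > low > leaf (names are illustrative).
CHILDREN = {"top": ["mid"], "mid": ["low"], "low": ["leaf"], "leaf": []}

# Hypothetical items: (download count, collection tags). Tags list the most
# specific collection first, then every ancestor. Item D repeats a tag to
# mimic slightly messy metadata.
ITEMS = {
    "A": (1, ["top"]),
    "B": (10, ["mid", "top"]),
    "C": (100, ["low", "mid", "top"]),
    "D": (1000, ["leaf", "low", "mid", "top", "leaf"]),
}


def descendants(name):
    """Return all sub-collections below `name`, at any depth."""
    found = []
    for child in CHILDREN[name]:
        found.append(child)
        found.extend(descendants(child))
    return found


def own_downloads(name):
    """Sum downloads of items whose most specific (first) tag is `name`."""
    return sum(count for count, tags in ITEMS.values() if tags[0] == name)


def old_way_total(name):
    """Flawed: adds the totals of ALL descendants, which already nest."""
    return own_downloads(name) + sum(old_way_total(d) for d in descendants(name))


def direct_child_total(name):
    """Corrected recursion: only direct children are summed."""
    return own_downloads(name) + sum(direct_child_total(c) for c in CHILDREN[name])


def new_way_totals():
    """Credit each item once to every distinct collection tag it carries."""
    totals = defaultdict(int)
    for count, tags in ITEMS.values():
        for name in set(tags):  # unique-ing the tags prevents double credit
            totals[name] += count
    return dict(totals)


print(old_way_total("top"))       # inflated by nested sub-collection totals
print(direct_child_total("top"))  # each item counted once
print(new_way_totals()["top"])    # same answer, computed from item tags
```

In this sketch the per-item approach needs no knowledge of the hierarchy at all: it only reads each item's tags, which is why de-duplicating those tags matters.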


… and one more update from Feb/1:

We graph the “difference” between absolute downloads counts for the current month minus the prior month, for each month we have data for.  This gives us graphs that show downloads/month over time.  However, values can easily go *negative* with various scenarios (which is *wickedly* confusing to our poor users!)

Here’s that situation:

A collection has a really *hot* item one month, racking up downloads in a given collection.  The next month, a DMCA takedown or some other action removes the item from availability (so it is no longer counted).  The downloads for that collection can plummet in the next month’s run, when the counts are summed over the public items for that collection again.  So that collection would have a negative (net) downloads count change for this next month!

Here’s our fix:

Use the current month’s collection “item membership” list for both the current month *and* the prior month.  Sum the counts for all of those items in each month, and graph the difference between the two sums.  In just about every situation that remains, the graphed monthly download counts will be nonnegative (zero or positive), because each item’s absolute count should not decrease from one month to the next.
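As a rough sketch of that fix, assuming per-item absolute download counts are available for each month as plain dictionaries (the item names, numbers, and function names below are all made up for illustration):

```python
def naive_monthly_change(prev_counts, curr_counts, prev_members, curr_members):
    """Difference of collection totals, each using its own month's membership."""
    prev_total = sum(prev_counts[item] for item in prev_members)
    curr_total = sum(curr_counts[item] for item in curr_members)
    return curr_total - prev_total


def fixed_monthly_change(prev_counts, curr_counts, curr_members):
    """Difference of totals using the CURRENT membership list for both months."""
    # Items that are new this month have no prior count, so they start at 0.
    prev_total = sum(prev_counts.get(item, 0) for item in curr_members)
    curr_total = sum(curr_counts[item] for item in curr_members)
    return curr_total - prev_total


# Hypothetical data: "hot" was taken down between the two monthly runs.
prev_counts = {"hot": 5000, "steady": 200}
curr_counts = {"steady": 260}
prev_members = {"hot", "steady"}
curr_members = {"steady"}

print(naive_monthly_change(prev_counts, curr_counts, prev_members, curr_members))
print(fixed_monthly_change(prev_counts, curr_counts, curr_members))
```

With the removed item left out of both sums, the graphed value reflects only the downloads that the remaining items gained during the month.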



Posted in Audio Archive, Books Archive, Education Archive, Image Archive, Live Music Archive, Movie Archive, Music, Software Archive, Technical, Video Archive | Comments Off on archive.org download counts of collections of items updates and fixes

Community Wireless

The Internet Archive’s mission is universal access to knowledge.   For us, that access happens over the Internet. In many places, there are two or fewer providers of fast Internet access, which tends to lead to high prices and bad service, and makes censorship too easy. We would like to see more options and are doing something where we can: in places where we own buildings, the Internet Archive provides free and fast Internet access. Currently, we cover parts of San Francisco and Richmond, California with Community Wireless.  Our most recent community project is with Atchison Village, in Richmond.

There are two layers to this, an access layer that anyone can connect to with WiFi devices, and a backbone layer that connects the access layer to the Internet at large. The backbone layer is built and operated by the Internet Archive. We monitor its performance and upgrade parts as needed.

The access layer is largely built in a crowd-sourced manner by willing participants. Anybody can connect with their own WiFi devices. The Internet Archive recommends specific devices that we know work well, but access is not limited to those. We also recommend connecting rooftop-to-rooftop; while rooftop-to-couch might work for some people, best results are achieved with devices mounted outdoors with line-of-sight to the closest access point.

Participants will be responsible for their own devices, including purchasing them, mounting them, pointing them and keeping them powered. For recommended devices the Internet Archive can provide initial configurations. If such a device’s configuration is changed, it is the participant’s responsibility to make it work.

There are a few caveats: Both layers operate in unlicensed frequency bands where interference is common and expected. The network is also a shared resource. Thus, experienced bandwidth and latency can and do vary. The Internet Archive will make a best effort to keep the backbone running well, but we cannot guarantee specific performance metrics. Also, over time expectations of what is an acceptable speed tend to go up. For this reason, we recommend upgrading devices about every three years, just like computers and phones.

Posted in News | Comments Off on Community Wireless

The New Yorker: The Cobweb–Can the Internet be archived?

Harvard history professor and New Yorker staff writer, Jill Lepore, has crafted a remarkable history of Web archiving–and the role of our own Brewster Kahle and the Wayback Machine.


From the January 26, 2015 edition of The New Yorker.

My favorite passage:

Where is the Internet’s memory, the history of our time?

“It’s right here!” Kahle cries.

The machine hums and is muffled. It is sacred and profane. It is eradicable and unbearable. And it glows, against the dark. 

It’s well worth a read here.


Posted in Announcements, News | 5 Comments

University of California Libraries to partner with Archive-It

This week, the University of California’s California Digital Library (CDL) and the UC Libraries announced a partnership with Internet Archive’s Archive-It Service.

In the coming year, CDL’s Web Archiving Service (WAS) collections and all core infrastructure activities, i.e., crawling, indexing, search, display, and storage, will be transferred to Archive-It. WAS partners have captured close to 80 terabytes of archived content most of which will be added to the 450 terabytes Archive-It partners have collected.

We are excited to work with CDL as we transition over the UC (and other) libraries to the Archive-It service. These UC libraries have unique and compelling collections (some dating back to 2006) including their Grateful Dead Web Archive: http://webarchives.cdlib.orggdarchive/a/gratefuldead which of course fits in quite nicely with the Internet Archive’s large collection of downloadable and streamed Grateful Dead shows in our Live Music Archive.

By collaborating with CDL, Archive-It can continue to expand the core functionalities of web archiving and work with CDL and other colleagues to develop new tools to advance the use of web archives. Such collaboration is sorely needed at this juncture and we welcome the opportunity to expand the capabilities of web archiving. By working together as a community we can create useful and sustainable web archives and ensure growth in the field of web archiving.

Be sure to check out some of the CDL collections:

Archiving the LGBT Web: Eastern Europe and Eurasia- UCB: http://webarchives.cdlib.org/a/lgbtwebeasterneurope
Federal Regional Agencies in California Web Archive- UC Davis: http://webarchives.cdlib.org/a/uscalagencies
Salvadoran Presidential Election March 2009 – Web Archive- UC Irvine: http://webarchives.cdlib.org/a/salvador
2009 H1N1 Influenza A (Swine Flu) Outbreak- UC San Diego: http://webarchives.cdlib.org/a/h1n1
California Tobacco Control Web Archive- UCSF http://webarchives.cdlib.org/a/caltobaccocontrol

Posted in Announcements, Archive-It, News | 2 Comments

Mirroring the Stone Oakvalley Music Collection


The Internet Archive has begun mirroring a fantastic collection of music called the “Stone Oakvalley Music Collection”. When you visit one of their websites, the archive.org mirror is one of the choices for download. Going forward, the Archive will offer a full backup of the entire site (over a terabyte) for permanent storage.

Why the Stone Oakvalley Collection is important

Manufactured from the early 1980s to the mid 1990s, the Commodore 64 computer was a revolutionary piece of hardware and a critical introduction to programming for generations. It also had, within its design, a very well-regarded sound chip: the 6581/8580 SID (Sound Interface Device), whose unique properties in wave generation and effects gave a special sound in the hands of the right developers and musicians.



This successful piece of hardware was manufactured in the millions across the life of the C64, and in the late 1980s, the introduction of the Commodore Amiga computer brought to life an improved chipset for generating sound: the 8364, or PAULA. With a range of improvements to what sounds and music could come out of this chip, the Amiga soared with capabilities that took years to match in other machines.

The Archive hosts many examples of music generated by these chips: our C64 Games Archive has hundreds of videos of games being played on a Commodore 64, and searching for terms like “Amiga Music”, “Chiptunes” and “C64 Music” will yield a good amount of sound to enjoy.

But nothing comes close to the Stone Oakvalley Collection in terms of breadth, dedication, and craft in ensuring the unique sound of these chips can be enjoyed in the future.


The process, which is documented here, involved setting up a large amount of Commodore hardware connected to servers which would reboot the machines, over and over, playing thousands of pieces of music in different configurations, and automatically cataloging and saving the resulting waveforms. Considerations for modifications of the chipset over the years, of stereo versus mono recordings, and verification of the resulting 400,000 files have provided the highest quality of snapshots of this period.

Browsing the Collection

Currently, there are two websites for Stone Oakvalley’s collection – one based around the C64, and the other based around the Amiga.  Impeccable work has been done to catalog the music, so if there are songs or games you remember, they are likely to be saved on the site (and powered from Archive.org’s servers). Otherwise, browse the stacks of the sites and enjoy a soundscape of computer history.

The Internet Archive strives to provide universal access to the world’s knowledge. Through mirroring, hosting and gathering of data, our mission allows millions to gain ad-free, fast access to information and materials. Be sure to check our many collections on our main site.

Posted in Cool items, Software Archive | Comments Off on Mirroring the Stone Oakvalley Music Collection

Update to Terms of Use

Internet Archive’s terms of use were written in March of 2001, and they haven’t changed once – until today.  The terms were written before the Wayback Machine was launched (in October 2001) when we had 4 billion web pages with no public access and 360 Prelinger Archive movies in the archive.  Now we have 435 billion web pages and more than 15 million public audio, video and text items.  Times have changed, and we have made a small change to our terms to reflect this.

In the interest of transparency, we want to show you exactly what the change is.

We have made small changes in paragraphs two and three of the terms.  The previous version of these sections is in red below:

“…You agree not to interfere with the work of other users or Archive personnel, servers, or resources. Further, you agree not to recirculate your password to other people or organizations or to copy offsite any part of the Collections without written permission. Please report any unauthorized use of your password promptly to info@archive.org…

“…You agree to abide by all applicable laws and regulations, including intellectual property laws, in connection with your use of the Archive. In particular, you certify that your use of any part of the Archive’s Collections will be noncommercial and will be limited to noninfringing or fair use under copyright law. In using the Archive’s site, Collections, and/or services…”

This is the new version with the changed portion in green type:

“…You agree not to interfere with the work of other users or Archive personnel, servers, or resources. Further, you agree not to recirculate your password to other people or organizations. Please report any unauthorized use of your password promptly to info@archive.org…

“…You agree to abide by all applicable laws and regulations, including intellectual property laws, in connection with your use of the Archive. In particular, you certify that your use of any part of the Archive’s Collections will be limited to noninfringing or fair use under copyright law. If a Creative Commons or other license has been declared for particular material on the Archive, to the extent you trust the declaration and declarer (which is rarely the Internet Archive), you may use the content according to the terms and conditions of the applicable license. In using the Archive’s site, Collections, and/or services…”

Thank you for continuing to use the amazing resources housed in the Internet Archive.

UPDATE 12/31/14:  The change on 12/30 applied to the language in the third paragraph of the terms.  On 12/31 we made an additional small change to the language in the second paragraph, and modified the text of this post to reflect both changes.

Posted in News | 2 Comments

Burning Brewster’s Bitcoin

[Guest post, hope you enjoy. -brewster]

Burning Brewster’s Bitcoin
First Installment – Coinbase offers a service that is contrary to everything the company professes to hold dear
Internet Archive
Morgen E. Peck

This fall, Brewster reached out to me with a proposition. He wanted to know more about what it’s like moving between bitcoin and fiat currencies—where the trades are happening, which ones are scams and which ones are legit, how long they take to go through, how much of my privacy I have to forfeit, and especially what kinds of fees traders are skimming from each individual transaction. In short, what’s it like for people who have no bitcoin and want to get in? And once they do get in, what options do they have?

To get the answers, Brewster sent me on my way with one bitcoin. He told me to sell it and buy it again in as many ways as possible, and not to come back until I had whittled his money down to nothing.

So. This is the mission. Find out how many licks it takes to get to the center of a bitcoin, or lose it all to thievery and grift (crrrruuunch!!!). We’ll be running updates on my progress through this blog with the hope of informing casual bitcoin users and digital currency gurus alike.

______________________________________________________________________

First Stop: Coinbase

Coinbase is a bitcoin “wallet” (I’ll explain in a minute why I put this word in quotes) merged together with an exchange platform. Most of the people I know who are playing with Bitcoin as a whimsical investment arrived at Coinbase as the first point of entry. I suspect this is because Coinbase accounts link up with external bank accounts, thereby offering an intuitive and familiar interface to the financial infrastructure with which we’re all so well acquainted.

After Brewster sent one bitcoin to my address, I opened a Coinbase account and used the blockchain.info browser-based wallet to dump my funds into it.

Before we even get started, I’d like to note that using blockchain.info is the best experience I’ve encountered so far in this little experiment and I want to hold this transaction up as the ideal that we can use to judge all future stunts. The only better option would be to handle my transactions with a full Bitcoin client.

What I like is that the guys at Blockchain.info have done everything they can to keep their software true to the heart of Bitcoin. I can set up a wallet without giving them my name or email address. The private keys are in my sole possession. Basically, it’s all on me. If I lose the information that I need to access my account or I let it leak into the hands of a thief, well then I’m flat out of luck and I’ll probably learn to be more careful in the future.

This is what Bitcoin looks like without her makeup on, when she’s dragging herself off the couch to open the door for a package. And, the way I see it, she now has two options. She can either gussy herself up for people or she can try to teach people to accept her for who she is. I advocate the latter (and not merely because I’m wearing sweatpants as I type). I think that the best services will be the ones that leave most of the risk with the users while simultaneously taking pains to tutor them on how to manage key pairs, use cold storage, etc. In other words, part of what’s required in getting this whole Bitcoin thing to work is giving people a new way to understand digital ownership and, in general, just making people smarter. That’s not a bad thing.

Which brings me to Coinbase. As an exchange, Coinbase has functionalities, and therefore responsibilities, that surpass my blockchain.info wallet. It has to operate in conjunction with a world of passwords, bank account numbers and identity verification protocols, many of which are determined by federal regulations. But I still think it’s fair and instructive to ask whether or not the service retains any of the features that Bitcoin the network brings to the table.

What are these features? Coinbase lays out three of the most important ones right on the homepage of its own website. It touts Bitcoin as an open, global network, one which is “not controlled by any company or country,” (that’s #1) with transactions that are secure, “fast and cheap,” (that’s #2) which are processed without the need for collecting sensitive details about the user. “There is no need to give companies extra information or a blank check to bill you” (that’s #3).

Unfortunately, transactions made through Coinbase retain none of these properties. Not a single one. Unlike Bitcoin, Coinbase is a company and when you move your bitcoins to a Coinbase account, you give the company complete control over them. This is because, as I hinted at before, a Coinbase wallet is not a real wallet.

I know that Bitcoin has only been around for 5 years and the community is still in a tug-of-war over semantics. So maybe the term “wallet” is a work in progress. But it shouldn’t be. To me, it’s very clear what this word means. When we talk about wallets in the physical world, we’re talking about something we use to carry our cash around (and all the cursed things that accumulate in a billfold). The important thing about a wallet is that we have access to its contents. At any time we can reach in and pull out the money.

In Bitcoin, the proper analog for cash is the private keys that are used to sign transactions on the Bitcoin blockchain. Private keys are the only thing you really own in Bitcoin, and therefore, any real wallet should give you complete access to them.

Query Coinbase as to how to get your private keys and you will be directed to this message:

As Coinbase is a hosted wallet, we do not provide users with their private keys; doing so would prevent us from taking advantage of our secure cold-storage technology to protect your bitcoin funds.

Instead, you can submit transactions and sign messages using our web-based interface, bypassing the need for control of the private keys.

That pretty much does away with feature number one. Trust Coinbase with your bitcoins and you must trust them completely, because they give you no direct control. This is not a gateway to Bitcoin. It is a surrogate.

On to number two. Transactions processed through the Bitcoin network are fast and cheap. The transaction fees are mere pennies and the transactions themselves usually clear within an hour.

The same is true of a Coinbase transaction if all you are doing is moving money from one Bitcoin address to another. But buying and selling bitcoins is another matter completely. Hooking my Coinbase wallet up with my credit union account took days. Once that was settled, I sold my bitcoin across the Coinbase online exchange and waited for the money to land in my checking account. This took another four days, which is longer than I’ve waited when using other services like PayPal or Chase’s QuickPay bank transfer.

The fee was actually not too bad. I sold my bitcoin at $372.62. Of that, Coinbase took $3.88, which is just about one percent.
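For anyone who wants to check the "just about one percent" figure, here is a minimal sketch of the arithmetic. It assumes the fee is measured against the quoted sale price; the function name `fee_percent` is made up for illustration and is not part of any Coinbase tooling.

```python
from decimal import Decimal


def fee_percent(fee: Decimal, sale_amount: Decimal) -> Decimal:
    """Return the fee as a percentage of the sale amount, rounded to two places."""
    return (fee / sale_amount * 100).quantize(Decimal("0.01"))


# Figures quoted in the post; treating the fee as a share of the
# sale price is an assumption about how the percentage is measured.
sale_amount = Decimal("372.62")
fee = Decimal("3.88")

print(fee_percent(fee, sale_amount))  # fee as a percentage of the sale
print(sale_amount - fee)              # what is left after the fee
```

Decimal is used instead of float so the dollar amounts are handled exactly as written.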

So, on to number three. Bitcoin is a payment network that eliminates the need for users to divulge sensitive information about themselves. Ownership is verified through strong cryptography that references pseudonyms rather than real-world identities.

This one you can definitely say goodbye to if you start trading on Coinbase and even if you just use their wallet. As I mentioned, the company now knows my name and my bank account number (which they also have the ability to dip in and out of), and my email address. In addition to that, I’ve given Coinbase my phone number in order to set up 2-factor authentication. And because they possess the private keys to all of the bitcoins I store in my Coinbase wallet, the company can associate my identity with any transactions they process.

Everything that was attractive about the Bitcoin protocol has been sacrificed to make the Coinbase service user friendly in a way that simulates modern banking and that indulges the dangerous, but well-engrained notion that we are better off trusting professionals to secure our digital information than we would be if we took control of it ourselves.

I’m only picking on Coinbase because it’s the first online exchange I’ve used. I hope to take a look at more of them in the coming weeks and I expect to find these strategies to be endemic.

But if I were to offer an opinion, I would recommend that anyone who has any admiration for Bitcoin, and for what this technology is doing to disrupt traditional payment processors, go ahead and use Coinbase to exchange between Bitcoin and fiat currencies, but get in and out as quickly as possible. The fees are pretty low compared to what else is available. But once you start using Coinbase to process transactions on the blockchain, you’re throwing everything beautiful about Bitcoin out the window.

Next up, I hit the Bitcoin ATMs in New York City and the open air trading nights at the Bitcoin Center near Wall Street.

Posted in Announcements, News | 3 Comments

Crusading librarian for openness passes: Cathy Norton

A live wire in the library field, and a firebrand for openness, Cathy Norton helped keep libraries free and open during this current digitization wave.

She was fun and opinionated, and we learned that she had the background and evidence to back the bold statements she made: keep the library materials free and open.

Cathy played a very important role in the development of our Book Digitization project in its early years. These were years when the future of book digitization’s growth and its public access was not certain. She stood up to the biggest tech companies; she took on publishers; she badgered research libraries to be broader than their local agendas and, at the end of the day, made a difference. Cathy remained contemporary, relevant and vocal up to the very end.

I (brewster) was grateful that when I would sail to Woods Hole and show up with bags of laundry and a salty demeanor, she would be welcoming and helpful. Always up for an adventure, she had a firm idea of the world she was trying to build.

On behalf of digital readers everywhere, the Internet Archive would like to raise a digital book to Cathy Norton, a champion of open knowledge, a positive force for collaboration and just a truly fun person who was up to take on any challenge related to moving libraries and public access forward. Thank you, Cathy, for what you helped create!

With celebration and sadness,

The Internet Archive, Brewster Kahle, Robert Miller, and the Open World

Here is the obituary that appeared for Cathy.

With sadness, the MBL notes the passing of former Library Director, Catherine N. Norton, who died peacefully at home after a battle with cancer. Cathy graduated from Sacred Hearts Academy, Fairhaven, MA, Regis College, Weston MA, and taught psychology at Chamberlane Jr. College while at Boston College graduate school more than fifty years ago.   She and her husband Thomas J. Norton moved to Falmouth for the “summer” but never left. She is survived by her 4 children whom she idolized and were with her when she passed, Dr. Margaret Molly Norton, Michael Norton, Kerrie Norton Marzot, and Thomas “Packy” Norton; and her grandchildren: Buddy Norton Estes, Toby Marzot, Drew Norton, Kate Norton, Hailey Norton, Roberto Marzot, and Julietta Marzot.

Cathy was active in community affairs. She served on the Falmouth school committee in the eighties and early nineties as chair and vice chair, was a town meeting member, and most recently represented Falmouth on the Steamship Authority board. She was instrumental in naming the new vessel “Woods Hole” that will be serving the islands from the Mainland.

Cathy lived for her family, friends, fun, faith and flowers. She remained long time friends with classmates from grammar school all the way through graduate school and showed how much she valued their friendship. In her professional life at the Marine Biological Laboratory she helped build international networks that spread digital information freely to countries that needed it from South America, to Africa, to Europe, and all the countries in between. A proponent of open access, she loved to travel to these countries and spread the word about the Biodiversity Heritage Library Project. As President of the Boston Library Consortium she helped form a group of libraries that worked with the Internet Archive to digitize open access books and journals, making them available to anyone with an internet connection.

Cathy had a flair for life, and her tremendous energy and can-do attitude guided her more than 30-year career at the MBL. Cathy came to the MBL in 1980 as a member of the MBLWHOI Library staff and earned a Masters in Information Science from Simmons College in 1984. In 1991, as the electronic frontier began to enhance information access, Cathy embraced change to become the MBL’s first Director of Information Systems. In 1994, she was appointed Library Director and became a leader in promoting the digital library and open access.

During her tenure she spearheaded the development of uBio, a digital biodiversity database that served as a foundation for the Encyclopedia of Life project. She helped develop an innovative Biomedical Informatics course sponsored by the National Library of Medicine designed to enable biomedical researchers and practitioners to embrace the power of technology. Cathy was also a founding member and served as Chairman of the Biodiversity Heritage Library, a worldwide collaboration of libraries and museums making biodiversity literature freely available. In 2011 Cathy retired as MBLWHOI Library Director and was named Library Scholar.

Beyond the MBL, Cathy was a Justice of the Peace for 39 years, marrying many happy couples on the beaches and back porches of Cape Cod.

Everyone who knew her has a “Cathy story” – how she inspired them with a project, connected them with another collaborator, worked her “magic” to make the seemingly impossible a reality, or made them laugh, especially with stories of weddings she presided over as a Justice of the Peace.

The MBL has established an endowed fund in Cathy’s honor, and its flag will be lowered in her memory. The family has requested that in lieu of flowers, please make donations to the Catherine N. Norton Endowed Fellowship at the MBL, www.mbl.edu/research/norton-fellowship.

A memorial service will be held on Saturday, December 27 at 11 AM at St. Patricks church on Main Street in Falmouth.

Posted in Announcements, Books Archive, News | 6 Comments

Lost Landscapes of San Francisco: Fundraiser Benefitting Internet Archive — Friday, December 19, 2014

Rick Prelinger’s Lost Landscapes of San Francisco is back for one final performance this year! Now you can catch this perennially sold-out show and your ticket donation will benefit the Internet Archive, a nonprofit digital library which hosts the Prelinger Collection. Please give generously to support the effort.

Friday, December 19, 2014
6 pm Reception
7:30 pm Film

300 Funston Ave.
San Francisco, CA 94118

Get tickets here!

This year’s LOST LANDSCAPES brings together familiar and unseen archival film clips showing San Francisco as it was and is no more. Blanketing the 20th-century city from the Bay to Ocean Beach and the Presidio to Bayview, this screening includes San Franciscans at work and play; early hippies in the Haight; a highly privileged walk on the unfinished Golden Gate Bridge; newly-discovered images of Playland and the waterfront; families living and playing in their neighborhoods; detail-rich streetscapes of the late 1960s; peace rallies in Golden Gate Park; 1930s color images of a busy Market Street; a selected reprise of greatest hits from years 1-8; and much, much more.

As usual, the viewers make the soundtrack — audience members are asked to identify places and events, ask questions, share their thoughts, and create an unruly interactive symphony of speculation about the city we’ve lost and the city we’d like to live in.

The film begins at 7:30 pm and is preceded by an informal reception that begins at 6:00 pm.

Posted in Announcements, News | 2 Comments

Declaration to be ‘Defensive’ for the Defensive Patent License

The Internet Archive hereby declares itself ‘Defensive’ by committing to offer a Defensive Patent License, version 1.1 or any later version, for any of its patents, to any DPL User.   The Internet Archive does not have any patents at this time.

Our contact address is:  info@archive.org


Founder, Digital Librarian
Internet Archive

 Birthday and Announcement about DPL.

Posted in News | 1 Comment

Defensive Patent License: Troll Proofed. Innovation Protected.

Today the Defensive Patent License is officially released.   It is designed to bring free software ideas to the patent arena by encouraging patent owners to declare themselves “defensive,” and share their patents with others that have declared themselves defensive.


This way a large number of patents can be used to help create new products and services without fear of being sued. As more organizations join in becoming defensive, the set of patents gets larger and the incentive to become defensive grows.

The Internet Archive hosted the “birthday party” as the license was refined, and declared itself defensive.  Brewster Kahle helped spur this generation of the idea by collaborating with lawyers who worked for years to get this to happen.

In celebration of this release, today John Gilmore is dedicating an important portfolio of patents from Pixel Qi to be defensive.   Pixel Qi was a company run by Mary Lou Jepsen of OLPC fame, and partially funded by Brewster Kahle and John Gilmore.

Please consider joining in by declaring your organization defensive, whether you have patents or not.  The Internet Archive has declared itself defensive to support this effort.




Posted in Announcements, News | 3 Comments

430 Billion Web Pages Saved….Help Us Do More!

Dear Friends,

Today we launch our End-of-Year Campaign.  Once a year, I ask all of you to keep the Internet Archive going and growing stronger.   Please help us reach our goal of raising $1.5 million by the end of the year.  Your support will help pay for servers, bandwidth and our dedicated staff.

I founded the Internet Archive as a non-profit with a huge goal:  to give everyone access to all knowledge—the books, web pages, audio, television and software of our shared human culture. Forever.

Book Scanning with Table Top Scribe

Lan Zhu, a scanner at Internet Archive, at the Table Top Scribe. Zhu can scan a 300-page book in thirty minutes. Since 2005, the Internet Archive has digitized over 2.4 million books.

Together we are building the digital library of the future. A place where we can all go to learn and explore.

At the Internet Archive, we’ve preserved 430 billion web pages. People download 20 million books on our site each month. We get more visitors in a year than most libraries do in a lifetime. The key is to keep improving—and to keep it free. That’s where you can help us.

For the cost of buying a book, you can make a book permanently available for the next generation. Please consider donating $10, $25, $50 or whatever you can afford to support the Internet Archive before the end of this year. It’s a small amount to inform millions. Help us do more. I promise you, it’s money well spent.

Thank you,

Brewster Kahle
Founder, Digital Librarian
Internet Archive

Photos by David Rinehart/Internet Archive

Posted in Announcements, News | 14 Comments

Partnership Promotes Jobs and Builds Free Global Library

As part of its Building Libraries Together initiative, the Internet Archive is testing a new socially-responsible jobs model with Bay Area Rescue Mission (BARM) of Richmond, California.

The Internet Archive has been digitizing books for nearly 10 years, but needed help reaching a goal of 10 million eBooks. “We had so much high value content that needed to be digitized, but not enough staff to do the work,” explains Robert Miller, Director of Digital Books and Media. “We wondered how we could make our problem someone else’s solution.” BARM offers a ‘Healthy Living’ addiction recovery program, where over 350 men and women work in a residential setting designed to move them towards self-sufficiency and independent living. The challenge for the staff at BARM is that most of their graduating clients lack the job skills and professional résumé required for securing a job. Internet Archive can offer job skills and a work history. A conversation between Miller and Tim Hammack, Vice-President of the Bay Area Rescue Mission, ensued and the Work Transition Program was born.

Candidates for the Internet Archive Work Transition Program are men and women from BARM who have completed a 12-month sober living, drug counseling or domestic abuse crisis program and are ready to re-enter the job market. This group often lacks relevant job skills, recent work experience, interpersonal and work relationship skills, self-confidence, and a résumé that a national or local employer would find compelling enough to grant an interview. The curriculum for the Internet Archive Work Transition Program lasts 9 months and focuses on ‘Learning-to-Work’. This three-phase program was based on lessons learned from the 600+ staff that the Archive has hired over the past 8 years. From these lessons, a program of progressive responsibility, constant feedback and a merit badging system was built to meet this challenge. Miller notes that this is not a make-work program. The work is substantive and needs to be completed to help get content online to share with the global community. “The Internet Archive Texts collections have over 20 million downloads each month and the material digitized by the team maintains our high standard of quality.”


To ‘grease the skids’ for the Work Transition Program graduates, Hammack and Miller contacted local companies, explaining that the program was not a handout and they weren’t looking for charity. They simply asked for a commitment from employers to grant the graduate an interview. Upon reviewing the program goals and expectations, local businesses including UPS, San Francisco Public Library, Costco and others signed on. The first class graduates in February 2015, but already two of the candidates have secured part-time employment.

Hammack is thrilled with the program, adding that “We take people on the worst day of their lives and help them achieve dignity, learn healthy living habits, while getting clean and sober. The Work Transition Program continues this path to recovery by helping them earn a job; a huge accomplishment!”


Special thanks to the teams at Internet Archive: Jesse Bell, Digitization Coordinator, and Antoine McGrath, Work Transition Supervisor; and at Bay Area Rescue Mission, headed by Tim Hammack, Vice-President of Operations. For more information about the program, contact Robert Miller.

Posted in Announcements, Books Archive, News | Comments Off on Partnership Promotes Jobs and Builds Free Global Library

Music Analysis Beginnings

As mentioned in our recent Building Music Libraries post, we are working with researchers at Columbia University and UPF in Barcelona to run their code on the music collection to help their research and to provide new analyses that could help with exploration and understanding.

We are doing some pilot runs to generate files which some close observers may see in the music item directories on archive.org.  Audio fingerprints from audfprint are .afpt and music attributes from Essentia are in _esslow.json.gz (download sample) and _esshigh.json.gz.
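For those who want to poke at one of the Essentia files after downloading it, here is a minimal sketch, assuming only that the file is gzip-compressed JSON as the `.json.gz` suffix suggests. The helper names `load_analysis` and `top_level_keys` are made up for this example, and nothing is assumed about which attributes the file contains; the sketch just lists whatever top-level keys it finds.

```python
import gzip
import json


def load_analysis(path):
    """Read a gzip-compressed JSON file and return the parsed object."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def top_level_keys(analysis):
    """Return the sorted top-level keys, or an empty list if it is not an object."""
    return sorted(analysis) if isinstance(analysis, dict) else []


if __name__ == "__main__":
    import sys

    # Pass one or more downloaded files on the command line,
    # e.g. a file ending in _esslow.json.gz (the path is up to you).
    for path in sys.argv[1:]:
        print(path, top_level_keys(load_analysis(path)))
```

Listing the keys first is a cheap way to see what an analysis file holds before writing anything that depends on its layout.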

Spectrogram of a Grateful Dead track


We are also creating image files showing the audio spectrum used.  We hope this is useful for those that want to see if files have been compressed in the past (even if they are posted as flac files now).  There is also a .png for each audio file of a basic waveform that is being used in the archive’s beta site as eye candy.

More as it happens, but we wanted you to know there is some progress and you will see some new files. If you have proposed other analyses that would benefit from being run over a large corpus, please let us know by contacting info at archive dot org.

Thank you to the researchers and the Archive programmers who are working together to make this happen.


Posted in Audio Archive, Live Music Archive, Music | Comments Off on Music Analysis Beginnings

Using Docker to Encapsulate Complicated Program is Successful

The Internet Archive has been using docker in a useful way that is a bit out of the mainstream: to package a command-line binary and its dependencies so we can deploy it on a cluster and use it in the same way we would a static binary.

Columbia University’s Daniel Ellis created an audio fingerprinting program that was used in a competition. It was not packaged as a Debian package or in any other distribution format. It took a while for our staff to figure out how to install it and its many dependencies consistently on Ubuntu, but it seemed pretty heavy-handed to install all of that on our worker cluster. So we explored using docker and it has been successful. While old hat for some, I thought it might be interesting to explain what we did.

1) Created a Dockerfile to make a docker container that held all of the code needed to run the system.

2) Worked with our systems group to figure out how to install docker on our cluster with a security profile we felt comfortable with.   This included running the binary in the container as user nobody.

3) Ramped up slowly to test the downloading and running of this container.   In general it would take 10-25 minutes to download the container the first time. Once cached on a worker node, it was very fast to start up.    This cache is persistent between many jobs, so this is efficient.

4) Used the container as we would a shell command, passing files into the container by mounting a sub filesystem for it to read and write to. This also helped with signaling errors.

5) Starting production use now.
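The steps above can be sketched roughly as follows. This is not our production code: the image name and the `/work` mount point are placeholders, and the helper names are made up for illustration. It only shows the general shape of treating a container like a shell command: run as user nobody, mount a host directory for input and output, and hand the exit code back to the caller.

```python
import subprocess
from pathlib import Path

# Hypothetical image name, used only for illustration.
IMAGE = "example/audfprint:latest"


def build_docker_command(image, work_dir, tool_args):
    """Build a `docker run` argument list that behaves like a one-shot shell command."""
    host_dir = Path(work_dir).resolve()
    return [
        "docker", "run",
        "--rm",                           # remove the container when the command exits
        "--user", "nobody",               # run the binary as an unprivileged user
        "--volume", f"{host_dir}:/work",  # mount a host directory for reading and writing
        "--workdir", "/work",             # start inside the mounted directory
        image,
        *tool_args,                       # arguments passed through to the packaged program
    ]


def run_in_container(image, work_dir, tool_args):
    """Run the packaged program and return its exit code."""
    completed = subprocess.run(build_docker_command(image, work_dir, tool_args))
    # `docker run` exits with the contained command's exit code,
    # so failures surface to the caller like any other command.
    return completed.returncode


# Example call (requires docker and a real image; arguments are placeholders):
# exit_code = run_in_container(IMAGE, "/tmp/job-123", ["input.flac"])
```

Keeping the command construction separate from the call makes it easy to log or inspect exactly what will run on a worker before anything is started.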

We hope that docker can help us with other programs that require complicated or legacy environments to run.

Congratulations to Raj Kumar, Aaron Ximm, and Andy Bezella for the creative solution to a problem that could have made it difficult for us to use some complicated academic code in our production environment.

Go docker!

Posted in Music, Technical | 3 Comments