Digital Books on archive.org

Many people think of the Internet Archive as just the Wayback Machine or just one collection or another, but there is much more.  For instance, books!

As a nonprofit library we buy and lend books to the public, but we do even more than that. Working with hundreds of libraries, we buy ebooks, digitize physical books, offer them to the print-disabled, and lend books to one reader at a time, all for free via archive.org and openlibrary.org.

Archive.org is the website that offers free public access to all sorts of materials uploaded by users, collected by the Internet Archive, and digitized by the Internet Archive. Archive.org includes books, music, video, webpages, and software. OpenLibrary.org, a site that is maintained by the Internet Archive, is a catalog of books with the mission to offer “One webpage for every book.” This open source catalog site, started in 2005, is editable by its users and has many code contributors. Each book’s page links to various resources about that book: for instance, links to amazon.com and betterworldbooks.org to buy the book, to local libraries that own the book, to archive.org for print-disabled access or to borrow a digitized version of the book, and to other sites that have digital versions.

The goals of libraries are preservation and access. For physical books, we buy and receive donations of hundreds of thousands of books that we preserve for the long term in archival, non-circulating stacks. Support for this comes from libraries, used book vendors, foundations, and tens of thousands of individual donors to the Internet Archive, a public charity.

We also work with more than 500 libraries to help digitize their books, now more than 3 million of them, to preserve them digitally and offer online access. These libraries make their older books (mostly pre-1923) available for free public downloading and, remarkably, more than 25 million older books are viewed every month.

Unfortunately, the books of the 20th century are largely not available either physically or digitally. These graphs show how the 20th century’s books are not available through Amazon for purchase, or from the Internet Archive. Some have reasoned this is because of copyright. 1923 is a special date in US copyright law because works published before this date are in the public domain, while afterwards copyright status can be very complicated. Unfortunately, 1923 in these graphs also marks a sharp drop in commercial availability of many books. These books are often only available through libraries.

Ten years ago, the Internet Archive began digitizing modern books, mostly from the 20th century, for access by the blind and dyslexic. Those who are certified as disabled by the Library of Congress get a decryption key for accessing Library of Congress scanned books. This key can also decrypt digitized books available on archive.org. This, combined with special formats of the older books for the blind and dyslexic, has brought millions of books to people who have had difficulty accessing them in the past. We are working to make these books more available to these communities in other special formats.

For years, publishers have used digital protection technologies, often referred to as DRM (digital rights management), for ebooks sold to retail customers. Libraries lend ebooks using the same DRM, and the Internet Archive has followed that lead, using Adobe Digital Editions.

The digital protection allows books to be lent via downloads that disappear (or become inaccessible) when the loan period ends (e.g. two weeks).  For users who prefer to read their ebooks directly in a browser, the same thing happens. The book becomes inaccessible at the end of the loan period, and the next reader in line has a chance to borrow it.
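
As a rough illustration of the one-reader-at-a-time loan described above, here is a minimal Python sketch. It is not the Internet Archive's actual lending system: the class name LendableBook, its methods, and the fixed 14-day period are all hypothetical, chosen only to show how a loan can lapse and pass to the next reader in line.

```python
from collections import deque
from datetime import datetime, timedelta

# Assumed loan length, taken from the "two weeks" example in the text.
LOAN_PERIOD = timedelta(days=14)


class LendableBook:
    """One digitized copy that is on loan to at most one reader at a time.

    Hypothetical model for illustration only.
    """

    def __init__(self, title):
        self.title = title
        self.borrower = None     # reader currently holding the loan
        self.due = None          # moment the current loan lapses
        self.waitlist = deque()  # readers waiting, first come first served

    def _start_loan(self, reader, now):
        self.borrower = reader
        self.due = now + LOAN_PERIOD

    def _expire(self, now):
        # When the loan period has ended, the borrower loses access and the
        # copy passes to the next reader in line, if there is one.
        # Simplification: the next loan starts at the time of this check.
        if self.borrower is not None and now >= self.due:
            self.borrower = None
            self.due = None
            if self.waitlist:
                self._start_loan(self.waitlist.popleft(), now)

    def borrow(self, reader, now):
        """Return True if the loan starts now, False if the reader is queued."""
        self._expire(now)
        if self.borrower is None:
            self._start_loan(reader, now)
            return True
        if reader != self.borrower and reader not in self.waitlist:
            self.waitlist.append(reader)
        return False

    def can_read(self, reader, now):
        """Return True only while the reader holds an unexpired loan."""
        self._expire(now)
        return self.borrower == reader
```

In this sketch a second reader who calls borrow() while the book is out is simply queued, and gains access the first time the book is checked after the earlier loan has lapsed. A real system would also need returns before the due date, holds that time out, and the DRM enforcement itself, none of which is modeled here.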

While it is technically possible to break the digital protections of these technologies, it is illegal to do so. Moreover, the typical user does not do this, allowing for a flourishing ebook marketplace for current books. The Internet Archive is able to make available for loan older books that are not available in ebook format. In every case, an authorized print copy has been acquired and made unavailable for simultaneous loan.

Many of the books in our collection are ones that libraries consider historically important enough that they do not want to throw them away, but not worth keeping on their physical shelves. The digitized versions are therefore made available to a single user at a time, while the physical book no longer circulates. Since books lent using the controlled digital lending technologies are limited to one reader at a time, this approach works best for “long tail” books, books that are not available in other ways. Fortunately, many of these books are wonderful and important, and we are proud to bring them to a generation of digital learners who may not have physical access to major public libraries.

We hope many more libraries start controlled digital lending of their books as this is a way to bring public access to the purchases and collections they have built over centuries.

We have recently made available a small number of books (currently 61) published between 1923 and 1941 under a provision of US copyright law that was written to permit libraries to copy and lend titles that are no longer subject to commercial exploitation. Selection is currently overseen by lawyers expert in US copyright law.

As a completely separate service from buying ebooks and loaning to users with controlled digital lending, the Internet Archive offers free hosting for cultural works (texts, audio, moving images) that are uploaded by the general public. Millions of documents from court cases and digitized books from other projects, such as the Google book program and the Digital Library of India, have been uploaded over the years.

When a rights holder wants a work that was uploaded by a user taken down, a well-known “Notice and Takedown” procedure is in place. The Internet Archive takes prompt action and follows the procedure, generally resulting in the work being taken down.

Where is this all going? We are looking for partners and ideas to help bring more books to more people in more ways: more books (and more accessible books) for the print disabled, complete collections of books from the 20th century online and available, clickable footnotes for books cited in Wikipedia to bring up the full text on the right page, and many more books in bookstores and libraries. This generation of digital learners is looking for this, is expecting this. Collectively, libraries, booksellers, publishers, and authors, old and new, share these same interests. The good news is that the technologies are now available; we all have to do our part to serve digital learners everywhere.

As a library, we strive to provide “Universal Access to All Knowledge.” The digital technologies make this a feasible dream.  We are working with publishers, booksellers, authors, other libraries, and most of all digital learners to find balanced and respectful ways to try to achieve this goal. If you want to help, or have ideas on what we can do to get there, please let us know.

 

One thought on “Digital Books on archive.org”

  1. Dianne Nolin

    Internet Archive books are great for genealogy. I write a genealogy blog and find links for my readers to many books at Internet Archive that help researchers find their ancestors’ names in school registers, military books, herd books, society magazines, etc. One category that is really appreciated by many is the British Parliamentary Publications to do with Ireland, since most Irish records were destroyed in the uprising in 1922.

    I must take this opportunity to thank you for all you do, you are helping genealogists everywhere! Thanks!!!
