Open-Access Text Archives

International Libraries and the Internet Archive collaborate to build Open-Access Text Archives

Today, a number of international libraries have committed to putting their digitized books in open-access archives, starting with one at the Internet Archive. This approach will ensure permanent and public access to our published heritage. Anyone with an Internet connection will have access to these collections and to the growing set of tools for making use of them. In this way we are getting closer to the goal of Universal Access to All Knowledge.

By working with libraries from five countries, and working to expand this number, we are bringing a broad range of materials to every interested individual. This growing use of public archives marks a significant commitment to broad, public, and free access. Although the effort is still early in its evolution, works in dozens of languages are already stored in the Internet Archive’s Open-Access Text Archive, offering a breadth of materials to everyone.

Over one million books have been committed to the Text Archive. Currently over twenty-seven thousand are available, and an additional fifty thousand are expected in the first quarter of 2005. Advanced processing of these multilingual books will offer unprecedented access.

Researchers, scholars, and the general public will be able to use these collections in ways that have been familiar to library users for centuries: unfettered searching through catalogs, reading and annotating the books, and sharing pieces with colleagues. The public domain or appropriately licensed books will be viewed on-screen, searched, and printed for free using PDF and DjVu. By drawing on the book catalogs of the individual libraries, RLG (The Research Libraries Group, Inc.), and other catalogs, these books will be available to traditional library users without much retraining.

Technology allows us to provide enhanced access to these materials. A first step would be to offer, for public domain books, access similar to the trademarked Search Inside the Book system. Library users would then be able to find books that mention relevant words and phrases without the catalog having to reflect each topic.

Beyond those uses, however, we see a new type of library user: one who uses computers to analyze and compare materials. Imagine being able to analyze the changes to the English language over time. Imagine being able to use the hand-translated versions of past books as a way to train automatic translation technologies so we can more effectively translate any book into any language. Imagine being able to analyze the interrelation of papers through their footnotes and links to find new patterns of thought. Researchers are already pursuing each of these projects using the digital holdings of the Internet Archive. As public domain works are added to the Text Archive, these researchers will be able to work over a much broader range of texts.

Commercial companies are currently working with libraries to digitize materials as well. We encourage these efforts and hope most of these materials will also be available through Text Archives, which would be an alternative and complementary means of getting to these digital materials. Deeper and less commercial access can augment the commercial offerings. Through a public-private partnership, in which the libraries and archives provide deep academic and research access and the commercial companies offer packaged services, we can see a flowering of the public domain for many uses.

The Internet Archive hosts a Text Archive, working directly and indirectly with a number of other libraries and archives. At this time, the libraries that have committed to hosting books on the Internet Archive include:

* Carnegie Mellon University and the Million Book Project, USA (Raj Reddy)
* University of Toronto, Canada (Carole Moore)
* Library of Congress American Memory Project, USA (Deanna Marcum)
* McMaster University, Canada (Graham Hill)
* University of Ottawa, Canada (Leslie Weir)
* Bibliotheca Alexandrina, Egypt (Noha Adly)
* Indian Institute of Science, India (N. Balakrishnan)
* International Institute of Information Technology, India (Dr. Jawahar Lakshmi)
* Zhejiang University, China (Professor Zhao)
* European Archive, Netherlands (Julien Masans)
* Internet Archive, San Francisco USA (Brewster Kahle)

All libraries and archives are encouraged to join this free association by contacting the Internet Archive at

The Internet Archive’s activities are supported by such groups as the Hewlett Foundation, Sloan Foundation, Kahle/Austin Foundation, and National Science Foundation, and by grants and contracts from the Library of Congress, US National Archives, British National Archives, and Bibliothèque Nationale de France.