by Robert Miller, Global Director of Books, Internet Archive
“Reading a book from the inside out!” Well, not quite, but a new way to read our eBooks has just been launched. Check out this great BBC article:
Here is the fabulous Flickr Commons collection:
What is it and how did it get done?
A Yahoo research fellow at Georgetown University, Kalev Leetaru, extracted over 14 million images from 2 million Internet Archive public domain eBooks that span over 500 years of content. Because we have OCR’d the books, we have now been able to attach about 500 words before and after each image. This means you can now see, click and read about each image in the collection. Think full-text search of images!
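To make the "words before and after each image" idea concrete, here is a minimal Python sketch. It is not the actual extraction pipeline, which this post does not describe. It assumes the OCR output for a book is already available as a flat list of words, and that we know the position in that list where an image falls. The function name `context_window` and its parameters are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def context_window(words: List[str], image_pos: int, radius: int = 500) -> Tuple[str, str]:
    """Return (text_before, text_after) for an image in an OCR word stream.

    Assumption: `image_pos` is the index in `words` of the first word that
    follows the image, so words[:image_pos] all come before the image.
    """
    if not 0 <= image_pos <= len(words):
        raise ValueError("image_pos must lie within the word list")
    # Clamp the window so images near the start or end of a book still work.
    start = max(0, image_pos - radius)
    end = min(len(words), image_pos + radius)
    before = " ".join(words[start:image_pos])
    after = " ".join(words[image_pos:end])
    return before, after


# Tiny example with a radius of 3 words instead of 500.
page_words = "The telephone was first shown in 1876 and soon became common".split()
before, after = context_window(page_words, image_pos=4, radius=3)
print(before)
print(after)
```

Indexing the two returned strings alongside each image is what would let a text search for a word like "telephone" surface images that were printed near that word.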
How many images are there?
As of today, 2.6 million of the 14 million images have been uploaded to Flickr Commons. Soon we will be able to add continuously to this collection from the more than 1,000 new eBooks we scan each day. Dr. Simon Chaplin, Head of the Wellcome Library, says, “This way of discovering and reading a book will help transform our medical heritage collection as it goes up online. This is a big step forward and will bring digitized book collections to new audiences.”
What is fun to do with this collection?
Try typing in the word “telephone” and enjoy the images that appear. Curious about how death has been characterized over 500 years of images? Type in “mordis”. Feeling good about health care? Type in “medicine” and prepare to be amazed. Remember, all of these images are in the public domain!
We will be working with our wonderful friends at Flickr and our great Library partners to make this collection even more interesting – more images, more sub-collections and some very interesting ideas for how to use image recognition tools to help us learn more about, well, anything!
Questions about this collection, projects or things to come?
Email me at email@example.com