The Center for Intelligent Information Retrieval at UMass Amherst, the Perseus Digital Library Project at Tufts, and the Internet Archive are investigating large-scale information extraction and retrieval technologies for digitized book collections. The National Science Foundation (NSF) has awarded a $2.7 million grant for a project to apply advanced OCR, topic modeling, and metadata extraction techniques to more than one million books at the Internet Archive.