Category Archives: Open Library

Weaving Books into the Web—Starting with Wikipedia

[announcement video, Wired]

The Internet Archive has transformed 130,000 references to books in Wikipedia into live links to 50,000 digitized Internet Archive books in several Wikipedia language editions including English, Greek, and Arabic. And we are just getting started. By working with Wikipedia communities and scanning more books, both users and robots will link many more book references directly into Internet Archive books. In these cases, diving deeper into a subject will be a single click.

Moriel Schottlender, Senior Software Engineer, Wikimedia Foundation, speech announcing this program

“I want this,” said Brewster Kahle’s neighbor Carmen Steele, age 15. “At school I am allowed to start with Wikipedia, but I need to quote the original books. This allows me to do this even in the middle of the night.”

For example, the Wikipedia article on Martin Luther King, Jr. cites the book To Redeem the Soul of America, by Adam Fairclough. That citation now links directly to page 299 inside the digital version of the book provided by the Internet Archive. There are 66 cited and linked books on that article alone.

In the Martin Luther King, Jr. article of Wikipedia, page references can now take you directly to the book.

Readers can see a couple of pages to preview the book and, if they want to read further, they can borrow the digital copy using Controlled Digital Lending in a way that’s analogous to how they borrow physical books from their local library.

“What has been written in books over many centuries is critical to informing a generation of digital learners,” said Brewster Kahle, Digital Librarian of the Internet Archive. “We hope to connect readers with books by weaving books into the fabric of the web itself, starting with Wikipedia.”

You can help accelerate these efforts by sponsoring books or funding the effort. It costs the Internet Archive about $20 to digitize and preserve a physical book in order to bring it to Internet readers. The goal is to bring another 4 million important books online over the next several years.  Please donate or contact us to help with this project.

From a presentation on October 23, 2019 by Moriel Schottlender, Tech lead at the Wikimedia Foundation.

“Together we can achieve Universal Access to All Knowledge,” said Mark Graham, Director of the Internet Archive’s Wayback Machine. “One linked book, paper, web page, news article, music file, video and image at a time.”


Hamilton Public Library joins Open Libraries

In an effort to meet users’ growing needs around access to library materials, Hamilton Public Library has joined the Internet Archive’s Open Libraries program.

Hamilton Public Library’s Central Library, Hamilton, ON

“In today’s rapidly changing access to digital content, it is important that the broader library community works together to build lasting and growing collections of digital content for our customers and communities,” said Paul Takala, Chief Librarian and CEO of Hamilton Public Library. “The Internet Archive has developed a responsible and balanced controlled digital lending (CDL) model. I look forward to a future where all the unique titles we have in our collection – many of which are out of print – can be shared with researchers and learners everywhere. With the Internet Archive’s Open Libraries program, this future is now possible. I encourage all libraries to join this important effort.”

Internet Archive’s Open Libraries program uses controlled digital lending (CDL) to deliver nearly one million volumes of digitized texts to readers and researchers all over the world. Controlled digital lending is a process by which libraries can lend print books to patrons in digitized form, and has been described by copyright experts from major research libraries. Through CDL, libraries use controls to ensure an “owned-to-loaned” ratio, meaning the library circulates the exact number of copies of a specific title it owns, regardless of format, putting controls in place to prevent users from redistributing or copying the digitized version. CDL is not meant to replace existing licensing agreements for modern ebooks; instead, CDL helps libraries provide access to twentieth-century publications that don’t have an electronic equivalent.
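As a rough illustration of the “owned-to-loaned” ratio, here is a minimal Python sketch. It is not the Internet Archive’s lending code: the `Title` class, its fields, and the patron names are hypothetical, and it assumes the physical copies counted in `owned_copies` have been taken out of circulation so that only digital loans need to be tracked.

```python
from dataclasses import dataclass, field


@dataclass
class Title:
    """One title in a hypothetical CDL lending system."""
    owned_copies: int                              # physical copies the library owns
    borrowers: set = field(default_factory=set)    # patrons holding a digital loan

    def checkout(self, patron: str) -> bool:
        """Lend a digital copy only while loans do not exceed owned copies."""
        if patron in self.borrowers:
            return True   # patron already holds a loan; nothing new circulates
        if len(self.borrowers) >= self.owned_copies:
            return False  # every owned copy is already out; patron must wait
        self.borrowers.add(patron)
        return True

    def checkin(self, patron: str) -> None:
        """Return a loan, freeing a copy for the next reader."""
        self.borrowers.discard(patron)
```

With `Title(owned_copies=2)`, a third simultaneous checkout is refused until one of the first two readers returns the book, which is the behavior the ratio is meant to guarantee.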

Participating in Open Libraries is easy.  After signing on to the program, libraries share their catalogs with Internet Archive and our engineers perform an overlap analysis to determine the physical books in a library’s collection that match the books we have digitized.  Where there’s a match, the Internet Archive returns links and catalog records to the digital book so that the library can include these in their catalog.
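The overlap analysis can be pictured with a short Python sketch. This is an assumption-laden illustration, not the engineers’ actual process: it assumes records are matched on ISBN, the function names and the `digitized` mapping are hypothetical, and the item identifiers are placeholders that follow the `https://archive.org/details/...` pattern used elsewhere on this page.

```python
def normalize_isbn(raw: str) -> str:
    """Strip hyphens and spaces and uppercase a trailing 'x' check digit."""
    return raw.replace("-", "").replace(" ", "").upper()


def overlap(library_isbns, digitized):
    """Return {isbn: link} for library holdings that have a digitized match.

    `digitized` maps a normalized ISBN to a hypothetical archive.org identifier.
    """
    matches = {}
    for raw in library_isbns:
        isbn = normalize_isbn(raw)
        if isbn in digitized:
            matches[isbn] = f"https://archive.org/details/{digitized[isbn]}"
    return matches
```

A real analysis would also have to reconcile ISBN-10 and ISBN-13 forms and handle records without ISBNs; the sketch leaves those out to keep the matching step visible.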

If your library is interested in learning more about the Open Libraries program, please consider joining one of our upcoming webinars:

Tuesday, June 18, 2019
11:00 AM – 12:00 PM CDT
Register for the free webinar

Wednesday, July 10, 2019
11:00 AM – 12:00 PM CDT
Register for the free webinar

Revised wish list now available: 1.5M books we want

Earlier this year we released our Open Libraries wish list, which brought together four datasets to help inform our collection development priorities for Open Libraries.  After working with the wish list for a few months and reviewing our approach, we decided to make a few revisions to the ways in which we brought together the data.  Our wish list was always intended to be an iterative work-in-progress, and we are pleased to release our latest version here: https://archive.org/details/open_libraries_wish_list

Download wish list now

What’s in the wish list?

To create the wish list, we brought together four datasets:

  1. OCLC’s list of one million most widely held books, based on holdings records of libraries worldwide;
  2. Library Link’s holdings records of North American libraries, leveraging the decisions of thousands of librarians in prioritizing collections for patron use;
  3. Open Syllabus Project, which has collected syllabi from the Internet to compile the most assigned books in classrooms;
  4. Data about book and scholarly article citations in Wikipedia, published by the Wikimedia Foundation.

These data help us define a collection of 1.5M books, identified by their ISBNs, that are widely held and frequently cited.  We continue to work on human-mediated efforts to identify collections that are reflective of the diverse voices in our communities.

What’s changed?

In this latest revision to the wish list, we decided to keep the focus on materials that are widely held and widely cited by fine-tuning the thresholds for inclusion on the list in the following ways:

  • In our previous wish list, we had included xISBN “synonyms” to the ISBNs on the list as a way of increasing the breadth of materials, but realized that approach created scenarios where we could have digitized a different edition than the one cited by a Wikipedia editor, or included on a syllabus.  In the latest revision, we chose to include only the ISBNs included on each list.
  • We also revised our approach to the Wikipedia citations, including those books that had been cited more than once.

This latest revision gives us a wish list comprising 1.5M ISBNs that we feel confident in using as a core collection around which to focus our acquisition and digitization priorities.
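To make the revised thresholds concrete, here is a small Python sketch of one way such a list could be assembled. The post does not say exactly how the four datasets are combined, so the union step, the function name `build_wish_list`, and the `min_citations` parameter are all assumptions made for illustration.

```python
from collections import Counter


def build_wish_list(source_lists, wikipedia_citations, min_citations=2):
    """Combine ISBN lists into one de-duplicated wish list.

    source_lists: iterables of ISBNs (e.g. holdings or syllabus lists).
    wikipedia_citations: iterable with one ISBN per citation, repeats allowed.
    """
    wish_list = set()
    for isbns in source_lists:
        # Keep only the ISBNs that appear on the list itself (no xISBN synonyms).
        wish_list.update(isbns)
    # Count citations per ISBN and keep those cited more than once.
    counts = Counter(wikipedia_citations)
    wish_list.update(isbn for isbn, n in counts.items() if n >= min_citations)
    return wish_list
```

Because no xISBN expansion is applied, an edition lands on the list only if that exact ISBN was held, assigned, or cited, which is the point of the revision described above.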

How can you help?

If you’d like to help us build our digital collection, you can contribute in the following ways:

  • Donate books
    • You can donate books to our physical archive. If you are a library, a publisher, or have a private collection with more than 1,000 books to donate, please contact Chris Freeland, Director of Open Libraries, at chrisfreeland@archive.org. We will add these books to our digitization queue and they will become ebooks available through Open Libraries as funding becomes available.
  • Identify books
    • If you are an author who would like to add your own books to the list, you can donate physical copies, and/or contact us to let us know you’d like us to ensure that your work will be preserved and available to future generations.
    • If you’re a librarian, educator, or other book lover and would like to help us continue to curate the wish list to ensure that it includes the most useful, important and culturally diverse books, please reach out to us.
  • Scan books
    • If you have books on our wish list but don’t want to donate them to our physical archive, we offer scanning services and can digitize your books in one of our regional scanning centers.

 

If you are interested in participating, or have questions about our program or plans, please contact Chris Freeland, Director of Open Libraries, at chrisfreeland@archive.org.

The 20th Century Time Machine

by Nancy Watzman & Katie Dahl

Jason Scott

With the turn of a dial, some flashing lights, and the requisite puff of fog, emcees Tracey Jaquith, TV Architect, and Jason Scott, Free Range Archivist, cranked up the Internet Archive 20th Century Time Machine on stage before a packed house at the Internet Archive’s annual party on October 11.

Eureka! The cardboard contraption worked! The year was 1912, and out stepped Alexis Rossi, director of Media and Access, her hat adorned with a 78rpm record.

1912

D’Anna Alexander (center) with her mother (right) and grandmother (left).

“Close your eyes and listen,” Rossi asked the audience. And then, out of the speakers floated the scratchy sounds of Billy Murray singing “Low Bridge, Everybody Down” written by Thomas S. Allen. From 1898 to the 1950s, some three million recordings of about three minutes each were made on 78rpm discs. But these discs are now brittle, the music stored on them precious. The Internet Archive is working with partners on the Great 78 Project to store these recordings digitally, so that we and future generations can enjoy them and reflect on our music history. New collections include the Tina Argumedo and Lucrecia Hug 78rpm Collection of dance music collected in Argentina in the mid-1930s.

1927

Next to emerge from the Time Machine was David Leonard, president of the Boston Public Library, which was the first free, municipal library founded in the United States. The mission was and remains bold: make knowledge available to everyone. Knowledge shouldn’t be hidden behind paywalls, restricted to the wealthy but rather should operate under the principle of open access as public good, he explained. Leonard announced that the Boston Public Library would join the Internet Archive’s Great 78 Project, by authorizing the transfer of 200,000 individual 78s and LPs to preserve and make accessible to the public, “a collection that otherwise would remain in storage unavailable to anyone.”

David Leonard and Brewster Kahle

Brewster Kahle, founder and Digital Librarian of the Internet Archive, then came through the time machine to present the Internet Archive Hero Award to Leonard. “I am inspired every time I go through the doors,” said Kahle of the library, noting that the Boston Public Library was the first to digitize not just a presidential library, of John Quincy Adams, but also modern books.  Leonard was presented with a tablet imprinted with the Boston Public Library homepage by Internet Archive 2017 Artist in Residence, Jeremiah Jenkins.

1942

Kahle then set the Time Machine to 1942 to explain another new Internet Archive initiative: liberating books published between 1923 and 1941. Working with Elizabeth Townsend Gard, a copyright scholar at Tulane University, the Internet Archive is liberating these books under a little known, and perhaps never used, provision of US copyright law, Section 108h, which allows libraries to scan and make available materials published 1923 to 1941 if they are not being actively sold. The name of the new collection: the Sonny Bono Memorial Collection, named for the now deceased congressman who led the passage of the Copyright Term Extension Act of 1998, which included the 108h provision as a “gift” to libraries.

One of these books is “Your Life,” a tome written by Kahle’s grandfather, Douglas E. Lurton, a “guide to a desirable living.” “I have one copy of this book and two sons. According to the law, I can’t make one copy and give it to the other son. But now it’s available,” Kahle explained.

1944

Sab Masada

The Time Machine cranked to 1944, and out came Rick Prelinger, Internet Archive Board member, archivist, and filmmaker. Prelinger introduced a new addition to the Internet Archive’s film collection: long-forgotten footage of an Arkansas Japanese internment camp from 1944. As the film played on the screen, Prelinger welcomed Sab Masada, 87, who lived at this very camp as a 12-year-old.

Masada talked about his experience at the camp and why it is important for people today to remember it. “Since the election I’ve heard echoes of what I heard in 1942,” Masada said. “Using fear of terrorism to target the Muslims and people south of the border.”

1972

Next to speak was Wendy Hanamura, the director of partnerships. Hanamura explained how as a sixth grader she discovered a book at the library, Executive Order 9066, published in 1972, which chronicled photos of Japanese internment camps during World War II.

“Before I was an internet archivist, I was a daughter and granddaughter of American citizens who were locked up behind barbed wire in the same kind of camps that incarcerated Sab,” said Hanamura. That one book – now out of print – helped her understand what had happened to her family.

Inspired by making it to the semi-final round of the MacArthur 100&Change initiative with a proposal that provides libraries and learners with free digital access to four million books, the Internet Archive is forging ahead with plans, despite not winning the $100 million grant. Among the books the Internet Archive is making available: Executive Order 9066.

1985

The year display turned to 1985, and Jason Scott reappeared on stage, explaining his role as a software curator. New this year to the Internet Archive are collections of early Apple software, he explained, with browser emulation allowing the user to experience just what it was like to fire up a Macintosh computer back in its heyday. This includes a collection of the then wildly popular “HyperCards,” a programmatic tool that enabled users to create programs that linked materials in creative ways, before the rise of the World Wide Web.

1997

Vinay Goel

After this tour through the 20th century, the Time Machine was set to 1997. Mark Graham, Director of the Wayback Machine, and Vinay Goel, Senior Data Engineer, stepped on stage. Back in 1997, when the Wayback Machine began archiving websites on the still new World Wide Web, the entire thing amounted to 2.2 terabytes of data. Now the Wayback Machine contains 20 petabytes. Graham explained how the Wayback Machine is preserving tweets, government websites, and other materials that could otherwise vanish. One example: this report from The Rachel Maddow Show, which aired on December 16, 2016, about Michael Flynn, then slated to become National Security Advisor. Flynn deleted a tweet he had made linking to a falsified story about Hillary Clinton, but the Internet Archive saved it through the Wayback Machine.

Goel took the microphone to announce new improvements to Wayback Machine Search 2.0. Now it’s possible to search for keywords, such as “climate change,” and find not just web pages from a particular time period mentioning these words, but also different format types — such as images, pdfs, or yes, even an old Internet Archive favorite, animated gifs from the now-defunct GeoCities–including snow globes!

Thanks to all who came out to celebrate with the Internet Archive staff and volunteers, or watched online. Please join our efforts to provide Universal Access to All Knowledge, whatever century it is from.

Editor’s Note, 10/16/17: Watch the full event https://archive.org/details/youtube-j1eYfT1r0Tc  

 

Syncing Catalogs with thousands of Libraries in 120 Countries through OCLC

We are pleased to announce that the Internet Archive and OCLC have agreed to synchronize the metadata describing our digital books with OCLC’s WorldCat. WorldCat is a union catalog that itemizes the collections of thousands of libraries in more than 120 countries that participate in the OCLC global cooperative.

What does this mean for readers?
When the synchronization work is complete, library patrons will be able to discover the Internet Archive’s collection of 2.5 million digitized monographs through the libraries around the world that use OCLC’s bibliographic services. Readers searching for a particular volume will know that a digital version of the book exists in our collection. With just one click, readers will be taken to archive.org to examine and possibly borrow the digital version of that book. In turn, readers who find a digital book at archive.org will be able, with one click, to discover the nearest library where they can borrow the hard copy.

There are additional benefits: in the process of the synchronization, OCLC databases will be enriched with records describing books that may not yet be represented in WorldCat.

“This work strengthens the Archive’s connection to the library community around the world. It advances our goal of universal access by making our collections much more widely discoverable. It will benefit library users around the globe by giving them the opportunity to borrow digital books that might not otherwise be available to them,” said Brewster Kahle, Founder and Digital Librarian of the Internet Archive. “We’re glad to partner with OCLC to make this possible and look forward to other opportunities this synchronization will present.”

“OCLC is always looking for opportunities to work with partners who share goals and objectives that can benefit libraries and library users,” said Chip Nilges, OCLC Vice President, Business Development. “We’re excited to be working with Internet Archive, and to make this valuable content discoverable through WorldCat. This partnership will add value to WorldCat, expand the collections of member libraries, and extend the reach of Internet Archive content to library users everywhere.”

We believe this partnership will be a win-win-win for libraries and for learners around the globe.

Better discovery, richer metadata, more books borrowed and read.

Read the OCLC press release.

Books from 1923 to 1941 Now Liberated!

[press: boingboing]

The Internet Archive is now leveraging a little known, and perhaps never used, provision of US copyright law, Section 108h, which allows libraries to scan and make available materials published 1923 to 1941 if they are not being actively sold. Elizabeth Townsend Gard, a copyright scholar at Tulane University, calls this “Library Public Domain.” She and her students helped make the first scanned books of this era available online in a collection named for the author of the bill making this necessary: The Sonny Bono Memorial Collection. Thousands more books will be added in the near future as we automate. We hope this will encourage libraries that have been reluctant to scan beyond 1923 to start mass scanning their books and other works, at least up to 1942.

While good news, it is too bad it is necessary to use this provision.

Trend of Maximum U.S. General Copyright Term by Tom W Bell

If the Founding Fathers had their way, almost all works from the 20th century would be public domain by now (14-year copyright term, renewable once if you took extra actions).

Some corporations saw adding works to the public domain as a problem, and when Sonny Bono got elected to the House of Representatives, representing Riverside County, near Los Angeles, he helped push through a law extending copyright’s duration another 20 years to keep things locked up back to 1923. This has been called the Mickey Mouse Protection Act due to one of the motivators behind the law, but it was also a result of Europe extending copyright terms an additional twenty years first. If not for this law, works from 1923 and beyond would have been in the public domain decades ago.

Lawrence Lessig

Creative Commons founder Larry Lessig fought the new law in court as unreasonable, unneeded, and ridiculous. In support of Lessig’s fight, the Internet Archive made an Internet bookmobile to celebrate what could be done with the public domain. We drove the bookmobile across the country to the Supreme Court to make books during the hearing of the case. Alas, we lost.

Internet Archive Bookmobile in front of
Carnegie Library in Pittsburgh: “Free to the People”

There is, however, an exemption from this extension of copyright, though only for libraries and only for works that are not actively for sale: we can scan them and make them available. Professor Townsend Gard had two legal interns work with the Internet Archive last summer to find how we can automate finding appropriate scanned books that could be liberated, and they hand-vetted the first books for the collection. Professor Townsend Gard has just released an in-depth paper giving libraries guidance as to how to implement Section 108(h) based on her work with the Archive and other libraries. Together, we have called them “Last Twenty” Collections, as libraries and archives can copy and distribute to the general public qualified works in the last twenty years of their copyright.

Today we announce the “Sonny Bono Memorial Collection” containing the first books to be liberated. Anyone can download, read, and enjoy these works that have been long out of print. We will add another 10,000 books and other works in the near future. “Working with the Internet Archive has allowed us to do the work to make this part of the law usable,” reflected Professor Townsend Gard. “Hopefully, this will be the first of many ‘Last Twenty’ Collections around the country.”

Now is the chance for libraries and citizens who have been reluctant to scan works beyond 1923 to push forward to 1941, and the Internet Archive will host them. “I’ve always said that the silver lining of the unfortunate Eldred v. Ashcroft decision was the response from people to do something, to actively begin to limit the power of the copyright monopoly through action that promoted open access and CC licensing,” says Carrie Russell, Director of ALA’s Program of Public Access to Information. “As a result, the academy and the general public has rediscovered the value of the public domain. The Last Twenty project joins the Internet Archive, the HathiTrust copyright review project, and the Creative Commons in amassing our public domain to further new scholarship, creativity, and learning.”

We thank and congratulate Team Durationator and Professor Townsend Gard for all the hard work that went into making this new collection possible. Professor Townsend Gard, along with her husband, Dr. Ron Gard, have started a company, Limited Times, to assist libraries, archives, and museums implementing Section 108(h), “Last Twenty” collections, and other aspects of the copyright law.

Prof. Elizabeth Townsend Gard

Tomi Aina, Law Student

Stan Sater, Law Student

Hundreds of thousands of books can now be liberated. Let’s bring the 20th century to 21st-century citizens. Everyone, rev your cameras!

MacArthur Foundation’s $100 Million Award Finalists

Today, the MacArthur Foundation announced the finalists for its 100&Change competition, awarding a single organization $100 million to solve one of the world’s biggest problems. The Internet Archive’s Open Libraries project, one of eight semifinalists, did not make the cut to the final round. Today we want to congratulate the 100&Change finalists and thank the MacArthur Foundation for inspiring us to think big. For the last 15 months, the Internet Archive team has been building the partnerships that can transform US libraries for the digital age and put millions of ebooks in the hands of more than a billion learners. We’ve collaborated with the world’s top copyright experts to clarify the legal framework for libraries to digitize and lend their collections. And we’ve learned an amazing amount from the leading organizations serving the blind and people with disabilities that impact reading.

To us, that feels like a win.

In the words of MacArthur Managing Director, Cecilia Conrad:

The Internet Archive project will unlock and make accessible bodies of knowledge currently located on library shelves across the country. The proposal for curation, with the selection of books driven not by commercial interests but by intellectual and cultural significance, is exciting. Though the legal theory regarding controlled digital lending has not been tested in the courts, we found the testimony from legal experts compelling. The project has an experienced, thoughtful and passionate team capable of redefining the role of the public library in the 21st Century.

Copyright scholar and Berkeley Law professor, Pam Samuelson (center), convenes a gathering of more than twenty legal experts to help clarify the legal basis for libraries digitizing and lending physical books in their collections.

So, the Internet Archive and our partners are continuing to build upon the 100&Change momentum. We are meeting October 11-13 to refine our plans, and we invite interested stakeholders to join us at the Library Leaders Forum. If you are a philanthropist interested in leveraging technology to provide more open access to information—well, we have a project for you.

For 20 years, at the Internet Archive we have passionately pursued one goal: providing universal access to knowledge. But there is almost a century of books missing from our digital shelves, beyond the reach of so many who need them. So we cannot stop. We now have the technology, the partners and the plan to transform library hard copies into digital books and lend them as libraries always have. So all of us building Open Libraries are moving ahead.

Members of the Open Libraries Team at the Internet Archive headquarters, part of a global movement to provide more equitable access to knowledge.

Remember: a century ago, Andrew Carnegie funded a vast network of public libraries because he recognized democracy can only exist when citizens have equal access to diverse information. Libraries are more important than ever, welcoming all of society to use their free resources, while respecting readers’ privacy and dignity. Our goal is to build an enduring asset for libraries across this nation, ensuring that all citizens—including our most vulnerable—have equal and unfettered access to knowledge.

Thank you, MacArthur Foundation, for inspiring us to turn that idea into a well thought-out project.

Onward!

–The Open Libraries Team

Open Library New Features and Fixes

The Open Library team has added pages for 200,000 new modern works and rolled out a brigade of fixes and features.

screen shot of book reader

Prioritized by feedback from Open Library patrons, the changes include:

  • Full-text search through all books hosted on the Internet Archive is back online and is faster than ever. You can try the new feature, for example, to see over 115,000 places where works reference Benjamin Franklin’s maxim: “Little strokes fell great oaks”.
  • Updated new Book Reader, which looks great on mobile devices and provides a much clearer and simpler book borrowing experience. Try out the new Book Reader and see for yourself!

There are a few small changes in the BookReader that we think you’ll like specifically. EPUB and PDF loans can be initiated from within an existing BookReader loan. What this means for Open Library users is two pretty cool things you’ve long requested:

  • Users who start loans from the BookReader can borrow either EPUB or PDF formats, and switch formats during the loan period.
  • Users who start loans from the BookReader can return loans early, even EPUBs and PDFs.

 

screen shot showing onscreen areas to download and return books

We hope these changes will delight readers, empower developers, and help the community to make even more quality contributions. The path ahead looks even more promising. With clear direction and exciting redesign concepts in the works, the Open Library team is eager to bring you an Open Library at the cutting edge of the 21st century while giving you access to five centuries of texts.

image from old reading textbook

Thank you to Jessamyn West, Brenton Cheng, Mek Karpeles, Giovanni Damiola, Richard Carceres, and the many volunteers in the community.

[from the Open Library blog]

Sharing Data for Better Discovery and Access

The Internet Archive and the Digital Public Library of America (DPLA) are pleased to announce a joint program to enhance sharing of collections from the Internet Archive in the DPLA.

The Internet Archive will work with interested libraries and content providers to help ensure their metadata meets DPLA’s standards and requirements. After their content is digitized, the metadata would then be ready for ingestion into the DPLA if the content provider has a current DPLA provider agreement.

The DPLA is excited to collaborate with the Internet Archive in this effort to improve metadata quality overall, by making it more consistent with DPLA requirements, including consistent rights statements. Better data means better access. In addition to providing DPLA-compliant metadata services, the Internet Archive also offers a spectrum of digital collection services, such as digitization, storage and preservation. Libraries, archives and museums that choose Internet Archive as their service provider have the added benefit of having their content made globally available through Internet Archive’s award-winning portals, OpenLibrary.org and Archive.org.

“We are thrilled to be working with the DPLA,” states Robert Miller, Internet Archive General Manager of Digital Libraries. “With their emphasis on providing not only a portal and a platform, but also their advocacy for public access of content, they are a perfect partner for us.”

Rachel Frick, DPLA Business Development Director, says, “The Internet Archive’s mission of ‘Universal Access to All Knowledge,’ coupled with their end-to-end digital library solutions, complements our core values.”

Program details are available upon request. Please contact:
Rachel Frick – DPLA Business Development Director, Rachel@dp.la
Robert Miller – General Manager of Digital Libraries, robert@archive.org