Author Archives: Brewster Kahle

Can You Help us Make the 19th Century Searchable?

Posted on August 21, 2020 by Brewster Kahle

In 1847, Frederick Douglass started a newspaper advocating the abolition of slavery that ran until 1851. After the Civil War, there was a newspaper for freed slaves, the Freedmen’s Record. The Internet Archive is bringing these and many more works online for free public access. But there’s a problem:

Our Optical Character Recognition (OCR), while the best commercially available OCR technology, is not very good at identifying text from older documents.

Take, for example, this newspaper from 1847. The images are not that great, but a person can read them:

The problem is our computers’ optical character recognition tech gets it wrong, and the columns get confused.

What we need is “Culture Tech” (a riff on fintech, or biotech) and Culture Techies to work on important and useful projects–the things we need, but are probably not going to get gushers of private equity interest to fund. There are thousands of professionals taking on similar challenges in the field of digital humanities and we want to complement their work with industrial-scale tech that we can apply to cultural heritage materials.

One such project would be to work on technologies to bring 19th-century documents fully digital. We need to improve OCR to enable full text search, but we also need help segmenting documents into columns and articles. The Internet Archive has lots of test materials and thousands are uploading more documents all the time.
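To make the column-segmentation problem concrete, here is a minimal sketch of one classic approach, a vertical projection profile. It is only an illustration and not the Internet Archive's pipeline: the function name `find_columns`, the `min_gutter` parameter, and the tiny hand-made page are all assumptions, and real 19th-century scans would first need deskewing, denoising, and binarization.

```python
from typing import List, Tuple


def find_columns(page: List[List[int]], min_gutter: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) x-ranges (end exclusive) of text columns.

    page[y][x] is 1 for an ink pixel and 0 for background (hypothetical input).
    A gutter is a run of at least `min_gutter` consecutive ink-free x positions.
    """
    if not page:
        return []
    width = len(page[0])
    # Vertical projection profile: number of ink pixels at each x position.
    profile = [sum(row[x] for row in page) for x in range(width)]

    columns = []
    start = None    # x where the current text column began
    blank_run = 0   # consecutive ink-free x positions seen so far
    for x, ink in enumerate(profile):
        if ink > 0:
            if start is None:
                start = x
            blank_run = 0
        else:
            blank_run += 1
            if start is not None and blank_run >= min_gutter:
                # The column ended where this blank run began.
                columns.append((start, x - blank_run + 1))
                start = None
    if start is not None:
        # Close the last column, trimming any short trailing blank run.
        columns.append((start, width - blank_run))
    return columns


# A toy 3-row "page": two blocks of ink separated by a 3-pixel gutter.
page = [
    [1, 1, 0, 1, 0, 0, 0, 1, 1, 1],
    [1, 0, 0, 1, 0, 0, 0, 0, 1, 1],
    [0, 1, 1, 1, 0, 0, 0, 1, 0, 1],
]
columns = find_columns(page)
```

Each detected range could then be cropped and sent to OCR separately, so that text is read down one column before moving to the next. Splitting columns into individual articles is a harder problem that this sketch does not attempt.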

What we do not have is a good way to integrate work on these projects with the Internet Archive’s processing flow. So we need help and ideas there as well.

Maybe we can host an “Archive Summer of CultureTech” or something…Just ideas. Maybe working with a university department that would want to build programs and classes around Culture Tech… If you have ideas or skills to contribute, please post a comment here or send an email to info@archive.org with some of this information.

Libraries lend books, and must continue to lend books: Internet Archive responds to publishers’ lawsuit

Posted on July 29, 2020 by Brewster Kahle

Yesterday, the Internet Archive filed our response to the lawsuit brought by four commercial publishers to end the practice of Controlled Digital Lending (CDL), the digital equivalent of traditional library lending. CDL is a respectful and secure way to bring the breadth of our library collections to digital learners. Commercial ebooks, while useful, only cover a small fraction of the books in our libraries. As we launch into a fall semester that is largely remote, we must offer our students the best information to learn from—collections that were purchased over centuries and are now being digitized. What is at stake with this lawsuit? Every digital learner’s access to library books. That is why the Internet Archive is standing up to defend the rights of hundreds of libraries that are using Controlled Digital Lending.

The publishers’ lawsuit aims to stop the longstanding and widespread library practice of Controlled Digital Lending, and stop the hundreds of libraries using this system from providing their patrons with digital books. Through CDL, libraries lend a digitized version of the physical books they have acquired as long as the physical copy doesn’t circulate and the digital files are protected from redistribution. This is how Internet Archive’s lending library works, and has for more than nine years. Publishers are seeking to shut this library down, claiming copyright law does not allow it. Our response is simple: Copyright law does not stand in the way of libraries’ rights to own books, to digitize their books, and to lend those books to patrons in a controlled way.

“The Authors Alliance has several thousand members around the world and we have endorsed the Controlled Digital Lending as a fair use,” stated Pamela Samuelson, Authors Alliance founder and Richard M. Sherman Distinguished Professor of Law at Berkeley Law. “It’s really tragic that at this time of pandemic that the publishers would try to basically cut off even access to a digital public library like the Internet Archive…I think that the idea that lending a book is illegal is just wrong.”

These publishers clearly intend this lawsuit to have a chilling effect on Controlled Digital Lending at a moment in time when it can benefit digital learners the most. For students and educators, the 2020 fall semester will be unlike any other in recent history. From K-12 schools to universities, many institutions have already announced they will keep campuses closed or severely limit access to communal spaces and materials such as books because of public health concerns. The conversation we must be having is: how will those students, instructors and researchers access information — from textbooks to primary sources? Unfortunately, four of the world’s largest book publishers seem intent on undermining both libraries’ missions and our attempts to keep educational systems operational during a global health crisis.

Ten percent of the world’s population experience disabilities that impact their ability to read. For these learners, digital books are a lifeline. The publishers’ lawsuit against the Internet Archive calls for the destruction of more than a million digitized books.

The publishers’ lawsuit does not stop at seeking to end the practice of Controlled Digital Lending. These publishers call for the destruction of the 1.5 million digital books that Internet Archive makes available to our patrons. This form of digital book burning is unprecedented and unfairly disadvantages people with print disabilities. For the blind, ebooks are a lifeline, yet fewer than one in ten exists in an accessible format. Since 2010, Internet Archive has made our lending library available to the blind and print disabled community, in addition to sighted users. If the publishers are successful with their lawsuit, more than a million of those books would be deleted from the Internet’s digital shelves forever.

I call on the executives at Hachette, HarperCollins, Wiley, and Penguin Random House to come together with us to help solve the pressing challenges to access to knowledge during this pandemic. Please drop this needless lawsuit.

Libraries have been bringing older books to digital learners: Four publishers sue to stop it

Posted on July 22, 2020 by Brewster Kahle

I wanted to share my thoughts in response to the lawsuit against the Internet Archive filed on June 1 by the publishers Hachette, HarperCollins, Wiley, and Penguin Random House.

I founded the Internet Archive, a non-profit library, 24 years ago as we brought the world digital. As a library we collect and preserve books, music, video and webpages to make a great Internet library.

We have had the honor to partner with over 1,000 different libraries, such as the Library of Congress and the Boston Public Library, to accomplish this by scanning books and collecting webpages and more. In short, the Internet Archive does what libraries have always done: we buy, collect, preserve, and share our common culture.

But remember March of this year—we went home on a Friday and were told our schools were not reopening on Monday. We got cries for help from teachers and librarians who needed to teach without physical access to the books they had purchased.

Over 130 libraries endorsed lending books from our collections, and we used Controlled Digital Lending technology to do it in a controlled, respectful way. We lent books that we own, both at the Internet Archive and at the other endorsing libraries. These books were purchased and we knew they were not circulating physically. They were all locked up. In total, 650 million books were locked up in public libraries alone. Because of that, we felt we could, and should, and needed to make the digitized versions of those books available to students in a controlled way to help during a global emergency. As the emergency receded, we knew libraries could return to loaning physical books and the books would be withdrawn from digital circulation. It was a lending system that we could scale up immediately and then shut back down again by June 30th.

And then, on June 1st, we were sued by four publishers. They demanded that we stop lending digitized books in general, and they also demanded that we permanently destroy millions of digital books. Even though the temporary National Emergency Library was closed before June 30th, the planned end date, and we are back to traditional controlled digital lending, the publishers have not backed down.

Schools and libraries are now preparing for a “Digital Fall Semester” for students all over the world, and the publishers are still suing.

Please remember that what libraries do is Buy, Preserve, and Lend books.

Controlled Digital Lending is a respectful and balanced way to bring our print collections to digital learners. A physical book, once digital, is available to only one reader at a time. Going on for nine years and now practiced by hundreds of libraries, Controlled Digital Lending is a longstanding, widespread library practice.

What is at stake with this suit may sound insignificant (that it is just Controlled Digital Lending), but please remember that this is fundamental to what libraries do: buy, preserve, and lend.

With this suit, the publishers are saying that in the digital world, we cannot buy books anymore, we can only license and on their terms; we can only preserve in ways for which they have granted explicit permission, and for only as long as they grant permission; and we cannot lend what we have paid for because we do not own it. This is not the rule of law; this is rule by license. This does not make sense.

We say that libraries have the right to buy books, preserve them, and lend them even in the digital world. This is particularly important with the books that we own physically, because learners now need them digitally.

This lawsuit is already having a chilling impact on the Digital Fall Semester we’re about to embark on. The stakes are high for so many students who will be forced to learn at home via the Internet or not learn at all.

Librarians, publishers, authors—all of us—should be working together during this pandemic to help teachers, parents and especially the students.

I call on the executives at Hachette, HarperCollins, Wiley, and Penguin Random House to come together with us to help solve the pressing challenges to access to knowledge during this pandemic.

Please drop this needless lawsuit.

–Brewster Kahle, July 22, 2020

Temporary National Emergency Library to close 2 weeks early, returning to traditional controlled digital lending

Posted on June 10, 2020 by Brewster Kahle

Within a few days of the announcement that libraries, schools and colleges across the nation would be closing due to the COVID-19 global pandemic, we launched the temporary National Emergency Library to provide books to support emergency remote teaching, research activities, independent scholarship, and intellectual stimulation during the closures.

We have heard hundreds of stories from librarians, authors, parents, teachers, and students about how the NEL has filled an important gap during this crisis.

Ben S., a librarian from New Jersey, for example, told us that he used the NEL “to find basic life support manuals needed by frontline medical workers in the academic medical center I work at. Our physical collection was closed due to COVID-19 and the NEL allowed me to still make available needed health informational materials to our hospital patrons.” We are proud to aid frontline workers.

Today we are announcing the National Emergency Library will close on June 16th, rather than June 30th, returning to traditional controlled digital lending. We have learned that the vast majority of people use digitized books on the Internet Archive for a very short time. Even with the closure of the NEL, we will be able to serve most patrons through controlled digital lending, in part because of the good work of the non-profit HathiTrust Digital Library. HathiTrust’s new Emergency Temporary Access Service features a short-term access model that we plan to follow.

We moved up our schedule because, last Monday, four commercial publishers chose to sue Internet Archive during a global pandemic. However, this lawsuit is not just about the temporary National Emergency Library. The complaint attacks the concept of any library owning and lending digital books, challenging the very idea of what a library is in the digital world. This lawsuit stands in contrast to some academic publishers who initially expressed concerns about the NEL, but ultimately decided to work with us to provide access to people cut off from their physical schools and libraries. We hope that similar cooperation is possible here, and that the publishers call off their costly assault.

Controlled digital lending is how many libraries have been providing access to digitized books for nine years. Controlled digital lending is a legal framework, developed by copyright experts, where one reader at a time can read a digitized copy of a legally owned library book. The digitized book is protected by the same digital protections that publishers use for the digital offerings on their own sites. Many libraries, including the Internet Archive, have adopted this system since 2011 to leverage their investments in older print books in an increasingly digital world.
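The one-reader-per-owned-copy rule can be sketched in a few lines. This is an illustrative model only, not the Internet Archive's lending software: the class `ControlledTitle`, its fields, and the patron names are all hypothetical, and the sketch leaves out loan periods, waitlists, and the file protections mentioned above.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ControlledTitle:
    """One title under an owned-to-loaned limit (hypothetical model)."""
    owned_copies: int                        # physical copies held out of circulation
    borrowers: Set[str] = field(default_factory=set)

    def checkout(self, patron: str) -> bool:
        """Lend a digital copy only if an owned copy is still unlent."""
        if patron in self.borrowers:
            return True                      # patron already holds a loan
        if len(self.borrowers) >= self.owned_copies:
            return False                     # every owned copy is on loan
        self.borrowers.add(patron)
        return True

    def give_back(self, patron: str) -> None:
        """Return a loan, freeing the copy for the next reader."""
        self.borrowers.discard(patron)


title = ControlledTitle(owned_copies=1)
first = title.checkout("reader-a")    # the single copy is free
second = title.checkout("reader-b")   # refused while reader-a holds it
title.give_back("reader-a")
third = title.checkout("reader-b")    # succeeds after the return
```

The key invariant is that the number of simultaneous digital loans never exceeds the number of copies the library owns.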

We are now all Internet-bound and flooded with misinformation and disinformation—to fight these we all need access to books more than ever. To get there we need collaboration between libraries, authors, booksellers, and publishers.

Let’s build a digital system that works.

Four commercial publishers filed a complaint about the Internet Archive’s lending of digitized books

Posted on June 1, 2020 by Brewster Kahle

This morning, we were disappointed to read that four commercial publishers are suing the Internet Archive.

As a library, the Internet Archive acquires books and lends them, as libraries have always done. This supports publishing, authors and readers. Publishers suing libraries for lending books, in this case protected digitized versions, and while schools and libraries are closed, is not in anyone’s interest.

We hope this can be resolved quickly.

Thank you for helping us increase our bandwidth

Posted on May 11, 2020 by Brewster Kahle

Last week the Internet Archive upped our bandwidth capacity 30%, based on increased usage and increased financial support. Thank you.

This is our outbound bandwidth graph that has several stories to tell…

A year ago, usage was 30Gbits/sec. At the beginning of this year, we were at 40Gbits/sec, and we were handling it. That is 13 Petabytes of downloads per month. This has served millions of users: those visiting the Wayback Machine, those listening to 78 RPM records, those browsing digitized books, those streaming from the TV archive, and more. We were about the 250th most popular website according to Alexa Internet.
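The conversion from 40Gbits/sec to roughly 13 Petabytes per month can be checked with a short calculation. The sketch below assumes a sustained average rate, a 30-day month, and decimal units (1 Petabyte = 10^15 bytes); the post does not state these, and the function name is made up for illustration.

```python
def petabytes_per_month(gbits_per_sec: float, days: int = 30) -> float:
    """Convert a sustained rate in Gbit/s to decimal petabytes per month."""
    bytes_per_sec = gbits_per_sec * 1e9 / 8   # 8 bits per byte
    seconds = days * 24 * 60 * 60             # seconds in the assumed month
    return bytes_per_sec * seconds / 1e15     # bytes -> petabytes


monthly_pb = petabytes_per_month(40)
print(f"{monthly_pb:.2f} PB/month")  # prints "12.96 PB/month"
```

Under those assumptions the figure comes out just under 13 Petabytes, which matches the number quoted above.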

Then Covid-19 hit and demand rocketed to 50Gbits/sec and overran our network infrastructure’s ability to handle it. So much so, our network statistics probes had difficulty collecting data (hence the white spots in the graphs).

We bought a second router with new line cards, and got it installed and running (and none of this is easy during a pandemic), and increased our capacity from 47Gbits/sec peak to 62Gbits/sec peak. And we are handling it better, but the new capacity is still fully consumed.

Alexa Internet now says we are about the 160th most popular website.

So now we are looking at the next steps up, which will take more equipment and more wizardry, but we are working on it.

Thank you again for the support, and if you would like to donate more, please know it is going to build collections to serve millions. https://archive.org/donate

The National Emergency Library – Who Needs It? Who Reads It? Lessons from the First Two Weeks

Posted on April 7, 2020 by Brewster Kahle

At a time when every day can feel like a month, it’s hard to believe that the National Emergency Library has only existed for two weeks. Recognizing the unique challenges of connecting students and readers with books now on shelves they cannot reach, the Internet Archive loosened the restrictions on our controlled digital lending library to allow increased lending of materials. Reactions have been passionate, to say the least—elation by teachers able to access our virtual stacks, concern by authors about the program’s impact, and fundamental questions about our role as a library in these dire times when one billion students worldwide are cut off from their classrooms and libraries.

For those of you who are being introduced to us for the first time due to the National Emergency Library: Welcome! The doors of the Internet Archive have been open for nearly 25 years and we’ve served hundreds of millions of visitors—we’ve always got room to welcome one more. And for those of you who have tracked our evolution through the years, we know you have questions.

When we turned off waitlists for our lending library on March 24th, it was in response to messages and requests we’d been getting from many sources: librarians who were closing their doors in response to lockdowns, school teachers who were concerned their students could no longer do research and discovery through the primary sources they had on campus, and organizations we respected who knew we had the capability to fill an unexpected gap. It was a need that we knew we could meet quickly.

We moved in “Internet Time” and the speed and swiftness of our solution surprised some and caught others off guard. In our rush to help we didn’t engage with the creator community and the ecosystem in which their works are made and published. We hear your concerns and we’ve taken action: the Internet Archive has added staff to our Patron Services team and we are responding quickly to the incoming requests to take books out of the National Emergency Library. While we can’t go back in time, we can move forward with more information and insight based on data the National Emergency Library has generated thus far.

The Internet Archive takes reader privacy seriously, so we don’t have specific analytics or logs to share (we took the government to court to ensure we didn’t have to do that), but we do have some general information that may be of use to authors, publishers and readers about the ways patrons are using the National Emergency Library. We will be sharing more in the coming weeks of this crisis.

Majority of books are borrowed for less than 30 minutes

Even with a preview function where readers can see the first few pages of a book, most people who go through the check out process are looking at the book for less than 30 minutes, with no more interactions until it is automatically returned two weeks later. We suspect that fewer than 10% of books borrowed are actually opened again after the first day (but we have more work to do to confirm this). Patrons may be using the checked-out book for fact checking or research, but we suspect a large number of people are browsing the book in a way similar to browsing library shelves.

The total number of books that are checked out and read is about the number of books borrowed from a town library

Trying to compare a physical check-out of a book with a digital check-out is difficult. Assuming that the number of physical books borrowed from a library corresponds to digitally borrowed books that are read after the first day, then the Internet Archive currently lends about as many books as a US library that serves a population of about 30,000.

Our usage pattern may be more like a serendipitous walk through a bookstore or the library stacks. In the real world, a patron takes a book off the shelf, flips through to see if it’s of interest, and then either selects the book or puts it back on the shelf. However, in our virtual library, to flip fully through the book you have to borrow it. The large number of books that have no activity beyond the first few minutes of interaction suggests patrons are using our service to browse books.

90% of the books borrowed were published more than 10 years ago, two-thirds were published during the 20th century

The books in the National Emergency Library were published between 1925 and 5 years ago, because books older than that are in the public domain (out of copyright and fully downloadable). Books newer than 5 years are not in the National Emergency Library. Unlike most books in bookstores, the books readers are borrowing are older: only 10% are from the last 10 years. Two-thirds of these books were published during the 20th century.
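The publication-year window described above can be written as a simple rule. This is a sketch under assumptions: the function name and labels are invented for illustration, the boundary years are treated as inclusive (the post does not say), and real inclusion also depended on factors such as removal requests that a year check cannot capture.

```python
def nel_status(pub_year: int, current_year: int = 2020) -> str:
    """Classify a book by publication year using the window described above."""
    if pub_year < 1925:
        return "public domain"       # out of copyright, fully downloadable
    if pub_year > current_year - 5:
        return "too recent"          # newer than 5 years: not included
    return "in lending window"       # 1925 through (current_year - 5)


example = nel_status(1975)  # the 1975 book mentioned below falls in the window
```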

And when people find what they need, it solves a problem, such as this subject librarian who found a book published in 1975:

I just found a book that student was looking, for as an ebook on the National Emergency Library ‘Indeginous African Architecture’ https://t.co/Hina7QqvuG #architecture @UniLincolnArts pic.twitter.com/9p8KjGBoMK
— Oonagh🧜‍♀️ (@GCWOonagh) April 6, 2020

A bit of Fun: Some of the least common subject categories of borrowed books

These subject tags come from library catalog records and from annotations by organizations such as ISKME, which tagged the Universal School Library collection to aid search and discovery of resources for educators.

We’ll continue to glean and share what we can as this project continues and we hope that the needs that gave rise to the National Emergency Library come to an end soon.

When school’s out, what will we learn?

Posted on March 26, 2020 by Brewster Kahle

More than 100 countries have closed their schools, including 43 states in the U.S.

Forty years ago as a freshman, I pulled my first book off the shelves of Hayden Library at MIT. This month, every MIT undergraduate departed from campus in an attempt to contain COVID-19, leaving behind the vast resources of that library. Ready or not, we are all being thrust into an enormous experiment in online learning. One that can have positive and permanent outcomes, if we handle it right.

With schools closing from Changshu to Cambridge, suddenly students are cut off from the physical resources they rely on: the teachers, the classrooms and libraries that are the backbone of learning. And in this flux, those in marginalized communities, from rural areas without broadband to schools with few online books, are even more profoundly challenged. The Economist reports that in the United States, “7 million school-age children cannot access the internet at home.”

But here’s the good news: we know how to do this, to impart knowledge at scale over the Internet. Online courses, online libraries and broadband all exist—but we need to expand and upgrade them to meet the needs of the close to one billion learners around the world whose classrooms have been shuttered.

24 years ago, I founded the Internet Archive, a nonprofit digital library that now serves more than a million learners every day. Today, the Internet Archive is working with hundreds of public, school and university libraries to digitize their core collections and make them freely available over the Internet. Even as MIT was sending students home, we were working with MIT Libraries to see how many of their books we have already digitized. In 24 hours, we were able to hand them back 166,000 digitized books to lend online through their catalogue and via archive.org. This week, the Internet Archive created a National Emergency Library of 1.4 million digitized books to serve the needs of students, educators and learners who can now access them from home.

At archive.org/nel or OpenLibrary.org, you can borrow 1.4 million digitized books for free during the COVID-19 crisis.

Think of this as a huge experiment. In one big push, we can improve online learning and its infrastructure in a way that may otherwise have taken years. This crisis encourages universities to be bold, to make investments that ultimately may mean many more students can benefit. Perhaps 500 undergraduates can fill a hall at MIT, but how many millions can take an online MIT course, once the books, materials and lessons are online?

China is a few weeks ahead of the United States when it comes to experimenting with online learning. In January, my son, Caslon, was teaching English to 4th graders in Changshu. Now he is teaching them from San Francisco, with recorded lessons and online interaction. Next month, his school in China is poised to reopen, but I suspect it will be forever changed.

If this is just a prolonged pause in our education and economy, without the benefits of learning and adapting, one of the most profound impacts of COVID-19 may be what Dr. Kate Tairyan, Chief Medical Officer of the online college NextGenU.org, calls a “quiet brain drain.” It will be time our children never get back.

But we have the opportunity to harness American ingenuity to build a stronger, more robust educational system—by leveraging the Internet, new technologies, and our investments in digitizing books at scale into something that democratizes learning for a generation to come.

Brewster Kahle is the founder and Digital Librarian of the Internet Archive. A passionate advocate for public Internet access and a successful entrepreneur, he has spent his career intent on a singular focus: providing Universal Access to All Knowledge. Kahle graduated from the Massachusetts Institute of Technology, where he studied artificial intelligence.

Internet Archive Staff and Covid-19: Work-at-Home for Most, Full-Pay Furlough & Medical for Scanners

Posted on March 25, 2020 by Brewster Kahle

This is an unsettling time, and the Internet Archive has been working with staff, partner libraries, and patron communities to weather this storm.

Our staff and community are the core of who we are: we are not the data, we are people. We care deeply and have been taking the following steps to support staff.

Most of the Internet Archive staff now work at home, and this is going well: Zoom, Slack, Jitsi, Whereby, Google Docs, broadband. The miracles of our Internet world make this possible. Fortunately, we had already become a largely distributed staff because of prices in San Francisco and our interest in engaging the best people we could no matter where they live.

For the 50 book scan center staff who work in libraries that are now closed, we have neither enough productive remote work nor any paid work. (Libraries paying for our scanning services is a major source of earned income for the Internet Archive.) For these important employees we are leveraging government assistance to accomplish a furlough for 3 months at regular pay with medical benefits. So our scanners are safe, not working, and paid.

Figuring out how to do this in England, the US, and Canada has been challenging, especially when trying to leverage ever-changing government subsidies. Fortunately England announced added help for furloughed workers, and the United States seems to be working on expanded benefits. We always look to save money but we will make sure our furloughed employees are fully paid with medical during this period in any case. We have made sure they are safe now and that they know we want them to come back to work.

For the few that will not have jobs after the lights come back on, based on org changes, we have supported them at a higher level than those on furlough to help them through this time and relaunch.

To pay for these measures, we have gotten some donations and some employees have offered to work 4 days a week for the coming months to help, but it will hurt. Your support is most welcome.

Thankfully, so far, the libraries that support us are planning to restart scanning when it is safe to do so. Based on the now-apparent need to digitize modern books for remote digital access, we hope more libraries will support our scanning services.

With strong staff and partnerships we can grow to produce new services that are appropriate for these times such as the National Emergency Library that is now lending books to thousands of displaced students.

Thank you for your support and stay safe.

Libraries and Publishing Now– Viva la Library!

Posted on December 30, 2019 by Brewster Kahle

Readers consume publishers’ products many hours every day, and consume on publishers’ terms. Publishers’ framing on our screens, publishers’ business models, publishers’ flow and pacing. Yes, there are many publishers now, but we are, mostly, locked into their presentation forms. We check into their black box theaters and consume as intended.

Libraries have always bought publishers’ products but have traditionally offered alternative access modes to these materials, and can again. As an example, let’s take newspapers. Published with scoops and urgency, yesterday’s paper is “old news,” useful the next day only as “fish wrap”: the paper piles up and we feel guilty about the trash. That is the framing of the publisher: old is useless, new is valuable. This has carried into social media: flip up to read on. Scroll through your “feed” (gosh, the word “feed” is illustrative, what happens after “feed” is “fed”? Well, it comes out the other end in a way we do not cherish 🙂 ).

But a library gives old news a new life, not a commercial life, but a life that encourages reflection, perspective, critique, analysis. In a word: “History”. The library keeps the former “news” and offers it in new ways in a new framing, with new tools, not just flip flip flip. It can be quoted and placed side by side with other publishers’ news, enabling researchers to inject commentary.

This capture, representation, searching, rethinking is not a crime– it is thought, it is memory and our history– it builds to become our culture. It has been supported, nurtured, taught.

But the library is in danger in our digital world. In print, one could keep what one had read. In digital that is harder technically, and publishers are specifically making it harder. Technical enforcement measures and laws are making remembering difficult, and worse, a crime.

Libraries live to offer new ways to see published works that were often produced for a different purpose. But this is difficult in a digital world.

Digital newspapers sometimes disappear from their web presence. App-based newspapers cannot be pointed to with a citation or URL. Archives, sometimes available, are segmented into each publisher’s platforms.

Similarly, digital books live in proprietary digital book readers that disappear the books. If “cut and paste” functions at all, it often works only inside that “platform.” Annotations are stored with the vendor, with their terms and conditions.

A personal library now means a purchase list on a website.

Libraries and publishers have lived together throughout the paper era, not always peacefully, but libraries were possible because of paper technologies, laws, and funding. Multiple copies were kept in different libraries ensuring preservation and creating different access modes for different communities.

Once publications became electronic, preservation and access became harder. Radio and television did not fit into the library mold. Early tele-text, Lexis-Nexis, Westlaw, and AOL really did not work as library collections in traditional libraries. Academic journal publishing shifted to digital and libraries moved to serve as customer service departments for leased database access.

Some of us helped build the Internet so digital works could be archived and “libraried”. Then we made archives of Web pages and created services around them.

But it turns out that few of us did this, and the biggest, Google, did it privately and for profit. The Internet Archive was created to help and has archived billions of Web Pages, millions of hours of TV and radio, millions of books, records, movies and software.

Most traditional libraries have done little to preserve digital materials. The Internet Archive is quite unique in focusing on this mission and, I would say, under-supported. Encouraging, however, is that 100,000 individuals a year now donate to support the Internet Archive’s public services. Hope is there.

We need libraries of digital materials, tools to use these libraries, and ways to protect them, fund them and integrate them into schools and our lives more generally. This way we can remember, think, and build on the past.

With so much in digital form, and storage and communication so easy, it should be the librarian’s day! It can be the library user’s day…

Let’s build that world… of preservation and access, of reflection and critique, with confidence that what happened actually happened so that our histories can rely on immutable evidence.

Libraries do not command the world, but libraries are necessary in the functioning of a thoughtful world.

Thank you for supporting the Internet Archive.

Viva la Library!

Internet Archive Blogs

A blog from the team at archive.org