Author Archives: Brewster Kahle

When school’s out, what will we learn?

More than 100 countries have closed their schools, including 43 states in the U.S.

Forty years ago as a freshman, I pulled my first book off the shelves of Hayden Library at MIT. This month, every MIT undergraduate departed from campus in an attempt to contain COVID-19, leaving behind the vast resources of that library. Ready or not, we are all being thrust into an enormous experiment in online learning. One that can have positive and permanent outcomes, if we handle it right.

With schools closing from Changshu to Cambridge, students are suddenly cut off from the physical resources they rely on: the teachers, classrooms and libraries that are the backbone of learning. And in this flux, those in marginalized communities, from rural areas without broadband to schools with few online books, are even more profoundly challenged. The Economist reports that in the United States, “7 million school-age children cannot access the internet at home.”

But here’s the good news: we know how to do this, to impart knowledge at scale over the Internet. Online courses, online libraries and broadband all exist—but we need to expand and upgrade them to meet the needs of the close to one billion learners around the world whose classrooms have been shuttered.

Twenty-four years ago, I founded the Internet Archive, a nonprofit digital library that now serves more than a million learners every day. Today, the Internet Archive is working with hundreds of public, school and university libraries to digitize their core collections and make them freely available over the Internet. Even as MIT was sending students home, we were working with MIT Libraries to see how many of their books we had already digitized. In 24 hours, we were able to hand them back 166,000 digitized books to lend online through their catalogue and via archive.org. This week, the Internet Archive created a National Emergency Library of 1.4 million digitized books to serve the needs of students, educators and learners who can now access them from home.

At archive.org/nel or OpenLibrary.org, you can borrow 1.4 million digitized books for free during the COVID-19 crisis.

Think of this as a huge experiment. In one big push, we can improve online learning and its infrastructure in a way that may otherwise have taken years. This crisis encourages universities to be bold, to make investments that ultimately may mean many more students can benefit. Perhaps 500 undergraduates can fill a hall at MIT, but how many millions can take an online MIT course, once the books, materials and lessons are online?

China is a few weeks ahead of the United States when it comes to experimenting with online learning. In January, my son, Caslon, was teaching English to 4th graders in Changshu. Now he is teaching them from San Francisco, with recorded lessons and online interaction. Next month, his school in China is poised to reopen, but I suspect it will be forever changed.

If this is just a prolonged pause in our education and economy, without the benefits of learning and adapting, one of the most profound impacts of COVID-19 may be what Dr. Kate Tairyan, Chief Medical Officer of the online college NextGenU.org, calls a “quiet brain drain.” It will be time our children never get back.

But we have the opportunity to harness American ingenuity to build a stronger, more robust educational system—by leveraging the Internet, new technologies, and our investments in digitizing books at scale into something that democratizes learning for a generation to come.

Brewster Kahle is the founder and Digital Librarian of the Internet Archive. A passionate advocate for public Internet access and a successful entrepreneur, he has spent his career intent on a singular focus: providing Universal Access to All Knowledge. Kahle graduated from the Massachusetts Institute of Technology, where he studied artificial intelligence.

Internet Archive Staff and Covid-19: Work-at-Home for Most, Full-Pay Furlough & Medical for Scanners

This is an unsettling time, and the Internet Archive has been working with staff, partner libraries, and patron communities to weather this storm.

Our staff and community are the core of who we are– we are not the data, we are people. We care deeply and have been taking the following steps to support staff.

Most of the Internet Archive staff now work at home– this is going well: Zoom, Slack, Jitsi, Whereby, Google Docs, broadband– the miracles of our Internet world make this possible. Fortunately, we had already become a largely distributed staff because of prices in San Francisco and our interest in engaging the best people we could no matter where they live.

For the 50 book scan center staff that work in libraries that are now closed, we do not have enough productive remote work and no paid work. (Libraries paying for our scanning services is a major source of earned income for the Internet Archive.) For these important employees we are leveraging government assistance to accomplish a furlough for 3 months at regular pay with medical benefits. So our scanners are safe, not working, and paid.

Figuring out how to do this in England, the US, and Canada has been challenging, especially trying to leverage ever-changing government subsidies. Fortunately, England announced added help for furloughed workers, and the United States seems to be working on expanded benefits. We always look to save money, but we will make sure our furloughed employees are fully paid with medical during this period in any case. We have made sure they are safe now and that they know we want them to come back to work.

For the few that will not have jobs after the lights come back on, based on org changes, we have supported them at a higher level than those on furlough to help them through this time and relaunch.

To pay for these measures, we have gotten some donations and some employees have offered to work 4 days a week for the coming months to help, but it will hurt. Your support is most welcome.

Thankfully, so far, the libraries that support us are planning to restart scanning when it is safe to do so. Based on the now-apparent need to digitize modern books for remote digital access, we hope more libraries will support our scanning services.  

With strong staff and partnerships we can grow to produce new services that are appropriate for these times such as the National Emergency Library that is now lending books to thousands of displaced students.

Thank you for your support and stay safe.

Libraries and Publishing Now– Viva la Library!

Readers consume publishers’ products many hours every day– and consume on publishers’ terms. Publishers’ framing on our screens, publishers’ business models, publishers’ flow and pacing. Yes, there are many publishers now, but we are, mostly, locked into their presentation forms. We check into their black box theaters and consume as intended.

Libraries have always bought publishers’ products but have traditionally offered alternative access modes to these materials, and can again. As an example, let’s take newspapers. Published with scoops and urgency, yesterday is “old news,” and the paper it was printed on is useful the next day only as “fish wrap”– the paper piles up and we feel guilty about the trash. That is the framing of the publisher: old is useless, new is valuable. This has carried into social media– flip up to read on. Scroll through your “feed” (gosh, the word “feed” is illustrative, what happens after “feed” is “fed”?  Well, it comes out the other end in a way we do not cherish 🙂 ).

But a library gives old news a new life, not a commercial life, but a life that encourages reflection, perspective, critique, analysis. In a word– “History”. The library keeps the former “news” and offers it in new ways in a new framing, with new tools– not just flip flip flip. It can be quoted and placed side by side with other publishers’ news, and it enables researchers to inject commentary.

This capture, representation, searching, rethinking is not a crime– it is thought, it is memory and our history– it builds to become our culture. It has been supported, nurtured, taught.

But the library is in danger in our digital world. In print, one could keep what one had read. In digital that is harder technically, and publishers are specifically making it harder. Technical enforcement measures and laws are making remembering difficult, and worse, a crime.

Libraries live to offer new ways to see published works that were often produced for a different purpose. But this is difficult in a digital world.

Digital newspapers sometimes disappear from their web presence. App-based newspapers cannot be pointed to with a citation or URL. Archives, sometimes available, are segmented into each publisher’s platforms.

Similarly, digital books live in proprietary digital book readers that disappear the books. If “cut and paste” functions at all, it often works only inside that “platform.” Annotations are stored with the vendor, with their terms and conditions.

A personal library now means a purchase list on a website.

Libraries and publishers have lived together throughout the paper era, not always peacefully, but libraries were possible because of paper technologies, laws, and funding. Multiple copies were kept in different libraries ensuring preservation and creating different access modes for different communities.

Once publications became electronic, preservation and access became harder. Radio and television did not fit into the library mold. Early tele-text, Lexis-Nexis, Westlaw, and AOL really did not work as library collections in traditional libraries. Academic journal publishing shifted to digital and libraries moved to serve as customer service departments for leased database access.

Some of us helped build the Internet so digital works could be archived and “libraried”. Then we made archives of Web pages and created services around them.

But it turns out that few of us did this, and the biggest, Google, did it privately and for profit.  The Internet Archive was created to help and has archived billions of Web Pages, millions of hours of TV and radio, millions of books, records, movies and software.

Most traditional libraries have done little to preserve digital materials. The Internet Archive is quite unique in focusing on this mission and, I would say, under-supported. Encouraging, however, is that 100,000 individuals a year now donate to support the Internet Archive’s public services. Hope is there.

We need libraries of digital materials, tools to use these libraries, and ways to protect them, fund them and integrate them into schools and our lives more generally. This way we can remember, think, and build on the past.

With so much in digital form, and storage and communication so easy, it should be the librarian’s day!  It can be the library user’s day…

Let’s build that world… of preservation and access, of reflection and critique, with confidence that what happened actually happened so that our histories can rely on immutable evidence.

Libraries do not command the world, but libraries are necessary in the functioning of a thoughtful world.

Thank you for supporting the Internet Archive.

Viva la Library!

Weaving Books into the Web—Starting with Wikipedia

[announcement video, Wired]

The Internet Archive has transformed 130,000 references to books in Wikipedia into live links to 50,000 digitized Internet Archive books in several Wikipedia language editions including English, Greek, and Arabic. And we are just getting started. By working with Wikipedia communities and scanning more books, both users and robots will link many more book references directly into Internet Archive books. In these cases, diving deeper into a subject will be a single click.

Moriel Schottlender, Senior Software Engineer, Wikimedia Foundation, speech announcing this program

“I want this,” said Brewster Kahle’s neighbor Carmen Steele, age 15, “at school I am allowed to start with Wikipedia, but I need to quote the original books. This allows me to do this even in the middle of the night.”

For example, the Wikipedia article on Martin Luther King, Jr cites the book To Redeem the Soul of America, by Adam Fairclough. That citation now links directly to page 299 inside the digital version of the book provided by the Internet Archive. There are 66 cited and linked books on that article alone. 

In the Martin Luther King, Jr. article of Wikipedia, page references can now take you directly to the book.

Readers can see a couple of pages to preview the book and, if they want to read further, they can borrow the digital copy using Controlled Digital Lending in a way that’s analogous to how they borrow physical books from their local library.

“What has been written in books over many centuries is critical to informing a generation of digital learners,” said Brewster Kahle, Digital Librarian of the Internet Archive. “We hope to connect readers with books by weaving books into the fabric of the web itself, starting with Wikipedia.”

You can help accelerate these efforts by sponsoring books or funding the effort. It costs the Internet Archive about $20 to digitize and preserve a physical book in order to bring it to Internet readers. The goal is to bring another 4 million important books online over the next several years.  Please donate or contact us to help with this project.

From a presentation on October 23, 2019 by Moriel Schottlender, Tech lead at the Wikimedia Foundation.

“Together we can achieve Universal Access to All Knowledge,” said Mark Graham, Director of the Internet Archive’s Wayback Machine. “One linked book, paper, web page, news article, music file, video and image at a time.”


Thank you for the donation of 78rpm records from a Craigslist poster

Mark Ellis alerted us to a Craigslist post of a storage locker of records being offered for free in San Jose in 2 hours. The owner wanted them gone. The Internet Archive sprang into action and our truck rolled.

Lots of people who wanted specific records for free had responded to the ad, but not many wanted 78rpm records. We love 78rpm records. We preserve them and digitize ones we do not have for the Great 78 Project. In the end we got 1 pallet full of 78s, maybe 2,700 discs, and they are queued for digitization.

Thank you to Joey Myers for posting on Craigslist, to Mark Ellis for alerting Jason Scott of the Archive, and to the Archive staff who jumped on it.

Correct Metadata is Hard: a Lesson from the Great 78 Project

We have been digitizing about 8,000 78rpm record sides each month and now have 122,000 of them done. These have been posted on the net and over a million people have explored them. We have been digitizing, typing the information on the label, and linking to other information like discographies, databases, reviews and the like.

Volunteers, users, and internal QA checkers have been pointing out typos, so we decided to go back over a couple of months’ metadata, and we found problems. Then we contracted with professional proofreaders, and they found even more (2% of the records at this point had something to point out; some are matters of opinion or aesthetics, some lead to corrections).

We are going to pay the professional proofreaders to correct the 5 most important fields for all 122,000 records, but we can use more help. We are pointing these out here in hopes of interesting volunteer proofreaders and to share our experience in continually improving our collections.

Here are some of the issues with the primary performer field from the June 2019 transfers that we have now corrected (before | after) and hope to upload in the next couple of weeks:

Jose Melis And His Latin American Ensemble | Jose Melis And His-Latin American Ensemble
Columbia-Orchestra | Columbia-Orchester
S. Formichi and T. Chelotti | S. Formichi e T. Chelotti
Dennis Daye and The Rhythmaires | Dennis Day and The Rhythmaires
Harry James and His Orchestra | Harry James and His Orch.
Charles Hart & Elliot Shaw | Charles Hart & Elliott Shaw
Peerless Quartet | Peerless Quartette

Some of the title corrections:

O Vino Fa ‘Papla (Wine Makes You Talk) | ‘O Vino Fa ‘Papla (Wine Makes You Talk)
Masked Ball Salaction | Masked Ball Selection
Moonlight and Roses (Brings Mem’ries Of You) | Moonlight and Roses (Bring Mem’ries Of You)
Que Bonita Eres Tu (You Are Beutiful) | Que Bonita Eres Tu (You Are Beautiful)
Buttered Roll | “Buttered Roll”
Paradise | “Paradise”
Got a Right to Cry | “Got a Right to Cry”
Blue Moods | “Blue Moods”
Auf Wiederseh’n Sweerheart | Auf Wiederseh’n Sweetheart
George M. Cohan Medley – Part 1 | George M. Cohan Medley – Part 2
Dewildered | Bewildered
Lolita (Seranata) | Lolita (Serenata)
Got a Right to Cry | “Got a Right to Cry” Joe Liggins and His Honeydrippers
Blue Moods | “Blue Moods”
Body and Soul | “Body and Soul”
Mais Qui Est-Ce | Mais Qui Est-Ce?
Wail Till the Sun Shines Nellie Blues | Wait Till the Sun Shines Nellie Blues
Que Te Pasa Joe (What Happens Joe) | Que Te Pasa Jose (What Happens Joe)
SAMSON AND DELILAH Softly Awakens My Heart | SAMSON AND DELILAH Softly Awakes My Heart
I’m Gonna COO, COO, COO | (I’m Gonna) COO, COO, COO
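
When reviewing pairs like the ones above, it can help to see exactly which characters changed, since many corrections are a single letter or a missing quotation mark. Here is a minimal sketch in Python using the standard difflib module. The function name and the " | " separator are assumptions taken from how the pairs are written in this post; this is not a description of the proofreaders' actual tooling.

```python
import difflib


def describe_correction(line):
    """Split a 'before | after' line and list the character-level edits.

    Hypothetical helper: assumes exactly one ' | ' separates the two values.
    Returns a list of (operation, old_text, new_text) tuples.
    """
    before, after = (part.strip() for part in line.split(" | ", 1))
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    edits = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # keep only the spans that actually changed
            edits.append((op, before[i1:i2], after[j1:j2]))
    return edits


if __name__ == "__main__":
    for pair in ["Dewildered | Bewildered", "Mais Qui Est-Ce | Mais Qui Est-Ce?"]:
        print(pair, "->", describe_correction(pair))
```

A listing like this could make it quicker to sort corrections into real typos versus matters of style, such as added quotation marks.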

Most 20th Century Books Unavailable to Internet Users – We Can Fix That

The books of the 20th century are largely not online.  They are mostly not available from even the biggest booksellers. And, libraries who have collected hard copies of these books have not been able to deliver them in a cost-efficient, simple, digital form to their patrons. 

The way libraries could fill that gap is to adopt and deliver a controlled digital lending service. The Internet Archive is trying to do its part but needs others to join in. 

The Internet Archive has worked with 500 libraries over the last 15 years to digitize 3.5 million books. But based on copyright concerns the selection has often been restricted to pre-1923 books. We need complete libraries and comprehensive access to nurture a well-informed citizenry. The following graph shows the number of books digitized by the Internet Archive, binned by decade:

Up until 1923 the graph shows our collection increasing and mirroring the rise in publications. Then it dips and slows because of concerns and confusion about copyright protections for books published after that date. It picks up again in the 1990s because these books are more readily available and separate funding has helped us digitize some recent modern books. Nevertheless, the end result is that the gap is big – the digital world is missing a huge chunk of the 20th Century.

Users can’t even fill that gap by buying the books from that time period. According to a recent paper by Professor Rebecca Giblin, the commercial life of a book is typically exhausted 1.4 to 5 years from publication; some 90% of titles become unavailable in physical form within just two years. Most older books are therefore not available to be purchased in either physical or digital form. The following graph, pulled from a study by Professor Paul Heald, shows books by decade that are available on Amazon.com. It shows that the world’s largest bookseller has the same huge gap – the 20th century is simply missing. 

The 20th Century represents a significant portion of published knowledge – approximately one-third of all books – as shown in the graph below.  These books are largely unavailable commercially, BUT they are not completely lost. Many of these books are on library shelves, accessible only if you physically visit the library that owns those books. Even if you’re willing to visit, those books might still not be accessible. Libraries, pressed to repurpose their buildings, have increasingly moved volumes to off-site storage facilities.

The way to make 20th Century books available to library patrons is to digitize those books and let every library who owns a physical copy lend that book in digital form. This type of service has come to be known as controlled digital lending (CDL).  The Internet Archive has been doing this for years. We lend out-of-copyright and in-copyright volumes that we physically own. We’ve reformatted the physical volume, produced a digital version and lend only that digital version to one user at a time. Our experience shows that this responds to a real demand, fills a genuine need satisfactorily, gives new life to older books, and brings important knowledge to a new audience. Check out this case study for CDL involving the book Wasted which figured prominently in the Brett Kavanaugh Supreme Court nomination hearings.  

Our experience has been replicated by other early adopters and providers of a CDL service. Here’s a list of some of them. We believe every library can transform itself into a digital library. If you own the physical book, you can choose to circulate a digital version instead.

We urge more libraries to join Open Libraries and lend digitized versions of their print collections, making more copies of books available for loan and getting more books into the hands of digital  readers everywhere. 

Helping us judge a book by its cover: software help request

The Internet Archive would appreciate some help from a volunteer programmer to create software that would help determine if a book cover is useful to our users as a thumbnail or if we should use the title page instead. Many of our older books have cloth covers that are not useful, for instance:

But others are useful:

Judging by age alone is not enough, because even 1923 cloth covers are sometimes good indicators of what the book is about (and are nice looking):

We would like a piece of code that can help us determine if the cover is useful or not to display as the thumbnail of a book. It does not have to be exact, but it would be useful if it knew when it didn’t have a good determination so we could run it by a person.
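
To make the request concrete, here is a rough sketch in Python of the shape such a program might take. It is only an illustration under stated assumptions: it guesses that plain cloth covers are nearly uniform in brightness, it takes a cover already reduced to a grid of grayscale values, and the thresholds are made-up numbers that would need tuning on real examples. A real solution may well need something smarter.

```python
from statistics import pstdev


def cover_usefulness(gray, low=12.0, high=30.0):
    """Guess whether a cover thumbnail is useful.

    gray: 2D list of grayscale pixel values (0-255), e.g. a downsampled cover.
    low, high: hypothetical contrast thresholds, not tuned on real data.
    Returns (label, confidence); label is "useful", "not-useful", or
    "unsure", where "unsure" means the cover should go to a person.
    """
    pixels = [p for row in gray for p in row]
    if not pixels:
        raise ValueError("empty image")
    # Assumption: plain cloth is nearly uniform, so its contrast is low.
    contrast = pstdev(pixels)
    if contrast <= low:
        return "not-useful", min(1.0, 0.5 + (low - contrast) / (2 * low))
    if contrast >= high:
        return "useful", min(1.0, 0.5 + (contrast - high) / (2 * high))
    # In between the thresholds we decline to decide.
    return "unsure", 0.0


if __name__ == "__main__":
    flat_cloth = [[100] * 8 for _ in range(8)]
    print(cover_usefulness(flat_cloth))
```

The middle band between the two thresholds is the part that matters most to us: it is how the program says it does not have a good determination.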

To help any potential programmer volunteers, we have created folders of hundreds of examples in 3 categories: year 1923 books with not-very-useful covers, year 1923 books with useful covers, and year 2000 books with useful covers. The filenames of the images are the Internet Archive item identifiers that can be used to find the full item: 1922forniaminera00bradrich.jpg would come from https://archive.org/details/1922forniaminera00bradrich. We would like a program (hopefully fast, small, and free/open source) that would say useful or not-useful and a confidence.
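
The filename convention above can be turned back into an item link with a couple of lines of Python. This is a small sketch; the function names are ours for illustration, and it assumes each image is named exactly as the identifier plus ".jpg".

```python
from pathlib import PurePosixPath


def item_identifier(filename):
    """Drop any folder and the .jpg extension to recover the item identifier."""
    return PurePosixPath(filename).stem


def details_url(filename):
    """Build the archive.org details page URL for an example image."""
    return "https://archive.org/details/" + item_identifier(filename)


if __name__ == "__main__":
    print(details_url("1922forniaminera00bradrich.jpg"))
```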

Interested in helping? Brenton at archive.org is a good point of contact on this project.   Thank you for considering this. We can use the help. You can also use the comments on this post for any questions.

FYI: To create these datasets, I ran these command lines, and then by hand pulled some of the 1923 covers into the “useful” folder.

bash-3.2$ ia search "date:1923 AND mediatype:texts AND NOT collection:opensource AND NOT collection:universallibrary AND scanningcenter:*" --itemlist --sort=downloads\ desc | head -1000 | parallel --will-cite -j10 "curl -Ls https://archive.org/download/{}/page/cover_.jpg?fail=soon.jpg\&cnt=0 >> ~/tmp/cloth/{}.jpg"

bash-3.2$ ia search "date:2000 AND mediatype:texts AND scanningcenter:cebu" --itemlist --sort=downloads\ desc | head -1000 | parallel --will-cite -j10 "curl -Ls https://archive.org/download/{}/page/cover_.jpg?fail=soon.jpg\&cnt=0 >> ~/tmp/picture/{}.jpg"

Low Vision? Disability? 1.8 Million digital books free, now Worldwide

 

You can take action now to expand access to 1.8 million books worldwide.

Individuals can qualify now to access 1.8 million digitized books, free.   Internet Archive has recently expanded its program for those with low vision and disabilities.

Libraries, hospitals, schools, and other organizations (worldwide!) can now sign up to authorize users, as well as get digital files for further remediation for qualifying users.

Publishers, please contribute your books for this program!

Service organizations, please host your digital books on archive.org to provide seamless access to all books. Free.

Press: please help get the word out, let us know how we can help.

Donations needed to get 4 million more books. Donate books or money (about $20 per book).

Now available for access by anyone with disabilities worldwide… or for anyone to contribute books for people with disabilities: both helped by the US recently adopting the Marrakesh Treaty.

Now is our time to bring together a great library for those with disabilities.

Together we can.

 

 

Physical Archive Party October 28th, 2018

Archive friends and family please join us for the Physical Archive party, 2-6pm 2512 Florida Avenue, Richmond CA.   RSVP please so we can keep a count.  Staff, partners, friends of the Internet Archive are all invited.

RSVP HERE

This is a unique opportunity to see some of the behind the scenes activities of the Internet Archive and millions of books, music and movie items. The Internet Archive is well known for its digital collections — the digital books, software, film and music that millions of people access every day through archive.org, but did you know that much of our online content is derived from physical collections, stored in the east bay in Richmond?

2018 has been a year of focus on inventory and ramping up throughput at our digitization centers. At the physical archive we see collections of film, software, documents, books, and music at least three times before they are finally archived in Richmond: first when they come in the door to be inventoried, second when we ship them out to be digitized in Hong Kong or Cebu, and third when they come back to us for long-term storage.

This year, the staff at the physical archive would like you to see the physical archive and celebrate our achievements in 2018. Bring your roller skates and drones for competitive battles, get a drink at the ‘Open Container Bar’ and then peruse the special collections in our dedicated space. Uninhibitedly show off your dance skills at our silent server disco and enjoy brews and Halloween gruel.

We will also be showcasing our collection of books, music and film in both a working environment and our special collections room. We will have tours of our facilities and a demonstration of how we get hundreds of thousands of books a year digitized at our Hong Kong (and now Cebu) Super Centers and safely back again.

Prize for scariest librarian costume.
This is a Halloween event so costumes are encouraged!