Author Archives: Wendy Hanamura

Meet Eliza Zhang, Book Scanner and Viral Video Star

The glass rises and falls. Quickly and efficiently, a woman turns the pages to the rhythmic beep of the cameras. She never misses a beat.

In its first 48 hours, this tweet about book scanning at the Internet Archive went viral, reaching 7.7 million people. More than 1.5 million people viewed the video, liking it 70,000 times and retweeting it 24,000 times. At the center of it all sits Eliza Zhang, a book scanner at the Internet Archive’s headquarters in San Francisco since 2010. When I asked Eliza what she likes about her job, she replied, “Everything! I find everything interesting. I don’t feel it is boring. Every collection is important to me.”

Eliza, a college graduate from southern China, immigrated to the United States in 2009, seeking a new life and new opportunities. She landed in San Francisco in the midst of an economy-crushing recession. But through a city program called JobsNOW, the Internet Archive hired Eliza and scores of other job seekers, training them to digitize, perform quality control, and upload metadata for books, newspapers, periodicals and manuals. Often our digitizing staff are making these analog texts available online for the first time.

Eliza Zhang in front of the Scribe (featured in the viral video) that she has operated for more than a decade.

Raising the glass with a foot pedal, adjusting the two cameras, and shooting the page images are just the beginning of Eliza’s work. Some books, like the Bureau of Land Management publication featured in the video, have myriad fold-outs. Eliza must insert a slip of paper to remind her to go back and shoot each fold-out page, while at the same time inputting the page numbers into the item record. The job requires keen concentration.

If this experienced digitizer accidentally skips a page, or if an image is blurry, the publishing software created by our engineers will send her a message to return to the Scribe and scan it again.

Brittle, delicate fold-outs, like this page from “Early London theatres” (1894), make digitization a time-intensive task best handled by a human operator.

Listening to 70s and 80s R & B while she works, Eliza spends a little time each day reading the dozens of books she handles. The most challenging part of her job? “Working with very old, fragile books. The paper is very thin. I always wear rubber fingertips and sometimes gloves when I scan newspapers, because of the ink,” she explained.

Tweets Spark a New Interest in Digitization

Eliza is one of about 70 Scribe operators at the Internet Archive, working in digitization centers embedded in libraries across the United States, United Kingdom, and Canada. The operations are led by Elizabeth MacLeod, who manages our remote operations, and Andrea Mills, who is stationed at the University of Toronto, with support from managers and operators in each center.

“We try to meet libraries where they are,” said MacLeod, who manages remote operations from her home office in North Carolina. “From digitizing a few shipments a year at one of our regional centers to setting up and staffing full-service digitization within the library itself, we have a flexible approach to our library partnerships.”

Across Twitter, another common question arose: “Why hasn’t this job been automated?” To many, the repetitive act of turning the pages in a book and photographing them seems like the natural task for a robot. In fact, some 20 years ago, we tested commercial book scanners that feature a vacuum-powered page-turning arm. It turns out those automated scanners didn’t really work well for brittle books, rare volumes, and other special collections—the kinds of material our library partners ask us to digitize.

Scribe operators and staff at Internet Archive’s former digitization center in San Francisco, ca. 2011.

“Clean, dry human hands are the best way to turn pages,” said Mills, from her socially-distanced office at the University of Toronto. In her 15 years on the job, she has worked with hundreds of librarians to hone our digitization operations, balancing our need to preserve the original pages with minimal impact during the imaging process. “Our goal is to handle the book once and to care for the original as we work with it,” Mills explained.

So what does it take to be a Scribe operator? “It takes a level of zen,” wrote Brewster Kahle, founder and digital librarian of the Internet Archive, responding to one of the many threads about the video that popped up on Reddit. “It takes concentration and a love of books. For those who love working with books and libraries, it fits well.”

As for the hardware used for digitization, like much at the Internet Archive, the equipment is engineered and purpose-built for the job. In the viral video, Eliza is operating the original Scribe machine, designed more than 15 years ago, and Scribe software that was developed in-house and refined continuously over years of operation. “The variation in books makes [automation] difficult to do quickly and without damage,” Kahle elaborates. “We do not disbind the books, which also makes automation more difficult.”

18,000 Books and Climbing

In the decade Eliza has been working with the Internet Archive, she has scanned more than 3 million pages, 14,000 foldouts, and 18,000 items (mostly books).

And what about all the sudden social media attention? Eliza shrugs. She’s never been on Twitter before. “My goal is to guarantee zero errors,” she said. “I want to give our readers a satisfying experience.”


Digitize With Us

The Covid-19 pandemic has both created higher demand for digital content and shuttered some of our scanning centers for health and safety. We have reopened following local and national health guidelines and continue to engage with new libraries on their digitization projects.

If your library is interested in learning more about the Internet Archive’s digitization services, visit https://archive.org/scanning and contact us at digitallibraries@archive.org.

What if you could wander the library stacks…online?

Open Library Explorer is an experimental new interface that allows patrons to search our shelves of 4+ million books.

Introducing the new Open Library Explorer

As a student at the University of Waterloo, whenever Drini Cami felt stressed, he’d head to the library. Wandering through the stacks, flipping through 600-page volumes about quantum mechanics or the properties of prime numbers never failed to calm him down. And the best thing? “I would always leave the library having discovered something new—usually a variety of new things,” Cami explained.  “This is something I haven’t been able to replicate at a digital library like Open Library.” What Drini longed for was the ability to discover new books serendipitously, browsing bookshelves organized by a century of librarians. But unlike most readers, Drini Cami wields a superpower: he is a designer and software developer at the Internet Archive.

Enter the Open Library Explorer, Cami’s new experiment for browsing more than 4 million books in the Internet Archive’s Open Library. Still in beta, Open Library Explorer is able to harness the Dewey Decimal or Library of Congress classification systems to recreate virtually the experience of browsing the bookshelves at a physical library. Open Library Explorer enables readers to scan bookshelves left to right by subject, up and down for subclassifications. Switch a filter and suddenly the bookshelves are full of juvenile books. Type in “subject: biography” and you see nothing but biographies arranged by subject matter.

Why recreate a physical library experience in your browser?

Now that classrooms and libraries are once again shuttered, families are turning online for their educational and entertainment needs. With demand for digital books at an all-time high, the Open Library team was inspired to give readers something closer to what they enjoy in the physical world. Something that puts the power of discovery back into the hands of patrons.

Escaping the Algorithmic Bubble

One problem with online platforms is the way they guide you to new content. For music, movies, or books, Spotify, Netflix and Amazon use complicated recommendation algorithms to suggest what you should encounter next. But those algorithms are driven by the media you have already consumed. They put you into a “filter bubble” where you only see books similar to those you’ve already read. Cami and his team devised the Open Library Explorer as an alternative to recommendation engines. With the Open Library Explorer, you are free to dive deeper and deeper into the stacks. Where you go is driven by you, not by an algorithm.

Zoom out to get an ever expanding view of your library
Change the setting to make your books 3D, so you can see just how thick each volume is.

Cool New Features

By clicking on the Settings gear, you can customize the look and feel of your shelves. Hit the 3D options and you can pick out the 600-page books immediately, just by the thickness of the spine. When a title catches your eye, click on the book to see whether Open Library has an edition you can preview or borrow. For more than 4 million books, borrowing a copy in your browser is just a few clicks away.

Ready to enter the library? Click here, and be sure to share feedback so the Open Library team can make it even better. 

After Searching for a Decade, Legendary Hollywood Research Library Finds a New Home

Over more than 50 years, Lillian Michelson built one of Hollywood’s most famous libraries for film research.

[Press: Hollywood Reporter]

Need to know what an igloo really looks like? How about a Siberian hut? Or the inside of a 15th-century jail? For 50 years in Hollywood, generations of filmmakers would beat a path to the Michelson Cinema Research Library, where renowned film researcher Lillian Michelson could hunt down the answer to just about any question. She was the human card catalogue to a library of more than one million books, photos, periodicals and clippings. But ever since Lillian retired a decade ago, the Michelson Cinema Research Library has been languishing in cold storage, looking for a home. Today it has found one. Lillian Michelson, 92, announced that she is donating her library and life’s work to the Internet Archive. For its part, the nonprofit digital library vows to preserve her collection for the long term and digitize as much of it as possible, making it accessible to the world.

“I feel as if a fantasy I never, never entertained has been handed to me by the universe, by fate,” mused the legendary film researcher. “The Internet Archive saved my library in the best way possible. I hope millions of people will use it [to research] space, architecture, costumes, towns, cities, administration, foreign countries… the crime business! Westerns! That’s what is amazing to me, that it will be open to everybody.”

Internet Archive founder, Brewster Kahle, explained why his organization was willing to accept the entire Michelson collection and keep it intact: “A library is more than a collection of books. It is the center of a community. For decades, the Michelson Cinema Research Library informed Hollywood—and we want to see that continue. Many organizations wanted pieces of the collection, but I think the importance of keeping it together is so it can continue to help inspire global filmmakers to make accurate and compelling movies.”

Samuel Goldwyn Studios, circa 1938, where the Michelson Cinema Research Library was housed for many decades.

With $20,000 borrowed against her husband Harold’s life insurance policy, Lillian Michelson purchased the reference library in 1969. Over the next half-century, the Michelson Cinema Research Library had many homes. From the Samuel Goldwyn Studios it moved to the American Film Institute, then to Paramount Studios, and finally to Zoetrope Studios at the invitation of director, Francis Ford Coppola. Michelson later received an offer via Jeffrey Katzenberg to move the Michelson Cinema Research Library to the newly opened DreamWorks Pictures, where it remained until Lillian’s retirement due to health reasons 19 years later.

The Michelson Cinema Research Library includes some 5,000+ books dating back to the early 1800s, periodicals, 30,000+ photographs, and 3,000+ clipping files. In storage they filled some 1,600 boxes on 45 pallets—enough to fill more than two 18-wheel tractor trailers. Its contents have now been moved for long-term preservation to the Internet Archive’s physical archive in Richmond, California.

In September 2020, Internet Archive Founder & Digital Librarian, Brewster Kahle, was on hand at the Internet Archive’s Physical Archive in Richmond, CA to accept the 1600 boxes of books, photos, clippings, and memorabilia from the Michelson Cinema Research Library. Michelson’s books were then shipped to one of the Internet Archive’s scanning centers to be digitized and ultimately made accessible to the public.

For six decades, Michelson’s research informed scores of Hollywood films, including The Right Stuff, Rosemary’s Baby, Scarface, Fiddler on the Roof, Full Metal Jacket, The Graduate and The Birds.

Harold & Lillian Michelson fueled the creativity of scores of directors, from Alfred Hitchcock to Mel Brooks, and their influence can be traced through countless Hollywood films.

Bringing this historic Hollywood design resource back to life—a largely digital life—can make it a global design resource for art directors, designers, filmmakers and researchers in search of information and visual inspiration. 

“Lillian Michelson opened my eyes to the importance of a research library to all aspects of motion picture production. At a time when the rich and deep research libraries created and maintained by the motion picture studios were being ‘given away’ or otherwise destroyed, Lillian was a beacon of light guiding us to consider them as treasure.”

Academy Award-winning director, Francis Ford Coppola

“Harold & Lillian: A Hollywood Love Story” by director Daniel Raim chronicles the couple who became Hollywood’s “secret weapons,” empowering generations of filmmakers and designers to create their most iconic work.

The story of her long and creative union with renowned storyboard artist Harold Michelson was told in Harold and Lillian: A Hollywood Love Story, a 2015 documentary produced and directed by Daniel Raim and currently streaming on Netflix. (To honor this devoted Hollywood couple, DreamWorks Pictures named the king and queen in Shrek 2 Harold and Lillian.)

Lillian Michelson will preside over a virtual ribbon cutting, panel discussion, and a screening of the documentary on Wednesday, January 27 from 4-6:30 PM Pacific time. There, she will unveil the first phase of her new digital library, available to the world via the Internet Archive’s digital platform, at https://archive.org/details/michelson. Sign up for the screening event here.

What Information Should we be Preserving in Filecoin?

The folks at Protocol Labs love their rockets. And outer space. And exploration.

So when Filecoin, their cryptocurrency-fueled decentralized storage network, launched recently, it was no surprise they called it Filecoin Liftoff. In the payload of that Filecoin rocket are treasures from the Internet Archive:

For 15 years, LibriVox has harnessed a global army of volunteers, creating 14,200 free public domain audiobook projects in 100 different languages. Where else can you listen to Jules Verne’s 20,000 Leagues Under the Sea in French, Spanish, English, German or Dutch…for free? Now, phrases of Shakespeare, Poe, Joyce and Dante will be stored across the Filecoin mainnet, broken into packets to be reconstituted when needed—perhaps in a new century.

The same destiny awaits the home movies, stock footage, educational and amateur films in the public domain, lovingly curated by the Prelinger Archives founder, Rick Prelinger. He encourages creatives to download and reuse these videos, creating countless new works like this one by musician Jordan Paul:

Now filmmakers and connoisseurs can sleep easier, knowing that a new, distributed copy of those films lives in the Filecoin network (along with the main copy and multiple backups in the Internet Archive’s repositories).

So what’s next, Filecoin explorers?

Today, Protocol Labs and the Internet Archive are happy to announce the Filecoin Archives, a new community project to curate, disseminate and preserve important open access information often at risk of being lost. You can get involved in so many ways: by nominating information to be stored, uploading it to the Internet Archive, or preserving the data as a Filecoin node while earning Filecoin for sharing your storage capacity.

What information should we be preserving? Please tell us!

How about 166,000 public domain books (60 terabytes) from the Library of Congress? Including 2100 texts about Abraham Lincoln and slavery?

Or Open Access Journal articles? (The Internet Archive has collected 9.1 million of them.)

It takes a host of global voices with diverse viewpoints to ensure that humanity’s most precious knowledge is represented online and preserved. So we need to hear from you. What open access information or datasets are you interested in preserving?

Between now and November 5, please send us your ideas and vote on the others. We will gather your suggestions, add our own, and publish the list from which we will select information to preserve across a global network of Filecoin nodes.

How to send us your suggestions 

Look for the tweet from @JuanBenet and reply to it with:

  • The Name of the Dataset.
  • The size in GB or TB.
  • An HTTP or @IPFS link to the data.
  • Why it matters.
  • #FilecoinArchives

Bonus points if the data is already stored in the Internet Archive or if you upload it there. Vote for ideas by retweeting them and please help us spread the word!

Juan Benet presents his early vision at the 2016 Decentralized Web Summit at the Internet Archive in San Francisco.

In 2015, a young developer named Juan Benet wandered into the Internet Archive headquarters. He painted a picture of a decentralized stack, something he now calls Web3, where the storage, transport and other layers would be distributed across many machines. Together with the DWeb community, we have imagined a web with our values written into the code: values such as privacy, security, reliability, and control over one’s own identity.  With the launch of Filecoin’s mainnet, a piece of that new web is perhaps within reach. 

Now it’s up to us to make sure the payload includes humanity’s most important knowledge.

Advertising powers the Web. What if it just doesn’t work?

On October 14, the Internet Archive presented a book talk with author Tim Hwang, NYT tech reporter Kashmir Hill, and technologist Desigan Chinniah, discussing Hwang’s new book, “Subprime Attention Crisis.”

Is the Ad-Tech model powering the Internet really just the next financial bubble?

That is the question at the heart of a significant new book by Internet researcher Tim Hwang, Subprime Attention Crisis: Advertising and the Time Bomb at the Heart of the Internet. If you don’t already know Tim, he’s a polymath: former Google AI policy wonk, lawyer, polemicist. In other words, just the kind of thinker we think you should know. Watch the video of a virtual book event with Tim here:

Subprime Attention Crisis makes the case that the core advertising model driving Google, Facebook, and many of the most powerful companies on the internet is—at its heart—a multibillion dollar financial bubble. Drawing parallels to the 2008 subprime mortgage crisis, Tim shines a spotlight on the lack of transparency, flawed incentives, and outright fraud that keep this machine running.

On October 14, the Internet Archive hosted a talk with the author and New York Times technology reporter Kashmir Hill. Their discussion tackled:

  • Why data-driven, online advertising may be much, much less effective than it looks
  • The long-term impact of the COVID-19 recession on the media and online ads
  • Whether or not the giants of Big Tech are already “too big to fail”

This discussion focused not only on the problems of advertising, but also on the future, and how we might be able to transition to a better, more financially robust internet. Joining the discussion was Desigan Chinniah, who co-leads Grant for the Web—a $100 million fund launched by Coil, Mozilla, and Creative Commons to spur open standards and new economic models for the web beyond advertising.

NOTE: We urge you to purchase a copy of Tim’s new book, Subprime Attention Crisis, via our local bookseller, The Booksmith. The first 50 purchasers will receive an autographed copy.

On Bookstores, Libraries & Archives in the Digital Age

The following was a guest post by Brewster Kahle in Against The Grain (ATG). See the original article from September 28, 2020 on the ATG website here.

Back in 2006,  I was honored to give a keynote at the meeting of the Society of American Archivists, when the president of the Society presented me with a framed blown-up letter “S.”  This was an inside joke about the Internet Archive being named in the singular, Archive, rather than the plural Archives. Of course, he was right, as I should have known all along. The Internet Archive had long since grown out of being an “archive of the Internet”—a singular collection, say of web pages—to being “archives on the Internet,” plural.  My evolving understanding of these different names might help focus a discussion that has become blurry in our digital times: the difference between the roles of publishers, bookstores, libraries, archives, and museums. These organizations and institutions have evolved with different success criteria, not just because of the shifting physical manifestation of knowledge over time, but because of the different roles each group plays in a functioning society. For the moment, let’s take the concepts of Library and Archive.

The traditional definition of a library is that it is made up of published materials, while an archive is made up of unpublished materials. Archives play an important function that must be maintained—we give frightfully little attention to collections of unpublished works in the digital age. Think of all the drafts of books that have disappeared once we started to write with word processors and kept the files on fragile computer floppies and disks. Think of all the videotapes of lectures that are thrown out or were never recorded in the first place. 

Bookstores: The Thrill of the Hunt

Let’s try another approach to understanding distinctions between bookstores, libraries and archives. When I was in my 20’s living in Boston—before Amazon.com and before the World Wide Web (but during the early Internet)—new and used bookstores were everywhere. I thought of them as catering to the specialized interests of their customers: small, selective, and only offering books that might sell and be taken away, with enough profit margin to keep the store in business. I loved them. I especially liked the used bookstore owners—they could peer into my soul (and into my wallet!) to find the right book for me. The most enjoyable aspect of the bookstore was the hunt—I arrived with a tiny sheet of paper in my wallet with a list of the books I wanted, would bring it out and ask the used bookstore owners if I might go home with a bargain. I rarely had the money to buy new books for myself, but I would give new books as gifts. While I knew it was okay to stay for awhile in the bookstore just reading, I always knew the game.

Libraries: Offering Conversations not Answers

The libraries that I used in Boston—MIT Libraries, Harvard Libraries, the Boston Public Library—were very different. I knew of the private Boston Athenæum but I was not a member, so I could not enter. Libraries for me seemed infinite, but still tailored to individual interests. They had what was needed for you to explore and if they did not have it, the reference librarian would proudly proclaim: “We can get it for you!” I loved interlibrary loans—not so much in practice, because it was slow, but because they gave you a glimpse of a network of institutions sharing what they treasured with anyone curious enough to want to know more. It was a dream straight out of Borges’ imagination (if you have not read Borges’ short stories, they are not to be missed, and they are short. I recommend you write them on the little slip of paper you keep in your wallet.) I couldn’t afford to own many of the books I wanted, so it turned off that acquisitive impulse in me. But the libraries allowed me to read anything, old and new. I found I consumed library books very differently. I rarely even brought a book from the shelf to a table; I would stand, browse, read, learn and search in the aisles. Dipping in here and there. The card catalog got me to the right section and from there I learned as I explored. 

Libraries were there to spark my own ideas. The library did not set out to tell a story as a museum would. It was for me to find stories, to create connections, have my own ideas by putting things together. I would come to the library with a question and end up with ideas.  Rarely were these facts or statistics—but rather new points of view. Old books, historical newspapers, even the collection of reference books all illustrated points of view that were important to the times and subject matter. I was able to learn from others who may have been far away or long deceased. Libraries presented me with a conversation, not an answer. Good libraries cause conversations in your head with many writers. These writers, those librarians, challenged me to be different, to be better. 

Staying for hours in a library was not an annoyance for the librarians—it was the point. Yes, you could check books out of the library, and I would, but mostly I did my work in the library—a few pages here, a few pages there—a stack of books in a carrel with index cards tucked into them and with lots of handwritten notes (uh, no laptops yet).

But libraries were still specialized. To learn about draft resisters during the Vietnam War, I needed access to a law library. MIT did not have a law collection and this was before Lexis/Nexis and Westlaw. I needed to get to the volumes of case law of the United States.  Harvard, up the road, had one of the great law libraries, but as an MIT student, I could not get in. My MIT professor lent me his ID that fortunately did not include a photo, so I could sneak in with that. I spent hours in the basement of Harvard’s Law Library reading about the cases of conscientious objectors and others. 

But why was this library of law books not available to everyone? It stung me. It did not seem right. 

A few years later I would apply to library school at Simmons College to figure out how to build a digital library system that would be closer to the carved words over the Boston Public Library’s door in Copley Square:  “Free to All.”  

Archives: A Wonderful Place for Singular Obsessions

When I quizzed the archivist at MIT, she explained what she did and how the MIT Archives worked. I loved the idea, but did not spend any time there—it was not organized for the busy undergraduate. The MIT Library was organized for easy access; the MIT Archives included complete collections of papers, notes, ephemera from others, often professors. It struck me that the archives were collections of collections. Each collection faithfully preserved and annotated.  I think of them as having advertisements on them, beckoning the researcher who wants to dive into the materials in the archive and the mindset of the collector.

So in this formulation, an archive is a collection, archives are collections of collections.  Archivists are presented with collections, usually donations, but sometimes there is some money involved to preserve and catalog another’s life work. Personally, I appreciate almost any evidence of obsession—it can drive toward singular accomplishments. Archives often reveal such singular obsessions. But not all collections are archived, as it is an expensive process.

The cost of archiving collections is changing, especially with digital materials, as is cataloging and searching those collections. But it is still expensive. When the Internet Archive takes on a physical collection, say of records, or old repair manuals, or materials from an art group, we have to weigh the costs and the potential benefits to researchers in the future. 

Archives take the long view. One hundred years from now is not an endpoint, it may be the first time a collection really comes back to light.

Digital Libraries: A Memex Dream, a Global Brain

So when I helped start the Internet Archive, we wanted to build a digital library—a “complete enough” collection, and “organized enough” that everything would be there and findable. A Universal Library. A Library of Alexandria for the digital age. Fulfilling the memex dream of Vannevar Bush (do read “As We May Think”), of Ted Nelson’s Xanadu, of Tim Berners-Lee’s World Wide Web, of Danny Hillis’ Thinking Machine, Raj Reddy’s Universal Access to All Knowledge, and Peter Russell’s Global Brain.

Could we be smarter by having people, the library, networks, and computers all work together?  That is the dream I signed on to.

I dreamed of starting with a collection—an Archive, an Internet Archive. This grew to be  a collection of collections: Archives. Then a critical mass of knowledge complete enough to inform citizens worldwide: a Digital Library. A library accessible by anyone connected to the Internet, “Free to All.”

ABOUT THE AUTHOR

Brewster Kahle, Founder & Digital Librarian, Internet Archive

A passionate advocate for public Internet access and a successful entrepreneur, Brewster Kahle has spent his career intent on a singular focus: providing Universal Access to All Knowledge. He is the founder and Digital Librarian of the Internet Archive, one of the largest digital libraries in the world, which serves more than a million patrons each day. Creator of the Wayback Machine and lender of millions of digitized books, the Internet Archive works with more than 800 library and university partners to create a free digital library, accessible to all.

Soon after graduating from the Massachusetts Institute of Technology where he studied artificial intelligence, Kahle helped found the company Thinking Machines, a parallel supercomputer maker. He is an Internet pioneer, creating the Internet’s first publishing system called Wide Area Information Server (WAIS). In 1996, Kahle co-founded Alexa Internet, with technology that helps catalog the Web, selling it to Amazon.com in 1999.  Elected to the Internet Hall of Fame, Kahle is also a Fellow of the American Academy of Arts and Sciences, a member of the National Academy of Engineering, and holds honorary library doctorates from Simmons College and University of Alberta.

Judge Sets Tentative Trial Date for November 2021

This week, a federal judge issued this scheduling order, laying out the road map that may lead to a jury trial in the copyright lawsuit brought by four of the world’s largest publishers against the Internet Archive. Judge John G. Koeltl has ordered all parties to be ready for trial by November 12, 2021. He set a deadline of December 1, 2020, to notify the court if the parties are willing to enter settlement talks with a magistrate judge. 

Attorneys for the Internet Archive have met with representatives for the publishers, but were unable to reach an agreement. “We had hoped to settle this needless lawsuit,” said Brewster Kahle, Internet Archive’s founder and Digital Librarian. “Right now the publishers are diverting attention and resources from where they should be focused: on helping students during this pandemic.” 

The scheduling order lays out this timeline:

  • Discovery must be completed by September 20, 2021;
  • Dispositive motions must be submitted by October 8, 2021;
  • Pretrial orders/motions must be submitted by October 29, 2021;
  • Parties must be ready for trial on 48 hours’ notice by November 12, 2021.

In June, Hachette Book Group, Inc., HarperCollins Publishers LLC, John Wiley & Sons, Inc., and Penguin Random House LLC—with coordination by the Association of American Publishers—filed a lawsuit to stop the Internet Archive from digitizing and lending books to the public, demanding that the non-profit library destroy 1.5 million digital books. 

Publishers Weekly Senior Writer Andrew Albanese has been covering the story from the beginning. In a July 31st Beyond the Book podcast for the Copyright Clearance Center, Albanese shared his candid opinions about the lawsuit. “If this was to be a blow out, open-and-shut case for the publishers, what do the publishers and authors get?” Albanese asked. “I’d say nothing.”

“Honestly, a win in court on this issue will not mean more sales for books for publishers. Nor will it protect any authors or publisher from the vagaries of the Internet,” the Publishers Weekly journalist continued. “Here we are in the streaming age, 13 years after the ebook market took off, and we’re having a copyright battle, a court battle over crappy PDFs of mostly out-of-print books? I just don’t think it’s a good look for the industry.”

In order to make the vast majority of 20th Century books accessible to digital learners, libraries such as the Internet Archive have been digitizing the physical books they own and lending them on a 1-to-1 “own to loan” basis—a legal framework called Controlled Digital Lending. Publishers refuse to sell ebooks to libraries, insisting on temporary licenses on restrictive terms.  This business practice “threatens the purpose, values, and mission of libraries and archives in the United States,” explains Kyle K. Courtney, copyright advisor to Harvard University Libraries. “It undermines the ability of the public (taxpayers!) to access the materials purchased with their money for their use in public libraries and state institutions, and further, it is short sighted, and not in the best interest of library patrons or the public at large.” 

“Libraries have always had the right to buy and lend books. It’s at the core of a library’s mission,” said Kahle. “The Internet Archive would like to purchase ebooks, but the publishers won’t sell them to us, or to any library. Instead they are suing us to stop all learners from accessing the millions of digitized books in our library.”

Harvard Copyright Scholar: “Libraries have special authority”

On July 22, 2020, Kyle K. Courtney, Copyright Advisor at Harvard University, spoke at a press conference about the copyright lawsuit against the Internet Archive brought by the publishers Hachette, HarperCollins, Wiley, and Penguin Random House. He holds a J.D. with distinction in Intellectual Property Law and a Master of Science in Library and Information Science (MSLIS) degree. Courtney is a published author and nationally recognized speaker on the topics of copyright, technology, libraries, and the law. These are his remarks:

Part of my work in scholarship is about the roles of copyright and the library landscape. I wrote the white paper on Controlled Digital Lending of library books with my coauthor David Hansen at Duke University Libraries. And it presents the legal rationale supporting the overview document called the Position Statement on Controlled Digital Lending, which has been endorsed by many national library organizations, regional library consortia, specific library systems, themselves, individual librarians, and legal experts. Ultimately, though, this is about how libraries can do what they’ve always done, right? Lend books. The paper looks at the underpinnings of the library’s historical mission through the lens of both fair use and first sale, true critical rights that I think any library uses in their programs, right? Both for lending and preservation. And I discuss how libraries can legally lend digital copies of their print collections using this technology.

But I’d like to point out that a CDL system is not a brand new concept, like Corynne stated: libraries loan books to the public. It’s what they do, for centuries. And libraries do not need permission or a license to loan those books that they have purchased or acquired. Copyright law covers those exact issues. But the difference here, I think, and some of the conflict is that the vendors and publishers have to ask permission, right? They must license. This is their business model. Historically, libraries are special creatures of copyright law; libraries have a legally authorized mandate, by the way, granted by Congress, to complete their mission to provide both access to materials. Congress actually placed all of these specialized copyright exemptions for libraries in the Copyright Act itself. So that’s kind of fun to look at library’s unique role in copyright law, they sit right in the middle, both housing the economic purpose of copyright: “we buy the books, we buy lots of books,” and the access purpose of copyright, which is, “we loan the books out to our users.”

Or if you want to put that in the constitutional narrative: libraries are promoting the progress of science and the useful arts. Libraries have historically provided unfettered access and freedom to the books that they purchase for their communities. Now, because of that, there’s multiple versions of CDL-like systems that are currently used in libraries. But I think the origin of the real legal underpinning concept was first explored by Professor Michelle Wu at Georgetown University School of Law in an article that she wrote that I read many times, “Building a Collaborative Digital Collection.” Later, the Internet Archive formed up the Open Library Program, which Chris talked about, which was nine years ago. And other institutions are exploring this option right in their own individual libraries or part of consortia or within affinity groups.

It’s exciting to see, but at its core, Controlled Digital Lending is about replicating, through the Controlled Digital Lending Process, the legal and economically significant aspects of physical lending. And in other words, let’s put this simply: it continues to preserve the powers in the print. A library has these significant legal usage rights and they have great fiscal value in their collections. Some public library systems have spent millions upon millions of dollars to make their collections accessible to the community. And I believe the CDL structure preserves that value by enhancing access of these works to the public through technology. And as Chris pointed out, it’s the same technology that’s used by publishers to distribute in the commercial marketplace.

Again, this may be about the fear of technology, certainly, but technology should be used to enhance access to materials and do what libraries have always done: increasing access to knowledge by loaning the materials to the public. Just because we’re using technology does not mean that suddenly these acts are new. And in fact, libraries have special authority to provide both access longterm to information and preserve these materials for much longer than the business model of any particular corporation, company, or vendor.

And this is especially true with the 20th Century works that are in libraries, right? They have not been available in the digital world across the board. They call this “The 20th Century Black Hole.” Many 20th Century books are not available for purchase as new copies or in print or digital versions online. And I don’t know if your students are like mine or anyone else or patrons: if it’s not digital, it’s almost like it doesn’t exist. Libraries would like to provide digital access, but we can’t, because these are not available in a licensed format or in a digital format that’s available to loan, but we have them on our shelves.

So as many of our student patrons say, “We want access to these works,” and these could be long lost print works, by the way, that are really not lost; they’re on the shelves of our libraries, just trapped there, and in COVID maybe trapped there for a longer time than anticipated. So imagine the potentially enormous high social and scholarly value and relatively low risk if we make these works available to the public for reading, quoting, citing, adaption, using Wikipedia articles. So that’s kind of the exciting aspect of it. I’m not going to get into great detail, but our principal argument in this paper, which summarizes all these points, is that Controlled Digital Lending is a fair use, which is an equitable rule of reason, that permits libraries to do what they’ve always done. And under the First Sale Doctrine, loan those books to users. Thanks a lot for your time.

Open Libraries Director: “Everyone should have equal and equitable access to a comprehensive library”

On July 22, 2020, Chris Freeland, Director of Open Libraries at the Internet Archive, spoke at a press conference about the copyright lawsuit brought by the publishers Hachette, HarperCollins, Wiley, and Penguin Random House against our non-profit digital library. These are his remarks:

I’m Chris Freeland, I’m a librarian at the Internet Archive and I’m the Director of the Open Libraries Program at the Internet Archive. I’ve been at the Internet Archive for more than two and a half years. Before joining the Archive, I was an associate university librarian at Washington University in St. Louis, and then before that I was the Technical Director of a project called the Biodiversity Heritage Library. And so for more than 15 years, I’ve worked in partnership with the Internet Archive to digitize books and make them as widely available as possible through technology and through copyright.

In that same amount of time, that’s when the Internet Archive was partnering with those one thousand libraries that Brewster just mentioned to digitize nearly four million books. So most of those books, when we were partnering with libraries, most of those books were in the public domain, and that means that those were easily published online. They didn’t need restrictions for use. They didn’t need any kind of controls. But at the Internet Archive, we think that everyone deserves to learn. So our goal is to build a research library with more than four million modern books that we can make available to users all over the world.

Now you may be asking why four million? Four million books is the size of a large metropolitan public library. It’s about the same size as a Chicago Public Library or a San Francisco Public Library. And we think that everyone, regardless of where they live, should have equal and equitable access to a comprehensive library. And so to date, we’ve digitized nearly 1.5 million books on the way towards that four million book goal.

So the way that we lend books to our patrons is through Controlled Digital Lending. So Controlled Digital Lending is a legal practice that makes works accessible that are still in copyright. We started working with Controlled Digital Lending with the Boston Public Library, on a pilot that we called at that time “digitize and lend” those books that were in copyright. Now, nine years later, hundreds of other libraries of all sizes in the US and Canada are also participating in Controlled Digital Lending and they’ve embraced the model.

So here’s the way the Controlled Digital Lending works. We only loan as many copies as we and our library partners own, and those checkouts have time limits and the files are protected by the same digital rights management software that publishers use. It’s not a free-for-all. It’s controlled, that’s the “control” in Controlled Digital Lending. So Controlled Digital Lending helps us make information available, which is incredibly important from my perspective as a librarian. It’s a necessary way to increase equity in our education system, and it’s part of the mission of libraries.
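For readers who think in code, the two controls described above (never lending more copies than are owned, and putting a time limit on every checkout) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Internet Archive's actual lending software: the `Title` class, its method names, and the 14-day loan period are all hypothetical, and the digital rights management layer is left out entirely.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Title:
    """One title in a hypothetical controlled-lending catalog."""
    owned_copies: int                             # physical copies the library owns
    loan_period: timedelta = timedelta(days=14)   # assumed time limit per checkout
    loans: dict = field(default_factory=dict)     # patron -> due date

    def _expire(self, now: datetime) -> None:
        # Loans past their due date end on their own, freeing the copy.
        self.loans = {p: due for p, due in self.loans.items() if due > now}

    def checkout(self, patron: str, now: datetime) -> bool:
        """Lend a copy only while fewer copies are out than are owned."""
        self._expire(now)
        if patron in self.loans or len(self.loans) >= self.owned_copies:
            return False
        self.loans[patron] = now + self.loan_period
        return True

    def give_back(self, patron: str, now: datetime) -> bool:
        """Return a copy early; False if the patron holds no active loan."""
        self._expire(now)
        return self.loans.pop(patron, None) is not None
```

With a single owned copy, a second patron's `checkout` is refused while the first loan is active, and succeeds once that loan has been returned or has passed its due date.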

In addition to digitizing, we’re also helping libraries and institutions preserve their collections and to keep them safe and accessible. So let me give you a little example, a story from last year. Marygrove College closed last year, and a central concern for the school’s president, Dr. Elizabeth Burns, and for the Board of Trustees is what do we do with the library? Those 70,000 volumes that are in that library that were in the school that was closing. So after hearing about the Internet Archive’s Controlled Digital Lending program, the college decided to donate the entire library to us for digitization and for preservation, so that the legacy of the college would live on. And so that those books would be available for future scholars.

So in closing my little portion here, I want to leave you with an impact story of why Controlled Digital Lending matters. So we’ve received hundreds of testimonials and published two blog posts that are full of statements from users who have used our lending library while their own libraries and their schools were closed. And one such statement really helped underscore that impact of CDL. And it comes to us from Benjamin Saracco, who is a librarian at a medical center in New Jersey. And Benjamin wrote to us, and to let us know that he was able to find basic life support manuals that were needed by the frontline workers at the medical center where he worked. He needed those and he had to use our library because his physical collection was closed due to COVID-19. It may sound impossible to think, but it’s true. Lives were saved because of Controlled Digital Lending. That is impactful.

Knocking Down the Barriers to Knowledge: Lila Bailey wins IP3 Award

Lila Bailey, Esq.—Policy Counsel for the Internet Archive

This week, Public Knowledge, the public interest policy group, announced the winners of its 17th annual IP3 Awards. The IP3 Awards honor those who have made significant contributions in the three areas of “IP”—intellectual property, information policy, and internet protocol. On September 24, the 2020 Intellectual Property award will be presented to Lila Bailey, Policy Counsel at the Internet Archive.

“She has been a tremendous advocate and leader behind the scenes on behalf of libraries and archives, ensuring both can serve the public in the digital era,” said Chris Lewis, President and CEO of Public Knowledge. “Working at the intersection between copyright and information access, Lila has been instrumental in promoting equitable access to contemporary research through Controlled Digital Lending — the library lending practice currently under threat because of a legal challenge from large commercial publishers.” 

“My whole career has been leading up to this moment,” Bailey mused, speaking about her role defending the Internet Archive against the publishers’ copyright lawsuit. “This is what I went to law school to do: to fight for the democratization of knowledge.”

Lila Bailey, center, with lawyers and librarians Michelle Wu, Lisa Weaver, Jim Michalko, Michael Blackwell and Tom Blake in 2019, during visits to Capitol Hill where they helped explain Controlled Digital Lending to key legislators.

As a law school student at Berkeley Law, Bailey was a student attorney at the Samuelson Law, Technology & Public Policy Clinic, where she laid the legal groundwork for the Internet Archive’s Television News Archive.

In private practice at Perkins Coie, Bailey won the Pro Bono Leadership award for her tireless work defending the Internet Archive’s Wayback Machine against a legal challenge.

Bailey later went on to work for Creative Commons, helping to ensure that everyone everywhere has access to high quality, open educational resources. She served as a fellow at the Electronic Frontier Foundation, and later returned to Berkeley Law as a Teaching Fellow to help train the next generation of public interest technology lawyers.

Since joining the Internet Archive as Policy Counsel in 2017, Bailey has focused on building a community of practice around Controlled Digital Lending (CDL). Although the library practice has existed for more than a decade, Bailey has been working with Michelle Wu, Kyle K. Courtney, David Hansen, Mary Minow and other legal scholars to help libraries navigate the complex legal framework that allows libraries to bring their traditional lending function online. Today, with hundreds of endorsers, Controlled Digital Lending defines a legal pathway for libraries to digitize the books they already own and lend them online in a secure way. 

“As a copyright lawyer, I find her to be an incredibly inspiring colleague, a natural leader, and great person,” said Harvard Copyright Advisor, Kyle Courtney, who works with Bailey on the CDL Task Force. “I know that her work creates a multiplier effect that can inspire others, like myself, to advocate for greater access to culture and enhance a library’s role in the modern world.”

So what drives this intellectual property warrior forward? “Access to knowledge matters to everyone. It’s the great equalizer. That is what the internet has given us—this vision of everyone having equal ability to learn and also to teach, to read and also to speak,” she explained. “Now that our lives are largely online, copyright law, which is supposed to promote creativity and learning, sometimes creates barriers to these daily activities. The work I am doing is to try to clear some of those barriers away so we can realize that utopian vision of universal access to knowledge.”

On September 24, IP3 Awards will also be presented to Matthew Rantanen (Canadian Cree), Director of Technology for the Southern California Tribal Chairmen’s Association and Director for the Tribal Digital Village Network Initiative, and Geoffrey C. Blackwell (Chickasaw, Choctaw, Omaha, Muscogee Creek), Chief Strategy Officer and General Counsel for AMERIND, for their work establishing AMERIND Critical Infrastructure, focused on closing the digital divide in Indian Country. Also being honored is Stop Hate for Profit, a campaign that organized a mass boycott of Facebook advertising.

Previous IP3 Award winners include Bailey’s mentors Professor Pam Samuelson and Internet Archive founder, Brewster Kahle; along with many of her heroes including professors Peter Jaszi, Lateef Mtima, and Rebecca Tushnet. Be sure to attend the award ceremony on September 24, 6-8 PM ET, by registering here.