“Holy Crow! This is a big deal,” said Brewster Kahle, the Internet Archive’s founder. “And what are we going to do with it? We’re going to invest it in making the Internet Archive more decentralized, so that our digital history is available from thousands of computers, not just a few. The idea is to make a robust and private Internet that has a history that will persist over decades and maybe centuries.”
Filecoin is a decentralized storage system designed to preserve humanity’s most important information. The creators of Filecoin envisioned an independent foundation that would serve as the long-term governance body for the Filecoin ecosystem. In awarding the grant to the Internet Archive, Filecoin Foundation board chair, Marta Belcher, stressed the two organizations’ “common goal of preserving the web and fostering its future.”
It was back in 2015 that Protocol Labs’ founder, Juan Benet, first visited the Internet Archive, to share his vision for an academic conference dedicated to preserving “humanity’s greatest treasures using decentralized storage.” Building on these conversations, the Internet Archive organized the Decentralized Web Summit in 2016 in San Francisco, the first gathering of its kind. Back then, a decentralized web was mostly a concept, with little working code.
Since 2016, the Internet Archive has worked with several decentralized tech startups to create a decentralized prototype of the digital library. And when the Filecoin mainnet took off in 2020, public domain audiobooks and films from the Internet Archive were stored on Filecoin servers. Together, the two organizations created the Filecoin Archives, a community-led project to curate, disseminate and preserve important open access information often at risk of being lost.
“It’s wonderful to see Filecoin come of age. We started six years ago by putting out a call to make a Decentralized Web, a web that would serve us better than the current web–one that is now starting to be dominated by just a few tech behemoths. Can we make a game with many winners?” asked Kahle. “Filecoin has made a huge step forward by deploying decentralized storage at the exabyte level. That’s very different from AWS (Amazon Web Services). It has many participants, not just one player. And its protocols are open-source. We want to see more technologies like this. This was the original vision of the Decentralized Web that the Internet Archive was hoping for five, six years ago. And it’s starting to come to fruition and Filecoin is a leader in that area.”
Although purveyors of cryptocurrencies are often accused of being driven only by short-term gain, in this group Kahle sees a different motivation. “This donation by the Filecoin Foundation is significant financially for the Internet Archive, but I’d say it’s a more interesting one than that,” said the Internet Hall of Fame engineer. “It’s a donation by a new generation of technologists that are building interesting new technologies…bringing the Archive along with it to make it so that history is preserved –that the Internet Archive makes it into this next generation. That is an interesting thing! You don’t often see that. But the Filecoin Foundation, Filecoin and IPFS, and Juan Benet himself have always been interested in preserving history and how history can be woven into the present and the future of these technologies.”
The glass rises and falls. Quickly and efficiently, a woman turns the pages to the rhythmic beep of the cameras. She never misses a beat.
In its first 48 hours, this tweet about book scanning at the Internet Archive went viral, reaching 7.7 million people. More than 1.5 million people viewed the video, liking it 70,000 times and retweeting it 24,000 more. At the center of it all sits Eliza Zhang, a book scanner at the Internet Archive’s headquarters in San Francisco since 2010. When I asked Eliza what she likes about her job, she replied, “Everything! I find everything interesting. I don’t feel it is boring. Every collection is important to me.”
Eliza, a college graduate from southern China, immigrated to the United States in 2009, seeking a new life and new opportunities. She landed in San Francisco in the midst of an economy-crushing recession. But through a city program called JobsNOW, the Internet Archive hired Eliza and scores of other job seekers, training them to digitize, quality control, and upload metadata for books, newspapers, periodicals and manuals. Often our digitizing staff are making these analog texts available online for the first time.
Raising the glass with a foot pedal, adjusting the two cameras, and shooting the page images are just the beginning of Eliza’s work. Some books, like the Bureau of Land Management publication featured in the video, have myriad fold-outs. Eliza must insert a slip of paper to remind her to go back and shoot each fold-out page, while at the same time inputting the page numbers into the item record. The job requires keen concentration.
If this experienced digitizer accidentally skips a page, or if an image is blurry, the publishing software created by our engineers will send her a message to return to the Scribe and scan it again.
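The kind of check described above can be illustrated with a minimal sketch. This is not the Internet Archive's actual Scribe software: the function name `find_missing_pages` and the assumption that every captured image is tagged with an integer page number are hypothetical, chosen only to show how a skipped page might be flagged for rescanning. Detecting a blurry image would need image analysis and is left out.

```python
def find_missing_pages(scanned_pages):
    """Return page numbers absent between the lowest and highest scanned page.

    Hypothetical helper: assumes each captured image carries an integer
    page number entered by the operator.
    """
    if not scanned_pages:
        return []
    seen = set(scanned_pages)
    first, last = min(seen), max(seen)
    return [page for page in range(first, last + 1) if page not in seen]


if __name__ == "__main__":
    # Made-up example in which pages 4 and 7 were skipped.
    captured = [1, 2, 3, 5, 6, 8, 9]
    for page in find_missing_pages(captured):
        print(f"Please return to the Scribe and rescan page {page}")
```

A real system would also have to cope with unnumbered pages, fold-outs and front matter, which is one reason the operator's own notes remain part of the workflow.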
Listening to 70s and 80s R&B while she works, Eliza spends a little time each day reading the dozens of books she handles. The most challenging part of her job? “Working with very old, fragile books. The paper is very thin. I always wear rubber fingertips and sometimes gloves when I scan newspapers, because of the ink,” she explained.
Tweets Spark a New Interest in Digitization
Eliza is one of about 70 Scribe operators at the Internet Archive, working in digitization centers embedded in libraries across the United States, United Kingdom, and Canada. The operations are led by Elizabeth MacLeod, who manages our remote operations, and Andrea Mills, who is stationed at the University of Toronto, with support from managers and operators in each center.
“We try to meet libraries where they are,” said MacLeod, who manages remote operations from her home office in North Carolina. “From digitizing a few shipments a year at one of our regional centers to setting up and staffing full-service digitization within the library itself, we have a flexible approach to our library partnerships.”
Across Twitter, another common question arose: “Why hasn’t this job been automated?” To many, the repetitive act of turning the pages in a book and photographing them seems like the natural task for a robot. In fact, some 20 years ago, we tested commercial book scanners that feature a vacuum-powered page-turning arm. It turns out those automated scanners didn’t really work well for brittle books, rare volumes, and other special collections—the kinds of material our library partners ask us to digitize.
“Clean, dry human hands are the best way to turn pages,” said Mills, from her socially-distanced office at the University of Toronto. In her 15 years on the job, she has worked with hundreds of librarians to hone our digitization operations, balancing our need to preserve the original pages with minimal impact during the imaging process. “Our goal is to handle the book once and to care for the original as we work with it,” Mills explained.
So what does it take to be a Scribe operator? “It takes a level of zen,” wrote Brewster Kahle, founder and digital librarian of the Internet Archive, responding to one of the many threads about the video that popped up on Reddit. “It takes concentration and a love of books. For those who love working with books and libraries, it fits well.”
As for the hardware used for digitization, like much at the Internet Archive, the equipment is engineered and purpose-built for the job. In the viral video, Eliza is operating the original Scribe machine, designed more than 15 years ago, and Scribe software that was developed in-house and refined continuously over years of operation. “The variation in books makes [automation] difficult to do quickly and without damage,” Kahle elaborates. “We do not disbind the books, which also makes automation more difficult.”
18,000 Books and Climbing
In the decade Eliza has been working with the Internet Archive, she has scanned more than 3 million pages, 14,000 foldouts, and 18,000 items (mostly books).
And what about all the sudden social media attention? Eliza shrugs. She’s never been on Twitter before. “My goal is to guarantee zero errors,” she said. “I want to give our readers a satisfying experience.”
Digitize With Us
The Covid-19 pandemic has both created higher demand for digital content and shuttered some of our scanning centers for health and safety. We have reopened following local and national health guidelines and continue to engage with new libraries on their digitization projects.
As a student at the University of Waterloo, whenever Drini Cami felt stressed, he’d head to the library. Wandering through the stacks, flipping through 600-page volumes about quantum mechanics or the properties of prime numbers, never failed to calm him down. And the best thing? “I would always leave the library having discovered something new—usually a variety of new things,” Cami explained. “This is something I haven’t been able to replicate at a digital library like Open Library.” What Drini longed for was the ability to discover new books serendipitously, browsing bookshelves organized by a century of librarians. But unlike most readers, Drini Cami wields a superpower: he is a designer and software developer at the Internet Archive.
Enter the Open Library Explorer, Cami’s new experiment for browsing more than 4 million books in the Internet Archive’s Open Library. Still in beta, Open Library Explorer is able to harness the Dewey Decimal or Library of Congress classification systems to recreate virtually the experience of browsing the bookshelves at a physical library. Open Library Explorer enables readers to scan bookshelves left to right by subject, up and down for subclassifications. Switch a filter and suddenly the bookshelves are full of juvenile books. Type in “subject: biography” and you see nothing but biographies arranged by subject matter.
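To make the idea of classification-ordered shelves concrete, here is a minimal sketch of arranging books by Dewey Decimal number. It is not Open Library Explorer's code: the function `shelve_by_dewey`, the use of plain floats for Dewey numbers (real call numbers are strings with more structure), and the sample titles are all assumptions made for illustration.

```python
from collections import defaultdict


def shelve_by_dewey(books):
    """Group (title, dewey_number) pairs into shelves by Dewey main class.

    Shelves are keyed by main class (0, 100, ..., 900). Within a shelf,
    titles run left to right in order of their full Dewey number.
    """
    shelves = defaultdict(list)
    for title, dewey in books:
        main_class = int(dewey // 100) * 100
        shelves[main_class].append((dewey, title))
    return {
        main_class: [title for _, title in sorted(entries)]
        for main_class, entries in sorted(shelves.items())
    }


if __name__ == "__main__":
    # Invented titles; the numbers fall in the 500s (science) and 900s.
    sample = [
        ("Quantum Mechanics Primer", 530.12),
        ("Prime Numbers", 512.7),
        ("A Life in Letters", 920.0),
        ("Number Theory", 512.72),
    ]
    for main_class, titles in shelve_by_dewey(sample).items():
        print(main_class, titles)
```

Because neighboring numbers cover neighboring subjects, sorting alone puts the two number-theory titles side by side, which is the browsing effect a physical shelf provides.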
Why recreate a physical library experience in your browser?
Now that classrooms and libraries are once again shuttered, families are turning online for their educational and entertainment needs. With demand for digital books at an all-time high, the Open Library team was inspired to give readers something closer to what they enjoy in the physical world. Something that puts the power of discovery back into the hands of patrons.
Escaping the Algorithmic Bubble
One problem with online platforms is the way they guide you to new content. For music, movies, or books, Spotify, Netflix and Amazon use complicated recommendation algorithms to suggest what you should encounter next. But those algorithms are driven by the media you have already consumed. They put you into a “filter bubble” where you only see books similar to those you’ve already read. Cami and his team devised the Open Library Explorer as an alternative to recommendation engines. With the Open Library Explorer, you are free to dive deeper and deeper into the stacks. Where you go is driven by you, not by an algorithm.
Cool New Features
By clicking on the Settings gear, you can customize the look and feel of your shelves. Hit the 3D options and you can pick out the 600-page books immediately, just by the thickness of the spine. When a title catches your eye, click on the book to see whether Open Library has an edition you can preview or borrow. For more than 4 million books, borrowing a copy in your browser is just a few clicks away.
Ready to enter the library? Click here, and be sure to share feedback so the Open Library team can make it even better.
Need to know what an igloo really looks like? How about a Siberian hut? Or the inside of a 15th-century jail? For 50 years in Hollywood, generations of filmmakers would beat a path to the Michelson Cinema Research Library, where renowned film researcher Lillian Michelson could hunt down the answer to just about any question. She was the human card catalogue to a library of more than one million books, photos, periodicals and clippings. But ever since Lillian retired a decade ago, the Michelson Cinema Research Library has been languishing in cold storage, looking for a home. Today it has found one. Lillian Michelson, 92, announced that she is donating her library and life’s work to the Internet Archive. For its part, the nonprofit digital library vows to preserve her collection for the long term and digitize as much of it as possible, making it accessible to the world.
“I feel as if a fantasy I never, never entertained has been handed to me by the universe, by fate,” mused the legendary film researcher. “The Internet Archive saved my library in the best way possible. I hope millions of people will use it [to research] space, architecture, costumes, towns, cities, administration, foreign countries… the crime business! Westerns! That’s what is amazing to me, that it will be open to everybody.”
Internet Archive founder, Brewster Kahle, explained why his organization was willing to accept the entire Michelson collection and keep it intact: “A library is more than a collection of books. It is the center of a community. For decades, the Michelson Cinema Research Library informed Hollywood—and we want to see that continue. Many organizations wanted pieces of the collection, but I think the importance of keeping it together is so it can continue to help inspire global filmmakers to make accurate and compelling movies.”
With $20,000 borrowed against her husband Harold’s life insurance policy, Lillian Michelson purchased the reference library in 1969. Over the next half-century, the Michelson Cinema Research Library had many homes. From the Samuel Goldwyn Studios it moved to the American Film Institute, then to Paramount Studios, and finally to Zoetrope Studios at the invitation of director, Francis Ford Coppola. Michelson later received an offer via Jeffrey Katzenberg to move the Michelson Cinema Research Library to the newly opened DreamWorks Pictures, where it remained until Lillian’s retirement due to health reasons 19 years later.
The Michelson Cinema Research Library includes more than 5,000 books dating back to the early 1800s, periodicals, more than 30,000 photographs, and more than 3,000 clipping files. In storage they filled some 1,600 boxes on 45 pallets—enough to fill more than two 18-wheel tractor trailers. Its contents have now been moved for long-term preservation to the Internet Archive’s physical archive in Richmond, California.
For six decades, Michelson’s research informed scores of Hollywood films, including The Right Stuff, Rosemary’s Baby, Scarface, Fiddler on the Roof, Full Metal Jacket, The Graduate and The Birds.
Bringing this historic Hollywood design resource back to life—a largely digital life—can make it a global design resource for art directors, designers, filmmakers and researchers in search of information and visual inspiration.
“Lillian Michelson opened my eyes to the importance of a research library to all aspects of motion picture production. At a time when the rich and deep research libraries created and maintained by the motion picture studios were being ‘given away’ or otherwise destroyed, Lillian was a beacon of light guiding us to consider them as treasure.”
—Academy Award-winning director, Francis Ford Coppola
The story of her long and creative union with renowned storyboard artist Harold Michelson was told in Harold and Lillian: A Hollywood Love Story, a 2015 documentary produced and directed by Daniel Raim and currently streaming on Netflix. (To honor this devoted Hollywood couple, DreamWorks Pictures named the king and queen in Shrek 2 Harold and Lillian.)
Lillian Michelson will preside over a virtual ribbon cutting, panel discussion, and a screening of the documentary on Wednesday, January 27 from 4-6:30 PM Pacific time. There, she will unveil the first phase of her new digital library, available to the world via the Internet Archive’s digital platform, at https://archive.org/details/michelson. Sign up for the screening event here.
The folks at Protocol Labs love their rockets. And outer space. And exploration.
So when Filecoin, their cryptocurrency-fueled decentralized storage network, launched recently, it was no surprise they called it Filecoin Liftoff. In the payload of that Filecoin rocket are treasures from the Internet Archive:
For 15 years, LibriVox has harnessed a global army of volunteers, creating 14,200 free public domain audiobook projects in 100 different languages. Where else can you listen to Jules Verne’s 20,000 Leagues Under the Sea in French, Spanish, English, German or Dutch…for free? Now, phrases of Shakespeare, Poe, Joyce and Dante will be stored across the Filecoin mainnet, broken into packets to be reconstituted when needed—perhaps in a new century.
The same destiny awaits the home movies, stock footage, educational and amateur films in the public domain, lovingly curated by the Prelinger Archives founder, Rick Prelinger. He encourages creatives to download and reuse these videos, creating countless new works like this one by musician Jordan Paul:
Now filmmakers and connoisseurs can sleep easier, knowing that a new, distributed copy of those films lives in the Filecoin network (along with the main copy and multiple backups in the Internet Archive’s repositories).
So what’s next, Filecoin explorers?
Today, Protocol Labs and the Internet Archive are happy to announce the Filecoin Archives, a new community project to curate, disseminate and preserve important open access information often at risk of being lost. You can get involved in so many ways: by nominating information to be stored, by uploading it to the Internet Archive, or by preserving the data as a Filecoin node while earning Filecoin for sharing your storage capacity.
What information should we be preserving? Please tell us!
How about 166,000 public domain books (60 terabytes) from the Library of Congress? Including 2100 texts about Abraham Lincoln and slavery?
It takes a host of global voices with diverse viewpoints to ensure that humanity’s most precious knowledge is represented online and preserved. So we need to hear from you. What open access information or datasets are you interested in preserving?
Between now and November 5, please send us your ideas and vote on the others. We will gather your suggestions, add our own, and publish the list from which we will select information to preserve across a global network of Filecoin nodes.
How to send us your suggestions
Look for the tweet from @JuanBenet and reply to it with:
The Name of the Dataset.
The size in GB or TB.
An HTTP or @IPFS link to the data.
Why it matters.
Bonus points if the data is already stored in the Internet Archive or if you upload it there. Vote for ideas by retweeting them and please help us spread the word!
In 2015, a young developer named Juan Benet wandered into the Internet Archive headquarters. He painted a picture of a decentralized stack, something he now calls Web3, where the storage, transport and other layers would be distributed across many machines. Together with the DWeb community, we have imagined a web with our values written into the code: values such as privacy, security, reliability, and control over one’s own identity. With the launch of Filecoin’s mainnet, a piece of that new web is perhaps within reach.
Now it’s up to us to make sure the payload includes humanity’s most important knowledge.
Subprime Attention Crisis makes the case that the core advertising model driving Google, Facebook, and many of the most powerful companies on the internet is—at its heart—a multibillion dollar financial bubble. Drawing parallels to the 2008 subprime mortgage crisis, author Tim Hwang shines a spotlight on the lack of transparency, flawed incentives, and outright fraud that keep this machine running.
On October 14, the Internet Archive hosted a talk with the author and New York Times technology reporter Kashmir Hill. Their discussion tackled:
Why data-driven, online advertising may be much, much less effective than it looks
The long-term impact of the COVID-19 recession on the media and online ads
Whether or not the giants of Big Tech are already “too big to fail”
This discussion focused not only on the problems of advertising, but also on the future, and how we might be able to transition to a better, more financially robust internet. Joining the discussion was Desigan Chinniah, who co-leads Grant for the Web—a $100 million fund launched by Coil, Mozilla, and Creative Commons to spur open standards and new economic models for the web beyond advertising.
NOTE: We urge you to purchase a copy of Tim’s new book, Subprime Attention Crisis, via our local bookseller, The Booksmith. The first 50 purchasers will receive an autographed copy.
The following was a guest post by Brewster Kahle in Against The Grain (ATG). See the original article from September 28, 2020 on the ATG website here.
Back in 2006, I was honored to give a keynote at the meeting of the Society of American Archivists, when the president of the Society presented me with a framed blown-up letter “S.” This was an inside joke about the Internet Archive being named in the singular, Archive, rather than the plural Archives. Of course, he was right, as I should have known all along. The Internet Archive had long since grown out of being an “archive of the Internet”—a singular collection, say of web pages—to being “archives on the Internet,” plural. My evolving understanding of these different names might help focus a discussion that has become blurry in our digital times: the difference between the roles of publishers, bookstores, libraries, archives, and museums. These organizations and institutions have evolved with different success criteria, not just because of the shifting physical manifestation of knowledge over time, but because of the different roles each group plays in a functioning society. For the moment, let’s take the concepts of Library and Archive.
The traditional definition of a library is that it is made up of published materials, while an archive is made up of unpublished materials. Archives play an important function that must be maintained—we give frightfully little attention to collections of unpublished works in the digital age. Think of all the drafts of books that have disappeared once we started to write with word processors and kept the files on fragile computer floppies and disks. Think of all the videotapes of lectures that are thrown out or were never recorded in the first place.
Bookstores: The Thrill of the Hunt
Let’s try another approach to understanding distinctions between bookstores, libraries and archives. When I was in my 20s living in Boston—before Amazon.com and before the World Wide Web (but during the early Internet)—new and used bookstores were everywhere. I thought of them as catering to the specialized interests of their customers: small, selective, and only offering books that might sell and be taken away, with enough profit margin to keep the store in business. I loved them. I especially liked the used bookstore owners—they could peer into my soul (and into my wallet!) to find the right book for me. The most enjoyable aspect of the bookstore was the hunt—I arrived with a tiny sheet of paper in my wallet with a list of the books I wanted, would bring it out and ask the used bookstore owners if I might go home with a bargain. I rarely had the money to buy new books for myself, but I would give new books as gifts. While I knew it was okay to stay for a while in the bookstore just reading, I always knew the game.
Libraries: Offering Conversations not Answers
The libraries that I used in Boston—MIT Libraries, Harvard Libraries, the Boston Public Library—were very different. I knew of the private Boston Athenæum but I was not a member, so I could not enter. Libraries for me seemed infinite, but still tailored to individual interests. They had what was needed for you to explore and if they did not have it, the reference librarian would proudly proclaim: “We can get it for you!” I loved interlibrary loans—not so much in practice, because it was slow, but because they gave you a glimpse of a network of institutions sharing what they treasured with anyone curious enough to want to know more. It was a dream straight out of Borges’ imagination (if you have not read Borges’ short stories, they are not to be missed, and they are short. I recommend you write them on the little slip of paper you keep in your wallet.) I couldn’t afford to own many of the books I wanted, so it turned off that acquisitive impulse in me. But the libraries allowed me to read anything, old and new. I found I consumed library books very differently. I rarely even brought a book from the shelf to a table; I would stand, browse, read, learn and search in the aisles. Dipping in here and there. The card catalog got me to the right section and from there I learned as I explored.
Libraries were there to spark my own ideas. The library did not set out to tell a story as a museum would. It was for me to find stories, to create connections, have my own ideas by putting things together. I would come to the library with a question and end up with ideas. Rarely were these facts or statistics—but rather new points of view. Old books, historical newspapers, even the collection of reference books all illustrated points of view that were important to the times and subject matter. I was able to learn from others who may have been far away or long deceased. Libraries presented me with a conversation, not an answer. Good libraries cause conversations in your head with many writers. These writers, those librarians, challenged me to be different, to be better.
Staying for hours in a library was not an annoyance for the librarians—it was the point. Yes, you could check books out of the library, and I would, but mostly I did my work in the library—a few pages here, a few pages there—a stack of books in a carrel with index cards tucked into them and with lots of handwritten notes (uh, no laptops yet).
But libraries were still specialized. To learn about draft resisters during the Vietnam War, I needed access to a law library. MIT did not have a law collection and this was before Lexis/Nexis and Westlaw. I needed to get to the volumes of case law of the United States. Harvard, up the road, had one of the great law libraries, but as an MIT student, I could not get in. My MIT professor lent me his ID that fortunately did not include a photo, so I could sneak in with that. I spent hours in the basement of Harvard’s Law Library reading about the cases of conscientious objectors and others.
But why was this library of law books not available to everyone? It stung me. It did not seem right.
A few years later I would apply to library school at Simmons College to figure out how to build a digital library system that would be closer to the carved words over the Boston Public Library’s door in Copley Square: “Free to All.”
Archives: A Wonderful Place for Singular Obsessions
When I quizzed the archivist at MIT, she explained what she did and how the MIT Archives worked. I loved the idea, but did not spend any time there—it was not organized for the busy undergraduate. The MIT Library was organized for easy access; the MIT Archives included complete collections of papers, notes, ephemera from others, often professors. It struck me that the archives were collections of collections. Each collection faithfully preserved and annotated. I think of them as having advertisements on them, beckoning the researcher who wants to dive into the materials in the archive and the mindset of the collector.
So in this formulation, an archive is a collection, archives are collections of collections. Archivists are presented with collections, usually donations, but sometimes there is some money involved to preserve and catalog another’s life work. Personally, I appreciate almost any evidence of obsession—it can drive toward singular accomplishments. Archives often reveal such singular obsessions. But not all collections are archived, as it is an expensive process.
The cost of archiving collections is changing, especially with digital materials, as is cataloging and searching those collections. But it is still expensive. When the Internet Archive takes on a physical collection, say of records, or old repair manuals, or materials from an art group, we have to weigh the costs and the potential benefits to researchers in the future.
Archives take the long view. One hundred years from now is not an endpoint, it may be the first time a collection really comes back to light.
Could we be smarter by having people, the library, networks, and computers all work together? That is the dream I signed on to.
I dreamed of starting with a collection—an Archive, an Internet Archive. This grew to be a collection of collections: Archives. Then a critical mass of knowledge complete enough to inform citizens worldwide: a Digital Library. A library accessible by anyone connected to the Internet, “Free to All.”
ABOUT THE AUTHOR
Brewster Kahle, Founder & Digital Librarian, Internet Archive
A passionate advocate for public Internet access and a successful entrepreneur, Brewster Kahle has spent his career intent on a singular focus: providing Universal Access to All Knowledge. He is the founder and Digital Librarian of the Internet Archive, one of the largest digital libraries in the world, which serves more than a million patrons each day. Creator of the Wayback Machine and lending millions of digitized books, the Internet Archive works with more than 800 library and university partners to create a free digital library, accessible to all.
Soon after graduating from the Massachusetts Institute of Technology where he studied artificial intelligence, Kahle helped found the company Thinking Machines, a parallel supercomputer maker. He is an Internet pioneer, creating the Internet’s first publishing system called Wide Area Information Server (WAIS). In 1996, Kahle co-founded Alexa Internet, with technology that helps catalog the Web, selling it to Amazon.com in 1999. Elected to the Internet Hall of Fame, Kahle is also a Fellow of the American Academy of Arts and Sciences, a member of the National Academy of Engineering, and holds honorary library doctorates from Simmons College and University of Alberta.
This week, a federal judge issued this scheduling order, laying out the road map that may lead to a jury trial in the copyright lawsuit brought by four of the world’s largest publishers against the Internet Archive. Judge John G. Koeltl has ordered all parties to be ready for trial by November 12, 2021. He set a deadline of December 1, 2020, to notify the court if the parties are willing to enter settlement talks with a magistrate judge.
Attorneys for the Internet Archive have met with representatives for the publishers, but were unable to reach an agreement. “We had hoped to settle this needless lawsuit,” said Brewster Kahle, Internet Archive’s founder and Digital Librarian. “Right now the publishers are diverting attention and resources from where they should be focused: on helping students during this pandemic.”
The scheduling order lays out this timeline:
Discovery must be completed by September 20, 2021;
Dispositive motions must be submitted by October 8, 2021;
Pretrial orders/motions must be submitted by October 29, 2021;
Parties must be ready for trial on 48 hours’ notice by November 12, 2021.
In June, Hachette Book Group, Inc., HarperCollins Publishers LLC, John Wiley & Sons, Inc., and Penguin Random House LLC—with coordination by the Association of American Publishers—filed a lawsuit to stop the Internet Archive from digitizing and lending books to the public, demanding that the non-profit library destroy 1.5 million digital books.
Publishers Weekly Senior Writer Andrew Albanese has been covering the story from the beginning. In a July 31st Beyond the Book podcast for the Copyright Clearance Center, Albanese shared his candid opinions about the lawsuit. “If this was to be a blow out, open-and-shut case for the publishers, what do the publishers and authors get?” Albanese asked. “I’d say nothing.”
“Honestly, a win in court on this issue will not mean more sales for books for publishers. Nor will it protect any authors or publisher from the vagaries of the Internet,” the Publishers Weekly journalist continued. “Here we are in the streaming age, 13 years after the ebook market took off, and we’re having a copyright battle, a court battle over crappy PDFs of mostly out-of-print books? I just don’t think it’s a good look for the industry.”
In order to make the vast majority of 20th Century books accessible to digital learners, libraries such as the Internet Archive have been digitizing the physical books they own and lending them on a 1-to-1 “own to loan” basis—a legal framework called Controlled Digital Lending. Publishers refuse to sell ebooks to libraries, insisting on temporary licenses on restrictive terms. This business practice “threatens the purpose, values, and mission of libraries and archives in the United States,” explains Kyle K. Courtney, copyright advisor to Harvard University Libraries. “It undermines the ability of the public (taxpayers!) to access the materials purchased with their money for their use in public libraries and state institutions, and further, it is short sighted, and not in the best interest of library patrons or the public at large.”
“Libraries have always had the right to buy and lend books. It’s at the core of a library’s mission,” said Kahle. “The Internet Archive would like to purchase ebooks, but the publishers won’t sell them to us, or to any library. Instead they are suing us to stop all learners from accessing the millions of digitized books in our library.”
On July 22, 2020, Kyle K. Courtney, Copyright Advisor at Harvard University, spoke at a press conference about the copyright lawsuit against the Internet Archive brought by the publishers Hachette, HarperCollins, Wiley, and Penguin Random House. He holds a J.D. with distinction in Intellectual Property Law and a Master of Science in Library and Information Science (MSLIS) degree. Courtney is a published author and nationally recognized speaker on the topics of copyright, technology, libraries, and the law. These are his remarks:
Part of my work in scholarship is about the roles of copyright and the library landscape. I wrote the white paper on Controlled Digital Lending of library books with my coauthor David Hansen at Duke University Libraries. And it presents the legal rationale supporting the overview document called the Position Statement on Controlled Digital Lending, which has been endorsed by many national library organizations, regional library consortia, specific library systems themselves, individual librarians, and legal experts. Ultimately, though, this is about how libraries can do what they’ve always done, right? Lend books. The paper looks at the underpinnings of the library’s historical mission through the lens of both fair use and first sale, two critical rights that I think any library uses in their programs, right? Both for lending and preservation. And I discuss how libraries can legally lend digital copies of their print collections using this technology.
But I’d like to point out that a CDL system is not a brand new concept, like Corynne stated: libraries loan books to the public. It’s what they do, for centuries. And libraries do not need permission or a license to loan those books that they have purchased or acquired. Copyright law covers those exact issues. But the difference here, I think, and some of the conflict is that the vendors and publishers have to ask permission, right? They must license. This is their business model. Historically, libraries are special creatures of copyright law; libraries have a legally authorized mandate, by the way, granted by Congress, to complete their mission to provide both access to materials. Congress actually placed all of these specialized copyright exemptions for libraries in the Copyright Act itself. So that’s kind of fun to look at library’s unique role in copyright law, they sit right in the middle, both housing the economic purpose of copyright: “we buy the books, we buy lots of books,” and the access purpose of copyright, which is, “we loan the books out to our users.”
Or if you want to put that in the constitutional narrative: libraries are promoting the progress of science and the useful arts. Libraries have historically provided unfettered access and freedom to the books that they purchase for their communities. Now, because of that, there’s multiple versions of CDL-like systems that are currently used in libraries. But I think the origin of the real legal underpinning concept was first explored by Professor Michelle Wu at Georgetown University School of Law in an article that she wrote that I read many times, “Building a Collaborative Digital Collection.” Later, the Internet Archive formed up the Open Library Program, which Chris talked about, which was nine years ago. And other institutions are exploring this option right in their own individual libraries or part of consortia or within affinity groups.
It’s exciting to see, but at its core, Controlled Digital Lending is about replicating, through the Controlled Digital Lending Process, the legal and economically significant aspects of physical lending. And in other words, let’s put this simply: it continues to preserve the powers in the print. A library has these significant legal usage rights and they have great fiscal value in their collections. Some public library systems have spent millions upon millions of dollars to make their collections accessible to the community. And I believe the CDL structure preserves that value by enhancing access of these works to the public through technology. And as Chris pointed out, it’s the same technology that’s used by publishers to distribute in the commercial marketplace.
Again, this may be about the fear of technology, certainly, but technology should be used to enhance access to materials and do what libraries have always done: increasing access to knowledge by loaning the materials to the public. Just because we’re using technology does not mean that suddenly these acts are new. And in fact, libraries have special authority to provide both long-term access to information and preserve these materials for much longer than the business model of any particular corporation, company, or vendor.
And this is especially true with the 20th Century works that are in libraries, right? They have not been available in the digital world across the board. They call this “The 20th Century Black Hole.” Many 20th Century books are not available for purchase as new copies or in print or digital versions online. And I don’t know if your students are like mine or anyone else or patrons: if it’s not digital, it’s almost like it doesn’t exist. Libraries would like to provide digital access, but we can’t, because these are not available in a licensed format or in a digital format that’s available to loan, but we have them on our shelves.
So as many of our student patrons say, “We want access to these works,” and these could be long lost print works, by the way, that are really not lost; they’re on the shelves of our libraries, just trapped there, and in COVID maybe trapped there for a longer time than anticipated. So imagine the potentially enormously high social and scholarly value and relatively low risk if we make these works available to the public for reading, quoting, citing, adaptation, use in Wikipedia articles. So that’s kind of the exciting aspect of it. I’m not going to get into great detail, but our principal argument in this paper, which summarizes all these points, is that Controlled Digital Lending is a fair use, which is an equitable rule of reason, that permits libraries to do what they’ve always done. And under the First Sale Doctrine, loan those books to users. Thanks a lot for your time.
On July 22, 2020, Chris Freeland, Director of Open Libraries at the Internet Archive, spoke at a press conference about the copyright lawsuit brought by the publishers Hachette, HarperCollins, Wiley, and Penguin Random House against our non-profit digital library. These are his remarks:
I’m Chris Freeland, I’m a librarian at the Internet Archive and I’m the Director of the Open Libraries Program at the Internet Archive. I’ve been at the Internet Archive for more than two and a half years. Before joining the Archive, I was an associate university librarian at Washington University in St. Louis, and then before that I was the Technical Director of a project called the Biodiversity Heritage Library. And so for more than 15 years, I’ve worked in partnership with the Internet Archive to digitize books and make them as widely available as possible through technology and through copyright.
In that same amount of time, that’s when the Internet Archive was partnering with those one thousand libraries that Brewster just mentioned to digitize nearly four million books. So most of those books, when we were partnering with libraries, most of those books were in the public domain, and that means that those were easily published online. They didn’t need restrictions for use. They didn’t need any kind of controls. But at the Internet Archive, we think that everyone deserves to learn. So our goal is to build a research library with more than four million modern books that we can make available to users all over the world.
Now you may be asking why four million? Four million books is the size of a large metropolitan public library. It’s about the same size as a Chicago Public Library or a San Francisco Public Library. And we think that everyone, regardless of where they live, should have equal and equitable access to a comprehensive library. And so to date, we’ve digitized nearly 1.5 million books on the way towards that four million book goal.
So the way that we lend books to our patrons is through Controlled Digital Lending. So Controlled Digital Lending is a legal practice that makes works accessible that are still in copyright. We started working with Controlled Digital Lending with the Boston Public Library, on a pilot that we called at that time “digitize and lend” those books that were in copyright. Now, nine years later, hundreds of other libraries of all sizes in the US and Canada are also participating in Controlled Digital Lending and they’ve embraced the model.
So here’s the way the Controlled Digital Lending works. We only loan as many copies as we and our library partners own, and those checkouts have time limits and the files are protected by the same digital rights management software that publishers use. It’s not a free-for-all. It’s controlled, that’s the “control” in Controlled Digital Lending. So Controlled Digital Lending helps us make information available, which is incredibly important from my perspective as a librarian. It’s a necessary way to increase equity in our education system, and it’s part of the mission of libraries.
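The own-to-loan rule described above can be illustrated with a short Python sketch. This is not the Internet Archive's actual lending system: the class and method names and the 14-day loan period are assumptions made for illustration, and the digital rights management layer is not modeled. The sketch only shows the two controls named in the remarks: never more active loans than owned copies, and a time limit on every checkout.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Title:
    """One book title and the number of physical copies the library owns."""
    owned_copies: int
    # Maps patron id -> time at which that patron's loan expires.
    loans: dict = field(default_factory=dict)


class LendingDesk:
    """Hypothetical own-to-loan ledger (illustrative names, not a real API)."""

    def __init__(self, loan_period=timedelta(days=14)):
        # The 14-day default is an assumed value, not taken from the source.
        self.loan_period = loan_period
        self.titles = {}

    def add_title(self, title_id, owned_copies):
        self.titles[title_id] = Title(owned_copies)

    def _expire(self, title, now):
        # Drop loans whose time limit has passed, which frees the copy.
        title.loans = {p: due for p, due in title.loans.items() if due > now}

    def checkout(self, title_id, patron_id, now):
        """Return True if the loan is granted, False otherwise."""
        title = self.titles[title_id]
        self._expire(title, now)
        if patron_id in title.loans:
            return False  # this patron already holds the title
        if len(title.loans) >= title.owned_copies:
            return False  # every owned copy is already on loan
        title.loans[patron_id] = now + self.loan_period
        return True

    def give_back(self, title_id, patron_id):
        # Early return: the copy becomes available again immediately.
        self.titles[title_id].loans.pop(patron_id, None)


if __name__ == "__main__":
    desk = LendingDesk()
    desk.add_title("book-1", owned_copies=1)
    start = datetime(2020, 7, 22)
    print(desk.checkout("book-1", "alice", start))  # one copy owned, one loan granted
    print(desk.checkout("book-1", "bob", start))    # refused: the only copy is out
    print(desk.checkout("book-1", "bob", start + timedelta(days=15)))  # alice's loan expired
```

With one owned copy, the second patron is refused until the first loan expires or is returned, which is the one-to-one ratio that makes the lending "controlled."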
In addition to digitizing, we’re also helping libraries and institutions preserve their collections and to keep them safe and accessible. So let me give you a little example, a story from last year. Marygrove College closed last year, and a central concern for the school’s president, Dr. Elizabeth Burns, and for the Board of Trustees is what do we do with the library? Those 70,000 volumes that are in that library that were in the school that was closing. So after hearing about the Internet Archive’s Controlled Digital Lending program, the college decided to donate the entire library to us for digitization and for preservation, so that the legacy of the college would live on. And so that those books would be available for future scholars.
So in closing my little portion here, I want to leave you with an impact story of why Controlled Digital Lending matters. So we’ve received hundreds of testimonials and published two blog posts that are full of statements from users who have used our lending library while their own libraries and their schools were closed. And one such statement really helped underscore that impact of CDL. And it comes to us from Benjamin Saracco, who is a librarian at a medical center in New Jersey. And Benjamin wrote to us, and to let us know that he was able to find basic life support manuals that were needed by the frontline workers at the medical center where he worked. He needed those and he had to use our library because his physical collection was closed due to COVID-19. It may sound impossible to think, but it’s true. Lives were saved because of Controlled Digital Lending. That is impactful.