Author Archives: Peter M. Routhier

Internet Archive weighs in on Artificial Intelligence at the Copyright Office

All too often, the formulation of copyright policy in the United States has been dominated by incumbent copyright industries. As Professor Jessica Litman explained in a recent Internet Archive book talk, copyright laws in the 20th century were largely “worked out by the industries that were the beneficiaries of copyright” to favor their economic interests. In these circumstances, Professor Litman has written, the Copyright Office “plays a crucial role in managing the multilateral negotiations and interpreting their results to Congress.” And at various times in history, the Office has had the opportunity to use this role to add balance to the policymaking process.

We at the Internet Archive are always pleased to see the Copyright Office invite a broad range of voices to discussions of copyright policy and to participate in such discussions ourselves. We did just that earlier this month, participating in a session at the United States Copyright Office on Copyright and Artificial Intelligence. This was the first in a series of sessions the Office will be hosting throughout the first half of 2023, as it works through its “initiative to examine the copyright law and policy issues raised by artificial intelligence (AI) technology.”

As we explained at the event, innovative machine learning and artificial intelligence technology is already helping us build our library. For example, our process for digitizing texts–including never-before-digitized government documents–has been significantly improved by the introduction of LSTM technology. And state-of-the-art AI tools have helped us improve our collection of 100-year-old 78 rpm records. Policymakers dazzled by the latest developments in consumer-facing AI should not forget that there are other uses of this general-purpose technology–many of them outside the commercial context of traditional copyright industries–which nevertheless serve the purpose of copyright: “to increase and not to impede the harvest of knowledge.”

Traditional copyright policymaking also frequently excludes or overlooks the world of open licensing. But in this new space, many of the tools come from the open source community, and much of the data comes from openly licensed sources like Wikipedia or Flickr Commons. Industry groups that claim to represent the voice of authors typically do not represent such creators, and their proposed solutions–usually, demands that payment be made to corporate publishers or to collective rights management organizations–often don’t benefit the open world and are inconsistent with its thinking.

Moreover, even aside from openly licensed material, there are vast troves of technically copyrighted but not actively rights-managed content on the open web; these are also used to train AI models. Millions, if not billions, of individuals have contributed to these data sources, and because none of them are required to register their work for copyright to arise, it does not seem possible or sensible to try to identify all of the relevant copyright owners–let alone negotiate with each of them–before development can continue. Recognizing these and a variety of other concerns, the European Union has already codified copyright exceptions which permit the use of copyright-protected material as training data for generative AI models, subject to an opt-out in commercial situations and potential new transparency obligations.

To be sure, there are legitimate concerns over how generative AI could impact creative workers and cause other kinds of harm. But it is important for copyright policymakers to recognize that artificial intelligence technology has the potential to promote the progress of science and the useful arts on a tremendous scale. It is both sensible and lawful as a matter of US copyright law to let the robots read. Let’s make sure that the process described by Professor Litman does not get in the way of building AI tools that work for everyone.

Celebrating Library Fair Use

It’s Fair Use Week in the United States, and here at the Internet Archive, we join all those in the library community and beyond who celebrate the role fair use plays in enabling access to knowledge.

Fair use is a tremendously important part of US copyright law. It does everything from effectuating First Amendment rights to fostering innovation, and the Supreme Court has described its “basic purpose” as “providing a context-based check that can help to keep a copyright monopoly within its lawful bounds.” And over the years, fair use has rightly evolved to ensure it can continue to play its role in changing times.

But fair use is not without its challenges. Professor Larry Lessig once famously quipped that “fair use in America simply means the right to hire a lawyer.” And given the associated expense, this can have an asymmetric effect on fair use. The book publishing industry, for example, is dominated by multi-billion-dollar firms; their economic power is so concentrated that a federal court recently enjoined a further attempt at consolidation. Meanwhile, although libraries collectively represent a substantial portion of book purchasers, their economic power is dispersed among many thousands of public, research, academic, and other institutions. Thus, while some may be willing to “roll the dice” on fair use, the costs and risks lead many to underuse this important user’s right.

As the defendant in a years-long fair use case of our own—recently scheduled for oral argument on March 20th—we are all too familiar with this aspect of the law. As our case demonstrates, the economic challenges of fair use are not only about legal fees; economics are embedded in the doctrine. In some ways, this is a good thing. For example, as we explained in our brief, fair use has always been concerned with protecting non-profit and educational uses of copyrighted works. When considering whether a particular use is fair, the first question is ordinarily whether it’s “noncommercial.” With respect to our own book collections, this is straightforward: the books are lawfully acquired, digitized at our own expense, and lent to one reader at a time—without any cost to them—for personal, research, or scholarly use. And while noncommerciality does, of course, have to be balanced against certain economic interests of the publishers as part of the fair use analysis, that’s precisely what the owned-to-loan ratio and other strictures of controlled digital lending work to do. 

For all its challenges, fair use continues to provide important rights and safeguards to libraries. Among other things, it allows libraries to utilize new technologies and respond to new challenges without waiting for the legislature to pass new laws. In fact, this is exactly what the legislature intended it to do. Fair use means libraries can develop innovative services like our work to support Wikipedia citations, respond to new challenges like the COVID lockdowns, and otherwise continue to serve patrons as the world evolves. And it means libraries, like the Internet Archive and many others, can lend their books to one reader at a time, as they have always done.

Internet Archive Joins Library Groups at the Supreme Court

On February 23, 2023, the United States Supreme Court will hear oral arguments in Gonzalez v. Google. The case is, in a narrow sense, about whether certain algorithmic recommendations of a very large online platform can give rise to civil liability. But the Court’s ruling could fundamentally “reshape the internet,” redefining the circumstances in which a wide variety of websites and online services–including libraries–could be liable for the actions of their users. Internet Archive was proud to join the American Library Association, the Association of Research Libraries, the Freedom to Read Foundation, and the Electronic Frontier Foundation in a “friend of the Court” brief urging robust Section 230 protections for libraries and others.

As the Association of Research Libraries has previously noted, “libraries are included in” the protections of Section 230, and “[a]ny changes to the liability protections of 230 may endanger the ability of libraries to fulfill their public service missions.” Following from this, our brief highlights a number of important library projects and services “designed to share and build knowledge” which are currently protected by Section 230; these could be threatened by sweeping changes to the law. In fact, providing a space for the maintenance and development of these kinds of projects is exactly what the framers of Section 230 set out to do: it was enacted “to promote the continued development of the Internet” and so that it could continue to provide “a forum for a true diversity of political discourse, unique opportunities for cultural development, and myriad avenues for intellectual activity.”

As the brief explains, substantial changes to Section 230 could frustrate these purposes by making it harder for libraries to use the internet to broaden and deepen the public’s access to knowledge (among other things). And while it is impossible to know exactly how Section 230 might be changed by the Court, and how those changes could impact the behavior of libraries and others who rely on Section 230 today, the brief highlights a number of concerning scenarios that we hope the Court will consider. These kinds of concerns have been raised by many, including Professor Eric Goldman, who has explained how changes to the law occasioned by this case could make it too costly or burdensome for many online services to operate the way they do today; this could result in an internet dominated by “a small number of voices” promoted by the largest corporations and hidden behind paywalls. 

At the Internet Archive, despite the challenges, we continue to believe in the power of the internet to democratize and expand access to knowledge. As EFF said when it filed the brief, “[a]s the internet has grown, its problems have grown, too,” but we can “address those problems without weakening a law” that has provided meaningful protection to everyone, including libraries. As courts and legislatures consider changes to this existing legal structure, we hope they keep in mind the public’s interest, so we can work towards an internet that preserves public interest spaces and is shaped by public interest values.

The CDL Lawsuit and the Future of Libraries

It’s been over two years since a group of large book publishers sued the Internet Archive over our lending programs. After an expensive and lengthy discovery phase, arguments have now been fully briefed in the district court. What might we learn from the proceedings so far about how publishers see the future of libraries?

The first thing we might learn is that the publishers want controlled digital lending declared illegal. At the time the lawsuit against us was filed, much of the commentary and analysis suggested that the case was really about the National Emergency Library–our emergency pandemic lending program. But while the NEL is certainly a part of the lawsuit, it did not take center stage in the briefing. In the publishers’ request for summary judgment, for example, only a few short paragraphs–out of about forty pages of argument–were devoted to the NEL. Of all the submissions, about 99% have concerned CDL. So it seems clear that the publishers view this lawsuit as a referendum on CDL, which they claim will cause “catastrophic harm” to the publishing industry.

A second thing this lawsuit has demonstrated is that publishers will continue to sue libraries over digital practices that were long considered fair uses in the physical world–even if they are done on a non-profit basis with no measurable economic harm. In the case against us, the publishers argue that digital lending harms markets they claim to own–and that it therefore is not a fair use under copyright law–under “the common sense economic principle that users are drawn to free goods as a substitute for paid goods.” Put another way, in the digital realm, every non-fee-paying library practice harms the publishers’ economic interests as a matter of principle–regardless of libraries’ historic practices and their previously-accepted roles, let alone what tangible economic evidence shows. In the digital world, where publishers have newfound abilities to surveil and control libraries and their patrons, the publishers argue that the economic opportunities these abilities open to them trump longstanding library practices and the public interest. Thus, they sued over digital course reserves, and are now suing over digital lending, notwithstanding a “thriving” and profitable industry. What library practice will they challenge next?

For many of us, the internet promised a world where libraries and their patrons would have more and better access to high quality information. For these publishers, it’s simply an opportunity to charge more while providing less. In the CDL lawsuit, they have admitted that of the millions of books we have digitized, they themselves have only made about 33,000 available to libraries: only about 1% of what we have done, and only under restrictive and expensive license agreements. This is, they claim, the essence of their copyright rights: the ability to restrict access to information as they see fit, to further their theoretical economic interests, without regard to libraries’ traditional functions and the greater public good.

The good news is that many in the library community and beyond–including authors, small publishers, and patrons themselves–are seeing with clear eyes what is truly at stake. And they are seeing that, unfortunately, libraries and their supporters cannot just sit idly by–they will have to fight back. Indeed, that work has long since begun. In an extraordinary show of support–and recognition of what’s at stake–groups of librarians, scholars, and many others submitted friend of the court briefs in the publishers’ lawsuit against us. In these briefs, they demonstrated (among other things) the importance of libraries in the digital world. As the brief of Kenneth Crews, Kevin Smith, and the Harvard Law School Cyberlaw Clinic explained:

“To remain relevant and to continue to democratize information access, libraries must meet patrons where they are; in the present day, that means the Internet. Libraries have nurtured our democracy from its inception and have changed alongside our society–evolving from private subscription models serving only the elite to free institutions that enrich citizens without regard to race, creed, gender, or socioeconomic status. As a cornerstone of democracies, libraries will always be the site of cultural struggle and ‘a crucible for a society that is constantly moving toward a more perfect union.’”

What Does the Blockbuster Antitrust Trial Against Penguin Random House Mean for the Future of Libraries?

The publishing industry is large and powerful–by some accounts, it generates nearly $100 billion in revenue worldwide. The United States Department of Justice has accused big publishers of abusing that power in the past, by conspiring with each other to raise the price of e-books. More recently, Penguin Random House has been in the legal crosshairs for an alleged abuse of power, as the Justice Department sues to stop its proposed (and allegedly anticompetitive) acquisition of Simon & Schuster.

What would more concentrated power in the publishing industry mean for libraries? In recent years, publishers have blamed libraries for all manner of ills–claiming that they unfairly cannibalize sales, among other things–to justify the imposition of increasingly expensive licensing models. But as testimony in the Justice Department lawsuit has confirmed, the publishing industry isn’t the least bit ill: it’s “thriving,” with years of double-digit growth. And although the economics of the publishing industry was examined at trial in excruciating detail, the supposed threat of library lending was nowhere to be found; libraries weren’t mentioned at all.

What of the authors? The publishing industry often claims that its actions are necessary for the good of authors, but this case does not support such a claim. The Authors Guild has publicly opposed the merger, expressing its own concern about the extraordinary concentration of power in the publishing industry and how it could harm emerging and mid-list authors. Meanwhile, at trial, we learned that the vast majority of all published books are of this sort, selling very few copies. Of course, libraries are one of the few markets for such titles: buying them, preserving them, and ensuring they remain publicly available after their commercial life is over. Unfortunately, as the trial made abundantly clear–featuring, as it did, the CEO of Penguin Random House bragging about cutting author compensation for e-books–such matters are not high on the publisher’s priority list.

So what does this portend for the future of libraries? While the outcome of the trial remains unclear, the Association of American Publishers’ view of libraries could not be clearer: “Libraries are an important part of the copyright ecosystem as authorized distributors,” they recently said. That is the world the AAP hopes for: one where our public interest institutions, and our library professionals, are little more than “authorized distributors” of whatever is most profitable for the publishers. It should be no surprise, then, that libraries remain deeply concerned that the future envisioned by these publishers is in nobody’s interest but their own.

Internet Archive Opposes Publishers in Federal Lawsuit

On Friday, September 2, we filed a brief in opposition to the four publishers that sued Internet Archive in June 2020: Hachette Book Group, HarperCollins Publishers, John Wiley & Sons, and Penguin Random House. This is the second of three briefs from us that will help the Court decide the case.

Read: Hachette v. Internet Archive – Internet Archive’s Opposition to Motion for Summary Judgment

As many of you know, these four publishers sued the Internet Archive to try to shut down our digital lending program. The lawsuit has been ongoing for over two years now. In addition to the papers that have gone in so far, there will be one more opportunity, later this fall, for the parties to file arguments with the court. These will be the “reply” briefs. At that point, the filing of papers tends to cease. The Court will then decide whether or not it wants to hear from the parties in person–through “oral argument.” After that, the Court will make a decision on this set of briefs. That could resolve the case in its entirety, or it could lead to a trial and/or appeal. In the end, the lawsuit could take some years to resolve.

Our opposition brief responds to the arguments raised in the publishers’ motion for summary judgment. There, some of the world’s largest and most profitable publishers complained that sometimes “Americans who read an ebook use free library copies, rather than purchasing a commercial ebook.” They believe that copyright law gives them the right to control how libraries lend the books they own, and demand that libraries implement the restrictive terms and conditions that publishers prefer.

Our opposition brief explains that “[p]ublishers do not have a right to limit libraries only to inefficient lending methods, in hopes that those inefficiencies will lead frustrated library patrons to buy their own copies.” The record in this case shows that publishers have suffered no economic harm as a result of our controlled digital lending–indeed, publishers have earned record profits in recent years. “[D]igital lending of physical books costs rightsholders no more or less than, for example, lending books via a bookmobile or interlibrary loan. In each case, the books the library lends are bought and paid for, ensuring that rightsholders receive all of the financial benefits to which they are entitled.”

The future of library lending is at stake in this lawsuit. We will keep fighting to prove that copyright does not stand in the way of a library’s right to do what libraries have always done: lend the books it owns to one patron at a time.

Canada is Leading the Way on User-Centered Copyright Policy

In an important new copyright decision, the Supreme Court of Canada reaffirmed its commitment to the principles of users’ rights and technological neutrality–principles which have made Canada a world leader in balanced copyright and support for controlled digital lending (CDL) by libraries.

For many years now, the Supreme Court of Canada has emphasized the importance of these two principles in striking the proper copyright balance. With respect to users’ rights, the Supreme Court has held that exceptions and limitations to copyright are not mere loopholes–they are affirmative users’ rights. This means that copyright is not about maximizing the economic interests of publishers or anyone else, but instead about advancing the public good by seeking “the proper balance between the rights of a copyright owner and users’ interests.” With respect to technological neutrality, the Supreme Court has held that the Copyright Act must be interpreted in view of the principle of technological neutrality, according to which “[w]hat matters is what the user receives, not how the user receives it.” This means that, in general, the courts should “interpret the Copyright Act in a way that avoids imposing an additional layer of protections and fees based solely on the method of delivery of the work to the end user.” These principles have been particularly important for Canadian libraries and their patrons, supporting CDL and other important library practices there.

In many ways, these principles seem like good old-fashioned common sense. But publishers and others have long claimed that users’ rights and technological neutrality “pose[] a direct threat” to their economic interests. In the new case, SOCAN v. ESA, these arguments were once again brought before the Supreme Court of Canada–and once again rejected.

As Professor Michael Geist has noted, the case:

provides a further entrenchment of Canadian copyright jurisprudence that holds users’ rights and the copyright balance as foundational elements of the law. . . . the court’s support for these principles is not obiter, rhetoric, or likely to change. Indeed, copyright lobby groups have spent much of the past two decades in denial, convinced that somehow the growing body of Supreme Court copyright cases will be reversed the next time the court confronts the issue. That has now led to multiple defeats at Canada’s highest court by copyright collectives such as Access Copyright and SOCAN. In each case, the core copyright principles have remained unchanged. Indeed, if anything, they have become more solidified as precedent builds upon precedent. Given these outcomes and last week’s SOCAN v. ESA decision, it is long past time for these groups to engage in copyright policy based on the realities of balance, users’ rights, and technological neutrality.

These principles–and a balanced approach overall–allow libraries in Canada to continue to fulfill their mission in the digital age, and allow ordinary citizens access to quality information, all while supporting a thriving creative industry at home and abroad.

Save our Safe Harbor, continued: Internet Archive Supports Libraries and Nonprofits in Submission to the Copyright Office

As many of our readers will know, Section 512 of the Digital Millennium Copyright Act is the 1998 law that established the notice-and-takedown system that protects online platforms of all kinds (including libraries, archives, and other nonprofits) from liability for the copyright infringement of others. While the law is not perfect, the safe harbor provided by the DMCA has been important in allowing libraries, nonprofits, and other smaller participants to harness the power of the internet and play a meaningful role in the online information ecosystem. More broadly, as our friends at the Wikimedia Foundation have noted, “Section 512 is crucial to the functioning of many of the most popular and important segments of the Internet, and the creative expression that happens there.”

Unfortunately, Section 512 has been under attack for some time. In addition to various legislative proposals, the United States Copyright Office has repeatedly been asked to conduct work on Section 512 that could threaten the safe harbor status of libraries and nonprofits and the communities of their patrons and users. In 2016, for instance, Internet Archive submitted comments to the Copyright Office’s first large Section 512 study, as outlined in a blog post entitled “Save our Safe Harbor”; there, we noted the special importance of the DMCA to “libraries and other nonprofit organizations” which rely in substantial part on volunteer communities and which “are unlikely to be able to bring to bear the sorts of resources [available to] larger commercial entities.” Then again in 2020, as the Copyright Office kept working towards Section 512 reform, the Internet Archive (in collaboration with the New York University Technology Law & Policy Clinic) urged the Copyright Office to consider how changes to the DMCA could have “disproportionately negative impacts on public service non-profits such as the Internet Archive and our patrons.”

This year, the Copyright Office is continuing with ever more work streams on DMCA reform. And while the conversation remains dominated by the commercial interests of some of the world’s largest corporations, Internet Archive has again submitted comments seeking to correct this imbalance. Most recently, in a May 27, 2022 comment on the Copyright Office’s study of Section 512(i) Standard Technical Measures, we emphasized that—notwithstanding industry attempts to use Section 512(i) to impose burdensome technical mandates which could threaten all but the largest commercial intermediaries—nothing in the law “admits of a standard technical measure which would impose substantial burdens and costs on libraries [and] non-profits.”

The DMCA Safe Harbors, while imperfect, have been essential to the ability of libraries, nonprofits, and others to develop public-interest-minded spaces online. And while much has changed since the DMCA’s enactment, it is as important as ever that our legal and regulatory systems allow library and other public interest spaces to flourish online.

Internet Archive Joins Opposition to the “SMART Copyright Act”

In the past few weeks, governments around the world have renewed their efforts to restrain free expression online. In Canada, a revised “Online Streaming Act” comes as the latest in a long-running attempt to bring streaming under a restrictive regulatory regime. In the UK, a new “Online Safety Bill” seeks to censor “legal but harmful content” in a way that would threaten open digital spaces. And in the USA, content filtering is once again being floated as the answer to online copyright infringement, this time via the “SMART Copyright Act of 2022”.

If the SMART Copyright Act were to pass, the Copyright Office would select a “technical measure” every three years that online service providers would be required to implement. The intent, as supporters have made clear, is for the Copyright Office to mandate technical measures that would automatically “filter out” allegedly infringing material. Lobbyists and lawyers for the owners of these technologies would be allowed to petition the Copyright Office to require the adoption of their own products. Whatever technology is adopted would then have to be purchased and implemented by anyone swept up by the law—from big tech platforms to your local research library. Failure to do so could be punishable by millions of dollars in civil penalties, among other things. As Professor Eric Goldman has written:

The SMART Copyright Act is a thinly veiled proxy war over mandatory filtering of copyrighted works. . . mandatory filters are error-prone in ways that hurt consumers, and they raise entry barriers in ways that reduce competition.

More generally, the SMART Copyright Act would give the Copyright Office a truly extraordinary power–the ability to force thousands of businesses to adopt, at their expense, technology they don’t want and may not need, and the mandated technologies could reshape how the Internet works.

Wouldn’t It Be Great if Internet Services Had to License Technologies Selected by Hollywood? (Comments on the Very Dumb “SMART Copyright Act”), from Eric Goldman’s Technology & Marketing Law Blog on March 23, 2022

This bill and its supporters do not represent the public’s interest in fair copyright policy and a robust and accessible public domain. That is a shame, because much good could be done if policymakers would put the public’s interest first. For example, the Copyright Office—which holds records of every copyright ever registered, including all those works which have passed into the public domain—could help catalogue the public domain and prevent it from being swept up by today’s already-overzealous automated filtering technologies (an idea inspired by this white paper from Paul Keller and Felix Reda). Instead, the public domain continues to be treated as acceptable collateral damage in the quest to impose ever-greater restrictions on free expression online.

It is no surprise that the Electronic Frontier Foundation, Public Knowledge, the Library Copyright Alliance, and many others have voiced criticism of this harmful bill. Today, Internet Archive joins these and other signatories in a joint letter to the bill’s cosponsors, Senators Tillis and Leahy, expressing opposition and concern. You can read the letter here.

Link Taxes: A Bad Idea for Journalism and the Open Internet

For many years, some of the world’s largest news publishers have been seeking ways to expand their power online. In Australia, they were able to do so through an unusual form of mandatory arbitration. But underneath these kinds of proposals, whether based in arbitration or otherwise, is a claim to a new sort of copyright right. Often styled as an “ancillary” copyright, such a right could—as described in a recent Copyright Office document—require payment to news publishers from any “online service that collects links to and sometimes snippets of third-party articles and makes them available to its readers.” In other words, this new right would allow big news publishers—and only news publishers—to extract fees from webpages that include links. Unsurprisingly, many have described this as a link tax. And it is now under study at the United States Copyright Office.

We believe link taxes are a bad idea. At a basic level, they are inconsistent with a free and open internet, which relies on the ability of any website to freely link to any other website. But we also do not see that they would achieve the stated goal of protecting journalists against unfair competition. Link tax payments wouldn’t actually go to journalists—instead, under current proposals, they’d go directly to publishers like Rupert Murdoch’s News Corp. This would only make it harder for small, independent, and innovative journalistic upstarts to compete; big companies like News Corp would get this new payment, while small independent journalists would not. Indeed, for these and a variety of other reasons, many have questioned not only whether such proposals support the public interest, but whether they are even consistent with the US Constitution. Supporting quality local journalism is something we can all stand behind, but imposing a link tax on the open web is not the way to do it.

As we have often mentioned, even well-intentioned changes to copyright law can have wide-reaching and negative effects on the online information ecosystem. That is why Internet Archive was proud to voice these concerns in a recent submission to the US Copyright Office and at a public roundtable on December 9, 2021.