Internet Archive weighs in on Artificial Intelligence at the Copyright Office

All too often, the formulation of copyright policy in the United States has been dominated by incumbent copyright industries. As Professor Jessica Litman explained in a recent Internet Archive book talk, copyright laws in the 20th century were largely “worked out by the industries that were the beneficiaries of copyright” to favor their economic interests. In these circumstances, Professor Litman has written, the Copyright Office “plays a crucial role in managing the multilateral negotiations and interpreting their results to Congress.” And at various times in history, the Office has had the opportunity to use this role to add balance to the policymaking process.

We at the Internet Archive are always pleased to see the Copyright Office invite a broad range of voices to discussions of copyright policy and to participate in such discussions ourselves. We did just that earlier this month, participating in a session at the United States Copyright Office on Copyright and Artificial Intelligence. This was the first in a series of sessions the Office will be hosting throughout the first half of 2023, as it works through its “initiative to examine the copyright law and policy issues raised by artificial intelligence (AI) technology.”

As we explained at the event, innovative machine learning and artificial intelligence technology is already helping us build our library. For example, our process for digitizing texts–including never-before-digitized government documents–has been significantly improved by the introduction of LSTM technology. And state-of-the-art AI tools have helped us improve our collection of 100-year-old 78 rpm records. Policymakers dazzled by the latest developments in consumer-facing AI should not forget that there are other uses of this general-purpose technology–many of them outside the commercial context of traditional copyright industries–which nevertheless serve the purpose of copyright: “to increase and not to impede the harvest of knowledge.”

Traditional copyright policymaking also frequently excludes or overlooks the world of open licensing. But in this new space, many of the tools come from the open source community, and much of the data comes from openly licensed sources like Wikipedia or Flickr Commons. Industry groups that claim to represent the voice of authors typically do not represent such creators, and their proposed solutions–usually, demands that payment be made to corporate publishers or to collective rights management organizations–often don’t benefit those creators and are inconsistent with the thinking of the open world.

Moreover, even aside from openly licensed material, there are vast troves of technically copyrighted but not actively rights-managed content on the open web; these are also used to train AI models. Millions, if not billions, of individuals have contributed to these data sources, and because none of them are required to register their work for copyright to arise, it does not seem possible or sensible to try to identify all of the relevant copyright owners–let alone negotiate with each of them–before development can continue. Recognizing these and a variety of other concerns, the European Union has already codified copyright exceptions that permit the use of copyright-protected material as training data for generative AI models, subject to an opt-out in commercial situations and potential new transparency obligations.

To be sure, there are legitimate concerns over how generative AI could impact creative workers and cause other kinds of harm. But it is important for copyright policymakers to recognize that artificial intelligence technology has the potential to promote the progress of science and the useful arts on a tremendous scale. It is both sensible and lawful as a matter of US copyright law to let the robots read. Let’s make sure that the process described by Professor Litman does not get in the way of building AI tools that work for everyone.

4 thoughts on “Internet Archive weighs in on Artificial Intelligence at the Copyright Office”

  1. Chris Lowe

    I believe we are getting closer to a Skynet-born future every year. I hope the government doesn’t invest in AI.

  2. Eduarda Sahlit

    I must say that I am extremely thankful to the Internet Archive, because I would never have found a great deal of the information I have today about all the things I wanted to research, including my family. For instance, the only book I found that was written by my great-grandfather, Reverend Antonio Neves de Mesquita, and that was actually published respectfully on the internet, was here. I say respectfully because everywhere else I found anything about him, it had visibly been published online to give people the opportunity to copy the text. The websites had intrusive ads and viruses, and the downloads screamed piracy. I also found many books – MANY – about the history of my family that I had no idea I would ever find anywhere else.
    It is a library, no doubt.
    The Internet Archive also gave me the chance to upload material I had written (even though it was only ideas for projects to be implemented for children in poor neighborhoods), when I was worried people would take my ideas to the local government and say they were theirs! When I told them I had done so, the look on their faces gave me the peace of mind I needed. Like I said, I found refuge for my little, insignificant ideas here. In NO OTHER PLACE was I taken into account in my entire life: I was never posted or published by any university, group, publisher with commercial interests, etc.
    Moreover, Facebook and other social networks (LinkedIn, for example, is a headache!) allowed people to simply erase or modify what I had written! But not here.
    Today, you are firing at the Archive for the exact same reason you are having to deal with all the “giants”: they became too big while others were busy doing something else they considered more interesting.
    Another thing to consider is the following: a book (or its ideas) can be copied even if I just have the book in my hands, read it, and then decide to sit in front of the computer and type the text as if it were my own, right?
    It is best if all the borrowing, and interacting, and reading, etc, is done openly, where everyone can watch and participate, including authors!

  3. John Fraser

    Do not discount mass-generated disinformation, the complete downfall of photographic evidence as a reliable measuring rod, the ability to turn anyone’s face and voice into a ‘product’ for your very own use, and the further splintering of humanity into tiny echo chambers who only listen to the voices most comfortable to them.
