Internet Archive Submits Comments on Copyright and Artificial Intelligence

On Monday the Internet Archive joined thousands of others in submitting comments to the US Copyright Office as part of its study on Copyright and Artificial Intelligence.

Our high-level view is that copyright law has been adapting to disruptive technologies since its earliest days, and our existing copyright law is adequate to meet the disruptions of today. In particular, copyright’s flexible fair use provision deals well with the fact-specific nature of new technologies and has already addressed earlier innovations in machine learning and text-and-data mining. So while Generative AI presents a host of policy challenges that may prompt different kinds of legislative reform, we do not see a need for new copyright laws to respond to Generative AI today.

Our comments are guided by three core principles.


First, regulation of Artificial Intelligence should be considered holistically–not solely through the isolated lens of copyright law. As explained in the Library Copyright Alliance Principles for Artificial Intelligence and Copyright, “AI has the potential to disrupt many professions, not just individual creators. The response to this disruption (e.g., support for worker retraining through institutions such as community colleges and public libraries) should be developed on an economy-wide basis, and copyright law should not be treated as a means for addressing these broader societal challenges.” Going down a typical copyright path of creating new rights and licensing markets could, for AI, serve to worsen social problems like inequality, surveillance and monopolistic behavior of Big Tech and Big Media.

Second, any new copyright regulation of AI should not negatively impact the public’s right and ability to access information, knowledge, and culture. A primary purpose of copyright is to expand access to knowledge. See Authors Guild v. Google, 804 F.3d 202, 212 (2d Cir. 2015) (“Thus, while authors are undoubtedly important intended beneficiaries of copyright, the ultimate, primary intended beneficiary is the public, whose access to knowledge copyright seeks to advance . . . .”). Proposals to amend the Copyright Act to address AI should be evaluated by the impact such new regulations would have on the public’s access to information, knowledge, and culture. In cases where proposals would have the effect of reducing public access, they should be rejected or balanced out with appropriate exceptions and limitations.

Third, universities, libraries, and other publicly oriented institutions must be able to continue to ensure the public’s access to high-quality, verifiable sources of news, scientific research, and other information essential to their participation in our democratic society. Strong libraries and educational institutions can help mitigate some of the challenges to our information ecosystem, including those posed by AI. Libraries should be empowered to provide access to educational resources of all sorts, including the powerful Generative AI tools now being developed.

Read our full comments here.

12 thoughts on “Internet Archive Submits Comments on Copyright and Artificial Intelligence”

  1. Tyler

    I am a frequent monetary contributor to the Archive, and I am confused about some of the positions being put forward by you regarding this issue. A few examples:
    ‘Going down a typical copyright path of creating new rights and licensing markets could, for AI, serve to worsen social problems like inequality, surveillance and monopolistic behavior of Big Tech and Big Media.’
    Generative AI is itself a product wholly derived from surveillance and monopolistic Big Tech behavior – it is built on wholesale scraping and industrial storage of human created data by either the biggest corporations on earth or startups directly funded by them. Saying that requiring them to provide some kind of trail that can be readily checked by a third party would somehow play into their hands seems disingenuous in the extreme.
    ‘Copyright rules regarding transparency must be carefully crafted so as not to discourage AI companies from disclosing data provenance and citing trustworthy sources.’
    This seems suspiciously close to ‘if you don’t allow them to go ahead and do it, how can you expect them to be honest with you about how?’, which is a well-established form of regulatory capture.
    ‘For our part, we agree with the Library Copyright Alliance Principles for Copyright and Artificial Intelligence that data collection and nonconsumptive uses for the purposes of training AI models will generally be a fair use under US Copyright Law.’
    First, what does ‘nonconsumptive’ mean in this context, given that modern ML generation is based on the technical loophole of recording every computable aspect of a given piece of data, just without direct copying into a readable format? And regarding your broader citing of fair use: up until now, fair use has been oriented around a person or persons quoting, parodying, or collaging specific work from another person or set of persons. This differs massively from an indiscriminate collection of as many works as possible for the purpose of theoretically infinite variations and amalgamations of said works. Regardless of whether one generated work borrows particularly from a specific human work, it is still inherently a process of algorithmic laundering.
    I am not inherently against ML or its usage, but using it to build gigantic models designed to mulch and mash the internet at large seems to be a monstrous idea, and can only accelerate the process of dehumanization and disempowerment that you claim to be fighting against. I’m not sure if copyright law is the best way to fight against it, and I do not want other areas to be caught in the crossfire, but your suggestion that we simply trust the owners of these models to regulate themselves is outrageous.

  2. Tom

    So… you are saying that machines write the books. Stop pussyfooting around and just say it. And, how would you know? What kind of conspiracy is this? Also, ‘A.I.’ doesn’t exist. It is a checkdown system written by people. Big problem with the whole thing. Do you want millionaire novelists or do you want their careers stolen from them?

  3. Tom

    The Archive seems to have a student-leaning bias where ‘access’ is the point; it is not. Its nonexistence is essential to freedom. This is a moment for the United States Congress to drag these faceless people before the American public and condemn them.

  4. Tyler

    Why was my long dissenting comment, which thoroughly explained my objections to your comments, not approved by moderation, yet the above was?

  5. John Myers

    Tom,
    If you would like others to understand your point, you’ll need to calmly explain the problem. Then state your thoughts on the matter. All I see is a strange rant that makes no sense to me.

    J.

  6. williems

    their participation in our democratic society. Strong libraries and educational institutions can help mitigate some of the challenges to our information ecosystem, including those posed by AI. Libraries should be empowered to provide access to educational resources of all sorts– including the powerful Generative AI tools now being developed.

  7. Mandy

    So you agree to their jobs being stolen? Basically, you allow big corporations to steal other people’s jobs and creations just to make even more money for themselves. You stop other people from wanting to learn how to create, write, draw, or even code.

  8. Mariya

    Greetings and respect. I am looking for information about the Iranian actor and composer Mr. Javad Nazari. He is 41 years old and lives in Tehran. Unfortunately, this is all the information I have about him. Can I find more information about him here or not? Thank you for your help.

Comments are closed.