
Internet Archive weighs in on Artificial Intelligence at the Copyright Office

All too often, the formulation of copyright policy in the United States has been dominated by incumbent copyright industries. As Professor Jessica Litman explained in a recent Internet Archive book talk, copyright laws in the 20th century were largely “worked out by the industries that were the beneficiaries of copyright” to favor their economic interests. In these circumstances, Professor Litman has written, the Copyright Office “plays a crucial role in managing the multilateral negotiations and interpreting their results to Congress.” And at various times in history, the Office has had the opportunity to use this role to add balance to the policymaking process.

We at the Internet Archive are always pleased to see the Copyright Office invite a broad range of voices to discussions of copyright policy and to participate in such discussions ourselves. We did just that earlier this month, participating in a session at the United States Copyright Office on Copyright and Artificial Intelligence. This was the first in a series of sessions the Office will be hosting throughout the first half of 2023, as it works through its “initiative to examine the copyright law and policy issues raised by artificial intelligence (AI) technology.”

As we explained at the event, innovative machine learning and artificial intelligence technology is already helping us build our library. For example, our process for digitizing texts–including never-before-digitized government documents–has been significantly improved by the introduction of LSTM technology. And state-of-the-art AI tools have helped us improve our collection of 100-year-old 78 rpm records. Policymakers dazzled by the latest developments in consumer-facing AI should not forget that there are other uses of this general purpose technology–many of them outside the commercial context of traditional copyright industries–which nevertheless serve the purpose of copyright: “to increase and not to impede the harvest of knowledge.”

Traditional copyright policymaking also frequently excludes or overlooks the world of open licensing. But in this new space, many of the tools come from the open source community, and much of the data comes from openly-licensed sources like Wikipedia or Flickr Commons. Industry groups that claim to represent the voice of authors typically do not represent such creators, and their proposed solutions–usually, demands that payment be made to corporate publishers or to collective rights management organizations–often don’t benefit the open world and are inconsistent with its thinking.

Moreover, even aside from openly licensed material, there are vast troves of technically copyrighted but not actively rights-managed content on the open web; these are also used to train AI models. Millions, if not billions, of individuals have contributed to these data sources, and because none of them are required to register their work for copyright to arise, it does not seem possible or sensible to try to identify all of the relevant copyright owners–let alone negotiate with each of them–before development can continue. Recognizing these and a variety of other concerns, the European Union has already codified copyright exceptions which permit the use of copyright-protected material as training data for generative AI models, subject to an opt-out in commercial situations and potential new transparency obligations.

To be sure, there are legitimate concerns over how generative AI could impact creative workers and cause other kinds of harm. But it is important for copyright policymakers to recognize that artificial intelligence technology has the potential to promote the progress of science and the useful arts on a tremendous scale. It is both sensible and lawful as a matter of US copyright law to let the robots read. Let’s make sure that the process described by Professor Litman does not get in the way of building AI tools that work for everyone.

AI Audio Challenge: Audio Restoration of 78rpm Records based on Expert Examples

http://great78.archive.org/

Hopefully we have a dataset primed for AI researchers to do something really useful and fun: take the noise out of digitized 78rpm records.

The Internet Archive has 1,600 examples of quality human restorations of 78rpm records where the best tools were used to ‘lightly restore’ the audio files. This takes away scratchy surface noise while trying not to impair the music or speech. Also included in those items are the unrestored originals that were used.

But then the Internet Archive has over 400,000 unrestored files that are quite scratchy and difficult to listen to.

The goal, or rather the hope, is a program that can take all or many of the 400,000 unrestored records and make them much better. How hard this will be is unknown, but hopefully it is a fun project to work on.

Many of the recordings are great and worth the effort. Please comment on this post if you are interested in diving in.

AI@IA — Extracting Words Sung on 100 year-old 78rpm records

A post in the series about how the Internet Archive is using AI to help build the library.

Freely available Artificial Intelligence tools are now able to extract words sung on 78rpm records.  The results may not be full lyrics, but we hope it can help browsing, searching, and researching.

Whisper is an open source tool from OpenAI “that approaches human level robustness and accuracy on English speech recognition.”  We were surprised how far it could get with recognizing spoken words on noisy disks and even words being sung.

For instance, in As We Parted At The Gate (1915) by Donald Chalmers, Harvey Hindermyer, and E. Austin Keith, the tool found the words:

[…] we parted at the gate,
I thought my heart would shrink.
Often now I seem to hear her last goodbye.
And the stars that tune at night will
never die as bright as they did before we
parted at the gate.
Many years have passed and gone since I
went away once more, leaving far behind
the girl I love so well.
But I wander back once more, and today
I pass the door of the cottade well, my
sweetheart, here to dwell.
All the roads they flew at fair,
but the faith is missing there.
I hear a voice repeating, you’re to live.
And I think of days gone by
with a tear so from her eyes.
On the evening as we parted at the gate,
as we parted at the gate, I thought my
heart would shrink.
Often now I seem to hear her last goodbye.
And the stars that tune at night will
never die as bright as they did before we
parted at the gate.
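A transcription like the one above could be produced along the following lines. This is only a minimal sketch, not the Internet Archive's actual pipeline: it assumes the open source `openai-whisper` Python package is installed, the model size `"base"` is an arbitrary choice, and the helper names and the file name in the usage note are hypothetical.

```python
def transcribe_record(audio_path, model_name="base"):
    """Transcribe one audio file with the open source `openai-whisper` package."""
    # Imported lazily so the formatting helper below works even
    # when the package is not installed.
    import whisper

    model = whisper.load_model(model_name)  # model size is an assumption
    return model.transcribe(audio_path)     # dict with "text" and "segments"


def segments_to_lines(result):
    """Return one stripped line of text per non-empty Whisper segment."""
    lines = []
    for segment in result.get("segments", []):
        text = segment["text"].strip()
        if text:
            lines.append(text)
    return lines
```

With a digitized side saved locally (for example a hypothetical `as_we_parted_at_the_gate.mp3`), `print("\n".join(segments_to_lines(transcribe_record("as_we_parted_at_the_gate.mp3"))))` would print the recognized words one segment per line. As the example above shows, sung words on noisy discs will often come out only partly right.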

All of the extracted texts are now available, and we hope they are useful for understanding these early recordings. Bear in mind that these are historical materials, so they may be offensive and may also be incorrectly transcribed.

We are grateful that the University of California Santa Barbara Library donated an almost complete set of transfers of 100-year-old Edison recordings to the Internet Archive’s Great 78 Project this year. The recordings and the transfers were so good that the automatic tools were able to make out many of the words.

The next step is to integrate these texts into the browsing and searching interfaces at the Internet Archive.
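One simple way extracted texts could support searching is an inverted index that maps each word to the recordings in which it was heard. The sketch below is purely illustrative and does not describe how the Internet Archive's search actually works; the item identifiers (`rec-1`, `rec-2`) and function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_index(transcripts):
    """Map each word to the set of item identifiers whose transcript contains it."""
    index = defaultdict(set)
    for item_id, text in transcripts.items():
        for word in tokenize(text):
            index[word].add(item_id)
    return index


def search(index, query):
    """Return the items whose transcripts contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical identifiers paired with lines from the transcription above.
transcripts = {
    "rec-1": "As we parted at the gate, I thought my heart would shrink.",
    "rec-2": "Often now I seem to hear her last goodbye.",
}
index = build_index(transcripts)
matches = search(index, "parted gate")
```

Because automatic transcriptions contain errors, exact word matching like this will miss some recordings; a real search interface would likely need fuzzier matching.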

Don’t Delete Our Books! Rally

For those asking how you can support the Internet Archive, there will be a rally on the steps of the Internet Archive on Saturday, April 8 @ 11am PT.

Learn more & sign up

Reposted from https://actionnetwork.org/events/dont-delete-our-books-rally-in-san-francisco

Rally for the digital future of libraries!

The nonprofit Internet Archive is appealing a judgment that threatens the future of all libraries. Big publishers are suing to cut off libraries’ ownership and control of digital books, opening new paths for censorship and surveillance. If this ruling is allowed to stand, it will result in:

— Increased censorship or even deletion of books, decided only by big publishing shareholders
— Big Tech growing its overreach into library patrons’ data, making people unsafe by monetizing intimate personal information on what they read or research
— Even more predatory licensing fees from Big Media monopolies, who are gobbling up public and school library budgets
— Reduced access to books for people from every community
— Losing libraries as preservers of vast swaths of history and culture, because they will never be allowed to own and preserve digital books

More information is available at BattleForLibraries.com. The organizers of that website are holding a rally at the Internet Archive on Funston St in San Francisco on Saturday, April 8, 2023 at 11 am.

All are welcome. Bring signs (we’ll also have some to share!) and join us to stand up for the rights of libraries to own and preserve books—whether they’re digital or print.

Can’t make it to the rally?

You can still participate & show your support for the digital rights of libraries in the following ways:

  •  Make & share a rally sign & tag @internetarchive on social media
    Need a suggestion? Try: 
    Internet Archive is a Library For Everyone!
    eBooks are Books

How Can You Help The Internet Archive? (A Repost)

In June of 2020, facing a range of challenges, we posted a host of information about how you could help the Internet Archive through difficult and pressing times.

Pretty much all of the suggestions and links in that essay still hold up and are relevant this month as well, and we are the Historical Web people, so here is a full link to that post again:

http://blog.archive.org/2020/06/14/how-can-you-help-the-internet-archive/

Your words of support and letting us know what we mean to you are appreciated, and read with great happiness. Thanks.

The Fight Continues

Today’s lower court decision in Hachette v. Internet Archive is a blow to all libraries and the communities we serve. This decision impacts libraries across the US who rely on controlled digital lending to connect their patrons with books online. It hurts authors by saying that unfair licensing models are the only way their books can be read online. And it holds back access to information in the digital age, harming all readers, everywhere.

But it’s not over—we will keep fighting for the traditional right of libraries to own, lend, and preserve books. We will be appealing the judgment and encourage everyone to come together as a community to support libraries against this attack by corporate publishers. 

We will continue our work as a library. This case does not challenge many of the services we provide with digitized books including interlibrary loan, citation linking, access for the print-disabled, text and data mining, purchasing ebooks, and ongoing donation and preservation of books.

Statement from Internet Archive founder, Brewster Kahle:
“Libraries are more than the customer service departments for corporate database products. For democracy to thrive at global scale, libraries must be able to sustain their historic role in society—owning, preserving, and lending books.

This ruling is a blow for libraries, readers, and authors and we plan to appeal it.”

Take Action!

Stand up for libraries ✊
Stand up for the digital rights of all libraries! Join the Battle for Libraries: https://www.battleforlibraries.com/ 

Support the Internet Archive 📚 
Support the Internet Archive to continue fighting for libraries in court!

Stay connected 🔗
Sign up for the Empowering Libraries newsletter for ongoing updates about the lawsuit and our library.

Stand with Internet Archive as we fight for the digital rights of all libraries

We stood up for the digital rights of all libraries today in court! The Southern District of New York heard oral argument in Hachette v. Internet Archive, the lawsuit against our library and the longstanding library practice of controlled digital lending, brought by four of the world’s largest publishers.

We fought hard for libraries today, and we’re proud of how well we were able to represent the value of controlled digital lending to the communities we serve. 

Take action!

While we wait for the judge’s decision, here’s how you can show your support:

Join the Battle for Libraries ✊
The internet advocacy group Fight for the Future has launched the Battle for Libraries, an online rally in support of the Internet Archive and digital lending. Visit the action hub to engage with other supporters & share messages with your followers across social media to spread awareness about our fight. Get started now!

Read a book! 📕
Check out a book from Open Library and read it online using the library practice of controlled digital lending.

Stay connected 🔗
Sign up for the Empowering Libraries newsletter for the latest updates about the lawsuit and our library.

Internet Archive Press Conference: March 20, 2023

Internet Archive hosted a press conference before oral argument in Hachette v. Internet Archive, the lawsuit against our library.


Speakers:
Link to statement & transcript.