AI@IA — Extracting Words Sung on 100 year-old 78rpm records

A post in the series about how the Internet Archive is using AI to help build the library.

Freely available Artificial Intelligence tools are now able to extract words sung on 78rpm records. The results may not be full lyrics, but we hope they can help with browsing, searching, and researching.

Whisper is an open source tool from OpenAI “that approaches human level robustness and accuracy on English speech recognition.”  We were surprised how far it could get with recognizing spoken words on noisy disks and even words being sung.
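For readers who want to try this themselves, here is a minimal sketch of running Whisper on a single audio file with the open source `openai-whisper` Python package. It is an illustration only, not the Internet Archive's actual pipeline: the post does not say which model size or settings were used, so the `"base"` model, the `language="en"` option, and the helper names `segments_to_lines` and `transcribe_file` are all assumptions made for this example.

```python
from typing import Dict, List


def segments_to_lines(segments: List[Dict]) -> List[str]:
    """Turn Whisper-style segments into clean, non-empty lines of text.

    Each segment is a dict with a "text" key, as found in the "segments"
    list of the result returned by Whisper's transcribe().
    """
    lines = []
    for seg in segments:
        text = seg.get("text", "").strip()
        if text:  # skip segments that contain only whitespace
            lines.append(text)
    return lines


def transcribe_file(path: str, model_name: str = "base") -> List[str]:
    """Transcribe one audio file and return its text line by line.

    Assumes `pip install openai-whisper` and a working ffmpeg install.
    The model name is a guess; larger models are slower but may cope
    better with surface noise and singing.
    """
    import whisper  # imported lazily so the helper above works without it

    model = whisper.load_model(model_name)
    result = model.transcribe(path, language="en")
    return segments_to_lines(result["segments"])
```

Calling `transcribe_file("some_78rpm_transfer.mp3")` (a hypothetical file name) would return a list of text lines similar in shape to the excerpt in this post, though the exact words will vary with the model and the recording.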

For instance, in As We Parted At The Gate (1915) by Donald Chalmers, Harvey Hindermyer, and E. Austin Keith, the tool found the words:

[…] we parted at the gate,
I thought my heart would shrink.
Often now I seem to hear her last goodbye.
And the stars that tune at night will
never die as bright as they did before we
parted at the gate.
Many years have passed and gone since I
went away once more, leaving far behind
the girl I love so well.
But I wander back once more, and today
I pass the door of the cottade well, my
sweetheart, here to dwell.
All the roads they flew at fair,
but the faith is missing there.
I hear a voice repeating, you’re to live.
And I think of days gone by
with a tear so from her eyes.
On the evening as we parted at the gate,
as we parted at the gate, I thought my
heart would shrink.
Often now I seem to hear her last goodbye.
And the stars that tune at night will
never die as bright as they did before we
parted at the gate.

All of the extracted texts are now available, and we hope they are useful for understanding these early recordings. Bear in mind that these are historical materials, so they may be offensive and may also be incorrectly transcribed.

We are grateful that University of California Santa Barbara Library donated an almost complete set of transfers of 100 year-old Edison recordings to the Internet Archive’s Great 78 Project this year.  The recordings and the transfers were so good that the automatic tools were able to make out many of the words.

The next step is to integrate these texts into the browsing and searching interfaces at the Internet Archive.

Don’t Delete Our Books! Rally

For those asking how you can support the Internet Archive, there will be a rally on the steps of the Internet Archive on Saturday, April 8 @ 11am PT.

Learn more & sign up

Reposted from https://actionnetwork.org/events/dont-delete-our-books-rally-in-san-francisco

Rally for the digital future of libraries!

The nonprofit Internet Archive is appealing a judgment that threatens the future of all libraries. Big publishers are suing to cut off libraries’ ownership and control of digital books, opening new paths for censorship and surveillance. If this ruling is allowed to stand, it will result in:

— Increased censorship or even deletion of books, decided only by big publishing shareholders
— Big Tech growing its overreach into library patrons’ data, making people unsafe by monetizing intimate personal information on what they read or research
— Even more predatory licensing fees from Big Media monopolies, who are gobbling up public and school library budgets
— Reduced access to books for people from every community
— Losing libraries as preservers of vast swaths of history and culture, because they will never be allowed to own and preserve digital books

More information is available at BattleForLibraries.com. The organizers of that website are holding a rally at the Internet Archive on Funston St in San Francisco on Saturday, April 8, 2023 at 11 am.

All are welcome. Bring signs (we’ll also have some to share!) and join us to stand up for the rights of libraries to own and preserve books—whether they’re digital or print.

Can’t make it to the rally?

You can still participate & show your support for the digital rights of libraries in the following ways:

  •  Make & share a rally sign & tag @internetarchive on social media
    Need a suggestion? Try: 
    Internet Archive is a Library For Everyone!
    eBooks are Books

How Can You Help The Internet Archive? (A Repost)

In June of 2020, facing a range of challenges, we posted a host of information about how you could help the Internet Archive through difficult and pressing times.

Pretty much all of the suggestions and links in that essay still hold up and are relevant this month as well, and we are the Historical Web people, so here is a full link to that post again:

http://blog.archive.org/2020/06/14/how-can-you-help-the-internet-archive/

Your words of support and letting us know what we mean to you are appreciated, and read with great happiness. Thanks.

The Fight Continues

Today’s lower court decision in Hachette v. Internet Archive is a blow to all libraries and the communities we serve. This decision impacts libraries across the US who rely on controlled digital lending to connect their patrons with books online. It hurts authors by saying that unfair licensing models are the only way their books can be read online. And it holds back access to information in the digital age, harming all readers, everywhere.

But it’s not over—we will keep fighting for the traditional right of libraries to own, lend, and preserve books. We will be appealing the judgment and encourage everyone to come together as a community to support libraries against this attack by corporate publishers. 

We will continue our work as a library. This case does not challenge many of the services we provide with digitized books including interlibrary loan, citation linking, access for the print-disabled, text and data mining, purchasing ebooks, and ongoing donation and preservation of books.

Statement from Internet Archive founder, Brewster Kahle:
“Libraries are more than the customer service departments for corporate database products. For democracy to thrive at global scale, libraries must be able to sustain their historic role in society—owning, preserving, and lending books.

This ruling is a blow for libraries, readers, and authors and we plan to appeal it.”

Take Action!

Stand up for libraries ✊
Stand up for the digital rights of all libraries! Join the Battle for Libraries: https://www.battleforlibraries.com/ 

Support the Internet Archive 📚 
Support the Internet Archive to continue fighting for libraries in court!

Stay connected 🔗
Sign up for the Empowering Libraries newsletter for ongoing updates about the lawsuit and our library.

Stand with Internet Archive as we fight for the digital rights of all libraries

We stood up for the digital rights of all libraries today in court! The Southern District of New York heard oral argument in Hachette v. Internet Archive, the lawsuit against our library and the longstanding library practice of controlled digital lending, brought by four of the world’s largest publishers.

We fought hard for libraries today, and we’re proud of how well we were able to represent the value of controlled digital lending to the communities we serve. 

Take action!

While we wait for the judge’s decision, here’s how you can show your support:

Join the Battle for Libraries ✊
The internet advocacy group Fight for the Future has launched the Battle for Libraries, an online rally in support of the Internet Archive and digital lending. Visit the action hub to engage with other supporters & share messages with your followers across social media to spread awareness about our fight. Get started now!

Read a book! 📕
Check out a book from Open Library and read it online using the library practice of controlled digital lending.

Stay connected 🔗
Sign up for the Empowering Libraries newsletter for the latest updates about the lawsuit and our library.

Internet Archive Press Conference: March 20, 2023

Internet Archive hosted a press conference before oral argument in Hachette v. Internet Archive, the lawsuit against our library.


Speakers:
Link to statement & transcript.

Press conference statement: Lila Bailey, Internet Archive

Lila Bailey is the senior policy counsel at Internet Archive. Lila spoke at the press conference hosted by Internet Archive ahead of oral argument in Hachette v. Internet Archive.

Statement

I’m Lila Bailey, Senior Policy Counsel for the Internet Archive. Today, the court will hear arguments about whether Copyright Law affirms the rights of libraries to lend the books they own to one reader at a time.

The benefits of libraries to our modern world cannot be overstated. Libraries are an essential component of our democratic and free society. But the rise of social media, and now AI, has resulted in an immediate threat to the public’s pursuit of the truth.

In a vigorous pursuit of truth, the library is our greatest ally.  

The Internet Archive’s digital lending program serves this essential purpose. The very purpose of the copyright system: to encourage the intellectual enrichment of the public. 

Controlled digital lending represents the latest in a long history of innovations developed by libraries to serve the public’s need for information. In the past, publishers stood against microfilm and photocopiers, crying harm. They said they would be harmed by interlibrary loan. They lobbied for decades against libraries being allowed to provide access for the blind and print disabled. They were wrong. It took years, but eventually the law affirmed each of these things, and the public benefitted.

With this lawsuit, publishers have repeated those same claims of massive harm from controlled digital lending. 

But this case has revealed one thing very clearly, after both sides have spent nearly three years, and millions of dollars looking at the actual market and usage data. 

There has been no harm. These publishers have not shown the loss of even one dollar.

Even during COVID, when every physical library was closed and the Internet Archive stepped up to provide an Emergency Library.

Contrary to the publishers’ dire predictions there was simply no effect on their market. Not one dollar of harm.

When asked under oath, their own executives admit this. For example, Alison Lazarus, EVP and Director of Group Sales for Hachette, admitted that their theory of harm is only [quote] “speculative.”  Another executive, Skip Dye, SVP of Library Sales and Digital Strategy at Penguin Random House, candidly admitted that when it comes to market harm, quote: “I don’t have any evidence.”

Another agreed, stating: “There’s no factual analysis. It’s just one inference one could make.”  That was Chantal Restivo-Alessi, Chief Digital Officer of HarperCollins.

Tellingly, the publishers instructed their own $950-per-hour expert not to even try to measure economic harm. They didn’t give him any data to measure. When asked under oath whether any potential sales were lost, he responded: “I don’t have empirical evidence of that.”

On the other hand, when we invited economists from Northeastern University and the University of Copenhagen to look at the sales and library lending data produced in this case, they came to a singular conclusion: The Internet Archive’s digital lending had no measurable effect on the market whatsoever. 

Never in the history of the United States have libraries needed to obtain special permission or to pay license fees to lend the books they already own. Sure, publishers would profit from the ability to demand such fees, but the law does not give them that right.

We look forward to the court reaffirming the essential role of library lending, now in our digital world.

Thank you.

Press conference statement: Heather Joseph, SPARC

Heather Joseph is the executive director of SPARC. She spoke at the press conference hosted by Internet Archive ahead of oral argument in Hachette v. Internet Archive.

Statement

Access to knowledge is a fundamental human right. 

We depend on being able to freely share knowledge each and every day. It’s foundational to how we navigate the world – from how we learn to how we work, to how we share our culture and understand our collective history.  It’s also the lifeblood of how we advance discovery, and attack the biggest challenges that we face as a society.  From cancer breakthroughs to climate justice, we rely on being able to access, build on and benefit from the knowledge generated by those around us. 

We take for granted that knowledge is just – there, and that ANYONE can get it when and if they need it. But the reality is that too often, this simply isn’t the case. Especially in the world of scientific research, knowledge is treated as a commodity, and often carries a price tag that makes it unaffordable to all but the wealthiest individuals and institutions.

This is never more evident than in times of crisis. From the avian flu to the global COVID-19 pandemic, we’ve seen the same pattern play out over and over again. When a health crisis looms, one of the very first things that happens is that scientists, the public and policymakers have to plead with publishers to lower their paywalls and make sure that those who desperately need access to knowledge can get it. Whether it’s access to develop treatments and cures, or to make sure students can continue to learn, knowledge shouldn’t be kept locked behind glass that can only be broken in the event of an emergency. It should be readily available to all.

Libraries play a critical role in making this happen.  They are designed to empower everyone – regardless of who you are, where you live, or your economic or political status – to access and use knowledge. Whether you walk into a physical library like the New York Public Library, or log into a digital one like the Internet Archive, you don’t need a PhD or a billion-dollar bank account to access the knowledge they hold. 

We depend on libraries to do the crucial things they have done for centuries. Libraries collect. They preserve. And libraries lend. They collect materials to ensure access to the broadest range of ideas and facts. They preserve these materials for the long haul, because access to knowledge should not be ephemeral. Stable, consistent, long-term access is how we promote continuity and ultimately understand truth. Lending – one copy of a physical or digital object to one person at a time – is the bedrock process that libraries use to ensure free, fair and equitable knowledge sharing.

Libraries like the Internet Archive exist to ensure the universal sharing of knowledge. Sharing knowledge is a fundamental human right. Nothing could be more important to protect than that.