Internet Archive Celebrates Research and Research Libraries at Annual Gathering

At this year’s annual celebration in San Francisco, the Internet Archive team showcased its innovative projects and rallied supporters around its mission of “Universal Access to All Knowledge.”

Brewster Kahle, Internet Archive’s founder and digital librarian, welcomes hundreds of guests to the annual celebration on October 12, 2023.

“People need libraries more than ever,” said Brewster Kahle, founder of the Internet Archive, at the October 12 event. “We have a set of forces that are making libraries harder and harder to happen—so we have to do something more about it.”

Efforts to ban books and defund libraries are worrisome trends, Kahle said, but there are hopeful signs and emerging champions.

Watch the full live stream of the celebration

Among the headliners of the program was Connie Chan, Supervisor of San Francisco’s District 1, who received the 2023 Internet Archive Hero Award. In April, she authored a resolution, passed unanimously by the San Francisco Board of Supervisors, backing the Internet Archive and the digital rights of all libraries.

Chan spoke at the event about her experience as a first-generation, low-income immigrant who relied on books in Chinese and English at the public library in Chinatown.  

Watch Supervisor Chan’s acceptance speech

“Having free access to information was a critical part of my education—and I know I was not alone,” said Chan, who is a supporter of the Internet Archive’s role as a digital, online library. “The Internet Archive is a hidden gem…It is very critical to humanity, to freedom of information, diversity of information and access to truth…We aren’t just fighting for libraries, we are fighting for our humanity.”

Several users shared testimonials about how resources from the Internet Archive have enabled them to advance their research, fact-check politicians’ claims, and inspire their creative works. Content in the collection is helping to improve machine translation. It is also preserving international television news coverage and Ukrainian memes on social media during the war with Russia.

Quinn Dombrowski, of the Saving Ukrainian Cultural Heritage Online project, shows off Ukrainian memes preserved by the project.

Technology is changing things—some for the worse, but a lot for the better, said David McRaney, speaking via video to the audience in the auditorium at 300 Funston Ave. “And when [technology] changes things for the better, it’s going to expand the limited capabilities of human beings. It’s going to extend the reach of those capabilities, both in speed and scope,” he said. “It’s about a newfound freedom of mind, and time, and democratizing that freedom so everyone has access to it.”

Open Library developer Drini Cami explained how the Internet Archive is using artificial intelligence to improve access to its collections.

When a book was digitized, scanning operators used to crop the photographs of its pages by hand. The Internet Archive recently trained a custom machine learning model to automatically suggest page boundaries, allowing staff to double their processing rate. In addition, an open-source machine learning tool converts page images into text. This makes books searchable, opens the collection to bulk research, cross-referencing, and text analysis, and allows books to be read aloud to people with print disabilities.

Open Library developer Drini Cami.

“Since 2021, we’ve made 14 million books, documents, microfiche, records—you name it—discoverable and accessible in over 100 languages,” Cami said.

As AI technology advanced this year, Internet Archive engineers piloted a metadata extractor, a tool that automatically pulls key data elements from digitized books. This extra information helps librarians match the digitized book to other cataloged records, beginning to resolve the backlog of books with limited metadata in the Archive’s collection. AI is also being used to help write descriptions of magazines and newspapers, reducing the time from 40 to 10 minutes per item.

“Because of AI, we’ve been able to create new tools to streamline the workflows of our librarians and the data staff, and make our materials easier to discover, and work with patrons and researchers,” Cami said. “With new AI capabilities being announced and made available at a breakneck rate, new ideas of projects are constantly being added.”

Jamie Joyce & AI hackathon participants.

A recent Internet Archive hackathon explored the risks and opportunities of AI by using the technology itself to generate content, said Jamie Joyce, project lead for the organization’s Democracy’s Library. One of the hackathon volunteers created an autonomous research agent to crawl the web and identify claims related to AI. Using a prompt-based model, the agent generated nearly 23,000 claims from 500 references. That information could serve as the basis for economic, environmental and other arguments about the use of AI technology. Joyce invited others to get involved in future hackathons as the Internet Archive continues to expand its AI capabilities.

Peter Wang, CEO and co-founder at Anaconda, said interesting kinds of people and communities have emerged around cultures of sharing. For example, those who participate in the DWeb community are often both humanists and technologists, he said, with an understanding of the importance of reducing barriers to information for the future of humanity. Rather than a scarcity mindset, Wang said, he embraces an abundant approach to knowledge sharing and applies community values to technology solutions.

Peter Wang, CEO and co-founder at Anaconda.

“With information, knowledge and open-source software, if I make a project, I share it with someone else, they’re more likely to find a bug,” he said. “They might improve the documentation a little bit. They might adapt it for a novel use case that I can then benefit from. Sharing increases value.”

The Internet Archive’s Joy Chesbrough, director of philanthropy, closed the program by expressing appreciation for those who have supported the digital library, especially in these precarious times.

“We are one community tied together by the internet, this connected web of knowledge sharing. We have a commitment to an inclusive and open internet, where there are many winners, and where ethical approaches to genuine AI research are supported,” she said. “The real solution lies in our deep human connection. It inspires the most amazing acts of generosity and humanity.”

***

If you value the Internet Archive and our mission to provide “Universal Access to All Knowledge,” please consider making a donation today.

Celebrate with the Internet Archive on October 11th & 12th

Join us on October 11th & 12th to help celebrate AI @ IA: Research in the Age of Artificial Intelligence!

October 11: Tour of the physical archive

Please join us October 11th @ 6-8pm as we take a peek behind the doors of the physical archive in Richmond, California.

We are excited to offer a behind-the-scenes tour of our physical collections of books, music, film, and video in Richmond, California.

With this special insider event, we are opening the doors to an often unseen place. See the lifecycle of physical books – donation, preservation, digitization, and access. You will also see samples from generous donations and acquisitions of books, records, microfiche, and more.

Register now for the physical archive tour


October 12: Join our annual celebration – in-person & virtual

Artificial Intelligence rocking your boat? Join us October 12th to see how the Internet Archive is using AI to build new capabilities into our library, and how students and scholars all over the world use the Archive’s petabytes of data to inform their own research.

This year’s event is hybrid. We will be celebrating in-person at our main library in San Francisco, and will be livestreaming the event itself from 7pm-8pm PT for those who want to celebrate with us from afar!

Register now for in-person or virtual attendance

Event details

5pm: Entertainment and food trucks
7pm: Program in our Great Room
8pm: Dancing in the streets

Location: 300 Funston Ave. at Clement St., San Francisco

Registration is required: Register now for in-person or virtual attendance.

Community Turns Out to Celebrate Promise of Democracy’s Library

Friends and supporters of the Internet Archive gathered October 19 at the organization’s headquarters in San Francisco to celebrate the launch of Democracy’s Library.

Plans to collect government documents from around the world and make them easily accessible online were met with enthusiasm and endorsements. Speakers at the event expressed an urgency to preserve the public record, make valuable research discoverable, and keep the citizenry informed—all potential benefits of Democracy’s Library. 

“If we really succeed — and we have to succeed — then Democracy’s Library might become an inspiration for openness in areas that are becoming more and more closed,” said Internet Archive founder Brewster Kahle. 

The 10-year project aims to make freely available the massive volume of government publications (from the U.S. and other democracies), including books, guides, reports, surveys, laws and academic research results, all of which are funded with taxpayer money but are often difficult to find.

To kick off the project, Kahle announced the Internet Archive’s initial contributions to Democracy’s Library:

  • United States .gov websites collected since 2008; 
  • Crawls of the U.S. state government websites;
  • Digitized microfilm and microfiche from the U.S. Government Publishing Office, NASA and other government entities;
  • Crawls of government domains from 200 other countries;
  • 50 million government PDF documents made text-searchable.

It will be a collaborative effort, said Kahle, who called on others to join the ambitious undertaking by contributing to the online collection.

The need for Democracy’s Library

“We need Democracy’s Library. The Internet Archive’s work leading this project represents a critical step in the evolution of democracy,” said Jamie Joyce, executive director of The Society Library and emcee of the program. “Archives and libraries, as they’ve always done in the past, will continue to change in their scope, scale, and capabilities to be of critical use to society, especially democratic societies. Tonight is about witnessing another transformation.”

Although there is more data available than ever before, Joyce said, society’s knowledge management system is badly broken. Misinformation is rampant, while high quality government data is buried and scattered across different federal, state and local agencies. 

Having public material consolidated, digitized and machine readable will allow journalists, activists, and others to be better informed. It will also make democracy more transparent and accountable, and help protect historical documents. “We will not be able to compute in the future what we do not save today,” Joyce said.

At a time when polarized politics can put information at risk, the event highlighted the need to safeguard public data.

Gretchen Gehrke, co-founder of the Environmental Data and Governance Initiative, has been working in partnership with the Internet Archive to track changes in federal environmental websites. 

“People should be able to know about environmental issues and have a say in environmental decisions,” she said. “For the last 20 years, the majority of this information has been delivered through the web, but the right to access that information through the web is not protected.”

Gehrke described how public resources and tools related to the federal Clean Power Plan, a hallmark environmental regulation of the Obama administration, were taken down from the Environmental Protection Agency’s website under President Trump’s tenure. 

“There are no policies protecting federal website information from suppression or outright censorship,” Gehrke said. “This case serves as an example of why we need Democracy’s Library to preserve and provide continued access to these critical government documents.”

When statistics are cited in policy debates, citizens need access to the sources behind those claims. For example, Sharon Hammond, chief operating officer of The Society Library, said documents related to the environmental impact of California’s Diablo Canyon power plant should be easily available. Nearly five different government bodies have some role in monitoring the plant’s ecological impact, but each agency houses its reports on its own website.

“Finding governmental records about public policy matters should not be a barrier to becoming an informed participant in these collective decisions,” Hammond said. “When we connect evidence directly to the claims and make that information publicly accessible as a resource, we can improve the public discourse.”

Hammond said a searchable, machine readable repository of government documents, with active links and a register of relevant government agencies, will dramatically increase meaningful access to the public’s information.

An international vision

The effort is an international one, and Canada has stepped forward as an early partner.

Canada has contributed crawls by Library and Archives Canada of all the country’s government websites, as well as digitized microfilm and books from the Canadian Research Knowledge Network, Canadiana, and the University of Toronto.

Leslie Weir, Librarian and Archivist of Canada, spoke in support of the initiative.

“We know by making our collection and work of government openly accessible, we will create a more engaged community, a community that participates in elections, school board meetings, in public consultations, and yes, even and especially in protests,” Weir said. “Access is the key to understanding. And understanding is the underpinning of democracy.”

Celebrating heroes

The festivities concluded with a tribute to Carl Malamud, recipient of the 2022 Internet Archive Hero Award. Corynne McSherry, legal director of the Electronic Frontier Foundation, presented the award. “Carl has always seen what the internet could be. He has dedicated his life to building that internet,” she said. “He is a true hero.”

Malamud said government information is more than just a good idea. “It is about the law. It is about our rulebook. It is the manual on how we, as citizens, choose to run our society. We own this manual,” he said. “We cannot honor our obligations to future generations if we cannot freely read and speak and even change that rulebook.”

Malamud urged the audience to get involved to realize the vision of Democracy’s Library and guarantee universal access to human knowledge. 

“This is our moment. We must build a distributed and interoperable internet for our global village. We must make the increase in diffusion of knowledge our mutual and everlasting mission,” Malamud said. “We must seize the means of computation and share their fruits with all the people. Let us all swim together in the ocean of knowledge.”

For more on Malamud’s career and contributions, read his profile here.