The Internet Archive team, its partners, and enthusiasts recently shared updates on how the organization is empowering research, ensuring preservation of vital materials, and extending access to knowledge to a growing number of grateful users.
The 2023 Library Leaders Forum, held virtually Oct. 4, featured snapshots of the many activities the organization is supporting on a global scale. Together, the efforts are making a difference in the lives of students, scholars, educators, entrepreneurs, journalists, public servants — anyone who needs trusted information without barriers.
“It’s important for us to recognize that the Internet Archive is a library. It’s a research library in the role that it plays, in the way that it works,” said Brewster Kahle, founder of the Internet Archive.
With the rise of misinformation and new artificial intelligence technologies, reliable digital information is needed more than ever, he said.
“This is going to be a challenging time in the United States when all of our institutions — the press, the election system, and libraries — are going to be tested,” Kahle said. “It’s time for us to make sure we stand up tall and be as useful to people in the United States and to people around the world who are having some of the same issues.”
To provide citizens everywhere with free access to government data, documents, and records, the Archive launched Democracy’s Library last year. The collection now has 889,000 government publications, with many more items donated but yet to be organized, said the Archive’s Jamie Joyce at the forum. The goal is to digitize municipal, provincial, state and federal documents, along with datasets, research, records, publications, and microfiche so they are searchable and accessible.
The Archive is taking a leadership role in harnessing the power of AI to make its information easier for users to find, Kahle added. It is also preserving state television newscasts from Russia and Iran, along with translations, to allow researchers to track trends in coverage.
Collections as data
Thomas Padilla, deputy director of data archiving and data services at the Internet Archive, reported on a project that examines how libraries can support responsible use of collections as data. Run in partnership with Iowa State University, University of Pennsylvania, and James Madison University, the project is a community development effort for libraries, archives, museums and galleries to help researchers use new technology (text and data mining, machine learning) while also mitigating the potential harm the process can cause.
Through the effort, the Archive gave grants to 12 research libraries and cultural heritage organizations to explore questions around collections as data, Padilla said. As it became apparent that others around the world were grappling with similar issues, the project convened representatives from 60 organizations representing 18 countries earlier this year in Canada. The group agreed on core principles (The Vancouver Statement on Collections-As-Data) to use when providing machine-actionable collection data to researchers. Next, the project expects to issue a roadmap for the broader international community in this space, Padilla said.
Helping libraries help publishers
The recent forum also featured digitization managers from the Internet Archive who are collaborating with partner libraries, including Tim Bigelow, Sophie Flynn-Piercy, Elizabeth MacLead, Andrea Mills and Jeff Sharpe. These librarians are at institutions big and small, from the University of North Carolina at Chapel Hill to the Wellcome Trust in London, working with teams of professionally trained technicians to digitize collections.
One of those partnerships is taking an exciting new direction. The Boston Public Library’s partnership with the Archive began in 2007. Over the years, the team has completed digitization of the John Adams presidential library, Shakespeare’s First Folio (his 36 plays published in 1623), more than 17,000 government documents and the Houghton Mifflin trade book archival collection, according to Bigelow, the Northeast Regional digitization manager for the Archive.
The Houghton Mifflin collection comprises 20,000 titles dating back to 1832, among them some of the best known works in American fiction and children’s literature, such as books by Ralph Waldo Emerson and the Curious George series. The publisher gave BPL the entire physical collection, 90% of which was out of print, for preservation and continues to add new titles as they are published. With the formal agreement of Houghton Mifflin, BPL and the Archive have been working together since 2017 to digitize every book. Those in the public domain are completely readable and downloadable; those still in copyright are available through controlled digital lending (CDL).
As in Boston, many libraries have embraced CDL. However, commercial publishers have challenged the practice.
Lila Bailey, senior policy counsel for the Archive, provided an update at the forum on the Hachette v. Internet Archive lawsuit, in which the court ruled in favor of the publishers, limiting the use of CDL. The Archive filed an appeal in September. Bailey encouraged supporters to consider filing amicus briefs once the appellate court takes up the Archive’s case, as it is expected to do.
For the Internet Archive—and libraries everywhere—to continue their work, the Archive is advocating for a legal infrastructure that ensures libraries can collect digital materials, preserve those materials in different formats, lend digital materials, and cooperate with other libraries.
“In our evolving digital society, will new technologies serve the public good, or only corporate interests?” Bailey asked in her remarks at the forum. “Libraries are on the front line of the fight to decide this question in favor of the public good. In order to maintain our age-old role as guardians of knowledge, we need our rights to own, lend and preserve books, as we all live more and more of our lives online.”