As an academic librarian helping connect students and faculty with the research materials they need, Sanjeet Mann has turned to the Internet Archive many times.
“I really value having the Wayback Machine as an additional tool in my librarian’s toolbox,” Mann said. “Information preservation is an essential, but often overlooked, part of the infrastructure for teaching and learning.”
Mann, currently working as the Systems & Discovery Librarian at California State University, San Bernardino (CSUSB), said he first learned about the value of the Internet Archive in 2006 during his library science master’s program.
Over his career, Mann has worked at various libraries, tapping into the Archive on the job.
Assisting budding writers, composers and artists as Arts Librarian at University of Redlands, Mann found that the vast amount of free information online, including biographies, can shape students’ projects.
“We can draw on the Archive whenever we need inspiration for creative work, or when we need to understand how current scholarship and the issues that we’re facing now aren’t completely new—they’re based on this history of work by scholars, by politicians, by citizens active in the public interest,” he said. “These issues tend to recur over time. As a society, we need to know where we have been in order to meet the challenges of the future.”
At CSUSB, Mann also helps computer science and business students use the Archive’s collections to better understand the cultural roots of new technologies—the historical context for their innovations.
“It is the only entity I’m aware of that preserves the Internet’s scholarly and historical record at this scale,” Mann said.
On a practical note, Mann leveraged information through the Wayback Machine when he was researching how to set up a campus laptop loaner program for University of Redlands. This can be an essential service that libraries provide students who have trouble with their computers.
Mann wanted to understand policies at other universities, such as how they handled the return of damaged laptops. Looking at archived versions of university library websites through the Wayback Machine, Mann was able to learn about other approaches and find contacts to follow up for additional details.
The Internet Archive is a source to verify information that is no longer listed on websites, he said.
“Companies themselves don’t have any incentive to archive the history of their website. New products get launched. The platform gets migrated from one platform to another,” Mann said. “An organization like the Internet Archive, being a library, is uniquely positioned to meet the need in society of ensuring some kind of continuity of memory and having a public record. Especially with the government being very partisan these days, I think there’s value in the Internet Archive being an independent, not-for-profit that operates in the public interest.”
Mann added: “Without the Archive, we would lose decades of information about our society at a crucial turning point in its development, eroding trust in online systems and requiring educators, students and researchers to reconsider the way we do our work and share it with others.”
Libraries around the world were forced to shut their doors in the spring of 2020 during the start of the COVID-19 pandemic. Temple University Libraries was no exception. While the Philadelphia institution’s physical buildings were closed, librarians got creative about how to remain open to students, faculty and staff.
It was all about getting users connected with digital material. Library staff worked together to develop a simple new service—they added a “Get Help Finding a Digital Copy” button to their library catalog. When searching for resources in the library catalog, users can click on the button to request assistance finding a physical item in digital form, which creates a help ticket for library staff to field.
Within the first week of the button launch in April 2020, there were about 350 requests. Since then, the requests have surpassed 9,000.
“Our popular service helps users get access to resources they need quickly without economic hardship, and without having to travel to campus,” said Olivia Given Castello, a social science librarian and unit head in Temple Libraries’ Learning & Research Services department, who helped create the new service.
Temple relies on a variety of sources for its digital requests—including the Internet Archive. “It’s a valuable resource through which we help Temple library users find digital copies of inaccessible or inconveniently accessible items in our physical collection,” Given Castello said of the Internet Archive’s ebooks available through controlled digital lending (CDL).
For a large research university, Temple’s library collections’ budget is modest, and it has been challenging to keep up with the rapidly rising costs of journals and monographs given the static library budget in recent years. Additionally, there are ebooks that the libraries are unable to provide. Commercial publishers want to maximize profits gained from ebook sales to individual students, so unlike with print books, there are many ebook titles they refuse to sell to libraries, or refuse to sell with adequate user licensing. Based on past requests, Given Castello estimates that just under 20% of the digital items that Temple finds through its new service are in the Internet Archive collection.
“Our library serves a diverse user community that is socio-economically disadvantaged relative to those at many other R1 U.S. research universities,” she said. The R1 designation indicates a university that grants doctoral degrees and has very high research activity; the list of 146 institutions so designated includes the wealthiest private universities in the U.S. “Our users’ ability to access ebooks through the Internet Archive’s controlled digital lending eases financial strain on them.”
“The actions of commercial publishers have put the academic publishing model at risk, pushing the boundaries in ways that prevent libraries from serving the role in society that they need to,” Given Castello said. “We’re trying to cope with that. Services like the one we set up, and controlled digital lending for borrowing ebooks from Internet Archive, are important in this challenging landscape.”
“For any university that has a student body with significant economic challenges, organizations like the Internet Archive are just so important in helping make knowledge and information accessible to everyone, regardless of their economic privilege,” Given Castello said. “Libraries exist, in part, so that getting access to the information you need is not dependent on your personal wealth. Inequity of information access is bad for individuals and for society as a whole.”
If legal action were to diminish or shut down CDL, Given Castello said it would be “detrimental” to the university’s service.
She added: “We can’t let commercial publishers’ short-term shareholder profits take such precedence that they get in the way of equitable access to information. Eventually, that will have a long-term negative impact on knowledge creation, which hurts our society, companies, and the economy as well. Sometimes you have to think of the greater good.”
At this year’s annual celebration in San Francisco, the Internet Archive team showcased its innovative projects and rallied supporters around its mission of “Universal Access to All Knowledge.”
“People need libraries more than ever,” said Brewster Kahle, founder of the Internet Archive, at the October 12 event. “We have a set of forces that are making libraries harder and harder to happen—so we have to do something more about it.”
Efforts to ban books and defund libraries are worrisome trends, Kahle said, but there are hopeful signs and emerging champions.
Among the headliners of the program was Connie Chan, Supervisor of San Francisco’s District 1, who was honored with the 2023 Internet Archive Hero Award. In April, she authored a resolution, passed unanimously by the San Francisco Board of Supervisors, backing the Internet Archive and the digital rights of all libraries.
Chan spoke at the event about her experience as a first-generation, low-income immigrant who relied on books in Chinese and English at the public library in Chinatown.
“Having free access to information was a critical part of my education—and I know I was not alone,” said Chan, who is a supporter of the Internet Archive’s role as a digital, online library. “The Internet Archive is a hidden gem…It is very critical to humanity, to freedom of information, diversity of information and access to truth…We aren’t just fighting for libraries, we are fighting for our humanity.”
Several users shared testimonials about how resources from the Internet Archive have enabled them to advance their research, fact-check politicians’ claims, and inspire their creative works. Content in the collection is helping improve machine translation of languages. It is preserving international television news coverage and Ukrainian memes on social media during the war with Russia.
Technology is changing things—some for the worse, but a lot for the better, said David McRaney, speaking via video to the audience in the auditorium at 300 Funston Ave. “And when [technology] changes things for the better, it’s going to expand the limited capabilities of human beings. It’s going to extend the reach of those capabilities, both in speed and scope,” he said. “It’s about a newfound freedom of mind, and time, and democratizing that freedom so everyone has access to it.”
Open Library developer Drini Cami explained how the Internet Archive is using artificial intelligence to improve access to its collections.
When a book is digitized, it used to be that photographs of pages had to be manually cropped by scanning operators. The Internet Archive recently trained a custom machine learning model to automatically suggest page boundaries, allowing staff to double the rate of processing. Also, an open-source machine learning tool converts images into text, making it possible for books to be searchable, and for the collection to be available for bulk research, cross-referencing and text analysis, as well as to be read aloud to people with print disabilities.
“Since 2021, we’ve made 14 million books, documents, microfiche, records—you name it—discoverable and accessible in over 100 languages,” Cami said.
As AI technology advanced this year, Internet Archive engineers piloted a metadata extractor, a tool that automatically pulls key data elements from digitized books. This extra information helps librarians match the digitized book to other cataloged records, beginning to resolve the backlog of books with limited metadata in the Archive’s collection. AI is also being leveraged to assist in writing descriptions of magazines and newspapers—reducing the time from 40 to 10 minutes per item.
“Because of AI, we’ve been able to create new tools to streamline the workflows of our librarians and the data staff, and make our materials easier to discover, and work with patrons and researchers,” Cami said. “With new AI capabilities being announced and made available at a breakneck rate, new ideas of projects are constantly being added.”
A recent Internet Archive hackathon explored the risks and opportunities of AI by using the technology itself to generate content, said Jamie Joyce, project lead with the organization’s Democracy’s Library project. One of the hackathon volunteers created an autonomous research agent to crawl the web and identify claims related to AI. With a prompt-based model, the machine was able to generate nearly 23,000 claims from 500 references. The information could be the basis for creating economic, environmental and other arguments about the use of AI technology. Joyce invited others to get involved in future hackathons as the Internet Archive continues to expand its AI potential.
Peter Wang, CEO and co-founder at Anaconda, said interesting kinds of people and communities have emerged around cultures of sharing. For example, those who participate in the DWeb community are often both humanists and technologists, he said, with an understanding about the importance of reducing barriers to information for the future of humanity. Wang said rather than a scarcity mindset, he embraces an abundant approach to knowledge sharing and applying community values to technology solutions.
“With information, knowledge and open-source software, if I make a project, I share it with someone else, they’re more likely to find a bug,” he said. “They might improve the documentation a little bit. They might adapt it for a novel use case that I can then benefit from. Sharing increases value.”
The Internet Archive’s Joy Chesbrough, director of philanthropy, closed the program by expressing appreciation for those who have supported the digital library, especially in these precarious times.
“We are one community tied together by the internet, this connected web of knowledge sharing. We have a commitment to an inclusive and open internet, where there are many winners, and where ethical approaches to genuine AI research are supported,” she said. “The real solution lies in our deep human connection. It inspires the most amazing acts of generosity and humanity.”
If you value the Internet Archive and our mission to provide “Universal Access to All Knowledge,” please consider making a donation today.
The Internet Archive team, its partners, and enthusiasts recently shared updates on how the organization is empowering research, ensuring preservation of vital materials, and extending access to knowledge to a growing number of grateful users.
The 2023 Library Leaders Forum, held virtually Oct. 4, featured snapshots of the many activities the organization is supporting on a global scale. Together, the efforts are making a difference in the lives of students, scholars, educators, entrepreneurs, journalists, public servants — anyone who needs trusted information without barriers.
“It’s important for us to recognize that the Internet Archive is a library. It’s a research library in the role that it plays, in the way that it works,” said Brewster Kahle, founder of the Internet Archive.
With the rise of misinformation and new artificial intelligence technologies, reliable, digital information is needed more than ever, he said.
“This is going to be a challenging time in the United States when all of our institutions — the press, the election system, and libraries — are going to be tested,” Kahle said. “It’s time for us to make sure we stand up tall and be as useful to people in the United States and to people around the world who are having some of the same issues.”
To provide citizens everywhere with free access to government data, documents and records, the Archive launched Democracy’s Library last year. The collection now has 889,000 government publications, with many more items donated but yet to be organized, said the Archive’s Jamie Joyce at the forum. The goal is to digitize municipal, provincial, state and federal documents, along with datasets, research, records, publications, and microfiche so they are searchable and accessible.
The Archive is taking a leadership role in harnessing the power of AI to make its information easier for users to find, Kahle added. It is also preserving state television newscasts from Russia and Iran, along with translations, to allow researchers to track trends in coverage.
Collections as data
Thomas Padilla, deputy director of data archiving and data services at the Internet Archive, reported on a project that examines how libraries can support responsible use of collections as data. Working in partnership with Iowa State University, University of Pennsylvania, and James Madison University, it is a community development effort for libraries, archives, museums and galleries to help researchers use new technology (text and data mining, machine learning) while also mitigating potential harm that can be generated by the process.
Through the effort, the Archive gave grants to 12 research libraries and cultural heritage organizations to explore questions around collections as data, Padilla said. As it became apparent that others around the world were grappling with similar issues, the project convened representatives from 60 organizations representing 18 countries earlier this year in Canada. The group agreed on core principles (The Vancouver Statement on Collections-As-Data) to use when providing machine actionable collection data to researchers. Next, the project expects to issue a roadmap for the broader international community in this space, Padilla said.
Helping libraries help publishers
The recent forum also featured digitization managers from the Internet Archive who are collaborating with partner libraries, including Tim Bigelow, Sophie Flynn-Piercy, Elizabeth MacLead, Andrea Mills and Jeff Sharpe. These librarians are at institutions big and small, from the University of North Carolina at Chapel Hill to the Wellcome Trust in London, working with teams of professionally trained technicians to digitize collections.
One of those partnerships is taking an exciting new direction. The Boston Public Library’s partnership with the Archive began in 2007. Over the years, the team has completed digitization of the John Adams presidential library, Shakespeare’s First Folio (his 36 plays published in 1623), more than 17,000 government documents and the Houghton Mifflin trade book archival collection, according to Bigelow, the Northeast Regional digitization manager for the Archive.
The Houghton Mifflin collection includes 20,000 titles dating back to 1832, including some of the best known works in American fiction and children’s literature, such as books by Ralph Waldo Emerson and the Curious George series. The publisher gave BPL the entire physical collection for preservation (90% of which were out of print) and continues to add new titles as they are published. With the formal agreement of Houghton Mifflin, BPL and the Archive have been working together since 2017 to digitize every book—those in the public domain are completely readable and downloadable; those still in copyright are available through controlled digital lending (CDL).
As in Boston, many libraries have embraced CDL. However, commercial publishers have challenged the practice.
Lila Bailey, senior policy counsel for the Archive, provided an update at the forum on the Hachette v. Internet Archive lawsuit, in which the court ruled in favor of the publishers in limiting the use of CDL. The Archive filed an appeal in September. Bailey encouraged supporters to consider filing amicus briefs when the Archive’s case is expected to be reviewed by the appellate court.
For the Internet Archive—and libraries everywhere—to continue their work, the Archive is advocating for a legal infrastructure that ensures libraries can collect digital materials, preserve those materials in different formats, lend digital materials, and cooperate with other libraries.
“In our evolving digital society, will new technologies serve the public good, or only corporate interests?” Bailey asked in her remarks at the forum. “Libraries are on the front line of the fight to decide this question in favor of the public good. In order to maintain our age-old role as guardians of knowledge, we need our rights to own, lend and preserve books, as we all live more and more of our lives online.”
When the tech platforms promised a future of “connection,” they were lying. They said their “walled gardens” would keep us safe, but those were prison walls.
The platforms locked us into their systems and made us easy pickings, ripe for extraction. Twitter, Facebook and other Big Tech platforms are hard to leave by design. They hold hostage the people we love, the communities that matter to us, the audiences and customers we rely on. The impossibility of staying connected to these people after you delete your account has nothing to do with technological limitations: it’s a business strategy in service to commodifying your personal life and relationships.
We can – we must – dismantle the tech platforms. In The Internet Con, Cory Doctorow explains how to seize the means of computation, by forcing Silicon Valley to do the thing it fears most: interoperate. Interoperability will tear down the walls between technologies, allowing users to leave platforms, remix their media, and reconfigure their devices without corporate permission.
Interoperability is the only route to the rapid and enduring annihilation of the platforms. The Internet Con is the disassembly manual we need to take back our internet.
ABOUT THE AUTHOR
CORY DOCTOROW is a science fiction author, activist and journalist. He is the author of many books, most recently RADICALIZED and WALKAWAY, science fiction for adults; HOW TO DESTROY SURVEILLANCE CAPITALISM, nonfiction about monopoly and conspiracy; IN REAL LIFE, a graphic novel; and the picture book POESY THE MONSTER SLAYER. His latest book is ATTACK SURFACE, a standalone adult sequel to LITTLE BROTHER. In 2020, he was inducted into the Canadian Science Fiction and Fantasy Hall of Fame. He works for the Electronic Frontier Foundation, is an MIT Media Lab Research Affiliate, a Visiting Professor of Computer Science at Open University and a Visiting Professor of Practice at the University of North Carolina’s School of Library and Information Science, and co-founded the UK Open Rights Group.
Book Talk: The Internet Con by Cory Doctorow
Tuesday, October 31 @ 10am PT / 1pm ET
Register now for the virtual discussion!
For Meghan Kwast, having access to the Internet Archive helps her library staff at California Lutheran University operate more efficiently to better serve faculty and students.
Budgets and staffing limitations have forced Kwast to come up with some creative strategies to meet the needs of users. This includes tapping into the digital resources available through the Internet Archive—especially when there are requests for items not in the university stacks.
“While Interlibrary Loan is available for most scholars, delivery times can vary from a few days to several weeks,” said Kwast, head of collection management services at Cal Lutheran in Thousand Oaks, California. “For researchers and scholars, this is time lost. Internet Archive saves them from these delays.”
The broader, virtual collection often includes niche subject titles that the Cal Lutheran library doesn’t carry. Also, providing digital, rather than print, materials reduces ILL shipping costs and avoids problems with physical deliveries due to weather, Kwast added.
‘A USEFUL TOOL’
For librarians like Kwast, the collections at the Internet Archive are helpful beyond connecting patrons with research materials. The Archive has been a useful tool in a campus project to evaluate the diversity of the Cal Lutheran print monograph collection.
Cal Lutheran enrolls about 3,200 undergraduate and graduate students in its College of Arts and Sciences, Bachelor’s Degree for Professionals, Graduate School of Education, School of Management, Graduate School of Psychology, and Pacific Lutheran Theological Seminary programs. The university operates across southern California, with its main campus in Thousand Oaks and satellite centers in Oxnard, Santa Maria and Westlake Village. The campus demographics have changed since the university was founded in 1959: students now come from 59 countries, and the university is designated as a Hispanic-Serving Institution.
Kwast said she wanted to be intentional about ensuring the library collection reflects the current student population. Last year, the library embarked on an audit of authors represented in its collection. As Kwast’s team began to evaluate the authors, they relied on the Archive’s search engine to find books digitally, rather than having to physically pull them off the shelves.
“Internet Archive makes that process faster and more efficient for us,” Kwast said. “Having these materials digitized makes this project achievable. It makes it possible for us to serve today’s students.”
It was evident early in the assessment that most titles were written by white, cisgender men. Now, about halfway through the review, Kwast said the library discovered just 2 percent of authors were Hispanic/Latino, yet about 40 percent of the Cal Lutheran population identifies as Hispanic/Latino.
“Some students from these communities are still trying to see themselves in higher education or in the field that they’re pursuing. The voices in our collection should reflect the voices on our campus, helping students see themselves in the research process and the sources they use,” Kwast said. “Where our collections are now is not reflective of where our community is.”
Based on what was discovered in the author assessment, this fiscal year Cal Lutheran created a new item in its library budget specifically for purchasing books written by authors who are diverse by race, ethnicity, gender, sexuality, and ability. The library also started a diverse authors table to highlight some of these works, Kwast noted.
EQUITABLE POINTS OF ACCESS
The Internet Archive’s vast collection of digital resources is more needed than ever, Kwast added. During the pandemic, when access to library buildings was limited, the Archive helped Cal Lutheran keep its library users connected. “Electronic resources and digital access to information are critical for public safety,” Kwast said.
Today, public libraries still have barriers to accessing materials, Kwast noted. Many of them require patrons to come on-site after registering for a card to verify identification and residence. For those without a home or those who work during normal business hours, this is an insurmountable challenge. Internet Archive removes some of those obstacles by providing 24-7 remote access from any location.
Documents that should be publicly available, such as those produced by Congress and public universities, are instead hidden behind paywalls and layers of complication, Kwast said. Internet Archive helps provide equitable points of access to information, which is a necessity today regardless of a user’s income or ability, she added.
“As librarians and information professionals, we are dealing with an information landscape that a lot of folks take for granted,” Kwast said, as digital collections are constantly changing with licensing limitations. “Just because [access] is not a problem for you as an individual does not mean it isn’t a very real issue that other folks face in their daily lives.”
Today, the Internet Archive has submitted its appeal [PDF] in Hachette v. Internet Archive. As we stated when the decision was handed down in March, we believe the lower court made errors in facts and law, so we are fighting on in the face of great challenges. We know this won’t be easy, but it’s a necessary fight if we want library collections to survive in the digital age.
Statement from Brewster Kahle, founder and digital librarian of the Internet Archive: “Libraries are under attack like never before. The core values and library functions of preservation and access, equal opportunity, and universal education are being threatened by book bans, budget cuts, onerous licensing schemes, and now by this harmful lawsuit. We are counting on the appellate judges to support libraries and our longstanding and widespread library practices in the digital age. Now is the time to stand up for libraries.”
We will share more information about the appeal as it progresses.
To support our ongoing efforts, please donate as we continue this fight!
When Graeme Currie was working at a university, he went to the campus library for research and often lingered in the stacks just to enjoy the collection.
Now, as a freelance translator and editor operating remotely from a small town near Hamburg, Germany, Currie doesn’t have that same access. Without an institutional affiliation, he relies on materials in the Internet Archive for his work.
“It’s been vital for me because, at times, it’s the only way I can find what I need,” says Currie, 51, who is originally from Scotland. “For freelancers who are working from home without a library nearby and using obscure sources and out-of-print books, there’s nothing to replace the Internet Archive.”
Currie first heard about the Wayback Machine in the early 2000s as a means to check changes in websites. Then, he discovered other services that the Internet Archive provides including its audio and book library.
As he edits and translates academic books from German to English, Currie says he often has to check book citations—looking up page numbers and verifying passages. The virtual collection has been helpful as he researches a range of topics in the arts, social sciences and the humanities. Currie says he’s borrowed titles related to philosophy, criminality and global urban history, including the early history of tourism in Sicily.
Not only are many of the books hard to find, but Currie says logistically, they are difficult to obtain. Without the Internet Archive, Currie says he would have to wait weeks for interlibrary loans or try to contact the book authors, who are often unavailable.
“I simply could not do my job without access to a virtual library,” says Currie, who has been freelancing for about five years. “The Internet Archive is like having a university library on your desktop.”
Join experts from the library, copyright and information policy fields for a series of conversations exploring some of the most pressing issues facing libraries today: digital ownership and the future of library collections, the emergence of artificial intelligence, and the enduring value of research libraries in the digital age.
October 4 @ 10am PT – 11am PT Online via Zoom – Register now
In our virtual session, you’ll hear from Internet Archive staff about our emerging library services and updates on existing efforts, including from our partners. How do libraries empower research in the 21st century? Join in our discussion!
October 12: In-Person
October 12 @ 8:30am – 4pm PT Internet Archive Headquarters @ 300 Funston, San Francisco
At our in-person session, we’ll gather together with the builders & dreamers to envision an equitable future for digital lending. We’ll reserve the afternoon for workshops and unconference breakouts so that you can choose your own conversation, or lead one yourself. Capacity will be capped at 60 attendees. Interested in attending?
Our library is still strong, growing, and serving millions of patrons. But the publishers’ attack on basic library practices continues.
Last Friday, the Southern District of New York court issued its final order in Hachette v. Internet Archive, thus bringing the lower court proceedings to a close. We disagree with the court’s decision and intend to appeal. In the meantime, however, we will abide by the court’s injunction.
The lawsuit only concerns our book lending program. The injunction clarifies that the Publisher Plaintiffs will notify us of their commercially available books, and the Internet Archive will expeditiously remove them from lending. Additionally, Judge Koeltl signed an order in favor of the Internet Archive, agreeing with our request that the injunction should only cover books available in electronic format, and not the publishers’ full catalog of books in print. Separately, we have come to an agreement with the Association of American Publishers (AAP), the trade organization that coordinated the original lawsuit with the four publishers, that the AAP will not support further legal action against the Internet Archive for controlled digital lending if we follow the same takedown procedures for any AAP-member publisher.
So what is the impact of these final orders on our library? Broadly, this injunction will result in a significant loss of access to valuable knowledge for the public. It means that people who are not part of an elite institution or who do not live near a well-funded public library will lose access to books they cannot read otherwise. It is a sad day for the Internet Archive, our patrons, and for all libraries.
Because this case was limited to our book lending program, the injunction does not significantly impact our other library services. The Internet Archive may still digitize books for preservation purposes, and may still provide access to our digital collections in a number of ways, including through interlibrary loan and by making accessible formats available to people with qualified print disabilities. We may continue to display “short portions” of books as is consistent with fair use—for example, Wikipedia references (as shown in the image above). The injunction does not affect lending of out-of-print books. And of course, the Internet Archive will still make millions of public domain texts available to the public without restriction.
Regarding the monetary payment, we can say that “AAP’s significant attorney’s fees and costs incurred in the Action since 2020 have been substantially compensated by the Monetary Judgement Payment.”
Thanks to your continued support, our library is still strong, growing, and serving millions of patrons.
Libraries are going to have to fight to be able to buy, preserve, and lend digital books outside of the confines of temporary licensed access. We deeply appreciate your support as we continue this fight!