Internet Archive to Honor Supervisor Connie Chan with 2023 Hero Award

Today we are announcing that Connie Chan, Supervisor of San Francisco’s District 1, will receive the 2023 Internet Archive Hero Award. Supervisor Chan will be presented with the award on stage at next week’s evening celebration at the Internet Archive.

The Internet Archive Hero Award is an annual award that recognizes those who have exhibited leadership in making information available for digital learners all over the world. Previous recipients have included public domain advocate Carl Malamud, librarians Kanta Kapoor and Lisa Radha Vohra, copyright expert Michelle Wu, the Biodiversity Heritage Library, and the Grateful Dead.

In April, Supervisor Chan, whose district includes the Internet Archive, authored a resolution backing the Internet Archive and the digital rights of all libraries, which the San Francisco Board of Supervisors passed unanimously. “At a time when we are seeing an increase in censorship and book bans across the country, we must move to preserve free access to information,” said Supervisor Chan, about the resolution. “I am proud to stand with the Internet Archive, our Richmond District neighbor, and digital libraries throughout the United States.”

Supervisor Connie Chan with Internet Archive’s Brewster Kahle and digital library supporters rally for the digital rights of libraries on the steps of San Francisco City Hall, April 19, 2023.

Many thanks to Supervisor Chan for being a strong advocate for libraries, and for making San Francisco the first municipality to codify the importance of digital libraries and controlled digital lending in a resolution. For this fearless act of standing with libraries, the Internet Archive is proud to honor Supervisor Connie Chan with the 2023 Internet Archive Hero Award.

Join us next week on Thursday, October 12 at 7pm PT, as Supervisor Chan accepts the award live on stage during our evening celebration. Tickets are available for in-person attendance or the livestream.

Leveraging Technology to Scale Library Research Support: ARCH, AI, and the Humanities

Kevin Hegg is Head of Digital Projects at James Madison University Libraries (JMU). Kevin has held many technology positions within JMU Libraries. His experience spans a wide variety of technology work, from managing computer labs and server hardware to developing a large open-source software initiative. We are thankful to Kevin for taking time to talk with us about his experience with ARCH (Archives Research Compute Hub), AI, and supporting research at JMU.

Thomas Padilla is Deputy Director, Archiving and Data Services. 

Thomas: Thank you for agreeing to talk more about your experience with ARCH, AI, and supporting research. I find that folks are often curious about what set of interests and experiences prepares someone to work in these areas. Can you tell us a bit about yourself and how you began doing this kind of work?

Kevin: Over the span of 27 years, I have held several technology roles within James Madison University (JMU) Libraries. My experience ranges from managing computer labs and server hardware to developing a large open-source software initiative adopted by numerous universities across the world. Today I manage a small team that supports faculty and students as they design, implement, and evaluate digital projects that enhance, transform, and promote scholarship, teaching, and learning. I also co-manage Histories Along the Blue Ridge, which hosts over 50,000 digitized legal documents from courthouses along Virginia’s Blue Ridge mountains.

Thomas: I gather that your initial interest in using ARCH was to see what potential it afforded for working with James Madison University’s Mapping the Black Digital and Public Humanities project. Can you introduce the project to our readers?

Kevin: The Mapping the Black Digital and Public Humanities project began at JMU in Fall 2022. The project draws inspiration from established resources such as the Colored Conventions Project and the Reviews in Digital Humanities journal. It employs Airtable for data collection and Tableau for data visualization. The website features a map that not only geographically locates over 440 Black digital and public humanities projects across the United States but also offers detailed information about each initiative. The project is a collaborative endeavor involving JMU graduate students and faculty, in close alliance with JMU Libraries. Over the past year, this interdisciplinary team has dedicated hundreds of hours to data collection, data visualization, and website development.

Mapping the Black Digital and Public Humanities, project and organization type distribution

The project has achieved significant milestones. In Fall 2022, Mollie Godfrey and Seán McCarthy, the project leaders, authored “Race, Space, and Celebrating Simms: Mapping Strategies for Black Feminist Biographical Recovery,” highlighting the value of such mapping projects. At the same time, graduate student Iliana Cosme-Brooks undertook a monumental data collection effort. During the winter months, Mollie and Seán spearheaded an effort to refine the categories and terms used in the project through comprehensive research and user testing. By Spring 2023, the project was integrated into the academic curriculum, where a class of graduate students actively contributed to its inaugural phase. Funding was obtained to maintain and update the database and map during the summer.

Looking ahead, the project team plans to present their work at academic conferences and aims to diversify the team’s expertise further. The overarching objective is to enhance the visibility and interconnectedness of Black digital and public humanities projects, while also welcoming external contributions for the initiative’s continual refinement and expansion.

Thomas: It sounds like the project adopts a holistic approach to experimenting with and integrating the functionality of a wide range of tools and methods (e.g., mapping, data visualization). How do you see tools like ARCH fitting into the project and research services more broadly? What tools and methods have you used in combination with ARCH?

Kevin: ARCH offers faculty and students an invaluable resource for digital scholarship by providing expansive, high-quality datasets. These datasets enable more sophisticated data analytics than typically encountered in undergraduate pedagogy, revealing patterns and trends that would otherwise remain obscured. Despite the increasing importance of digital humanities, a significant portion of faculty and students lack advanced coding skills. The advent of AI-assisted coding platforms like ChatGPT and GitHub Copilot has democratized access to programming languages such as Python and JavaScript, facilitating their integration into academic research.

For my work, I employed ChatGPT and Copilot to further process ARCH datasets derived from a curated sample of 20 websites focused on Black digital and public humanities. Using PyCharm (an IDE freely available for educational purposes) and the Copilot extension, I improved my coding efficiency tenfold.
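The interview does not show the processing code itself. As a rough illustration of this kind of post-processing, the Python sketch below totals link counts between sites in a domain-graph CSV. The column names (`crawl_date`, `source`, `target`, `count`) are an assumption about the export format, and the sample rows and domains are made up.

```python
import csv
import io
from collections import Counter

# Made-up sample rows. The column names are an assumption about the
# domain-graph export; check the header row of your own dataset.
SAMPLE_CSV = """crawl_date,source,target,count
20230101,site-a.org,site-b.org,3
20230201,site-a.org,site-b.org,2
20230101,site-b.org,site-c.org,1
20230101,site-a.org,site-a.org,7
"""

def aggregate_edges(csv_text, drop_self_links=True):
    """Sum link counts for each (source, target) pair across all crawl dates."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        source, target = row["source"], row["target"]
        if drop_self_links and source == target:
            continue  # links within a site say little about links between sites
        totals[(source, target)] += int(row["count"])
    return totals

edges = aggregate_edges(SAMPLE_CSV)
for (source, target), weight in edges.most_common():
    print(f"{source} -> {target}: {weight}")
```

For a real export, the same function works on the text of a file opened with `open(path, newline="", encoding="utf-8")`.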

Next, I leveraged ChatGPT’s Advanced Data Analysis plugin to deconstruct visualizations from Stanford’s Palladio platform, a tool commonly used for exploratory data visualization that lacks a way to share the resulting visualizations. With the aid of ChatGPT, I developed JavaScript-based web applications that faithfully replicate Palladio’s graph and gallery visualizations. Specifically, I instructed ChatGPT to use the D3 JavaScript library to ingest my modified ARCH datasets into client-side web applications. The final products, including HTML, JavaScript, and CSV files, were made publicly accessible via GitHub Pages (see my graph and gallery on GitHub Pages).

Black Digital and Public Humanities websites, graph visualization
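D3 can read CSV files directly, but many D3 force-directed graph examples expect a `nodes`/`links` structure. A minimal sketch of reshaping aggregated edge weights into that structure follows; the function name `to_d3_graph`, the `value` field, and the sample edges are hypothetical, and this is not a reconstruction of Kevin's actual applications.

```python
import json

def to_d3_graph(edges):
    """Reshape {(source, target): weight} into a D3-style nodes/links dict."""
    node_ids = sorted({name for pair in edges for name in pair})
    return {
        "nodes": [{"id": name} for name in node_ids],
        "links": [
            {"source": source, "target": target, "value": weight}
            for (source, target), weight in sorted(edges.items())
        ],
    }

# Hypothetical weights, such as the totals from an aggregation step.
edges = {("site-a.org", "site-b.org"): 5, ("site-b.org", "site-c.org"): 1}
graph = to_d3_graph(edges)
print(json.dumps(graph, indent=2))
```

Saved as a JSON file next to the HTML page, output like this could be loaded with `d3.json()` and handed to a force simulation, using `d3.forceLink().id(d => d.id)` so that links find their nodes by `id`.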

In summary, the integration of Python and AI-assisted coding tools has not only enhanced my use of ARCH datasets but also enabled the creation of client-side web applications for data visualization.

Thomas: Beyond pairing ChatGPT with ARCH, what additional uses are you anticipating for AI-driven tools in your work?

Kevin: AI-driven tools have already radically transformed my daily work. I am using AI to reduce or even eliminate repetitive, mindless tasks that take tens or hundreds of hours. For example, as part of the Mapping project, ChatGPT+ helped me transform an Airtable table with almost 500 rows and two dozen columns into a series of 500 blog posts on a WordPress site. ChatGPT+ understands the structure of a WordPress export file. After a couple of hours of iterating through my design requirements with ChatGPT, I was able to import 500 blog posts into a WordPress website. Without this intervention, the task would have required over a hundred hours of tedious copying and pasting. Additionally, we have been using AI-enabled platforms like Otter and Descript to transcribe oral interviews.
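WordPress imports posts from WXR, an RSS-based XML export format. The sketch below, assuming hypothetical Airtable column names (`Project Name`, `Description`, `Location`), turns a list of rows into a minimal WXR document with one published post per row. It uses only the Python standard library and is a starting point, not the workflow ChatGPT produced.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by WordPress WXR 1.2 export files.
WP = "http://wordpress.org/export/1.2/"
CONTENT = "http://purl.org/rss/1.0/modules/content/"
ET.register_namespace("wp", WP)
ET.register_namespace("content", CONTENT)

def rows_to_wxr(rows):
    """Return a minimal WXR document with one published post per row."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    # The WordPress importer looks for a WXR version number.
    ET.SubElement(channel, f"{{{WP}}}wxr_version").text = "1.2"
    for row in rows:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = row["Project Name"]
        body = f"<p>{row['Description']}</p>\n<p>Location: {row['Location']}</p>"
        ET.SubElement(item, f"{{{CONTENT}}}encoded").text = body
        ET.SubElement(item, f"{{{WP}}}post_type").text = "post"
        ET.SubElement(item, f"{{{WP}}}status").text = "publish"
    return ET.tostring(rss, encoding="unicode")

# Hypothetical rows standing in for an Airtable CSV export.
rows = [
    {"Project Name": "Project One", "Description": "A sample entry.", "Location": "Virginia"},
    {"Project Name": "Project Two", "Description": "Another sample.", "Location": "Ohio"},
]
wxr = rows_to_wxr(rows)
```

ElementTree escapes the HTML in each post body instead of wrapping it in CDATA as WordPress's own exports do. Both are valid XML, but it is worth testing an import on a staging site before loading hundreds of posts.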

I foresee AI-driven tools playing an increasingly pivotal role in many facets of my work. For instance, natural language processing could automate the categorization and summarization of large text-based datasets, making archival research more efficient and our analyses richer. AI can also be used to identify entities in large archival datasets. Archives hold a treasure trove of artifacts waiting to be described and discovered. AI offers tools that will supercharge our construction of finding aids and item-level metadata.  

Lastly, AI could facilitate more dynamic and interactive data visualizations, like the ones I published on GitHub Pages. These will offer users a more engaging experience when interacting with our research findings. Overall, the potential of AI is vast, and I’m excited to integrate more AI-driven tools into JMU’s classrooms and research ecosystem.

Thomas: Thanks for taking the time, Kevin. To close out, whose work would you like people to know more about?

Kevin: Engaging in Digital Humanities (DH) within the academic library setting is a distinct privilege, one that requires a collaborative ethos. I am fortunate to be a member of an exceptional team at JMU Libraries, a collective too expansive to fully acknowledge here. AI has introduced transformative tools that border on magic. However, loosely paraphrasing Immanuel Kant, it’s crucial to remember that technology devoid of content is empty. I will use this opportunity to spotlight the contributions of three JMU faculty whose work celebrates our local community and furthers social justice.

Mollie Godfrey (Department of English) and Seán McCarthy (Writing, Rhetoric, and Technical Communication) are the visionaries behind two inspiring initiatives: the Mapping Project and the Celebrating Simms Project. The latter serves as a digital, post-custodial archive honoring Lucy F. Simms, an educator born into enslavement in 1856 who impacted three generations of young students in our local community. Both Godfrey and McCarthy have cultivated deep, lasting connections within Harrisonburg’s Black community. Their work strikes a balance between celebration and reparation. Collaborating with them has been as rewarding as it is challenging.

Gianluca De Fazio (Justice Studies) spearheads the Racial Terror: Lynching in Virginia project, illuminating a grim chapter of Virginia’s past. His relentless dedication led to the installation of a historical marker commemorating the tragic lynching of Charlotte Harris. De Fazio, along with colleagues, has also developed nine lesson plans based on this research, which are now integrated into high school curricula. My collaboration with him was a catalyst for pursuing a master’s degree in American History.

Racial Terror: Lynching in Virginia

Both the Celebrating Simms and Racial Terror projects are highlighted in the Mapping the Black Digital and Public Humanities initiative. The privilege of contributing to such impactful projects alongside such dedicated individuals has rendered my extensive tenure at JMU both meaningful and, I hope, enduring.

Book Talk: The Internet Con by Cory Doctorow

Join us for a virtual book talk with author Cory Doctorow about THE INTERNET CON, the disassembly manual we need to take back our internet.

REGISTER NOW

When the tech platforms promised a future of “connection,” they were lying. They said their “walled gardens” would keep us safe, but those were prison walls.

The platforms locked us into their systems and made us easy pickings, ripe for extraction. Twitter, Facebook and other Big Tech platforms are hard to leave by design. They hold hostage the people we love, the communities that matter to us, the audiences and customers we rely on. The impossibility of staying connected to these people after you delete your account has nothing to do with technological limitations: it’s a business strategy in service to commodifying your personal life and relationships.

We can – we must – dismantle the tech platforms. In The Internet Con, Cory Doctorow explains how to seize the means of computation, by forcing Silicon Valley to do the thing it fears most: interoperate. Interoperability will tear down the walls between technologies, allowing users to leave platforms, remix their media, and reconfigure their devices without corporate permission.

Interoperability is the only route to the rapid and enduring annihilation of the platforms. The Internet Con is the disassembly manual we need to take back our internet.


ABOUT THE AUTHOR
CORY DOCTOROW is a science fiction author, activist and journalist. He is the author of many books, most recently RADICALIZED and WALKAWAY, science fiction for adults; HOW TO DESTROY SURVEILLANCE CAPITALISM, nonfiction about monopoly and conspiracy; IN REAL LIFE, a graphic novel; and the picture book POESY THE MONSTER SLAYER. His latest book is ATTACK SURFACE, a standalone adult sequel to LITTLE BROTHER. In 2020, he was inducted into the Canadian Science Fiction and Fantasy Hall of Fame. He works for the Electronic Frontier Foundation, is an MIT Media Lab Research Affiliate, is a Visiting Professor of Computer Science at Open University, a Visiting Professor of Practice at the University of North Carolina’s School of Library and Information Science and co-founded the UK Open Rights Group.

Book Talk: The Internet Con by Cory Doctorow
Tuesday, October 31 @ 10am PT / 1pm ET
Register now for the virtual discussion!

Academic Librarian Leans on Internet Archive for Access and Analysis

For Meghan Kwast, having access to the Internet Archive helps her library staff at California Lutheran University operate more efficiently to better serve faculty and students.  

Meghan Kwast, head of collection management services, California Lutheran University

Budgets and staffing limitations have forced Kwast to come up with some creative strategies to meet the needs of users. This includes tapping into the digital resources available through the Internet Archive—especially when there are requests for items not in the university stacks.

“While Interlibrary Loan is available for most scholars, delivery times can vary from a few days to several weeks,” said Kwast, head of collection management services at Cal Lutheran in Thousand Oaks, California. “For researchers and scholars, this is time lost. Internet Archive saves them from these delays.”

The broader, virtual collection often includes niche-subject titles that the Cal Lutheran library doesn’t carry. Also, providing digital, rather than print materials, reduces ILL shipping costs and avoids problems with physical deliveries due to weather, Kwast added.

‘A USEFUL TOOL’

For librarians like Kwast, the collections at the Internet Archive are helpful beyond connecting patrons with research materials. The Archive has been a useful tool in a campus project to evaluate the diversity of the Cal Lutheran print monograph collection.

Cal Lutheran enrolls about 3,200 undergraduate and graduate students in its College of Arts and Sciences, Bachelor’s Degree for Professionals, Graduate School of Education, School of Management, Graduate School of Psychology, and Pacific Lutheran Theological Seminary programs. The university operates across southern California, with its main campus in Thousand Oaks and satellite centers in Oxnard, Santa Maria and Westlake Village. The campus demographics have changed since the university was founded in 1959: students now come from 59 countries, and the university is designated as a Hispanic-Serving Institution.

Kwast said she wanted to be intentional about ensuring the library collection reflects the current student population. Last year, the library embarked on an audit of authors represented in its collection. As Kwast’s team began to evaluate the authors, they relied on the Archive’s search engine to find books digitally, rather than having to physically pull them off the shelves.

“Internet Archive makes that process faster and more efficient for us,” Kwast said. “Having these materials digitized makes this project achievable. It makes it possible for us to serve today’s students.”


It was evident early in the assessment that most titles were written by white, cisgender men. Now, about halfway through the review, Kwast said the library discovered just 2 percent of authors were Hispanic/Latino, yet about 40 percent of the Cal Lutheran population identifies as Hispanic/Latino.

“Some students from these communities are still trying to see themselves in higher education or in the field that they’re pursuing. The voices in our collection should reflect the voices on our campus, helping students see themselves in the research process and the sources they use,” Kwast said. “Where our collections are now is not reflective of where our community is.”

Based on what was discovered in the author assessment, this fiscal year Cal Lutheran created a new line item in its library budget specifically for purchasing books written by authors who are diverse by race, ethnicity, gender, sexuality, and ability. The library also started a diverse authors table to highlight some of these works, Kwast noted.

EQUITABLE POINTS OF ACCESS

The Internet Archive’s vast collection of digital resources is more needed than ever, Kwast added. During the pandemic, with limited access to their buildings, the Archive helped Cal Lutheran keep their library users connected. “Electronic resources and digital access to information are critical for public safety,” Kwast said.

Today, public libraries still have barriers to accessing materials, Kwast noted. Many of them require patrons to come on-site after registering for a card to verify identification and residence. For those without a home or those who work during normal business hours, this is an insurmountable challenge. Internet Archive removes some of those obstacles by providing 24-7 remote access from any location.

Documents that should be publicly available, such as those produced by Congress and public universities, are instead hidden behind paywalls and layers of complication, Kwast said. The Internet Archive helps provide equitable points of access to information, which she called a necessity today, regardless of a user’s income or ability.

“As librarians and information professionals, we are dealing with an information landscape that a lot of folks take for granted,” Kwast said, as digital collections are constantly changing with licensing limitations. “Just because [access] is not a problem for you as an individual does not mean it isn’t a very real issue that other folks face in their daily lives.”

Brewster Goes to Washington – Congressional Hearing on the Copyright Office Modernization Committee

A good day in Washington. I have spent two years on the Copyright Office Modernization Committee, helping advise the Copyright Office on its new registration and recordation process, and now a Republican and a Democrat from the House of Representatives have held a hearing to ask questions of committee members. It was such a refreshing scene because it was bipartisan, they knew the issues, and they were spending time finding out what we suggested.

This all matters because the Copyright Office is moving to digital filings, which is an improvement, and because this could make way for efficient submission of digital files. That would be a major way for the Library of Congress to get copies of books it would own, preserve, and make somewhat accessible.

Another attendee said they had gone to congressional meetings for 30 years and this one had the most engagement of any of them. A good day in Washington, indeed.

Internet Archive is a Digital Oasis for Book and Music Lovers on Remote Vermont Island

Image: islelamotte.us

Living in the middle of Lake Champlain in Vermont, Eleanor Martinez says she enjoys the beautiful scenery all around, especially the fall foliage. It’s been an idyllic place to retire, but there is one thing she misses: a public library.

Martinez and her husband, Sid, live on Isle La Motte, which is 7 miles long and 2 miles wide, is accessible by one bridge, and has a population of 400. There is a library on the island, but it is private and open by appointment only. The public libraries in nearby towns have limited collections.

“The Internet Archive has been a lifesaver,” says Martinez, who discovered the online collection about two years ago. She’s a regular user of the virtual library, checking out books and music on her laptop in the comfort of her rural home.

The wooded, nine-acre property was a draw for the retirees, who relocated in 2018, but it is remote. In the winter, it can sometimes take more than a week for a snowplow to reach their gravel road. Martinez, 66, lived most of her life in more urban areas in California and Minnesota where she enjoyed large, metropolitan public libraries nearby. The Internet Archive has provided access to materials she would not otherwise be able to enjoy in her small town.

Martinez has tapped into the Internet Archive to check out books, from “The Modern Temper” by Joseph Wood Krutch to “The Theory of the Leisure Class” by Thorstein Veblen. She enjoys vintage cookbooks and books on gardening, knitting and poetry.

Martinez found Down Beat magazines dating back to the 1930s about the jazz and blues scene. She’s also discovered music not available elsewhere on vinyl or CD.

“I was able to check out 33-1/3 records and 78s, too,” Martinez said. “This is a boon to those of us who don’t have access to large collections of records, and for those of us who are low-income and living on a fixed income.”

One of her favorite music items is “In a Clock Store,” a novelty recording from 1907 that includes sounds from a clock in the background. “I’m listening to something that is from a time when my grandfather would have been a teenager,” she said. “It was a different world.”

Another copy of that 78rpm recording shines a light on the importance of digitizing and preserving recordings on the obsolete medium—notes made by the audio engineer at the time of digitization indicate that the second side of this record wasn’t able to be preserved “due to physical condition of disc.”

After a pause, Martinez added a final thought: “The Internet Archive has just about everything I’ve been looking for—even things that are pretty obscure. It’s amazing.”

A Quarter In, A Quarter-Million Out: 10 Years of Emulation at Internet Archive

Ten years ago, the Internet Archive made an announcement: it was possible for anyone with a reasonably powerful computer running a modern browser to run emulated software, just as it ran when it was fresh and new, with a single click. Now, a decade later, we have surpassed 250,000 pieces of software running at the Archive, and it might be a great time to reflect on how different the landscape has become since then.

Anyone can come up with an idea, and the idea of taking the then-quite-mature JavaScript language, built into all major browsers, and having it run complicated programs was not new.

With the rise of a cross-compiler named Emscripten, the idea of taking rather complicated programs written in other languages and putting them into JavaScript was kind of new.

That all being the case, the idea of taking a by-then 20-year-old super-emulator called MAME, using Emscripten to cross-compile it into JavaScript, and then running the resulting code in the browser at Internet Archive to make computers and consoles run, was very new.

It was also, objectively, madness.

Well over a thousand hours of work went into the project from a very wide range of volunteers who poured galactic amounts of time into making it a reality. Along the way, changes were made to Emscripten, to the Firefox, Internet Explorer, and Chrome browsers, to MAME, and to the Internet Archive’s codebase to accommodate this dream.

It was announced in the Fall of 2013, well over a year after the project started.

Additional announcements came with each expansion of the types of software being emulated, and it became huge news, leading millions of visitors to try it out.

By any measure, a quarter of a million items later, it has been a huge, huge success.

The rest of this blog entry is pretty pictures and beautiful links, but before we move on, it’s once again important to highlight people who provided major contributions, including Justin Kerk, Daniel Brooks, Vitorio Miliano, James Baicoianu, John Vilk, Tracey Jaquith, Jim Nelson, and Hank Bromley. Dozens more developers spent evenings, weekends, and months to make this system happen. Thank you to everyone involved.

The joy of watching a computer boot up in the browser was (and is) a miraculous feeling. And after that feeling, comes a quick comfort with the situation: Of course we can run computers inside our browsers. Of course we can make most anything we want run in these browser-based computers. What’s next?

Within a short time after our 2013 announcement, the archive was running hundreds, then thousands of individual programs, floppy disks and even cassette-based software from computing’s past.

As emulators besides MAME were added, it became necessary to create a versatile, understandable framework for loading them. This framework eventually got a name: THE EMULARITY.
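The Emularity itself is written in JavaScript, and its real interface is not shown here. As an illustration of the general idea of one loader fronting many emulators, here is a hypothetical Python registry that picks a configuration builder by platform; the platform keys, the `EmulatorConfig` fields, and the item fields are all invented for this sketch.

```python
from dataclasses import dataclass, field

@dataclass
class EmulatorConfig:
    """Hypothetical settings needed to boot one emulator core."""
    core: str
    files: list = field(default_factory=list)
    options: dict = field(default_factory=dict)

# Maps a platform name to the function that builds its EmulatorConfig.
LOADERS = {}

def register(platform):
    """Decorator that registers a config builder for one platform."""
    def wrap(builder):
        LOADERS[platform] = builder
        return builder
    return wrap

@register("arcade")
def arcade_config(item):
    return EmulatorConfig(core="mame", files=[item["rom"]],
                          options={"driver": item["driver"]})

@register("msdos")
def dos_config(item):
    return EmulatorConfig(core="dosbox", files=[item["zip"]],
                          options={"autorun": item["exe"]})

def load(item):
    """Pick the builder for the item's platform, failing clearly if none exists."""
    builder = LOADERS.get(item["platform"])
    if builder is None:
        raise ValueError(f"no emulator registered for {item['platform']!r}")
    return builder(item)

config = load({"platform": "arcade", "rom": "game.zip", "driver": "game"})
print(config)
```

A registry like this keeps each emulator's setup isolated, so supporting a new system means registering one more builder rather than editing the loader itself.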

In the decade of the Emularity’s existence, the Archive’s software emulation has expanded into directions nobody could have fully expected to work when the project started.

Here are some highlights:

HyperCard stacks for the Apple Macintosh, products of a critical period in content creation and computer information architecture, have been restored to easy access, with thousands of stacks to try instantly.

Plastic Electronic Handheld Games, once a staple of toys from the 1970s through the 1990s, have been able to live once again as emulations, including the original housing that these simple (and not so simple) machines relied on instead of graphics.

As the uploads veered into the many thousands, it became more and more difficult for new adventurous users to figure out what, if any, software was at the archive to check out. This has led to specialized collections focused on one type of program, like the Computer Chess Club. People can use these collections as gateways to quickly testing the waters of now-decades of computer and software history, seeing the turns and twists of countless lost companies and individuals who squeezed every last bit of wonder and spectacle out of these underpowered boxes.

The Calculator Drawer took things to a new level when entire calculators could be emulated, including their unique looks, accompanied by a “drawer of manuals” to browse through if you had to learn (or re-learn) how to make these machines run.


The Woz-a-Day Collection, in many ways, represents the logical end point of the role that the Internet Archive’s Emularity can play in software history. The project is the work of the software historian 4am, who has spent years maintaining it. By methodically preserving Apple II software from the original floppy disks, incorporating every last bit and track with no modifications, and keeping these programs as faithful as possible to how they were originally offered, 4am has made some of them playable for the first time in decades.

With each new batch of emulated systems and machines has come an ever-larger pool of users, toying with historical software or playing long-forgotten or never-remembered games with a new level of convenience and willingness to try them out.

At this milestone of a decade into this experimental adventure, Internet Archive continues to grow its collection, to test and automate the functioning of both uploaded and self-maintained collections of software, and to provide a vast and necessary service in the preservation of historical software.

And, of course, we all get to enjoy some really great games.

Here’s to what another ten years will bring us!

IMLS National Leadership Grant Supports Expansion of the ARCH Computational Research Platform

In June, we announced the official launch of Archives Research Compute Hub (ARCH), our platform for supporting computational research with digital collections. The Archiving & Data Services group at IA has long provided computational research services via collaborations, dataset services, product features, and other partnerships and software development. In 2020, in partnership with our close collaborators at the Archives Unleashed project, and with funding from the Mellon Foundation, we pursued cooperative technical and community work to make text and data mining services available to any institution building, or researcher using, archival web collections. This led to the release of ARCH, with more than 35 libraries and 60 researchers and curators participating in beta testing and early product pilots. Additional work supported expanding the community of scholars doing computational research using contemporary web collections by providing technical and research support to multi-institutional research teams.

We are pleased to announce that ARCH recently received funding from the Institute of Museum and Library Services (IMLS), via its National Leadership Grants program, to support the expansion of ARCH. The project, “Expanding ARCH: Equitable Access to Text and Data Mining Services,” entails two broad areas of work. First, the project will create user-informed workflows and conduct software development that enables a diverse set of partner libraries, archives, and museums to add digital collections of any format (e.g., image collections, text collections) to ARCH for users to study via computational analysis. Working with these partners will help ensure that ARCH can support the needs of organizations of any size that aim to make their digital collections available in new ways. Second, the project will work with librarians and scholars to expand the number and types of data analysis jobs and resulting datasets and data visualizations that can be created using ARCH, including allowing users to build custom research collections that are aggregated from the digital collections of multiple institutions. Expanding the ability for scholars to create aggregated collections and run new data analysis jobs, potentially including artificial intelligence tools, will enable ARCH to significantly increase the type, diversity, scope, and scale of research it supports.

Collaborators on the Expanding ARCH project include a set of institutional partners that will be closely involved in guiding functional requirements, testing designs, and using the newly-built features intended to augment researcher support. Primary institutional partners include University of Denver, University of North Carolina at Chapel Hill, Williams College Museum of Art, and Indianapolis Museum of Art, with additional institutional partners joining in the project’s second year.

Thousands of libraries, archives, museums, and memory organizations work with Internet Archive to build and make openly accessible digitized and born-digital collections. Making these collections available to as many users in as many ways as possible is critical to providing access to knowledge. We are thankful to IMLS for providing the financial support that allows us to expand the ARCH platform to empower new and emerging types of access and research.

Internet Archive + IIIF

Making IIIF Official at the Internet Archive

A joint blog post between the Internet Archive and the IIIF Community

Summary

After eight years of hosting an experimental IIIF service for public benefit, the Internet Archive is taking important steps to make its International Image Interoperability Framework (IIIF) service official. Each year, the Internet Archive receives feedback from friends and partners asking about our long-term plans for supporting IIIF. In response, the Internet Archive is announcing an official IIIF service that aims to increase the resourcing and reliability of the service, upgrade it to the latest version (3.0) of the IIIF specification, and graduate it from the iiif.archivelab.org domain to iiif.archive.org. The upgrade also expands the Internet Archive’s IIIF support beyond images to include audio, movies, and collections, enabling deep zoom on high-resolution images, comparative item analysis, portability across media players, annotation support, and more.

An image visually detailing each step of how a URL for a conceptual IIIF service run by "example.org" may be used to crop, zoom, rotate, and color correct an image and then download the result as a jpeg. Image from https://iiif.io/get-started/how-iiif-works

Background

In 2015, a team of enthusiastic Internet Archive volunteers from a group called Archive Labs implemented an experimental IIIF service to give partners and patrons new ways of using Archive.org images and texts. You can read more about the project’s origins and ambitions in this 2015 announcement blog post. The initial service provided researchers with an easy, standardized way to crop and reference specific regions of archive.org images. (Maybe you can tell whose eyes these are?) Because Internet Archive images and texts are IIIF-compatible, they can be opened in any number of IIIF viewer apps, each offering its own advantages and unique features. For instance, Mirador is a “multi-up” viewer that makes it easy for researchers to view different images side by side and then zoom into or annotate different areas of interest within each image.
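Crop, zoom, and rotate requests like these follow the URL pattern defined by the IIIF Image API 3.0 specification: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}. The short Python sketch below shows how such a URL can be assembled. The base URL and identifier are hypothetical placeholders on an example.org host, not actual Internet Archive paths. Consult the documentation at iiif.archive.org for the real service layout.

```python
from urllib.parse import quote


def iiif_image_url(
    base: str,
    identifier: str,
    region: str = "full",
    size: str = "max",
    rotation: str = "0",
    quality: str = "default",
    fmt: str = "jpg",
) -> str:
    """Build an IIIF Image API 3.0 request URL.

    region:   "full", "square", "x,y,w,h" (pixels) or "pct:x,y,w,h"
    size:     "max", "w,", ",h", "w,h", "!w,h" or "pct:n"
    rotation: degrees clockwise, e.g. "90"; a leading "!" mirrors first
    quality:  "default", "color", "gray" or "bitonal"
    """
    # The identifier must be percent-encoded, including any "/" characters.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def pixel_region(x: int, y: int, w: int, h: int) -> str:
    """Format a pixel-based crop region as "x,y,w,h"."""
    if w <= 0 or h <= 0:
        raise ValueError("region width and height must be positive")
    return f"{x},{y},{w},{h}"


if __name__ == "__main__":
    # Hypothetical server and identifier, for illustration only.
    print(iiif_image_url("https://example.org/iiif", "page-1",
                         region=pixel_region(125, 15, 120, 140),
                         size="90,", rotation="90", quality="gray"))
```

Running the script prints https://example.org/iiif/page-1/125,15,120,140/90,/90/gray.jpg. That request asks for a 120 by 140 pixel crop, scaled to 90 pixels wide, rotated 90 degrees, and rendered in grayscale. Because every parameter lives in the URL, the same crop can be cited, shared, or loaded into any IIIF viewer.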

Since its launch more than seven years ago, the IIIF labs service has received millions of requests from more than 15 universities and GLAM (galleries, libraries, archives, and museums) organizations across the globe, including University of Texas, UCD Digital Library, Havana University, Digital Library of Georgia, BioStor, Emory University, and McGill University. In this time, the broader IIIF ecosystem itself has blossomed to include hundreds of participating institutions. For all its benefits, the labs IIIF service has been considered “unofficial,” hosted on the separate archivelab.org domain, and several partners have voiced interest in the Internet Archive adopting it as an officially supported service. Today, several members of the IIIF community are collaborating with the Internet Archive to make this happen.

Josh Hadro, managing director of the IIIF Consortium (IIIF-C), sees the Internet Archive as filling a critical role “in serving the average Internet user who may not benefit from the same access to or affiliation with infrastructure offered by traditional research institutions.” The IIIF-C promotes interoperability as a core element of IIIF: the ability to streamline access to information and make cultural materials as easy to use and reuse as possible. Because the Internet Archive enables any patron to upload eligible materials, everyone has the opportunity to benefit from IIIF’s capabilities. IIIF-C counts the Internet Archive as a natural ally because of its ongoing support of open collections delivered via open web standards and protocols. With this project, IIIF-C hopes to make the Internet Archive a go-to resource online that facilitates IIIF work for students and scholars unaffiliated with the kinds of institutions that historically have provided IIIF infrastructure. This is an essential step toward a strategic goal of lowering barriers to IIIF usage and adoption worldwide.

In service of this outcome, the Internet Archive has teamed up with a number of IIIF community members to officialize and upgrade the IIIF service in order to make the best use of the new capabilities introduced into the IIIF specifications in recent years.

In the coming weeks, we’ll share more details about the IIIF improvements that will become available to users of the Internet Archive. First, we want to lay out our current plan for the update, including backwards compatibility affordances, to ensure existing consumers have the information they need to successfully migrate from the unofficial to the official IIIF API.

Thanks

Neither the original IIIF labs service the Internet Archive has been running nor the upcoming official IIIF service would have been possible without huge support from volunteers within the IIIF community and Internet Archive staff. A big thank you to the following folks, who are making this effort to bring IIIF into production possible:

Stay tuned for more details on the new functionality soon. If you have questions or would like to help us test the new setup, get in touch with IIIF-C at staff@iiif.io. For more updates, including the September 13 IIIF Consortium community call announcing the Internet Archive’s IIIF service, please visit the IIIF community calendar at https://iiif.io/community/#calendar.

Technical Notes & FAQs for Partners

This technical section is intended for partners who currently rely on the iiif.archivelab.org IIIF API and may want more detail on how these changes could affect them.

What is changing? Previously, partners accessed the Internet Archive’s IIIF labs API from the iiif.archivelab.org domain. As part of the effort to graduate from labs to production, the IIIF API will move to the iiif.archive.org domain. We don’t want to break any of the amazing projects and exhibits that patrons have created using the existing IIIF capabilities on the archivelab.org domain, so we’re migrating the API in phases.

Phasing migration. The first phase will introduce a new and improved, official Internet Archive IIIF 3.0 service on the iiif.archive.org subdomain. The unofficial, legacy service will continue to run on iiif.archivelab.org for a grace period, giving partners time to migrate. Once we’ve gathered enough data to be confident that the new official service is fulfilling requests satisfactorily, the legacy iiif.archivelab.org service will be “sunset,” and any request to it will redirect to the official iiif.archive.org service. From that point on, all requests for IIIF manifests and IIIF images (whether to iiif.archivelab.org or iiif.archive.org) will default to the latest 3.0 version of the IIIF APIs and be answered by iiif.archive.org. A specifiable “version” endpoint will be available for consumers whose applications require manifests and images to be served in the legacy IIIF v2.0 format. More details, examples, and technical documentation on this topic will be made available in the coming weeks and will eventually be accessible from iiif.archive.org.

Possible Breaking Changes.
1. When the iiif.archivelab.org service was originally launched, iiif.archive.org was set up to redirect to iiif.archivelab.org as a convenience. Regrettably, starting with the first phase of development, iiif.archive.org will no longer redirect to iiif.archivelab.org and will instead run the new official IIIF service. As a result, partners whose code or applications reference iiif.archive.org (expecting it to redirect to iiif.archivelab.org) will experience a breaking change. They will need to either update their references to point explicitly to the legacy iiif.archivelab.org service, or update their code to use the Internet Archive’s new official iiif.archive.org service. As far as we can tell, no partners currently reference “iiif.archive.org” within public projects on GitHub or GitLab, so we hope no one is affected. Still, we want to give fair warning here. For those starting a new project and looking to use the Internet Archive’s IIIF offerings today, we strongly recommend using the iiif.archive.org endpoint.
2. Some partners migrating from the v2 to the v3 API who have been saving annotations may also experience a breaking change, because canvas and manifest identifiers for version 3 are necessarily different from version 2 identifiers. For the time being, we will do our best to keep version 2.0 manifests accessible from the archivelab.org address (via redirects) and will retain the iiif.archivelab.org canvas identifiers.
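Partners who are unsure whether either change affects them could start by auditing their code for references to both hostnames. The Python sketch below is one hypothetical way to do that. It is not an Internet Archive tool, and it looks only for the two hostnames named in this post. It reports iiif.archive.org references (which will no longer redirect to the legacy service) separately from iiif.archivelab.org references (which will need to migrate before the legacy service is sunset).

```python
import re
import sys
from pathlib import Path

LEGACY_HOST = "iiif.archivelab.org"
OFFICIAL_HOST = "iiif.archive.org"

# Match http(s) URLs on either host; the path stops at whitespace, quotes or brackets.
_URL_RE = re.compile(r"https?://(iiif\.archivelab\.org|iiif\.archive\.org)(?:/[^\s\"'<>()]*)?")

# Short notes summarizing what this post says about each host.
NOTES = {
    OFFICIAL_HOST: "now the official IIIF 3.0 service, no longer a redirect to the legacy host",
    LEGACY_HOST: "legacy service; migrate before it is sunset",
}


def audit_iiif_references(text: str) -> list[tuple[int, str, str]]:
    """Return (line number, matched URL, host) for every IIIF host reference."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in _URL_RE.finditer(line):
            findings.append((lineno, match.group(0), match.group(1)))
    return findings


if __name__ == "__main__":
    # Usage: python audit_iiif.py FILE [FILE ...]
    for name in sys.argv[1:]:
        text = Path(name).read_text(encoding="utf-8", errors="replace")
        for lineno, url, host in audit_iiif_references(text):
            print(f"{name}:{lineno}: {url} ({NOTES[host]})")
```

Run it with one or more file paths, for example `python audit_iiif.py index.html viewer.js` (hypothetical file names). Each match is printed with its file, line number, and a short note. Matching only on hostnames keeps the check independent of how the new service lays out its paths, which this post does not yet specify.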

Student’s Use of Internet Archive Expands from High School to College

Rachel Simmons first used the Wayback Machine for research projects at her Sacramento, California, high school. Now a senior at UCLA, she’s discovered even more ways to find material not available elsewhere.  

Rachel Simmons

Simmons, whose mother and grandmother were both librarians, is an applied math major with a minor in film, television and digital media. As she looks up information about media figures or needs to find a rare film, she says the Internet Archive’s digital collection has been an invaluable resource.

“It’s really great to have access to information for anyone to use from their home computer,” Simmons says. “I don’t physically have to go into a library. If I’m working on something late at night, it’s convenient.”  

When taking a class on American film history last year, she was assigned to research a famous actor; she chose Peter Lorre.

“I’m a big fan of classic horror films and he’s an icon whose legacy has continued long past his career,” she said. “I just wanted to learn more about him and what people thought of him at the time.”

To find those contemporary views of Lorre’s work, Simmons turned to the fan magazine collection in the Archive’s Media History Digital Library. There she found interviews with the actor and reviews of his movies from the 1930s. Although Lorre appeared as a mysterious figure on film, Simmons says the interviews present him as a conventional, regular guy. She gained even more insight from the fan letters published in the magazines. “I found it really interesting that I was reading these letters from almost one hundred years ago,” Simmons said.

For another UCLA course, Simmons tapped into the Internet Archive to view silent German films that were discussed in class. While studying, Simmons found herself stumbling onto trailers for other films, which led her to check out similar movies for fun after her projects were complete. Many of the more obscure titles that interest her are not available on streaming services, she notes.

Simmons says she tells others about the resources available through the Internet Archive—including her family of librarians.