Category Archives: News

Regenerating Community in the Rainforest at DWeb+Coolab Camp Brazil

It’s 10 am and I’ve already been traveling for 20 hours — two planes and a long layover from California on my way to Ubatuba, a town four hours northeast of São Paulo, Brazil. I feel nervous. I’ve never been to Brazil before, but the bus ride is serene. The city buildings give way to lush rainforest along the mountainside. It’s almost silent on the bus, a calm quiet. I take a cue from the locals, close my eyes and try to get some rest. I am on my way to DWeb+Coolab Camp Brazil.

View of buildings at Neos Institute where campers found cover from the elements. Photo by Bruno Caldas Vianna, licensed under CC BY-NC-ND 4.0 Deed.

My phone buzzes. It’s Victor (Coolab) and Dana (Colnodo). They pick me up from the station and we’re off to Neos Institute, where we’ll spend the next five days together. Coolab Camp is a continuing experiment in the DWeb movement — weaving together technologists, dreamers, builders, and organizers in a beautiful outdoor setting, providing food and shelter for the week, then letting the sharing, imagination, and community building fly.

Gathering on the first day to talk about the themes of agriculture and ecology.

I arrive early to help set up parts of the camp, which is being hosted by the Neos Institute for Sociobiodiversity. They are a collective that has spent the last six years rebuilding this once dilapidated cultural center. One of Neos’ goals is to protect and conserve this area, the Brazilian Atlantic forest. Only about 10% of this forest remains in the wake of development. 

This spirit of conservation aligns with the themes of Camp: agriculture, sustainability, and ecology. Coolab is bringing together farmers and organizers from Latin America with DWeb builders and technologists to discover how we can take care of both our digital and physical landscapes.

My roommate, Bruna, from the Transfeminist Network of Digital Care, shares their work on Pratododia. They use the metaphor of food to explain how we can practice healthier technology habits. For instance, just as we wash our hands before meals, it’s important to check our security and privacy settings online before tasting everything the internet offers us.

Papaya, mango, and watermelon served during our vegetarian meals.

Coolab Camp is more than a conference; it is an experiment in building a pop-up community. We start each day with a general meeting at the Casarão (Big House), where we forge acordos (agreements) about how to take care of the space and each other.

Alexandre from Coletivo Neos goes over the history of the Neos Institute.

Coolab Camp morning meetings are at once relaxed and energizing. 

These acordos range from simple things (don’t feed the cats; take off your shoes) to strong expressions of our values: no oppression or discrimination of any kind based on class, race, gender, or sexual orientation. At the beginning of every meeting we reiterate these agreements and ask ourselves: do we still agree? Does anything need to be changed or added?

This daily gathering is only possible because the event is small, about 80 people over the five days of Camp. That intimacy means we recognize familiar faces and at least exchange a friendly greeting (Bom dia!). There are no janitors to clean up during the event. We wash our own dishes and clean our own bathrooms.

A community member helps set up the mesh network.

Folks also volunteer to be the “olhos” (eyes) and “ouvidos” (ears) of the community. The Olhos serve to watch out for any misbehavior. The Ouvidos are there to listen if someone has issues they are uncomfortable bringing up to the group. All of this adds to the building of our community.

Marcela and Tomate crafting posters and zines.

How do we communicate at Camp? First, we test technological solutions: a Mumble server with multiple audio channels, then live AI translation. But in the end, the best solution is human: having another person by our side.

A lot of the Brazilian campers speak both Portuguese and English, so volunteers whisper translations beside us English-only speakers. It is incredibly humbling to have community members put so much energy into making sure we are included in the conversations and know what is going on.

Creating our session schedule through unconference.

Next comes the fun part: the sessions and workshops! Sessions are organized as an unconference: everyone proposes sessions and marks their interests, and the proposals garnering the most interest are placed on the schedule. Workshops cover a wide range of topics.

An analog map of the camp site and where routers for the mesh network will go.

Campers gather around the firepit to share experiences working in cooperatives.

Luandro Vieira from Digital Democracy shares the Earth Defenders Toolkit.

One of my favorite sessions is with Ana, a Brazilian farmer and social researcher who guides us through a game called Sanctuaries of Attention. It happens on the last day, impromptu: they simply ask around for people to join after breakfast.

Ana is able to lead the session in Portuguese, Spanish, and English. We spend two hours sharing stories of how our attention changes in different situations and which situations feel safe for us — “sanctuaries” that we can rest in.

The unconference style suits DWeb+Coolab Camp, because it allows the time and space for sessions like these to happen organically, without constraints. 

Ana guides participants through the Sanctuaries of Attention.

Nico teaches programming for beginners using Scratch.

Setting up network equipment for the mesh network on site.

Some sessions are discussions around topics like:

  • Experiences as a cooperative
  • How to organize groups using sociocracy
  • Sharing challenges and workarounds managing a community network
  • Methods for social exchange of common resources

It doesn’t hurt that we can hold some of these discussions at the beach. There are also plenty of casual conversations over meals, on a couch, or lounging in a hammock.

Discussion on the beach about community networks.

One of the things I’ll keep with me from those conversations is a new way of understanding the saying, “The future is already here — it’s just not evenly distributed yet.” 

Those of us from luckier circumstances fret about the end of the world. Those from different circumstances have already seen it happen. Their economic systems have collapsed or their environment is suffering through the worst of the climate catastrophe. The end of the world is already here — it’s just not evenly distributed yet.

But an end is just a new beginning. Here in Brazil, we meet in the forest with people who are already rebuilding, regenerating from the ruins. The contributions we make will remain. Regeneration is already here — it’s just not evenly distributed yet.

Creating improvised music with Música de Círculo around the campfire.

Peixe (Fish) and Ondas (Waves), the spaces where sessions were held.

Farmers, organizers, designers and technologists at DWeb+Coolab Camp Brazil 2023.

We could have been anywhere, but we got the opportunity to be within the songs of the birds, the whispers of the trees, and the laughter of the sea. Within smiles and greetings, warm embraces and supportive shoulders. To all the people who gathered us together: Tania, Hiure, Marcela, Luandro, Victor, Dana, Bruno, Marcus, Coletivo Neos and anyone else I may have forgotten, thank you for showing us how to regenerate culture, environment and technology through community. Obrigado!

All photos by Melissa Rahal licensed under CC BY-NC-ND 4.0 Deed unless otherwise stated.

Celebrating 1 Petabyte on the Filecoin Network!

The Internet Archive is pleased to announce, through support from the Filecoin Foundation (FF) and Filecoin Foundation for the Decentralized Web (FFDW), that one petabyte of material has now been uploaded to the Filecoin network. Among the collections uploaded are the “End of Term Crawl” collections, composed of U.S. government websites that are crawled at the end of presidential administrations, before these ephemeral materials are lost to administrative turnover.

We are so grateful for the support we have received from the Filecoin Foundation (FF) and Filecoin Foundation for the Decentralized Web (FFDW) to make this work possible, and are enthusiastic about continuing this collaboration to ensure the ongoing accessibility of critical information like government materials. 

To read more about this milestone, please visit the Filecoin Foundation’s announcement here.

About the Filecoin Network: Filecoin is an open-source cloud storage marketplace, protocol, and incentive layer with a mission to store humanity’s most important information.

What Happened at the Virtual Library Leaders Forum?

The Internet Archive team, its partners, and enthusiasts recently shared updates on how the organization is empowering research, ensuring preservation of vital materials, and extending access to knowledge to a growing number of grateful users.

The 2023 Library Leaders Forum, held virtually Oct. 4, featured snapshots of the many activities the organization is supporting on a global scale. Together, the efforts are making a difference in the lives of students, scholars, educators, entrepreneurs, journalists, public servants — anyone who needs trusted information without barriers.

“It’s important for us to recognize that the Internet Archive is a library. It’s a research library in the role that it plays, in the way that it works,” said Brewster Kahle, founder of the Internet Archive.

Watch the 2023 Library Leaders Forum:

With the rise of misinformation and new artificial intelligence technologies, reliable, digital information is needed more than ever, he said.  

“This is going to be a challenging time in the United States when all of our institutions — the press, the election system, and libraries — are going to be tested,” Kahle said. “It’s time for us to make sure we stand up tall and be as useful to people in the United States and to people around the world who are having some of the same issues.”

To provide citizens everywhere with free access to government data, documents, and records, the Archive launched Democracy’s Library last year. The collection now has 889,000 government publications, with many more items donated but yet to be organized, said the Archive’s Jamie Joyce at the forum. The goal is to digitize municipal, provincial, state, and federal documents, along with datasets, research, records, publications, and microfiche so they are searchable and accessible.

The Archive is taking a leadership role in harnessing the power of AI to make its information easier for users to find, Kahle added. It is also preserving state television newscasts from Russia and Iran, along with translations, to allow researchers to track trends in coverage.

Collections as data

Thomas Padilla, deputy director of data archiving and data services at the Internet Archive, reported on a project that examines how libraries can support responsible use of collections as data. Working in partnership with Iowa State University, University of Pennsylvania, and James Madison University, it is a community development effort for libraries, archives, museums and galleries to help researchers use new technology (text and data mining, machine learning) while also mitigating potential harm that can be generated by the process.

Through the effort, the Archive gave grants to 12 research libraries and cultural heritage organizations to explore questions around collections as data, Padilla said. As it became apparent that others around the world were grappling with similar issues, the project convened representatives from 60 organizations across 18 countries earlier this year in Canada. The group agreed on core principles (the Vancouver Statement on Collections as Data) to use when providing machine-actionable collection data to researchers. Next, the project expects to issue a roadmap for the broader international community in this space, Padilla said.

Helping libraries help publishers

The recent forum also featured digitization managers from the Internet Archive who are collaborating with partner libraries, including Tim Bigelow, Sophie Flynn-Piercy, Elizabeth MacLeod, Andrea Mills and Jeff Sharpe. These librarians are at institutions big and small, from the University of North Carolina at Chapel Hill to the Wellcome Trust in London, working with teams of professionally trained technicians to digitize collections.

One of those partnerships is taking an exciting new direction. The Boston Public Library’s partnership with the Archive began in 2007. Over the years, the team has completed digitization of the John Adams presidential library, Shakespeare’s First Folio (his 36 plays, published in 1623), more than 17,000 government documents and the Houghton Mifflin trade book archival collection, according to Bigelow, the Northeast Regional digitization manager for the Archive.

The Houghton Mifflin collection includes 20,000 titles dating back to 1832, including some of the best-known works in American fiction and children’s literature, such as books by Ralph Waldo Emerson and the Curious George series. The publisher gave BPL the entire physical collection for preservation (90% of which was out of print) and continues to add new titles as they are published. With the formal agreement of Houghton Mifflin, BPL and the Archive have been working together since 2017 to digitize every book—those in the public domain are completely readable and downloadable; those still in copyright are available through controlled digital lending (CDL).

Lawsuit updates

As in Boston, many libraries have embraced CDL. However, commercial publishers have challenged the practice.

Lila Bailey, senior policy counsel for the Archive, provided an update at the forum on the Hachette v. Internet Archive lawsuit, in which the court ruled in favor of the publishers in limiting the use of CDL. The Archive filed an appeal in September. Bailey encouraged supporters to consider filing amicus briefs when the appellate court reviews the Archive’s case.

For the Internet Archive—and libraries everywhere—to continue their work, the Archive is advocating for a legal infrastructure that ensures libraries can collect digital materials, preserve those materials in different formats, lend digital materials, and cooperate with other libraries.

“In our evolving digital society, will new technologies serve the public good, or only corporate interests?” Bailey asked in her remarks at the forum. “Libraries are on the front line of the fight to decide this question in favor of the public good. In order to maintain our age-old role as guardians of knowledge, we need our rights to own, lend and preserve books, as we all live more and more of our lives online.”

Wrapping up Legal Literacies for Text and Data Mining – Cross-Border (LLTDM-X)

In August 2022, the UC Berkeley Library and Internet Archive were awarded a grant from the National Endowment for the Humanities (NEH) to study legal and ethical issues in cross-border text and data mining (TDM).

The project, entitled Legal Literacies for Text Data Mining – Cross-Border (“LLTDM-X”), supported research and analysis to address law and policy issues faced by U.S. digital humanities practitioners whose text data mining research and practice intersects with foreign-held or licensed content, or involves international research collaborations.

LLTDM-X is now complete, resulting in the publication of an instructive case study for researchers and a white paper. Both resources are explained in greater detail below.

Project Origins

LLTDM-X built upon the previous NEH-sponsored institute, Building Legal Literacies for Text Data Mining. That institute provided training, guidance, and strategies to digital humanities TDM researchers on navigating legal literacies for text data mining (including copyright, contracts, privacy, and ethics) within a U.S. context.

A common challenge highlighted during the institute was the fact that TDM practitioners encounter expanding and increasingly complex cross-border legal problems. These include situations in which: (i) the materials they want to mine are housed in a foreign jurisdiction, or are otherwise subject to foreign database licensing or laws; (ii) the human subjects they are studying or who created the underlying content reside in another country; or, (iii) the colleagues with whom they are collaborating reside abroad, yielding uncertainty about which country’s laws, agreements, and policies apply.

Project Design

LLTDM-X was designed to identify and better understand the cross-border issues that digital humanities TDM practitioners face, with the aim of using these issues to inform prospective research and education. Secondarily, it was hoped that LLTDM-X would also suggest preliminary guidance to include in future educational materials. In early 2023, the project hosted a series of three online round tables with U.S.-based cross-border TDM practitioners and law and ethics experts from six countries. 

The round table conversations were structured to illustrate the empirical issues that researchers face, and also for the practitioners to benefit from preliminary advice on legal and ethical challenges. Upon the completion of the round tables, the LLTDM-X project team created a hypothetical case study that (i) reflects the observed cross-border LLTDM issues and (ii) contains preliminary analysis to facilitate the development of future instructional materials.

The project team also charged the experts with providing responsive and tailored written feedback to the practitioners about how they might address specific cross-border issues relevant to each of their projects.

Guidance & Analysis

Case Study

Extrapolating from the issues analyzed in the round tables, the practitioners’ statements, and the experts’ written analyses, the project team developed a hypothetical case study reflective of “typical” cross-border LLTDM issues that U.S.-based practitioners encounter. The case study provides basic guidance to support U.S. researchers in navigating cross-border TDM issues, while also highlighting questions that would benefit from further research.

The case study examines cross-border copyright, contracts, and privacy & ethics variables across two distinct paradigms: first, a situation where U.S.-based researchers perform all TDM acts in the U.S., and second, a situation where U.S.-based researchers engage with collaborators abroad, or otherwise perform TDM acts both in the U.S. and abroad.

White Paper

The LLTDM-X white paper provides a comprehensive description of the project, including origins and goals, contributors, activities, and outcomes. Of particular note are several project takeaways and recommendations, which the project team hopes will help inform future research and action to support cross-border text data mining. Project takeaways touched on seven key themes: 

  1. Uncertainty about cross-border LLTDM issues indeed hinders U.S. TDM researchers, confirming the need for education about cross-border legal issues; 
  2. The expansion of education regarding U.S. LLTDM literacies remains essential, and should continue in parallel to cross-border education; 
  3. Disparities in national copyright, contracts, and privacy laws may incentivize TDM researcher “forum shopping” and exacerbate research bias;
  4. License agreements (and the concept of “contractual override”) often dominate the overall analysis of cross-border TDM permissibility;
  5. Emerging lawsuits about generative artificial intelligence may impact future understanding of fair use and other research exceptions; 
  6. Research is needed into issues of foreign jurisdiction, likelihood of lawsuits in foreign countries, and likelihood of enforcement of foreign judgments in the U.S. However, the overall “risk” of proceeding with cross-border TDM research may remain difficult to quantify; and
  7. Institutional review boards (IRBs) have an opportunity to explore a new role or build partnerships to support researchers engaged in cross-border TDM.

Gratitude & Next Steps

Thank you to the practitioners, experts, and project team, and to the National Endowment for the Humanities, whose generous funding made this project a success.

We aim to broadly share our project outputs to continue helping U.S.-based TDM researchers navigate cross-border LLTDM hurdles. We will continue to speak publicly to educate researchers and the TDM community regarding project takeaways, and to advocate for legal and ethical experts to undertake the essential research questions and begin developing much-needed educational materials. And, we will continue to encourage the integration of LLTDM literacies into digital humanities curricula, to facilitate both domestic and cross-border TDM research.

[Note: this content is cross-posted on the Legal Literacies for Text and Data Mining project site and the UC Berkeley Library Update blog.]

Celebrating Wendy Hanamura, Internet Archive’s ‘Storyteller-in-Chief’

When Wendy Hanamura came to the Internet Archive nearly a decade ago, she used her talent as a journalist and media professional to share the story of the organization with an ever-growing audience.

“The power of storytelling is a tool that cannot be underestimated,” Hanamura said.

As she retires this fall as Director of Partnerships, she leaves a lasting imprint on the Archive. Hanamura’s creativity and dedication helped build new connections, attract more donors, and advance the mission of universal access to all knowledge.

“She became the storyteller-in-chief. She’s helped the organization understand itself and communicate what we’ve done,” said Brewster Kahle, founder of the Internet Archive. “Through that, she has really helped shape the Internet Archive.”

Wendy at the Internet Archive’s annual celebration in 2017, which she produced.

During her tenure, Hanamura oversaw projects big and small that expanded the visibility and sustainability of the Archive. She stewarded relationships that moved the organization into new areas, such as controlled digital lending and the decentralized web. Along the way, Hanamura became known for her personal touch, warmly moderating discussions, mentoring young staffers, and extending her spirit of generosity to others.

“Wendy makes things work. She thinks things through, connects competent people, works to the highest standard, translates between different types of people, and is decisive and diplomatic,” says Jeff Ubois, of Lever for Change, a nonprofit affiliate of the MacArthur Foundation.

Ubois got to know Hanamura when he was at the MacArthur Foundation and she was spearheading the Archive’s proposal for the MacArthur 100&Change competition. Although the Archive wasn’t awarded the grant, it was one of eight semi-finalists among nearly 2,000 applicants. The major endeavor, Ubois said, required Hanamura to thoroughly imagine and manage a multi-step process, while enlisting the support of others to participate. “Her vision on the one hand and her implementation skills on the other are superpowers,” Ubois said.

Brewster Kahle, Wendy Hanamura and John Gonzalez supporting Internet Archive’s “100&Change” grant submission.

Hanamura began her career in New York as a reporter-researcher at Time Magazine, after graduating summa cum laude from Harvard University. She moved into broadcast journalism and worked as a correspondent in Tokyo for World Monitor on the Discovery Channel. In the San Francisco Bay Area she worked at the local CBS affiliate, covering breaking news, and then as a producer at PBS. Hanamura ran an independent documentary company for 15 years, spent time as general manager of the independent TV network Link TV, and was chief digital officer at KCET/Link before joining the Archive in 2014.

“I think of myself as a storyteller for change. What it comes down to is telling your story—your vision—convincingly and bringing others into that vision and finding ways to integrate them,” said Hanamura, 62, who lives in San Francisco. “That’s just always what I’ve done whether it be for Time Magazine, a television network, PBS or the Internet Archive.”

Preserving Important Voices

In building partnerships for the Internet Archive, Hanamura said the best pitch is always personal.

She would often hold up the book Executive Order 9066: The Internment of 110,000 Japanese Americans and explain how, after discovering it in her local library in sixth grade, it changed her life. The book is out of print and hard to find. When her son was taking a college class in Asian American identity, it would have been perfect for him. “But his generation believes that if it’s not online, it doesn’t exist,” Hanamura would say. “The only place he can access this book online is the Internet Archive.” She could go on to suggest there are more “valuable, precious” books that need to be available online to people everywhere in the world.

In a tribute to her father’s service in a WWII all-Japanese unit, Hanamura produced a documentary, “Honor Bound: A Personal Journey, the Story of the 100th/442nd Regimental Combat Team.” Hanamura also became involved in helping Densho, a Seattle-based nonprofit, build a Digital Library of Japanese American Incarceration materials at the Internet Archive. Her own mother was 14 when she was incarcerated in wartime camps and today, at 95, is one of the few remaining survivors—a story Hanamura told for the Archive’s 25th anniversary.

Digital Library of Japanese American Incarceration

“The people who knew and experienced the Japanese American incarceration are dying in great numbers,” Hanamura said. “I really wanted to make sure that the most important voices, the most important literature and research was captured and preserved for all time.”

Hanamura was able to secure support from the U.S. Department of the Interior and National Park Service’s Japanese American Confinement Sites program to partially fund the effort.

“Wendy’s love of history, community, and story made working with her and the Internet Archive a joy,” said Tom Ikeda, the founder of Densho who collaborated with Hanamura on the project. “[The Archive] was a natural partner to preserve and share over 1,000 video-recorded oral histories of Japanese Americans incarcerated during WWII.”

‘A Great Connector’

At the Archive, Kahle said that Hanamura explained not only the history of the organization and the dream of the internet as a library, but how it can help people with challenges in the world. She established online fundraising efforts that people cared about—cultivating thousands of donors through her work. Hanamura grew the Archive’s annual contributions from about $350,000 when she started to about $5 million when she turned over philanthropic operations in 2019.

Brewster Kahle and Wendy Hanamura at the DWeb Summit, 2018.

When it came time to celebrate the Archive’s 25th anniversary, it was Hanamura who coordinated the coverage, content and celebration. She also developed the “Way Forward” part of the campaign, an innovative approach to consider where the world would be in 25 years if access to knowledge was not protected.

“Wendy cares deeply about social issues and has a great sense of what might be done to make the world a little more creative, fair, open, and prosperous,” Ubois says.

Hanamura also cares about the rising generation of professionals in technology who will carry on this work in the coming years. She has supported the decentralized web community and helped it coalesce, from holding Decentralized Web Summits beginning in 2016 to producing DWeb Camps starting in 2019.

“I admire her ability to be such a great connector, which comes from her acute ability in understanding people’s interests and strengths,” said Mai Ishikawa Sutton, who has worked alongside Hanamura as a co-organizer of the camp. “She has an empathetic ear to people as they talk about what they’re working on and what they’re struggling with. She’s a dynamic force when it comes to helping people who are building a better decentralized, resilient web.”

Held in California, the first camp in 2019 drew about 350 people, and by 2023 it had grown to more than 500 participants. From planning the technical sessions to the nitty-gritty logistics of catering and music, Hanamura’s leadership makes it happen, said Ishikawa Sutton.

Wendy and DWeb Camp organizers, 2019.

Catherine Stihler, CEO of Creative Commons, said Hanamura exudes kindness and empathy. “She has an ability to make you feel so welcome. She’s one of those people: if you’re in a room of strangers, they aren’t strangers for long,” says Stihler, who has watched her energy, joy and inclusion bring people together at the DWeb camps.

“[Hanamura] is central to the success of this camp. She has her fingerprint on everything from top to bottom,” Ishikawa Sutton said. “She is able to really envision what it means to build a meaningful event where people can share their work, have fun, and be their full selves.”

As a result of the gatherings, many participants have built trust, collaborated on projects, shared grants and been hired—in part, thanks to Hanamura’s effort to connect people, according to Ishikawa Sutton, who is a co-founder and editor of COMPOST, an online magazine about the digital commons, and project manager of Distributed.Press. Hanamura has also played a critical role in making the DWeb organization financially sustainable, they added.

“What I love about the Internet Archive is that we are always looking to what’s next and over the horizon,” said Hanamura, who sees great potential in the young, idealistic supporters of the DWeb, calling it a “life-giving movement.”

Lasting Impact

Although Hanamura retired from her position at the Internet Archive Sept. 14, she plans to continue to be involved in the DWeb community as a volunteer.

Hanamura said her decision to step down as Director of Partnerships was prompted by her desire to give her full attention to caregiving for her husband, who has a terminal illness.

Among her final projects at the Archive was creating a meditation garden outside the Funston Avenue building to provide a quiet space for reflection.

The meditation garden that Wendy designed & landscaped as a parting gift to Internet Archive staff and friends.

“Working at the Internet Archive has been a gift and such an education,” Hanamura said. “Here, we have the mission to create lasting impact, lasting stories, and lasting artifacts. I am so grateful to the Internet Archive for giving me the opportunity to use my skills toward that end.”

Added Kahle: “The Internet Archive staff and patrons have felt the power of Wendy Hanamura spending her lifeforce building a library all these years.”

Leveraging Technology to Scale Library Research Support: ARCH, AI, and the Humanities

Kevin Hegg is Head of Digital Projects at James Madison University Libraries (JMU). Kevin has held many technology positions within JMU Libraries. His experience spans a wide variety of technology work, from managing computer labs and server hardware to developing a large open-source software initiative. We are thankful to Kevin for taking the time to talk with us about his experience with ARCH (Archives Research Compute Hub), AI, and supporting research at JMU.

Thomas Padilla is Deputy Director, Archiving and Data Services. 

Thomas: Thank you for agreeing to talk more about your experience with ARCH, AI, and supporting research. I find that folks are often curious about what set of interests and experiences prepares someone to work in these areas. Can you tell us a bit about yourself and how you began doing this kind of work?

Kevin: Over the span of 27 years, I have held several technology roles within James Madison University (JMU) Libraries. My experience ranges from managing computer labs and server hardware to developing a large open-source software initiative adopted by numerous universities across the world. Today I manage a small team that supports faculty and students as they design, implement, and evaluate digital projects that enhance, transform, and promote scholarship, teaching, and learning. I also co-manage Histories Along the Blue Ridge, which hosts over 50,000 digitized legal documents from courthouses along Virginia’s Blue Ridge mountains.

Thomas: I gather that your initial interest in using ARCH was to see what potential it afforded for working with James Madison University’s Mapping the Black Digital and Public Humanities project. Can you introduce the project to our readers?

Kevin: The Mapping the Black Digital and Public Humanities project began at JMU in Fall 2022. The project draws inspiration from established resources such as the Colored Convention Project and the Reviews in Digital Humanities journal. It employs Airtable for data collection and Tableau for data visualization. The website features a map that not only geographically locates over 440 Black digital and public humanities projects across the United States but also offers detailed information about each initiative. The project is a collaborative endeavor involving JMU graduate students and faculty, in close alliance with JMU Libraries. Over the past year, this interdisciplinary team has dedicated hundreds of hours to data collection, data visualization, and website development.

Mapping the Black Digital and Public Humanities, project and organization type distribution

The project has achieved significant milestones. In Fall 2022, Mollie Godfrey and Seán McCarthy, the project leaders, authored “Race, Space, and Celebrating Simms: Mapping Strategies for Black Feminist Biographical Recovery,” highlighting the value of such mapping projects. At the same time, graduate student Iliana Cosme-Brooks undertook a monumental data collection effort. During the winter months, Mollie and Seán spearheaded an effort to refine the categories and terms used in the project through comprehensive research and user testing. By Spring 2023, the project was integrated into the academic curriculum, where a class of graduate students actively contributed to its inaugural phase. Funding was obtained to maintain and update the database and map during the summer.

Looking ahead, the project team plans to present their work at academic conferences and aims to diversify the team’s expertise further. The overarching objective is to enhance the visibility and interconnectedness of Black digital and public humanities projects, while also welcoming external contributions for the initiative’s continual refinement and expansion.

Thomas: It sounds like the project adopts a holistic approach to experimenting with and integrating the functionality of a wide range of tools and methods (e.g., mapping, data visualization). How do you see tools like ARCH fitting into the project and research services more broadly? What tools and methods have you used in combination with ARCH?

Kevin: ARCH offers faculty and students an invaluable resource for digital scholarship by providing expansive, high-quality datasets. These datasets enable more sophisticated data analytics than typically encountered in undergraduate pedagogy, revealing patterns and trends that would otherwise remain obscured. Despite the increasing importance of digital humanities, a significant portion of faculty and students lack advanced coding skills. The advent of AI-assisted coding platforms like ChatGPT and GitHub Copilot has democratized access to programming languages such as Python and JavaScript, facilitating their integration into academic research.

For my work, I employed ChatGPT and Copilot to further process ARCH datasets derived from a curated sample of 20 websites focused on Black digital and public humanities. Using PyCharm (an IDE freely available for educational purposes) with the Copilot extension, I improved my coding efficiency tenfold.
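To give a flavor of what this kind of post-processing looks like, here is a minimal sketch of filtering and tallying an ARCH-style dataset with only the Python standard library. The column names and sample rows are invented for illustration; actual ARCH dataset schemas vary by job type.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of an ARCH-style dataset export;
# real column names and values differ by analysis job.
sample = """crawl_date,domain,url,mime_type
20220101,example.edu,https://example.edu/a,text/html
20220101,example.org,https://example.org/b,text/html
20220102,example.edu,https://example.edu/c,application/pdf
"""

def domains_by_count(csv_text):
    """Count captures per domain, keeping only HTML pages."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(
        row["domain"] for row in reader if row["mime_type"] == "text/html"
    )
    return counts.most_common()

print(domains_by_count(sample))
# -> [('example.edu', 1), ('example.org', 1)]
```

In practice the same pattern scales to the full multi-gigabyte CSV files ARCH produces, and is exactly the kind of boilerplate an AI coding assistant can draft quickly.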

Next, I leveraged ChatGPT’s Advanced Data Analysis plugin to deconstruct visualizations from Stanford’s Palladio platform, a tool commonly used for exploratory data visualizations but lacking a means of sharing those visualizations. With the aid of ChatGPT, I developed JavaScript-based web applications that faithfully replicate Palladio’s graph and gallery visualizations. Specifically, I instructed ChatGPT to employ the D3 JavaScript library for ingesting my modified ARCH datasets into client-side web applications. The final products, including HTML, JavaScript, and CSV files, were made publicly accessible via GitHub Pages (see my graph and gallery on GitHub Pages).

Black Digital and Public Humanities websites, graph visualization

In summary, the integration of Python and AI-assisted coding tools has not only enhanced my use of ARCH datasets but also enabled the creation of client-side web applications for data visualization.

Thomas: Beyond pairing ChatGPT with ARCH, what additional uses are you anticipating for AI-driven tools in your work?

Kevin: AI-driven tools have already radically transformed my daily work. I am using AI to reduce or even eliminate repetitive, mindless tasks that take tens or hundreds of hours. For example, as part of the Mapping project, ChatGPT+ helped me transform an Airtable base with almost 500 rows and two dozen columns into a series of 500 blog posts on a WordPress site. ChatGPT+ understands the structure of a WordPress export file. After a couple of hours of iterating through my design requirements with ChatGPT, I was able to import 500 blog posts into a WordPress website. Without this intervention, this task would have required over a hundred hours of tedious copying and pasting. Additionally, we have been using AI-enabled platforms like Otter and Descript to transcribe oral interviews.
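The core of that transformation is generating WordPress-importable `<item>` entries from tabular rows. A heavily simplified sketch of the idea is below; the rows are invented, and a real WordPress export (WXR) file needs many more elements and namespace declarations (`wp:post_type`, `dc:creator`, and so on) than shown here.

```python
from xml.sax.saxutils import escape

# Hypothetical rows exported from an Airtable base.
rows = [
    {"title": "Project A", "description": "Maps Black history sites."},
    {"title": "Project B", "description": "Digitizes oral interviews."},
]

def row_to_item(row):
    # Minimal WXR-style <item>; a real WordPress import requires a
    # full RSS wrapper plus wp: and dc: namespaced elements.
    return (
        "<item>"
        f"<title>{escape(row['title'])}</title>"
        f"<content:encoded><![CDATA[{row['description']}]]></content:encoded>"
        "</item>"
    )

items = "\n".join(row_to_item(r) for r in rows)
print(items)
```

Looping this over 500 rows is trivial once the template is right, which is where iterating with an AI assistant saves the hours of copying and pasting.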

I foresee AI-driven tools playing an increasingly pivotal role in many facets of my work. For instance, natural language processing could automate the categorization and summarization of large text-based datasets, making archival research more efficient and our analyses richer. AI can also be used to identify entities in large archival datasets. Archives hold a treasure trove of artifacts waiting to be described and discovered. AI offers tools that will supercharge our construction of finding aids and item-level metadata.  

Lastly, AI could facilitate more dynamic and interactive data visualizations, like the ones I published on GitHub Pages. These will offer users a more engaging experience when interacting with our research findings. Overall, the potential of AI is vast, and I’m excited to integrate more AI-driven tools into JMU’s classrooms and research ecosystem.

Thomas: Thanks for taking the time Kevin. To close out, whose work would you like people to know more about? 

Kevin: Engaging in Digital Humanities (DH) within the academic library setting is a distinct privilege, one that requires a collaborative ethos. I am fortunate to be a member of an exceptional team at JMU Libraries, a collective too expansive to fully acknowledge here. AI has introduced transformative tools that border on magic. However, loosely paraphrasing Immanuel Kant, it’s crucial to remember that technology devoid of content is empty. I will use this opportunity to spotlight the contributions of three JMU faculty whose work celebrates our local community and furthers social justice.

Mollie Godfrey (Department of English) and Seán McCarthy (Writing, Rhetoric, and Technical Communication) are the visionaries behind two inspiring initiatives: the Mapping Project and the Celebrating Simms Project. The latter serves as a digital, post-custodial archive honoring Lucy F. Simms, an educator born into enslavement in 1856 who impacted three generations of young students in our local community. Both Godfrey and McCarthy have cultivated deep, lasting connections within Harrisonburg’s Black community. Their work strikes a balance between celebration and reparation. Collaborating with them has been as rewarding as it is challenging.

Gianluca De Fazio (Justice Studies) spearheads the Racial Terror: Lynching in Virginia project, illuminating a grim chapter of Virginia’s past. His relentless dedication led to the installation of a historical marker commemorating the tragic lynching of Charlotte Harris. De Fazio, along with colleagues, has also developed nine lesson plans based on this research, which are now integrated into high school curricula. My collaboration with him was a catalyst for pursuing a master’s degree in American History.

Racial Terror: Lynching in Virginia

Both the Celebrating Simms and Racial Terror projects are highlighted in the Mapping the Black Digital and Public Humanities initiative. The privilege of contributing to such impactful projects alongside such dedicated individuals has rendered my extensive tenure at JMU both meaningful and, I hope, enduring.

Brewster Goes to Washington – Congressional Hearing on the Copyright Office Modernization Committee

A good day in Washington. After two years of serving on the Copyright Office Modernization Committee, helping advise the Copyright Office on its new registration and recordation processes, I watched a Republican and a Democrat from the House of Representatives hold a hearing to ask questions of committee members. It was such a refreshing scene because it was bipartisan, they knew the issues, and they were spending time finding out what we suggested.

This all matters because the Copyright Office is moving to digital filings, which is an improvement, and because it could make way for efficient submission of digital files. This would be a major way for the Library of Congress to get copies of books it would own, preserve, and make somewhat accessible.

Another attendee said they had gone to congressional meetings for 30 years and this one had the most engagement of any of them.  A good day in Washington, indeed.

A Quarter In, A Quarter-Million Out: 10 Years of Emulation at Internet Archive

Ten years ago, the Internet Archive made an announcement: it was possible for anyone with a reasonably powerful computer running a modern browser to have software emulated, running as it did back when it was fresh and new, with a single click. Now, a decade later, we have surpassed 250,000 pieces of software running at the Archive, and it is a great time to reflect on how different the landscape has become since then.

Anyone can come up with an idea, and the idea of taking the then-quite-mature JavaScript language, available universally in all major browsers, and having it run complicated programs was not new.

With the rise of a cross-compiler named Emscripten, the idea of taking rather complicated programs written in other languages and putting them into JavaScript was kind of new.

That all being the case, the idea of taking a by-then 20-year-old super-emulator called MAME, using Emscripten to cross-compile it into JavaScript, and then running the resulting code in the browser at the Internet Archive to make computers and consoles run, was very new.

It was also, objectively, madness.

Well over a thousand hours of work went into the project from a very wide range of volunteers who poured galactic amounts of time into making it a reality. Along the way, changes were made to Emscripten; the Firefox, Internet Explorer, and Chrome browsers; MAME; and the Internet Archive’s codebase to accommodate this dream.

It was announced in the Fall of 2013, well over a year after the project started.

Additional announcements came with each expansion of the types of software being emulated, and it became huge news, leading to millions of visitors coming to try it out.

By any measure, a quarter of a million items later, it has been a huge, huge success.

The rest of this blog entry is pretty pictures and beautiful links, but before we move on, it’s once again important to highlight people who provided major contributions, including Justin Kerk, Daniel Brooks, Vitorio Miliano, James Baicoianu, John Vilk, Tracey Jaquith, Jim Nelson, and Hank Bromley. Dozens more developers spent evenings, weekends, and months to make this system happen. Thank you to everyone involved.

The joy of watching a computer boot up in the browser was (and is) a miraculous feeling. And after that feeling, comes a quick comfort with the situation: Of course we can run computers inside our browsers. Of course we can make most anything we want run in these browser-based computers. What’s next?

Within a short time after our 2013 announcement, the Archive was running hundreds, then thousands, of individual programs, floppy disks, and even cassette-based software from computing’s past.

As emulators besides MAME were added, it became necessary to create a framework for a versatile and understandable method to load emulators. This framework eventually got a name: THE EMULARITY.

In the decade of the Emularity’s existence, the Archive’s software emulation has expanded into directions nobody could have fully expected to work when the project started.

Here are some highlights:

HyperCard stacks for the Apple Macintosh, a critical period in content creation and computer information architecture, have been restored to easy access, with thousands of stacks to try instantly.

Plastic electronic handheld games, once a staple of toys from the 1970s through the 1990s, have been able to live once again, complete with the original housings that these simple (and not so simple) machines relied on instead of graphics.

As the uploads veered into the many thousands, it became more and more difficult for new adventurous users to figure out what software was at the Archive to check out. This has led to specialized collections focused on one type of program, like the Computer Chess Club. People can use these collections as gateways to quickly test the waters of decades of computer and software history, seeing the turns and twists of countless lost companies and individuals who squeezed every last bit of wonder and spectacle out of these underpowered boxes.

The Calculator Drawer took things to a new level when entire calculators could be emulated, including their unique looks, accompanied by a “drawer of manuals” to browse through if you had to learn (or re-learn) how to make these machines run.


The Woz-a-Day Collection, in many ways, represents the logical end for the role that the Internet Archive’s Emularity can provide for software history. The project is the effort of software historian 4am, who has spent years methodically preserving Apple II software from the original floppy disks, capturing every last bit and track with no modifications and retaining the best fidelity of these programs as they were originally offered. 4am’s work allows some of these programs to be playable for the first time in decades.

With each new batch of added emulated systems and machines has come a greater and greater pool of users, toying with historical software or playing long-forgotten or never-remembered games with a new level of convenience and willingness to try them out.

At this milestone of a decade into this experimental adventure, Internet Archive continues to grow its collection, to test and automate the functioning of both uploaded and self-maintained collections of software, and to provide a vast and necessary service in the preservation of historical software.

And, of course, we all get to enjoy some really great games.

Here’s to what another ten years will bring us!

IMLS National Leadership Grant Supports Expansion of the ARCH Computational Research Platform

In June, we announced the official launch of Archives Research Compute Hub (ARCH), our platform for supporting computational research with digital collections. The Archiving & Data Services group at IA has long provided computational research services via collaborations, dataset services, product features, and other partnerships and software development. In 2020, in partnership with our close collaborators at the Archives Unleashed project, and with funding from the Mellon Foundation, we pursued cooperative technical and community work to make text and data mining services available to any institution building, or researcher using, archival web collections. This led to the release of ARCH, with more than 35 libraries and 60 researchers and curators participating in beta testing and early product pilots. Additional work supported expanding the community of scholars doing computational research using contemporary web collections by providing technical and research support to multi-institutional research teams.

We are pleased to announce that ARCH recently received funding from the Institute of Museum and Library Services (IMLS), via their National Leadership Grants program, supporting ARCH expansion. The project, “Expanding ARCH: Equitable Access to Text and Data Mining Services,” entails two broad areas of work. First, the project will create user-informed workflows and conduct software development that enables a diverse set of partner libraries, archives, and museums to add digital collections of any format (e.g., image collections, text collections) to ARCH for users to study via computational analysis. Working with these partners will help ensure that ARCH can support the needs of organizations of any size that aim to make their digital collections available in new ways. Second, the project will work with librarians and scholars to expand the number and types of data analysis jobs and resulting datasets and data visualizations that can be created using ARCH, including allowing users to build custom research collections that are aggregated from the digital collections of multiple institutions. Expanding the ability for scholars to create aggregated collections and run new data analysis jobs, potentially including artificial intelligence tools, will enable ARCH to significantly increase the type, diversity, scope, and scale of research it supports.

Collaborators on the Expanding ARCH project include a set of institutional partners that will be closely involved in guiding functional requirements, testing designs, and using the newly-built features intended to augment researcher support. Primary institutional partners include University of Denver, University of North Carolina at Chapel Hill, Williams College Museum of Art, and Indianapolis Museum of Art, with additional institutional partners joining in the project’s second year.

Thousands of libraries, archives, museums, and memory organizations work with Internet Archive to build and make openly accessible digitized and born-digital collections. Making these collections available to as many users in as many ways as possible is critical to providing access to knowledge. We are thankful to IMLS for providing the financial support that allows us to expand the ARCH platform to empower new and emerging types of access and research.

Internet Archive + IIIF

Making IIIF Official at the Internet Archive

A joint blog post between the Internet Archive and the IIIF Community

Summary

After eight years hosting an experimental IIIF service for public benefit, the Internet Archive is moving forward with important steps to make its International Image Interoperability Framework (IIIF) service official. Each year, the Internet Archive receives feedback from friends and partners asking about our long-term plans for supporting IIIF. In response, the Internet Archive is announcing an official IIIF service which aims to increase the resourcing and reliability of the Internet Archive’s IIIF service, upgrade the service to utilize the latest version 3.0 of the IIIF specification, and graduate the service from the iiif.archivelab.org domain to iiif.archive.org. The upgrade also expands the Internet Archive’s IIIF support beyond images to also include audio, movies, and collections — enabling deep zoom on high-resolution images, comparative item analysis, portability across media players, annotation support, and more.

An image visually detailing each step of how a URL for a conceptual IIIF service run by "example.org" may be used to crop, zoom, rotate, and color correct an image and then download the result as a jpeg. Image from https://iiif.io/get-started/how-iiif-works

Background

In 2015, a team of enthusiastic Internet Archive volunteers from a group called Archive Labs implemented an experimental IIIF service to give partners and patrons new ways of using Archive.org images and texts. You can read more about the project’s origins and ambitions in this 2015 announcement blog post. The initial service provided researchers with an easy, standardized way to crop and reference specific regions of archive.org images. (Maybe you can tell whose eyes these are?) By making Internet Archive images and texts IIIF-compatible, they may be opened using any number of compatible IIIF viewer apps, each offering their own advantages and unique features. For instance, Mirador is a “multi-up” viewer that makes it easy for researchers to view different images side by side and then zoom into or annotate different areas of interest within each image.
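That region-cropping capability comes from the IIIF Image API’s URL pattern, `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. A minimal sketch of building such a URL follows; the identifier is a placeholder, and the `iiif.archive.org` base path is assumed from the announcement above ("max" is the Image API 3.0 size keyword; version 2 used "full").

```python
def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build a IIIF Image API request URL:
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    return f"{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"

# Crop a 300x200 pixel region whose top-left corner is at (1000, 450):
url = iiif_image_url("https://iiif.archive.org/iiif", "some-item-id",
                     region="1000,450,300,200")
print(url)
# -> https://iiif.archive.org/iiif/some-item-id/1000,450,300,200/max/0/default.jpg
```

Because the crop is encoded entirely in the URL, a researcher can cite a specific detail of an image simply by sharing a link, which is what made the labs service so useful for referencing regions.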

Since its launch more than seven years ago, the IIIF labs service has received millions of requests from more than 15 universities and GLAM (galleries, libraries, archives, and museums) organizations across the globe, including the University of Texas, UCD Digital Library, Havana University, the Digital Library of Georgia, BioStor, Emory University, and McGill University. In this time, the broader IIIF ecosystem has blossomed to include hundreds of participating institutions. For all its benefits, the labs IIIF service has been considered “unofficial,” hosted on the separate archivelab.org domain, and several partners have voiced interest in the Internet Archive adopting it as an officially supported service. Today, several members of the IIIF community are collaborating with the Internet Archive to make this happen.

Josh Hadro, managing director of the IIIF Consortium (IIIF-C), sees the Internet Archive as filling a critical role “in serving the average Internet user who may not benefit from the same access to or affiliation with infrastructure offered by traditional research institutions.” The IIIF-C promotes interoperability as a core element of IIIF: the ability to streamline access to information and make cultural materials as easy to use and reuse as possible. Because the Internet Archive enables any patron to upload eligible materials, everyone has the opportunity to benefit from IIIF’s capabilities. IIIF-C counts the Internet Archive as a natural ally because of its ongoing support of open collections delivered via open web standards and protocols. With this project, IIIF-C hopes to make the Internet Archive a go-to resource online that facilitates IIIF work for students and scholars unaffiliated with the kinds of institutions that historically have provided IIIF infrastructure. This is an essential step toward a strategic goal of lowering barriers to IIIF usage and adoption worldwide.

In service of this outcome, the Internet Archive has teamed up with a number of IIIF community members to make the IIIF service official and to upgrade it, making the best use of the new capabilities introduced into the IIIF specifications in recent years.

In the coming weeks, we’ll share more details about the IIIF improvements that will become available to users of the Internet Archive. First, we want to lay out our current plan for the update, including backwards compatibility affordances, to ensure existing consumers have the information they need to successfully migrate from the unofficial to the official IIIF API.

Thanks

Neither the original IIIF labs service the Internet Archive has been running nor the new official IIIF service would have been possible without huge support from volunteers within the IIIF community and Internet Archive staff. A big thank you to the following folks who are making this effort to bring IIIF into production possible:

Stay tuned for more details on the new functionality soon, and if you have questions or would like to get involved in helping us test the new setup, get in touch with IIIF-C at staff@iiif.io. For more updates, including the September 13 IIIF Consortium community call announcing the Internet Archive’s IIIF service, please visit the IIIF community calendar at https://iiif.io/community/#calendar.

Technical Notes & FAQs for Partners

This technical section is intended for partners who currently rely on the iiif.archivelab.org IIIF API and are seeking further details on how these changes might affect them.

What is changing? Previously, partners accessed the Internet Archive’s IIIF labs API from the iiif.archivelab.org domain. As part of the effort to graduate from labs to production, the IIIF API will move to the iiif.archive.org domain. Because we don’t want to break any of the amazing projects and exhibits that patrons have created using the existing IIIF capabilities on the archivelab.org domain, we’re migrating the API in phases.

Phasing migration. The first phase will introduce a new and improved, official Internet Archive IIIF 3.0 service on the iiif.archive.org subdomain. The unofficial, legacy service will continue to run on iiif.archivelab.org for a grace period, allowing partners to migrate. Once we’ve gathered enough data to be confident requests are being satisfactorily fulfilled by the new official service, the legacy iiif.archivelab.org service will be “sunset” and any request to it will redirect to the official iiif.archive.org service. At this point, all requests for IIIF manifests and IIIF images (whether to iiif.archivelab.org or iiif.archive.org) will default to the latest 3.0 version of the IIIF APIs and be answered by iiif.archive.org. A specifiable “version” endpoint will be available for consumers whose applications require manifests and images to be served using the IIIF v2.0 legacy format. More details, examples, and technical documentation on this topic will be made available in the coming weeks and will eventually be accessible from iiif.archive.org.
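For partners auditing cached manifests during the grace period, the Presentation API version can be recognized from a manifest’s `@context` value: version 2 manifests reference `.../presentation/2/context.json` (and use `@id`), while version 3 manifests reference `.../presentation/3/context.json` (and use plain `id`). A hypothetical helper, not part of any Internet Archive tooling, might look like this:

```python
def manifest_version(manifest):
    """Return 2 or 3 based on a IIIF manifest's @context, else None."""
    ctx = manifest.get("@context", "")
    if isinstance(ctx, list):  # @context may be a list of URIs
        ctx = " ".join(ctx)
    if "presentation/3" in ctx:
        return 3
    if "presentation/2" in ctx:
        return 2
    return None

v3 = {"@context": "http://iiif.io/api/presentation/3/context.json", "id": "..."}
v2 = {"@context": "http://iiif.io/api/presentation/2/context.json", "@id": "..."}
print(manifest_version(v3), manifest_version(v2))
# -> 3 2
```

A check like this makes it easy to find which stored manifests (and their canvas identifiers) will need re-fetching once the service defaults to version 3.0.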

Possible Breaking Changes.
1. When the iiif.archivelab.org service was originally launched, iiif.archive.org was set up to redirect to iiif.archivelab.org as a convenience. Regrettably, during the first phase of development, iiif.archive.org will no longer redirect to iiif.archivelab.org and instead will run the new official IIIF service. As a result, partners whose code or applications reference iiif.archive.org (expecting it to redirect to iiif.archivelab.org) will experience a breaking change and will need either to update their references to refer explicitly to the legacy iiif.archivelab.org service or to update their code to use the Internet Archive’s new official iiif.archive.org service. As far as we can tell, no partners currently reference “iiif.archive.org” within public projects on GitHub or GitLab, so we hope no one is affected. Still, we want to give fair warning here. For those starting a new project and looking to use the Internet Archive’s IIIF offerings today, we strongly recommend using the iiif.archive.org endpoint.
2. Some partners migrating from the v2 to v3 API who have been saving annotations may also experience a breaking change, because canvas and manifest identifiers for version 3 are necessarily different from version 2 identifiers. We will be doing our best, for the time being, to ensure version 2.0 manifests remain accessible from the archivelab.org address (via redirects) and will retain the iiif.archivelab.org canvas identifiers.