What Happened at the Virtual Library Leaders Forum?

The Internet Archive team, its partners, and enthusiasts recently shared updates on how the organization is empowering research, ensuring preservation of vital materials, and extending access to knowledge to a growing number of grateful users.

The 2023 Library Leaders Forum, held virtually Oct. 4, featured snapshots of the many activities the organization is supporting on a global scale. Together, the efforts are making a difference in the lives of students, scholars, educators, entrepreneurs, journalists, public servants — anyone who needs trusted information without barriers.

“It’s important for us to recognize that the Internet Archive is a library. It’s a research library in the role that it plays, in the way that it works,” said Brewster Kahle, founder of the Internet Archive.


With the rise of misinformation and new artificial intelligence technologies, reliable, digital information is needed more than ever, he said.  

“This is going to be a challenging time in the United States when all of our institutions — the press, the election system, and libraries — are going to be tested,” Kahle said. “It’s time for us to make sure we stand up tall and be as useful to people in the United States and to people around the world who are having some of the same issues.”

To provide citizens everywhere with free access to government data, documents, and records, the Archive launched Democracy’s Library last year. The collection now has 889,000 government publications, with many more items donated but yet to be organized, said the Archive’s Jamie Joyce at the forum. The goal is to digitize municipal, provincial, state and federal documents, along with datasets, research, records, publications, and microfiche so they are searchable and accessible.

The Archive is taking a leadership role in harnessing the power of AI to make its information easier for users to find, Kahle added. It is also preserving state television newscasts from Russia and Iran, along with translations, to allow researchers to track trends in coverage.

Collections as data

Thomas Padilla, deputy director of archiving and data services at the Internet Archive, reported on a project that examines how libraries can support responsible use of collections as data. Run in partnership with Iowa State University, University of Pennsylvania, and James Madison University, the project is a community development effort for libraries, archives, museums and galleries to help researchers use new technology (text and data mining, machine learning) while also mitigating potential harm that can be generated by the process.

Through the effort, the Archive gave grants to 12 research libraries and cultural heritage organizations to explore questions around collections as data, Padilla said. As it became apparent that others around the world were grappling with similar issues, the project convened representatives from 60 organizations representing 18 countries earlier this year in Canada. The group agreed on core principles (The Vancouver Statement on Collections-As-Data) to use when providing machine-actionable collection data to researchers. Next, the project expects to issue a roadmap for the broader international community in this space, Padilla said.

Helping libraries help publishers

The recent forum also featured digitization managers from the Internet Archive who are collaborating with partner libraries, including Tim Bigelow, Sophie Flynn-Piercy, Elizabeth MacLead, Andrea Mills and Jeff Sharpe. These librarians are at institutions big and small from the University of North Carolina at Chapel Hill to the Wellcome Trust in London, working with teams of professionally trained technicians to digitize collections.

One of those partnerships is taking an exciting new direction. The Boston Public Library’s partnership with the Archive began in 2007. Over the years, the team has completed digitization of the John Adams presidential library, Shakespeare’s First Folio (his 36 plays published in 1623), more than 17,000 government documents and the Houghton Mifflin trade book archival collection, according to Bigelow, the Northeast Regional digitization manager for the Archive.

The Houghton Mifflin collection includes 20,000 titles dating back to 1832, including some of the best known works in American fiction and children’s literature, such as books by Ralph Waldo Emerson and the Curious George series. The publisher gave BPL the entire physical collection for preservation (90% of which were out of print) and continues to add new titles as they are published. With the formal agreement of Houghton Mifflin, BPL and the Archive have been working together since 2017 to digitize every book—those in the public domain are completely readable and downloadable; those still in copyright are available through controlled digital lending (CDL).

Lawsuit updates

As in Boston, many libraries have embraced CDL. However, commercial publishers have challenged the practice.

Lila Bailey, senior policy counsel for the Archive, provided an update at the forum on the Hachette v. Internet Archive lawsuit, in which the court ruled in favor of the publishers in limiting the use of CDL. The Archive filed an appeal in September. Bailey encouraged supporters to consider filing amicus briefs when the appellate court reviews the Archive’s case.

For the Internet Archive—and libraries everywhere—to continue their work, the Archive is advocating for a legal infrastructure that ensures libraries can collect digital materials, preserve those materials in different formats, lend digital materials, and cooperate with other libraries.

“In our evolving digital society, will new technologies serve the public good, or only corporate interests?” Bailey asked in her remarks at the forum. “Libraries are on the front line of the fight to decide this question in favor of the public good. In order to maintain our age-old role as guardians of knowledge, we need our rights to own, lend and preserve books, as we all live more and more of our lives online.”

Join Us for Our Annual Celebration – Thursday, October 12!

We are just one week away from our annual celebration on Thursday, October 12! Party in the streets with us in person or celebrate with us online—however you choose to join in the festivities, be sure to grab your ticket now!

What’s in Store?

📚 Empowering Research: We’ll explore how research libraries like the Internet Archive are considering artificial intelligence in a live presentation, “AI @ IA: Research in the Age of Artificial Intelligence.” Come see how the Internet Archive is using AI to build new capabilities into our library, and how students and scholars all over the world use the Archive’s petabytes of data to inform their own research.

🏆 Internet Archive Hero Award: This year, we’re honored to recognize the incredible Connie Chan, our local District 1 supervisor, with the prestigious Internet Archive Hero Award. Supervisor Chan’s unwavering support for the digital rights of libraries culminated in a unanimously passed resolution at the Board of Supervisors, and we can’t wait to present her with this well-deserved honor live from our majestic Great Room. Join us in applauding her remarkable contributions!

🌮 Food Truck Delights: Arrive early and tantalize your taste buds with an assortment of treats from our gourmet food trucks.

💃 Street Party: After the ceremony, let loose and dance the night away to the tunes of local musicians, Hot Buttered Rum. Get ready to groove and celebrate under the starry San Francisco sky!

🎟️ Register now for in-person or online attendance!

Internet Archive’s Annual Celebration
Thursday, October 12 from 5pm – 10pm PT; program at 7pm PT
300 Funston Avenue, San Francisco

Wrapping up Legal Literacies for Text and Data Mining – Cross-Border (LLTDM-X)

In August 2022, the UC Berkeley Library and Internet Archive were awarded a grant from the National Endowment for the Humanities (NEH) to study legal and ethical issues in cross-border text and data mining (TDM).

The project, entitled Legal Literacies for Text Data Mining – Cross-Border (“LLTDM-X”), supported research and analysis to address law and policy issues faced by U.S. digital humanities practitioners whose text data mining research and practice intersects with foreign-held or licensed content, or involves international research collaborations.

LLTDM-X is now complete, resulting in the publication of an instructive case study for researchers and a white paper. Both resources are explained in greater detail below.

Project Origins

LLTDM-X built upon the previous NEH-sponsored institute, Building Legal Literacies for Text Data Mining. That institute provided training, guidance, and strategies to digital humanities TDM researchers on navigating legal literacies for text data mining (including copyright, contracts, privacy, and ethics) within a U.S. context.

A common challenge highlighted during the institute was the fact that TDM practitioners encounter expanding and increasingly complex cross-border legal problems. These include situations in which: (i) the materials they want to mine are housed in a foreign jurisdiction, or are otherwise subject to foreign database licensing or laws; (ii) the human subjects they are studying or who created the underlying content reside in another country; or, (iii) the colleagues with whom they are collaborating reside abroad, yielding uncertainty about which country’s laws, agreements, and policies apply.

Project Design

LLTDM-X was designed to identify and better understand the cross-border issues that digital humanities TDM practitioners face, with the aim of using these issues to inform prospective research and education. Secondarily, it was hoped that LLTDM-X would also suggest preliminary guidance to include in future educational materials. In early 2023, the project hosted a series of three online round tables with U.S.-based cross-border TDM practitioners and law and ethics experts from six countries. 

The round table conversations were structured to illustrate the empirical issues that researchers face, and also for the practitioners to benefit from preliminary advice on legal and ethical challenges. Upon the completion of the round tables, the LLTDM-X project team created a hypothetical case study that (i) reflects the observed cross-border LLTDM issues and (ii) contains preliminary analysis to facilitate the development of future instructional materials.

The project team also charged the experts with providing responsive and tailored written feedback to the practitioners about how they might address specific cross-border issues relevant to each of their projects.

Guidance & Analysis

Case Study

Extrapolating from the issues analyzed in the round tables, the practitioners’ statements, and the experts’ written analyses, the Project Team developed a hypothetical case study reflective of “typical” cross-border LLTDM issues that U.S.-based practitioners encounter. The case study provides basic guidance to support U.S. researchers in navigating cross-border TDM issues, while also highlighting questions that would benefit from further research. 

The case study examines cross-border copyright, contracts, and privacy & ethics variables across two distinct paradigms: first, a situation where U.S.-based researchers perform all TDM acts in the U.S., and second, a situation where U.S.-based researchers engage with collaborators abroad, or otherwise perform TDM acts both in the U.S. and abroad.

White Paper

The LLTDM-X white paper provides a comprehensive description of the project, including origins and goals, contributors, activities, and outcomes. Of particular note are several project takeaways and recommendations, which the project team hopes will help inform future research and action to support cross-border text data mining. Project takeaways touched on seven key themes: 

  1. Uncertainty about cross-border LLTDM issues indeed hinders U.S. TDM researchers, confirming the need for education about cross-border legal issues; 
  2. The expansion of education regarding U.S. LLTDM literacies remains essential, and should continue in parallel to cross-border education; 
  3. Disparities in national copyright, contracts, and privacy laws may incentivize TDM researcher “forum shopping” and exacerbate research bias;
  4. License agreements (and the concept of “contractual override”) often dominate the overall analysis of cross-border TDM permissibility;
  5. Emerging lawsuits about generative artificial intelligence may impact future understanding of fair use and other research exceptions; 
  6. Research is needed into issues of foreign jurisdiction, likelihood of lawsuits in foreign countries, and likelihood of enforcement of foreign judgments in the U.S. However, the overall “risk” of proceeding with cross-border TDM research may remain difficult to quantify; and
  7. Institutional review boards (IRBs) have an opportunity to explore a new role or build partnerships to support researchers engaged in cross-border TDM.

Gratitude & Next Steps

Thank you to the practitioners, experts, and project team, and to the National Endowment for the Humanities for its generous funding, for making this project a success.

We aim to broadly share our project outputs to continue helping U.S.-based TDM researchers navigate cross-border LLTDM hurdles. We will continue to speak publicly to educate researchers and the TDM community regarding project takeaways, and to advocate for legal and ethical experts to undertake the essential research questions and begin developing much-needed educational materials. And, we will continue to encourage the integration of LLTDM literacies into digital humanities curricula, to facilitate both domestic and cross-border TDM research.

[Note: this content is cross-posted on the Legal Literacies for Text and Data Mining project site and the UC Berkeley Library Update blog.]

Celebrating Wendy Hanamura, Internet Archive’s ‘Storyteller-in-Chief’

When Wendy Hanamura came to the Internet Archive nearly a decade ago, she used her talent as a journalist and media professional to share the story of the organization with an ever-growing audience.

“The power of storytelling is a tool that cannot be underestimated,” Hanamura said.

As she retires this fall as Director of Partnerships, she leaves a lasting imprint on the Archive. Hanamura’s creativity and dedication helped build new connections, attract more donors, and advance the mission of universal access to all knowledge.

“She became the storyteller-in-chief. She’s helped the organization understand itself and communicate what we’ve done,” said Brewster Kahle, founder of the Internet Archive. “Through that, she has really helped shape the Internet Archive.”

Wendy at the Internet Archive’s annual celebration in 2017, which she produced.

During her tenure, Hanamura oversaw projects big and small that expanded the visibility and sustainability of the Archive. She stewarded relationships that moved the organization into new areas, such as controlled digital lending and the decentralized web. Along the way, Hanamura became known for her personal touch, warmly moderating discussions, mentoring young staffers, and extending her spirit of generosity to others.

“Wendy makes things work. She thinks things through, connects competent people, works to the highest standard, translates between different types of people, and is decisive and diplomatic,” says Jeff Ubois, of Lever for Change, a nonprofit affiliate of the MacArthur Foundation.

Ubois got to know Hanamura when he was at the MacArthur Foundation and she was spearheading the Archive’s proposal for the MacArthur 100&Change competition. Although the Archive wasn’t awarded the grant, it was one of eight semi-finalists among nearly 2,000 applicants. The major endeavor, Ubois said, required Hanamura to thoroughly imagine and manage a multi-step process, while enlisting the support of others to participate. “Her vision on the one hand and her implementation skills on the other are superpowers,” Ubois said.

Brewster Kahle, Wendy Hanamura and John Gonzalez supporting Internet Archive’s “100&Change” grant submission.

Hanamura began her career in New York as a reporter-researcher at Time Magazine, after graduating summa cum laude from Harvard University. She moved into broadcast journalism and worked as a correspondent in Tokyo for World Monitor on the Discovery Channel. In the San Francisco Bay Area she worked at the local CBS affiliate, covering breaking news, and then as a producer at PBS. Hanamura ran an independent documentary company for 15 years, spent time as General Manager of independent TV network Link TV and was chief digital officer at KCET/Link before joining the Archive in 2014.

“I think of myself as a storyteller for change. What it comes down to is telling your story—your vision—convincingly and bringing others into that vision and finding ways to integrate them,” said Hanamura, 62, who lives in San Francisco. “That’s just always what I’ve done whether it be for Time Magazine, a television network, PBS or the Internet Archive.”

Preserving Important Voices

In building partnerships for the Internet Archive, Hanamura said the best pitch is always personal.

She would often hold up the book Executive Order 9066: The Internment of 110,000 Japanese Americans and explain how, after discovering it in her local library in sixth grade, it changed her life. The book is out of print and hard to find. When her son was taking a college class in Asian American identity, it would have been perfect for him. “But his generation believes that if it’s not online, it doesn’t exist,” Hanamura would say. “The only place he can access this book online is the Internet Archive.” She would go on to suggest there are more “valuable, precious” books that need to be available online to people everywhere in the world.

In a tribute to her father’s service in a WWII all-Japanese unit, Hanamura produced a documentary, “Honor Bound: A Personal Journey, the Story of the 100th/442nd Regimental Combat Team.” Hanamura also became involved in helping Densho, a Seattle-based nonprofit, build a Digital Library of Japanese American Incarceration materials at the Internet Archive. Her own mother was 14 when she was incarcerated in wartime camps and today, at 95, is one of the few remaining survivors—a story Hanamura told for the Archive’s 25th anniversary.

Digital Library of Japanese American Incarceration

“The people who knew and experienced the Japanese American incarceration are dying in great numbers,” Hanamura said. “I really wanted to make sure that the most important voices, the most important literature and research was captured and preserved for all time.”

Hanamura was able to secure support from the U.S. Department of the Interior and the National Park Service’s Japanese American Confinement Sites program to partially fund the effort.

“Wendy’s love of history, community, and story made working with her and the Internet Archive a joy,” said Tom Ikeda, the founder of Densho who collaborated with Hanamura on the project. “[The Archive] was a natural partner to preserve and share over 1,000 video-recorded oral histories of Japanese Americans incarcerated during WWII.”

‘A Great Connector’

At the Archive, Kahle said that Hanamura explained not only the history of the organization and the dream of the internet as a library, but how it can help people with challenges in the world. She established online fundraising efforts that people cared about—cultivating thousands of donors through her work. Hanamura grew the Archive’s annual contributions from about $350,000 when she started to about $5 million when she turned over philanthropic operations in 2019.

Brewster Kahle and Wendy Hanamura at the DWeb Summit, 2018.

When it came time to celebrate the Archive’s 25th anniversary, it was Hanamura who coordinated the coverage, content and celebration. She also developed the “Way Forward” part of the campaign, an innovative approach to consider where the world would be in 25 years if access to knowledge was not protected.

“Wendy cares deeply about social issues and has a great sense of what might be done to make the world a little more creative, fair, open, and prosperous,” Ubois says.

Hanamura also cares about the rising generation of professionals in technology who will carry on this work in the coming years. She has supported the decentralized web community and helped it coalesce, from holding Decentralized Web Summits beginning in 2016 to producing DWeb Camps starting in 2019.

“I admire her ability to be such a great connector, which comes from her acute ability in understanding people’s interests and strengths,” said Mai Ishikawa Sutton, who has worked alongside Hanamura as a co-organizer of the camp. “She has an empathetic ear to people as they talk about what they’re working on and what they’re struggling with. She’s a dynamic force when it comes to helping people who are building a better decentralized, resilient web.”

Held in California, the first camp in 2019 drew about 350 people, and by 2023 it had grown to more than 500 participants. From planning the technical sessions to the nitty-gritty logistics of catering and music, Hanamura’s leadership makes it happen, said Ishikawa Sutton.

Wendy and DWeb Camp organizers, 2019.

Catherine Stihler, CEO of Creative Commons, said Hanamura exudes kindness and empathy. “She has an ability to make you feel so welcome. She’s one of those people if you’re in a room of strangers, they aren’t strangers for long,” says Stihler, who has watched her energy, joy and inclusion bring people together at the DWeb camps.

“[Hanamura] is central to the success of this camp. She has her fingerprint on everything from top to bottom,” Ishikawa Sutton said. “She is able to really envision what it means to build a meaningful event where people can share their work, have fun, and be their full selves.”

As a result of the gatherings, many participants have built trust, collaborated on projects, shared grants and been hired, in part thanks to Hanamura’s effort to connect people, according to Ishikawa Sutton, who is a co-founder and editor of COMPOST, an online magazine about the digital commons, and project manager of Distributed.Press. Hanamura has also played a critical role in making the DWeb organization financially sustainable, they added.

“What I love about the Internet Archive is that we are always looking to what’s next and over the horizon,” said Hanamura, who sees great potential in the young, idealistic supporters of the DWeb, calling it a “life-giving movement.”

Lasting Impact

Although Hanamura retired from her position at the Internet Archive Sept. 14, she plans to continue to be involved in the DWeb community as a volunteer.

Hanamura said her decision to step down as Director of Partnerships was prompted by her desire to give her full attention to caregiving for her husband, who has a terminal illness.

Among her final projects at the Archive was creating a meditation garden outside the Funston Avenue building to provide a quiet space for reflection.

The meditation garden that Wendy designed & landscaped as a parting gift to Internet Archive staff and friends.

“Working at the Internet Archive has been a gift and such an education,” Hanamura said. “Here, we have the mission to create lasting impact, lasting stories, and lasting artifacts. I am so grateful to the Internet Archive for giving me the opportunity to use my skills toward that end.”

Added Kahle: “The Internet Archive staff and patrons have felt the power of Wendy Hanamura spending her lifeforce building a library all these years.”

Internet Archive to Honor Supervisor Connie Chan with 2023 Hero Award

The Internet Archive announced today that Connie Chan, Supervisor of San Francisco’s District 1, will receive the 2023 Internet Archive Hero Award. Supervisor Chan will be presented the award on stage at next week’s evening celebration at the Internet Archive.

The Internet Archive Hero Award is an annual award that recognizes those who have exhibited leadership in making information available for digital learners all over the world. Previous recipients have included public domain advocate Carl Malamud, librarians Kanta Kapoor and Lisa Radha Vohra, copyright expert Michelle Wu, the Biodiversity Heritage Library, and the Grateful Dead.

In April, Supervisor Chan, whose district includes the Internet Archive, authored a resolution backing the Internet Archive and the digital rights of all libraries, which the San Francisco Board of Supervisors passed unanimously. “At a time when we are seeing an increase in censorship and book bans across the country, we must move to preserve free access to information,” said Supervisor Chan, about the resolution. “I am proud to stand with the Internet Archive, our Richmond District neighbor, and digital libraries throughout the United States.”

Supervisor Connie Chan with Internet Archive’s Brewster Kahle and digital library supporters rally for the digital rights of libraries on the steps of San Francisco City Hall, April 19, 2023.

Many thanks to Supervisor Chan for being a strong advocate for libraries, and for making San Francisco the first municipality to codify the importance of digital libraries and controlled digital lending in a resolution. For this fearless act of standing with libraries, the Internet Archive is proud to honor Supervisor Connie Chan with the 2023 Internet Archive Hero Award.

Join us next week on Thursday, October 12 at 7pm PT, as Supervisor Chan accepts the award live on stage during our evening celebration. Tickets are available for in-person attendance or the livestream.

Leveraging Technology to Scale Library Research Support: ARCH, AI, and the Humanities

Kevin Hegg is Head of Digital Projects at James Madison University Libraries (JMU). Kevin has held many technology positions within JMU Libraries. His experience spans a wide variety of technology work, from managing computer labs and server hardware to developing a large open-source software initiative. We are thankful to Kevin for taking time to talk with us about his experience with ARCH (Archives Research Compute Hub), AI, and supporting research at JMU.

Thomas Padilla is Deputy Director, Archiving and Data Services. 

Thomas: Thank you for agreeing to talk more about your experience with ARCH, AI, and supporting research. I find that folks are often curious about what set of interests and experiences prepares someone to work in these areas. Can you tell us a bit about yourself and how you began doing this kind of work?

Kevin: Over the span of 27 years, I have held several technology roles within James Madison University (JMU) Libraries. My experience ranges from managing computer labs and server hardware to developing a large open-source software initiative adopted by numerous academic universities across the world. Today I manage a small team that supports faculty and students as they design, implement, and evaluate digital projects that enhance, transform, and promote scholarship, teaching, and learning. I also co-manage Histories Along the Blue Ridge which hosts over 50,000 digitized legal documents from courthouses along Virginia’s Blue Ridge mountains.

Thomas: I gather that your initial interest in using ARCH was to see what potential it afforded for working with James Madison University’s Mapping the Black Digital and Public Humanities project. Can you introduce the project to our readers?

Kevin: The Mapping the Black Digital and Public Humanities project began at JMU in Fall 2022. The project draws inspiration from established resources such as the Colored Conventions Project and the Reviews in Digital Humanities journal. It employs Airtable for data collection and Tableau for data visualization. The website features a map that not only geographically locates over 440 Black digital and public humanities projects across the United States but also offers detailed information about each initiative. The project is a collaborative endeavor involving JMU graduate students and faculty, in close alliance with JMU Libraries. Over the past year, this interdisciplinary team has dedicated hundreds of hours to data collection, data visualization, and website development.

Mapping the Black Digital and Public Humanities, project and organization type distribution

The project has achieved significant milestones. In Fall 2022, Mollie Godfrey and Seán McCarthy, the project leaders, authored “Race, Space, and Celebrating Simms: Mapping Strategies for Black Feminist Biographical Recovery,” highlighting the value of such mapping projects. At the same time, graduate student Iliana Cosme-Brooks undertook a monumental data collection effort. During the winter months, Mollie and Seán spearheaded an effort to refine the categories and terms used in the project through comprehensive research and user testing. By Spring 2023, the project was integrated into the academic curriculum, where a class of graduate students actively contributed to its inaugural phase. Funding was obtained to maintain and update the database and map during the summer.

Looking ahead, the project team plans to present their work at academic conferences and aims to diversify the team’s expertise further. The overarching objective is to enhance the visibility and interconnectedness of Black digital and public humanities projects, while also welcoming external contributions for the initiative’s continual refinement and expansion.

Thomas: It sounds like the project adopts a holistic approach to experimenting with and integrating the functionality of a wide range of tools and methods (e.g., mapping, data visualization). How do you see tools like ARCH fitting into the project and research services more broadly? What tools and methods have you used in combination with ARCH?

Kevin: ARCH offers faculty and students an invaluable resource for digital scholarship by providing expansive, high-quality datasets. These datasets enable more sophisticated data analytics than typically encountered in undergraduate pedagogy, revealing patterns and trends that would otherwise remain obscured. Despite the increasing importance of digital humanities, a significant portion of faculty and students lack advanced coding skills. The advent of AI-assisted coding platforms like ChatGPT and GitHub Copilot has democratized access to programming languages such as Python and JavaScript, facilitating their integration into academic research.

For my work, I employed ChatGPT and Copilot to further process ARCH datasets derived from a curated sample of 20 websites focused on Black digital and public humanities. Utilizing PyCharm—an IDE freely available for educational purposes—and the Copilot extension, my coding efficiency improved tenfold.

Next, I leveraged ChatGPT’s Advanced Data Analysis plugin to deconstruct visualizations from Stanford’s Palladio platform, a tool commonly used for exploratory data visualizations but lacking a means for sharing the visualizations. With the aid of ChatGPT, I developed JavaScript-based web applications that faithfully replicate Palladio’s graph and gallery visualizations. Specifically, I instructed ChatGPT to employ the D3 JavaScript library for ingesting my modified ARCH datasets into client-side web applications. The final products, including HTML, JavaScript, and CSV files, were made publicly accessible via GitHub Pages (see my graph and gallery on GitHub Pages).

Black Digital and Public Humanities websites, graph visualization

In summary, the integration of Python and AI-assisted coding tools has not only enhanced my use of ARCH datasets but also enabled the creation of client-side web applications for data visualization.
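
As a rough illustration of the dataset reshaping step described above, the following Python sketch turns an edge-list CSV into the nodes-and-links JSON shape that D3 force-directed graph examples commonly consume. It is not the code Kevin wrote. The column names (`source`, `target`, `count`), the function name `edges_to_graph`, and the sample domains are all assumptions made for the example, and a real ARCH export may use different headers.

```python
import csv
import io
import json
from collections import defaultdict


def edges_to_graph(csv_text, source_col="source", target_col="target", weight_col="count"):
    """Aggregate an edge-list CSV into a {"nodes": [...], "links": [...]} dict.

    The column names are hypothetical defaults; pass the real headers
    of your own dataset.
    """
    weights = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        src = row[source_col].strip()
        dst = row[target_col].strip()
        if not src or not dst or src == dst:
            continue  # skip incomplete rows and self-links
        weights[(src, dst)] += int(row[weight_col])

    # Every domain that appears on either end of a link becomes a node.
    names = sorted({name for pair in weights for name in pair})
    links = [
        {"source": s, "target": t, "value": w}
        for (s, t), w in sorted(weights.items())
    ]
    return {"nodes": [{"id": n} for n in names], "links": links}


# Made-up sample data, for illustration only.
SAMPLE = """source,target,count
example-a.org,example-b.org,3
example-a.org,example-b.org,2
example-b.org,example-c.org,1
example-c.org,example-c.org,9
"""

graph = edges_to_graph(SAMPLE)
print(json.dumps(graph, indent=2))
```

Writing the result to a JSON file next to the HTML page would let a client-side script load it without a server, which suits static hosting of the kind mentioned above.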

Thomas: Beyond pairing ChatGPT with ARCH, what additional uses are you anticipating for AI-driven tools in your work?

Kevin: AI-driven tools have already radically transformed my daily work. I am using AI to reduce or even eliminate repetitive, mindless tasks that take tens or hundreds of hours. For example, as part of the Mapping project, ChatGPT+ helped me transform an Airtable base with almost 500 rows and two dozen columns into a series of 500 blog posts on a WordPress site. ChatGPT+ understands the structure of a WordPress export file. After a couple of hours of iterating through my design requirements with ChatGPT, I was able to import 500 blog posts into a WordPress website. Without this intervention, this task would have required over a hundred hours of tedious copying and pasting. Additionally, we have been using AI-enabled platforms like Otter and Descript to transcribe oral interviews.
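To make the table-to-posts workflow concrete, here is a minimal editorial sketch (again, not Kevin's actual script) that converts rows of a CSV export into a heavily simplified WordPress export (WXR) document. The column names `Name` and `Description`, the `admin` author, and the helper names are assumptions for illustration. Only a small subset of WXR elements is emitted, and whether the WordPress importer accepts a file this minimal has not been verified here.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespaces used by WordPress export files (only a subset is needed here).
NS = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "wp": "http://wordpress.org/export/1.2/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, tag):
    """Build a namespace-qualified tag name for ElementTree."""
    return f"{{{NS[prefix]}}}{tag}"


def rows_to_wxr(csv_text, title_col="Name", body_col="Description", author="admin"):
    """Turn each CSV row into one <item> (post) in a simplified WXR document.

    Column names and the author login are hypothetical defaults.
    """
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, q("wp", "wxr_version")).text = "1.2"
    for row in csv.DictReader(io.StringIO(csv_text)):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = row[title_col]
        ET.SubElement(item, q("dc", "creator")).text = author
        ET.SubElement(item, q("content", "encoded")).text = row[body_col]
        ET.SubElement(item, q("wp", "post_type")).text = "post"
        # Import as drafts so posts can be reviewed before publishing.
        ET.SubElement(item, q("wp", "status")).text = "draft"
    return ET.tostring(rss, encoding="unicode")


# Made-up rows standing in for a table export.
sample = """Name,Description
Project One,<p>First project description.</p>
Project Two,<p>Second project description.</p>
"""

wxr = rows_to_wxr(sample)
print(wxr)
```

Generating one file for all rows, rather than pasting posts by hand, is what turns a task measured in hundreds of hours into one measured in a few iterations on the script.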

I foresee AI-driven tools playing an increasingly pivotal role in many facets of my work. For instance, natural language processing could automate the categorization and summarization of large text-based datasets, making archival research more efficient and our analyses richer. AI can also be used to identify entities in large archival datasets. Archives hold a treasure trove of artifacts waiting to be described and discovered. AI offers tools that will supercharge our construction of finding aids and item-level metadata.  

Lastly, AI could facilitate more dynamic and interactive data visualizations, like the ones I published on GitHub Pages. These will offer users a more engaging experience when interacting with our research findings. Overall, the potential of AI is vast, and I’m excited to integrate more AI-driven tools into JMU’s classrooms and research ecosystem.

Thomas: Thanks for taking the time, Kevin. To close out, whose work would you like people to know more about?

Kevin: Engaging in Digital Humanities (DH) within the academic library setting is a distinct privilege, one that requires a collaborative ethos. I am fortunate to be a member of an exceptional team at JMU Libraries, a collective too expansive to fully acknowledge here. AI has introduced transformative tools that border on magic. However, loosely paraphrasing Immanuel Kant, it’s crucial to remember that technology devoid of content is empty. I will use this opportunity to spotlight the contributions of three JMU faculty whose work celebrates our local community and furthers social justice.

Mollie Godfrey (Department of English) and Seán McCarthy (Writing, Rhetoric, and Technical Communication) are the visionaries behind two inspiring initiatives: the Mapping Project and the Celebrating Simms Project. The latter serves as a digital, post-custodial archive honoring Lucy F. Simms, an educator born into enslavement in 1856 who impacted three generations of young students in our local community. Both Godfrey and McCarthy have cultivated deep, lasting connections within Harrisonburg’s Black community. Their work strikes a balance between celebration and reparation. Collaborating with them has been as rewarding as it is challenging.

Gianluca De Fazio (Justice Studies) spearheads the Racial Terror: Lynching in Virginia project, illuminating a grim chapter of Virginia’s past. His relentless dedication led to the installation of a historical marker commemorating the tragic lynching of Charlotte Harris. De Fazio, along with colleagues, has also developed nine lesson plans based on this research, which are now integrated into high school curricula. My collaboration with him was a catalyst for pursuing a master’s degree in American History.

Racial Terror: Lynching in Virginia

Both the Celebrating Simms and Racial Terror projects are highlighted in the Mapping the Black Digital and Public Humanities initiative. The privilege of contributing to such impactful projects alongside such dedicated individuals has rendered my extensive tenure at JMU both meaningful and, I hope, enduring.

Book Talk: The Internet Con by Cory Doctorow

Join us for a virtual book talk with author Cory Doctorow about THE INTERNET CON, the disassembly manual we need to take back our internet.

REGISTER NOW

When the tech platforms promised a future of “connection,” they were lying. They said their “walled gardens” would keep us safe, but those were prison walls.

The platforms locked us into their systems and made us easy pickings, ripe for extraction. Twitter, Facebook and other Big Tech platforms are hard to leave by design. They hold hostage the people we love, the communities that matter to us, the audiences and customers we rely on. The impossibility of staying connected to these people after you delete your account has nothing to do with technological limitations: it’s a business strategy in service to commodifying your personal life and relationships.

We can – we must – dismantle the tech platforms. In The Internet Con, Cory Doctorow explains how to seize the means of computation, by forcing Silicon Valley to do the thing it fears most: interoperate. Interoperability will tear down the walls between technologies, allowing users to leave platforms, remix their media, and reconfigure their devices without corporate permission.

Interoperability is the only route to the rapid and enduring annihilation of the platforms. The Internet Con is the disassembly manual we need to take back our internet.

ABOUT THE AUTHOR
CORY DOCTOROW is a science fiction author, activist and journalist. He is the author of many books, most recently RADICALIZED and WALKAWAY, science fiction for adults; HOW TO DESTROY SURVEILLANCE CAPITALISM, nonfiction about monopoly and conspiracy; IN REAL LIFE, a graphic novel; and the picture book POESY THE MONSTER SLAYER. His latest book is ATTACK SURFACE, a standalone adult sequel to LITTLE BROTHER. In 2020, he was inducted into the Canadian Science Fiction and Fantasy Hall of Fame. He works for the Electronic Frontier Foundation, is an MIT Media Lab Research Affiliate, is a Visiting Professor of Computer Science at Open University, a Visiting Professor of Practice at the University of North Carolina’s School of Library and Information Science and co-founded the UK Open Rights Group.

Book Talk: The Internet Con by Cory Doctorow
Tuesday, October 31 @ 10am PT / 1pm ET
Register now for the virtual discussion!

Academic Librarian Leans on Internet Archive for Access and Analysis

For Meghan Kwast, having access to the Internet Archive helps her library staff at California Lutheran University operate more efficiently to better serve faculty and students.  

Meghan Kwast, head of collection management services, California Lutheran University

Budgets and staffing limitations have forced Kwast to come up with some creative strategies to meet the needs of users. This includes tapping into the digital resources available through the Internet Archive—especially when there are requests for items not in the university stacks.

“While Interlibrary Loan is available for most scholars, delivery times can vary from a few days to several weeks,” said Kwast, head of collection management services at Cal Lutheran in Thousand Oaks, California. “For researchers and scholars, this is time lost. Internet Archive saves them from these delays.”

The broader, virtual collection often includes titles on niche subjects that the Cal Lutheran library doesn’t carry. Also, providing digital rather than print materials reduces ILL shipping costs and avoids problems with physical deliveries due to weather, Kwast added.

‘A USEFUL TOOL’

For librarians like Kwast, the collections at the Internet Archive are helpful beyond connecting patrons with research materials. The Archive has been a useful tool in a campus project to evaluate the diversity of the Cal Lutheran print monograph collection.

Cal Lutheran enrolls about 3,200 undergraduate and graduate students in its College of Arts and Sciences, Bachelor’s Degree for Professionals, Graduate School of Education, School of Management, Graduate School of Psychology, and Pacific Lutheran Theological Seminary programs. The university operates across southern California, with its main campus in Thousand Oaks and satellite centers in Oxnard, Santa Maria and Westlake Village. The campus demographics have changed since the university was founded in 1959: students now come from 59 countries, and the university is designated as a Hispanic-Serving Institution.

Kwast said she wanted to be intentional about ensuring the library collection reflects the current student population. Last year, the library embarked on an audit of authors represented in its collection. As Kwast’s team began to evaluate the authors, they relied on the Archive’s search engine to find books digitally, rather than having to physically pull them off the shelves.

“Internet Archive makes that process faster and more efficient for us,” Kwast said. “Having these materials digitized makes this project achievable. It makes it possible for us to serve today’s students.”

It was evident early in the assessment that most titles were written by white, cisgender men. Now, about halfway through the review, Kwast said the library discovered just 2 percent of authors were Hispanic/Latino, yet about 40 percent of the Cal Lutheran population identifies as Hispanic/Latino.

“Some students from these communities are still trying to see themselves in higher education or in the field that they’re pursuing. The voices in our collection should reflect the voices on our campus, helping students see themselves in the research process and the sources they use,” Kwast said. “Where our collections are now is not reflective of where our community is.”

Based on what was discovered in the author assessment, this fiscal year Cal Lutheran created a new item in its library budget specifically for purchasing books written by authors who are diverse by race, ethnicity, gender, sexuality, and ability. The library also started a diverse authors table to highlight some of these works, Kwast noted.

EQUITABLE POINTS OF ACCESS

The Internet Archive’s vast collection of digital resources is more needed than ever, Kwast added. During the pandemic, when access to campus buildings was limited, the Archive helped Cal Lutheran keep its library users connected. “Electronic resources and digital access to information are critical for public safety,” Kwast said.

Today, public libraries still have barriers to accessing materials, Kwast noted. Many of them require patrons to come on-site after registering for a card to verify identification and residence. For those without a home or those who work during normal business hours, this is an insurmountable challenge. Internet Archive removes some of those obstacles by providing 24/7 remote access from any location.

Documents that should be publicly available, such as those produced by Congress and public universities, are instead hidden behind paywalls and layers of complication, Kwast said. Internet Archive helps provide equitable points of access to information, which is a necessity today regardless of a user’s income or ability, she added.

“As librarians and information professionals, we are dealing with an information landscape that a lot of folks take for granted,” Kwast said, as digital collections are constantly changing with licensing limitations. “Just because [access] is not a problem for you as an individual does not mean it isn’t a very real issue that other folks face in their daily lives.”

Brewster Goes to Washington – Congressional Hearing on the Copyright Office Modernization Committee

A good day in Washington. After I spent two years on the Copyright Office Modernization Committee, helping advise the Copyright Office on its new registration and recordation process, a Republican and a Democrat from the House of Representatives held a hearing to ask questions of committee members. It was such a refreshing scene because it was bipartisan, they knew the issues, and they were spending time finding out what we suggested.

This all matters because the Copyright Office is moving to digital filings, which is an improvement, and because it could make way for efficient submissions of digital files. This would be a major way for the Library of Congress to get copies of books that it would own, preserve, and make somewhat accessible.

Another attendee said they had gone to congressional meetings for 30 years and this one had the most engagement of any of them. A good day in Washington, indeed.

Internet Archive is a Digital Oasis for Book and Music Lovers on Remote Vermont Island

Image: islelamotte.us

Living in the middle of Lake Champlain in Vermont, Eleanor Martinez says she enjoys the beautiful scenery all around, especially the fall foliage. It’s been an idyllic place to retire, but there is one thing she misses: a public library.

Martinez and her husband, Sid, live on Isle La Motte, which is 7 miles long and 2 miles wide, is accessible by one bridge, and has a population of 400. There is a library on the island, but it is private, and open by appointment only. The public libraries in nearby towns have limited collections.

“The Internet Archive has been a lifesaver,” says Martinez, who discovered the online collection about two years ago. She’s a regular user of the virtual library, checking out books and music on her laptop in the comfort of her rural home.

The wooded, nine-acre property was a draw for the retirees, who relocated in 2018, but it is remote. In the winter, it can sometimes take more than a week for a snowplow to reach their gravel road. Martinez, 66, lived most of her life in more urban areas in California and Minnesota where she enjoyed large, metropolitan public libraries nearby. The Internet Archive has provided access to materials she would not otherwise be able to enjoy in her small town.

Martinez has tapped into the Internet Archive to check out books, from “The Modern Temper” by Joseph Wood Krutch to “The Theory of the Leisure Class” by Thorstein Veblen. She enjoys vintage cookbooks and books on gardening, knitting and poetry.

Martinez found Down Beat magazines dating back to the 1930s about the jazz and blues scene. She’s also discovered music not available elsewhere on vinyl or CD.

“I was able to check out 33-1/3 records and 78s, too,” Martinez said. “This is a boon to those of us who don’t have access to large collections of records, and for those of us who are low-income and living on a fixed income.”

One of her favorite music items is “In a Clock Store,” a novelty recording from 1907 that includes sounds from a clock in the background. “I’m listening to something that is from a time when my grandfather would have been a teenager,” she said. “It was a different world.”

Another copy of that 78rpm recording shines a light on the importance of digitizing and preserving recordings on the obsolete medium—notes made by the audio engineer at the time of digitization indicate that the second side of this record wasn’t able to be preserved “due to physical condition of disc.”

After a pause, Martinez added a final thought: “The Internet Archive has just about everything I’ve been looking for—even things that are pretty obscure. It’s amazing.”