This Spring, the Internet Archive hosted two in-person workshops aimed at helping to advance library support for web archive research: Digital Scholarship & the Web and Art Resources on the Web. These one-day events were held at the Association of College & Research Libraries (ACRL) conference in Pittsburgh and the Art Libraries Society of North America (ARLIS) conference in Mexico City. The workshops brought together librarians, archivists, program officers, graduate students, and disciplinary researchers for full days of learning, discussion, and hands-on experience with web archive creation and computational analysis. The workshops were developed in collaboration with the New York Art Resources Consortium (NYARC) – and are part of an ongoing series of workshops hosted by the Internet Archive through Summer 2023.
Internet Archive Deputy Director of Archiving & Data Services Thomas Padilla discussing the potential of web archives as primary sources for computational research at Art Resources on the Web in Mexico City.
Designed in direct response to library community interest in supporting additional uses of web archive collections, the workshops had the following objectives: introduce participants to web archives as primary sources in the context of computational research questions; develop familiarity with research use cases that make use of web archives; and provide an opportunity to acquire hands-on experience creating web archive collections and computationally analyzing them using ARCH (Archives Research Compute Hub) – a new service set to publicly launch June 2023.
Internet Archive Community Programs Manager Lori Donovan walking workshop participants through a demonstration of Palladio using a dataset generated with ARCH at Digital Scholarship & the Web in Pittsburgh, PA.
In support of those objectives, Internet Archive staff walked participants through web archiving workflows, introduced a diverse set of web archiving tools and technologies, and offered hands-on experience building web archives. Participants were then introduced to Archives Research Compute Hub (ARCH). ARCH supports computational research with web archive collections at scale – e.g., text and data mining, data science, digital scholarship, machine learning, and more. ARCH does this by streamlining generation of and access to more than a dozen research-ready web archive datasets, in-browser visualization, dataset analysis, and open dataset publication. Participants further explored data generated with ARCH in Palladio, Voyant, and RAWGraphs.
Network visualization of the Occupy Web Archive collection, created using Palladio based on a Domain Graph Dataset generated by ARCH.
Gallery visualization of the CARTA Art Galleries collection, created using Palladio based on an Image Graph Dataset generated by ARCH.
At the close of the workshops, participants were eager to discuss web archive research ethics, research use cases, and a diverse set of approaches to scaling library support for researchers interested in working with web archive collections – truly vibrant discussions – and perhaps the beginnings of a community of interest! We plan to host future workshops focused on computational research with web archives – please keep an eye on our Event Calendar.
As Twitter has entered the Musk era, many people are leaving the platform or rethinking its role in their lives. Whether they join another platform like Mastodon (as I have) or continue on at Twitter, the instability occasioned by Twitter’s change in ownership has revealed an underlying instability in our digital information ecosystem.
Many have now seen how, when someone deletes their Twitter account, their profile, their tweets, even their direct messages, disappear. According to the MIT Technology Review, around a million people have left so far, and all of this information has left the platform along with them. The mass exodus from Twitter and the accompanying loss of information, while concerning in its own right, shows something fundamental about the construction of our digital information ecosystem: Information that was once readily available to you—that even seemed to belong to you—can disappear in a moment.
Losing access to information of private importance is surely concerning, but the situation is more worrying when we consider the role that digital networks play in our world today. Governments make official pronouncements online. Politicians campaign online. Writers and artists find audiences for their work and a place for their voice. Protest movements find traction and fellow travelers. And, of course, Twitter was a primary publishing platform of a certain U.S. president.
If Twitter were to fail entirely, all of this information could disappear from their site in an instant. This is an important part of our history. Shouldn’t we be trying to preserve it?
I’ve been working on these kinds of questions, and building solutions to some of them, for a long time. That’s part of why, over 25 years ago, I founded the Internet Archive. You may have heard of our “Wayback Machine,” a free service anyone can use to view archived web pages from the mid-1990s to the present. This archive of the web has been built in collaboration with over a thousand libraries around the world, and it holds hundreds of billions of archived webpages today – including those presidential tweets (and many others). In addition, we’ve been preserving all kinds of important cultural artifacts in digital form: books, television news, government records, early sound and film collections, and much more.
The scale and scope of the Internet Archive can give it the appearance of something unique, but we are simply doing the work that libraries and archives have always done: Preserving and providing access to knowledge and cultural heritage. For thousands of years, libraries and archives have provided this important public service. I started the Internet Archive because I strongly believed that this work needed to continue in digital form and into the digital age.
While we have had many successes, it has not been easy. Like the record labels, many book publishers didn’t know what to make of the internet at first, but now they see new opportunities for financial gain. Platforms, too, tend to put their commercial interests first. Don’t get me wrong: Publishers and platforms continue to play an important role in bringing the work of creators to market, and sometimes assist in the preservation task. But companies close, and change hands, and their commercial interests can cut against preservation and other important public benefits.
Traditionally, libraries and archives filled this gap. But in the digital world, law and technology make their job increasingly difficult. For example, while a library could always simply buy a physical book on the open market in order to preserve it on their shelves, many publishers and platforms try to stop libraries from preserving information digitally. They may even use technical and legal measures to prevent libraries from doing so. While we strongly believe that fair use law enables libraries to perform traditional functions like preservation and lending in the digital environment, many publishers disagree, going so far as to sue libraries to stop them from doing so.
We should not accept this state of affairs. Free societies need access to history, unaltered by changing corporate or political interests. This is the role that libraries have played and need to keep playing. This brings us back to Twitter.
In 2010, Twitter had the tremendous foresight of engaging in a partnership with the Library of Congress to preserve old tweets. At the time, the Library of Congress had been tasked by Congress “to establish a national digital information infrastructure and preservation program.” It appeared that government and private industry were working together in search of a solution to the digital preservation problem, and that Twitter was leading the way.
It was not long before the situation broke down. In 2011, the Library of Congress issued a report noting the need for “legal and regulatory changes that would recognize the broad public interest in long-term access to digital content,” as well as the fact that “most libraries and archives cannot support under current funding” the necessary digital preservation infrastructure. But no legal and regulatory changes have been forthcoming, and even before the 2011 report, Congress pulled tens of millions of dollars out of the preservation program. In these circumstances, it is perhaps unsurprising that, by 2017, the Library of Congress had ceased preserving most old tweets, and the National Digital Information Infrastructure and Preservation Program (NDIIPP) is no longer an active program at the Library of Congress. Furthermore, it is not clear whether Twitter’s new ownership will take further steps of its own to address the situation.
Whatever Musk does, the preservation of our digital cultural heritage should not have to rely on the beneficence of one man. We need to empower libraries by ensuring that they have the same rights with respect to digital materials that they have in the physical world. Whether that means archiving old tweets, lending books digitally, or even something as exciting (to me!) as 21st century interlibrary loan, what’s important is that we have a nationwide strategy for solving the technical and legal hurdles to getting this done.
Internet Archive has begun gathering content for the Digital Library of Amateur Radio and Communications (DLARC), which will be a massive online library of materials and collections related to amateur radio and early digital communications. The DLARC is funded by a significant grant from the Amateur Radio Digital Communications (ARDC), a private foundation, to create a digital library that documents, preserves, and provides open access to the history of this community.
The library will be a free online resource that combines archived digitized print materials, born-digital content, websites, oral histories, personal collections, and other related records and publications. The goals of the DLARC are to document the history of amateur radio and to provide freely available educational resources for researchers, students, and the general public. This innovative project includes:
A program to digitize print materials, such as newsletters, journals, books, pamphlets, physical ephemera, and other records from institutions, groups, and individuals.
A digital archiving program to archive, curate, and provide access to “born-digital” materials, such as digital photos, websites, videos, and podcasts.
A personal archiving campaign to ensure the preservation and future access of both print and digital archives of notable individuals and stakeholders in the amateur radio community.
Conducting oral history interviews with key members of the community.
Preservation of all physical and print collections donated to the Internet Archive.
The DLARC project is looking for partners and contributors with troves of ham radio, amateur radio, and early digital communications related books, magazines, documents, catalogs, manuals, videos, software, personal archives, and other historical records collections, no matter how big or small. In addition to physical material to digitize, we are looking for podcasts, newsletters, video channels, and other digital content that can enrich the DLARC collections. Internet Archive will work directly with groups, publishers, clubs, individuals, and others to ensure the archiving and perpetual access of contributed collections, their physical preservation, their digitization, and their online availability and promotion for use in research, education, and historical documentation. All collections in this digital library will be universally accessible to any user and there will be a customized access and discovery portal with special features for research and educational uses.
We are extremely grateful to ARDC for funding this project and are very excited to work with this community to explore a multi-format digital library that documents and ensures access to the history of a specific, noteworthy community. Anyone with material to contribute to the DLARC library, questions about the project, or interest in similar digital library building projects for other professional communities, please contact:
Kay Savetz, K6KJN Program Manager, Special Collections email@example.com Twitter: @KaySavetz
This post is part of a series written by members of the Internet Archive’s Community Webs program. Community Webs advances the capacity for community-focused memory organizations to build web and digital archives documenting local histories and underrepresented voices. For more information, visit communitywebs.archive-it.org/
What is an archivist to do when items of public record, which have been systematically added to publicly accessible collections for over a century, suddenly turn from paper into bits and bytes that disappear from the web, or even get stuck behind paywalls? Like many in my profession, I’ve been grappling with this question for a while. Having no real training in digital archiving and facing this quandary as a lone arranger, it’s sometimes hard to keep that grappling from turning into low-key panicking that my inaction has been causing information to be lost forever.
Imagine my excitement, then, when I learned about the Community Webs program – access to and training for Archive-It, collaboration with the Internet Archive, and a network of others like me to bounce ideas off and get inspiration from? Yes please! With the blessing of my boss, I applied right away and my library joined the program in April 2021.
(This might be a good point for a quick introduction. I work as the archivist/local history librarian at the Waltham Public Library (WPL) in Waltham, Massachusetts. Waltham is a city about 10 miles west of Boston, and is home to an ethnically and economically diverse population of just over 62,000 people. The WPL is a fully-funded community hub, fostering a healthy democratic society by providing a wealth of current informational, educational, and recreational resources free of charge to all members of the community. The library is known throughout the area for its knowledgeable and friendly staff, welcoming and safe environment, accessibility, convenience, current technology, and helpful assistance.)
I eagerly dove into the program and used our first web-archive collection – Waltham Public Library – as a testing ground, a place to gain familiarity with both Archive-It and the whole process of web archiving. I’ve been trying to capture content that aligns with the material found in the library’s analog records – annual reports, policies, announcements, event flyers, records from our Friends group, etc. – by doing a weekly crawl of the library website, our Friends website, and the library’s Twitter feed. For the most part this collection has been thankfully pretty straightforward.
Our largest collection so far is COVID-19 in Waltham, which makes up a portion of the library’s very first born-digital archival collection. That collection began in April 2020, when the WPL (like most other places) was closed to help “flatten the curve.” A month or two prior, as the pandemic was building steam, I had become fascinated with the 1918 influenza. A poke through our archives for the topic had been disappointing, as there wasn’t too much beyond a couple of newspaper clippings, brief mentions in the library trustees’ minutes, and a few pages in the records of the local nurses’ association. I was hoping to put together a better picture of what it was like to live in Waltham during the flu, perhaps to give myself a glimpse of what I could expect in the coming weeks (heh… how naïve I was).
I put out a call via the library’s social media for those who lived, worked, and/or went to school in Waltham to share their stories, hoping to build the kind of collection I wanted and failed to find from 1918. There was an initial rush of Google Form submissions, a handful of photos, and one video, and then nothing. I was pleased we had received some materials, but still wanted to paint a broader picture of Waltham under Covid. Enter Community Webs! For the past several months I’ve been working to collect retroactively what I was hoping to capture at the time – news articles, videos, the city website, information from the schools, and so on. While it’s not as comprehensive as it might have been if I’d been able to gather it all as it happened, I’ve been able to save over 500 GB of data that will help those in the future to better imagine what it was like to live in Waltham during Covid.
Finally, related to the quandary in the first paragraph of this post, our most complicated collection is the Waltham News Tribune. The WPL has microfilm copies of the paper going back to its earliest iteration in the 1860s, and part of my job has been to collect each issue and send yearly batches to a vendor for microfilming. However, as of this past May, the publisher has moved the paper entirely online, with some content requiring a paid subscription to view. The WPL has a subscription so that we can continue to provide free access to our patrons, but what happens to our archive of back issues? Does it just stop abruptly in May 2022, even as time and local news continue to march on? As it is, our microfilm is heavily used, especially since the paper’s offices burned down in 1999, making ours the only existing archive.
Thanks to web archiving, we’re able to continue to fulfill our unofficial role as the repository for the city newspaper, at least in theory. In practice, I look at the daily crawls of the digital edition of the paper and can’t help but see that it is no longer the type of local news we’ve been archiving for over a century. The corporate publisher of the paper has consolidated ours with those from several other local cities and towns, and has sacrificed true local news coverage for more generic topics, many of which aren’t even related specifically to Massachusetts. This is a problem that sits well outside of my archives wheelhouse, but at least I feel I can do my due diligence by capturing what local news does trickle through.
I’ve had a slower go of web archiving than I’d like so far, thanks to several months of parental leave in 2021 and a very packed part-time work schedule. Nevertheless, I’ve been chipping away at our collections and planning for more, with an eye to add more diverse voices than those that make up much of our analog collections. I’m grateful for the encouragement and help I’ve received from Community Webs staff and peers, and want to give a special shout-out to the Archive-It folks who hold office hours to assist us with technical issues! This really is a fantastic program, and I’m so glad my library is part of it.
Guest Post by: Tricia Dean, Tech Services Manager at Wilmington Public Library District (IL)
This post is part of a series written by members of the Community Webs program. Community Webs advances the capacity for community-focused memory organizations to build web and digital archives documenting local histories and underrepresented voices. For more information, visit communitywebs.archive-it.org/
I was excited when I saw the call for participants in Community Webs. While Wilmington, Illinois is a small, rural town (5,664 people), the thought was that we still had something to contribute. Most Archive-It partners are universities, museums and large libraries, and being in their company was a little daunting to me initially. Other institutions have someone who opens the project, and then it develops into a larger team project. Wilmington Public Library District (WPLD) has a much smaller staff; the project has been wholly mine, which has been both thrilling and terrifying.
Wilmington is a small rural town, falling on the lower end of the economic scale. Because we are isolated, the library plays a vital part in the community. We offer the usual storytimes and adult programs, but also loan out hotspots and ChromeBooks. We have 45 hotspots and these are almost always checked out; some people are using them for vacations, but by usage it is apparent that others are using them as their primary means of connecting to the Internet. Internet access has become more and more important, but after Covid-19 broke out, more governmental services went strictly online, making access even more critical – and to many who had not been regular patrons. WPLD is a hub for the community, offering computers, information, tax forms, and a place to come in and chat – even more important when we are trying to stay close and limit outside contact.
I am a Chicago native who went to Champaign-Urbana for grad school. I was a scanner for the Internet Archive for several years where I was privileged to handle some incunabula (pre-1500 items). I am the Technical Services Supervisor at Wilmington; primarily I catalog our materials, but I also tend toward Projects, from adding series labels to re-orienting all the calls in the juvenile non-fiction section. I am currently going through our attic to help determine what we have (it’s a Mystery!). I’m making lists, and hoping to have items to scan which would be available online, in multiple places. I applied for the Community Webs program (with my director’s blessing) because I felt that it’s important for small towns to be represented in the collection of history. Only 20% of the population still lives outside major metro areas, but it is every bit as important to capture that life as it is to retain the history of large cities.
Wilmington Library joined Community Webs in the summer of 2021. After some technical clarifications with the Archive-It staff, WPLD was set up. In considering what made Wilmington unique, the first link was to our library and social media pages. Social media has grown in importance in the last twenty years, but it became a vital link during Covid when services were otherwise unavailable. Wilmington Library YouTube videos (how-tos, crafts, and storytime) stand to remind us of how we responded and serve as a continuing reference for parents who can’t get to the library. But since social media, specifically, is known for ‘right now,’ it lacks the kind of reflection over time that we can create through the Community Webs project.
We may be small, but we have a number of historical articles and sites which needed to be brought together. We want to reflect events that have been impactful to our community, from the explosion of the Joliet Armory in the 1940s to the continuing issues with the Wilmington Dam, which has proved dangerous, but has complicated ownership issues. I still have a long way to go; the projects (attic/local history/web archive) are all intertwined. Wilmington has the usual Community Resources and City Government collections in Archive-It. Going forward, we want to continue to develop our Wilmington History collection. We are working on local history and will establish a collection of materials from our attic and public donations. Our local paper has vertical files which could be a goldmine of information – again, on my to-do list. We will be kicking off an Oral History Project, which will begin with a series of simple gatherings/coffee hours for our seniors, providing a place for them to gather, and a space to share their stories. I am hoping these will be in our Community Webs archive. Who better to speak to where we’ve been and where we are than some of our oldest residents?
Why is Community Webs important? Because it will help us to remember when we cannot keep up with the information overload. Because there is so much happening that we miss a good deal of what is around us – or can’t bear to face it for long. Because so very very much of our lives is now online – and can be erased with a keystroke. Because we are seeing, painfully, that those who do not learn from the past will be/are condemned to re-live it. And, for Wilmington, I think it is important because so many of the voices and sites being captured are from museums, universities and large public libraries. It is important that we remember that we used to be far less urban than we are today. It is important to remember the smaller places, those that are too easily lost in the maelstrom of modern life, because to be forgotten is to be erased.
This post is part of a series written by members of Internet Archive’s Community Webs program. Community Webs advances the capacity for community-focused memory organizations to build web and digital archives documenting local histories and underrepresented voices. For more information, visit communitywebs.archive-it.org/
Can you describe your community and the services and role of your organization within the community?
Inuit Circumpolar Council (ICC) Alaska works on behalf of the Inupiat of the North Slope, Northwest and Bering Straits Regions; St. Lawrence Island Yupik; and the Central Yup’ik and Cup’ik of the Yukon-Kuskokwim Region in Southwest Alaska. ICC Alaska is a national member of ICC International. Since inception in 1977, ICC has gained consultative status II with the United Nations, and is a Permanent Participant of the Arctic Council.
For example, ICC has provisional status with the International Maritime Organization (IMO), is an active member at the Arctic Council senior level and within the working groups, and is a prominent voice at the UN Framework Convention on Climate Change (UNFCCC). Work and engagement occur in many ways at these different fora. Within the UNFCCC, ICC has taken a leadership role in putting forward Indigenous Knowledge and establishing a platform for providing equitable space for multiple knowledge systems. Additionally, at the UNFCCC COP 26, ICC Chair, Dr. Dalee Sambo Dorough, led an ICC delegation made up of Inuit representatives from across the Arctic.
An immense amount of work occurs in direct partnership with Inuit communities to inform work at international fora. For example, ICC is facilitating the development of international protocols for Equitable and Ethical Engagement. These protocols will provide a pathway to success for all that want to work within Inuit homelands and whose work impacts the Arctic. The protocols will aid in a paradigm shift in how work, decisions, and policies are currently created and carried out. The paradigm shift will lead toward greater equity and recognition of Inuit sovereignty and Self-determination.
Why was your organization interested in participating in Community Webs?
The Community Webs program was attractive to ICC because it provided the training and the storage to effectively preserve ICC’s digitized & born-digital archival materials. We were pleased to see this offering as a solution for an ongoing desire to archive the prolific organization’s digital materials & products. This work dovetails nicely with ICC Alaska’s efforts to digitize 47 boxes, or around 80 linear feet of material that span 6 decades, including audio, film, photographic media, and paper documents.
ICC Jam – part 2 – Greenland
Cultural programming as part of the 1983 General Assembly. In this clip, view performances from Greenland’s Tuktak Theater and a Greenlandic choir
ICC advocates for Inuit and Inuit way of life, highlighted by ICC’s General Assembly meetings. The ICC receives its mandate from a General Assembly held every four years. The General Assembly is the heart of the organization, providing an opportunity for sharing information, discussing common concerns, debating issues, and strengthening the unity between all Inuit across our homelands. Through the Community Webs project, ICC Alaska has been able to preserve archival video of the ICC General Assemblies going back 30 years using Archive-It and the Internet Archive, as well as all newsletters, press releases, resolutions, social media campaigns, and reports published on its website. These are a significant record of ICC advocacy, but more importantly, Inuit political and cultural heritage.
Why do you think it is important for public libraries, community archives, and other local and community-based organizations to do this work?
Community-based organizations are uniquely positioned as both a part of and apart from the community. This vantage point allows for the self-reflection and observation needed for web archiving, as well as the relationships within the community to create the space and dialogue needed for community archiving projects. By building more capacity within community-based organizations for web archiving and digital preservation efforts, we can expand the recorded historical narrative and humanities-based inquiries in a multitude of directions, to truly reflect the diversity of our world & time.
Where do you hope to see your web archiving program going?
While the core goal of this work is to make ICC documents and its historical narrative more accessible and discoverable within ICC, to ICC’s member organizations, international bodies, and researchers, our aspirations are much bigger. Our hope is that this web archive goes beyond the core goal to inspire, delight, hearten, inform, and add depth to the conversations Inuit are having about cultural identity, relationship to the land, hunting, advocacy, self-determination, and self-governance.
We are curious about the intangible outcomes: What new work does the archive inspire? How does the archive add depth & historical weight to existing projects, discussions, and advocacy? What stories and knowledge get re-remembered, or re-investigated after viewing archival materials? What advocacy, ethics, and philosophical works come from Inuit leaders informed by the legacy that the archive shared? Are youth leaders interested in adding to the archive?
Is there anything you would like your organization to contribute back to the broader community of web archiving and/or local history in the form of documentation, workflows, policy drafts or other resources?
We have several aspirations. Firstly, it is the telling of Inuit stories. The archive is another manifestation of that mission – to record and share Inuit voices across time. To increase access to those voices, information, knowledge, and history. The ICC Archival holdings are a historically unique & culturally significant telling of Inuit cultural heritage, history (including political history), educational pedagogy, philosophy, self-determination, values, ethics, environmental stewardship, and Indigenous Knowledge. It is important to create a way for Inuit to discover and interact with this work. Community Webs has offered a new tool in our toolkit.
Secondly, the goal is to move forward conversations about categorization and information management for indigenous communities. What does that look like in best practice? Can we, together with other Inuit archives, improve on existing practices to create a more equitable and ethical engagement with Inuit-produced information, the management of that information, and the discovery and access of that information?
What are you most excited to learn through your participation in Community Webs?
It was exciting to discover that many Inuit and Alaska Native resources have already been preserved using the Internet Archive. These resources are often affected by insufficient financial support. Being able to have a preserved and accessible copy of these resources is an important step towards creating the bigger picture of the historical record of Inuit advocacy. As part of the Community Webs meetings, it was exciting to hear from other tribal librarians and community archivists across the country & world. Additionally, it was exciting to hear from speakers whose work informs our community archival work at ICC Alaska – such as Chaitra Powell who created (among other amazing things) the “Archive in a Backpack” project.
What impact do you think web archiving could have within your community?
Hopefully this work inspires other organizations to also preserve their digital assets, creating a richer narrative of Inuit political and cultural heritage.
What do you foresee as some of the challenges you may face?
We are eager to preserve our social media channels that have replaced the DRUM newsletter as a vehicle for keeping our community up-to-date on ICC’s work. Ongoing challenges with Facebook and Instagram archiving are preventing us from doing that. Hopefully these issues are resolved in favor of the communities who created the content and who bring their community and connections to these software platforms.
The COVID-19 pandemic has been life-changing for people around the globe. As efforts to slow the progress of the virus unfolded in early 2020, librarians, archivists and others with interest in preserving cultural heritage began considering ways to document the personal, societal, and systemic impacts of the global pandemic. These efforts included preserving physical, digital and web-based information and artifacts for posterity and future research use.
In response, the Internet Archive’s Archive-It service launched a COVID-19 Web Archiving Special Campaign starting in April 2020 to allow existing Archive-It partners to increase their web archiving capacity or new partners to join to collect COVID-19 related content. In all, more than 100 organizations took advantage of the COVID-19 Web Archiving Special Campaign and more than 200 Archive-It partner organizations built more than 300 new collections specifically about the global pandemic and its effects on their regions, institutions, and local communities. From colleges, universities, and governments documenting their own responses to community-driven initiatives like Sonoma County Library’s Sonoma Responds Community Memory Archive, a variety of information has been preserved and made available. These collections are critical historical records in and of themselves, and when taken in aggregate will allow researchers a comprehensive view into life during the pandemic.
We have been exploring with partners ways to provide unified access to hundreds of individual COVID-related web collections created by Archive-It users. When the Institute of Museum and Library Services launched the American Rescue Plan grant program, which was part of the broader American Rescue Plan, a $1.9 trillion stimulus package signed into law on March 11, 2021, we applied and were awarded funding to build a COVID-19 Web Archive access portal – a dedicated search and discovery access platform for COVID-19 web collections from hundreds of institutions. The COVID-19 Web Archive will allow for browsing and full text search across diverse institutional collections and enable other access methods, including making datasets and code notebooks available for data analysis of the aggregate collections by scholars. This work will support scholars, public health officials, and the general public in fully understanding the scope and magnitude of our historical moment now and into the future. The COVID-19 Web Archive is unique in that it will provide a unified discovery mechanism to hundreds of aggregated web archive collections built by a diverse group of over 200 libraries from over 40 US states and several other nations, from large research libraries to small public libraries to government agencies. If you would like your Archive-It collection or a portion of it included in the COVID-19 Web Archive, please fill out this interest form by Friday, April 29, 2022. If you are an institution in the United States that has COVID-related web archives collected outside of Archive-It or Internet Archive services that you are interested in having included in the COVID-19 Web Archive, please contact firstname.lastname@example.org.
As the war intensifies in Ukraine, volunteers from around the world are working to archive digital content at risk of destruction or manipulation. The Internet Archive is supporting several preservation efforts including the Saving Ukrainian Cultural Heritage Online (SUCHO) initiative launched in early March.
“When we think about the internet, we think the data is always going to be there. But all this data exists on physical servers and they can get destroyed just like buildings and monuments,” said Quinn Dombrowski, academic technology specialist at Stanford University and co-founder of SUCHO. “A tremendous amount of effort and energy has gone into the development of these websites and digitized collections. The people of Ukraine put them together for a reason. They wanted to share their history, culture, language and literature with the world.”
More than 1,200 volunteers with SUCHO have saved 10 terabytes of data including 14,000 uploaded items (images and PDFs) and captured parts of 2,300 websites so far. This includes material from Ukrainian museums, library websites, digital exhibits, open access publications and elsewhere.
The initiative is using a combination of technologies to crawl and archive sites and content. Some of the information is stored at the Internet Archive, where it can be discovered and accessed using open-source software.
Staff at the Internet Archive are committed to assisting with the effort, which aligns with the organization’s mission of universal access to knowledge and its aim to make the web more useful and reliable, said Mark Graham, director of the Wayback Machine.
“This is a pivotal time in history,” he said. “We’re seeing major powers engaged in a war and it’s happening in the internet age where the platforms for information sharing and access we have built, and rely on, the Internet and the Web, are at risk.”
The Internet Archive is documenting and making information accessible that might not otherwise be available, Graham said. For years, the Wayback Machine has been archiving about 950 Russian news sites and 350 Ukrainian news sites. Stories that are deleted or altered are being archived for the historical record.
Recognizing the urgency of this moment, Dombrowski has been stunned by the offers of help from archivists, scholars, librarians involved in cultural heritage, and the general public. Volunteers need not have technical expertise or special language skills to be of value in the project.
“Many people were spending the days before they got involved with SUCHO scrolling the news and feeling helpless and wishing they could do something to contribute more directly towards helping out with the situation,” Dombrowski said. “It’s been really inspiring hearing the stories that people have told about what it’s meant to them to be able to be part of something like this.”
Gudrun Wirtz, head of the East European Department of the Bavarian State Library (Bayerische Staatsbibliothek) in Munich, was archiving on a smaller scale when she and other colleagues began to collaborate with SUCHO.
“We are committed to Ukraine’s heritage and horrified by this war against the people and their rich culture and the distorting of history going on,” Wirtz said. “As Germans we are especially shocked and reminded of our historical responsibility, because last time Ukraine was invaded it was 1941 by Nazi-Germany. We try to do everything we can at the moment.”
The invasion of Ukraine hits particularly close to home for Anna Kijas, a librarian at Tufts University and co-founder of SUCHO, who is a Polish immigrant with family members who lived through Soviet occupation following WWII.
“Contributing to the SUCHO effort is something tangible that I can do and bring my expertise as a librarian and digital humanist in order to help preserve as much of the cultural heritage of the Ukrainian people as is possible,” said Kijas.
The third co-founder of SUCHO, Sebastian Majstorovic, is with the Austrian Centre for Digital Humanities and Cultural Heritage.
The Internet Archive is providing technical support, tools and training to assist volunteers, including those with SUCHO, who are giving of their time.
Through Archive-It, a customizable self-service web archiving platform that captures, stores, and provides access to web-based content, free online accounts have been offered to volunteer archivists. Mirage Berry, business development manager for Archive-It, has coordinated support with other preservation partners including the Harvard Ukrainian Research Institute, the Center for Urban History of East Central Europe, and East European & Central Asian Studies Collections librarian Liladhar Pendse at University of California, Berkeley.
“It’s so incredible how quickly all of these archivists have pulled together to do this,” Berry said. “Everyone wants to do something. You don’t need to have a ton of technical experience. For anyone who is willing to learn, it’s a great jumping off point for web archiving.”
SUCHO organizers anticipate that after the immediate emergency of website archiving is over, there will be an ongoing need to stay vigilant with data curation of Ukrainian material. To learn more and get involved, visit http://www.sucho.org.
Guest post by: Olivia Radbill, Adult Services/Local History Librarian, South Pasadena Public Library
This post is part of a series written by members of Internet Archive’s Community Webs program. Community Webs advances the capacity for community-focused memory organizations to build web and digital archives documenting local histories and underrepresented voices. For more information, visit communitywebs.archive-it.org/
The South Pasadena Public Library (SPPL) is a single branch library system located in the small city of South Pasadena, California, just fifteen minutes from downtown Los Angeles. SPPL serves a population of approximately 25,000 residents, many of whom are very dedicated to preservation and local history. As the Adult Services/Local History Librarian at SPPL, I regularly interact with local organizations, City staff, City commissioners, and residents in search of the many little-known details of South Pasadena’s history. My role not only entails organizing, processing, and making accessible local history, but also archiving current events that will inevitably be the subject of future research.
While this series did quell some of the community desire to interact with the Local History Collection, it did not address the needs of the community with regard to born-digital content. The COVID-19 pandemic highlighted certain gaps in our collection. One of the most notable gaps was the lack of any born-digital or web-archived content. Previously, SPPL had relied primarily on physical donations and physical City documentation. However, once these objects became inaccessible to both Library staff and patrons during our initial COVID-19 closure in March 2020, we sought means of preserving documentation that has increasingly moved to exclusively web-based platforms. For example, in April 2020 the City of South Pasadena launched “City Hall Scoop”, an online blog intended to provide quick, reliable news updates to local residents. It became imperative for Library staff to actively seek out and ensure preservation of this kind of content.
At the onset of our involvement in the Community Webs program, I strove to ensure that the objective of our internet archiving was specific, consistent, and attainable. After careful consideration, the following categories were determined to be priorities for the SPPL Local History Collection: City Government, Local Newspapers, and Nonprofit Organizations. Based on these categories, we identified many relevant websites but chose to focus primarily on adding official websites and social media pages to the Archive-It platform. The Community Webs project has been an invaluable resource for addressing the needs of both the SPPL staff and the community. Online trainings have aided significantly in overcoming learning curves, helped us determine the scope of our archiving project, and allowed SPPL to create a system in which web-based content is an integral part of our Local History Collection. As of March 2022, SPPL has archived eleven websites, either once or on a recurring basis. We are hoping to archive 22 new sites by the end of the year, doubling the number we reached last year.