Art historians, critics, curators, humanities scholars and many others rely on the records of artists, galleries, museums, and arts organizations to conduct historical research and to understand and contextualize contemporary artistic practice. Yet many art-related materials that were once published in print are now available primarily or solely on the web and are thus ephemeral by nature. In response to this challenge, more than 40 art libraries spent the last 3 years developing a collective approach to preservation of web-based art materials at scale.
Supported by the Institute of Museum and Library Services and the National Endowment for the Humanities, the Collaborative ART Archive (CARTA) community has successfully aligned effort across libraries large and small, from Manoa, Hawaii to Toronto, Ontario and back, resulting in preservation of and access to 800 web-based art resources, organized into 8 collections (art criticism, art fairs and events, art galleries, art history and scholarship, artists’ websites, arts education, arts organizations, auction houses), totalling nearly 9 TB of data with continued growth. All collections are preserved in perpetuity by the Internet Archive.
Today, CARTA is excited to launch the CARTA portal – providing unified access to CARTA collections.
🎨 CARTA portal 🎨
The CARTA portal includes web archive collections developed jointly by CARTA members, as well as preexisting art-related collections from CARTA institutions, and non-CARTA member collections. CARTA portal development builds on the Internet Archive’s experience creating the COVID-19 Web Archive and Community Webs portal.
CARTA collections are searchable by contributing organization, collection, site, and page text. Advanced search supports more granular exploration by host, results per host, file types, and beginning and end dates.
Moving forward, CARTA aims to grow and diversify its membership in order to increase collective ability to preserve web-based art materials. If your art library would like to join CARTA, please express interest here.
Machine learning has many potential applications for working with GLAM (galleries, libraries, archives, museums) collections, though it is not always clear how to get started. This post outlines some of the possible ways in which open source machine learning tools from the Hugging Face ecosystem can be used to explore web archive collections made available via the Internet Archive’s ARCH (Archives Research Compute Hub). ARCH aims to make computational work with web archives more accessible by streamlining web archive data access, visualization, analysis, and sharing. Hugging Face is focused on the democratization of good machine learning. A key component of this is not only making models available but also doing extensive work around the ethical use of machine learning.
Below, I work with the Collaborative Art Archive (CARTA) collection focused on artist websites. This post is accompanied by an ARCH Image Dataset Explorer Demo. The goal of this post is to show how using a specific set of open source machine learning models can help you explore a large dataset through image search, image classification, and model training.
Later this year, Internet Archive and Hugging Face will organize a hands-on hackathon focused on using open source machine learning tools with web archives. Please let us know if you are interested in participating by filling out this form.
Choosing machine learning models
The Hugging Face Hub is a central repository which provides access to open source machine learning models, datasets and demos. Currently, the Hugging Face Hub has over 150,000 openly available machine learning models covering a broad range of machine learning tasks.
Rather than relying on a single model that may not be comprehensive enough, we’ll select a series of models that suit our particular needs.
A screenshot of the Hugging Face Hub task navigator presenting a way of filtering machine learning models hosted on the hub by the tasks they intend to solve. Example tasks are Image Classification, Token Classification and Image-to-Text.
Working with image data
ARCH currently provides access to 16 different “research ready” datasets generated from web archive collections. These include but are not limited to datasets containing all extracted text from the web pages in a collection, link graphs (showing how websites link to other websites), and named entities (for example, mentions of people and places). One of the datasets is made available as a CSV file, containing information about the images from webpages in the collection, including when the image was collected, when the live image was last modified, a URL for the image, and a filename.
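As a rough illustration of working with the image CSV described above, the sketch below parses a small in-memory sample and counts images per host. The column names (`crawl_date`, `last_modified_date`, `url`, `filename`) and the sample rows are assumptions made for this example; the actual ARCH export may label its columns differently.

```python
import csv
import io
from collections import Counter
from urllib.parse import urlparse

# Hypothetical sample standing in for a downloaded ARCH image CSV.
# The column names and rows are assumptions made for illustration.
SAMPLE_CSV = """crawl_date,last_modified_date,url,filename
20220104,20211201,https://example.org/art/painting.jpg,painting.jpg
20220104,20210615,https://example.org/art/sketch.png,sketch.png
20220211,20220101,https://example.com/studio/portrait.jpg,portrait.jpg
"""


def load_image_rows(csv_text):
    """Parse CSV text into a list of dicts, one per image."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def images_per_host(rows):
    """Count how many images were captured from each host."""
    return Counter(urlparse(row["url"]).netloc for row in rows)


rows = load_image_rows(SAMPLE_CSV)
host_counts = images_per_host(rows)
print(host_counts.most_common())
```

For a real download, the same functions could be fed the contents of the CSV file read from disk instead of `SAMPLE_CSV`.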
Screenshot of the ARCH interface showing a preview for a dataset. This preview includes a download link and an “Open in Colab” button.
One of the challenges we face with a collection like this is being able to work at a larger scale to understand what is contained within it – looking through thousands of images by hand is not practical. We address that challenge by making use of tools that help us better understand a collection at scale.
Building a user interface
Gradio is an open source library supported by Hugging Face that helps create user interfaces that allow other people to interact with various aspects of a machine learning system, including the datasets and models. I used Gradio in combination with Spaces to make an application publicly available within minutes, without having to set up and manage a server or hosting. See the docs for more information on using Spaces. Below, I show examples of using Gradio as an interface for applying machine learning tools to ARCH generated data.
I use the Gradio tab for random images to begin assessing images in the dataset. Looking at a randomized grid of images gives a better idea of what kind of images are in the dataset. This begins to give us a sense of what is represented in the collection (e.g., art, objects, people, etc.).
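A random image tab of this kind can be approximated with very little code. The sketch below is not the demo's actual implementation; it assumes a list of image URLs (the `example.org` URLs are placeholders) and uses Gradio's `Blocks`, `Gallery`, and `Button` components. Gradio is imported inside `build_demo` so the sampling function can be tried without it installed.

```python
import random


def sample_images(image_urls, k=9, seed=None):
    """Return up to k image URLs chosen at random, without repeats."""
    rng = random.Random(seed)
    urls = list(image_urls)
    return rng.sample(urls, min(k, len(urls)))


def build_demo(image_urls):
    """Build a small Gradio app with a shuffle button and an image grid."""
    import gradio as gr  # imported lazily; requires the gradio package

    with gr.Blocks() as demo:
        gallery = gr.Gallery(label="Random images")
        shuffle = gr.Button("Show another random sample")
        # Each click replaces the gallery contents with a fresh sample.
        shuffle.click(fn=lambda: sample_images(image_urls),
                      inputs=None, outputs=gallery)
    return demo


# Placeholder URLs; in practice these would come from the ARCH image CSV.
urls = [f"https://example.org/images/{i}.jpg" for i in range(100)]
picked = sample_images(urls, k=9, seed=0)
# build_demo(urls).launch()  # uncomment to serve the interface locally
```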
Screenshot of the random image gallery showing a grid of images from the dataset.
Introducing image search models
Looking at snapshots of the collection gives us a starting point for exploring what kinds of images are included in the collection. We can augment our approach by implementing image search.
There are various approaches we could take which would allow us to search our images. If we have the text surrounding an image, we could use this as a proxy for what the image might contain. For example, we might assume that if the text next to an image contains the words “a picture of my dog snowy”, then the image contains a picture of a dog. This approach has limitations – text might be missing, unrelated or only capture a small part of what is in an image. The text “a picture of my dog snowy” doesn’t tell us what kind of dog the image contains or if other things are included in that photo.
Making use of an embedding model offers another path forward. Embeddings essentially take an input, e.g. text or an image, and return a bunch of numbers. For example, the text prompt ‘an image of a dog’ would be passed through an embedding model, which ‘translates’ the text into a vector of numbers (essentially a long list of numbers). What is special about these numbers is that they should capture some semantic information about the input; the embedding for a picture of a dog should somehow capture the fact that there is a dog in the image. Since these embeddings consist of numbers, we can also compare one embedding to another to see how close they are to each other. We expect the embeddings for similar images to be closer to each other and the embeddings for images which are less similar to each other to be farther away. Without getting too much into the weeds of how this works, it’s worth mentioning that these embeddings don’t just represent one aspect of an image, e.g. the main object it contains, but also other components, such as its aesthetic style. You can find a longer explanation of how this works in this post.
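To make "closer" and "farther away" concrete, here is a minimal sketch that compares embeddings with cosine similarity, one common way of measuring closeness (the post does not say which measure the demo uses). The three-number embeddings are made up for illustration; real embeddings typically contain hundreds of numbers.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up embeddings, chosen so the two dog images point in a similar direction.
dog_photo = [0.9, 0.1, 0.2]
puppy_photo = [0.8, 0.2, 0.3]
landscape_photo = [0.1, 0.9, 0.4]

sim_dog_puppy = cosine_similarity(dog_photo, puppy_photo)
sim_dog_landscape = cosine_similarity(dog_photo, landscape_photo)
print(f"dog vs puppy:     {sim_dog_puppy:.3f}")
print(f"dog vs landscape: {sim_dog_landscape:.3f}")
```

A score near 1 means the two embeddings point in nearly the same direction; lower scores indicate less similar inputs.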
Finding a suitable image search model on the Hugging Face Hub
To create an image search system for the dataset, we need a model to create embeddings. Fortunately, the Hugging Face Hub makes it easy to find models for this.
The Hub has various models that support building an image search system.
Hugging Face Hub showing a list of hosted models.
All models will have various benefits and tradeoffs. For example, some models will be much larger. This can make a model more accurate but also make it harder to run on standard computer hardware.
Hugging Face Hub provides an ‘inference widget’, which allows interactive exploration of a model to see what sort of output it provides. This can be very useful for quickly understanding whether a model will be helpful or not.
A screenshot of a model widget showing a picture of a dog and a cat playing the guitar. The widget assigns the label “playing music” the highest confidence.
For our use case, we need a model which allows us to embed both our input text, for example, “an image of a dog,” and compare that to embeddings for all the images in our dataset to see which are the closest matches. We use a variant of the CLIP model hosted on Hugging Face Hub: clip-ViT-B-16. This allows us to turn both our text and images into embeddings and return the images which most closely match our text prompt.
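The search step can be sketched as: embed the text prompt, score it against every image embedding, and return the best matches. The example below is an illustration under assumptions, not the demo's code. The ranking runs on made-up toy embeddings, and `embed_with_clip` shows one possible way to obtain real ones, assuming the `sentence-transformers` package is installed and can download the `clip-ViT-B-16` model.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(query_embedding, image_embeddings, top_k=3):
    """Return (filename, score) pairs for the images closest to the query."""
    scored = [(name, cosine_similarity(query_embedding, emb))
              for name, emb in image_embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def embed_with_clip(texts_or_images):
    """Embed a list of strings or PIL images with CLIP.

    Assumes sentence-transformers is installed; the model is downloaded
    on first use. Not called below, so the toy example runs without it.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("clip-ViT-B-16")
    return model.encode(texts_or_images).tolist()


# Toy embeddings standing in for real CLIP output (hypothetical values).
image_embeddings = {
    "trees.jpg": [0.9, 0.1, 0.1],
    "clouds.jpg": [0.7, 0.3, 0.2],
    "portrait.jpg": [0.1, 0.9, 0.3],
}
query = [1.0, 0.0, 0.1]  # stands in for the prompt "landscape photograph"

results = rank_images(query, image_embeddings, top_k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Because text and images share one embedding space in CLIP-style models, the same `rank_images` function could also take an image embedding as the query, which is one way to approach the image similarity feature.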
A screenshot of the search tab showing a search for “landscape photograph” in a text box and a grid of images resulting from the search. This includes two images containing trees and images containing the sky and clouds.
While the search implementation isn’t perfect, it does give us an additional entry point into an extensive collection of data which is difficult to explore manually. It is possible to extend this interface to accommodate an image similarity feature. This could be useful for identifying a particular artist’s work in a broader collection.
While image search helps us find images, it doesn’t help us as much if we want to describe all the images in our collection. For this, we’ll need a slightly different type of machine learning task – image classification. An image classification model will put our images into categories drawn from a list of possible labels.
We can find image classification models on the Hugging Face Hub. The “Image Classification Model Tester” tab in the demo Gradio application allows us to test most of the 3,000+ image classification models hosted on the Hub against our dataset.
This can give us a sense of a few different things:
How well do the labels for a model match our data? A model for classifying dog breeds probably won’t help us much!
It gives us a quick way of inspecting possible errors a model might make with our data.
It prompts us to think about what categories might make sense for our images.
A screenshot of the image classification tab in the Gradio app which shows a bar chart with the most frequently predicted labels for images assigned by a computer vision model.
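The tally behind a chart like this can be sketched in a few lines. The example below assumes each image's predictions arrive as a list of `{"label", "score"}` dictionaries, which is the shape returned by the `transformers` image classification pipeline; the sample predictions and label names are invented. `classify_images` is an optional helper that assumes `transformers` is installed and that `model_id` names an image classification model on the Hub.

```python
from collections import Counter


def top_label(predictions):
    """Return the highest-scoring label from one image's predictions."""
    return max(predictions, key=lambda p: p["score"])["label"]


def label_frequencies(all_predictions):
    """Count how often each label is the top prediction across images."""
    return Counter(top_label(preds) for preds in all_predictions)


def classify_images(image_paths, model_id):
    """Run a Hub image classification model over local image paths.

    Assumes the transformers package is installed. Not called below,
    so the tally example runs without it.
    """
    from transformers import pipeline

    classifier = pipeline("image-classification", model=model_id)
    return [classifier(path) for path in image_paths]


# Invented predictions for three images, two candidate labels each.
sample_predictions = [
    [{"label": "painting", "score": 0.81}, {"label": "photograph", "score": 0.12}],
    [{"label": "photograph", "score": 0.64}, {"label": "painting", "score": 0.30}],
    [{"label": "painting", "score": 0.55}, {"label": "sculpture", "score": 0.40}],
]

counts = label_frequencies(sample_predictions)
print(counts.most_common())
```

Labels that dominate the counts but do not fit the collection are a quick signal that a model's categories are a poor match for the data.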
We may find a model that already does a good job working with our dataset – if we don’t, we may have to look at training a model.
Training your own computer vision model
The final tab of our Gradio demo allows you to export the image dataset in a format that can be loaded by Label Studio, an open-source tool for annotating data in preparation for machine learning tasks. In Label Studio, we can define labels we would like to apply to our dataset. For example, we might decide we’re interested in pulling out particular types of images from this collection. We can use Label Studio to create an annotated version of our dataset with these labels. This requires us to assign the correct label to each image in our dataset. Although this process can take some time, it can be a useful way of further exploring a dataset and making sure your labels make sense.
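For readers curious what such an export might look like, Label Studio can import tasks as a JSON list in which each task carries its fields under a `data` key. The sketch below builds that structure from a list of image URLs. It is an assumption about one workable format rather than a description of what the demo produces, and the `image` key must match the variable name used in your Label Studio labeling configuration.

```python
import json


def to_label_studio_tasks(image_urls):
    """Wrap each image URL in a Label Studio task dictionary."""
    return [{"data": {"image": url}} for url in image_urls]


# Placeholder URLs; in practice these would come from the ARCH image CSV.
image_urls = [
    "https://example.org/art/painting.jpg",
    "https://example.org/art/sketch.png",
]

tasks = to_label_studio_tasks(image_urls)
tasks_json = json.dumps(tasks, indent=2)
print(tasks_json)
```

Saving `tasks_json` to a file gives something that can be uploaded through Label Studio's import dialog.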
With a labeled dataset, we need some way of training a model. For this, we can use AutoTrain. This tool allows you to train machine learning models without writing any code. Using this approach supports creation of a model trained on our dataset which uses the labels we are interested in. It’s beyond the scope of this post to cover all AutoTrain features, but this post provides a useful overview of how it works.
As mentioned in the introduction, you can explore the ARCH Image Dataset Explorer Demo yourself. If you know a bit of Python, you could also duplicate the Space and adapt or change the current functionality it supports for exploring the dataset.
Internet Archive and Hugging Face plan to organize a hands-on hackathon later this year focused on using open source machine learning tools from the Hugging Face ecosystem to work with web archives. The event will include building interfaces for web archive datasets, collaborative annotation, and training machine learning models. Please let us know if you are interested in participating by filling out this form.
This spring, the Internet Archive hosted two in-person workshops aimed at helping to advance library support for web archive research: Digital Scholarship & the Web and Art Resources on the Web. These one-day events were held at the Association of College & Research Libraries (ACRL) conference in Pittsburgh and the Art Libraries Society of North America (ARLIS) conference in Mexico City. The workshops brought together librarians, archivists, program officers, graduate students, and disciplinary researchers for full days of learning, discussion, and hands-on experience with web archive creation and computational analysis. The workshops were developed in collaboration with the New York Art Resources Consortium (NYARC) and are part of an ongoing series of workshops hosted by the Internet Archive through Summer 2023.
Internet Archive Deputy Director of Archiving & Data Services Thomas Padilla discussing the potential of web archives as primary sources for computational research at Art Resources on the Web in Mexico City.
Designed in direct response to library community interest in supporting additional uses of web archive collections, the workshops had the following objectives: introduce participants to web archives as primary sources in the context of computational research questions; develop familiarity with research use cases that make use of web archives; and provide an opportunity to acquire hands-on experience creating web archive collections and computationally analyzing them using ARCH (Archives Research Compute Hub) – a new service set to publicly launch June 2023.
Internet Archive Community Programs Manager Lori Donovan walking workshop participants through a demonstration of Palladio using a dataset generated with ARCH at Digital Scholarship & the Web in Pittsburgh, PA.
In support of those objectives, Internet Archive staff walked participants through web archiving workflows, introduced a diverse set of web archiving tools and technologies, and offered hands-on experience building web archives. Participants were then introduced to Archives Research Compute Hub (ARCH). ARCH supports computational research with web archive collections at scale – e.g., text and data mining, data science, digital scholarship, machine learning, and more. ARCH does this by streamlining generation and access to more than a dozen research ready web archive datasets, in-browser visualization, dataset analysis, and open dataset publication. Participants further explored data generated with ARCH in Palladio, Voyant, and RAWGraphs.
Network visualization of the Occupy Web Archive collection, created using Palladio based on a Domain Graph Dataset generated by ARCH.
Gallery visualization of the CARTA Art Galleries collection, created using Palladio based on an Image Graph Dataset generated by ARCH.
At the close of the workshops, participants were eager to discuss web archive research ethics, research use cases, and a diverse set of approaches to scaling library support for researchers interested in working with web archive collections – truly vibrant discussions – and perhaps the beginnings of a community of interest! We plan to host future workshops focused on computational research with web archives – please keep an eye on our Event Calendar.
In the six weeks since announcing that Internet Archive has begun gathering content for the Digital Library of Amateur Radio and Communications (DLARC), the project has quickly grown to more than 25,000 items, including ham radio newsletters, podcasts, videos, books, and catalogs. The project seeks additional contributions of material for the free online library.
More than 300 radio related books are available in DLARC via controlled digital lending. These materials may be checked out by anyone with a free Internet Archive account for a period of one hour to two weeks. Radio and communications books donated to Internet Archive are scanned and added to the DLARC lending library.
Amateur radio podcasts and video channels are also among the first batch of material in the DLARC collection. These include Ham Nation, Foundations of Amateur Radio, and the ICQ Amateur/Ham Radio Podcast, with many more to come. Providing a mirror and archive for “born digital” content such as video and podcasts is one of the core goals of DLARC.
Additions to DLARC also include presentations recorded at radio communications conferences, including GRCon, the GNU Radio Conference; and the QSO Today Virtual Ham Expo. A growing reference library of past radio product catalogs includes catalogs from Ham Radio Outlet and C. Crane.
DLARC is growing to be a massive online library of materials and collections related to amateur radio and early digital communications. It is funded by a significant grant from Amateur Radio Digital Communications (ARDC) to create a digital library that documents, preserves, and provides open access to the history of this community.
Anyone with material to contribute to the DLARC library, questions about the project, or interest in similar digital library building projects for other professional communities, please contact:
Kay Savetz, K6KJN Program Manager, Special Collections email@example.com Mastodon: firstname.lastname@example.org
Internet Archive has begun gathering content for the Digital Library of Amateur Radio and Communications (DLARC), which will be a massive online library of materials and collections related to amateur radio and early digital communications. The DLARC is funded by a significant grant from the Amateur Radio Digital Communications (ARDC), a private foundation, to create a digital library that documents, preserves, and provides open access to the history of this community.
The library will be a free online resource that combines archived digitized print materials, born-digital content, websites, oral histories, personal collections, and other related records and publications. The goals of the DLARC are to document the history of amateur radio and to provide freely available educational resources for researchers, students, and the general public. This innovative project includes:
A program to digitize print materials, such as newsletters, journals, books, pamphlets, physical ephemera, and other records from institutions, groups, and individuals.
A digital archiving program to archive, curate, and provide access to “born-digital” materials, such as digital photos, websites, videos, and podcasts.
A personal archiving campaign to ensure the preservation and future access of both print and digital archives of notable individuals and stakeholders in the amateur radio community.
Conducting oral history interviews with key members of the community.
Preservation of all physical and print collections donated to the Internet Archive.
The DLARC project is looking for partners and contributors with troves of ham radio, amateur radio, and early digital communications related books, magazines, documents, catalogs, manuals, videos, software, personal archives, and other historical records collections, no matter how big or small. In addition to physical material to digitize, we are looking for podcasts, newsletters, video channels, and other digital content that can enrich the DLARC collections. Internet Archive will work directly with groups, publishers, clubs, individuals, and others to ensure the archiving and perpetual access of contributed collections, their physical preservation, their digitization, and their online availability and promotion for use in research, education, and historical documentation. All collections in this digital library will be universally accessible to any user and there will be a customized access and discovery portal with special features for research and educational uses.
We are extremely grateful to ARDC for funding this project and are very excited to work with this community to explore a multi-format digital library that documents and ensures access to the history of a specific, noteworthy community. Anyone with material to contribute to the DLARC library, questions about the project, or interest in similar digital library building projects for other professional communities, please contact:
Kay Savetz, K6KJN Program Manager, Special Collections email@example.com Twitter: @KaySavetz
Internet Archive’s Community Webs program is excited to announce that metadata for more than 4,800 archived websites and web collections created by 23 Community Webs member organizations are now available in Digital Public Library of America (DPLA). This marks the first of many metadata ingests that will come over the next months and years, as additional web and digital archives are created and described by members of the program. To access Community Webs web content in DPLA, click here.
The Community Webs program was launched in 2017, and currently provides web and digital archiving training, infrastructure, services, and professional community cultivation for more than 150 public libraries and cultural heritage organizations across the country and around the world. The participating organizations have shared goals of documenting local history and community archiving, especially documenting communities and populaces traditionally excluded from the historical record. These goals dovetail nicely with DPLA’s recently launched Digital Equity Project, which aims to provide support to libraries and archives as they shift toward greater inclusion of diverse stories and voices.
Community Webs collections now available in DPLA include:
The #Syllabus collection, created by the Schomburg Center for Research in Black Culture in New York City, which “aims to web archive Black-authored and Black-related educational resources to document Black studies, movements, and experiences in the twenty-first century.”
The D.C. Punk (Web) Archive, created by People’s Archive, DC Public Library, which documents the punk and hardcore music scenes in Washington, DC.
The Covid-19 in Hennepin County collection, created by Hennepin County Library, which documents the pandemic’s impact on Minneapolis, Minnesota and the surrounding areas, is one of a dozen web collections on local impacts of the Covid-19 pandemic which are now available in DPLA.
The Internet Archive has been a DPLA content provider since 2015, primarily contributing digital materials from our many print digitizing partnerships. However, this is the first time our partners’ web collections have appeared in the DPLA. We are excited for this opportunity to add community-focused born-digital and web collections from our program partners to the already unparalleled breadth of cultural heritage collections accessible via DPLA’s portal. We think these hyperlocal archived web resources will add depth and context to DPLA’s existing national collections. Meanwhile, the Community Webs collections’ inclusion in the portal will put these materials alongside other types of digital objects and in front of a broader audience of researchers, steps that are vital to dismantling the silos that often enclose web archives.
We are grateful to be partnering with DPLA to increase access to these vital community history collections and look forward to building more integrations and furthering this collaboration in the years to come. We would like to extend special thanks to the team at DPLA for all their work making this integration possible and to the 23 Community Webs member organizations who have both built and shared their local history web content for posterity.
This post is part of a series written by members of the Internet Archive’s Community Webs program. Community Webs advances the capacity for community-focused memory organizations to build web and digital archives documenting local histories and underrepresented voices. For more information, visit communitywebs.archive-it.org/
What is an archivist to do when items of public record, which have been systematically added to publicly accessible collections for over a century, suddenly turn from paper into bits and bytes that disappear from the web, or even get stuck behind paywalls? Like many in my profession, I’ve been grappling with this question for a while. Having no real training in digital archiving and facing this quandary as a lone arranger, it’s sometimes hard to keep that grappling from turning into low-key panicking that my inaction has been causing information to be lost forever.
Imagine my excitement, then, when I learned about the Community Webs program – access to and training for Archive-It, collaboration with the Internet Archive, and a network of others like me to bounce ideas off and get inspiration from? Yes please! With the blessing of my boss, I applied right away and my library joined the program in April 2021.
(This might be a good point for a quick introduction. I work as the archivist/local history librarian at the Waltham Public Library (WPL) in Waltham, Massachusetts. Waltham is a city about 10 miles west of Boston, and is home to an ethnically and economically diverse population of just over 62,000 people. The WPL is a fully-funded community hub, fostering a healthy democratic society by providing a wealth of current informational, educational, and recreational resources free of charge to all members of the community. The library is known throughout the area for its knowledgeable and friendly staff, welcoming and safe environment, accessibility, convenience, current technology, and helpful assistance.)
I eagerly dove into the program and used our first web-archive collection – Waltham Public Library – as a testing ground, a place to gain familiarity with both Archive-It and the whole process of web archiving. I’ve been trying to capture content that aligns with the material found in the library’s analog records – annual reports, policies, announcements, event flyers, records from our Friends group, etc. – by doing a weekly crawl of the library website, our Friends website, and the library’s Twitter feed. For the most part this collection has been thankfully pretty straightforward.
Our largest collection so far is COVID-19 in Waltham, which makes up a portion of the library’s very first born-digital archival collection. That collection began in April 2020, when the WPL (like most other places) was closed to help “flatten the curve.” A month or two prior, as the pandemic was building steam, I had become fascinated with the 1918 influenza. A poke through our archives for the topic had been disappointing, as there wasn’t too much beyond a couple of newspaper clippings, brief mentions in the library trustees’ minutes, and a few pages in the records of the local nurses’ association. I was hoping to put together a better picture of what it was like to live in Waltham during the flu, perhaps to give myself a glimpse of what I could expect in the coming weeks (heh… how naïve I was).
I put out a call via the library’s social media for those who lived, worked, and/or went to school in Waltham to share their stories, hoping to build the kind of collection I wanted and failed to find from 1918. There was an initial rush of Google Form submissions, a handful of photos, and one video, and then nothing. I was pleased we had received some materials, but still wanted to paint a broader picture of Waltham under Covid. Enter Community Webs! For the past several months I’ve been working to collect retroactively what I was hoping to capture at the time – news articles, videos, the city website, information from the schools, and so on. While it’s not as comprehensive as it might have been if I’d been able to gather it all as it happened, I’ve been able to save over 500 GB of data that will help those in the future to better imagine what it was like to live in Waltham during Covid.
Finally, related to the quandary in the first paragraph of this post, our most complicated collection is the Waltham News Tribune. The WPL has microfilm copies of the paper going back to its earliest iteration in the 1860s, and part of my job has been to collect each issue and send yearly batches to a vendor for microfilming. However, as of this past May, the publisher has moved the paper entirely online, with some content requiring a paid subscription to view. The WPL has a subscription so that we can continue to provide free access to our patrons, but what happens to our archive of back issues? Does it just stop abruptly in May 2022, even as time and local news continue to march on? As it is, our microfilm is heavily used, especially since the paper’s offices burned down in 1999, making ours the only existing archive.
Thanks to web archiving, we’re able to continue to fulfill our unofficial role as the repository for the city newspaper, at least in theory. In practice, I look at the daily crawls of the digital edition of the paper and can’t help but see that it is no longer the type of local news we’ve been archiving for over a century. The corporate publisher of the paper has consolidated ours with those from several other local cities and towns, and has sacrificed true local news coverage for more generic topics, many of which aren’t even related specifically to Massachusetts. This is a problem that sits well outside of my archives wheelhouse, but at least I feel I can do my due diligence by capturing what local news does trickle through.
I’ve had a slower go of web archiving than I’d like so far, thanks to several months of parental leave in 2021 and a very packed part-time work schedule. Nevertheless, I’ve been chipping away at our collections and planning for more, with an eye to add more diverse voices than those that make up much of our analog collections. I’m grateful for the encouragement and help I’ve received from Community Webs staff and peers, and want to give a special shout-out to the Archive-It folks who hold office hours to assist us with technical issues! This really is a fantastic program, and I’m so glad my library is part of it.
On June 21st, the Community Webs program team hosted its 2022 US Symposium at the National Museum of the American Indian in Washington, DC. For this day-long meeting, we welcomed over 30 librarians and archivists from across the country for presentations, discussion, networking, and some much-needed catch up following two years of entirely virtual events.
Community Webs is a community history web and digital archiving program operated by the Internet Archive. The program seeks to advance the capacity for community-focused memory organizations to build web and digital archives documenting local histories, with a particular focus on communities that have been underrepresented in the historic record. Community Webs provides its members with web and digital archiving tools, as well as training, technical support and access to a network of organizations doing similar work. The Community Webs program, including this event, is generously funded with support from the Institute of Museum and Library Services (IMLS) and the Mellon Foundation.
The day began with opening remarks and program updates from Internet Archive staff, including an overview of Community Webs and the significant growth the program has experienced since its launch in 2017. Staff provided a glimpse at what lies ahead both for Community Webs and the Internet Archive’s Archiving and Data Services team. This included plans to incorporate digitization, digital preservation and other forms of digital collecting into Community Webs, as well as projects and services either newly released or in development at IA.
The first keynote speaker of the day was Dr. Doretha Williams, Director of the Robert F. Smith Center for the Digitization and Curation of African American History at the National Museum of African American History and Culture. Dr. Williams detailed her organization’s commitment to serving its communities via the Center’s Community Curation Program, Internships and Fellowships Program, Family History Center, and Great Migration Home Movie Project. Throughout her presentation, Dr. Williams stressed the importance of community input and partnerships to achieving the Center’s mission, echoing one of the central tenets of the Community Webs program.
Following this presentation, three speakers shared their experiences working on collaborative web archiving initiatives. Lori Donovan, Senior Program Manager for Community Programs at the Internet Archive, began with an overview of various collaborative web archiving initiatives the Internet Archive and its partners have participated in, including the Collaborative ART Archive (CARTA), a web archiving initiative aimed at capturing web-based art materials utilizing a collective approach. Roger Lawson, Executive Librarian at the National Gallery of Art, shared his institution’s perspective as a member of CARTA. Finally, Christie Moffatt, Digital Manuscripts Program Manager at the National Library of Medicine, described working with colleagues both across her organization and externally to capture health-related web content at a national scale. Each of these presentations emphasized the advantages in scale, resources, staffing and knowledge-sharing that can be achieved by pursuing web archiving via collaborative entities.
Our afternoon session kicked off with a second keynote presentation from Leslie Johnston, Director of Digital Preservation at the National Archives and Records Administration (NARA). Johnston detailed the challenges NARA faces while contending with digital preservation across the enterprise. These challenges include the heterogeneity of digital outputs and technologies, the complexity of digital objects and environments, the scale of the archivable digital universe, and the difficulties in ensuring equitable access. As an antidote to these challenges, Johnston recommends archivists provide guidance to content creators, take a risk-based approach, prioritize basic levels of control, maintain scalable and flexible infrastructure, and engage in collaborations and partnerships. She also advocated for a people- rather than technology-centric approach to digital preservation, again mirroring the ethos of the Community Webs program.
For our final speaker session of the afternoon, we welcomed Community Webs members up to the lectern to share their web archiving and digital goals and achievements. Librarian, archivist, PhD student, and creative polymath kYmberly Keeton discussed her work as founder of Art | Library Deco, an online archive of African American art. Keeton described working closely with the artists featured in the archive, reiterating the theme of collaboration espoused by other speakers at the event. Tricia Dean, Tech Services Manager at Wilmington Public Library (Illinois), argued for the importance of capturing the histories of small and rural communities through initiatives like Community Webs. Liz Paulus, Adult Services Librarian at Cedar Mill & Bethany Community Libraries, described her efforts to capture the online Cedar Mill News via web archiving, stressing how one successful project can play a significant role when advocating for future resources. Longtime Community Webs member Dylan Gaffney, Information Services Associate for Local History & Special Collections at Forbes Library, described his library’s participation in States of Incarceration, a traveling exhibition on mass incarceration, the Historic Northampton Enslaved People Project, and other initiatives. Gaffney credited Community Webs with paving the way for an equity-focused approach to digital projects such as these. Finally, Dana Hamlin, Archivist at Waltham Public Library, showcased her organization’s web archiving efforts, highlighting the library’s COVID-19 collections and their attempts to capture the online local newspaper, the Waltham News Tribune.
Throughout the day, attendees had opportunities to discuss digital initiatives at their organizations, to catch up informally after a long hiatus, and to browse the exhibitions on display at the National Museum of the American Indian. We’re so grateful to all of our Community Webs members who were able to attend the event and especially to those who shared their knowledge. Our next Community Webs Symposium will be held in Chattanooga this September 13 to coincide with the Association for Rural and Small Libraries Conference. We are looking forward to seeing more program members there!
Guest Post by: Tricia Dean, Tech Services Manager at Wilmington Public Library District (IL)
This post is part of a series written by members of the Community Webs program. Community Webs advances the capacity for community-focused memory organizations to build web and digital archives documenting local histories and underrepresented voices. For more information, visit communitywebs.archive-it.org/
I was excited when I saw the call for participants in Community Webs. While Wilmington, Illinois is a small, rural town (5,664 people), the thought was that we still had something to contribute. Most Archive-It partners are universities, museums and large libraries, and being in their company was a little daunting to me initially. Other institutions have someone who opens the project, and then it develops into a larger team project. Wilmington Public Library District (WPLD) has a much smaller staff; the project has been wholly mine, which has been both thrilling and terrifying.
Wilmington is a small rural town, falling on the lower end of the economic scale. Because we are isolated, the library plays a vital part in the community. We offer the usual storytimes and adult programs, but also loan out hotspots and Chromebooks. We have 45 hotspots and these are almost always checked out; some people are using them for vacations, but by usage it is apparent that others are using them as their primary means of connecting to the Internet. Internet access has become more and more important, but after Covid-19 broke out, more governmental services went strictly online, making access even more critical – including for many who had not been regular patrons. WPLD is a hub for the community, offering computers, information, tax forms, and a place to come in and chat – even more important when we are trying to stay close and limit outside contact.
I am a Chicago native who went to Champaign-Urbana for grad school. I was a scanner for the Internet Archive for several years where I was privileged to handle some incunabula (pre-1500 items). I am the Technical Services Supervisor at Wilmington; primarily I catalog our materials, but I also tend toward Projects, from adding series labels to re-orienting all the calls in the juvenile non-fiction section. I am currently going through our attic to help determine what we have (it’s a Mystery!). I’m making lists, and hoping to have items to scan which would be available online, in multiple places. I applied for the Community Webs program (with my director’s blessing) because I felt that it’s important for small towns to be represented in the collection of history. Only 20% of the population still lives outside major metro areas, but it is every bit as important to capture that life as it is to retain the history of large cities.
Wilmington Library joined Community Webs in the summer of 2021. After some technical clarifications with the Archive-It staff, WPLD was set up. In considering what made Wilmington unique, the first link was to our library and social media pages. Social media has grown in importance in the last twenty years, but it became a vital link during Covid when services were otherwise unavailable. Wilmington Library YouTube videos (how-tos, crafts and storytime) stand as a reminder of how we responded and as a continuing reference for parents who can’t get to the library. But since social media, specifically, is known for ‘right now,’ it lacks the kind of reflection over time that we can create through the Community Webs project.
We may be small, but we have a number of historical articles and sites which needed to be brought together. We want to reflect events that have been impactful to our community, from the explosion of the Joliet Armory in the 1940s to the continuing issues with the Wilmington Dam, which has proved dangerous, but has complicated ownership issues. I still have a long way to go; the projects (attic/local history/web archive) are all intertwined. Wilmington has the usual Community Resources and City Government collections in Archive-It. Going forward, we want to continue to develop our Wilmington History collection. We are working on local history and will establish a collection of materials from our attic and public donations. Our local paper has vertical files which could be a goldmine of information – again, on my to-do list. We will be kicking off an Oral History Project, which will begin with a series of simple gatherings/coffee hours for our seniors, providing a place for them to gather, and a space to share their stories. I am hoping these will be in our Community Webs archive. Who better to speak to where we’ve been and where we are than some of our oldest residents?
Why is Community Webs important? Because it will help us remember when we cannot keep up with the information overload. Because there is so much happening that we miss a good deal of what is around us – or can’t bear to face it for long. Because so very, very much of our lives is now online – and can be erased with a keystroke. Because we are seeing, painfully, that those who do not learn from the past will be/are condemned to re-live it. And, for Wilmington, I think it is important because so many of the voices and sites being captured are from museums, universities and large public libraries. It is important that we remember that we used to be far less urban than we are today. It is important to remember the smaller places, those who are too easily lost in the maelstrom of modern life, because to be forgotten is to be erased.
This post is part of a series written by members of Internet Archive’s Community Webs program. Community Webs advances the capacity for community-focused memory organizations to build web and digital archives documenting local histories and underrepresented voices. For more information, visit communitywebs.archive-it.org/
Can you describe your community and the services and role of your organization within the community?
Inuit Circumpolar Council (ICC) Alaska works on behalf of the Inupiat of the North Slope, Northwest and Bering Straits Regions; St. Lawrence Island Yupik; and the Central Yup’ik and Cup’ik of the Yukon-Kuskokwim Region in Southwest Alaska. ICC Alaska is a national member of ICC International. Since inception in 1977, ICC has gained consultative status II with the United Nations, and is a Permanent Participant of the Arctic Council.
For example, ICC has provisional status with the International Maritime Organization (IMO), is an active member at the Arctic Council senior level and within the working groups, and is a prominent voice at the UN Framework Convention on Climate Change (UNFCCC). Work and engagement occur in many ways at these different fora. Within the UNFCCC, ICC has taken a leadership role in putting forward Indigenous Knowledge and establishing a platform for providing equitable space for multiple knowledge systems. Additionally, at the UNFCCC COP 26, ICC Chair, Dr. Dalee Sambo Dorough, led an ICC delegation made up of Inuit representatives from across the Arctic.
An immense amount of work occurs in direct partnership with Inuit communities to inform work at international fora. For example, ICC is facilitating the development of international protocols for Equitable and Ethical Engagement. These protocols will provide a pathway to success for all that want to work within Inuit homelands and whose work impacts the Arctic. The protocols will aid in a paradigm shift in how work, decisions, and policies are currently created and carried out. The paradigm shift will lead toward greater equity and recognition of Inuit sovereignty and self-determination.
Why was your organization interested in participating in Community Webs?
The Community Webs program was attractive to ICC because it provided the training and the storage to effectively preserve ICC’s digitized & born-digital archival materials. We were pleased to see this offering as a solution for an ongoing desire to archive the prolific organization’s digital materials & products. This work dovetails nicely with ICC Alaska’s efforts to digitize 47 boxes, or around 80 linear feet of material that span 6 decades, including audio, film, photographic media, and paper documents.
ICC Jam – part 2 – Greenland
Cultural programming as part of the 1983 General Assembly. In this clip, view performances from Greenland’s Tuktak Theater and a Greenlandic choir.
ICC advocates for Inuit and Inuit way of life, highlighted by ICC’s General Assembly meetings. The ICC receives its mandate from a General Assembly held every four years. The General Assembly is the heart of the organization, providing an opportunity for sharing information, discussing common concerns, debating issues, and strengthening the unity between all Inuit across our homelands. Through the Community Webs project, ICC Alaska has been able to preserve archival video of the ICC General Assemblies going back 30 years using Archive-It and the Internet Archive, as well as all newsletters, press releases, resolutions, social media campaigns, and reports published on its website. These are a significant record of ICC advocacy, but more importantly, Inuit political and cultural heritage.
Why do you think it is important for public libraries, community archives, and other local and community-based organizations to do this work?
Community-based organizations are uniquely positioned as both a part of and apart from the community. This vantage point allows for the self-reflection and observation needed for web archiving, as well as the relationships within the community to create the space and dialogue needed for community archiving projects. By building more capacity within community-based organizations for web archiving and digital preservation efforts, we can expand the recorded historical narrative and humanities-based inquiries in a multitude of directions, to truly reflect the diversity of our world & time.
Where do you hope to see your web archiving program going?
While the core goal of this work is to make ICC documents and its historical narrative more accessible and discoverable within ICC and to ICC’s member organizations, international bodies, and researchers, our aspirations are much bigger. Our hope is that this web archive goes beyond the core goal to inspire, delight, hearten, inform, and add depth to the conversations Inuit are having about cultural identity, relationship to the land, hunting, advocacy, self-determination, and self-governance.
We are curious about the intangible outcomes: What new work does the archive inspire? How does the archive add depth & historical weight to existing projects, discussions, and advocacy? What stories and knowledge get re-remembered, or re-investigated after viewing archival materials? What advocacy, ethics, and philosophical works come from Inuit leaders informed by the legacy that the archive shared? Are youth leaders interested in adding to the archive?
Is there anything you would like your organization to contribute back to the broader community of web archiving and/or local history in the form of documentation, workflows, policy drafts or other resources?
We have several aspirations. The first is the telling of Inuit stories. The archive is another manifestation of that mission – to record and share Inuit voices across time, and to increase access to those voices, information, knowledge, and history. The ICC Archival holdings are a historically unique & culturally significant telling of Inuit cultural heritage, history (including political history), educational pedagogy, philosophy, self-determination, values, ethics, environmental stewardship, and Indigenous Knowledge. It is important to create a way for Inuit to discover and interact with this work. Community Webs has offered a new tool in our toolkit.
Secondly, the goal is to move forward conversations about categorization and information management for Indigenous communities. What does that look like in best practice? Can we, together with other Inuit archives, improve on existing practices to create a more equitable and ethical engagement with Inuit-produced information, the management of that information, and the discovery and access of that information?
What are you most excited to learn through your participation in Community Webs?
It was exciting to discover that many Inuit and Alaska Native resources have already been preserved using the Internet Archive. These resources are often affected by insufficient financial support. Being able to have a preserved and accessible copy of these resources is an important step towards creating the bigger picture of the historical record of Inuit advocacy. As part of the Community Webs meetings, it was exciting to hear from other tribal librarians and community archivists across the country & world. Additionally, it was exciting to hear from speakers whose work informs our community archival work at ICC Alaska – such as Chaitra Powell who created (among other amazing things) the “Archive in a Backpack” project.
What impact do you think web archiving could have within your community?
Hopefully this work inspires other organizations to also preserve their digital assets, creating a richer narrative of Inuit political and cultural heritage.
What do you foresee as some of the challenges you may face?
We are eager to preserve our social media channels that have replaced the DRUM newsletter as a vehicle for keeping our community up-to-date on ICC’s work. Ongoing challenges with Facebook and Instagram archiving are preventing us from doing that. Hopefully these issues will be resolved in favor of the communities who created the content and who bring their community and connections to these software platforms.