Author Archives: Caralee Adams

Library as Laboratory Recap: Analyzing Biodiversity Literature at Scale

At a recent webinar hosted by the Internet Archive, leaders from the Biodiversity Heritage Library (BHL) shared how its massive open access digital collection documenting life on the planet is an invaluable resource for scientists and ordinary citizens.

“The BHL is a global consortium of the leading natural history museums, botanical gardens, and research institutions, big and small, from all over the world. Working together and in partnership with the Internet Archive, these libraries have digitized more than 60 million pages of scientific literature available to the public,” said Chris Freeland, director of Open Libraries and moderator of the event.

Established in 2006 with a commitment to inspiring discovery through free access to biodiversity knowledge, BHL has 19 members and 22 affiliates, plus 100 worldwide partners contributing data. The BHL has content dating back nearly 600 years alongside current literature that, when liberated from the print page, holds immense promise for advancing science and solving today’s pressing problems of climate change and the loss of biodiversity.

Martin Kalfatovic, BHL program director and associate director of the Smithsonian Libraries and Archives, noted in his presentation that Charles Darwin and colleagues famously said “the cultivation of natural science cannot be efficiently carried on without reference to an extensive library.”

“Today, the Biodiversity Heritage Library is creating this global, accessible open library of literature that will help scientists, taxonomists, environmentalists (a host of people working with our planet) to actually have ready access to these collections,” Kalfatovic said. BHL’s mission is to improve research methodology by working with its partner libraries and the broader biodiversity and bioinformatics community. BHL draws about 142,000 visitors each month and has served 12 million users overall.

Most of the BHL’s materials are from collections in the global north, primarily in large, well-funded institutions. Digitizing these collections helps level the playing field, providing researchers in all parts of the world equal access to vital content.

The vast collection includes species descriptions, distribution records showing where species live, climate records, the history of scientific discovery, and information on extinct species. To date, BHL has made over 176,000 titles and 281,000 volumes available. Through a partnership with the Global Names Architecture project, more than 243 million instances of taxonomic (Latin) names have been found in BHL content.

Kalfatovic underscored the value of BHL content in understanding the environment in the wake of recent troubling news from the Sixth Assessment Report (AR6), published by the Intergovernmental Panel on Climate Change, about the impact of the earth’s warming.

Biodiversity Heritage Library by the numbers.

“The outlook for the planet is challenging,” he said. “By unlocking this historic data, we can find out where we’ve been over time to find out more about where we need to be in the future.”

JJ Dearborn, BHL data manager, discussed how digitization transforms physical books into digital objects that can be shared with “anyone, at any time, anywhere.” She described the Wikimedia ecosystem as “fertile ground for open access experimentation,” crediting the organization with giving BHL the ability to reach new audiences and transform its data into 5-star linked open data. “Dark data” that is locked up in legacy formats, JP2s, and OCR text is a source of valuable checklist, species occurrence, and event sampling data that the larger biodiversity community can use to improve humanity’s collective ability to monitor biodiversity loss and the destructive impacts of climate change, at scale.

The majority of the world’s data today is siloed, unstructured, and unused, Dearborn explained. This “dark data” “represents an untapped resource that could really transform human understanding if it could be truly utilized,” she said. “It might represent a gestalt leap for humanity.” 

The event was the fifth in a series of six sessions highlighting how researchers in the humanities use the Internet Archive. The final session of the Library as Laboratory series will feature lightning talks on May 11 at 11am PT / 2pm ET. Register now!

Library as Laboratory Recap: Opening Television News for Deep Analysis and New Forms of Interactive Search

Watching a single episode of the evening news can be informative. Tracking trends in broadcasts over time can be fascinating. 

The Internet Archive has preserved nearly 3 million hours of U.S. local and national TV news shows and made the material open to researchers for exploration and non-consumptive computational analysis. At a webinar on April 13, TV News Archive experts shared how they’ve curated the massive collection and leveraged technology so scholars, journalists and the general public can make use of the vast repository.

Roger Macdonald, founder of the TV News Archive, and Kalev Leetaru, collaborating data scientist and GDELT Project founder, spoke at the session. Chris Freeland, director of Open Libraries, served as moderator and Internet Archive founder Brewster Kahle offered opening remarks.

“Growing up in the television age, [television] is such an influential, important medium—persuasive, yet not something you can really quote,” Kahle said. “We wanted to make it so that you could quote, compare and contrast.” 

The Internet Archive built on the work of the Vanderbilt Television Archive and the UCLA Library Broadcast NewsScape to give the public a broader “macro view,” said Kahle. The trends seen in at-scale computational analyses of news broadcasts can be used to understand the bigger picture of what is happening in the world and the lenses through which we see the world around us.

In 2012, with donations from individuals and philanthropies such as the Knight Foundation, the Archive started repurposing the closed captioning data stream required of all U.S. broadcasters into a search index. “This simple approach transformed the antiquated experience of searching for specific topics within video,” said Macdonald, who helped lead the effort. “The TV caption search enabled discovery at internet speed with the ability to simultaneously search millions of programs and have your results plotted over time, down to individual broadcasters and programs.”

Scholars and journalists were quick to embrace this opportunity, but the team kept experimenting with deeper indexing. Techniques like audio fingerprinting, Optical Character Recognition (OCR) and Computer Vision made it possible to capture visual elements of the news and improve access, Macdonald said. 

Sub-collections of political leaders’ speeches and interviews have been created, including an extensive Donald Trump Archive. Some of the Archive’s most productive advances have come from collaborating with outsiders who have requested more access to the collection than is available through the public interface, Macdonald said. With appropriate restrictions to maintain respect for broadcasters and distribution platforms, the Archive has worked with select scientists and journalists as partners to use data in the collection for more complex analyses.

Treating television as data

Treating television news as data creates vast opportunities for computational analysis, said Leetaru. Researchers can track how often words are used in the news and how that has changed over time. For instance, it’s possible to look at mentions of COVID-related words across selected news programs and see when they surged and leveled off with each wave before plummeting, as shown in the graph below.

The newly computed metadata can help provide context and assist with fact-checking efforts to combat misinformation. It can allow researchers to map the geography of television news: how certain parts of the world are covered more than others, Leetaru said. Through the collections, researchers have explored which presidential tweets challenging election integrity got the most exposure on the news. OCR of every frame has been used to build models that identify the name of every “Dr.” depicted on cable TV after the outbreak of COVID-19 and calculate the air time devoted to medical doctors commenting on one of the virus variants. Reverse image lookup of images in TV news has been used to determine the source of photos and videos. Visual entity search tools can even reveal the increasing prevalence of bookshelves as backdrops during home interviews in the pandemic, as well as appearances of books by specific authors or titles. Open datasets of computed TV news metadata are available that include all visual entity and OCR detections, 10-minute interval captioning ngrams and second-by-second inventories of each broadcast cataloging whether it was “News” programming, “Advertising” programming or “Uncaptioned” (in the case of television news this is almost exclusively advertising).
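
For readers curious what tracking word use over time might look like in practice, here is a minimal Python sketch. It is only an illustration: the caption snippets, the `CAPTIONS` list and the `monthly_mentions` function are hypothetical stand-ins invented for this example, not the TV News Archive’s data format or tooling.

```python
from collections import Counter
from datetime import date

# Hypothetical caption snippets as (broadcast date, caption text) pairs.
# Real captioning data would be far larger and come from the archive itself.
CAPTIONS = [
    (date(2020, 3, 2), "Officials confirm new coronavirus cases"),
    (date(2020, 3, 20), "Coronavirus lockdown begins as covid cases surge"),
    (date(2020, 4, 5), "Covid testing expands"),
    (date(2020, 6, 1), "Markets rally on jobs report"),
]


def monthly_mentions(captions, terms):
    """Count caption words that match any term, grouped by (year, month)."""
    wanted = {term.lower() for term in terms}
    counts = Counter()
    for day, text in captions:
        # Lowercase each word and trim common punctuation before matching.
        words = (word.strip(".,!?\"'").lower() for word in text.split())
        hits = sum(1 for word in words if word in wanted)
        # Months with no matches still get an entry, with a count of zero.
        counts[(day.year, day.month)] += hits
    return dict(sorted(counts.items()))


trend = monthly_mentions(CAPTIONS, ["covid", "coronavirus"])
for (year, month), total in trend.items():
    print(f"{year}-{month:02d}: {total}")
```

Plotting those monthly totals would give a simple trend line similar in spirit to the surges and drops described above, though a real analysis would also need to account for how many hours of programming were captured in each period.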

From television news to digitized books and periodicals, dozens of projects rely on the collections available at archive.org for computational and bibliographic research across a large digital corpus. Data scientists or anyone with questions about the TV News Archive can contact info@archive.org.

Up Next

This webinar was the fourth in a series of six sessions highlighting how researchers in the humanities use the Internet Archive. The next will be about Analyzing Biodiversity Literature at Scale on April 27. Register here.

Meet the Librarians: Alexis Rossi, Media & Access

To celebrate National Library Week 2022, we are taking readers behind the scenes to Meet the Librarians who work at the Internet Archive and in associated programs.


Alexis Rossi has always loved books and connecting others with information. After receiving her undergraduate degree in English and creative writing, she became a book editor and then worked in online news. 

In 2006, Rossi joined the staff of the Internet Archive. She was working on the launch of the Open Library project when she recognized the need to learn more about how to best organize materials. She enrolled at San Jose State University and earned her Master’s of Library and Information Science in 2010.

“It gave me a better grasp of how to hierarchically organize information in a way that is sensible and useful to other libraries,” Rossi said. “It also gave me better familiarity with how other more traditional libraries actually work—the types of data and systems they use.”

Rossi concentrated on web interfaces for library information, understanding digital metadata, and how to operate as a digital librarian. At the Internet Archive, in addition to overseeing the Open Library project, Rossi managed a revamp of the organization’s website, ran the Wayback Machine for four years, and founded the web-wide crawling program. She is currently a librarian and director of media & access.

“One of the themes of my life is trying to empower people to do whatever they want to do,” said Rossi, who grew up in Monterey, California, and now lives in San Francisco. “Giving people the resources to teach themselves—whatever they want to learn—is my driving force.”

Rossi acknowledges she is privileged to have the means to avail herself of an abundance of information, while many in other parts of the world do not. There are so many societal problems she cannot solve, Rossi said, but she believes her work is making a contribution.

“We can build a library that allows people to access information for free, wherever they are, and however they can get to it, in whatever way. That, to me, is incredibly important,” Rossi said. It’s also rewarding to help patrons discover new information and recover materials they may have thought were lost, she added.

When she’s not working, Rossi enjoys making funky jewelry and elaborate cakes (a skill she learned on YouTube).

Among the millions of items and collections in the Internet Archive, what is Rossi’s favorite? Video and audio recordings of her dad, now 73, playing the piano, organ and accordion: “It’s just so good. It’s such a perfect little piece of history.”

Library as Laboratory Recap: Curating the African Folktales in the Internet Archive’s Collection

Laura Gibbs and Helen Nde share a passion for African folktales. They are both active researchers and bloggers on the subject who rely on the Internet Archive’s extensive collection in their work.

In the third of a series of webinars highlighting how researchers in the humanities use the Internet Archive, Gibbs and Nde spoke on March 30 about how they use the online library and contribute to its resources.

Gibbs was teaching at the University of Oklahoma in the spring of 2020 when the campus library shut down due to the pandemic. “That’s when I learned about controlled digital lending at the Internet Archive and that changed everything for me. I hadn’t realized how extensive the materials were,” said Gibbs, who was trained as a folklorist. She retired last May and began a project of cross-referencing her bookshelves of African and African-American folktales to see how many were available at the Internet Archive. Being able to check out one digital title at a time through controlled digital lending (CDL) opened up new possibilities for her research. 

“It was just mind boggling to me and so exciting,” she said of discovering the online library. “I want to be a provocation to get other people to go read, do their own writing and thinking from books that we can all access. That’s what the Internet Archive has miraculously done.”

A Reader’s Guide to African Folktales at the Internet Archive by Laura Gibbs. Now available.

Gibbs said it has been very helpful to search by the title of a book, the name of an illustrator or some other kind of detail. With an account, the user can see the search results and borrow the digital book through CDL. “It’s all super easy to do. And if you’re like me and weren’t aware of the amazing resources available through controlled digital lending, now is the time to create your account at the Internet Archive,” Gibbs said.

Every day, Gibbs blogs about a different book and rewrites a 100-word “tiny-tale” synopsis. In less than a year, she compiled A Reader’s Guide to African Folktales at the Internet Archive, a curated bibliography of hundreds of folktale books that she has shared with the public through the Internet Archive. Some are in the public domain, but many are later works and only available for lending one copy at a time through CDL. 

In her work, Nde explores mythological folklore from the African continent and is dedicated to preserving the storytelling traditions of African peoples, which are largely oral. Nde maintains the Mythological Africans website, where she hosts storytelling sessions and modern lectures and posts essays.

“[The Internet Archive] is an amazing resource of information online, which is readily available, and really goes to dispel the notion that there is no uniformity of folklore from the African continent,” Nde said. “Through Mythological Africans, I am able to share these stories and make these cultures come alive as much as possible.”

As an immigrant in the United States from Cameroon, Nde began to research the topic of African folklore because she was curious about exploring her background and identity. She said she found a community and a creative outlet for examining storytelling, poetry, dance and folktales. Nde said examining Gibbs’ works gave her an opportunity to reconnect with some of the favorite books from her childhood. She’s also discovered reference books through the Internet Archive collection that have been helpful. Nde is active on social media (Twitter.com/mythicafricans) and has a YouTube channel on African mythology. She recently collaborated on a project with PBS highlighting the folklore behind an evil entity called the Adze, which can take the form of a firefly.

The presenters said when citing material from the Internet Archive, not only can they link to a source, a blog or an academic article, they can link to the specific page number in that source. This gives credit to the author and also access to that story for anybody who wants to read it for themselves.

The next webinar in the series, Television as Data: Opening TV News for Deep Analysis and New Forms of Interactive Search, on April 13 will feature Roger Macdonald, founder of the TV News Archive, and Kalev Leetaru, data scientist at GDELT. Register now.

Meet the Librarians: Lisa Seaberg, Patron Services & Open Library

To celebrate National Library Week 2022, we are taking readers behind the scenes to Meet the Librarians who work at the Internet Archive and in associated programs.


Like any good librarian, Lisa Seaberg of the Internet Archive’s patron services team is prepared to answer the question: Can you recommend a book? In fact, Seaberg has 1,729 suggestions. She has organized what she wants to read in a publicly available list on Open Library.

“I’ve had a lifelong interest in reading and books,” said Seaberg, who worked as an assistant in her high school library in Milford, Connecticut. It was there that a mentoring librarian helped shape her taste in reading and introduced her to The Hitchhiker’s Guide to the Galaxy by Douglas Adams. 

Seaberg went on to earn her bachelor’s degree in library science from Southern Connecticut State University in 1996. She learned about the book publishing industry, practical skills of cataloguing, Boolean searching, and managing databases. She later earned a master’s degree in digital media from Quinnipiac University in Connecticut.

In 2017, Seaberg began to volunteer with Open Library and was hired to join the Internet Archive staff in 2020 to work for patron services. Based in Amsterdam, she responds to email requests to connect users with resources and helps coordinate a team of more than 200 volunteers to fix metadata issues. Seaberg works to maintain the digital collection, identify duplicates, and make sure the record represents the available books. She also fulfills interlibrary loan requests, as part of the Internet Archive’s new ILL service.

Prior to joining the Internet Archive, Seaberg worked at Gateway Computers in the late 90s, where she gained useful technology experience. She later worked in communications for a hospital, managing its website. Those positions provided her with a sense of information architecture, she said, that she has applied to her work at the Internet Archive.

Seaberg said she is fascinated by everything that the Internet Archive provides to the public. In her job, she enjoys working with the book metadata. “It’s rewarding to make something discoverable,” she said. If people have an author they like, Seaberg tries to make sure there are subject headings and tags to make it easier for them to find related materials of interest. 

Recently, Seaberg said, it’s been meaningful to be involved in efforts to provide access to books being challenged by local school districts because of controversial content. She’s helped assemble digital collections of titles being targeted to ensure continuous access should an entity decide to ban them. 

When Seaberg is not working, she loves to play board games, gravitating to hobbyist European games such as Gaia Project, a complex economy-building game that takes place in space. Her other main hobby is book hunting at charity shops and openbare boekenkastjes (free libraries) in and around her home in Amsterdam. Since Seaberg has limited shelf space, she sticks to her rule of only buying books that are on her Open Library Want To Read list.

Among her favorite projects when it comes to the Internet Archive collection: Organizing the profiles of individual authors to make sure their works are all consolidated and easy to find for patrons. 

Meet the Librarians: Sawood Alam, Wayback Machine

To celebrate National Library Week 2022, we are taking readers behind the scenes to Meet the Librarians who work at the Internet Archive and in associated programs.


Sawood Alam was born and raised on a farm in a remote village of India with no smartphones, television or electricity. 

“Books were one of the only means of learning and entertainment for us,” said Alam, who checked out as many books as he could from his school library every Thursday. “I had to take my buffalo out every afternoon. It was a boring task out in the field with no one to talk to, so books were my companions.”

When he was 10 years old, Alam helped at his school library, which was all run by children. He said he learned a lot about sorting, indexing and categorizing books—the beginning of a lifelong passion.  

Nearly two decades later, Alam completed his PhD in computer science with a specialty in web archiving from Old Dominion University. He was part of the Web Science and Digital Libraries Research Group at the university. 

Alam joined the staff of the Internet Archive as a web and data scientist in 2020. Working with the Wayback Machine team, Alam supports researchers from all around the world conducting analyses with Internet Archive collections. When someone has a research question that involves interaction with Wayback Machine APIs or downloading a large number of archived web pages, he helps prepare the data and provides technical assistance. Alam tries to improve the discoverability of items in massive web collections. His data insights and quality assurance efforts enhance web crawling and Wayback Machine operations.

Alam also collaborates with partners from academia, industry, and organizations on various research, development and standardization efforts. His own research has focused on archive profiling, interoperability and cooperation among archives, which are all topics the data scientist writes about and shares on Twitter.

Formal academic training in the field of web archiving is uncommon, said Alam. With his background, he’s able to understand the data scientists’ research needs, he said, making his skills a perfect match for his position at the Internet Archive. 

“‘Universal Access to All Knowledge’ is something that certainly resonates for me,” Alam said of the Internet Archive’s mission. “I would like to focus on making it more global.”

In recognition of his contribution to the library community with digital preservation, Alam received the NDSA 2020 Future Stewards Innovation Award.

Beyond his work at the Internet Archive, Alam serves the digital library and web archiving communities by peer-reviewing research papers and chairing sessions for journals and conferences in his fields of interest. He also participates in International Internet Preservation Consortium (IIPC) conversations focused on interoperability, collaboration and related topics.

Favorite items in the Internet Archive for Alam? “I established a volunteer-driven online Unicode Urdu books library, UrduWeb Digital Library, during my graduation years. My first language is Urdu so when I see books and materials in Urdu in the Internet Archive it brings me joy. Thanks to the Wayback Machine, I was able to narrate the lost story of the evolution of Urdu blogging on the 20th anniversary of the Internet Archive.”

Meet the Librarians: Catherine Falls, Community Webs

To celebrate National Library Week 2022, we are taking readers behind the scenes to Meet the Librarians who work at the Internet Archive and in associated programs.


In the spring of 2021, Catherine Falls was hired by the Internet Archive to launch the Community Webs program in Canada. She was excited about the prospect of helping public libraries, museums, local historical societies and archives digitally preserve important material. 

“Most web archiving happens at really large institutions, so much of the experience of local communities is missing from the historic record. It’s giving us a biased view of contemporary society,” Falls said. “The more of these local organizations that we can get to do this archiving, the more the historic record will be brought into balance.”

Since her efforts began, the Internet Archive has partnered with 43 institutions and organizations in Canada to build community-based collections. Falls said it’s been rewarding to follow the growth and variety of web-archiving projects. For example, the Milton Public Library in Ontario is working with the Halton Black History Awareness Society and other organizations to document items that may not otherwise be captured on the web. Meanwhile, the ArQuives: Canada’s LGBTQ2+ Archives is working with its community members to build web archive collections that capture the community’s web presence.

Falls earned bachelor’s degrees in commerce and art history from the University of British Columbia. She also has a master’s degree in library science and a master’s degree in art history from the University of Toronto. Before coming to the Internet Archive, she worked as an archivist in Canada at several institutions including York University and the Archives of Ontario.

“I was drawn to libraries as a kind of place that facilitates research–which for me is the most exciting phase of any project,” Falls said. “I’m interested in the free circulation of ideas and the library as a place where public knowledge is accessible. I like how the intellectual possibilities of a library intersect with the library as a community space.”

Falls said her background gives her a solid understanding of the basic functions of the library and the common language used within the profession. With that theoretical grounding, she said she can approach her work from a critical perspective to make improvements.

“It’s important to keep in mind that libraries are not infallible institutions. We need to be constantly questioning our practice and finding ways to be better,” Falls said. “It’s easy to say libraries are these beautiful, idyllic institutions. But I think it’s healthy to take a critical eye toward the work we do so that we can try to live up to our ideals in terms of whose stories we tell, who has access to our services, and what is preserved for the long term.” 

Falls said she enjoys the mission-driven focus of the Internet Archive. Operating in the library, technology and archival world, it has a dynamic, nimble culture that provides fertile ground in which to explore new ideas, she said. 

Falls’ favorite holdings are among some of the quirkier arts-related web archive collections in the Internet Archive: University of Michigan, School of Information: 20th Century Minimalist Music; Dalhousie University: Artist-Run Centres in Halifax, Nova Scotia; and Corning Museum of Glass: Contemporary Glass Podcasts.

Meet the Librarians: Jessamyn West, Accessibility

To celebrate National Library Week 2022, we are taking readers behind the scenes to Meet the Librarians who work at the Internet Archive and in associated programs.


In her work, Jessamyn West is driven by a desire to help people and remove barriers to access.

“When I went to library school, I realized a lot of the things that were important to me lined up with library values,” West said. “Anti-censorship, intellectual freedom, and serving all the people — not just the people who can afford it, not just the people who can make it up two flights of stairs, not just people who can read small print. All the people.”

West is living out her values, processing requests from individuals to participate in the Internet Archive’s program for users with print disabilities. She receives emails from people around the world with blindness, low vision, dyslexia, brain injuries and other cognition problems who need accessible content. In her role, West has helped qualify thousands of patrons to receive materials in alternative digital formats.

Her qualifying work for the Internet Archive is among a variety of activities that keep West busy with the Vermont Mutual Aid Society. West works part-time at the Kimball Library in Randolph, Vermont, where she helps adults in her community learn to use technology. She also does public speaking on the digital divide and other technology access issues and writes a monthly column for Computers in Libraries Magazine.

West grew up in Boxborough, Massachusetts, where her dad taught her about computers and her mother introduced her to the importance of civic engagement and volunteerism. At Hampshire College, she earned a bachelor’s degree in linguistics and then moved to Seattle.

In 1994, West enrolled in the Graduate School of Library and Information Science (GSLIS) at the University of Washington. The shift to online information and the emergence of the web were presenting both an opportunity and a challenge for libraries, which West said was exciting to be a part of at the time.

Her first job after graduation in 1996 was with AmeriCorps at the Seattle Public Library helping adults learn to use computers. She later pivoted to working for an internet service provider before moving back East. In Vermont, she continued working with libraries and set up her own tech consultancy. West has worked as a tech liaison for Open Library and has been a qualifying authority for the Internet Archive since 2018.

“All I want to do is to get as much knowledge, to the most people, in as easy a way as possible,” West said. “I think it’s important that we have all kinds of libraries. I wouldn’t want a world that was only digital libraries and I certainly wouldn’t want a world that was only physical libraries. It’s really nice that many people, depending where you are, can have access to either or both in the way that makes the most sense for them.”

West maintains a professional website (jessamyn.info), a personal website (jessamyn.com) and blogs at librarian.net. When she’s not working, she enjoys editing articles about librarians and library topics on Wikipedia, playing pub trivia, creating moss terrariums, and writing postcards.

Among West’s favorite items at the Internet Archive: The Middlebury College collection of Vermont Life Magazine and The Great 78 Project.

Volunteers Rally to Archive Ukrainian Web Sites

As the war intensifies in Ukraine, volunteers from around the world are working to archive digital content at risk of destruction or manipulation. The Internet Archive is supporting several preservation efforts including the Saving Ukrainian Cultural Heritage Online (SUCHO) initiative launched in early March. 

“When we think about the internet, we think the data is always going to be there. But all this data exists on physical servers and they can get destroyed just like buildings and monuments,” said Quinn Dombrowski, academic technology specialist at Stanford University and co-founder of SUCHO. “A tremendous amount of effort and energy has gone into the development of these websites and digitized collections. The people of Ukraine put them together for a reason. They wanted to share their history, culture, language and literature with the world.”

Watch:

More than 1,200 volunteers with SUCHO have saved 10 terabytes of data including 14,000 uploaded items (images and PDFs) and captured parts of 2,300 websites so far. This includes material from Ukrainian museums, library websites, digital exhibits, open access publications and elsewhere. 

The initiative is using a combination of technologies to crawl and archive sites and content. Some of the information is stored at the Internet Archive, where it can be discovered and accessed using open-source software.

Staff at the Internet Archive are committed to assisting with the effort, which aligns with the organization’s mission of providing universal access to knowledge and making the web more useful and reliable, said Mark Graham, director of the Wayback Machine.

“This is a pivotal time in history,” he said. “We’re seeing major powers engaged in a war and it’s happening in the internet age where the platforms for information sharing and access we have built, and rely on, the Internet and the Web, are at risk.”

The Internet Archive is documenting information that might not otherwise be available and making it accessible, Graham said. For years, the Wayback Machine has been archiving about 950 Russian news sites and 350 Ukrainian news sites. Stories that are deleted or altered are being archived for the historical record.

Dombrowski has been stunned by the response from archivists, scholars, librarians involved in cultural heritage and members of the general public, who have recognized the urgency of this moment and offered to help. Volunteers need not have technical expertise or special language skills to be of value in the project.

“Many people were spending the days before they got involved with SUCHO scrolling the news and feeling helpless and wishing they could do something to contribute more directly towards helping out with the situation,” Dombrowski said. “It’s been really inspiring hearing the stories that people have told about what it’s meant to them to be able to be part of something like this.”

Gudrun Wirtz, head of the East European Department of the Bavarian State Library (Bayerische Staatsbibliothek) in Munich, was archiving on a smaller scale when she and other colleagues began to collaborate with SUCHO.

“We are committed to Ukraine’s heritage and horrified by this war against the people and their rich culture and the distorting of history going on,” Wirtz said. “As Germans we are especially shocked and reminded of our historical responsibility, because last time Ukraine was invaded it was 1941 by Nazi-Germany. We try to do everything we can at the moment.”

Anna Kijas, Tufts University

The invasion of Ukraine hits particularly close to home for Anna Kijas, a librarian at Tufts University and co-founder of SUCHO, who is a Polish immigrant with family members who lived through Soviet occupation following WWII.

“Contributing to the SUCHO effort is something tangible that I can do and bring my expertise as a librarian and digital humanist in order to help preserve as much of the cultural heritage of the Ukrainian people as is possible,” said Kijas. 

The third co-founder of SUCHO, Sebastian Majstorovic, is with the Austrian Centre for Digital Humanities and Cultural Heritage.

The Internet Archive is providing technical support, tools and training to assist volunteers, including those with SUCHO, who are giving of their time.

Through Archive-It, a customizable self-service web archiving platform that captures, stores, and provides access to web-based content, free online accounts have been offered to volunteer archivists. Mirage Berry, business development manager for Archive-It, has coordinated support with other preservation partners including the Harvard Ukrainian Research Institute, the Center for Urban History of East Central Europe, and East European & Central Asian Studies Collections librarian Liladhar Pendse at University of California, Berkeley.

“It’s so incredible how quickly all of these archivists have pulled together to do this,” Berry said. “Everyone wants to do something. You don’t need to have a ton of technical experience. For anyone who is willing to learn, it’s a great jumping off point for web archiving.”

SUCHO organizers anticipate that once the immediate emergency of website archiving is over, there will be an ongoing need to stay vigilant with data curation of Ukrainian material. To learn more and get involved, visit http://www.sucho.org.

Library as Laboratory Recap: Applications of Web Archive Research with the Archives Unleashed Cohort Program

From projects that compare public health misinformation to feminist media tactics, the Internet Archive is providing researchers with vital data to assist them with archival web collection analysis.

In the second of a series of webinars highlighting how the Internet Archive supports digital humanities research, five scholars shared their experience with the Archives Unleashed Project on March 16. 

Archives Unleashed was established in 2017 with funding from the Andrew W. Mellon Foundation. The team developed open-source, user-friendly Archives Research Compute Hub (ARCH) tools, along with resources and tutorials, to allow researchers to conduct scalable analyses. An effort to build and engage a community of users led to a partnership with the Internet Archive.

A cohort program was launched in 2020 to provide researchers with mentoring and technical expertise to conduct analyses of archival web material on a variety of topics. The webinar speakers provided an overview of their innovative projects: 

  • WATCH: Crisis communication during the COVID-19 pandemic was the focus of an investigation by Tim Ribaric and researchers at Brock University in Ontario, Canada. Using fully extracted texts from websites of municipal governments, community organizations and others, the team compared how well information was conveyed to the public. The analysis assessed four facets of communication: resilience, education, trust and engagement. The data set was used to teach senior communication students at the university about digital scholarship, Ribaric said, and the team is now finalizing a manuscript with the results of the analysis.
  • WATCH: Shana MacDonald from the University of Waterloo in Ontario, Canada, used archival web data for a comparative analysis of feminist media tactics over time. The project mapped the presence of feminist key concepts and terms to better understand who is using them and why. The researchers worked with the Archives Unleashed team to capture information from relevant websites, write code and analyze the data. They found the top three terms used were “media, culture and community,” MacDonald said, providing an interesting snapshot into trends with language and feminism.
  • WATCH: At the University of Siegen, a public research university in Germany, researchers examined the online commenting systems on news websites from 1996 to 2021. Online media outlets started to remove commenting systems in about 2015, and the project focused on this time of disruption. With the rise of Web 2.0 and social media, commenting is becoming increasingly toxic and taking away from the main text, said the university’s Robert Jansma. Technology providers have begun to offer ways to stem the tide of these unwanted comments and, in general, the team discovered comments are not very well preserved.
  • WATCH: Web archives of the COVID-19 crisis in the IIPC Novel Coronavirus dataset were analyzed by a team at the University of Luxembourg led by Valérie Schafer. Although the pandemic was a shared, unforeseen, global event, the researchers found vast institutional differences in web archiving. Looking at tracking systems from the U.S. Library of Congress, European libraries and others, the team did not see much overlap in national collections and is in the midst of finalizing the project’s results.
  • WATCH: Researchers at Arizona State University worked with ARCH tools to compare health misinformation circulating during the HIV/AIDS crisis and the COVID-19 pandemic. ASU’s Shawn Walker did a text analysis to link patterns and examine how gaps in understanding of health crises can fuel misinformation. In both cases, the community was trying to make sense of information in an uncertain environment. However, the government conspiracy theories rampant in the COVID-19 pandemic were not part of the dialogue during the HIV/AIDS crisis, Walker said.

Archives Unleashed is accepting applications for its 2022-23 cohort research teams. For more information, view the application & instructions: https://archivesunleashed.org/cohorts2022-2023/.

Up next in the Library as Laboratory series:

The next webinar in the series, Hundreds of Books, Thousands of Stories: A Guide to the Internet Archive’s African Folktales, will be held March 30. Register now