Launching Legal Literacies for Text Data Mining – Cross Border (LLTDM-X)

We are excited to announce that the National Endowment for the Humanities (NEH) has awarded nearly $50,000 through its Digital Humanities Advancement Grant program to UC Berkeley Library and Internet Archive to study legal and ethical issues in cross-border text data mining research. NEH funding for the project, entitled Legal Literacies for Text Data Mining – Cross Border (LLTDM-X), will support research and analysis that addresses law and policy issues faced by U.S. digital humanities practitioners whose text data mining research and practice intersect with foreign-held or licensed content, or involve international research collaborations. LLTDM-X builds upon the Building Legal Literacies for Text Data Mining Institute (Building LLTDM), previously funded by NEH. UC Berkeley Library directed Building LLTDM, bringing together expert faculty from across the country to train 32 digital humanities researchers on how to navigate law, policy, ethics, and risk within text data mining projects (the results and impacts are summarized in the project white paper).

Why is LLTDM-X needed?

Text data mining, or TDM, is an increasingly essential and widespread research approach. TDM relies on automated techniques and algorithms to extract revelatory information from large sets of unstructured or thinly-structured digital content. These methodologies allow scholars to identify and analyze critical social, scientific, and literary patterns, trends, and relationships across volumes of data that would otherwise be impossible to sift through. While TDM methodologies offer great potential, they also present scholars with nettlesome law and policy challenges that can leave them unsure how to move forward with their research. Building LLTDM trained TDM researchers and professionals on essential principles of licensing and privacy law, as well as ethics and other legal literacies, thereby helping them move forward with impactful digital humanities research. Digital humanities research, however, is particularly marked by collaboration across institutions and geographic boundaries. As a result, U.S. practitioners increasingly encounter complex cross-border problems and must consider how they work with internationally-held materials and international collaborators.

How will LLTDM-X help? 

Our long-term goal is to design instructional materials and institutes to support digital humanities TDM scholars facing cross-border issues. Through a series of virtual roundtable discussions, and accompanying legal research and analyses, LLTDM-X will surface these cross-border issues and begin to distill preliminary guidance to help scholars navigate them. After the roundtables, we will work with the law and ethics experts to create instructive case studies that reflect the types of cross-border TDM issues practitioners encounter. Case studies, guidance, and recommendations will be widely disseminated via an open access report to be published at the completion of the project. And most importantly, these resources will be used to inform our future educational offerings.

The LLTDM-X team is eager to get started. The project is co-directed by Thomas Padilla, Deputy Director, Archiving and Data Services at Internet Archive and Rachael Samberg, who leads UC Berkeley Library’s Office of Scholarly Communication Services. Stacy Reardon, Literatures and Digital Humanities Librarian, and Timothy Vollmer, Scholarly Communication and Copyright Librarian, both at UC Berkeley Library, round out the team.

We would like to thank NEH’s Office of Digital Humanities again for funding this important work. The full press release is available at UC Berkeley Library’s website. We invite you to contact us with any questions.

Why it’s Important to #OwnBooks

Here’s Max Collins, lead singer of legendary alt-rock band Eve 6, reading a book that he owns.

As you know, the Internet Archive is currently being sued by four corporate publishers. The publishers want to stop libraries from owning books. In the age of Netflix and Spotify, ownership of culture is increasingly in the hands of large corporations rather than people, artists and public institutions.

We’re fighting back by celebrating book ownership with the #OwnBooks campaign. 

It’s very easy to take part. Choose a book that you’ve owned for a long time – ideally the oldest book you own! You can also choose another media piece, such as a record, CD, or DVD. Take a photo with the book and share it on social media. Tell us how long you’ve owned your book and use the #OwnBooks hashtag.

Check out why other readers like to #OwnBooks.

You could also tell us the story of your relationship with the book – what were the circumstances in which you acquired it? Does it spark any special memories for you? If you prefer, you could make a selfie video and record yourself telling the story of the book. 

We’ll retweet your posts. Make sure to use the #OwnBooks hashtag and mention @internetarchive to help us find them.

Celebrating 20 Years of the Live Music Archive

This week, the Live Music Archive collection at the Internet Archive reaches a milestone: 20 years since the collection was started. The roots of the Live Music Archive collection are visible right in the URL: etree. Did you ever wonder what the “etree” in the URL references? In 1998, the etree music community was created to promote the online trading of lossless audio recordings of live music performances. With the advent of more widely available broadband internet connections (by 1990s standards, mind you) and the creation of lossless file compression formats (Shorten at first, followed by FLAC), the community established protocols to ensure the preservation and archiving of these original audio recordings. Preservation and archiving. The very ethos of the Internet Archive.

Early Live Music Archive logo

In July 2002, Jon Aizen, a software engineer at the Internet Archive and live music enthusiast, proposed to Brewster Kahle the idea of archiving live music recordings. Brewster was enthusiastic, and so on July 23, 2002, Jon reached out to the etree community via their email list to make an offer. The Internet Archive was offering to provide “unlimited storage, unlimited bandwidth, forever, for free” to ensure the preservation and easy distribution of these live music recordings. The reply came back: “We don’t believe you. But if you could, that would be our dream.” And we were off to the races to create the first library archive of lossless, legal, live audio recordings.

The first order of business was to get explicit permission from the artists not only to preserve their recordings but also to provide easy access to them. Aizen and others started emailing bands and documenting their responses. It would be a great story if the first item in the collection were some rare Grateful Dead recording from 1968, but it is actually an unassuming Rusted Root audience recording from August 24, 2001, uploaded to the new Live Music Archive collection by Aizen on August 12, 2002. You can listen to it here. Of course, there has to be a Grateful Dead connection: the show features a guest appearance by Mark Karan, guitarist (at the time) for Ratdog, one of Bob Weir’s side projects. Perhaps the fact that it is unassuming is more in line with the goal of preservation and archiving. Preservation of all, not just the shiny fancy gem. Permission from the Grateful Dead came a little while later, through Brewster’s connection to John Perry Barlow; the two worked together on the board of the Electronic Frontier Foundation.

As the Live Music Archive was established, the etree community jumped in to help get things rolling, dedicating hundreds of hours to cataloging, uploading, and verifying recordings of shows. In those days it would take 6-12 hours to upload a show via FTP. Jon Aizen describes grabbing shows off etree’s FTP server network, as well as from hard drives and other sources, and uploading them to the Live Music Archive. Aizen also worked in the early days to create the curation process that enabled volunteers to ensure uploads were permitted by the artists. The Internet Archive team also worked on the “deriver” software, which would convert the lossless recordings to MP3 and other more accessible formats (a step that came after heated debate within the etree community, for many of whom the notion of lossy distribution of recordings was anathema). Today’s uploading experience is a web interface that takes most folks 10-20 minutes to upload a show and have it almost immediately available to the world. Many people were involved in the early days, and although we are sure to miss some, we’d like to thank the following notable contributors:

Alexis Rossi
Brad Leblanc
Bram Cohen
Caleb Epstein
Diana Hamilton
Ghost
Greg Pope
John Dailey
Jon Aizen
Lauren Gelman
Marc Pujol
Mark Goldey
Matt Vernon
Parker Thompson
Peter Hedeman
Ryan Brase
Tom Anderson
Tom Horton
Tracey Jaquith
Tyler Huff

Brad Leblanc recalls doing all the tasks manually – validating checksums, moving files to public download areas, running derivation routines to create mp3/ogg files for streaming. Brad, Jon, and all the others were curating this new collection, bit by bit, as well as building software to automate the process. The Live Music Archive volunteers today still refer to themselves as curators. An amazing task with incredible results.

A grand offer followed by a positive, yet skeptical, response. And then a lot of hard work by both Internet Archive staff and engineers and volunteers from the live music taping and trading community. For 20 years, we have kept curating, uploading about 1,000 recordings per month to the Live Music Archive; the total now stands at 240,000 recordings, by far the largest collection of live music recordings in the world. We should reach 250,000 by next summer. More than 8,000 artists have given permission to have recordings of their shows archived on the Live Music Archive. Those recordings have been listened to more than 600,000,000 (yes, 600 million) times. And many of those listens are not even of the Grateful Dead, giving visibility to artists who might otherwise have less exposure. The Grateful Dead remains the cornerstone artist of the Live Music Archive, but there are many other options on the Live Music Archive: jambands, folk singers, bluegrass, rock, pop, jazz, classical, experimental, mainstream artists, and every combination you can think of.

Beyond listening to the music, what impact has the Live Music Archive had on the artists? The recordings allow their fans to hear the shows they attended, the ones they couldn’t make it to, or the one across the country that happened yesterday. Building and fortifying a fanbase through the community of live music recordings. And it is not just the fans; the artists appreciate it as well. One of our curators was having a conversation backstage before a show with a musician friend. It was an “in the round” type show featuring four songwriters alternating to perform their songs, with the others playing or singing along. One of the other artists was on the couch trying to take a nap before the show. As soon as the conversation turned to the Live Music Archive, he popped off the couch to say, “I love the Live Music Archive! That place is great. I go there to check out music all the time.” From a nap to excitement in a second. The Live Music Archive is a resource for musicians both personally and professionally. A new musician joins the band? Send them to the Live Music Archive to check out some shows and learn how the songs are played live and how the segues flow. One of our curators recently received a text from an artist looking for a recommendation: “What is a good recent recording I can send to some musicians? I love the Tahoe and Eugene recordings from earlier this year but need something more recent.” It was certainly enough to put a smile on a curator’s face.

From trading tapes (reel to reel, cassettes, DATs) by mail months or years after the show occurred, to CDs, to FTP server networks, to hearing the show on your mobile device hours after it ended: a transformation of a community. No longer hundreds or thousands hearing the show, but hundreds of thousands.

The Live Music Archive curators are not just archivists, but tapers and music fans themselves. Here are some suggestions from curators past and present.

From Jon Aizen:

“It’s hard to pick one, but I think Sim and Uniit at the State Theater in Ithaca in 2002 is an amazing example of the power of the Archive. If it weren’t for the Archive, this recording would be sitting on a tape somewhere, probably lost forever. This small act, never to be repeated (Sim and Uniit are friends, but not a regular act) is a moment in time perfectly captured.”

Sim Redmond and Uniit Carruyo Live at State Theater on 2002-09-14

From vanark:

“Some of my favorite recordings are from the most intimate settings – especially house concerts and in store performances. Close to the performer in a more informal environment, without a big PA or sound system. Musician, instrument (usually acoustic), microphones, and a couple dozen fans. In this recording, the in store occurred in the afternoon prior to the evening performance at a local club. JJ Grey walks to the small area in the corner of the store, sees the microphones set up in front of him and asks, ‘Whose are these?’ I raise my hand and a big smile rises across his face and I get a ‘That’s great!’ A short 5-song set promoting the newest CD. There were still hours before the evening show, so I head home and upload the show to the Archive before heading to the club. I think I had more fun at the free in store than the main event.”

JJ Grey Live at Newbury Comics – Faneuil Hall Marketplace on 2008-10-25

If you want to listen to some of the most popular recordings of all time on the Live Music Archive, here are some selections.

The most listened-to item of all time. O.A.R. has quite a following, and this recording may have been embedded on their Myspace page, which could explain its 2.7 million listens:

Of A Revolution Live at Madison Square Garden on 2006-01-14

The most listened to Grateful Dead recording (no, not Cornell 1977, although that is second):

Grateful Dead Live at Robert F. Kennedy Stadium on 1973-06-10

An interesting show in the top 20 of all time, from a pizza/brewery in Asheville, NC, recorded by curator Gordon:

Patterson Hood Live at Asheville Pizza & Brewing Company on 2006-01-07

Whichever show you choose to listen to, whether it has been played 500,000 times or is a backyard show from last weekend with 50 listens, it has value to someone, and that value is not measured by the number of listens. The tapers are still out there capturing the moments from artists, new and established, playing covers or originals. Capturing, archiving, preserving.

From all the listeners, artists, and tapers, thank you to the Internet Archive and etree for taking that leap of faith in 2002 and pushing it forward. Who knows where we can take it from here? Let’s keep it going! Let’s start planning that party for the 25th anniversary – who’s in?

Book Signing with Congressman Adam Schiff at the Internet Archive

Please join us for a conversation and book signing sponsored by Booksmith, Berkeley Arts & Letters and the Internet Archive.

Congressman Schiff is celebrating the paperback launch of his #1 New York Times bestselling “Midnight in Washington: How We Almost Lost Our Democracy and Still Could”.

Tuesday 8/16/22 7:30 pm

300 Funston Ave.
San Francisco, CA 94118

Adam Schiff is the United States Representative for California’s 28th Congressional District. In his role as Chairman of the House Permanent Select Committee on Intelligence, Schiff led the first impeachment of Donald J. Trump. Before he served in Congress, he worked as an Assistant U.S. Attorney in Los Angeles and as a California State Senator.

RSVP here

New additions to the Internet Archive for July 2022

Many items are added to the Internet Archive’s collections every month, by us and by our patrons. Here’s a roundup of some of the new media you might want to check out. Logging in might be required to borrow certain items.

Notable new collections from our patrons: 

Books – 78,091 New Items in July

This month we’ve added books on varied subjects in more than 20 languages. Click through to explore, but here are a few interesting items to start with:

Audio Archive – 91,636 New Items in July

The audio archive contains recordings ranging from alternative news programming, to Grateful Dead concerts, to Old Time Radio shows, to book and poetry readings, to original music uploaded by our users. Explore.

LibriVox Audiobooks – 119 New Items in July

Founded in 2005, LibriVox is a community of volunteers from all over the world who record audiobooks of public domain texts in many different languages. Explore.

78 RPMs and Cylinder Recordings – 8,888 New Items in July

Listen to this collection of 78rpm records, cylinder recordings, and other recordings from the early 20th century. Explore.

Live Music Archive – 965 New Items in July

The Live Music Archive is a community committed to providing the highest quality live concerts in a lossless, downloadable format, along with the convenience of on-demand streaming (all with artist permission). Explore.

Movies – 135 New Items in July

Watch feature films, classic shorts, documentaries, propaganda, movie trailers, and more! Explore.

Canada is Leading the Way on User-Centered Copyright Policy

In an important new copyright decision, the Supreme Court of Canada reaffirmed its commitment to the principles of users’ rights and technological neutrality, principles that have made Canada a world leader in balanced copyright and in support for controlled digital lending (CDL) by libraries.

For many years now, the Supreme Court of Canada has emphasized the importance of these two principles in striking the proper copyright balance. With respect to users’ rights, the Supreme Court has held that exceptions and limitations to copyright are not mere loopholes; they are affirmative users’ rights. This means that copyright is not about maximizing the economic interests of publishers or anyone else, but instead about advancing the public good by seeking “the proper balance between the rights of a copyright owner and users’ interests.” With respect to technological neutrality, the Supreme Court has held that the Copyright Act must be interpreted in view of the principle of technological neutrality, according to which “[w]hat matters is what the user receives, not how the user receives it.” This means that, in general, the courts should “interpret the Copyright Act in a way that avoids imposing an additional layer of protections and fees based solely on the method of delivery of the work to the end user.” These principles have been particularly important for Canadian libraries and their patrons, supporting CDL and other important library practices there.

In many ways, these principles seem like good old-fashioned common sense. But publishers and others have long claimed that users’ rights and technological neutrality “pose[] a direct threat” to their economic interests. In the new case, SOCAN v. ESA, these arguments were once again brought before the Supreme Court of Canada, and once again rejected.

As Professor Michael Geist has noted, the case:

provides a further entrenchment of Canadian copyright jurisprudence that holds users’ rights and the copyright balance as foundational elements of the law. . . . the court’s support for these principles is not obiter, rhetoric, or likely to change. Indeed, copyright lobby groups have spent much of the past two decades in denial, convinced that somehow the growing body of Supreme Court copyright cases will be reversed the next time the court confronts the issue. That has now led to multiple defeats at Canada’s highest court by copyright collectives such as Access Copyright and SOCAN. In each case, the core copyright principles have remained unchanged. Indeed, if anything, they have become more solidified as precedent builds upon precedent. Given these outcomes and last week’s SOCAN v. ESA decision, it is long past time for these groups to engage in copyright policy based on the realities of balance, users’ rights, and technological neutrality.

These principles–and a balanced approach overall–allow libraries in Canada to continue to fulfill their mission in the digital age, and allow ordinary citizens access to quality information, all while supporting a thriving creative industry at home and abroad.

DWeb Camp 2022: A Grounded Convening of Those Building a Decentralized, Values-Driven Web

Much has changed since 2016, when the Internet Archive held the first Decentralized Web Summit. Scrappy teams with lean funding have grown into formidable organizations with budgets in the millions. Niche technologies and far-fetched debates from a few years ago have dominated headlines and are shaping entire economies.

Each of the DWeb events reflected a moment in a quickly shifting landscape of protocols, institutions, and ideologies. In the three years since DWeb Camp in 2019, some major trends have transformed people’s thinking. The explosion of non-fungible tokens (NFTs) into the mainstream. The renaissance of projects centered on shared ownership and governance of assets. The reckoning with the power and potential of decentralized technologies: to either further entrench existing social inequities and exacerbate ecological harm, or radically reconstruct the ways in which individuals and communities can meaningfully address these and other crises of our time. 

For us as organizers of this community, the defining change was the development of the DWeb Principles. The Principles help us define what we stand for, instead of merely what we stand against. They emerged out of discussions and alignment among many members of the DWeb community, and are just one part of a growing awareness of the ethics and beneficiaries of decentralized digital ecosystems.

DWeb Camp 2022 will be held from August 24-28 at Camp Navarro, California. As the programming takes shape, the themes, spaces, and participants of this year’s event clearly reflect where we are in this still nascent movement. At DWeb Camp, we’ll be hacking and live testing cutting edge decentralized protocols, platforms, and hardware. We’ll tackle thorny topics about who these tools serve and how to govern and steward them sustainably. We’ll confront questions about power, marginalization, community, identity, ecology, and human rights. 

With all the DWeb events, we aim to create spaces for people to share their ideas, projects, and research among warm, supportive peers who believe in a plurality of approaches and solutions to build a decentralized, values-driven web. By meeting in person, outdoors among towering redwood trees, DWeb Camp is about manifesting that ethos as we invite all those participating to bring their full selves. We’re designing this event to be a place for us to be curious and humble: not to arrive with all the answers, but to be open to having our minds and hearts changed.

Below are some of the Spaces, or thematic sessions, that will be held throughout the five-day event. In addition to the Spaces described below, we will build a local Mesh Network across the campground for participants to share locally-hosted materials, test hardware, and experience a community network first-hand.

Spaces

  • Hackers Hall – Tech projects, Science Fair, and User testing
  • Healing Waters in Cambium Pavilion – Conversations, music, tea, and storytelling
  • People-2-People Tent – Exploration of emergent wisdom through play
  • Open Source Library – Storytelling, books and games
  • Redwood Parliament Pavilion – Imagine and co-inspire a governance layer for the DWeb
  • Filecoin Foundation Forest Hang Out – Connect with new friends while lying in hammocks
  • Redwood Cathedral – Wellness, meditation, and conversation 
  • Universal Access Amphitheater – Talks and breakout discussions
  • Be Water Waystation – Art and hands-on programs for children
  • Thunder Salon – Lightning talks

We’re lucky to have an incredible group of people stewarding the programming in each Space, ensuring that the sessions invite collective practice in discussion, imagination, and play. Continue reading below for more detailed descriptions of some of the Spaces, written by the stewards. An online schedule of all the sessions in each Space will become available the week of the event.

Hackers Hall

Hackers Hall is the place for people of technical and non-technical backgrounds to meet each other at all hours of the day and night. We will have Wi-Fi, couches, whiteboards, and tables. It will be the Mesh Network Hub of the Camp. Come to the Science Fair on Thursday, where everyone can try interactive demos of existing decentralization projects and meet the people who are building them. Then on Friday, come to “Dogfooding Decentralization,” a User Testing Lab for DWeb projects. Each team will hold office hours where you can take a deep dive with them.

Come build on and improve projects, test software, be a user tester, meet developers and designers, ask questions, and learn new things about the decentralization all around us! 

The Redwood Cathedral at Camp Navarro, the venue of DWeb Camp 2022

Healing Waters in Cambium Pavilion

Oceans and creeks, rivers and lakes, from the clouds in the sky to the pipes in our homes, water connects us all. This is the focus of Healing Waters at DWeb camp, an Indigenous-led, multi-modal celebration of this precious substance that supports all life on Earth. By the meeting place of the Navarro River and the Pacific Ocean, Healing Waters invites DWeb campers to explore their relationship to water and what it means to be fluid, literally and metaphorically. Our programming navigates the currents leading from Indigenous technologies and storytelling to hyper-modern science and cartography, with ports of call in art, music, policy, poetry, history, and mythology.

Programming Highlights:

  • A conversation led by Haudenosaunee artists Asha Veeraswamy and Amelia Winger-Bearskin about the parallels between open-source technology, decentralization, and the consensus-building practices that led to the formation of the Iroquois Confederacy, and deeply influenced the U.S. Constitution 
  • Data visualization workshop using real water data from the US Geological Survey led by data manager/designer Martha Bearskin 
  • Real-time data-driven VJ session featuring artist/technologist Devin Ronneberg
  • Morning communal singing rituals led by artist and opera singer Amelia Winger-Bearskin
  • Musical performances and night raves in the majestic redwood forest
  • Sound baths (meditative experiences in which the audience is “bathed” in immersive spatialized audio)
  • Martial arts instruction, guiding students to access the deep aquifer of intuition that flows just below the conscious mind

People-2-People Tent

Let’s myceliate!

Let’s root and spread our hyphae through the ground: tree-to-tree, person-to-person, peer-to-peer, and node-to-node.

Let’s relieve networks of extractive, transactional usage and explore in earnest what it’s like to design, form, and experience networks the way fungi do. The way the complex systems of our bodies do. The way humans do when we weave our relational webs. Our webs have connections, overlapping points, tensions, resistances, and anchors.

Let’s weave, let’s twine, let’s interwingle. Let’s use our technologies of language, of frames, of digital media to better see and play with these patterns of relating in real time, in real life, with each other.

Those working on peer-to-peer (P2P) projects are invited to give a Kindergarten Lightning Talk, sharing their technologies using crayons, paper, and pipe cleaners. We’ll have interactive sessions from different P2P projects like Scuttlebutt, Holochain, and Fluence. There will be a full-on battle session (playful, of course) between blockchain folks and fully distributed folks over what the “D” in DWeb stands for. Think arts and crafts and workshops meet P2P technology!

Hammocks at Camp Navarro!

Filecoin Foundation Forest Hang Out

Our Venue Sponsor, Filecoin Foundation, invites you to hang out in the trees and meet Foundation leaders. This is the place to come to chill, meet new friends, enjoy late-night pizza cooked to order in a wood-fired oven on Wednesday, and dance at a Silent Disco on Friday.

Open Source Library

Looking for a place of quiet contemplation? Come to the Open Source Library to peruse some favorite books of your fellow campers. We’ll ask each person to bring a few meaningful books to give away. Authors’ talks and storytelling, game nights and children’s films will all take place in the Library.

Redwood Parliament Pavilion

Imagine an Internet where democracy is at least as available as autocracy.

The decentralized Internet is a complex network of technical and social interdependencies: a mix of protocols and the communities that thrive in and across the network. However, the Internet as it currently exists has been flattened and consolidated, rendering these socio-technical complexities into top-down, autocratic defaults for social organization. And yet these interdependencies continue to grow, challenging the current form of the Internet and proving it socially unsustainable, calling us instead to develop more collective means and intuitions for how we govern our commons.

Redwood Parliament is a collection of events at DWeb Camp that will address these interdependencies in all of their complexity and practice alternatives to autocracy.

The track will bring together practitioners, researchers, artists, builders, and dreamers to actively imagine and co-inspire a governance layer for the decentralized Internet. Over four days, campers will have the opportunity to participate in a collection of distributed activities, workshops, and discussions designed to give us the conceptual and experiential tools and frameworks that we can take with us to help us do this work.

Together, we will:

  • Explore ways of flexibly composing and experimenting with different decision-making structures through workshops and hands-on engagement with new digital-native tools;
  • Immerse ourselves in a black-box modular governance Live Action Role Play (LARP);
  • Collectively develop a map of governance practices and protocols existing across the decentralized Internet;
  • Read, annotate, and be guided through various constitutions forming around the decentralized Internet;
  • Design ecological patterns, protocols, and mechanisms, guided by the ethos of the DWeb, to shape and inform the inter-relationship between our physical and economic environments; and
  • Engage in speculative writing and world-building exercises focused on imagining approaches to governance past, present, and future.

These activities and happenings will complement and inform a series of meta-level discussions around research that the organizers of the Redwood Parliament have been conducting on this topic of a governance layer for the decentralized Internet.

— 

Redwood Parliament is a joint collaboration between Metagov, the Internet Archive, and RadicalxChange, with support from the Unfinished Network and the National Science Foundation.

Colgate University Libraries Donate to Expanding Government Document Microfiche Collection

Case Library and Geyer Center for Information Technology, Colgate University. Photo credit: Colgate University Office of University Communications.

From 1970 to 2004, Colgate University amassed as many as 1.5 million microfiche cards with documents from the U.S. federal government. 

The small, private liberal arts institution in Hamilton, New York, housed the collection in a central location, accessible from the former reference service point and the circulation desk.

“Every single campus tour that goes through the library walks past this collection. Our well-meaning student ambassadors would announce ‘Here’s our microfiche that no one uses,’” said Debbie Krahmer, accessible technology & government documents librarian at Colgate. 

Since microfiche, with its miniaturized images of document pages, fell out of favor several years ago, many libraries have struggled with what to do with their collections, which contain important information but are difficult to use. 

Krahmer was looking for ways to offload the materials and discovered the Internet Archive would accept microfiche donations for digitization. It was a way to preserve the content, make it easier for the public to access, and avoid putting the microfiche in a landfill.

“These government documents are meant to be available and accessible to the general public. For many there’s still a lot of good information in this collection,” said Courtney L. Young, the university librarian. “While the microfiche has been stored in large metal cabinets on the main level, many of our users do not see them. This project will improve that visibility and accessibility.”

About the donation

In July, the Internet Archive arranged for the twelve cabinets of microfiche, each in excess of 600 pounds, to be loaded onto pallets and shipped to the Internet Archive for preservation and digitization. Materials include Census data, documents from the Department of Education, Congressional testimony, CIA documents, and foreign news translated into English. 

Microfiche cabinets ready for shipping to the Internet Archive for preservation and digitization.

Colgate also donated indexes to the microfiche, which Krahmer added will be “game changers” for other government document libraries once they are digitized, because the volumes are expensive and hard to acquire. 

Krahmer said the moving process with the Internet Archive was easy and recommended the option to other librarians.

“This is a lot easier than trying to figure out how to get these materials recycled,” Krahmer said. “In addition to improving discovery and access, this supports the university’s sustainability plan. It’s going to get digitized, be made available online, and preserved. This is win-win no matter how you look at it.”

Public access to government publications

Government documents from microfiche are coming to archive.org thanks to the combined efforts of the Internet Archive and its Federal Depository Library Program library partners. The Federal Depository Library Program (FDLP), founded in 1813, provides designated libraries with copies of bills, laws, congressional hearings, regulations, and executive and judicial branch documents and reports to share with the public.

Colgate joins Claremont Colleges, Evergreen State College, University of Alberta, University of California San Francisco, and the University of South Carolina, which together have contributed over 70 million pages on over one million microfiche cards. Other libraries are welcome to join this project.

Web Archiving to the Rescue: One Library’s Quest to Fill an Information Gap

Guest post by: Dana Hamlin, Archivist at Waltham Public Library

This post is part of a series written by members of the Internet Archive’s Community Webs program. Community Webs advances the capacity for community-focused memory organizations to build web and digital archives documenting local histories and underrepresented voices. For more information, visit communitywebs.archive-it.org/

What is an archivist to do when items of public record, which have been systematically added to publicly accessible collections for over a century, suddenly turn from paper into bits and bytes that disappear from the web, or even get stuck behind paywalls? Like many in my profession, I’ve been grappling with this question for a while. Having no real training in digital archiving and facing this quandary as a lone arranger, it’s sometimes hard to keep that grappling from turning into low-key panicking that my inaction has been causing information to be lost forever.

Imagine my excitement, then, when I learned about the Community Webs program – access to and training for Archive-It, collaboration with the Internet Archive, and a network of others like me to bounce ideas off and get inspiration from? Yes please! With the blessing of my boss, I applied right away and my library joined the program in April 2021.

The outside of the Waltham Public Library. Photo by C. Sowa.

(This might be a good point for a quick introduction. I work as the archivist/local history librarian at the Waltham Public Library (WPL) in Waltham, Massachusetts. Waltham is a city about 10 miles west of Boston, and is home to an ethnically and economically diverse population of just over 62,000 people. The WPL is a fully funded community hub, fostering a healthy democratic society by providing a wealth of current informational, educational, and recreational resources free of charge to all members of the community. The library is known throughout the area for its knowledgeable and friendly staff, welcoming and safe environment, accessibility, convenience, current technology, and helpful assistance.)

I eagerly dove into the program and used our first web-archive collection – Waltham Public Library – as a testing ground, a place to gain familiarity with both Archive-It and the whole process of web archiving. I’ve been trying to capture content that aligns with the material found in the library’s analog records – annual reports, policies, announcements, event flyers, records from our Friends group, etc. – by doing a weekly crawl of the library website, our Friends website, and the library’s Twitter feed. For the most part this collection has thankfully been pretty straightforward.

Our largest collection so far is COVID-19 in Waltham, which makes up a portion of the library’s very first born-digital archival collection. That collection began in April 2020, when the WPL (like most other places) was closed to help “flatten the curve.” A month or two prior, as the pandemic was building steam, I had become fascinated with the 1918 influenza. A poke through our archives for the topic had been disappointing, as there wasn’t too much beyond a couple of newspaper clippings, brief mentions in the library trustees’ minutes, and a few pages in the records of the local nurses’ association. I was hoping to put together a better picture of what it was like to live in Waltham during the flu, perhaps to give myself a glimpse of what I could expect in the coming weeks (heh… how naïve I was).

Scrapbook page showing newspaper clippings from the early days of the 1918 flu. Scrapbook is part of the records of the Waltham Public Library. Photo by D. Hamlin.

I put out a call via the library’s social media for those who lived, worked, and/or went to school in Waltham to share their stories, hoping to build the kind of collection I wanted and failed to find from 1918. There was an initial rush of Google Form submissions, a handful of photos, and one video, and then nothing. I was pleased we had received some materials, but still wanted to paint a broader picture of Waltham under Covid. Enter Community Webs! For the past several months I’ve been working to collect retroactively what I was hoping to capture at the time – news articles, videos, the city website, information from the schools, and so on. While it’s not as comprehensive as it might have been if I’d been able to gather it all as it happened, I’ve been able to save over 500 GB of data that will help those in the future to better imagine what it was like to live in Waltham during Covid.

Screenshot from a WPL Instagram post sharing a patron’s submission to our COVID-19 in Waltham collection.
Screenshot examples of Covid-related content captured retroactively with Archive-It.

Finally, related to the quandary in the first paragraph of this post, our most complicated collection is the Waltham News Tribune. The WPL has microfilm copies of the paper going back to its earliest iteration in the 1860s, and part of my job has been to collect each issue and send yearly batches to a vendor for microfilming. However, as of this past May, the publisher has moved the paper entirely online, with some content requiring a paid subscription to view. The WPL has a subscription so that we can continue to provide free access to our patrons, but what happens to our archive of back issues? Does it just stop abruptly in May 2022, even as time and local news continue to march on? As it is, our microfilm is heavily used, especially since the paper’s offices burned down in 1999, making ours the only existing archive. 

Drawers full of microfilmed newspapers at the WPL. Photo by D. Hamlin.

Thanks to web archiving, we’re able to continue to fulfill our unofficial role as the repository for the city newspaper, at least in theory. In practice, I look at the daily crawls of the digital edition of the paper and can’t help but see that it is no longer the type of local news we’ve been archiving for over a century. The corporate publisher of the paper has consolidated ours with those from several other local cities and towns, and has sacrificed true local news coverage for more generic topics, many of which aren’t even related specifically to Massachusetts. This is a problem that sits well outside of my archives wheelhouse, but at least I feel I can do my due diligence by capturing what local news does trickle through. 

I’ve had a slower go of web archiving than I’d like so far, thanks to several months of parental leave in 2021 and a very packed part-time work schedule. Nevertheless, I’ve been chipping away at our collections and planning for more, with an eye to add more diverse voices than those that make up much of our analog collections. I’m grateful for the encouragement and help I’ve received from Community Webs staff and peers, and want to give a special shout-out to the Archive-It folks who hold office hours to assist us with technical issues! This really is a fantastic program, and I’m so glad my library is part of it.

August Book Talk: Dataraising and Digital Civil Society

Featuring the book How We Give Now by Lucy Bernholz. Published by MIT Press.

What is dataraising and why should nonprofits care? For millennia, humans have given time and money to each other and to causes they care about. A few hundred years ago, we invented nonprofit organizations, and they’ve become a key mechanism in the donation of private resources for public benefit. Now, we can also donate digital data. Organizations such as iNaturalist use donated digital photographs to build communities of nature lovers and inform climate scientists. Other organizations are using donated data to build cultural archives, advocate for fair labor laws, protect consumers, and support medical research.

Join Lucy Bernholz, author of How We Give Now, Scott Loarie of iNaturalist, and Dr. Jasmine McNealy from the University of Florida for a discussion of the promises and perils of donating digital data and the implications for individuals, communities, and civil society.

Purchase your copy of How We Give Now from MIT Press.

Featuring Lucy Bernholz, author of How We Give Now, Scott Loarie of iNaturalist, and Dr. Jasmine McNealy from the University of Florida
August 10, 2022 @ 11am PT
Watch the session recording.