Tag Archives: EmpoweringResearch

Permanent Residents: A Research Guest Post

This post is part of our ongoing series highlighting how our patrons and partners use the Internet Archive to further their own research and programs.

From Patricia Rose, in her own words:

Tour guide Patricia Rose

In 2019, after retiring from an administrative career at the University of Pennsylvania, I signed up to be a tour guide at Philadelphia’s historic Laurel Hill Cemetery (now Laurel Hill East), the first American cemetery to be named a National Historic Landmark.  With more than 75,000 “permanent residents”, there are plenty of opportunities for tours that stop at the graves of fascinating men and women, most from the nineteenth century and the first half of the twentieth, although there are still some new burials.  It was so much fun that I started leading tours at its larger sister cemetery, Laurel Hill West, itself listed on the National Register of Historic Places, with permanent residents mostly from the twentieth century to the present day.

In 2020, COVID made fresh-air cemetery tours quite popular, and I led specialized tours, including one on spiritualism and one on gay and lesbian residents called “Out of the Closet and into the Crypt.”

Sara Yorke Stevenson

Among the stops on some of my tours was the grave of Sara Yorke Stevenson (1847 – 1921).  She was an Egyptologist, a museum curator, co-founder and leader, author, journalist and fighter for women’s suffrage.  She led a full and eventful life, born in Paris, and ending after her successful efforts to bring medical help to France during World War I, raising the equivalent of $36 million in today’s dollars. 

As part of the cemetery’s educational programming, my fellow tour guide Joe Lex (retired Professor of Emergency Medicine) created a wonderful podcast, All Bones Considered, focusing on both Laurel Hill East and West, and I jumped at the chance to present Stevenson on the podcast.

There is a wealth of information on Stevenson.  As a co-founder, curator, and board chair at the University of Pennsylvania Museum of Archaeology and Anthropology (the Penn Museum), Sara appears in numerous histories of the museum, and in volumes on the beginnings of archaeology in this country.  Luckily, in 2006, Sara’s private papers were discovered in the attic of a Philadelphia home that was being cleaned out for sale.  Those papers are now housed in the Special Collections of the La Salle University Library, and in the Archives of the Penn Museum.  I visited both and enjoyed reading letters Sara received, a few materials she wrote, and relevant newspaper clippings she saved.

Title page from Maximilian in Mexico (1899) by Sara Yorke Stevenson

I was still anxious to read Sara’s published writing, but who knew about the wealth of these materials at the Internet Archive?  Her book, Maximilian in Mexico: A Woman’s Reminiscences of the French Intervention, 1862-1867, is there in multiple copies.  So are her monograph, On Certain Symbols Used in the Decoration of Some Potsherds from Daphnae and Naukratis Now in the Museum of the University of Pennsylvania, and various papers Stevenson delivered to the Oriental Club of Philadelphia, such as “The Feather and the Wing in Early Mythology,” and “Early Forms of Religious Symbolism, the Stone Axe and Flying Sun Disc.”

Fortunately, in the Internet Archive I also found relevant issues of the Bulletin of the Pennsylvania Museum from the early days of the twentieth century. (The Pennsylvania Museum became the Philadelphia Museum of Art, and its School of Industrial Art became Philadelphia’s University of the Arts.)  Sara served as a curator at the Pennsylvania Museum, and also as the acting director. In the April 1908 edition of the Bulletin, the following appears:

“It is proposed to establish at the School of Industrial Art of the Pennsylvania Museum…a course in the training of curators for art, archaeological and industrial museums, under the supervision of Mrs. Cornelius Stevenson, ScD.”  

Bulletin of the Pennsylvania Museum, Number 22, April 1908.

Museums were being founded throughout the country, and there was a need for trained curators. The next issue of the Bulletin details the twelve lectures in Stevenson’s course.  She begins with The History of Museums, followed by the Modern Museum.  She covers the Museum Building, with attention to light, heat, water, workshops, repair shops and store rooms.  She addresses the Art of Collecting.  In addition to lecturing, she took her students to every museum in the city, met with directors and curators, critiqued exhibits and identified problems of preservation and conservation.  This was the first course in museum studies and curatorship offered in the United States, and luckily I could read all about it on the Internet Archive.

Finally, on the Archive I found John W. Jordan’s 1911 volume, Colonial Families of Philadelphia, which contains invaluable genealogical information on the families of Stevenson and her husband (and many others).

The Internet Archive’s Sara Yorke Stevenson collection was essential to me as I prepared my podcast segment. Going forward, I will turn to the Archive whenever I do research for my cemetery tours.  Thank you to all who have created this marvelous resource.

Should you wish to learn more about Laurel Hill East and West, please visit https://laurelhillphl.com/.  My segment is part of episode #48, Shattering Some Glass Ceilings, on All Bones Considered, which is available at https://www.podbean.com/pu/pbblog-kty8f-780f6a, on Apple Podcasts, or wherever you get your podcasts.

Patricia Rose 
Philadelphia, PA

The Power of Preservation: How the Internet Archive Empowers Digital Investigations and Research

A part of a series: The Internet Archive as Research Library

Written by Caralee Adams

When gathering evidence for a court case or researching human rights violations, Lili Siri Spira often found that the material she needed was preserved by the Internet Archive.

Spira is the Social Media and Campaign Marketing Manager for TechEquity Collaborative, as well as the co-manager of RatedResilient.com, a platform that promotes psycho-social resilience for digital activists. She has interned at the Center for Justice & Accountability and was an open-source investigator at the Human Rights Center at UC Berkeley during college.

In Spira’s work, the Wayback Machine has played an integral role in providing time-stamped artifacts and metadata.

For example, when researching the Bolivian coup in 2019, she wanted to learn more about the sentiment of indigenous people toward political leadership. Spira used the Wayback Machine to examine how indigenous Bolivian websites had changed since 2009. She discovered that, after initial criticism, some websites seemed to have disappeared.

“The great thing about the Internet Archive is that it really protects the chain of custody,” Spira said. “It’s not only that you look back, but you can even find a website now and capture it in time with the metadata.”
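For readers who want to try this kind of look-back themselves, here is a minimal sketch in Python of asking the Wayback Machine for the capture closest to a given date. It assumes the public Wayback Machine availability endpoint at `https://archive.org/wayback/available` and its JSON response shape; the helper names (`availability_query`, `closest_capture`, `lookup`) and the `example.org` address are illustrative only and are not drawn from Spira's investigations.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the public Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(page_url, timestamp):
    """Build the request URL for the capture closest to `timestamp` (YYYYMMDD)."""
    params = urlencode({"url": page_url, "timestamp": timestamp})
    return AVAILABILITY_ENDPOINT + "?" + params


def closest_capture(response_text):
    """Return (timestamp, capture_url) from a response body, or None if no capture exists."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["timestamp"], closest["url"]


def lookup(page_url, timestamp):
    """Query the live service (needs network access) and return the closest capture."""
    with urlopen(availability_query(page_url, timestamp), timeout=30) as resp:
        return closest_capture(resp.read().decode("utf-8"))
```

Calling `lookup("example.org", "20090101")` and then `lookup("example.org", "20190101")` would return the captures nearest each date, which can then be opened side by side to see what changed. The timestamp in each result records when the page was captured, which is the detail that matters for chain of custody.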

In 2020, the Berkeley Protocol on Digital Open Source Investigations provided global guidelines for using public digital information as evidence in international criminal and human rights investigations. Spira said this allows preserved website data to be used in court proceedings to hold parties accountable.

On other occasions, Spira has investigated companies suspected of unethical practices. Sometimes executives openly admitted to certain behaviors, only to later deny their actions. Companies may attempt to erase past communication, but Spira said she can uncover the previous versions of websites through the Wayback Machine.

“Our knowledge is not being held sacred by many people in this country and around the world,” Spira said. “It’s incredibly important for research work in any field to have access to preserved [digital] information—especially when that research is making certain allegations against powerful entities and corporations.”

We thank Lili and her colleagues for sharing the story of how they use the Internet Archive’s collections in their work.

Unveiling the Hidden Truth: UCSF Industry Documents Library Empowers Research Into Tobacco, Drug and Related Industries

Whether you are a teacher, filmmaker, journalist, scientist or historian, having access to recordings about the tobacco, drug and other industries can be invaluable.

Still frames from a Marlboro commercial compilation.

For more than fifteen years, archivists at the University of California, San Francisco (UCSF) Industry Documents Library (IDL) have curated a collection of more than 5,000 video and audio files documenting the marketing, manufacturing, sales, and scientific research of tobacco, chemical, drug, and food products, as well as materials produced by public health advocates. As of 2023, the collection has received more than 300,000 views.

This wealth of information is available to the public through the UCSF Industry Archives Videos on the Internet Archive. The recordings include commercials, focus groups, internal corporate meetings and communications, depositions of tobacco industry employees, and government hearings.

Most of the files were made public beginning in 1998, following a lawsuit involving 46 states against tobacco manufacturers. In the settlement, the court ordered the companies to restrict advertising and release internal documents. “The industry put out misinformation for years to hold off on regulations,” said Rachel Taketa, IDL processing and reference archivist at UCSF. Having access to these materials provides new insight into marketing strategies that can help the public be on the lookout for future industry activities.

“It provides transparency and accountability,” said Kate Tasker, IDL managing archivist at UCSF. Examples from the collection include marketing campaigns and materials that targeted marginalized groups, in particular women and the African American and LGBTQ+ communities. “We talk to community advocacy organizations that often say it is powerful to show these videos to a group where it lays out clearly what the industry was doing to their community. It empowers people and inspires them to take action.”

Senate hearings regarding S1883, The Tobacco Education Control Act of 1990.

UCSF archivists say the partnership with the Internet Archive provides users with two different access points and expands the audience for the collection beyond academics. The Medical Heritage Library has also added videos and audio files from UCSF into its larger collection on the Internet Archive, spreading the materials’ reach even further.

Next, the UCSF archivists are looking to develop new ways of working with and accessing the collection, using automated transcription to enable data scientists to analyze the recordings in new ways. The IDL is also adding opioid industry recordings to the collection as part of its work on the Opioid Industry Documents Archive, a collaboration with Johns Hopkins University. These new recordings will enable the public to learn more about the circumstances leading to the opioid crisis.

“It’s exciting to be connected to such an innovative organization as the Internet Archive,” Tasker said. “It’s out in front of a lot of big issues that most digital archives are facing. Whenever we’re looking to do something with a new media type, format, or a new way of distributing content to people, archivists and librarians look to what the Internet Archive is doing as a guide.”

CRASH! BARK! BOOM! The USC Sound Effects Library

For a simple overview of the collection being presented, read Craig Smith’s original blog entry over at the Freesound site.

While there are plenty of items at the Internet Archive that have no obvious home elsewhere online, there are also cases where we hold a copy of a widely available set of material but can provide it in a form that is much easier to distribute and preview, including the ability to download the entire original set of files in one fell swoop.

So it is with the USC SOUND EFFECTS LIBRARY, a collection of .WAV files taken from rapidly crumbling magnetic tape and presented for reference, enjoyment and even projects.


The world of sound effects is interesting in two ways:

There’s the interesting way we use recorded sound, cut together from various sources and even spliced from organic and generated sources, to provide the audio soundtrack for visual experiences in a way the audience thinks sounds “natural”.

And there’s the actual process of sound effects, of engineers going into the field or into a studio and generating sound after speculative sound, trying to find just the right combination of noise and speech to create just what they might need in the future.

For as long as there has been performance on radio and in the mediums beyond it, generating sound effects, both live and recorded, has been a fascinating skill shared among many different people, and it is rightly considered an awards-worthy occupation. While not everyone is fascinated by this sort of work, many people are, and there’s a childlike delight in going through a “sound library” of effects and noises, getting ideas of how they might be used later.

As explained in a blog entry written by Craig Smith, a variety of tapes called the “Red” and “Gold” libraries of recorded sound effects were joined by a third set from a sound company called Sunset Editorial, who worked on hundreds of films over the years.


This collection has now been mirrored at the Internet Archive.

In the USC Sound Effects Library are over 1,000 digitized tapes of sound effects, including not just the sounds themselves but the voices of many different engineers bracketing them with explanations, cajoling and call-outs while they’re being made. We hear not just a dog panting, but an engineer telling the dog that they’re doing a good job. Some recordings clearly have a crew sitting around while recordings are being made, and they hush with the sound of professionals knowing they can’t just edit the noise out if they talk over it.

There are machines: Planes, Cars and Weapons. There are explosions, fire and footsteps. There are effects just called SCIFI or MAGIC, where the shared culture of Hollywood’s take on what things “sounded like” makes itself known.

The pleasant stroll of “just playing” the effects in our browser-based player belies the fact that at one time, these were magnetic reels, sliced with razors and joined with tape, used to remix and reconstitute environments of sound for entertainment. The push to digital allows for much more experimentation and mixing without generational loss or the expense of huge amounts of precious time, but in these versions we can hear how much work went into the foundational soundscape of entertainment in the 20th century.


Craig Smith, who made this collection available, goes into great detail in his blog entry about how fragile these tapes had become before being transferred, and how some were lost along the way. Folks unfamiliar with “Sticky Shed Syndrome” and the process of “baking tapes” will be surprised to know how quickly and dramatically tapes can fall apart after a passage of time. With large efforts by a number of people, the amount that was saved is now available at the Archive.

There is extensive metadata in each item, captured as spreadsheets and documents about the assumed sources or credits of the sound. They’re important to bring along with these noises if a patron wants to maintain a local copy.

Speaking of which.

In this collection is a massive compilation of all the data related to the project. It’s located in an item called “Sound Effect Libraries (Red, Gold, Sunset Editorial)”. Patrons whose immediate urge is to grab their own private set of the data to keep “safe” will want to go to this item and either download the three .ZIP files inside directly or click on the TORRENT link to download the 20+ gigabytes of files. Depending on your bandwidth, it will take some time to download, but you can be assured that you have “all” the data from this amazing collection. This, in some ways, is the Internet Archive’s greatest strength: direct access to the original files for others to have, instead of adding a layer of processing and change as the presentation mediums of the day require modification for “ease”.
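To get a rough sense of how long “some time” might be, here is a small back-of-the-envelope sketch in Python. It assumes decimal gigabytes, a link that sustains its nominal speed for the whole transfer, and 20 GB as a lower bound on the total size; the function names are illustrative, and real-world transfers will usually be slower.

```python
def download_seconds(size_gb, link_mbps):
    """Estimate transfer time in seconds for `size_gb` gigabytes at `link_mbps` megabits/s."""
    if link_mbps <= 0:
        raise ValueError("link speed must be positive")
    # 1 gigabyte = 8,000 megabits (decimal units), so seconds = megabits / (megabits per second).
    return size_gb * 8000 / link_mbps


def describe(seconds):
    """Format a duration in seconds as whole hours and minutes."""
    hours, remainder = divmod(int(round(seconds)), 3600)
    return f"{hours} h {remainder // 60:02d} min"


# Rough estimates for the 20+ GB compilation at a few assumed link speeds.
for mbps in (10, 100):
    print(f"{mbps} Mbit/s: about {describe(download_seconds(20, mbps))}")
```

Under these assumptions, the download takes a little under half an hour on a 100 Mbit/s connection and several hours on a 10 Mbit/s one, which is one reason the torrent option, with its ability to pause and resume, may be the more comfortable choice.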

Enjoy the universe of sounds in this collection!

And as one final note – if your immediate thought when you hear the term “sound effects” is to request or wonder about the legendary “Wilhelm”, we’ve got you covered: The recording session is right here.

Internet Archive weighs in on Artificial Intelligence at the Copyright Office

All too often, the formulation of copyright policy in the United States has been dominated by incumbent copyright industries. As Professor Jessica Litman explained in a recent Internet Archive book talk, copyright laws in the 20th century were largely “worked out by the industries that were the beneficiaries of copyright” to favor their economic interests. In these circumstances, Professor Litman has written, the Copyright Office “plays a crucial role in managing the multilateral negotiations and interpreting their results to Congress.” And at various times in history, the Office has had the opportunity to use this role to add balance to the policymaking process.

We at the Internet Archive are always pleased to see the Copyright Office invite a broad range of voices to discussions of copyright policy and to participate in such discussions ourselves. We did just that earlier this month, participating in a session at the United States Copyright Office on Copyright and Artificial Intelligence. This was the first in a series of sessions the Office will be hosting throughout the first half of 2023, as it works through its “initiative to examine the copyright law and policy issues raised by artificial intelligence (AI) technology.”

As we explained at the event, innovative machine learning and artificial intelligence technology is already helping us build our library. For example, our process for digitizing texts–including never-before-digitized government documents–has been significantly improved by the introduction of LSTM technology. And state-of-the-art AI tools have helped us improve our collection of 100-year-old 78 rpm records. Policymakers dazzled by the latest developments in consumer-facing AI should not forget that there are other uses of this general purpose technology–many of them outside the commercial context of traditional copyright industries–which nevertheless serve the purpose of copyright: “to increase and not to impede the harvest of knowledge.”

Traditional copyright policymaking also frequently excludes or overlooks the world of open licensing. But in this new space, many of the tools come from the open source community, and much of the data comes from openly-licensed sources like Wikipedia or Flickr Commons. Industry groups that claim to represent the voice of authors typically do not represent such creators, and their proposed solutions–usually, demands that payment be made to corporate publishers or to collective rights management organizations–often don’t benefit, and are inconsistent with, the thinking of the open world.

Moreover, even aside from openly licensed material, there are vast troves of technically copyrighted but not actively rights-managed content on the open web; these are also used to train AI models. Millions, if not billions, of individuals have contributed to these data sources, and because none of them are required to register their work for copyright to arise, it does not seem possible or sensible to try to identify all of the relevant copyright owners–let alone negotiate with each of them–before development can continue. Recognizing these and a variety of other concerns, the European Union has already codified copyright exceptions which permit the use of copyright-protected material as training data for generative AI models, subject to an opt-out in commercial situations and potential new transparency obligations.

To be sure, there are legitimate concerns over how generative AI could impact creative workers and cause other kinds of harm. But it is important for copyright policymakers to recognize that artificial intelligence technology has the potential to promote the progress of science and the useful arts on a tremendous scale. It is both sensible and lawful as a matter of US copyright law to let the robots read. Let’s make sure that the process described by Professor Litman does not get in the way of building AI tools that work for everyone.