This October, we are publishing The Vanishing Culture Report, a new open access report examining the power and importance of preservation in our digital age.
As more content is created digitally and provided to individuals and memory institutions through temporary licensing deals rather than ownership, materials such as sound recordings, books, television shows, and films are at constant risk of being removed from streaming platforms. This means they are vanishing from our culture without ever being archived or preserved by libraries.
But the threat of vanishing is not exclusive to digital content. As time marches on, analog materials on obsolete formats—VHS tapes, 78rpm recordings, floppy disks—are deteriorating and require urgent attention to ensure their survival. Without proper archiving, digitization, and access, the cultural artifacts stored in these formats are in danger of being lost forever.
By highlighting the importance of ownership and preservation in the digital age, The Vanishing Culture Report aims to inform individuals, institutions, and policymakers about the breadth and scale of cultural loss thus far, and inspire them to take proactive steps in ensuring that our cultural record remains accessible for future generations.
Share Your Story!
As part of The Vanishing Culture Report, we’d like to hear from you. We invite you to share your stories about why preservation is important for the media you use on our site. Whether it’s a website crawl in the Wayback Machine, a rare book that shaped your perspective, a vintage film that captured your imagination, or a collection that you revisit often, we want to know why preserving these items is important to you. Share your story now!
Hundreds of people from all over the world gathered together on January 25 to honor the thousands of movies, plays, books, poems and songs that recently entered the U.S. public domain.
Steamboat Willie, Walt Disney’s 1928 animated film featuring Mickey Mouse, had top billing at the virtual event. Literature now free from restriction for reuse includes Orlando by Virginia Woolf and Tarzan, Lord of the Jungle by Edgar Rice Burroughs. Sound recordings from 1923 (which enter the public domain on a different schedule) also joined, such as “Down Hearted Blues” by Bessie Smith and “Who’s Sorry Now” by Isham Jones Orchestra.
“There’s so much to rediscover and to celebrate,” said Jennifer Jenkins, director of the Center for the Study of the Public Domain at Duke Law School. For example, the release of The Great Gatsby into the public domain in 2021 inspired a creative flurry: new versions of the novel from the perspective of different characters, a prequel telling the backstory of Nick Carraway, a young adult remix, and a song. “From the serious to the creative, to the whimsical to the wacky, these are all the great things we can do…now that [these works] are in the public domain and free to copy, to share, to digitize and to build upon without permission or fee.”
The winning film from the Public Domain Day 2024 Remix Contest was shown as well: “Sick on New Year’s,” by Ty Cummings. Every year since 2021, this contest has invited artists to remix works from the Internet Archive’s collection to showcase new and creative uses of public domain materials. Fifty films were submitted to this year’s competition, according to Amir Esfahania, artist in residence at the Archive. Learn more about the finalists or watch all the submissions in our recent blog post.
Advocacy
“Celebrating the public domain is not just about vintage references and period-appropriate clothing. It’s about understanding history to inform the present day,” said Lila Bailey, Internet Archive senior policy counsel and co-host of the virtual festivities. “We think there should be time set aside every year to celebrate the immense riches that free and open culture provides to everyone.”
While federal holiday recognition for the public domain (like MLK Day or Presidents’ Day) is unlikely, participants discussed an advocacy campaign to establish a commemorative Public Domain Day (more along the lines of National Data Privacy Day or National Whistleblowers Day).
“It only requires a simple resolution in the Senate with high chances of recognition,” said Amanda Levendowski, director of Georgetown Law School’s Intellectual Property and Information Policy Clinic. “Prospects for passage are way better than possible. About 80 percent of proposals are passed — and maybe next year, Public Domain Day will be among them.”
Experts said a successful drive for the designation will require a collaborative effort. Jennie-Rose Halperin, executive director of Library Futures, announced that Library Futures will host a kickoff event on February 29 in New York City.
AI and the Public Domain
The online program also featured a panel discussion on generative artificial intelligence, copyright and artist expression. Experts weighed in on what the copyright status of generative AI outputs should be.
AI tools can now turn text or simple descriptions into images that are genuinely new and often look exactly like the kind of work that would be copyrighted if a human had made it, explained Matthew Sag, professor of law, artificial intelligence, machine learning, and data science at Emory University.
“The copyright office is quite clear that to get copyright, you have to have human authorship. So something created entirely by an unsupervised machine is not eligible for copyright,” Sag said, noting that the courts have recently agreed. “The interesting question is what about when humans are using AI as a tool and directing the output. This is where the controversy really is.”
On the panel, two artists, Heather Timm and Maxximillian, shared how they both leverage AI in the creative process.
Timm said she started using generative AI in 2021 and thinks the Copyright Office should extend protection to works that incorporate its output. She has trained AI models on her own physical work and then created something new in collaboration with the machine, and she has also used it to conceptualize how to blend different pieces of work into a collage or sculpture.
“I use it almost as a notebook,” Timm said. “If I have a concept or an idea about something on the go, I can immediately prompt that and have it as a placeholder to explore it later.”
Maxximillian, a filmmaker and musician, said she is passionate about AI, which has saved her time creating animated characters and refining her text. “As a professional artist, I rely on copyright to keep viable the works that I produce for clients legally,” said Maxximillian. “It’s important to understand that copyright protection enables the creator to be a steward of that work. The question to consider: Who benefits by denying copyright on AI? I think nobody benefits.”
Juliana Castro Varón, design director and founder of the open access publisher Cita Press, also addressed the issue. “I believe that AI may pose economic, power, and labor challenges, but I feel very confident that creativity will survive technology,” she said. All books Cita produces are in the public domain for everyone to download. “We are not at all against people using AI for their work, but we continue to hire humans…elevating the work of people is core to our mission.”
***
The event was co-hosted by Internet Archive and Library Futures with support from Creative Commons, Authors Alliance, Public Knowledge, SPARC and Duke Law’s Center for the Study of the Public Domain.
After sifting through a sea of talent and creativity, we are thrilled to present the cinematic achievements of three winners and two honorable mentions in our Public Domain Day 2024 Remix Contest. These winning entries not only captivated our imaginations, but also showcased the immense power of remixing, reimagining, and breathing new life into public domain works.
View the winning entries and honorable mentions below. Rick Prelinger, noted film archivist, helped judge the competition and explains why each film was selected for recognition.
First Place: “Sick on New Year’s” by Ty Cummings
Found-footage filmmaking is all about taking material that might have almost-sacred status and, well, bringing it back down to earth. We find this film worthy of our first prize because of its irreverent humor and skilled editing, its playful predictions of the future, and because it points to the limitless opportunities that a constantly refreshed public domain offers makers in all media.
Second Place: “Keaton and Kaufman: The Cameramen” by Max Teeth
This film brings together two characters who will be familiar to people who love films, characters that lived and worked very far away from one another and did deeply different work, but might perhaps have more in common with one another than we might think. We see it as a poetic piece, a loving tribute to some of the people who put the motion in motion pictures.
Third Place: “Just Like a Hollywood Star” by Timothy Johnson
Our third-prize winner is a rich montage of sound and picture, focusing on images that model beauty, fitness, posture, proper behavior, and the laws of physics. We like this film’s uninhibited reach and the way it draws on wildly disparate material, often pretty predictable, to produce an unpredictable result.
Honorable Mention, Historical Perspective: “A Member of the Family” by Lizzy Tolentino
Combining government-produced films, family home movies and an unusual sponsored film by a world-famous company, this filmmaker makes a chilling statement about the gap between the promise of our society and the reality of 20th-century history. The public domain is a record of both proud achievements and disturbing histories, and we feel this film exemplified the potential of the public domain to reveal histories that some might prefer to be kept silent.
Honorable Mention, Quirkiest Film: “Domain” by Cullen J. Sanchez
Sometimes you just have to recognize the unusual. But this unusual film makes a critical point about the public domain — that WE are the public domain, and the public domain is us. Take it away! “It’s us. It’s all of us.”
This year we are welcoming many works from 1928 into the U.S. public domain (books, movies, images, etc.), as well as recorded sound from 1923.
Some of the big events from 1928 include the sale of the first machine-sliced and wrapped loaf of bread, the fatal Okeechobee hurricane, the failure of the St. Francis Dam in Los Angeles, the discovery of a moldy petri dish that would lead to the creation of penicillin, Amelia Earhart flying across the Atlantic, and a certain mouse making his public debut.
Movies
Everybody’s talking about Mickey. On November 18, 1928, Steamboat Willie was published: the third Mickey Mouse film by Walt Disney and the first one published with sound. The prior two Mickey Mouse films, including Plane Crazy, had not been picked up for distribution, so this was the public’s first introduction to the mouse. Steamboat Willie may have been named after another popular movie that came out in 1928, Buster Keaton’s Steamboat Bill, Jr., or perhaps after the Vaudeville song “Steamboat Bill” (popularized in 1910), which was included in the soundtrack (along with the 19th-century song “Turkey in the Straw”).
The Jazz Age was really swinging, and 1923 saw the first recordings by King Oliver’s Jazz Band, including early work from Louis Armstrong on Dipper Mouth Blues. The first recorded example of jazz band boogie-woogie also came out that year, The Fives by Tampa Blue Jazz Band. And dancing the Charleston became a craze in 1923, thanks to the song “Charleston” from the 1923 musical “Runnin’ Wild.”
While the entrance to Tutankhamun’s tomb was found in 1922, it wasn’t until February of 1923 that the tomb was unsealed. Of course, the event was memorialized in song, including Old King Tut by Billy Jones and Ernest Hare, and Tut-Ankh-Amen (In the Valley of the Kings) by S. S. Leviathan Orchestra.
Some popular songs from 1923 that have joined the public domain include:
Whether you are a teacher, filmmaker, journalist, scientist or historian, having access to recordings about the tobacco, drug and other industries can be invaluable.
For more than fifteen years, archivists at the University of California, San Francisco (UCSF) Industry Documents Library (IDL) have curated a collection of more than 5,000 video and audio files documenting the marketing, manufacturing, sales, and scientific research of tobacco, chemical, drug, and food products, as well as materials produced by public health advocates. As of 2023, the collection has received more than 300,000 views.
This wealth of information is available to the public through the UCSF Industry Archives Videos on the Internet Archive. The recordings include commercials, focus groups, internal corporate meetings and communications, depositions of tobacco industry employees, and government hearings.
Most of the files were made public beginning in 1998, following a lawsuit involving 46 states against tobacco manufacturers. In the settlement, the court ordered the companies to restrict advertising and release internal documents. “The industry put out misinformation for years to hold off on regulations,” said Rachel Taketa, IDL processing and reference archivist at UCSF. Having access to these materials provides new insight into marketing strategies that can help the public be on the lookout for future industry activities.
“It provides transparency and accountability,” said Kate Tasker, IDL managing archivist at UCSF. Examples from the collection are marketing campaigns and materials that targeted marginalized groups, in particular women and the African American and LGBTQ+ communities. “We talk to community advocacy organizations that often say it is powerful to show these videos to a group where it lays out clearly what the industry was doing to their community. It empowers people and inspires them to take action.”
UCSF archivists say the partnership with the Internet Archive provides users with two different access points and expands the audience for the collection beyond academics. The Medical Heritage Library has also added videos and audio files from UCSF into its larger collection on the Internet Archive, spreading the materials’ reach even further.
Next, the UCSF archivists are looking to develop new ways of working with and accessing the collection, using automated transcription to enable data scientists to analyze the recordings in new ways. The IDL is also adding opioid industry recordings to the collection as part of its work on the Opioid Industry Documents Archive, a collaboration with Johns Hopkins University. These new recordings will enable the public to learn more about the circumstances leading to the opioid crisis.
“It’s exciting to be connected to such an innovative organization as the Internet Archive,” Tasker said. “It’s out in front of a lot of big issues that most digital archives are facing. Whenever we’re looking to do something with a new media type, format, or a new way of distributing content to people, archivists and librarians look to what the Internet Archive is doing as a guide.”
In celebration of National Library Week, we’d like to introduce you to some of the professional librarians who work at the Internet Archive and in projects closely associated with our programs. Over the next two weeks, you’ll hear from librarians and other information professionals who are using their education and training in library science and related fields to support the Internet Archive’s patrons.
What draws librarians to work at the Internet Archive? From patron services to collection management to web archiving, the answers are as varied as the departments in which these professionals work. But a common theme emerges from the profiles—that of professionals wanting to use their skills and knowledge in support of the Internet Archive’s mission: “Universal Access to All Knowledge.”
We hope that over these next two weeks you’ll learn something about the librarians working behind the scenes at the Internet Archive, and you’ll come to appreciate the training and dedication that influence their daily work. We’re pleased to help you “Meet the Librarians” during this National Library Week and beyond:
Jessamyn West, accessibility – Vermont Mutual Aid Society
Jason Scott, free-range archivist, reporting in as 2017 draws to a close.
As part of our end-of-year fundraising drive, I thought it might be fun to tweet highlighted parts of the vast stacks of content that the Internet Archive makes available for free to millions. A lot of folks know about our Wayback Machine and its 20+ years of website history, but there are petabytes of media and works available to see throughout the site. I called it “30 Days of Stuff”, and for the last 30 days I’ve been pointing out great items at the Archive, once a day.
You won’t have to swim upstream through my tweets; here on the last day, I’ve compiled the highlighted items in this entry. Enjoy these jewels in the Archive’s collection, a small sample of the wide range of items we provide.
Books and Texts
The Latch Key of my Bookhouse was one of the first books scanned by the Internet Archive in its book scanner tests. It’s a 1921 directory of children’s literature, filled with really nice illustrations that came out great.
Also in the DTIC collection is The Battalion Commander’s Handbook 1980, which besides the crazy front page of stamps, approvals and sign-offs, is basically a manager’s handbook written from the point of view of the US Army.
There are hundreds of tractor manuals at the Archive. Hundreds! Of all types, languages (a lot of them Russian) and levels of detail. Tractors are one of those tools that can last generations, and keeping them maintained in the field can make a huge difference to someone’s livelihood.
A lovely 1904 plum catalog, The Maynard Plum Catalogue, was scanned in with one of our partner organizations. It’s a breathless and inspiring declaration of the future wonders of the plums that Luther Burbank, a wizard of plum-growing, was bringing to the world.
Xerox Corporation released “A Metamorphosis of Creative Copying” in 1964, which seems to function as both promotion for Xerox and a weird gift to give to your kids to color in.
In 2014, a short zine called The Tao of Bitcoin was released, telling people the dream of $10,000 bitcoin would be real.
The 1888 chapbook Goody Two-Shoes has lovely illustrations, and a fine short story.
Inside that directory was an ad for a school of whistling that said it taught using the methods of Agnes Woodward, and a quick scan of the Archive’s stacks showed that we had an entire copy of her book Whistling as an Art!
The medical treatise Sleep and Its Derangements, from 1869, is William A. Hammond, MD’s overview of sleep, and what can go wrong. Scanned from the Francis A. Countway Library of Medicine, it’s one of many thousands of books we’ve scanned with partners.
Let Hartman Feather Your Nest could be described as “A furniture catalog” in the same way the Sistine Chapel could be described as “a place of worship”. The catalog is a thundering, fist-pounding declaration of the superiority of the Hartman enterprise and the quality and breadth of furniture and service that will arrive at your door and be backed up to the far reaches of time.
Magazines
Photoplay considered itself the magazine for the motion picture industry in the first part of the 20th century, and this multi-volume compilation of photos, articles and advertisements is a truly lovely overview.
There are over 140 issues of the classic Maximum RockNRoll zine, truly the king of music zines for a very long time. On its newsprint pages are howls and screeches of all manner of punk, rock and the needs of musicians.
A magazine created by the Walt Disney Company to trumpet various parts of Disneyland and its attractions was called Vacationland, and this Fall 1965 issue covers all sorts of stuff about the park’s first decade.
Movies
Rescued from a warehouse years ago, a collection of Hollywood movie “B-Roll” (unused secondary scenes, often filmed by a different crew) has been digitized. My personal favorite is Western Film Scenes, circa-1950s footage of a Western town, all of it utterly fake but feeling weirdly real, meant to be used in a western. Don’t miss everyone standing around looking right at you and looking like they agree quite energetically with you!
No compilation could be complete without the legendary Duck and Cover, a cartoon/PSA that explained the simple ways to avoid injury in a nuclear blast. Just lie down! It’ll be fine. Please note: This Probably Won’t Work. But the song is very catchy.
The very weird Electric Film Format Acid Test from 1990 has a semi-interested model holding up a color bar plate in a wide, wide variety of film and video formats. Filmed just a few blocks away from the Internet Archive’s current headquarters.
I snuck in a 1992 interview with the Archive’s founder, Brewster Kahle, from when he was 33 and working at WAIS, a company or two before the Archive. He is asked about his thoughts on information and the gathering of data, and it’s quite interesting to hear the consistency of his thinking.
The Office of War Information worked with Disney to create “Dental Health”, a film teaching troops proper dental care. It’s a combination of straightforward animation and industrial filmmaking worth enjoying.
Audio
We have a collection of hours of the radio show The Shadow from 1938-1939, starring Orson Welles at 23, at the height of his performance powers, playing the dual main role.
For Christmas Eve, we pointed to “Christmas Chopsticks”, a 1953 78rpm record of “Twas the Night Before Christmas” performed to the tune of the classic piano piece “Chopsticks”; one of tens of thousands of 78rpm records the Archive has been adding this year.
On Christmas, a user of the Archive uploaded two obscure albums he’d purchased on eBay: remnants of the S. S. Kresge Company (which became K-Mart) that were played over the store PA system for shoppers. He got his hands on Albums #261 and #294.
Earlier in the month before the user uploaded those Christmas albums, I linked to a different holiday collection of K-Mart items, a 1974 Reel-to-Reel that started with a K-Mart jingle and went full holiday from there.
Before he was a (retired) talk show host, and before he was a stand-up comedian, David Letterman worked and trained in radio. Happily, we have recordings of Dave Letterman, DJ, from when he was 22, at Ball State University.
Ron “Boogiemonster” Gerber has been hosting his weekly pop music recycling radio show, “Crap from the Past”, for over 25 years, and he’s been uploading and cataloging his show to the Archive for well over 10 of those years, including all the way back to the beginning of his show. The full Crap From The Past archive is up and is hundreds of hours of fun.
The truly weird “Conquer the Video Craze” is a 1982 record album with straightforward descriptions of how to beat games like Centipede, Defender, Stargate, Dig Dug, and more. This album has been sampled by multiple DJs to bring that extra spice to a track.
Over 3,000 shows at the DNA Lounge are at the Archive, including “Bootie: Gamer Night”, which combines mash-up tracks and video games. Bootie has been playing at DNA Lounge for years, and it puts the audio from one song with the singing from another, and… it’s quite addicting, like games. This night was held for the nearby Game Developers Conference, which took place the same week.
Software
In 2011, as part of a “retrocomputing” competition, we saw the release of “Paku-Paku”, a Pac-Man clone built specifically for an obscure early PC-compatible graphics mode that was very colorful and very low-resolution (160×100). You can play the game in your browser.
Psion Chess is a game for the Macintosh that can play both you and itself with pretty high levels of skill and really sharp and crisp black and white graphics. It makes a really great screensaver in self-playing mode.
People often overuse a phrase like “Barely scratched the surface”, but I assure you there are millions of amazing items in the archive, and it’s been a pleasure to bring some to light. While the 30 Days of Stuff was a fun way to stretch out a month of fundraising with stuff to see every day, we’re here 24/7 to bring you all these items, and welcome you finding jewels, gems and clunkers throughout our hard drives whenever you want.
Every month, we look over the total download counts for all public items at archive.org, and we sum the item counts up into their collections. At the end of 2014, we found various source reliability issues, as well as overcounting for “top collections” and many other issues.
archive.org public items tracked over time
To address these problems, we:
Built a new system that uses our database (DB) for item download counts, instead of our less reliable (and more “drift”-prone) SOLR search engine (SE).
Changed monthly saved data from JSON and PHP serialized flatfiles to new DB table — much easier to use now!
Fixed overcounting issues for collections: texts, audio, etree, movies
Fixed various overcounting issues caused by not de-duplicating <collection> and <contributor> tags (more below)
Fixed character encoding issues on <contributor> tags
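Moving monthly data from flatfiles into a DB table can be pictured with a small sketch. This is not the Archive’s actual schema: the table name monthly_counts, its columns, and the helpers save_month and history are hypothetical, and SQLite stands in for whatever database is really used. With one row per month and collection, a collection’s history becomes a single ordered query instead of parsing a pile of JSON or PHP-serialized files.

```python
import sqlite3

# Hypothetical schema: one row per (month, collection) holding the summed
# absolute download count. Names are illustrative, not the Archive's.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE monthly_counts (
        month      TEXT    NOT NULL,  -- e.g. '2015-01'
        collection TEXT    NOT NULL,
        downloads  INTEGER NOT NULL,
        PRIMARY KEY (month, collection)
    )
    """
)


def save_month(conn, month, totals):
    """Store one month's per-collection totals (replacing a flatfile dump)."""
    conn.executemany(
        "INSERT OR REPLACE INTO monthly_counts (month, collection, downloads) "
        "VALUES (?, ?, ?)",
        [(month, coll, n) for coll, n in totals.items()],
    )
    conn.commit()


def history(conn, collection):
    """Return [(month, downloads), ...] for one collection, oldest first."""
    rows = conn.execute(
        "SELECT month, downloads FROM monthly_counts "
        "WHERE collection = ? ORDER BY month",
        (collection,),
    )
    return rows.fetchall()


# Made-up numbers, purely for illustration.
save_month(conn, "2014-12", {"texts": 1000, "etree": 500})
save_month(conn, "2015-01", {"texts": 1200, "etree": 650})
print(history(conn, "texts"))  # [('2014-12', 1000), ('2015-01', 1200)]
```

The same ordered query is the kind of thing a per-collection or per-contributor history table could be built from.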
Bonus points!
We now track *all collections*. Previously, we only tracked items tagged:
<mediatype> texts
<mediatype> etree
<mediatype> audio
<mediatype> movies
For items whose <contributor> tags we track (texts items), we now have a “Contributor page” that shows a table of historical data.
Graphs are now “responsive” (scale in width based on browser/mobile width)
The Overcount Issue for top collection/mediatypes
In the graph below, mediatypes and collections are shown horizontally, with a sample “collection hierarchy” today.
For each collection/mediatype, we show 1 example item, A B C and D, with a downloads/streams/views count next to it parenthetically. So these are four items, spanning four collections, that happen to be in a collection hierarchy (a single item can belong to multiple collections at archive.org).
The Old Way had a critical flaw — it summed all sub-collection counts — when really it should have just summed all *direct child* sub-collection counts (or gone with our New Way instead)
So we now treat <mediatype> tags like <collection> tags in terms of counting, and we de-duplicate all <collection> tags, so that items carrying minor tagging errors (such as a repeated <collection> tag) cannot cause another kind of overcounting.
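To make the overcount concrete, here is a minimal sketch over assumed data: a four-level hierarchy (the names c1, c2, c3 and all counts are hypothetical) with one example item per level. total_old mimics the Old Way by adding every descendant’s total, total_fixed adds only direct children, and totals_by_tag sketches tag-based counting with de-duplicated tags. It illustrates the arithmetic only; it is not the Archive’s code.

```python
from collections import defaultdict

# Hypothetical hierarchy: movies > c1 > c2 > c3 (names are illustrative).
children = {"movies": ["c1"], "c1": ["c2"], "c2": ["c3"], "c3": []}
# Download count of the one example item filed directly in each collection.
direct = {"movies": 1, "c1": 10, "c2": 100, "c3": 1000}


def descendants(coll):
    """Every collection below coll, at any depth."""
    out = []
    for child in children[coll]:
        out.append(child)
        out.extend(descendants(child))
    return out


def total_old(coll):
    # Old Way: adds the *total* of every descendant, but each of those totals
    # already includes its own descendants, so deep items are counted repeatedly.
    return direct[coll] + sum(total_old(d) for d in descendants(coll))


def total_fixed(coll):
    # Corrected rollup: add only the totals of *direct* child collections.
    return direct[coll] + sum(total_fixed(c) for c in children[coll])


def totals_by_tag(items):
    """Tag-based sketch: items is a list of (count, tags), where tags holds the
    item's <collection> and <mediatype> values. Each item adds its count once
    per distinct tag, so a repeated tag cannot inflate a total."""
    totals = defaultdict(int)
    for count, tags in items:
        for tag in set(tags):  # de-duplicate tags before counting
            totals[tag] += count
    return dict(totals)


print(total_old("movies"))    # 4211
print(total_fixed("movies"))  # 1111
# movies: 11, c1: 10 (dict key order may vary)
print(totals_by_tag([(1, ["movies"]), (10, ["movies", "c1", "c1"])]))
```

With these made-up numbers the Old Way reports 4,211 for movies instead of 1,111: the deeper an item sits, the more often it is re-added (here the c3 item is counted four times).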
… and one more update from Feb/1:
We graph the “difference” between the absolute download counts of the current month and the prior month, for each month we have data for. This gives us graphs that show downloads/month over time. However, values can easily go *negative* in various scenarios (which is *wickedly* confusing to our poor users!)
Here’s that situation:
A collection has a really *hot* item one month, racking up downloads in that collection. The next month, a DMCA takedown or some other action removes the item from public availability (and thus from future counts). When the counts are summed over the collection’s public items again in the next month’s run, the collection’s total can plummet. So that collection would show a negative (net) change in downloads for this next month!
Here’s our fix:
Use the current month’s collection “item membership” list for both the current month *and* the prior month. Sum the counts for all of those items in each month, and graph the difference between the two sums. In just about every situation that remains, graphed monthly download counts will be nonnegative (either positive or zero), so the running totals behind them only ever stay flat or increase.
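The fix can be checked with a toy example. The item IDs, counts, and membership sets below are made up; the sketch only compares summing each month over its own membership (the original approach, which goes negative after a takedown) with summing both months over the current month’s membership.

```python
# Hypothetical per-item absolute (lifetime) download counts, by month.
counts = {
    "2015-01": {"hot": 900, "a": 50, "b": 40},
    "2015-02": {"a": 55, "b": 46},  # "hot" was taken down
}
# Hypothetical collection membership, by month.
members = {
    "2015-01": {"hot", "a", "b"},
    "2015-02": {"a", "b"},
}


def diff_old(prev, cur):
    """Original approach: each month is summed over its own public items."""
    before = sum(counts[prev].get(i, 0) for i in members[prev])
    after = sum(counts[cur].get(i, 0) for i in members[cur])
    return after - before


def diff_new(prev, cur):
    """Fix: sum both months over the *current* month's membership only."""
    items = members[cur]
    before = sum(counts[prev].get(i, 0) for i in items)
    after = sum(counts[cur].get(i, 0) for i in items)
    return after - before


print(diff_old("2015-01", "2015-02"))  # -889
print(diff_new("2015-01", "2015-02"))  # 11
```

Here the removal of the hypothetical item "hot" drives the old difference to -889, while the new difference stays at 11, the downloads actually earned by the items still in the collection.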