In 1996, the World Wide Web was starting to catch on. Politicians were just beginning to explore how to use online communication to reach voters. And in a house in San Francisco, the fledgling Internet Archive was starting to archive pieces of the web before they disappeared.
That same year, a letter arrived from Washington, D.C., with the Smithsonian Institution’s iconic sunburst logo at the top. The Smithsonian had agreed to partner with the Internet Archive to preserve the digital record of the 1996 U.S. presidential election.
“It was a major milestone for us,” recalls Internet Archive founder Brewster Kahle. “The big Smithsonian was working with this new little Internet Archive nonprofit library.”
Together, the two institutions launched Web Archive 96, one of the first web collections the Internet Archive ever created. It captured the early campaign webpages of candidates Bill Clinton, Bob Dole, and Ross Perot — online brochures filled with policy positions, photos, and promises — along with news coverage of the race. It was a pioneering effort to preserve the political life of a nation as it moved onto the web. The collection is now a foundational part of our cultural history on the web, and is available for public access via the Wayback Machine.
Nearly thirty years later, that collaboration still stands out as visionary: two institutions, one old and one new, working together to recognize the internet as part of our shared cultural record.
Indeed, Smithsonian curators Larry Bird and Harry Rubenstein traveled to New Hampshire and Iowa every four years to collect buttons, signs and physical memorabilia from the campaign offices. Just as television changed the political landscape in the 1960s, they recognized the potential influence of the web in 1996. When they heard Kahle was archiving campaigns, Bird said they were “ecstatic” to collaborate.
“We were all over it,” said Bird, now a curator emeritus from the Smithsonian division of political history. “We were super glad that we could take this non-dimensional thing and for it to have a presence on the floor – even in this most rudimentary, stripped down way – limited to the candidates’ websites. It was an acknowledgement of where things were heading.”
Jeff Ubois, who forged the partnership in 1996, recalled “Why would anyone care about the ephemera of the web?” as the prevailing attitude at the time. “The Smithsonian helped change some of that.”
Once the Internet Archive partnered with the Smithsonian, “it wasn’t possible to dismiss web archiving as irrelevant, impossible, useless,” Ubois said.
People contact the Smithsonian often, Bird said, and the Internet Archive outreach was unexpected, but welcome. “We were constantly looking at the way things were shifting in politics, which always takes what’s popular and successful in the real world and bends it into its own political world or reality,” he said. “And this just seemed to be yet, the latest iteration of that as a cultural phenomenon….To have [the Internet Archive] assemble it wasn’t anything that any of us could have done at the time.”
‘Collection of Record for the Web’
Bird said the Internet Archive is a “remarkable resource” that he and other researchers have relied on for years.
“The museum is the collection of record for material things, objects, and dimensional things. And the Internet Archive is the collection of record for the web and all that implies,” Bird said. “There’s hardly anything that it doesn’t touch anymore. It didn’t start out that way, but it’s become that. It’s the collection of record that people use and cite and compare. It’s a tremendous historical resource.”
Preserving the evolution of political campaigns is important to anyone trying to do research or understand political trends over time, said David Almacy, president and chief executive officer of Far Post Media, a digital public affairs firm in Virginia, and former White House E-Communications Director for President George W. Bush. In 1996, campaign websites were primarily online brochures – just text and photos without much customization. Today, websites are more advanced, with video and integrated interactive elements that can be tailored to the user.
“The value is to provide an archive and a record of what was said, and basically a snapshot in time politically,” Almacy said. “It actually becomes fascinating to go back and look at the issues that were facing the country that would be deemed priorities in 1996 and how that compares to today. I assume a lot are the same – the economy, education, immigration, national security, global peace – but they’ve evolved in different ways. Many are very important to Americans, just as they were back then.”
Celebrating 1 trillion web pages archived, the Internet Archive is proud to honor the visionary who made it all possible. As announced in The New Yorker, this year’s Internet Archive Hero Award will be presented to Sir Tim Berners-Lee, inventor of the World Wide Web, whose groundbreaking work opened the door to a connected world and laid the foundation for our shared digital history.
The Internet Archive Hero Award is an annual award that recognizes those who have exhibited leadership in making information available for digital learners all over the world. Previous recipients have included the island nation of Aruba, public information advocate Carl Malamud, copyright expert Michelle Wu, and the Grateful Dead.
Sir Tim’s invention transformed how humanity shares knowledge, and his ongoing advocacy for an open and accessible web that empowers individuals continues to inspire us. We’re thrilled to recognize his enduring contributions as we mark this historic achievement for the web.
Sir Tim will receive the Hero Award at an event in San Francisco on October 9, and will be celebrated from afar during the Internet Archive’s annual celebration on October 22, “The Web We’ve Built.”
As the Internet Archive celebrates 1 trillion web pages archived, it’s worth revisiting what founder Brewster Kahle imagined back in 1996—when the web was still young and the Wayback Machine was years away from its public debut.
Nearly three decades ago, Internet Archive founder Brewster Kahle sketched out a bold vision for preserving the web before it could slip away—warning that without action, the digital age might echo the cultural losses of Alexandria’s library or early film reels.
Today, in 2025, many of the ideas he laid out in “Preserving the Internet,” published in the March 1997 issue of Scientific American, have come to life: a global digital library, tools that fight link rot, and researchers mining web history to understand our present. Other challenges he foresaw—like obsolete formats, legal battles, and questions of digital memory—remain pressing, but his optimism still holds: by building archives together, we can create a more reliable, enduring memory for the internet age.
Preserving the Internet
Brewster Kahle
Internet Archive
11/4/96
Bold efforts to record the entire Internet are expected to lead to new services.
Submitted to Scientific American for March 1997 Issue
The early manuscripts at the Library of Alexandria were burned, much of early printing was not saved, and many early films were recycled for their silver content. While the Internet’s World Wide Web is unprecedented in spreading the popular voice of millions that would never have been published before, no one recorded these documents and images from 1 year ago. The history of early materials of each medium is one of loss and eventual partial reconstruction through fragments. A group of entrepreneurs and engineers have determined to not let this happen to the early Internet.
Even though the documents on the Internet are the easy documents to collect and archive, the average lifetime of a document is 75 days and then it is gone. While the changing nature of the Internet brings a freshness and vitality, it also creates problems for historians and users alike. A visiting professor at MIT, Carl Malamud, wanted to write a book citing some documents that were only available on the Internet’s World Wide Web system, but was concerned that future readers would get a familiar error message “404 Document not found” by the time the book was published. He asked if the Internet was “too unreliable” for scholarly citation.
Where libraries serve this role for books and periodicals that are no longer sold or easily accessible, no such equivalent yet exists for digital information. With the rise of the importance of digital information to the running of our society and culture, accompanied by the drop in costs for digital storage and access, these new digital libraries will soon take shape.
The Internet Archive is one such new organization, collecting the public materials on the Internet to construct a digital library. The first step is to preserve the contents of this new medium. This collection will include all publicly accessible World Wide Web pages, the Gopher hierarchy, the Netnews bulletin board system, and downloadable software.
If the example of paper libraries is a guide, this new resource will offer insights into human endeavor and lead to the creation of new services. Never before has this rich a cultural artifact been so easily available for research. Where historians have scattered club newsletters and fliers, physical diaries and letters, from past epochs, the World Wide Web offers a substantial collection that is easy to gather, store, and sift through when compared to its paper antecedents. Furthermore, as the Internet becomes a serious publishing system, then these archives and similar ones will also be available to serve documents that are no longer “in print”.
Apart from historical and scholarly research uses, these digital archives might be able to help with some common infrastructure complaints:
– Internet seems unreliable: “Document not found”
– Information lacks context: “Where am I? Can I trust this information?”
– Navigation: “Where should I go next?”
When working with books, libraries help with some of these issues, with “the stacks” of books, links to other libraries and librarians to help patrons.
Preservation of our Digital History
Where we can read the 400 year-old books printed by Gutenberg, it is often difficult to read a 15 year-old computer disk. The Commission for Preservation and Access in Washington DC has been researching the thorny problems faced trying to ensure the usability of the digital data over a period of decades. Where the Internet Archive will move the data to new media and new operating systems every 10 years, this only addresses part of the problem of preservation.
Using the saved files in the future may require conversion to new file formats. Text, images, audio, and video are undergoing changes at different rates. Since the World Wide Web currently has most of its textual and image content in only a few formats, we hope that it will be worth translating in the future, whereas we expect that the short-lived or seldom used formats will not be worth the future investment. Saving the software to read discarded formats often poses problems of preserving or simulating the machines that they ran on.
The physical security of the data must also be considered. Natural and political forces can destroy the data collected. Political ideologies change over time, making what was once legal illegal. We are looking for partners in other geographic and national locations to provide a robust archive system over time. To give some level of security from commercial forces that might want exclusive access to this archive, the data is donated to a special non-profit trust for long-term caretaking. This non-profit organization is endowed with enough money to perform the necessary maintenance on the storage media over the years.
Packaging enough meta-data (information about the information) is necessary to inform future users. Since we do not know what future researchers will be interested in, we are documenting the methods of collection and attempting to be complete in those collections. As researchers start to use these data, the methods and data recorded can be refined.
Technical Issues of Gathering Data
Building the Internet Archive involves gathering, storing, and serving the terabytes of information that at some point were publicly accessible on the Internet.
Gathering these distributed files requires computers to constantly probe the servers looking for new or updated files. The Internet has several different subsystems to make information available such as the World Wide Web (WWW), File Transfer Protocol (FTP), Gopher, and Netnews. New systems for three-dimensional environments, chat facilities, and distributed software require new efforts to gather these files. Each of these systems requires special programs to probe and download appropriate files. Estimating the current size, turnover, and growth of the public Internet has proven tricky because of the dynamic nature of the systems being probed.
The World Wide Web is vast, growing rapidly, and filled with transient information. Estimated at 50 million pages with the average page online for only 75 days, the turnover is considerable. Furthermore, the number of pages is reported to be doubling every year. Using the average web page size of 30 kilobytes (including graphics) brings the current size of the Web to 1.5 terabytes (or 1.5 million megabytes).
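The size estimate above is plain multiplication, and the same figures give a rough sense of turnover. Here is a minimal sketch using only the numbers quoted in this paragraph and decimal units (1 terabyte = 1,000,000 megabytes). The variable names are illustrative, and the steady-turnover calculation is an added assumption rather than a figure from the article.

```python
# Figures quoted in the paragraph above (1996 estimates).
PAGES = 50_000_000          # estimated number of web pages
AVG_PAGE_KB = 30            # average page size in kilobytes, including graphics
AVG_LIFETIME_DAYS = 75      # average time a page stays online

# Decimal units: 1 MB = 1,000 KB, 1 TB = 1,000,000 MB.
total_kb = PAGES * AVG_PAGE_KB
total_mb = total_kb / 1_000
total_tb = total_mb / 1_000_000

# Rough turnover, assuming pages vanish at a steady rate (an assumption
# made here for illustration, not a figure from the article).
pages_lost_per_day = PAGES / AVG_LIFETIME_DAYS

print(f"{total_mb:,.0f} MB = {total_tb} TB")
print(f"about {pages_lost_per_day:,.0f} pages replaced per day")
```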
To gather the World Wide Web requires computers specifically programmed to “crawl” the net by downloading a web page, then finding the links to graphics and other pages on it, and then downloading those and continuing the process. This is the technique that the search engines, such as Altavista, use to create their indices to the World Wide Web. The Internet Archive currently holds 600GB of information of all types. In 1997 we will have collected a snapshot of the documents and images.
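The crawl loop described above (download a page, find its links, download those, repeat) can be sketched in a few lines. This is an illustration only, not the Internet Archive's or any search engine's actual crawler. The `fetch` callable and the in-memory `FAKE_WEB` are hypothetical stand-ins for real network requests, so the example runs without a connection.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the targets of <a href> and <img src> attributes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = {"a": "href", "img": "src"}.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(value)


def crawl(start_url, fetch, max_pages=1000):
    """Breadth-first crawl; returns {url: content} for every page fetched.

    `fetch` is a hypothetical callable returning a page's HTML, or None
    when the page cannot be retrieved.
    """
    archive = {}
    queue = deque([start_url])
    seen = {start_url}
    while queue and len(archive) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # the familiar "404 Document not found" case
        archive[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return archive


# A tiny in-memory "web" standing in for real servers.
FAKE_WEB = {
    "http://example.org/": '<a href="/about.html">About</a><img src="logo.gif">',
    "http://example.org/about.html": '<a href="/">Home</a><a href="/gone.html">Old</a>',
    "http://example.org/logo.gif": "",
}

snapshot = crawl("http://example.org/", FAKE_WEB.get)
print(sorted(snapshot))
```

The `seen` set keeps the crawler from revisiting pages that link to each other, and `max_pages` bounds the work on a web that is, in practice, never finished.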
The information collected by these “crawlers” is not, unfortunately, all the information that can be seen on the Internet. Much of the data is restricted by the publisher, or stored in databases that are accessible through the World Wide Web but are not available to the simple crawlers. Other documents might have been inappropriate to collect in the first place, so authors can mark files or sites to indicate that crawlers are not welcome. Thus the collected Web will be able to give a feel of what the web looked like at a particular time, but will not simulate the full online environment.
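The article does not name the mechanism authors use to mark sites as off limits. One widely used convention is a `robots.txt` file, and the sketch below assumes that convention. It uses Python's standard `urllib.robotparser`; the rules text, the URLs, and the `example-archiver` agent name are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that a site owner might publish.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())


def may_archive(url, agent="example-archiver"):
    """Return True if the site's rules allow this crawler to fetch `url`."""
    return rules.can_fetch(agent, url)


print(may_archive("http://example.org/index.html"))
print(may_archive("http://example.org/private/diary.html"))
```

A crawler would run a check like this before each download and skip any URL the site has excluded.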
While the current sizes are large, the Internet is continuing to grow rapidly. When it is common to connect one’s home camcorder to the upcoming high bandwidth Internet, it will not be practical to archive it all. At some point we will have to become more selective about what data will be of the most value in the future, but currently we can afford to gather it all.
Storing Terabytes of Data Cost Effectively
Crucial to archiving the Internet, and digital libraries in general, is the cost effective storage of terabytes of data while still allowing timely access. Since the cost of storage has been dropping rapidly, the cost of archiving is dropping too. The flip side, of course, is that people are making more information available.
To stay ahead of this onslaught of text, images, and soon video information we believe we have to store the information for much less money than the original producers paid for their storage. It would be impractical to spend as much on our storage as everyone else combined.
Storage Technology       Cost per Gigabyte   Random Access Time
Memory (RAM)             $12,000/GB          70 nanoseconds
Hard Disk                $200/GB             15 milliseconds
Optical Disk Jukebox     $140/GB             10 seconds
Tape Jukebox             $20/GB              4 minutes
Tapes on shelf           $2/GB               human assistance required
(1 gigabyte = 1,000 megabytes; 1 terabyte = 1,000 gigabytes. A gigabyte is roughly enough to store 1,000 books or 1 hour of compressed video.)
With these prices, we chose hard disk storage for a small amount of the frequently accessed data combined with tape jukeboxes. In most applications we expect a small amount of information to be accessed much more frequently than the rest, leveraging the use of the faster disk technology rather than the tape jukebox.
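To make the trade-off concrete, the sketch below prices the article's 1.5 terabyte estimate of the Web at each storage tier, then prices a disk-plus-tape mix. The per-gigabyte costs come from the table above. The 10% "hot" fraction kept on disk is an illustrative assumption, since the article does not say how the data was actually split.

```python
# Cost per gigabyte from the table above (1996 US dollars).
COST_PER_GB = {
    "memory": 12_000,
    "hard_disk": 200,
    "optical_jukebox": 140,
    "tape_jukebox": 20,
    "tape_on_shelf": 2,
}

WEB_SIZE_GB = 1_500  # 1.5 terabytes, the article's estimate of the Web's size


def single_tier_cost(tier, size_gb=WEB_SIZE_GB):
    """Cost of holding everything on one storage technology."""
    return COST_PER_GB[tier] * size_gb


def hybrid_cost(hot_fraction, size_gb=WEB_SIZE_GB):
    """Hot data on hard disk, the remainder in a tape jukebox."""
    hot_gb = size_gb * hot_fraction
    cold_gb = size_gb - hot_gb
    return hot_gb * COST_PER_GB["hard_disk"] + cold_gb * COST_PER_GB["tape_jukebox"]


for tier in COST_PER_GB:
    print(f"{tier:16} ${single_tier_cost(tier):>12,}")

# The 10% hot fraction is an illustrative assumption, not a figure from the article.
print(f"{'disk + tape':16} ${hybrid_cost(0.10):>12,.0f}")
```

Under that assumption the mix costs far less than an all-disk archive, while the most frequently requested material still comes back in milliseconds rather than minutes.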
Providing Access and New Services
After gathering and storing the public contents of the Internet, what services would then be of greatest value with such a repository? While it is impossible to be certain, digital versions of paper services might prove useful.
For instance, we can provide a “reliability service” for documents that are no longer available from the original publisher. This is similar to one of the roles of a library. In this way, one document can refer, through a hypertext link, to a document on another server and a reader will be able to follow that link even if the original is gone. We see this as an important piece of infrastructure if the global hypertext system is to become a medium for scholarly publishing.
Another application for a central archive would be to store an “official copy of record” of public information. These records are often of legal interest, helping to determine what was said or known at a particular time.
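Both services reduce to one lookup: serve the live page if it still exists, otherwise fall back to a stored copy, optionally as of a given date. The sketch below is a toy model of that idea, not a description of how any real archive is implemented. `ARCHIVE`, `LIVE_WEB`, and `resolve` are hypothetical names, and the data is made up.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical archive: each URL maps to snapshots sorted by capture date.
ARCHIVE = {
    "http://example.org/paper.html": [
        (date(1996, 5, 1), "first draft"),
        (date(1996, 9, 15), "revised version"),
    ],
}

# Hypothetical live web: the paper has since been taken down.
LIVE_WEB = {"http://example.org/index.html": "home page"}


def resolve(url, as_of=None):
    """Return (source, content); prefer the live page, else an archived copy.

    If `as_of` is given, return the latest snapshot captured on or before
    that date, i.e. what the page said at that time.
    """
    if as_of is None and url in LIVE_WEB:
        return "live", LIVE_WEB[url]
    snapshots = ARCHIVE.get(url, [])
    if as_of is None:
        usable = snapshots
    else:
        dates = [captured for captured, _ in snapshots]
        usable = snapshots[:bisect_right(dates, as_of)]
    if not usable:
        raise LookupError(f"no copy of {url} available")
    captured, content = usable[-1]
    return f"archived {captured.isoformat()}", content


print(resolve("http://example.org/index.html"))
print(resolve("http://example.org/paper.html"))
print(resolve("http://example.org/paper.html", as_of=date(1996, 6, 30)))
```

The first two calls model the reliability service: a hypertext link keeps working after the original is gone. The third models the copy of record: asking what a page said at a particular time.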
Historians have already found the material useful. David Allison of the Smithsonian Institution has used the materials for an exhibit on Presidential Election websites, which he thinks might be the equivalent to saving videotapes of early TV campaign advertisements. David Eddy Spicer of Harvard’s Kennedy School of Government has used the materials for their “case studies” in much the same way they collect old newspaper articles to capture a point in time.
With copies of the Internet over time and cross correlation of data from multiple sources, new services might help users understand what they are reading, when it was created, and what other people thought of it. With these services, people might be able to give a context to the information they are seeing and therefore know if they can trust it. Furthermore, the coordination of this meta-information and usage data can help build services for navigating the sea of data that is available.
Companies are also interested in saving similar information and building similar services based on their internal information to help employees effectively learn from the experiences of others.
The technologies and the services that will grow out of building digital archives and digital libraries could lead towards building a reliable system of information interchange based on electrons rather than paper. Using the “library” might be done many times a day to use documents that are no longer available on the Internet.
Legal and Social Issues
Creating an archive of informal and personal information raises many difficult legal and social issues, even if the material was intended to be publicly accessible at some point. Such a collection treads into the murky area of intellectual property in the digital era. What can be done with the digital works that are collected touches on copyright, privacy, import/export restrictions, and possession of stolen property.
To give a few examples: what if a college student made a web page that had pictures of her then-current boyfriend, but later wanted to take it down and “tear it up”, yet it lived on in digital archives (whether accessible or not)? Should she have the right to remove that document? Should a candidate for political office be able to go back 15 years to erase his postings to public bulletin boards that have been saved in the Archive? What if a software program that is legal to publish in Denmark but illegal in the United States is collected by an archive: should this program be removed and hidden even from historians and scholars? The legal and social issues raised by the construction of the Archive are not easily resolved.
By allowing authors to exclude their information from the Archive we hope to avoid some of the immediate issues, and allow enough time to pass to understand the larger issues at hand.
The Internet Archive might be able to help resolve some of these issues by publicly drawing the issues out and by participating in the debates. While many of these questions will take years to resolve, we feel it is important to proceed with the collection of the material since it can never be recovered in the future.
Where does it go from here?
The new technologies and services currently being created might be useful in all digital libraries and help make the Internet more robust and useful.
Through an archive of what millions of people are interested in making public, we might be able to detect new trends and patterns. Since these materials are in computer readable form, searching them, analyzing them, and distributing them has never been easier. A variety of services built on top of large data sets will allow us to connect people and ideas in new ways.
For instance, Firefly Inc. is using the individual tastes in music and movies to help suggest other CD’s and videos based on finding “similar” people. They have even found that people are interested in communicating with the other “similar” people directly thus forming communities based on similar interests. This kind of computer matchmaking which is based on detailed portraits of people’s preferences suggests similar services based on reading habits.
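The article does not describe how Firefly measured similarity. As a stand-in, the sketch below uses Jaccard overlap between sets of liked titles to find the most similar person, then suggests what that person likes. The names, the titles, and the choice of Jaccard similarity are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Overlap between two sets of liked items, from 0.0 to 1.0."""
    return len(a & b) / len(a | b) if a | b else 0.0


def recommend(user, tastes):
    """Suggest items liked by the most similar other user."""
    mine = tastes[user]
    others = [name for name in tastes if name != user]
    nearest = max(others, key=lambda name: jaccard(mine, tastes[name]))
    return nearest, sorted(tastes[nearest] - mine)


# Made-up preference data for illustration.
TASTES = {
    "ana": {"Kind of Blue", "Blue Train", "Abbey Road"},
    "ben": {"Kind of Blue", "Blue Train", "Giant Steps"},
    "cy": {"Abbey Road", "Nevermind"},
}

print(recommend("ana", TASTES))
```

The same shape of calculation would apply to reading habits: replace the album titles with documents read.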
Trends in academic fields might be able to be detected more easily by studying gross statistics of the communications in the field. The hypertext links of the World Wide Web form an informal citation system similar to the footnote system already in use. Studying the topography of these links and their evolution might provide insights into what any given community thought was important.
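One simple way to study links as citations is to count how many distinct pages point at each page. Here is a minimal sketch over a made-up link graph; `LINKS` and `inbound_counts` are hypothetical names, and real analyses would use far richer measures than a raw count.

```python
from collections import Counter

# Made-up link graph: each page maps to the pages it links to.
LINKS = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
    "d.html": ["c.html", "b.html"],
}


def inbound_counts(links):
    """Count how many distinct pages link to each target."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):   # ignore repeated links on one page
            if target != source:      # ignore self-links
                counts[target] += 1
    return counts


counts = inbound_counts(LINKS)
print(counts.most_common())
```

Running the same count over snapshots taken at different times would show how a community's attention shifted.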
If archiving cultural and personal histories becomes useful commercially, then the efforts can be expanded to record radio and video broadcasts. These systems might allow us to study these effects and influences on our lives.
Current terabyte technologies (storage hardware and management software) are relatively rare and specialized because of their costs, but as the costs drop we might see new applications that have traditionally used non-computer media. For instance,
– A video store holds about 5,000 video titles, or about 7 terabytes of compressed data.
– A music radio station holds about 10,000 LP’s and CD’s, or about 5 terabytes of uncompressed data.
– The Library of Congress contains about 20 million volumes, or about 20 terabytes of text if typed into a computer.
– A semester of classroom lectures of a small college is about 18 terabytes of compressed data.
Therefore the continued reduction in the price of data storage, and also data transmission, could lead to interesting applications as all the text of a library, music of a radio station, and video of a video store become cost effective to store and later transmit in digital form.
In the end, our goal is to help people answer hard questions. Not “what is my bank balance?”, or “where can I buy the cheapest shoes”, or “where is my friend Bill?” – these will be answered by smaller commercial services. Rather, answer the hard questions like: “Should I go back to graduate school?” or “How should I raise my children?” or “What book should I read next?”. Questions such as these can be informed by the experiences of others. Can machines and digital libraries really help in answering such questions? In the long term, we believe yes, but perhaps in new ways which would have importance in education and day-to-day life.
Further Reading:
Preserving Digital Objects: Recurrent Needs and Challenges, December 1995 presentation at 2nd NPO conference on Multimedia Preservation, Brisbane, Australia.
The Vanished Library, Luciano Canfora. University of California Press, 1990.
Biography:
Brewster Kahle founded the Internet Archive in April 1996. Before that, he invented the Wide Area Information Servers (WAIS) system in 1989 and founded WAIS Inc in 1992. WAIS helped bring commercial and government agencies onto the Internet by selling Internet publishing tools and production services to companies such as Encyclopaedia Britannica, the New York Times, and the Government Printing Office.
Schooled at MIT (BSEE ’82), Brewster designed super computers in the 80’s at Thinking Machines Corporation.
This October, the Internet Archive will celebrate an extraordinary milestone: 1 trillion web pages preserved and available for access via the Wayback Machine.
The series of events scheduled throughout October will highlight the people, technology, and community efforts that have made this achievement possible, and will look ahead to the future of web preservation as we continue building the web’s collective memory together.
Oct 7 – The Vast Blue We: An interactive evening of live music with Del Sol Quartet, featuring new works by Erika Oba and Sam Reider, exploring the wonder of human collaboration. (7–8:15pm PT | San Francisco & online)
Oct 21 – Doors Open 2025: Go behind the scenes at the Physical Archive to see the lifecycle of books, records, film, and more, from donation to digitization. (6–8pm PT | In person only)
Oct 22 – The Web We’ve Built: Our annual celebration, marking 1 trillion webpages preserved in the Wayback Machine. Join us in San Francisco or online for an evening of talks, performances, and community. (5–10pm PT | Live stream 7–8pm PT)
Santa Cruz-based lap steel guitarist Bill Walker performing at a virtual staff meeting (2020).
Since 2020, the Internet Archive has been inviting musicians from around the world to play short live sets for our virtual staff meetings. What started as a way to bring our staff together and support artists during the earliest days of the pandemic has grown into a beloved tradition: twice a week, we gather online for 10 minutes of live music before diving into our Monday morning or Friday lunch staff meetings. Check out past performances here.
We’d love to feature you!
How It Works
Performance: A 10-minute set via Zoom before one of our staff meetings
Schedule Options:
Mondays: Sound check at 9:40 AM PT, performance from 9:55–10:05 AM PT
Fridays: Sound check at 11:40 AM PT, performance from 11:55 AM–12:05 PM PT
Honorarium: $100 + tips (via Venmo or PayPal)
Creative Freedom: Play what you love—we welcome all genres, styles, and sounds!
Send an e-mail to our booking team at info@archive.org with a short bio and any links to your music, social media, or merch.
Why Play for the Archive?
The Internet Archive is a nonprofit research library with a mission to provide Universal Access to All Knowledge. Our staff—curious, grateful, and globally distributed—loves starting and ending the week with new music. It’s a short, fun way to share your sound with a receptive, appreciative audience.
This October, the Internet Archive’s Wayback Machine is projected to hit a once-in-a-generation milestone: 1 trillion web pages archived. That’s one trillion memories, moments, and movements—preserved for the public, forever.
We’ll be commemorating this historic achievement on October 22, 2025, with a global event: a party at our San Francisco headquarters and a livestream for friends and supporters around the world. More than a celebration, it’s a tribute to what we’ve built together: a free and open digital library of the web.
Join us in marking this incredible milestone. Together, we’ve built the largest archive of web history ever assembled. Let’s celebrate this achievement—in San Francisco and around the world—on October 22.
Here’s how you can take part:
1. RSVP: Sign up now to be the first to know when registration opens for our in-person event and livestream.
2. Support the Internet Archive: Help us continue preserving the web for generations to come.
3. Share Your Story: What does the web mean to you? How has the Wayback Machine helped you remember, research, or recover something important?
Let’s work together toward October 22—a day to look back, share stories, and celebrate the web we’ve built and preserved together.
A recent legal decision has reaffirmed the power of fair use in the digital age, and it’s a big win for libraries and the future of public access to knowledge.
On June 24, 2025, Judge William Alsup of the United States District Court for the Northern District of California ruled in favor of Anthropic, finding that the company’s use of purchased copyrighted books to train its AI model qualified as fair use. While the case centered on emerging AI technologies, the implications of the ruling reach much further—especially for institutions like libraries that depend on fair use to preserve and provide access to information.
What the Decision Says
In the case, authors claimed that Anthropic infringed copyright by including their copyrighted books in its AI training dataset. Some of those books were acquired in physical form and then digitized by Anthropic to make them usable for machine learning.
The court sided with Anthropic on this point, holding that the company’s “format-change from print library copies to digital library copies was transformative under fair use factor one” and therefore constituted fair use. It also ruled that using those digitized copies to train an AI model was a transformative use, again qualifying as fair use under U.S. law.
This part of the ruling strongly echoes previous landmark decisions, especially Authors Guild v. Google, which upheld the legality of digitizing books for search and analysis. The court explicitly cited the Google Books case as supporting precedent.
While we believe the ruling is headed in the right direction—recognizing both format shifting and transformative use—the court factored in destruction of the original physical books as part of the digitization process, a limitation we believe could be harmful if broadly applied to libraries and archives.
What It Means for Libraries
Libraries rely on fair use every day. Whether it’s digitizing books, archiving websites, or preserving at-risk digital content, fair use enables libraries to fulfill our public service missions in the digital age: making knowledge available, searchable, and accessible for current and future generations.
This decision reinforces the idea that copying for non-commercial, transformative purposes—like making a book searchable, training an AI, or preserving web pages—can be lawful under fair use. That legal protection is essential to modern librarianship.
In fact, the court’s analysis strengthens the legal groundwork that libraries have relied on for years. As with the Google Books decision, it affirms that digitization for research, discovery, and technological advancement can align with copyright law, not violate it.
Looking Ahead
This ruling is an important step forward for libraries. It reaffirms that fair use continues to adapt alongside new technologies, and that the law can recognize public interest in access, preservation, and innovation.
As we navigate a rapidly changing technological landscape, it’s more important than ever to defend fair use and support the institutions that bring knowledge to the public. Libraries are essential infrastructure for an informed society, and legal precedents like this help ensure they can continue their vital work in the digital age.
How is knowledge created, shared, and preserved in the digital age—and what forces are shaping its future?
We’re thrilled to announce the launch of Future Knowledge, a new podcast from the Internet Archive and Authors Alliance. Hosted by Chris Freeland, librarian at the Internet Archive, and Dave Hansen, executive director of Authors Alliance, the series brings together authors, librarians, policymakers, technologists, and artists to explore how knowledge, creativity, and policy intersect in today’s fast-changing world.
In each episode, an author discusses their book or publication and the big ideas behind it—paired with a thought-provoking conversation partner who brings a fresh perspective from the realms of policy, technology, libraries, or the arts.
We’re kicking off the podcast with a double feature—two episodes tackling copyright history and AI’s global impact:
Episode 1: The Copyright Wars
Historian Peter Baldwin joins copyright scholar Pamela Samuelson to unpack The Copyright Wars—a sweeping look at 300 years of trans-Atlantic copyright battles. From 18th-century publishing monopolies to today’s clashes between Big Tech, libraries, and the entertainment industry, this conversation reveals how history can illuminate the future of intellectual property in a digital world.
Episode 2: Copyright, AI, and Great Power Competition
Authors Joshua Levine and Tim Hwang sit down with Lila Bailey to discuss Copyright, AI, and Great Power Competition. Together they explore how artificial intelligence is transforming copyright law—and how global powers are using IP policy as a strategic tool in the race for technological dominance.
Whether you’re an author thinking about how to share your work, a librarian navigating digital access, or a curious listener exploring how knowledge shapes our world, Future Knowledge is for you.
Setting up a livestream is more complicated than just turning on a camera. That’s why the Internet Archive tapped into the expertise of Sophia Tung, a software engineer and online content creator, to help create the livestream for its microfiche scanning center, which launched May 21.
The 29-year-old garnered international media coverage for her livestream of robotaxis parked in a depot just below her San Francisco apartment as they jostled and honked – sometimes in the middle of the night.
“I put it up just sort of as a meme to get some attention. If I couldn’t do anything about it, then I might as well make the best of it,” Tung said of the livestream she posted on YouTube with Lo-fi music in the background. “People became fans of it and Brewster [Kahle, Internet Archive’s digital librarian] reached out to see if I could do something similar with the Internet Archive.”
An avid user of the Internet Archive for years, Tung said she was eager to visit its Funston Avenue headquarters and work with the staff on the project. As a sign of our tech-connected times, it’s become popular to have a mesmerizing scene with mellow music playing on a second monitor as people work. Tung said she could envision a relaxing, but informative, feed showing the preservation process.
Tung met with the team who take microfiche – flat sheets of film that hold miniaturized documents – and turn them into digital images that can be accessed online. The team is now digitizing U.S. Supreme Court case documents and government records from Canada dating back to the 1930s.
After assessing the space with five active microfiche digitization stations, Tung decided on a three-camera setup for the livestream. One is focused on an operator feeding microfiche cards under a high-resolution camera that captures multiple detailed images. Another is an up-close look at what actually happens on the machine. A third wide-angle camera covers the entire room and is blurred for security, but still conveys motion.
All team members are open to being on camera as they work, but Tung said she recognized privacy concerns may arise. She devised a pause button that stops the feed, momentarily dimming the “on air” sign in the room. Although she was initially concerned that employees might not like being on camera, Tung said the staff hired for the project agreed to the concept and are on board with the livestream as a mixed media project.
Live activity with the scanners occurs Monday through Friday, 7:30am to 3:30pm U.S. Pacific Time (UTC-8, or UTC-7 during daylight saving time), except U.S. holidays. Ambient Lo-fi music plays continuously. After hours, other Internet Archive content runs on the video feed including silent films, lost landscape footage from everyday life, and public domain photographs from NASA and other sources.
The project has required a combination of engineering to make the infrastructure work 24/7, plus physical design integrating signage and broadcasting lights, which Tung says she enjoyed. Her goal was two-fold: to recreate the excitement of her last livestream and to shine a light on the individuals working behind the scenes at the Archive.
“I always thought about the Internet Archive as just some mysterious entity, trying to preserve what we as individuals cannot. It’s an invaluable tool for journalists and, basically, everybody,” Tung said. “Now, preservation is more important than ever. I think people just assume that it happens. Actually, it takes money, effort, machinery and people. I think it’s important to highlight all the people-hours that go into it.”
Tung produced an explainer video about the microfiche livestream project on YouTube. “The reception has been great so far,” said Tung, who is working on more features and possible additional channels to add to the stream. “I hope the stream brings awareness to the effort it takes to preserve all this important material. If we don’t preserve it now, we are going to lose it.”
All microfiche materials are added to Democracy’s Library, the global project to collect, digitize, and provide free public access to the world’s government publications.
Rob Reich, performing at the Internet Archive’s annual celebration, October 2022.
We are deeply saddened by the passing of Rob Reich, a remarkable musician whose warmth, humor, and creativity touched the hearts of so many. Based in San Francisco, Rob was a frequent and beloved performer in our “Essential Music Concerts from Home” series at the Internet Archive. At the height of the pandemic in October 2020, when we all needed connection and comfort, Rob brought us both. He performed for us a total of eight times, including serving as the MC for two of our virtual holiday parties during the pandemic. His music lifted our spirits, and his presence made everything feel like a celebration.
Rob and his ensemble, Circus Bella, kicked off our October 2022 celebration with their signature whimsy and energy. He was a master of joy-infused musicianship, a true one-man band. Whether playing the accordion, piano, bells, whistles, or cymbals, Rob’s performances were always memorable. One Bastille Day, he performed in a striped shirt and beret, with an Eiffel Tower Zoom backdrop, serenading us with French classics.
I once had the pleasure of seeing him perform at Zuni, a favorite restaurant in San Francisco, where he played timeless tunes as patrons enjoyed oysters, Caesar salad, and roasted chicken. You’d never have guessed he was also a circus performer, such was his versatility.
Rob was more than a performer—he was someone we could count on. He was reliable, kind, hilarious, serious, wildly creative, and most of all, genuine.
We are grateful for the joy Rob brought to us and to so many others. His loss leaves a silence, but his music and memory continue to resonate.