Anti-Hallucination Add-on for AI Services Possibility

Chatbots, like OpenAI’s ChatGPT, Google’s Bard and others, have a hallucination problem (their term, not ours). They can make something up and state it authoritatively. It is a real problem. But there may be an old-fashioned answer, as a parent might say: “Look it up!”

Imagine for a moment the Internet Archive, working with responsible AI companies and research projects, could automate “Looking it Up” in a vast library to make those services more dependable, reliable, and trustworthy. How?

The Internet Archive and AI companies could offer an anti-hallucination ‘add-on’ service for chatbots that cites supporting evidence and counterclaims for chatbot assertions by drawing on the library collections at the Internet Archive (most of which were published before generative AI).

By citing evidence for and against assertions based on papers, books, newspapers, magazines, TV, radio, and government documents, we can build a stronger, more reliable knowledge infrastructure for a generation that turns to their screens for answers. Although many of these generative AI companies are already linking, or intend to link, their models to the internet, what the Internet Archive can uniquely offer is our vast collection of “historical internet” content. We have been archiving the web for 27 years, which means we have decades of human-generated knowledge. This could become invaluable in an age when we may see a drastic increase in AI-generated content. So an Internet Archive add-on is not just a matter of leveraging knowledge available on the internet, but also of leveraging the knowledge preserved in the history of the internet.

Is this possible? We think yes, because we are already doing something like this for Wikipedia, by hand and with special-purpose robots like Internet Archive Bot. Wikipedia communities, and these bots, have fixed over 17 million broken links and have linked one million assertions to specific pages in over 250,000 books. With the help of the AI companies, we believe we can make this an automated process that could respond to the customized essays their services produce. Many of the same technologies used for the chatbots can be used to mine assertions in the literature and find when, and in what context, those assertions were made.

The result would be a more dependable World Wide Web, one where disinformation and propaganda are easier to challenge, and therefore weaken.

Yes, four major publishers are suing to destroy a significant part of the Internet Archive’s book corpus, but we are appealing the court’s ruling. We believe that one role of a research library like the Internet Archive is to own collections that can be used in new ways by researchers and the general public to understand their world.

What is required? Common purpose, partners, and money. We see a role for a Public AI Research laboratory that can mine vast collections without rights issues arising. While the collections are already significant, we see collecting, digitizing, and making available the publications of democracies around the world as a way to expand the corpus greatly.

We see roles for scientists, researchers, humanists, ethicists, engineers, governments, and philanthropists, working together to build a better Internet.

If you would like to be involved, please contact Mark Graham at mark@archive.org.

DLARC Radio Library Surpasses 75,000 Items of Ham Radio, Shortwave History


Internet Archive’s Digital Library of Amateur Radio & Communications continues to expand its collection of online resources about ham radio, shortwave, amateur television, and related communications. The library has grown to more than 75,000 items, with new resources including newsletters, podcasts, and conference presentations.

DLARC has recently added hundreds of presentations recorded by RATPAC, the Radio Amateur Training Planning and Activities Committee, and dozens of talks given at the MicroHams Digital Conference.

DLARC is adding newsletters from amateur radio groups around the world: the latest additions include 1,400 news bulletins from the Irish Radio Transmitters Society going back to 1998, and more than 600 newsletters from the Worldwide TV-FM DX Association, a hobby club devoted to long-distance television and FM communications. The library has also added newsletters from regional groups across the United States, including the Anchorage (Alaska) Amateur Radio Club, Indianapolis (Indiana) Radio Club, the Pikes Peak (Colorado Springs, Colorado) Radio Amateur Association, and a dozen other organizations. Many of these newsletters have never been posted to the Internet before. All are full-text searchable, and can be read online or downloaded.

Internationally known radio host Glenn Hauser has allowed decades of his radio content to be archived in the DLARC library, including 1,200 episodes of World of Radio, which explores communications from around the world, especially shortwave radio; Informe DX and Mundo Radial, Spanish-language translations of World of Radio; Continent of Media, a program about media around the American continent; and Hauserlogs, shortwave listening diaries.

International Radio Report, a program about radio in Montreal, Canada, and around the world, has also been archived in the library, with episodes going back to 2000. Many of these episodes, spanning May 2000 through March 2005, had not been available online for more than a decade, so their addition restores access to important contemporary reporting.

DLARC continues to expand its collection of ham radio e-mail and Usenet conversations from the early days of the Internet, with the addition of nearly 3,500 QRP-L Digest mailings spanning 1993 through 2004. QRP-L was an early Internet e-mail list for discussion of the design, construction, and use of low-power radio equipment.

The collection of ham radio-related podcasts has reached 5,500 episodes with the additions of 100 Watts and a Wire, The World According to Elmer, and 30 episodes of The Rain Report that were thought to be lost. 

The Digital Library of Amateur Radio & Communications is funded by a grant from Amateur Radio Digital Communications (ARDC) to create a free digital library for the radio community, researchers, educators, and students. DLARC invites radio clubs and individuals to submit material in any format. To contribute or ask questions about the project, contact:

Kay Savetz, K6KJN
Program Manager, Special Collections
kay@archive.org
Mastodon: dlarc@mastodon.radio

The Easy Roll and Slow Burn of Cassette-Based Software

Patrons come to the Internet Archive’s software collections for many reasons, and one of the major ones is to play historical software in our in-browser emulation environment. Well over a decade old now, the Emularity gives near-instant access to functional versions of what would otherwise be dormant software packages. If patrons want to go from idly thinking they’d like to try something to playing a 2011 Pac-Man clone running in an obscure DOS graphics resolution, they can be experiencing it anywhere from seconds to under a minute later.

Naturally, “near-instant” is a nebulous, and inaccurate, portrayal of the time required to spin up the Emularity’s environment: first a WebAssembly runtime with an emulator embedded in it has to arrive, and then the software itself has to download. This playable version of Apple Macintosh System 7.0.1 requires 10 additional megabytes to download the hard drive image it is booting. That data will either snap down instantly on a fast connection or be achingly slow on a less robust one.

Cooked into everything digital and online in the present day is the assumption that speed and efficiency win out over authenticity and reality. We go from thinking we’d like to hear a piece of music to hearing it (or never hearing it, because we can’t find it) in a brisk flash: a few clicks and a momentary pause. But listen to a track of music written decades ago, and a mass of assumptions its creators made about how you would experience it no longer apply.

We’re not indicating the only way to enjoy Dark Side of the Moon is to see it mentioned in a magazine or fanzine in 1973, wander down to your local record store, see them stacked up near the front of a rack, and then buy one and take it home, gently unpacking the stickers and poster while playing the album in headphones or on speakers, cross-legged on your shag carpet.

…but they probably thought you were going to at the time.

So it is, in the emulated world of software, that the way these works will primarily be enjoyed for all time to come is as discrete blocks, loaded into a waiting process or slot and turned on moments after being selected. This is right and good; there is a decent argument that many people would not want to sit through the “realistic” amount of time it would have taken to boot up software at the time of its release.

But maybe some of you do.

Most people who use computers know that they once loaded from floppy disks: thin magnetic disks inside plastic sleeves or shells that could hold some small amount of data. Slow and weird as they were, the aesthetic experience of the floppy disk has, in some small way, bubbled up into the present day. It lives on as a “save icon” or a reference to times long past, and a notable number of old floppy disks still turn up in family storage, where younger generations find them the way you might find an old smoking pipe or a saved wedding invitation.

But nestled in a relatively short span of time is the era of cassette-based loading, where actual audio tapes could have data stored on them, and played back to load into computers.

In terms of adoption, the cassette-based software period is marked by people entering it and almost immediately clawing their way out of it as soon as they could afford to. The combination of time-consuming playback, limited data storage, and awkward reading and writing ensured that as soon as anything better came along, users would leap to it.

As a result, it has become the case that not only are there people who consider themselves computer enthusiasts who have only a light glancing memory or awareness of cassette-based software – there are people who are not aware this ever even happened.

So, let us begin.

In the wild and woolly days of kit computers, when one of the major options was to be sent a pile of parts and instructions to screw, solder, and assemble them into a functioning beast, saving your code to a cassette tape machine was one of the available storage options. And by “cassette tape machine”, we’re bringing back a dozen memories of schoolchildren in the 1970s:

It is exactly what it sounds like: the headphone and microphone jacks of these machines, which would normally serve language education and field recordings, were attached to circuit boards that either recorded to or listened to standard audio cassettes.

The machine was expensive to hook up ($27.99, or about $151 in today’s dollars), but thanks to the dropping cost of blank cassettes, themselves manufactured by the millions to satisfy a range of customers, you could now tinker and toy with these computer kits and save your digital creations to a medium with a fairly high chance of recovering them again.

Combine the simplicity of the programs involved, the often-cheap tape medium being bought, and the use of labelmakers to create adhesive labels to describe the inside program, and you get a very memorable, very evocative 1970s-1980s aesthetic that will either confuse the new or warm the heart of the old:


As a side note on our introductory tour, it was possible to save multiple programs on these tapes, but you had to be very careful about where on the tape counter each program was placed so your works didn’t overwrite each other. This was an art beyond the impatient, or anyone unwilling to rewind to the beginning of the tape and carefully watch a rising counter number to find fresh stretches of tape for new data.

The result, not obvious otherwise, is labels covered in glyphs that form a possibly cryptic map of “dumps” (programs written to the tape, noted by counter position as you went along). Note how, on the label, a failed or broken program has to be left where it is on the tape: because the old program could still be heard under an overwritten one, and because you could not predict how many feet of tape a new program would take, the road to data dumps is littered with broken memories.
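The bookkeeping those labels encode can be sketched as a small log of tape-counter ranges. This is a minimal illustration, assuming the counter reading is a simple integer and that each program occupies one contiguous range; `TapeLog`, its methods, the safety gap, and the program names are all hypothetical, not part of any real tool.

```python
from dataclasses import dataclass, field


@dataclass
class TapeLog:
    """Hypothetical paper-label log: which counter ranges hold which programs."""
    entries: list[tuple[int, int, str]] = field(default_factory=list)  # (start, end, name)

    def overlaps(self, start: int, end: int) -> bool:
        # Half-open ranges [a, b) and [c, d) overlap when a < d and c < b.
        return any(start < e_end and e_start < end for e_start, e_end, _ in self.entries)

    def record(self, start: int, end: int, name: str) -> None:
        if start >= end:
            raise ValueError("a program must occupy at least one counter tick")
        if self.overlaps(start, end):
            raise ValueError(f"{name} would overwrite an existing program")
        self.entries.append((start, end, name))
        self.entries.sort()

    def next_free(self, gap: int = 5) -> int:
        # Leave a safety gap after the last dump, since a new save's length is hard to predict.
        return max((end for _, end, _ in self.entries), default=0) + gap


log = TapeLog()
log.record(0, 42, "STARTREK")                  # hypothetical program names
log.record(log.next_free(), 80, "BIORHYTHM")   # starts at counter 47
```

A failed save would simply get its own entry (say, "BAD"), since its stretch of tape could not safely be reused.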


But what exactly is ending up on these audio cassettes (now considered to be data cassettes, or datasettes)?

Particularly curious folks can read up on examples like the Tarbell Cassette Interface and Protocols in the bitsavers collection. In more general terms, a variety of standards (and not-so-standards) emerged in which pulses of sound were recorded onto an audio cassette, and these sounds represented individual pieces of data that could then be interpreted when played back off the tape. The distinction is important here: these were not digital signals, but digital information encoded as analog audio recordings. Think of someone shouting the word “42!” instead of digitally encoding up-down, on-off data like “101010”.
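To make “data as sound” concrete, here is a sketch of a different, widely documented scheme rather than Tarbell’s own: the Kansas City Standard of the mid-1970s, which records a 0 bit as four cycles of 1200 Hz and a 1 bit as eight cycles of 2400 Hz at 300 bits per second, framing each byte with one start bit and two stop bits. The 9600 Hz sample rate and all function names are assumptions for illustration; the code produces a list of sample values, not a playable audio file.

```python
import math

SAMPLE_RATE = 9600  # samples per second; chosen so one bit is a whole number of samples
BAUD = 300          # Kansas City Standard: 300 bits per second
FREQ_ZERO = 1200    # Hz: a 0 bit is four cycles of 1200 Hz
FREQ_ONE = 2400     # Hz: a 1 bit is eight cycles of 2400 Hz
SAMPLES_PER_BIT = SAMPLE_RATE // BAUD  # 32


def bit_tone(bit: int) -> list[float]:
    """One bit's worth of sine-wave samples in the range -1.0..1.0."""
    freq = FREQ_ONE if bit else FREQ_ZERO
    return [math.sin(2 * math.pi * freq * n / SAMPLE_RATE) for n in range(SAMPLES_PER_BIT)]


def frame_byte(value: int) -> list[int]:
    """A start bit (0), eight data bits least-significant first, then two stop bits (1)."""
    return [0] + [(value >> i) & 1 for i in range(8)] + [1, 1]


def encode(data: bytes) -> list[float]:
    """Turn bytes into a stream of audio samples, one tone burst per bit."""
    samples: list[float] = []
    for value in data:
        for bit in frame_byte(value):
            samples.extend(bit_tone(bit))
    return samples


def decode(samples: list[float]) -> bytes:
    """For each bit period, ask which reference tone the sound resembles more."""
    ref0, ref1 = bit_tone(0), bit_tone(1)
    bits = []
    for start in range(0, len(samples), SAMPLES_PER_BIT):
        window = samples[start:start + SAMPLES_PER_BIT]
        score0 = sum(a * b for a, b in zip(window, ref0))
        score1 = sum(a * b for a, b in zip(window, ref1))
        bits.append(1 if score1 > score0 else 0)
    out = bytearray()
    for i in range(0, len(bits), 11):  # 11 bits per framed byte
        data_bits = bits[i + 1:i + 9]  # skip the start bit; ignore the stop bits
        out.append(sum(bit << k for k, bit in enumerate(data_bits)))
    return bytes(out)


audio = encode(b"42")
print(len(audio) / SAMPLE_RATE, "seconds of tape for two characters")
```

At 300 bits per second and 11 bits per framed byte, this scheme moves only about 27 bytes per second, which puts the faster interfaces discussed here in perspective.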


The questions that might come to mind are probably myriad.

This sounds like it’s incredibly slow. Why yes, indeed. The Tarbell format and interface mentioned above brought in data at a screaming 187 bytes per second, about a couple of short sentences’ worth of words. Compare that with the 200,000 bytes per second of an early floppy drive and you can see why people would jump. (Naturally, modifications to the Tarbell format and alternative cassette tape electronics could increase the transfer rate to 540 bytes per second and above, but you’re adding complications.)
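The gap between these media is simple division. The sketch below uses the rates quoted above; the 16 KB program size is an assumption chosen only for illustration, and the model ignores headers, gaps, and seek time.

```python
PROGRAM_SIZE = 16 * 1024  # bytes; an assumed, illustrative program size

RATES = {  # bytes per second, as quoted in the text
    "Tarbell cassette": 187,
    "modified cassette interface": 540,
    "early floppy drive": 200_000,
}


def load_seconds(size_bytes: int, rate_bytes_per_s: float) -> float:
    """Time to read size_bytes at a steady transfer rate."""
    return size_bytes / rate_bytes_per_s


for medium, rate in RATES.items():
    print(f"{medium:28} {load_seconds(PROGRAM_SIZE, rate):8.2f} s")
```

Under these assumptions the same program takes roughly 88 seconds at 187 bytes per second and under a tenth of a second at 200,000.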

What even is a “Tarbell”? Oh, you mean Mr. Don Tarbell, creator of the interface in question, who wrote an extensive article about his work in Kilobaud magazine. (RIP to Mr. Tarbell, who died in 1998.)

Won’t it take a very long time to dump and read this data, at such a slow speed? Why, yes! That is the crux of this already-long article, coming up.

Is this the only way people ended up saving and loading data to cassette tapes? Why, no!


In the late 1970s and early 1980s, a myriad of cassette-tape-based storage systems started being sold as an option for the “home computer” market: the plastic-wrapped, cheaper, all-in-one-already-built computers sold by various companies in a bid to dominate the market. (Even the ultimately winning IBM PC had a cassette port, although the system was generally sold with a floppy drive and it’s unlikely any significant number of people used it.) Each of these systems took a slightly different approach to how it wrote data to tape and read it back, with speeds differing notably between them.

By the late 1980s, it was rapidly becoming unusual to depend on data tapes to read and load to home computers, and the jump to floppies and ultimately hard drives and CD-ROMs came with mass adoption in the 1990s.

But it wasn’t completely gone, either.

In this zone, a Venn diagram of computer end users with an exhausting set of possible media input devices, commercial creators had a heck of a time. Intent on reaching every market they reasonably could, they faced multiple formats that meant multiple releases of the same programs.

Does this mean that Activision had to spend the money to make a floppy disk version and a computer cassette version of the game Ghostbusters for Commodore 64? Yes, in fact they did. (One distributor, HES, even made a cartridge version, which was an Australia-only hack that put the contents onto a cartridge and forced a loading as if it was a floppy.)

For someone playing these games, the choice (or lack of choice) of medium meant their memories, experiences, and impressions of the programs were very different, reflecting whatever configuration they could afford.

All these moments of time, washed away like spools of tape.

…unless you seek them out.

Emulation at the Internet Archive is designed to be fast, easy, and a snap between first thought and trying the software out. In general, this is the case; you decide you want to play a game and a very short time later, you’re playing that game.

This is an absolute gift, if your intention is to browse a lot of obscure, weird, or possibly-bad games, giving them minimal amounts of time while discovering what you were looking for. With a few clicks, you run down the massive lists of potential stops, watch the program come up, and give it a quick regard as to whether it’s worth your time or what you were seeking. As a finding and exploratory environment, it’s the way to go.

This comes at a cost, one that a lot of media and content is experiencing now: your personal investment (time, patience, effort) is minuscule, meaning you will probably switch away in milliseconds if anything is annoying or non-intuitive. Annoying sound? Hard-to-figure-out key mappings? Slow title screen? On to the next thing.

Loading cassette tapes of data is slow, slower than a modern person might consider reasonable. By using a variety of tricks, providers of older data tried to mitigate this by converting tape data, which would be a slow drip of data, into a single finished, packed set of instructions. The Z80 format, for example, is the result of a snapshot of the memory banks of a Sinclair computer, so if you visit the stacks of Sinclair ZX programs, you’re not loading in data the “old” way – you’re spinning up an already-loaded machine that is just starting off from the moment it all finally loaded in. Dropped off at the finish line, it still feels reduced in speed from today, but that’s just the (intended) experience of the machine it is running on.
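The difference between loading from tape and restoring a snapshot can be shown with a toy machine. This is a minimal sketch, assuming a machine whose entire state is a block of RAM plus a couple of registers; it is not the actual .z80 file layout, and `Machine`, `snapshot`, and `restore` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    """A toy computer whose whole state is some RAM plus a few CPU registers."""
    ram: bytearray = field(default_factory=lambda: bytearray(16 * 1024))
    registers: dict[str, int] = field(default_factory=lambda: {"pc": 0, "sp": 0})

    def load_from_tape(self, data: bytes, start: int) -> None:
        """The slow way: bytes arrive one at a time, as a tape would deliver them."""
        for offset, value in enumerate(data):
            self.ram[start + offset] = value  # on real hardware, each byte costs tape time
        self.registers["pc"] = start  # point the CPU at the freshly loaded program


def snapshot(machine: Machine) -> tuple[bytes, dict[str, int]]:
    """Freeze the machine state at the moment loading finished."""
    return bytes(machine.ram), dict(machine.registers)


def restore(state: tuple[bytes, dict[str, int]]) -> Machine:
    """The fast way: build a machine already standing at the finish line."""
    ram, registers = state
    return Machine(ram=bytearray(ram), registers=dict(registers))


original = Machine()
original.load_from_tape(b"\x3e\x2a\xc9", start=0x0100)  # Z80 code for LD A,42 then RET
saved = snapshot(original)
resumed = restore(saved)
```

Restoring copies the whole state in one step, which is why snapshot-based items start almost instantly while tape-based ones replay the slow path.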

Two platforms at the Internet Archive currently provide the experience of loading programs from cassette tapes: the Sinclair ZX-81 and the Commodore 64. Both of these home computers flourished in the early 1980s and, depending on what hardware users had available, remained in use into the 1990s. The ZX-81 faded before the C64, and the C64 saw a transition to a (slow-reading) floppy drive that made cassettes fall out of favor as the decade went on.

In both cases, the original data is provided in a straightforward file: for the ZX-81, a .p file or a .tzx tape data compilation, very small and compact; for the Commodore 64 cassette versions, a .tap file, which is less compact because it records the tape signal itself and because Commodore 64 programs could be significantly bigger.
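Why a .tap file is bulkier becomes clearer from its layout as commonly documented: a 20-byte header (the signature “C64-TAPE-RAW”, a version byte, three further bytes, and a 4-byte little-endian data length) followed by one byte per recorded pulse, each giving the pulse length in units of 8 CPU cycles. The parser below is a sketch built on that description; the function name is ours, and the layout should be checked against a TAP format reference before relying on it.

```python
from __future__ import annotations

import struct

SIGNATURE = b"C64-TAPE-RAW"


def parse_tap(blob: bytes) -> tuple[int, list[int | None]]:
    """Return (version, pulse lengths in CPU cycles) from a C64 .tap image.

    Each data byte is a pulse length divided by 8. A zero byte marks a pulse too
    long for one byte: in version 1 the next three bytes (little-endian) hold the
    exact cycle count; in version 0 the length is unknown and is recorded as None.
    """
    if blob[:12] != SIGNATURE:
        raise ValueError("not a C64 .tap file")
    version = blob[12]
    (data_len,) = struct.unpack_from("<I", blob, 16)
    data = blob[20:20 + data_len]
    pulses: list[int | None] = []
    i = 0
    while i < len(data):
        value = data[i]
        if value != 0:
            pulses.append(value * 8)
            i += 1
        elif version >= 1:
            pulses.append(int.from_bytes(data[i + 1:i + 4], "little"))
            i += 4
        else:
            pulses.append(None)
            i += 1
    return version, pulses


# A synthetic version-1 image: two short pulses, then one long pulse of 10,000 cycles.
payload = bytes([0x30, 0x42, 0x00, 0x10, 0x27, 0x00])
image = SIGNATURE + bytes([1, 0, 0, 0]) + struct.pack("<I", len(payload)) + payload
print(parse_tap(image))
```

Six bytes of pulse data here describe only three pulses, and encoding a single byte of a program takes many pulses, which is why these images are much larger than the programs they carry.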

To see this in practice, loading a program in the ZX-81 emulation can be demonstrated with One Little Ghost, a 2012 retro remake of Pac-Man that stuffs a whole lot of game into a very limited system. Going to this emulation and starting it, you are faced not with an instantly loading, black-and-white remake of Pac-Man, but with this:

Incomprehensible! Mysterious! Uninformative! Welcome to home computing in the 1980s!

This is the gap between today’s fast-food version of computer history and trying to get things running in the original days of the hardware. We’re not even addressing the peculiar aspects of the ZX-81’s RAMpack, an externally attached device for increasing memory, which was legendary for simply falling off while the machine was in use.

To make this interaction between software-emulated machine and actual machine at all clear, instructions were added to the items in the Archive’s collection, including this intimidating set of movements and keypresses needed to start a cassette loading:

“Currently, emulation for this item does not auto-start. To load the ‘cassette’ this program is located on, press the following keys: j (which will appear as LOAD), shift-p shift-p (which will appear as double quotes) and then ENTER/RETURN. Then press SCROLL LOCK on your keyboard, and the F2 key. If all is working properly, the system will bring up a box showing the cassette tape loading into memory. It will stop when complete and the emulation can be interacted with normally. Some games will run by themselves after cassette loading, while others can be started by pressing the r key and ENTER to run.”

As one last piece of intimidation, the instructions have to include a picture of what the ZX-81 keyboard even looked like:

Once you negotiate the instructions, press all the requested commands, and sit back, the emulation rewards you with a screen not unlike this:

What is happening here is an honest-to-goodness cassette loading sequence. The constantly flashing graphics are an ancient hack to let the user know things are “working”: that the data is being successfully read off the tape. An additional convenience is provided by the emulator: at the top left, an overlay window indicates that it is 16 seconds into loading the data, with six minutes and fifty-two seconds (412 tape ticks) of loading still to go. Yes, you are reading that correctly: loading this program will take about seven minutes of real time.

Perhaps it becomes obvious why so many shortcuts begin to arrive in emulation and why so many people were willing to spend the equivalent of a short vacation to bring their machines into the floppy age.

You can browse the ZX-81 collection of programs and see preview screenshots of the games. The good news is that a human didn’t generate them: a script called SCREENSHOTGUN started up, “pressed PLAY on the cassette player”, waited until the whole thing was loaded, and then removed most of the “loading” screenshots to bring you the result. That is a lot of saved misery.
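SCREENSHOTGUN’s internals are not described here, so the following is only a hypothetical sketch of its final step: given a timed series of screen captures and the moment loading finished, keep a token loading frame, then every distinct screen after loading, dropping consecutive duplicates. `Frame`, `pick_screenshots`, and the use of a content hash to spot repeated screens are all assumptions.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass
class Frame:
    seconds: float  # time since "PLAY" was pressed
    pixels: bytes   # raw screen contents

    def digest(self) -> str:
        return hashlib.sha256(self.pixels).hexdigest()


def pick_screenshots(frames: list[Frame], loaded_at: float, keep_loading: int = 1) -> list[Frame]:
    """Keep the last few loading frames, then each distinct screen once loading ends."""
    loading = [f for f in frames if f.seconds < loaded_at][-keep_loading:] if keep_loading else []
    kept: list[Frame] = []
    last_digest = None
    for frame in (f for f in frames if f.seconds >= loaded_at):
        digest = frame.digest()
        if digest != last_digest:  # skip screens unchanged since the previous capture
            kept.append(frame)
            last_digest = digest
    return loading + kept


captures = [
    Frame(0.0, b"stripes-a"),
    Frame(60.0, b"stripes-b"),
    Frame(420.0, b"title"),
    Frame(430.0, b"title"),
    Frame(440.0, b"gameplay"),
]
chosen = pick_screenshots(captures, loaded_at=420.0)
```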

Which brings us to the Commodore 64.

The Commodore 64 stands to live in fame forever as the best-selling single model of home computer in history. From 1982 to 1994, with no substantial changes in its configuration or capabilities, the C64 plowed on through multiple generations of industry standards, providing an inexpensive and dependable on-ramp for families and individuals to acquaint themselves with this “computer in the home” nonsense taking over the general populace.

While much can be made of its shortcomings, most of these have faded into quaint memories of working around them. The breadth of software, the thorough mastery of the machine’s ins, outs, and quirks, and the toil of many millions of users mean that the Commodore 64 is a giant in computer history.

This long-lived history is reflected in the sheer mass of programs, games, applications and demonstration programs for the Commodore 64, numbering in the hundreds of thousands.

And to that end, a group has been working hard to preserve that most odd, increasingly rare-to-find subset of C64 works: the cassette-based commercial products that flourished in the beginning of its life.

And here we find ourselves, a significant number of paragraphs later, at what inspired this walk down tape-loading memory lane: the forgotten, minutes-long experience of loading these tapes on the Commodore 64, and the many different ways companies tried to keep buyers from restlessly complaining that the tapes weren’t “working” and giving up before the long loading times revealed the programs.

The Ultimate Tape Archive is an effort to preserve these cassette-based titles, including the data on the tapes, scans of the tapes themselves, and scans of the cassette inserts, the papers inside the case that provided instructions or cover art. The group behind this project has been tirelessly pursuing acquisition and preservation for years, and as of this writing, version 4.0 contains over 2,000 individual tapes.

To honor their work and provide the final step in experiencing them, you can visit the Ultimate Tape Archive on the Internet Archive and emulate these items in the browser. Doing so, however, means committing to the slow burn: the steady, time-consuming cassette loading of the programs.

Luckily, for this essay, a machine stepped forward to do the work.

Utilizing the SCREENSHOTGUN program, the thousands of cassettes from this collection were “inserted”, “loaded”, and “played”, over the course of a month, to allow a later quick assessment of the experience of loading them. The reasoning behind this was that companies employed not just clever, but inspiring ways to distract end-users.

What follows are some of the highlights.

In most cases, the loaders use the same trick we saw with the ZX-81, but in color: a rapidly pulsating, moving set of lines showing that something is happening and data is coming in. It gives a sense of “hold on, everything is working” without requiring much machine time, swapping between colors for a few minutes before a burst of music and graphics arrives at the end as a reward.
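Why fast color changes show up as horizontal stripes rather than a flickering solid border comes down to timing: the screen is drawn one raster line at a time, and a loader that changes the border color as data arrives does so partway down the frame. The simulation below is a rough sketch under simplified assumptions (one color change per incoming bit, a fixed number of raster lines per bit, 312 lines per frame as on a PAL machine, palette colors 6 and 14, blue and light blue); it does not reproduce any particular loader.

```python
RASTER_LINES = 312  # lines per frame on a PAL C64


def border_frame(bits: list[int], lines_per_bit: float, colors: tuple[int, int] = (6, 14)) -> list[int]:
    """Return the border color seen on each raster line of one frame.

    The simulated loader sets the border to colors[bit] whenever a new bit arrives;
    whatever color is current when the beam reaches a line is what that line shows.
    """
    frame = []
    for line in range(RASTER_LINES):
        bit_index = min(int(line / lines_per_bit), len(bits) - 1)
        frame.append(colors[bits[bit_index]])
    return frame


def stripes(frame: list[int]) -> int:
    """Count bands of consecutive lines sharing one color."""
    return 1 + sum(1 for a, b in zip(frame, frame[1:]) if a != b)


busy = border_frame([0, 1] * 16, lines_per_bit=10)
print(stripes(busy), "bands in this frame")
```

A smaller `lines_per_bit` yields thinner bands, and since the arriving bits differ from frame to frame, the bands shift constantly.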

The main variations are the use of different lines or arrangement of lines to reflect this. For example, this thinner and more “high resolution”-feeling version:

This was more than sufficient for most cassette-based titles. But some wanted to go a little further to inform the user.

A significant upgrade was adding a sort of “loading screen” to the process, where the lines would still move, but around a “title card” that informed you what was “coming soon” to your computer. Some had nice graphics to accompany them, representing some attempt at recreating the printed artwork on the cassette insert.

An impressive upgrade, but a generation of tinkerers could not leave well enough alone.

To answer “are we there yet?”, some titles included calculated countdown timers telling users how long they had to wait before the game would start:
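A countdown of this kind needs little more than a known total and a steady rate. A minimal sketch, assuming the loader knows the program’s length in blocks and reads them at a constant speed; the names are hypothetical, and the example simply reuses the ZX-81 figures given earlier (16 seconds elapsed, 412 to go).

```python
def remaining_seconds(total_blocks: int, loaded_blocks: int, blocks_per_second: float) -> float:
    """Time left, assuming the rest of the tape arrives at the same steady rate."""
    return max(total_blocks - loaded_blocks, 0) / blocks_per_second


def format_countdown(seconds: float) -> str:
    """Render whole seconds as M:SS, the way an on-screen timer might."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}:{secs:02d}"


# 428 one-second "blocks" in total, 16 already read: 412 seconds to go.
print(format_countdown(remaining_seconds(428, 16, 1.0)))
```

A real loader would recompute this as each block arrives, so a slow patch of tape would show up as the timer stalling rather than the estimate changing.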

And again, this would have been a favorable solution to the “sit and wait” problem, but a handful of instances exist of the rarest of tape-loading systems: the Loading Games. These were programs booted up by the system that would then load the actual program in the background while you enjoyed an amusement, either a song or a smaller game.
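The idea behind a loading game is concurrency: one task keeps reading the tape while another keeps the player busy. On the Commodore 64 this was done with interrupt-driven loaders rather than threads; the sketch below uses a Python thread as a stand-in, with a simulated tape and a placeholder game loop, so every name and timing here is hypothetical.

```python
import threading
import time


def load_in_background(tape: bytes, memory: bytearray, delay: float = 0.001) -> None:
    """Copy the simulated tape into memory one byte at a time, pausing to mimic a slow read."""
    for i, value in enumerate(tape):
        memory[i] = value
        time.sleep(delay)


def play_while_loading(tape: bytes) -> tuple[bytearray, int]:
    """Run a placeholder game loop until the background load has finished."""
    memory = bytearray(len(tape))
    loader = threading.Thread(target=load_in_background, args=(tape, memory))
    loader.start()
    ticks = 0
    while loader.is_alive():
        ticks += 1         # one frame of the waiting game would run here
        time.sleep(0.005)  # pace the game loop
    loader.join()
    return memory, ticks


memory, ticks = play_while_loading(b"MAIN GAME DATA" * 10)
print(f"loaded {len(memory)} bytes while the mini-game ran {ticks} frames")
```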

These are harder to discern using the screenshotting system, but a few are clear:


Perhaps not surprisingly, this clear and available innovation in software loading was ignored when Namco created a “play while you load” system in the 1990s, and Namco was granted a software patent that nipped further innovation in the bud for years. The epic story of the software patent on playing a game while waiting for a game to load was covered extensively by the Electronic Frontier Foundation upon the patent’s expiration in 2015.

We are, as a whole, incredibly lucky that the state of in-browser emulation, built on open standards and persisting through most modern browsers, gives us easy insight into historical software and lets us experience or re-experience these bright lights of brilliant code. But we must also realize that a lot of what we interact with in the present day is, top to bottom, a towering pile of shortcuts and “we’ll just skip ahead”, trying to accommodate a type of user who has no time or patience for how things once were. And this new type of user, ignoring or unaware of the minutes any choice could once cost, finds it harder and harder to relate to what came before and to the environment those creations were made in.

Perhaps if you have a spare few minutes, a lull in your day, you might browse the stacks of the Ultimate Tape Archive, find a compelling cassette tape cover, look over the JPG scans of the cassette inlay and printed instructions, and “press play on tape”, feeling some moments of anticipation as another gifted person’s efforts give you a chance to experience a very complex joy emanating from a very simple machine.

Internet Archive weighs in on Artificial Intelligence at the Copyright Office

All too often, the formulation of copyright policy in the United States has been dominated by incumbent copyright industries. As Professor Jessica Litman explained in a recent Internet Archive book talk, copyright laws in the 20th century were largely “worked out by the industries that were the beneficiaries of copyright” to favor their economic interests. In these circumstances, Professor Litman has written, the Copyright Office “plays a crucial role in managing the multilateral negotiations and interpreting their results to Congress.” And at various times in history, the Office has had the opportunity to use this role to add balance to the policymaking process.

We at the Internet Archive are always pleased to see the Copyright Office invite a broad range of voices to discussions of copyright policy and to participate in such discussions ourselves. We did just that earlier this month, participating in a session at the United States Copyright Office on Copyright and Artificial Intelligence. This was the first in a series of sessions the Office will be hosting throughout the first half of 2023, as it works through its “initiative to examine the copyright law and policy issues raised by artificial intelligence (AI) technology.”

As we explained at the event, innovative machine learning and artificial intelligence technology is already helping us build our library. For example, our process for digitizing texts, including never-before-digitized government documents, has been significantly improved by the introduction of LSTM technology. And state-of-the-art AI tools have helped us improve our collection of 100-year-old 78 rpm records. Policymakers dazzled by the latest developments in consumer-facing AI should not forget that there are other uses of this general-purpose technology, many of them outside the commercial context of traditional copyright industries, which nevertheless serve the purpose of copyright: “to increase and not to impede the harvest of knowledge.”

Traditional copyright policymaking also frequently excludes or overlooks the world of open licensing. But in this new space, many of the tools come from the open source community, and much of the data comes from openly licensed sources like Wikipedia or Flickr Commons. Industry groups that claim to represent the voice of authors typically do not represent such creators, and their proposed solutions (usually, demands that payment be made to corporate publishers or to collective rights management organizations) often don’t benefit these creators and are inconsistent with the thinking of the open world.

Moreover, even aside from openly licensed material, there are vast troves of technically copyrighted but not actively rights-managed content on the open web; these are also used to train AI models. Millions, if not billions, of individuals have contributed to these data sources, and because none of them are required to register their work for copyright to arise, it does not seem possible or sensible to try to identify all of the relevant copyright owners, let alone negotiate with each of them, before development can continue. Recognizing these and a variety of other concerns, the European Union has already codified copyright exceptions that permit the use of copyright-protected material as training data for generative AI models, subject to an opt-out in commercial situations and potential new transparency obligations.

To be sure, there are legitimate concerns over how generative AI could impact creative workers and cause other kinds of harm. But it is important for copyright policymakers to recognize that artificial intelligence technology has the potential to promote the progress of science and the useful arts on a tremendous scale. It is both sensible and lawful as a matter of US copyright law to let the robots read. Let’s make sure that the process described by Professor Litman does not get in the way of building AI tools that work for everyone.

National Library Week 2023: Brenton, user experience

To celebrate National Library Week 2023, we are introducing readers to four staff members who work behind the scenes at the Internet Archive, helping connect patrons with our collections, services and programs.

Brenton Cheng learned to program in BASIC on an Apple II Plus at age 9. His mother was one of the earliest computer programmers, and his dad was a marketing consultant for technology products in Portola Valley, California. By age 12, Cheng had written a series of animated games that he put together in a hand-assembled software package. It sold about four copies.

Now, Cheng is a senior engineer at the Internet Archive, where he leads the user experience (UX) team. “Our goal is to give our patrons a great experience on the Archive.org website while making sure that under the hood, our technologies are as simple, robust and maintainable as possible,” said Cheng, who has been at the organization for seven years.

Despite his early computer exposure, Cheng wanted to study something more tangible in college. He pursued mechanical engineering and earned a bachelor’s degree from Princeton University and a master’s from Stanford University. Along the way, he developed a love of contemporary dance and improvisation. Inspired by the creativity of movement, he veered toward biomechanical engineering in graduate school. 

Entering the job market, Cheng said he wanted a flexible schedule so he would be able to take workshops and occasionally go on tour with dance companies. He was a freelance computer programmer for about a decade, then worked at Astrology.com and NBCUniversal for another 10 years. 

In 2016, Cheng said he was drawn to the Internet Archive by its mission, reputation and people. “Being in the dance world, I was constantly surrounded with all kinds of eclectic, eccentric, fascinating, brilliant people,” he said. “There were certain common elements in the way the Archive embraces and benefits from diversity. I found many artists and engineers working in novel ways. That felt very much at home.”

From his experience working with improvisation in dance, Cheng said he loves trying to create the conditions within which people contribute their best work and feel good about what they’re doing. His team is focused on fighting for users and constantly making the website better for the public. “I also serve the digital librarians who are collecting and providing content for our patrons,” Cheng said. “I am giving them the tools, platform and environment to do their magic.” 

Tell us something about your role at the Internet Archive that most people wouldn’t know about.
Alongside supporting the Archive’s mission and helping our patrons, I am always holding in the back of my mind the subtext of “small team, long term.” These ideas guide choices around process, technologies and architecture. We regularly discard choices that would entail too much complexity or require too much ongoing, hands-on maintenance. And we try to resist rushing features out the door that will only add to our technical debt later.

What is the most interesting project you’ve worked on at the Internet Archive?
I set up a wiki to allow scholars to submit transcriptions of scanned Balinese palm leaves.

What has been your greatest achievement (so far) at the Internet Archive?
Creating a team that likes working together, is resilient through conflicts and pushes each other to keep getting better.

What are you reading?
The Sense of Style by Steven Pinker. It’s a contemporary writing style manual that incorporates cognitive science and linguistics and acknowledges the evolving nature of language.

National Library Week 2023: Caitlin, events

To celebrate National Library Week 2023, we are introducing readers to four staff members who work behind the scenes at the Internet Archive, helping connect patrons with our collections, services and programs.

If there’s an event at the Internet Archive, there’s a good chance Caitlin Olson had her hand in it. And with about 80 events last year, including 40 in-person, that keeps her plenty busy. 

“I’m a helper by nature and my role involves wearing a lot of hats,” said Olson, who has been senior executive assistant for seven years. 

While not a librarian by training, Olson said she enjoys supporting librarians and their work. Olson provides support for webinars online and parties at the Internet Archive’s headquarters in San Francisco. She also assists the Internet Archive’s founder Brewster Kahle in his work, helps staff with IT issues (including migrating to remote work during the pandemic), and pinch hits when needed. 

“I’m the go-to person for most questions because if I don’t know the answer, I likely know who will,” said Olson, who prefers working behind the scenes and is known as a fixer who keeps a calm head. “Brewster says I help soothe the organization. I often can jump in and solve a problem.”

After graduating from high school in a small town in Northern California, Olson said she gravitated to the Bay Area for college, so she has both the “country mouse” and “city mouse” experience. After a stint in journalism, she was drawn to the Internet Archive. “I wanted to work for a place where people felt passionate about what they were doing—and I found that here,” Olson said.

What’s an aspect of your job that you especially like?
I work with our ceramicist who creates all of our statues for the Archive. Fun Fact: after you work here for three years, you get a statue made in your likeness (if you want).

What is the most interesting project you’ve worked on at the Internet Archive? 
Our annual Public Domain Day events and the book talks we host in collaboration with The Booksmith

Favorite collection at the Internet Archive?
The Attention K-Mart Shoppers collection 

What are you reading?
Fuzz: When Nature Breaks the Law by Mary Roach, which is about what happens when animals commit crimes, and From Here to Eternity: Traveling the World to Find the Good Death by Caitlin Doughty, which is a book that explores death-care in different cultures and it’s written by a badass mortician. 

Book Talk: The Apple II Age

Join author Laine Nooney for an IN-PERSON reading from their new book, followed by a conversation with historian Finn Brunton.

REGISTER NOW

“The Apple II Age is a joy to read and an extraordinary achievement in computer history. A rigorous thinker and a bright and witty writer, Nooney offers a compelling account of the initial attempts to make computers inviting to the public. The Apple II Age, like the old microcomputer itself, is bound to intrigue both experts and newcomers to the subject.” ―JOANNE MCNEIL, author of ‘Lurking: How a Person Became a User’

Join us for an engrossing origin story for the personal computer, showing how the Apple II’s software helped transform a machine from a hobbyists’ plaything into an essential home appliance.

6:00 PM — Reception
6:30 PM — Book Talk: The Apple II Age
7:30 PM — Book Signing 

Please note that this event will be held in person at the Internet Archive.


If you want to understand how Apple Inc. became an industry behemoth, look no further than the 1977 Apple II. It was a versatile piece of hardware, but its most compelling story isn’t found in the feat of its engineering, the personalities of Apple’s founders, or the way it set the stage for the company’s multibillion-dollar future. Instead, historian Laine Nooney shows, what made the Apple II iconic was its software. The story of personal computing in the United States is not about the evolution of hackers—it’s about the rise of everyday users.

Recounting a constellation of software creation stories, Nooney offers a new understanding of how the hobbyists’ microcomputers of the 1970s became the personal computer we know today. From iconic software products like VisiCalc and The Print Shop to historic games like Mystery House and Snooper Troops to long-forgotten disk-cracking utilities, The Apple II Age offers an unprecedented look at the people, the industry, and the money that built the microcomputing milieu—and why so much of it converged around the pioneering Apple II.

Laine Nooney is assistant professor of media and information industries at New York University. Their research has been featured by outlets such as The Atlantic, Motherboard, and NPR. They live in New York City, where their hobbies include motorcycles, tugboats, and Texas hold ’em.

Book Talk: The Apple II Age
May 11 @ 6pm
IN-PERSON @ 300 Funston Ave., San Francisco
Register now for the free, in-person event

National Library Week 2023: Liz, donations

To celebrate National Library Week 2023, we are introducing readers to four staff members who work behind the scenes at the Internet Archive, helping connect patrons with our collections, services and programs.

Liz Rosenberg first worked with the Internet Archive in the early days of the Great 78 Project. She helped design the digitization workflow for 78rpm records and estimates that she transferred 30,000 record sides herself.

The self-described “record lady,” Rosenberg said the project was the perfect entrée to the organization. She graduated from Drexel University with a degree in music industry technology, with a specialty in audio recording and production.

In 2020, Rosenberg was officially hired by the Internet Archive in patron services and later asked to lead the organization’s physical donation program. She continues with the Great 78 Project, overseeing monthly uploads, resolving metadata issues and coordinating digitization of donated collections with partners at George Blood LP.

“The Internet Archive is a place that I had always dreamed of working,” Rosenberg said. “I really looked up to the mission of the Internet Archive, so when the opportunity came up to work for them directly, I couldn’t have said yes faster.”

As donations manager, Rosenberg receives inquiries from individuals and librarians about donating their physical media to the Internet Archive for preservation and digitization, from single items to collections of millions of objects. She has overseen the donations of small folk music collections, individual collectors’ passion projects, and college libraries including Bowling Green State University and the University of Hawaii. 

The individual collector contributions often are triggered by the death of a loved one. “Those tend to be sensitive situations for families,” she said. “But they are grateful to almost be able to spend time with them through the preservation of their collection and be able to go and visit whenever they want. That’s very special.”

Rosenberg keeps a “warm and fuzzy thank-you file” of donor messages on her computer, which she said keeps her motivated to encourage others to share their collections. One such message:

Dear Liz,

You are amazing! Thank you for your kind guidance and generous ways. Seeing the dedication today has brought a difficult and costly task of storing these books over such a long period of time to this heartfelt moment and for such a worthy cause. I am in the middle of grading portfolios and preparing for a solo art exhibition so, as usual, I need to juggle the books in between. I will be in touch soon but, again, I just wanted to let you know how wonderful you and your organization are 🙂

in kindest regard, Karen

What is the most rewarding part of your job?
For me, it’s really about preserving stories. I feel such a connection to donors that I work with when I get to hear the story of how a collection was created. We want to preserve those stories alongside the media itself. And that’s really such a privilege.

What has been your greatest achievement (so far) at the Internet Archive?
Presenting on behalf of the Internet Archive at the 2022 Association for Recorded Sound Collections Conference. A recording of the presentation, as given to Internet Archive staff shortly after the conference, is available on the Internet Archive.

What’s your favorite item at the Internet Archive?
This transcription recording of a child playing accordion: https://archive.org/details/78_four-leaf-clover_sonny-walikis-and-his-squeeze-box_gbia0001730a. We transferred this record without knowing who the performer was or anything about their history. The family of Sonny Walikis actually found the recording in our collection shortly after their family member had passed away and reached out to tell us the history of the recordings. I always think of this record as the best example of why we preserve media – to connect people to lost stories and help memories live on.

What’s your favorite collection at the Internet Archive?
The 78rpm record collection! archive.org/details/georgeblood

What are you reading?
The Tower of Swallows by Andrzej Sapkowski

What is your secret talent?
Morphing into a children’s choir! I was a recording studio intern and we had children booked to sing the part but they got too distracted in the booth. So I sang all of the parts slowed down 10% and we sped them up to make me sound “child-like”. The results are one of my only vocal credits: https://www.youtube.com/watch?v=WlKhVhuTiik.

AI Audio Challenge: Audio Restoration of 78rpm Records based on Expert Examples

http://great78.archive.org/

We hope we have a dataset primed for AI researchers to do something really useful and fun: taking the noise out of digitized 78rpm records.

The Internet Archive has 1,600 examples of quality human restorations of 78rpm records, where the best tools were used to “lightly restore” the audio files. This removes scratchy surface noise while trying not to impair the music or speech. Those items also include the unrestored original files that were used.

But the Internet Archive also has over 400,000 unrestored files that are quite scratchy and difficult to listen to.

The goal, or rather the hope, is a program that can take all or many of the 400,000 unrestored records and make them sound much better. How hard this is remains unknown, but hopefully it is a fun project to work on.
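To give a feel for the simplest kind of automated cleanup, here is a minimal sketch of a classic click-removal baseline. It is not the method used for the human restorations: each sample that deviates sharply from the median of its neighbors is treated as a click and replaced by that median. It assumes mono audio already decoded into a list of floats between -1.0 and 1.0; the function name remove_clicks and the window and threshold defaults are hypothetical choices for illustration. Continuous hiss and crackle would need more sophisticated methods, which is where the 1,600 paired examples could help.

```python
from statistics import median


def remove_clicks(samples, window=5, threshold=0.3):
    """Replace isolated impulses (clicks) with the local median.

    Hypothetical baseline, not the Internet Archive's restoration process.
    samples:   list of floats in [-1.0, 1.0] (assumed mono PCM input)
    window:    odd number of samples in the sliding median window
    threshold: deviation from the local median that counts as a click
    """
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd integer >= 3")
    half = window // 2
    n = len(samples)
    out = list(samples)  # copy so medians are always taken from the original
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        local = median(samples[lo:hi])
        if abs(samples[i] - local) > threshold:
            out[i] = local  # the spike is treated as a click and smoothed over
    return out


if __name__ == "__main__":
    noisy = [0.0, 0.1, 0.2, 0.9, 0.2, 0.1, 0.0]  # 0.9 is a simulated click
    print(remove_clicks(noisy))  # -> [0.0, 0.1, 0.2, 0.2, 0.2, 0.1, 0.0]
```

A filter like this only handles sharp, isolated spikes and can dull genuine transients in the music; comparing its output against the human-restored versions of the same items would show how far a learned model needs to go beyond it.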

Many of the recordings are great and worth the effort. Please comment on this post if you are interested in diving in.