A developer came to me a week ago with a project they’d been working on for over a year. What they offered, and what it would mean for historical software at the Internet Archive, was so compelling that within 48 hours, we’d announced it to the world.
More than a fascinating site, though, it represents some philosophies regarding the Archive’s stacks that are worth exploring as well.
The first thing that strikes a visitor to the site is either how strange, or how nostalgic it looks. The site is strikingly simple and references the first few years of the world wide web, when backgrounds were grey by default, and the width of the screen was almost always under 640 pixels. Same with the link colors, and use of (to the modern era) small icons next to the words and links. This is a version of the world wide web long gone.
However, underneath this simple exterior beats the heart of a powerful search engine and an astounding amount of processing that has analyzed millions of files to make them easy to interact with. If your area of research or interest is vintage/historical software, you’ve been handed a top-class tool to discover long-lost files and bring them back instantly.
A Quick Reminder about CD-ROMs
From (very roughly) 1989 through to the early 2000s, CD-ROMs (and later DVD-ROMs) were one of the primary ways to transfer heaps of software or large-sized programs to end users. Instead of spending hours or literal days transferring software you may or may not have wanted after you received it, you could go to stores or on-line and purchase a plastic disc that contained between 600 and 700 megabytes of information.
The potential of this, in fact, was so strong, that there was an entire industry of providing databases, news summaries, and even all-digital magazines using this format. Booklets of CD-ROMs became resplendent, and libraries could allow patrons to check out these discs to do research with them.
Besides these more institutional compilations, an industry rose up of companies compiling software, artwork, music and more and selling them to end users. Companies with names like Walnut Creek, Wayzata, Valusoft, and Imagemagic would have catalogs of CD-ROMs to buy. Starting out with software from bulletin board systems and gathered from FTP sites, these CD-ROMs quickly ran out of easy-to-find material to fill, and an era of “shovelware” began, allowing these products to claim “thousands of files, gigabytes of materials” while pulling from more and more out-of-date sources.
As websites, torrents and other means of transport brought the era of physical media for software to a close, the world was left with a finite, contained pile of titles that had come out on CDs. And, as luck would have it, people have been uploading those out-of-date files to the Internet Archive for years.
The Final Piece
Therefore, sitting on the Archive are tens of thousands of these CD-ROMs of the past. And for a very long time, it’s been possible to download a disc image, analyze its contents, search for useful or potentially interesting items, and then find a way to make them work again.
That last piece, in fact, is the hardest – not just knowing where the files you’re looking for are located, but being able to browse them without a massive host of helper applications scattered to the four winds. There are dozens of archive types, dozens and maybe hundreds of multimedia formats, and, even more frustrating, archives within archives – making everything that much harder to find.
DiscMaster has fixed this.
Within the search engine is the ability to find millions of files, categorized by type or size or date or extension, and then be presented with them instantly. Three decades of computer software with layers upon layers of obfuscation are brought immediately to the top.
The developer wrote applications to grind through the contents of a CD-ROM and present them with previews that wouldn’t require anything but a browser to see. This can take hours to pull out of a single CD-ROM, but the results are breathtaking.
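For the curious, here is a minimal sketch of what that kind of grinding might look like. It is not DiscMaster’s actual code, and it makes several assumptions: the disc image has already been extracted to a directory, only ZIP archives are opened (the real thing would need to handle dozens of archive types), and the names `Entry`, `catalog`, and `search` are hypothetical. It walks the files, steps into archives within archives, and records the extension, size, and date of everything it finds so the results can be filtered.

```python
from __future__ import annotations

import io
import zipfile
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path, PurePosixPath


@dataclass
class Entry:
    path: str                  # "!" marks a step into an archive
    extension: str             # lowercased, without the dot
    size: int                  # uncompressed size in bytes
    modified: datetime | None  # None if the stored date is invalid


def _zip_time(info: zipfile.ZipInfo) -> datetime | None:
    """Convert a ZIP member's timestamp, tolerating garbage dates."""
    try:
        return datetime(*info.date_time)
    except ValueError:
        return None


def _scan_zip(data: bytes, prefix: str, out: list[Entry]) -> None:
    """Record every member of a ZIP, recursing into ZIPs nested inside it."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            name = f"{prefix}!{info.filename}"
            ext = PurePosixPath(info.filename).suffix.lower().lstrip(".")
            out.append(Entry(name, ext, info.file_size, _zip_time(info)))
            if ext == "zip":
                try:
                    _scan_zip(zf.read(info), name, out)
                except zipfile.BadZipFile:
                    pass  # mislabeled or damaged: keep the entry, skip contents


def catalog(root: Path) -> list[Entry]:
    """Walk an extracted disc directory and list every file, archives included."""
    out: list[Entry] = []
    for p in sorted(root.rglob("*")):
        if not p.is_file():
            continue
        st = p.stat()
        ext = p.suffix.lower().lstrip(".")
        rel = p.relative_to(root).as_posix()
        out.append(Entry(rel, ext, st.st_size, datetime.fromtimestamp(st.st_mtime)))
        if ext == "zip":
            try:
                _scan_zip(p.read_bytes(), rel, out)
            except zipfile.BadZipFile:
                pass
    return out


def search(entries: list[Entry], extension: str | None = None,
           min_size: int = 0) -> list[Entry]:
    """Filter catalogued entries by extension and minimum size."""
    return [e for e in entries
            if (extension is None or e.extension == extension)
            and e.size >= min_size]
```

Calling `catalog(Path("some_extracted_disc"))` and then `search(entries, extension="mod")` would list every `.mod` file on that disc, including ones buried inside nested ZIPs. Generating the in-browser previews is a far bigger job and is not attempted here.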
Audio and music files play in the browser. Flash, IFF, Bitmaps, Fonts and more display in preview. Macintosh, PC, Commodore, Atari and more are presented simply, without a mandate to track down the proper utility to figure out what they are.
In other words, vintage and historical software is back from the obfuscated darkness.
In the short time that Discmaster has been online, success stories are appearing. Authors are finding shareware programs they lost track of decades ago. Original versions of software that were thought impossible to track down just pop up in the search engine. And organizations dedicated to creating catalogs of now-dormant formats are suddenly handed a thousands-of-items to-do list on a silver platter.
The Philosophy of the Support Site
The ramifications and discoveries from Discmaster are going to be coming for a very long time – even if a researcher has only a faint memory of something they’re looking for, the search results will guide them in the right direction faster than ever before.
But beyond that, this site shows a different approach to the Internet Archive’s materials that’s worth seeing more of.
With over 100 petabytes of data, representing a mass of materials with all sorts of containers, metadata, and approaches by contributors, the Internet Archive has to be as general as possible. This generality extends to the presentation, search engine, and storage of the items.
It is a major effort to ensure the data stays secure, the metadata is searchable, and the ability to upload nearly anything results in a usable item details page.
But that’s kind of where it has to stop.
It’s asking an awful lot to both maintain an entity like this, and also design, say, a specifically-geared site for a relatively smaller set of people and needs. It can be done, but when energy and funding are limited, it’s sometimes best to stick to basics.
Perhaps seeing Discmaster in action will encourage others to interact with the Internet Archive as a pool, a container of resources that could receive some of the powerful analysis along specific lines. If they can then be fed back to the Archive at the end, even better; but let a hundred supporting sites bloom.
Meanwhile, enjoy the history of software – it just got a lot easier to find.
A Small Addendum Regarding Emulation
After this announcement came out, a not-insignificant number of people came forward to ask some form of:
You’re the Emulation In The Browser People – will DISCMASTER allow you to emulate the programs that are found in these floppies and CD-ROMS?
The short answer is no, there are no current plans to do emulated previews.
The longer answer is that the wonderful emulation in the browser that the Internet Archive has covers over the amount of work that needs to be done in selecting, refining, and in some cases modifying original programs to make them work. If a program requires all of Windows 3.1 installed, for example, someone went through the process of determining that, configuring the item to know to load Windows 3.1, and then added custom settings in the item to ensure it would all boot up correctly. Often this work can be automated to a degree, but the time involved is considerable.
Multiply these issues by the dozens of platforms that are emulated, and you can see why it would be more trouble than it would be worth. Additionally, some programs just don’t make sense to be emulated: running a printer utility “in the browser” will probably just show a prompt and nothing else, as it is loaded in the background. Many, many programs of the past don’t make sense without additional context.
A much more likely scenario will be DISCMASTER revealing long-lost vintage software that is so interesting and/or fun that it will get uploaded to Internet Archive separately and those configurations done to allow it to be played in the browser.
If you find interesting items among DISCMASTER’s millions, feel free to contact me, Jason Scott, or take a shot at uploading the program yourself and doing the configurations.