Category Archives: Software Archive

archive.org download counts of collections of items updates and fixes

Every month, we look over the total download counts for all public items at archive.org.  We sum item counts into their collections.  At year end 2014, we found various source reliability issues, as well as overcounting for “top collections” and many other issues.

archive.org public items tracked over time


To address the problems, we:

  • Built a new system that uses our database (DB) for item download counts, instead of our less reliable (and more prone to “drift”) SOLR search engine (SE).
  • Changed monthly saved data from JSON and PHP serialized flatfiles to a new DB table, which is much easier to use now!
  • Fixed overcounting issues for collections: texts, audio, etree, movies
  • Fixed various overcounting issues related to not unique-ing <collection> and <contributor> tags (more below)
  • Fixed character encoding issues on <contributor> tags
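
As a rough illustration of summing item counts into collections while unique-ing the tags, here is a minimal Python sketch. The item records, field names, and the `sum_into_collections` helper are hypothetical stand-ins, not our actual schema or code.

```python
from collections import defaultdict

# Hypothetical item records: identifier, download count, and collection tags.
# The repeated "texts" tag on item-2 mimics the kind of nonideal metadata
# that caused overcounting before tags were uniqued.
items = [
    {"identifier": "item-1", "downloads": 10, "collections": ["texts", "americana"]},
    {"identifier": "item-2", "downloads": 5, "collections": ["texts", "texts"]},
    {"identifier": "item-3", "downloads": 7, "collections": ["audio"]},
]


def sum_into_collections(items):
    """Add each item's count once to every distinct collection it lists."""
    totals = defaultdict(int)
    for item in items:
        # set() uniques the tags, so a repeated tag is only counted once.
        for collection in set(item["collections"]):
            totals[collection] += item["downloads"]
    return dict(totals)


totals = sum_into_collections(items)
print(totals)
```

Without the `set()` call, item-2 would contribute its 5 downloads to "texts" twice.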

Bonus points!

  • We now track *all collections*.  Previously, we only tracked items tagged:
    • <mediatype> texts
    • <mediatype> etree
    • <mediatype> audio
    • <mediatype> movies
  • For items whose <contributor> tags we track (texts items), we now have a “Contributor page” that shows a table of historical data.
  • Graphs are now “responsive” (scale in width based on browser/mobile width)

 

The Overcount Issue for top collection/mediatypes

  • In the below graph, mediatypes and collections are shown horizontally, with a sample “collection hierarchy” today.
  • For each collection/mediatype, we show one example item (A, B, C, and D), with a downloads/streams/views count next to it in parentheses. So these are four items, spanning four collections, that happen to be in a collection hierarchy (a single item can belong to multiple collections at archive.org).
  • The Old Way had a critical flaw: it summed all sub-collection counts, when really it should have summed only the *direct child* sub-collection counts (or gone with our New Way instead).

overcount

So we now treat <mediatype> tags like <collection> tags in terms of counting, and we unique all <collection> tags. That protects against items with minor nonideal data tags and avoids another kind of overcounting.
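
To make the overcount concrete, here is a small Python sketch of one way to read the Old Way versus the alternatives. The four-level hierarchy, the collection names, and the download numbers are invented for illustration; only the idea (all descendants versus direct children versus counting each item once) comes from the description above.

```python
# Hypothetical hierarchy, top to bottom, with one directly attached item each.
children = {"top": ["mid"], "mid": ["low"], "low": ["leaf"], "leaf": []}
direct_items = {
    "top": {"A": 1},
    "mid": {"B": 10},
    "low": {"C": 100},
    "leaf": {"D": 1000},
}


def descendants(collection):
    """Every sub-collection below `collection`, at any depth."""
    found = []
    for child in children[collection]:
        found.append(child)
        found.extend(descendants(child))
    return found


def old_way(collection):
    """Flawed: adds the already rolled-up count of EVERY descendant."""
    own = sum(direct_items[collection].values())
    return own + sum(old_way(d) for d in descendants(collection))


def direct_children_way(collection):
    """Adds the rolled-up count of direct children only."""
    own = sum(direct_items[collection].values())
    return own + sum(direct_children_way(c) for c in children[collection])


def new_way(collection):
    """Counts each member item exactly once, keyed by item identifier."""
    members = {}
    for c in [collection] + descendants(collection):
        members.update(direct_items[c])
    return sum(members.values())


print(old_way("top"), direct_children_way("top"), new_way("top"))
```

In this toy data the true total for "top" is 1 + 10 + 100 + 1000 = 1111. The Old Way reports 4211 because item D's downloads are rolled up through several levels and then added again at each one.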

 

… and one more update from Feb/1:

We graph the difference between the absolute download counts for the current month and the prior month, for every month we have data for. This gives us graphs that show downloads/month over time. However, values can easily go *negative* in various scenarios (which is *wickedly* confusing to our poor users!)

Here’s that situation:

A collection has a really *hot* item one month, racking up downloads in that collection. The next month, a DMCA takedown or some other event removes the item from being available (and thus from being counted in the future). The downloads for that collection can plummet in the next month’s run, when the counts are summed over public items for that collection again. So that collection would have a negative (net) downloads count change for this next month!

Here’s our fix:

Use the current month’s collection “item membership” list for both the current month *and* the prior month. Sum the counts for all those items in each month, and graph the difference between the two sums. In just about every situation that remains, graphed monthly download counts will be nonnegative, because the underlying totals are monotonic (increasing or flat).
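
The fix can be sketched in a few lines of Python. The item names, the counts, and the `monthly_change` helper are hypothetical; treating an item with no prior-month record as having a prior count of zero is also an assumption of this sketch, not something stated above.

```python
def monthly_change(current_members, current_counts, prior_counts):
    """Downloads gained this month, using THIS month's membership for both months."""
    current_total = sum(current_counts.get(item, 0) for item in current_members)
    prior_total = sum(prior_counts.get(item, 0) for item in current_members)
    return current_total - prior_total


# Cumulative download counts per item, as saved at each monthly run.
prior_counts = {"hot-item": 5000, "steady-item": 100}
current_counts = {"steady-item": 130}  # hot-item was taken down this month
current_members = {"steady-item"}      # this month's "item membership" list

# Naive approach: each month summed over whatever was public at the time.
naive = sum(current_counts.values()) - sum(prior_counts.values())

# Fixed approach: the same membership list applied to both months.
fixed = monthly_change(current_members, current_counts, prior_counts)

print(naive, fixed)
```

Here the naive difference is 130 - 5100 = -4970, while the fixed difference is 130 - 100 = 30, which is the number of downloads the collection's current items actually gained.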

 

 

Mirroring the Stone Oakvalley Music Collection

soamc_logo

The Internet Archive has begun mirroring a fantastic collection of music called the “Stone Oakvalley Music Collection”. When you visit one of their websites, the archive.org mirror is one of the choices for download. Going forward, the Archive will offer a full backup of the entire site (over a terabyte) for permanent storage.

Why the Stone Oakvalley Collection is important

Manufactured from the early 1980s to the mid 1990s, the Commodore 64 computer was a revolutionary piece of hardware and a critical introduction to programming for generations. It also had, within its design, a very well-regarded sound chip: the 6581/8580 SID (Sound Interface Device), whose unique properties in wave generation and effects gave a special sound in the hands of the right developers and musicians.

MOS_Technologies_6581

 

This successful piece of hardware was manufactured in the millions across the life of the C64. In the mid-1980s, the introduction of the Commodore Amiga computer brought to life an improved chipset for generating sound: the 8364, or PAULA. With a range of improvements to what sounds and music could come out of this chip, the Amiga soared with capabilities that took years to match in other machines.

The Archive hosts many examples of music generated by these chips: our C64 Games Archive has hundreds of videos of games played on a Commodore 64, and searching for terms like “Amiga Music”, “Chiptunes” and “C64 Music” will yield a good amount of sound to enjoy.

But nothing comes close to the Stone Oakvalley Collection in terms of breadth, dedication, and craft in ensuring the unique sound of these chips can be enjoyed in the future.

setup01

The process, which is documented here, involved setting up a large amount of Commodore hardware connected to servers which would reboot the machines, over and over, playing thousands of pieces of music in different configurations, and automatically cataloging and saving the resulting waveforms. Considerations for modifications of the chipset over the years, of stereo versus mono recordings, and verification of the resulting 400,000 files have provided the highest quality of snapshots of this period.

Browsing the Collection

Currently, there are two websites for Stone Oakvalley’s collection – one based around the C64, and the other based around the Amiga.  Impeccable work has been done to catalog the music, so if there are songs or games you remember, they are likely to be saved on the site (and powered from Archive.org’s servers). Otherwise, browse the stacks of the sites and enjoy a soundscape of computer history.

The Internet Archive strives to provide universal access to the world’s knowledge. Through mirroring, hosting and gathering of data, our mission allows millions to gain ad-free, fast access to information and materials. Be sure to check our many collections on our main site.

Inviting the Internet Over to Play

At our Annual Event last week, the Archive announced a variety of new projects and plans, including our new beta interface, our compact book scanner, and our progress in tracking political ads on television. The event (full video is here) went very well, with lots of activities and social gathering before and afterwards, and included the first public unveiling of our newest project, the Internet Arcade.

Photo by Kyle Way


It was obvious we were on to something – the smallish room with the two stations set up to play emulated arcade games from the collection was constantly packed. Players young and old tried out classic video games, including parents showing their children games they’d played in their own teenage years. All of it was running off the Archive’s own web pages through standard web browsers, with no special plug-ins – and it held up well. We even tracked high scores.

The party, of course, was just the beginning – over the weekend, we quietly announced that the Internet Arcade was available through the main site. With over 900 arcade machines in the collection, most every major machine released between 1976 and 1988 was included. (The emulation system we use, JSMESS, is a Javascript port of a long-running emulation project called MESS/MAME, which has had hundreds of contributors over the years – we salute them.)

After an initial tweet or two, the Arcade’s existence went from a mention by Waxy and Laughing Squid, to sites like Hacker News and Mashable, and from there it hit larger and larger audiences. Within a few hours news had spread to a whole range of sites, including Joystiq, The Verge, Engadget, CNN, PC World, Gizmodo, Ars Technica… and, well, let’s just say a very large number of sites were reporting on this story.

And that’s when the world showed up.

We’re still counting, but we know hundreds of thousands of people came, many of them all at once, to play.

And as these thousands of curious visitors and first-time callers came to the Archive to try out our collection, minor inefficiencies became showstoppers and the site was temporarily crushed. Our brave administration team persevered, repairs were made, and the site settled in for the new reality:

That's a lot of new visitors!

Everything’s fine and normal… then we crash and fix things… and WOW that’s a lot of new visitors!

This crush of new visitors is coming to the Internet Archive, possibly for the first time ever, and we welcome them with open arms. After all, that’s what we were founded for – our stated purpose is to function as the Internet’s Library, with stored websites, digitized texts, music, movies and software. It’s our mission as a non-profit library: make as much of culture and information available to as many people as possible. You can lose a workday or a whole winter in our virtual stacks, and our users often do.

Meanwhile, the story continues to have legs, appearing in newspapers, on radio shows, video podcasts, and message boards around the world.

And then we made it to TV news:

B1hYwH-CQAAb5_f

So now that we have (apparently) the world’s attention… ahem ahem..

Even we don’t know where this story is going to lead. But one thing is sure – video games and software are as important a part of history and culture as books, movies and music have been in the past.  And we’re dedicated to bringing all of this to you, the Internet. Sure, it can be a bit surprising when the entire internet comes over to play, but we wouldn’t have put out the welcome mat if we didn’t want you to visit.

As a non-profit, we depend heavily on user donations to stay afloat – we even take Bitcoin and subscriptions. Keeping 20 petabytes of information flowing, fast and free, is what we’re working on day and night and the positive messages and feedback we’ve gotten this past week (and over the years) tell us we’re doing the right thing.

The JSMESS emulation project is one of many open-source projects the Internet Archive is involved with, and while a lot of it is fun and games, we’ve got a serious side too, gathering up disappearing web resources and important historical events into our archives to preserve for future generations. We hope that after you relive your childhood or live out a second new one, you’ll stick around and see what else we have here. It’s quite a place.

Game on!

Free the Screenshots!

As the Archive moves more widely into the archiving of software, it quickly becomes apparent that there’s going to be an awful lot of programs online without much indication of what they are. With many thousands of programs or program collections to choose from, determining what might be inside becomes a pretty involved task.

In the case of movies, images and texts, there are previews that help show what is contained in the files in a given item. These are extremely helpful, as they not only show the quality or style of the works, but give all sorts of information that might not be reflected in the metadata.

Starting now, the same will be true for many types of software.

screenshot_01

The Atari 800 graphical masterpiece Astro Chase.

Using a combination of the JSMESS emulator and screen capturing software, the Archive has begun automatic “playing out” of sets of programs, snagging shots of what the software does, and then providing them as a guidepost of what is to come with that program.

For example, work has just been completed on the playable Sega Genesis Library, where the directory view of the items in the collection shows helpful screenshots, and individual games show animated playthroughs of the beginning of the cartridge.

The process is still evolving – currently it requires real-time capture (that is, capturing the first five minutes of a program takes an actual five minutes), but with multiple machines moving through collections, screenshots will be available for huge numbers of programs in coming weeks and months.

Along with the obvious graphical prettiness comes an even greater cultural benefit: the freeing of screenshots.

As these shots have often been made or gathered by hand, there has arisen a tendency to put watermarks or credits on the images to indicate who did the work. While it’s an understandable urge to want some kudos for the effort, it has meant that the very work being lauded (the graphics of the program) was being vandalized to ensure credit where credit was due.

None of the screenshots we are generating will have watermarks, and all of them can be used freely for other purposes as you see fit.

To celebrate this, we’ve created a compilation of all the Sega Genesis screenshots generated by the project so far. The compilation is here. Be warned – it’s 4.3 gigabytes of 16,900 screenshots of 573 cartridges! (There’s a way to browse it at this link.)

Many screenshots are simply informative, but many more are truly works of art, as artists and programmers strained the edges of these underpowered machines to create the most evocative images possible. With this screenshotting effort underway, that work will hopefully get a new life and respect on the web.

Free the Screenshots!


The Internet Archive Declares Spacewar!

As with everything else in history, debate rages about when the “first” video game came into being. Games and demonstrations such as “Tennis for Two” (1958), “NIM” (1951) and “Mouse in the Maze” (1959) were played on million dollar equipment for the amusement and experimentation of limited audiences.

One contender in this group is “Space War!”, a 1962 collaboration of multiple students at the Massachusetts Institute of Technology. Playing off the cathode-ray tube of a Digital Equipment PDP-1 (of which fewer than 60 were sold), this two-player space-battle game has been lauded as a major advancement in computer gaming for over 50 years.

Now, it’s possible to play it at the Internet Archive.

As part of our larger Historical Software collection, there is now an entry for Space War!

This entry covers the historical context of Space War! and gives instructions for working with our in-browser emulator. The system doesn’t require installed plugins (although a more powerful machine and a recent browser version are suggested).

The JSMESS emulator (a conversion of the larger MESS project) also contains a real-time portrayal of the lights and switches of a Digital PDP-1, as well as links to documentation and manuals for this $800,000 (2014 dollars) minicomputer.

You’re going to need a friend to play – the game requires two human players on the same keyboard. And don’t worry, everyone gets sucked into the star in the center the first few times. You’ve got to have your orbital dynamics down before you’re truly ready to be a space warrior.

With over a half-century of history behind it, Space War! still holds up as a great example of what would become a dominant form of media in the decades since – the space video game.

The Internet Archive continues to add more historical software frequently – bringing the computing past to the computer future. Stay tuned!

Magazine in movie “WarGames” is discovered using an Internet Archive Collection

An intrepid researcher wanted to figure out what magazine was used in the movie WarGames and, using the Internet Archive collection, found it was Creative Computing (which was a key magazine for me in the ’70s, when I sold personal computers during the pre-Apple ][, kit days).

Reading the gory details of this hunt is fun.  http://mw.rat.bz/wgmag/

 

Microcomputer Software Lives Again, This Time in Your Browser

The miracle is now so commonplace that it’s invisible: we have the ability to watch video, listen to music, and read documents right in our browsers. If you get a hankering to hear some old time radio, watch classic television programs, or maybe read up on some classic children’s books, you’re just a couple of clicks away from having them right there, in front of you. Not so with classic software. To learn and experience older programs, you have to track down the hardware and media to run them, or download and install a separate emulator program, outside of the browser, and then acquire and install cartridge or floppy images to boot in it. Unlike films or video or audio, software has been a slower, more involved thing to experience.

Until now.

logo

JSMESS is a Javascript port of the MESS emulator, a mature and breathtakingly flexible computer and console emulator that has been in development for over a decade and a half by hundreds of volunteers. The MESS emulator runs on a large variety of platforms, but is now able to run embedded in most modern browsers, including Firefox, Chrome, Safari and Internet Explorer.

Today, the Internet Archive announces the Historical Software Archive, a collection of prominent and historically notable pieces of software, able to be run immediately in your browser.  They range from pioneering applications to obscure forgotten utilities, and from peak-of-perfection designs to industry-crashing classics.

Lemonade_Stand_1979_Apple_itemimage

Turning computer history into a one-click experience bridges the gap between understanding these older programs and making them available in a universal fashion. Acquisition, for a library, is not enough – accessibility is where knowledge and lives change for the better. The JSMESS interface lets users get to the software in the quickest way possible.

We asked a number of people to look at the Historical Software section, and here were their comments:

“Bringing microcomputer software back from floppy drives and cassette tapes is an important task not just for nostalgia but so we can learn from the good work of tens of thousands of people in our not-so-distant past.   The Internet Archive’s first steps towards bringing it up in a web browser is very encouraging and we at DigiBarn look forward to working with the Archive to bring the best of that era back again.”

– Dr. Bruce Damer, Curator, DigiBarn Computer Museum

“We have come a long way in digital and software preservation – far enough along that problems of discovery and access are looming on the horizon.  It’s comforting to know that the Internet Archive is developing solutions for these problems, so that people can use the software we save.”

– Henry Lowood, Curator for History of Science & Technology Collections, Stanford University Libraries

“The Internet Archive has given us a remarkable opportunity to make the past present once again through its in-browser emulation. Now enthusiasts, students, scholars, historians from all corners of the globe can quickly and easily access software that would normally require fairly sophisticated technological expertise. I expect we will soon recognize this as a crucial development in digital preservation and access.”

– Lori Emerson, Media Archaeology Lab at the University of Colorado

“Emulation in a browser means embedding digital history in the everyday experience of surfing the Web. Not as screenshots or scans, but as living history, dynamic and interactive, inviting and even seductive. I look forward to weird wormholes and portals into our past appearing everywhere.”

– Matt Kirschenbaum, Associate Director, Maryland Institute for Technology in the Humanities (MITH)

“The team at the Internet Archive have managed not just to preserve some of the most memorable bits and bytes of the last 3 decades of personal computing, they have given us all a way to execute them in a browser.  The past is now  playable at a stable URL.”

– Doug Reside, Digital Curator for the Performing Arts, NYPL

“The Internet Archive is one of the most interesting and important new repositories for historians, curators and anyone interested in the preservation of recent culture.  The emulator is an exceptional new tool that will make possible all kinds of investigations that heretofore were limited to specialists.  It is a wonderful achievement.”

– Deborah Douglas, Director of Collections, MIT Museum

Many, many individuals have contributed to the JSMESS project. The project makes extensive use of the Emscripten compiler project, headed by Alon Zakai at Mozilla.org. JSMESS is a non-affiliated port of the MESS emulator. MESS is the result of years of effort by hundreds of contributors, a number of them anonymous, who have continued to work daily to provide the most accurate emulation of historical machinery. The JSMESS team includes Justin de Vesine, John Vilk, Andre D, Justin Kerk, Vitorio Miliano, and Jason Scott; countless others have contributed documentation, testing and feedback about the functioning of the project. Integration with the Internet Archive’s internals is the result of efforts by Alex Buie, Hank Bromley, Samuel Stoller and Tracey Jaquith.

Update: The introduction of the Historical Software Collection and JSMESS has been covered in The Register, Engadget, PC World, Slashgear, and The Verge (twice!)

Archive of Historical Computer Software is here

Jason Scott is announcing that, thanks to lots of deep collecting communities and volunteers, the Internet Archive now hosts some very large software and computer documentation collections, perhaps making it the largest overall host.

Yippie!

Now we all have to make it larger, more findable, and re-usable – please help, please donate money, time, anything – this is our history, let’s write it well.

 

Hard Drive Archaeology – And Hackerspaces

Two different, but somewhat related additions to the archive you might want to check out.

First, I was contacted earlier this week about a project to recover information off of an old Cray-1 supercomputer hard drive. Unlike, say, trying to get your old floppies to read or pulling an old mix tape off of a cassette, with something as old as a Cray-1 (a computer once called the “World’s Most Expensive Love Seat”), you don’t even have a place to really plug it in: functioning Cray-1 machines are as rare as you can get, and even if you were to get the hard drives spinning and readable – how would you get the data off the Cray?

Researcher Chris Fenton has a thing about Cray supercomputers – he built a tiny homebrew version of one that used emulation to allow you to experience some aspect of Crays, from his desktop. So when he found himself with an 80 Megabyte CDC 9877 disk pack, which was quite a lot for the early 1970s, it wasn’t just a matter of hooking it up to USB. (Actually, we have a brochure for the behemoth you would put this disk pack into to read it.)  Here’s what a nearly-the-same CDC 9987 looks like:

Ultimately, Fenton got the information off of the disk pack using a whole variety of techniques and experiments, as part of a research project this summer. He wrote a paper about the process, entitled “Digital Archeology with Drive-Independent Data Recovery: Now, With More Drive Dependence!” and it’s now mirrored here at the archive. If nothing else, be sure to browse through the paper just to see the customized stepper motor and reader he built to pull the magnetic data off the platters. And I was kind of understating things… ultimately he did hook it up to USB.

From this careful, forensic-quality magnetic scan of the drive, Fenton has produced a large image of the disk, one far larger than the data on it but allowing further experimentation and reading from the image without having to build a robot in your basement. And now, we’re offering this image on the archive. Remember, you won’t be able to pull this data down and go back to the 1970s instantly – you should read up on documentation of disk formats, learn how to pull information off of magnetic flux recordings, and take in a whole host of other material and knowledge…. but hey, weekends are for having fun, right?

Even ten years ago, the idea of offering several gigabytes of something (that expands out to about 20 gigabytes of something) online was beyond crazy – that we’ve come so far in offering this much to so many people speaks to how much the world has changed since the era of this disk pack.

Fenton is associated with the NYC-based hackerspace NYC Resistor, and it was their mailing list that got in contact with me to get this disk image up to the archive.

Coincidentally, this was also the week that two NYC Resistor members released a book, for free, which you might really enjoy. Bre Pettis and Astera Schneeweisz hatched a plan to make a book on hackerspaces at the end of 2008. They wanted to put it together in less than two weeks, and as people submitted photos, essays and other material, the project increased in size, more folks were brought in, and this month the end result was released for free.

Entitled “Hackerspaces: The Beginning”, this photo-filled book is available at the archive to read online or download. A view of hackerspaces throughout the world as of 2008, it also includes memories of spaces past and dreams of spaces future. It’s an excellent snapshot of a beautiful, technological world well worth browsing this weekend (and weekends to come).

So if you’re in the mood for advanced research or just to check out some great photos, the archive’s got something for you!

Time for Wine

The Internet Archive is located in San Francisco, and so to celebrate our proximity to some of the most renowned wine regions in the world, we bring you this entry about our favorite ferment. This entry contains aromas of cedar, apple and gunsmoke …

Virtual Wine video blogs
Ben Llewelyn and James Booth started Virtual Wine as a way for people to learn and chat online about wine and wine culture. The Internet Archive holds many of their short video blog tutorials and discussions, including Wine Storage, Buying a Wine Glass, Ordering and Returning Restaurant Wine, Matching Food and Wine, How Long to Keep Opened Wine, and How to Decant Old Wine.

Tasting Vlog from The Wine Vibe, featuring four wines (Verdejo, Viognier, Tempranillo and Zinfandel) in the $10 price range that were made from organically-grown grapes. Featured wineries are Casamaro Winery from the Rueda region of Spain, and Cline Winery from Sonoma. *NOTE* The actual tasting begins two minutes into the clip.

Wine Country Live! Episode – Stupid Wine Questions
Why can’t you make red wine from white grapes? Why do wineries grow rose bushes near the road? What exactly do corks DO for wine? This episode of Wine Country Live! is devoted to people’s stupid questions that they’ve been afraid to ask. Wine Country Live! is produced in Sonoma County, hosted by Michael DeLoach, and featuring Daryl Roberts, publisher of WineX Magazine and Robinson Olmstead with current wine-related news.
*NOTE* To play the Real Media file, right click on the link (hold down option + click for Macs) and click on “Copy Link Location,” then open Real Player, go to File, Open Location and paste in the link.

Wine and the wine trade (1921)
Andre L. Simon wrote this book in 1921, dedicated to the history and making of wine. Includes 21 vintage photos of wine ephemera like corking machines, bottle testing and various vineyards and varietals.

The WineMaster freeware helps you keep track of your favorite wines and make tasting notes on your handheld device.

For fun:

Prison Wine! This vlogger teaches us how to make wine with a plastic bag, moldy bread stuffed into a sock (to replace yeast, which is highly contraband in prison), fruit juice, sugar and raisins. *NOTE* The Internet Archive advocates this as entertainment only – try at home only at your own risk …

Written by: Stephanie Sapienza