Monthly Archives: September 2016

Guest Post: Preserving Digital Music – Why Netlabel Archive Matters

The following entry is by Simon Carless, who worked for the Internet Archive in the early 2000s before moving on to work in media and conferences, while simultaneously maintaining collections at the Internet Archive and running the free game information site MobyGames.

It’s fascinating that early Internet-era (digital) data can sometimes be trickier to preserve & access than pre-Internet (analog) data. A prime example is the amazing work of the Netlabel Archive, which I wanted to both laud and highlight as ‘digital archiving done right’.

Created in 2016 by the amazing Zach Bridier, the Netlabel Archive has preserved the catalogs of 11 early ‘netlabels’ and counting, a number of which involve music that was either completely unavailable online, or difficult to listen to online. One of these netlabels is the one that I ran from 1996 to 2009, Mono/Monotonik. So obviously, I’m particularly delighted by that project. But a number of the other netlabels are also great and previously tricky to access, and I’m even more excited for those. (Reminder: all these netlabels freely distributed their music at the time, which makes it a great thing to archive and bring back.)

The nub of the problem with early netlabels – particularly from 1996 to 2003 – is that PCs & the Internet (& pre-Internet BBSes!) were just not fast enough, and did not have enough storage, to support MP3 downloads at that time.

So this early netlabel music – on PCs and even other computers like Commodore Amigas – was composed in smaller (in kB!) module files, which were created and played on computers by using sample data and MIDI-style ‘note triggering’ with rudimentary real-time effects. This allowed 5-minute-long songs to be just 30kB-300kB in size, versus the 5MB or more that an MP3 takes.
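To put those sizes in perspective, here is a rough back-of-the-envelope sketch in Python. The file sizes are the approximate figures quoted above. The 56 kbit/s modem speed is an assumption added purely for illustration (many connections of the era were slower), and the function names are hypothetical.

```python
# Back-of-the-envelope comparison of tracker modules and MP3s.
KB = 1000  # bytes (decimal units, for simplicity)

MODULE_SIZES = (30 * KB, 300 * KB)  # range quoted in the post
MP3_SIZE = 5000 * KB                # "5MB or more" from the post

# Assumption for illustration only: a 56 kbit/s modem running at full speed.
MODEM_BITS_PER_SECOND = 56_000


def download_seconds(size_bytes, bits_per_second=MODEM_BITS_PER_SECOND):
    """Idealized transfer time in seconds, ignoring protocol overhead."""
    return size_bytes * 8 / bits_per_second


def size_ratio(larger_bytes, smaller_bytes):
    """How many times bigger one file is than another."""
    return larger_bytes / smaller_bytes


if __name__ == "__main__":
    for module_size in MODULE_SIZES:
        print(
            f"{module_size // KB} kB module: "
            f"{download_seconds(module_size):.0f} s to download, "
            f"MP3 is {size_ratio(MP3_SIZE, module_size):.0f}x larger"
        )
    print(f"5 MB MP3: {download_seconds(MP3_SIZE) / 60:.0f} min to download")
```

Under these assumptions an MP3 is somewhere between roughly 17 and 167 times the size of a module, and takes about twelve minutes to download where a small module takes a few seconds.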

For the more recent history of netlabels, I founded the Netlabels collection at the Internet Archive back in 2003, and that’s grown to hold over 65,000 individual music releases – and hundreds of thousands of tracks – by 2016. But the Internet Archive’s collection was largely designed to hold MP3 and OGG files, and so the early .MODs, .XMs and .ITs were not always preserved as part of this collection – and they were certainly not listenable to in-browser.

Additionally, there were a number of netlabels that used their own storage instead of the Internet Archive’s, even after 2003. But if that storage disappeared, their data disappeared with it, and music files are generally too large to be archived by the saintly Wayback Machine.

So if early netlabel archives existed at all, it was as ZIP/LHA archives on Scene.org or other relevant demoscene FTP sites. (Netlabels were spawned from the demoscene to some extent, since demo soundtracks use the same format of .MODs and .XMs.) And tracker music is annoyingly hard to play on today’s PCs and Macs – there are programs (such as VLC & more specialist apps) which do it, but it’s not remotely mainstream & not web browser-streamable.

So what Zach has done is keep the original .ZIP/.LHA files, which often had additional ASCII art & release info in them, save the .MODs and .XMs, convert everything to .MP3, painstakingly catalog all of the releases, and then upload the entire caboodle (both original and converted files) to both the Internet Archive and additionally to YouTube, where there are gigantic playlists for each label. So there are now multiple opportunities for in-browser listening & the original files are also properly preserved.
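As a purely illustrative sketch of the cataloging step (this is not Zach’s actual tooling, which the post does not describe), the Python below lists what is inside a release ZIP and separates tracker modules from the extras such as ASCII art and release info. It assumes modules can be recognised by the .MOD, .XM and .IT extensions mentioned above; LHA archives and the MP3 conversion are not covered, and the function names and example file names are hypothetical.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Assumption: a file is a tracker module if it has one of these extensions.
MODULE_EXTENSIONS = {".mod", ".xm", ".it"}


def catalog_release(zip_bytes):
    """Split the file names in a release ZIP into modules and extras."""
    modules, extras = [], []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries
            suffix = PurePosixPath(info.filename).suffix.lower()
            if suffix in MODULE_EXTENSIONS:
                modules.append(info.filename)
            else:
                extras.append(info.filename)
    return {"modules": sorted(modules), "extras": sorted(extras)}


def make_example_zip():
    """Build a tiny in-memory ZIP with made-up contents for demonstration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("TRACK01.XM", b"placeholder module data")
        archive.writestr("file_id.diz", b"placeholder release info")
    return buffer.getvalue()


if __name__ == "__main__":
    print(catalog_release(make_example_zip()))
```

Keeping the extras in the listing, rather than discarding them, mirrors the point made above: the original archives carry context that the converted MP3s alone would lose.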

This means we can now all easily browse and listen to the complete catalog of Five Musicians, a seminal early global PC tracker group/netlabel, as well as the super-neat Finnish electronic music netlabel Milk, the aggressive chiptune/noise label mp3death, and a host of others. And I recently uploaded a rare FTP backup from 1998 which allowed Zach to put up the 10 releases (that we know about!) from funky electronic netlabel Cutoff. These may have been partially online in databases like Modland, but certainly weren’t this accessible, complete, or well-collected.

What’s somewhat crazy about this is that we’re not even talking about ancient history here – at most, these digital files are 20 years old. And they’re already becoming difficult to access, listen to, or in a few cases even find.

For example, I had to dig deep into backup CD-ROMs to find some of the secret bootleg No’Mo releases that we deliberately _didn’t_ put on the Mono website back in 1996 – opting to distribute them via BBSes instead. These files literally didn’t exist on the Internet any more, despite being small and digital-native.

I think that’s – hopefully – the exception rather than the rule. But without diligent work by Zach (much kudos to him!) & similar work by other citizen digital activists like the 4am Apple II archiver, Jason Scott (obviously!) and a host of others, we’d have issues. And we may need more help still – some of these digital-first materials may disappear permanently, as the CD-ROMs or other media they are on become unreadable.

But we’re still doing a PRETTY good job on preservation, especially with CD-ROMs being ingested in massive amounts onto the Internet Archive regularly. (I’m working with MobyGames & another to-be-announced organization on preserving video game press CD-ROMs on Archive.org, for example, and Jason Scott’s CD-ROM work is many magnitudes larger than mine.)

Yet I actually think contextualizing and providing access to these materials is just as big a problem, if not bigger. Once we’ve got this raw data, who’s available to look through it, pick out the relevant stuff, and make it easily viewable or streamable to anyone who wants to see it? That’s why the game art/screenshots on those press CD-ROMs are also being extracted and uploaded to MobyGames for easy Google Images access, and why Netlabel Archive’s work to put streamable versions of the music on Archive.org and YouTube is so vital. (And why playable-in-browser emulation work is SO very important!)

In the end, you can preserve as much data as you want, but if nobody can find it or understand it, well – it’s not for naught, but it’s also not the reason you went to all the trouble of archiving it in the first place. And the fact the Netlabel Archive does both – the preserving AND the accessibility – makes it a gem worth celebrating. Thanks again for all your work, Zach.

 

Persistent URL Service, purl.org, Now Run by the Internet Archive

OCLC and the Internet Archive today announced the results of a year-long cooperation to ensure the future of purl.org. The organizations have worked together to build a new service hosted by the Internet Archive that will manage the persistent URLs and sub-domain redirections for purl.org, purl.com and purl.net.

Since its introduction by OCLC Research in 1995, purl.org has provided a source of Persistent URLs (PURLs) that redirect users to the correct hosting location for documents, data, and websites as they change over time.
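The idea behind a persistent URL can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not the purl.org implementation: the class name, method names and example URLs are hypothetical, and the use of a 302 redirect status is an assumption for illustration.

```python
class PurlTable:
    """Toy lookup table: a stable path maps to a target that may change."""

    def __init__(self):
        self._targets = {}

    def register(self, path, target_url):
        """Create a persistent path, or point an existing one somewhere new."""
        self._targets[path] = target_url

    def resolve(self, path):
        """Return (HTTP status, redirect target) for a persistent path."""
        target = self._targets.get(path)
        if target is None:
            return 404, None
        # Assumption: respond with a temporary redirect to the current host.
        return 302, target


if __name__ == "__main__":
    table = PurlTable()
    table.register("/example/doc", "https://old-host.example.org/doc.html")
    print(table.resolve("/example/doc"))

    # The document moves; the owner updates the target, the path stays put.
    table.register("/example/doc", "https://new-host.example.org/doc.html")
    print(table.resolve("/example/doc"))
```

The point of the design is that links published anywhere on the Web keep using the stable path, while only the table entry changes when the document moves.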

With more than 2,500 users including publishing and metadata organizations such as Dublin Core, purl.org has become important to the smooth functioning of the Web, data on the Web, and the Semantic Web in particular.

Brewster Kahle of the Internet Archive said, “We share a common belief with OCLC that what is shared on the Web should be preserved, so it makes perfect sense for us to add this important service to our set of tools and services including the Wayback Machine as part of our mission to promote universal access to all knowledge.”

Lorcan Dempsey of OCLC welcomed the announcement as “a major step in the future sustainability and independence of this key part of the Web and linked data architectures. OCLC is proud to have introduced persistent URLs and purl.org in the early days of the Web and we have continued to host and support it for the last twenty years. We welcome the move of purl.org to the Internet Archive which will help them continue to archive and preserve the World’s knowledge as it evolves.”

All previous PURL definitions have been transferred to the Internet Archive and can continue to be maintained by their owners through a new web-based interface located here.

About OCLC:
OCLC is a nonprofit global library cooperative providing shared technology services, original research and community programs so that libraries can better fuel learning, research and innovation. Through OCLC, member libraries cooperatively produce and maintain WorldCat, the most comprehensive global network of data about library collections and services. Libraries gain efficiencies through OCLC’s WorldShare, a complete set of library management applications and services built on an open, cloud-based platform. It is through collaboration and sharing of the world’s collected knowledge that libraries can help people find answers they need to solve problems. Together as OCLC, member libraries, staff and partners make breakthroughs possible.

About Internet Archive:
The Internet Archive (archive.org) is a 501(c)(3) non-profit that was founded to build an Internet library, with the purpose of offering permanent access for researchers, historians, and scholars to historical collections that exist in digital format.

Tales from the TV News Archive presidential debate near real-time livestream

During last night’s presidential debate, the Internet Archive’s TV News Archive experimented with something new: a near real-time live stream of the first presidential debate. This online video stream is editable, embeddable, and shareable on social media. We were the only public library capturing clips of the debate within minutes, while the candidates were still duking it out. The debate is preserved on the TV News Archive site for posterity. And when the vice presidential candidates, Tim Kaine and Mike Pence, meet for their debate on October 4, the TV News Archive will be making this live stream available to journalists and the general public.

During the debate, we matched up TV debate video with fact checks from our Political TV Ad Archive partners at FactCheck.org and PolitiFact. Here are some representative tweets and links from last night’s debate:

Minute 15: Hillary Clinton said, “Donald thinks that climate change is a hoax perpetrated by the Chinese.” “I do not say that,” said Trump. “Mostly True,” read the fact check posted by PolitiFact reporters. Jessica Clark, founder of Dot Connector Studio and a consultant to the TV News Archive, was able to link the two here:

Minute 20: Donald Trump said, “I was against the war in Iraq.” FactCheck.org posted this timeline of Trump’s statements about the Iraq war, pointing out that Trump had voiced support for the war in 2002 in an interview with “shock jock” Howard Stern. I tweeted that here:

Minute 36: Donald Trump said, “You learn a lot from financial disclosures” as opposed to tax returns. “False,” posted PolitiFact, “Trump has not released his tax returns, which experts say would offer valuable details on his effective tax rate, the types of taxes he paid, and how much he gave to charity, as well as a more detailed picture of his income-producing assets.” This sort of information is not included on financial disclosure forms. I linked to the fact check in this tweet:

 

Minute 44: Hillary Clinton said: “The gun epidemic is the leading cause of death of young African American men, more than the next nine causes put together.” “True,” posted PolitiFact. Roger Macdonald, TV News Archive director, tweeted the following link to the TV debate clip, along with the fact check.

Overall, fact checking was a crucial part of last night’s debates, as Clark noted:

The near real-time live stream experiment was part of our collaboration around the debates with the Annenberg Public Policy Center, to bring context to the 2016 presidential debates. Stay tuned: today we are drilling down on how TV news is covering the debates. Which video clips are they picking up from the debates in post-debate analyses? We’ll be making that information available to the public, as well as to academic researchers at the Annenberg Public Policy Center for integration into their post-debate surveys.

The Internet Archive Turns 20!

For 20 years, the Internet Archive has been capturing the Web – that amazing universe of images, audio, text and software that forms our shared digital culture. Now it’s time to celebrate and we’re throwing a party! Please join us for our 20th Anniversary celebration on Wednesday, October 26th, 2016, from 5-9:30 pm.

Get your free tickets here.

We’ll kick off the evening with cocktails, taco trucks and hands-on demos of our coolest tools. Come scan a book, play in a virtual reality arcade, or try out the brand new search feature in the Wayback Machine. When you arrive, be sure to get your library card. “Check out” all the stations on your card and we’ll reward you with a special gift commemorating our 20th anniversary.

Starting at 7 p.m., we’ve commissioned Paul D. Miller aka DJ Spooky — composer, author and multimedia artist — to create a short musical montage drawn from the Internet Archive’s audio collections. We’ll look back on some of the defining digital moments of the past 20 years, and explore how media and messaging captured in our Political TV Ad Archive is impacting the 2016 Election.

And to keep you dancing into the evening, DJ Phast Phreddie the Boogaloo Omnibus will be spinning 45rpm records from 8-9:30. We hope you can join our celebration!

Event Info:

Wednesday, October 26th
5pm: Cocktails, tacos, and hands-on demos
7pm: Program
8pm: Dessert, Dancing and more Demo stations

Location:  Internet Archive, 300 Funston Avenue, San Francisco

Be sure to reserve your ticket today!

 

Dear Congress: Please Don’t Make It More Difficult And Dangerous To Be A Library

Last Friday, the Internet Archive and several of our library, archive, and museum partners sent a letter to House Judiciary Committee Chairman Bob Goodlatte (R-VA) urging him not to make it more difficult and dangerous to be a library.

As we wrote over the summer, the U.S. Copyright Office is proposing to completely rewrite Section 108, the part of the law that is designed to support traditional library functions such as preservation and inter-library loans. Although the proposal has not been made public yet, we understand from our meeting with them that the Copyright Office wants to redefine who gets to be a library, making it harder for small players and virtual libraries to be protected under the law. The proposal is also likely to be damaging to fair use and may add new, burdensome regulations on libraries that archive the web (among other things).

Thankfully, the Copyright Office does not write the law – that is up to Congress. Our letter explains that now is not the time to scrap the old law, which is working well. The Copyright Office’s proposal is not only unnecessary, but potentially harmful to library efforts to increase access to information. We hope Congress will take the strong objections of the library community seriously when considering the Copyright Office’s proposal to rewrite the law that applies to libraries.

SAVE THE DATE — The Internet Archive Turns 20!

Our 20th anniversary is coming up and we’re throwing a party! Please save the date and join us for our annual celebration on Wednesday, October 26th, 2016.

We’ll kick off the evening with cocktails, tacos and hands-on demo stations. Come scan a book, play in a virtual reality arcade, search billions of Web pages in our Wayback Machine and so much more! Then check out the interactive new media projects by talented artists working with our collections.

Starting at 7 p.m., Paul D. Miller aka DJ Spooky — composer, author, teacher, electronics DJ and multi-media artist — will perform a short, original musical retrospective of the Internet Archive’s audio collections. We’ll look back on some of the defining moments of the past 20 years, and explore how media and messaging is impacting the 2016 Election.

And to keep you dancing into the evening, DJ Phast Phreddie the Boogaloo Omnibus will be spinning 45rpm records from 8-9:30. We hope you can join our celebration!
Event Info:

Wednesday, October 26th
5pm: Cocktails, tacos, and hands-on demos
7pm: Program
8pm: Dessert and Dancing

Location: Internet Archive, 300 Funston Avenue, San Francisco, CA 94118

Rock Against the TPP is Coming to San Francisco…TOMORROW!

On Friday, September 9th hip hop icons Dead Prez, actress Evangeline Lilly, punk legend Jello Biafra, Grammy winners La Santa Cecilia, and others will play a free concert at the Regency Ballroom in San Francisco to protest the Trans-Pacific Partnership (TPP).

The TPP is a contentious trade agreement that is getting quite a bit of negative press in the 2016 U.S. election cycle. Among many other issues, the TPP would govern how signatory countries protect and enforce intellectual property rights. The TPP could have a large negative impact on libraries by increasing copyright term limits and neglecting the essential limitations on copyright law that libraries around the world rely on. Many different groups have vocally opposed the TPP, both for its substance and for the secrecy of the negotiations process.

Organized by Fight for the Future and Rage Against the Machine guitarist Tom Morello, the tour is designed to pull new audiences into the fight against the TPP. See more details and a full lineup at https://www.rockagainstthetpp.org/san-francisco-ca

The concert will be followed by a teach-in on “How to Fight the TPP” on Saturday, Sept. 10th from 1pm – 3pm at 1999 Bryant Street, hosted by experts from a wide range of organizations opposing the TPP.

Saving the 78s

Written by B. George, the Director of ARChive of Contemporary Music in NYC, and Curator of Sound Collections at the Internet Archive in San Francisco.

While audio CDs whiz by at about 500 revolutions per minute, the earliest flat discs offering music whirled at 78rpm. They were mostly made from shellac, i.e., beetle (the bug, not The Beatles) resin, and were the brittle predecessors to the LP (microgroove) era. The format is obsolete, the surface noise is often unbearable, and just picking them up can break your heart as they break apart in your hands. So why does the Internet Archive have more than 200,000 in our physical possession?

A little over a year ago New York’s ARChive of Contemporary Music (ARC) partnered with the Internet Archive to focus on preserving and digitizing audio-visual materials. ARC is the largest independent collection of popular music in the world. When we began in 1985 our mandate was microgroove recordings – meaning vinyl – LPs and forty-fives. CDs were pretty much rumors then, and we thought that other major institutions were doing a swell job of collecting earlier formats, mainly 78rpm discs. But donations and major research projects like making scans for The Grammy Museum and The Ertegun Jazz Hall of Fame placed about 12,000 78s in our collection.

For years we had been getting calls offering 78 collections that we were unable to accept. But when space and shipping became available through the Internet Archive, it was now possible to begin preserving 78s. Here’s a short history of how in only a few years ARC and the Internet Archive have created one of the largest collections in America.

Our first major donation came from the Batavia Public Library in Illinois, part of the Barrie H. Thorp Collection of 48,000 78s.

We’re always a tad suspicious of large collections like these. First thought is, “Must be junk.” Secondly, “It’s been cherrypicked.” But the Thorp Collection was screened by former ARC Board member Tom Cvikota, who found the donor, helped negotiate the gift and stored it. That was in 2007. Between then and our 2015 pickup Tom arranged for some of the recordings to be part of an exhibition at the Greengrassi Gallery, London, (UK, Mar-Apr, 2014) by artist Allen Ruppersberg, titled, For Collectors Only (Everyone is a Collector).

What makes the Thorp collection unique is the obsessive typewritten card catalog featured in a short film hosted on the exhibition’s webpage. Understanding why you collect and how you give your interests meaning is a part of Allen’s work – artworks that focus on the collector’s mentality. One nice quote by Allen, referenced in Greil Marcus’ book The History of Rock ’n’ Roll in Ten Songs, is, “In some cases, if you live long enough, you begin to see the endings of things in which you saw the beginnings.”

Philosophical musings aside, there are 48,000 discs to deal with. That meant taking poorly packed boxes – many of them open for 20 years – and re-boxing them for proper storage. The picture below shows an example of how they arrived (on the right), and how they were palletized (on the left).

The trick to repacking in a timely fashion is to not look at the records. It’s a trick that is never performed successfully. Handling fragile 78s requires grabbing one or just a few at a time. So we’re endlessly reading the labels, sleeving and resleeving, all the time checking for rarities, breakage and dirt.

Now we didn’t do all this work on our own. Working another part of the warehouse was two-and-a-half month old Zinnia Dupler — the youngest volunteer ever to give us a hand. Mom also helped a bit.

A few minutes after the snap I found this gem in the Thorp collection. Coincidence? I don’t think so…

“Burpin” is a country novelty tune from out of Texas by Austin broadcaster and humorist Richard “Cactus” Pryor (1923 – 2011). It came from a box jam-packed with country and hillbilly discs. This was a pleasant surprise, as we expected the collection to be like most we encounter – big band and bland pop. But here was box-after-box of hillbilly, country, and Western swing records. Now, I use’ta think I knew a bit about music. But with this collection, it was back to school for me. Just so many artists I’ve never heard of or held a record by. As we did a bit of sorting, in the ‘G’s alone there’s Curly Gribbs, Lonnie Glosson and the Georgians. Geeez! Did you know that Hank Snow had a recordin’ kid, Jimmy, and he cut “Rocky Mountain Boogie” on 4 Star records, or that Cass Daley, star of stage and screen, was the ‘Queen of Musical Mayhem’? Me neither. The Davis Sisters, turns out, included a young Skeeter Davis(!) and are not to be confused with the Davis Sisters gospel group, also in this collection. Then there’s them Koen Kobblers, Bill Mooney and his Cactus Twisters, and Ozie Waters and the Colorado Hillbillies. No matter they should be named the Colorado Mountaineers, they’re new to me.

For us this donation is a dream: it allows us to preserve material that was otherwise going to be thrown away; it has a larger cultural value beyond the music; and it contained a mountain of unfamiliar music, much of it quite rare. And most of it is not available online.

It was a second large donation that prompted the Internet Archive to move toward the idea that we should digitize all of our 78s. The Joe Terino Collection came to us through a cold call, the collection professionally appraised at $500,000. The 70,000 plus 78s were stored in a warehouse for more than 40 years, originally deposited by a distributor. Here’s the kicker: they said that we could have it all, but we had to move it – NOW! Internet Archive did and it came in on 72 pallets, in three semis, from Rhode Island to San Francisco, looking like this…

So Fred Patterson and the crackerjack staff out in our Richmond warehouses (Marc Wendt, Mark Graves, Sean Fagan, Lotu Tii, Tracey Gutierrez, Kelly Ransom, and Matthew Soper) pulled everything off the ramshackle pallets and carefully reboxed this valuable material.

How valuable? Well, we’re really not so sure yet, despite the appraisal, as just receiving and reboxing was such a chore. One hint is this sweet blues 78 that we managed to skim off the top of a pile.

muddywaters

The next step is curating this material, acquiring more collections and moving towards preservation through digitization. Already we have a pilot project in the works with master preservationist George Blood to develop workflow and best digitization practices.

We’re doing all this because there’s just no way to predict if the digital will outlast the physical, so preserving both will ensure the survival of cultural materials for future generations to study and enjoy. And, it’s fun.