Category Archives: Audio Archive

The first Netlabel Day – Join the event

The Internet Archive has a large (over 58,000 items) and growing collection of netlabels. Recently we received a message asking us to help announce a new global event, Netlabel Day. Please support it if you are part of the netlabels world.

Record Store Day was created in 2007 to celebrate record stores in the USA and the rest of the world. In that celebration, independent bands and labels release music exclusively for that day on vinyl, seizing on the revival of that format. This was the basis of Netlabel Day, a sort of distant relative of RSD, which aims to establish a new tradition of releasing digital music every 14 July from now on.

This initiative was born in Chile thanks to Manuel Silva, from M.I.S.T. Records, and it brings together more than 50 labels from all over the world. All genres are present: rock, pop, electronic, noise, ambient and many more, free and just for you.

We will upload every single release to Archive.org, because we love this platform. We always use it and we’ve never experienced any issues with it. Every album will be available for free in WAV and FLAC via direct download or torrent.

The most important thing is to include everyone in this idea. We will close the call on June 1, so if you have a netlabel and you want to be part of this, please email us at contact.netlabelday@gmail.com. If you are an independent artist not associated with any label, you can release your music with us too and be heard by every participating netlabel, so just contact us from May 15 to June 1.

Everyone is invited. Be part of this madness!

Links:
http://netlabelday.blogspot.com
http://www.facebook.com/netlabelday
http://www.twitter.com/netlabel_day

archive.org download counts of collections of items updates and fixes

Every month, we look over the total download counts for all public items at archive.org.  We sum item counts into their collections.  At year end 2014, we found various source reliability issues, as well as overcounting for “top collections” and many other issues.

archive.org public items tracked over time

To address the problems, we:

  • Built a new system that uses our database (DB) for item download counts, instead of our less reliable (and more prone to “drift”) SOLR search engine (SE).
  • Changed monthly saved data from JSON and PHP serialized flatfiles to a new DB table, which is much easier to use now!
  • Fixed overcounting issues for collections: texts, audio, etree, movies
  • Fixed various overcounting issues related to not unique-ing <collection> and <contributor> tags (more below)
  • Fixes to character encoding issues on <contributor> tags

Bonus points!

  • We now track *all collections*.  Previously, we only tracked items tagged:
    • <mediatype> texts
    • <mediatype> etree
    • <mediatype> audio
    • <mediatype> movies
  • For items where we track <contributor> tags (texts items), we now have a “Contributor page” that shows a table of historical data.
  • Graphs are now “responsive” (scale in width based on browser/mobile width)

 

The Overcount Issue for top collection/mediatypes

  • In the graph below, mediatypes and collections are shown horizontally, with a sample “collection hierarchy” as of today.
  • For each collection/mediatype, we show one example item (A, B, C and D), with its downloads/streams/views count next to it in parentheses.  So these are four items, spanning four collections, that happen to be in a collection hierarchy (a single item can belong to multiple collections at archive.org).
  • The Old Way had a critical flaw — it summed all sub-collection counts — when really it should have just summed all *direct child* sub-collection counts (or gone with our New Way instead)

overcount

So we now treat <mediatype> tags like <collection> tags in terms of counting, and de-duplicate all <collection> tags per item, so that items with slightly non-ideal tag data do not cause another kind of overcounting.
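
The counting rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Archive's actual code: the item dictionaries, their field names, and the sample numbers are all hypothetical. Each item's mediatype is treated as one more collection tag, tags are de-duplicated per item, and the item's count is added exactly once to each collection it belongs to.

```python
from collections import defaultdict


def sum_downloads_by_collection(items):
    """Sum item download counts into every collection each item is tagged with.

    `items` is a hypothetical list of dicts with the keys "downloads",
    "mediatype", and "collections"; the real metadata layout may differ.
    """
    totals = defaultdict(int)
    for item in items:
        # Treat the mediatype like a collection tag, and use a set so that
        # a tag repeated in the metadata is counted only once per item.
        tags = set(item["collections"]) | {item["mediatype"]}
        for tag in tags:
            totals[tag] += item["downloads"]
    return dict(totals)


# Made-up sample data: item "B" carries a duplicated collection tag.
sample_items = [
    {"id": "A", "downloads": 1, "mediatype": "audio", "collections": ["etree"]},
    {"id": "B", "downloads": 10, "mediatype": "audio",
     "collections": ["etree", "GratefulDead", "etree"]},
]
totals = sum_downloads_by_collection(sample_items)
```

Because the sums are built directly from items rather than from sub-collection totals, an item that sits several levels deep in a hierarchy still contributes only once to each collection.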

 

… and one more update from Feb/1:

We graph the difference between the absolute download counts for the current month and the prior month, for every month we have data for.  This gives us graphs that show downloads/month over time.  However, values can easily go *negative* in various scenarios (which is *wickedly* confusing to our poor users!)

Here’s that situation:

A collection has a really *hot* item one month, racking up downloads for that collection.  The next month, a DMCA takedown or some other action makes the item unavailable (and thus no longer counted).  The collection’s total can then plummet in the next month’s run, when the counts are summed over its public items again.  So that collection would have a negative (net) download count change for that month!

Here’s our fix:

Use the current month’s collection “item membership” list for both the current month *and* the prior month.  Sum the counts for all those items for each of the two months, and graph the difference between those two sums.  In just about every remaining situation, the graphed monthly download counts will then be nonnegative (zero or positive).
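
A small sketch of the difference between the two approaches, assuming cumulative per-item download counts stored in plain dictionaries. The function names, identifiers, and numbers here are made up for illustration and are not taken from the Archive's code.

```python
def naive_delta(prior_members, current_members, prior_counts, current_counts):
    """Old approach: sum each month over that month's own public items."""
    prior_total = sum(prior_counts[i] for i in prior_members)
    current_total = sum(current_counts[i] for i in current_members)
    return current_total - prior_total


def membership_delta(current_members, prior_counts, current_counts):
    """Fix: sum both months over the *current* month's item membership."""
    # Items that are new this month have no prior count, so default to 0.
    prior_total = sum(prior_counts.get(i, 0) for i in current_members)
    current_total = sum(current_counts.get(i, 0) for i in current_members)
    return current_total - prior_total


# Made-up cumulative download counts per item identifier.
prior_counts = {"hot_item": 5000, "steady_item": 100}
current_counts = {"steady_item": 130}  # hot_item was taken down this month
prior_members = {"hot_item", "steady_item"}
current_members = {"steady_item"}

old = naive_delta(prior_members, current_members, prior_counts, current_counts)
new = membership_delta(current_members, prior_counts, current_counts)
```

In this example the old calculation reports a large negative change for the collection, while the membership-based calculation reports only the growth of the item that is still public.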

 

 

Music Analysis Beginnings

As mentioned in our recent Building Music Libraries post, we are working with researchers at Columbia University and UPF in Barcelona to run their code on the music collection to help their research and to provide new analyses that could help with exploration and understanding.

We are doing some pilot runs to generate files which some close observers may see in the music item directories on archive.org.  Audio fingerprints from audfprint are .afpt and music attributes from Essentia are in _esslow.json.gz (download sample) and _esshigh.json.gz.
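
For anyone who wants to peek inside one of the gzipped JSON files, here is a minimal sketch using only the Python standard library. The helper name is hypothetical, and nothing is assumed about the structure of the Essentia output beyond it being JSON.

```python
import gzip
import json


def load_gzipped_json(path):
    """Read a gzip-compressed JSON file and return the parsed object."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)
```

Calling `load_gzipped_json` on a downloaded `_esslow.json.gz` file should return the parsed attributes, which you can then explore interactively.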

Spectrogram of a Grateful Dead track

We are also creating image files showing the audio spectrum of each file.  We hope this is useful for those who want to see whether files have been compressed in the past (even if they are posted as flac files now).  There is also a .png for each audio file showing a basic waveform, which is being used on the archive’s beta site as eye candy.

More as it happens, but we wanted you to know there is some progress and you will see some new files.  If you have proposed other analyses that would benefit from being run over a large corpus, please let us know by contacting info at archive dot org.

Thank you to the researchers and the Archive programmers who are working together to make this happen.

 

Archive of Contemporary Music and the Internet Archive Team up to Create a Music Library

When the personal record collection of music producer Bob George hit 47,000 discs, he knew something had to be done.  “I wanted to give them away, but they were mostly punk, reggae and hip-hop,” he recalled, “and no established library or archive was interested.” The only thing to do, it would seem, was to turn his collection into a non-profit archive in New York called the ARChive of Contemporary Music.  Twenty-nine years later, the ARC is one of the largest popular music collections in the world, with some three million sound recordings, 19,000 music-related books, and millions of photos, press kits and artifacts.  Now this rich musical resource, used primarily by musicologists and the entertainment industry, is teaming up with one of the largest digital libraries in the world, the San Francisco-based Internet Archive, to create a music library that will preserve and provide researcher access to a wide range of music and the rich materials that surround it.

Powered by teams of volunteers, the two archives are partnering to digitize CDs and LPs and then use audio fingerprinting to match tracks with metadata from catalogs and other services.  Using Internet Archive scanners, the ARC is digitizing its books and photographs at its New York facility.  When complete, this music library will be a rich resource for historians, musicologists and the general public.

Listening Room

Starting today, the public can listen to millions of tracks for free, including many that are not available on Spotify or iTunes, at the Internet Archive’s new listening room in San Francisco.  “The Internet Archive has allowed us to move forward at unprecedented speed, originally with book scanning and now with the digitization of a wide range of audio formats,” said Bob George.  “The physical records from around the world that the ARC has archived are a unique treasure,” said Brewster Kahle, founder and digital librarian of the Internet Archive. “Soon these records will be studied in new ways because they will be digital as well.”

Since 1985, George, the ARC’s co-founder and director, has run the organization in Tribeca, New York City, supported by friends in the music industry including Paul Simon, David Bowie and Nile Rodgers.  The Rolling Stones guitarist Keith Richards endows a collection of blues and R&B recordings there. Filmmakers Martin Scorsese and Jonathan Demme stop by when trying to track down hard-to-find songs.  Yet for most of its almost three decades, the ARC has been a decidedly “analog” experience:  records, CDs and cassette tapes line its walls; to experience a song you usually have to drop a needle into a pristine vinyl groove.  The collaboration with the web-based Internet Archive represents a new direction.  “We feel that our primary mission, to collect and preserve this material, is near completion,” said Bob George. “Now we are seeking ways to allow greater access to this incredible collection.”

Scanning an LP cover

The Internet Archive may be best known for the 435 billion web pages in its Wayback Machine, but this digital library has always been a place where live music collectors go to preserve concerts on the web.  Its audio collections include some 130,000 live concerts by bands such as the Grateful Dead, Jack Johnson and Smashing Pumpkins, many with more than a million plays. Recently, the ARC shipped 46,000 seventy-eight rpm recordings to the San Francisco-based non-profit, and has donated tens of thousands of long-playing records. Music labels Music Omnia and Other Minds are making their entire collections searchable on www.archive.org, in part because the Internet Archive is one of the few online platforms that preserves audio, texts, musical manuscripts, photos and films and makes them accessible forever, for free.

The Internet Archive listening room is now open to the public for free on Fridays from 1-4 pm, holidays excepted, and by appointment at 300 Funston Avenue, San Francisco, CA.  Those interested in donating physical music collections to the ARC or Internet Archive should contact info@arcmusic.org or donations@archive.org.

 

Create playlists with CratePlayer

I find great stuff on the Internet Archive all the time, and now I can use a tool called CratePlayer to create playlists from archive.org movie and audio files.  For example, I want to play a bunch of old Christmas movies at my holiday party this year so I found some cartoons and added them to a Crate.  Now all I have to do is hook my computer up to the TV, press play, and poof!  Instant entertainment!

CratePlayer is a curation tool that lets you gather audio and video content from online sources into collections that can be played and shared.  When they approached us about incorporating Internet Archive items into their platform, we said “yes!” and gave them some pointers about accessing archive.org content.  Off they went, and in short order they had it all working.

Try using their bookmarklet as you’re poking around among archive.org audio and video content.  It’s easy to use and might help you keep track of all the great things you find.

Over 7,000 Free Audio Books: Librivox and its New Look!

In 2005, Hugh McGuire asked:
“Can the net harness a bunch of volunteers to help bring books in the public domain to life through podcasting?”

The answer is yes. Thanks to the help of many, LibriVox, the nonprofit organization he leads, has made tremendous progress in producing and distributing free audiobooks of public domain work.

The LibriVox site has recently undergone a major facelift, making it far easier to browse and find great public domain audiobooks. In addition, the underlying software that helps thousands of volunteers contribute to LibriVox has been completely rebuilt. This rebuild project was funded by the Andrew W. Mellon Foundation and donations from the public. LibriVox continues to use the Internet Archive to host all its audio and web infrastructure.

Thanks to:

The thousands of volunteer readers who bring over 100 new books a month, drawn from Project Gutenberg and other public domain sources (including, of course, the Internet Archive), to the listening public.

With over 7,000 audio books, LibriVox is one of the largest publishers of audiobooks in the world, and certainly the largest publisher of free public domain audiobooks.

The millions of listeners who download over three million LibriVox audiobooks every month.

The Andrew W. Mellon Foundation, and Don Waters at their Scholarly Communications and Information Technology programme, for providing funding for the revamp of the LibriVox website, and underlying technology that runs the project.

Free Hosting by the Internet Archive.

Pro bono Legal services from Diana Szego of Orrick, Herrington & Sutcliffe.

And the relentless good cheer of Hugh McGuire, who over the last eight years has created this fabulous service and continued to make contributions to open (e)book publishing with PressBooks.com (@hughmcguire).

Please donate!
This project needs ongoing support for servers and software upgrades.

new video and audio player — video multiple qualities, related videos, and more!

Many of you have already noticed that since the New Year, we have migrated our new “beta” player to be the primary/default player, then to be the only player.

We are excited about this new player!
It features the very latest release of jwplayer from longtailvideo.com.

Here are some new features/improvements worth mentioning:

  • html5 is now the default — flash is a fallback option.  a final fallback option for most items is a “file download” link from the “click to play” image
  • videos have a nice new “Related Videos” pane that shows at the end of playback
  • should be much more reliable — I had previously hacked up a lot of the JS and flash from the jwplayer release version to accommodate our various wants and looks — now we use mostly the stock player with minimal JS alterations/customizations around the player.
  • better HD video and other quality options: uploaders can now offer multiple video size and bitrate qualities.  If you know how to encode web-playable videos (see my next post!), h.264 mp4 videos especially, you can upload different qualities of your source video and the viewer will have the option to pick any of them (see more on that below).
  • more consistent UI and look and feel.  The longtailvideo team *really* cleaned up and improved their UI, giving everything a clean, consistent, and aesthetically pleasing look.  Their default “skin” is also greatly improved, so we can use that now directly too
  • lots of performance cleanup under the hood, and more likely to play in more mobile, browser, and OS combinations.

Please give it a try!

-tracey

 

For those of you interested in trying multiple qualities, here’s a sample video showing it:

http://archive.org/details/kittehs

To make that work, I made sure that my original/source file:

  • is h.264 video
  • has AAC audio
  • has the “moov atom” at the front (to allow instant playback without waiting to download the entire file first; search the web for “qt-faststart” or ffmpeg’s “-movflags faststart” option, or see my next post for how we make our .mp4 files here at archive.org)
  • has a greater-than-480P, HD-style width/height
  • has a filename ending with one of:   .HD.mov   .HD.mp4   .HD.mpeg4    .HD.m4v

When all of those are true, our system will automatically take:

  • filename.HD.mov

and create:

  • filename.mp4

that is our normal ~1000 kb/sec “derivative” video, as well as “filename.ogv”

The /details/ page will then see two playable mpeg-4 h.264 videos, and offer them both with the [HD] toggle button (seen once video is playing) allowing users to pick between the two quality levels.

If you wanted to offer a *third* quality, you could do that with another ending like above but with otherwise the same requirements.  So you could upload:

  • filename.HD.mp4       (as, say, a 960 x 540 resolution video)
  • filename.HD.mpeg4   (as, say, a 1920 x 1080 resolution video)

and the toggle would show the three options:   1080P, 540P, 480P

You can update existing items if you like, and re-derive your items, to get multiple qualities present.
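
The filename part of the rule above can be sketched as a small check. This is an illustration only: the function name is hypothetical, the match is assumed to be case-sensitive, and it says nothing about the codec, “moov atom”, or resolution requirements, which have to be verified separately.

```python
# Filename endings listed above that mark a file as an HD source.
HD_SUFFIXES = (".HD.mov", ".HD.mp4", ".HD.mpeg4", ".HD.m4v")


def expected_derivatives(filename):
    """Return the derivative names described above, or None if the
    filename does not end with one of the HD suffixes."""
    for suffix in HD_SUFFIXES:
        if filename.endswith(suffix):
            base = filename[: -len(suffix)]
            # The normal-quality h.264 derivative plus the ogv derivative.
            return [base + ".mp4", base + ".ogv"]
    return None
```

For example, `expected_derivatives("filename.HD.mov")` gives the names `filename.mp4` and `filename.ogv`, matching the description above, while a plain `filename.mov` is not treated as an HD source.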

Happy hacking!

getting only certain formats in .zip files from items — new feature

Per some requests from our friends in the Live Music Archive community…

You can get any archive.org item downloaded to your local machine as a .zip file (we’ve been doing that for 5+ years!).
But whereas before it would be all files/formats,
now you can be picky/selective about *just* certain formats.

We’ll put links up on audio item pages, minimally, but the url pattern is simple for any item.
It looks like (where you replace IDENTIFIER with the identifier of your item (eg: thing after archive.org/details/)):

http://archive.org/compress/IDENTIFIER

for the entire item, and for just certain formats:

http://archive.org/compress/IDENTIFIER/formats=format1,format2,format3,….

Example:


wget -q -O - 'http://archive.org/compress/ellepurr/formats=Metadata,Checksums,Flac' > zip; unzip -l zip
Archive:  zip
  Length      Date    Time    Name
---------  ---------- -----   ----
  1107614  2012-10-30 19:49   elle.flac
       44  2012-10-30 19:49   ellepurr.md5
     3114  2012-10-30 19:49   ellepurr_files.xml
      693  2012-10-30 19:49   ellepurr_meta.xml
      602  2012-10-30 19:49   ellepurr_reviews.xml
---------                     -------
  1112067                     5 files
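
If you are scripting this, the URL pattern above is easy to build programmatically. A minimal sketch follows; the helper name is hypothetical, and it does no URL-encoding, so format names containing spaces or other special characters would need extra handling.

```python
def compress_url(identifier, formats=None):
    """Build the .zip download URL for an item, following the pattern above.

    With no formats given, the URL covers the entire item.
    """
    url = "http://archive.org/compress/" + identifier
    if formats:
        url += "/formats=" + ",".join(formats)
    return url
```

Calling `compress_url("ellepurr", ["Metadata", "Checksums", "Flac"])` reproduces the URL used in the wget example above.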

Enjoy!!

LibriVox Free Audiobook Project Receives Generous Mellon Support for Upgrade

LibriVox.org, the world’s largest producer of free public domain audiobooks, and the Internet Archive are pleased to announce a generous grant from the Andrew W. Mellon Foundation, on the heels of a recent landmark achievement: 100 million downloads of the over 5,000 free LibriVox audiobooks from the Internet Archive.

The Mellon grant will go towards rebuilding LibriVox’s technical infrastructure, and improving accessibility of the LibriVox website.

“It’s fantastic to get this support from the Mellon Foundation,” said LibriVox founder Hugh McGuire. “It will be put to good use, helping our hard-working volunteers create many more free audiobooks.”

LibriVox, a volunteer project of the Internet Archive, gets volunteers from around the world to make audio recordings of public domain texts, and gives those recordings away for free. All LibriVox audiobooks are hosted at the Internet Archive.

Founded in 2005, LibriVox has to date produced 5,371 free audiobooks, in 31 languages. Popular audiobooks include “The Adventures of Sherlock Holmes,” by Arthur Conan Doyle, “Adventures of Huckleberry Finn” by Mark Twain, and “Jane Eyre,” by Charlotte Brontë. In addition to novels, the LibriVox collection includes numerous texts of importance from philosophers such as Kant, Descartes, and Hume, political documents such as the “Universal Declaration of Human Rights,” and scientific texts including Einstein’s “Relativity,” and Darwin’s “On the Origin of Species.”

“The LibriVox collection is one of the most popular on the Internet Archive,” said Brewster Kahle, Founder and Director of the Internet Archive. “100 million downloads is awesome. LibriVox is an integral part of our commitment to making important texts available to the world in the best format for people, and we are thrilled at the support from the Mellon Foundation.”

Cori Samuel, a long-time LibriVox volunteer, who has recorded some of the project’s more popular books, was in shock at the numbers. “It’s hard to believe that what started out as a small project among some passionate people on the web has turned into something so big. It’s incredible to imagine that we could have touched the lives of 100 million listeners.”

For more information, please contact Hugh McGuire, LibriVox founder: hughmcguire@gmail.com.

Archive-It Team Encourages Your Contributions To The “Occupy Movement” Collection

Since September 17th, 2011 when protesters descended on Wall Street, set up tents, and refused to move until their voices were heard, an impassioned plea for economic and social equality has manifested itself in similar protests and demonstrations around the world. Inspired by “Occupy Wall Street (OWS)”, these global protests and demonstrations are collectively now being referred to as the “Occupy Movement”.

In an effort to document these historic, and politically and socially charged, events as they unfold, IA’s Archive-It team has recently created an “Occupy Movement” collection to begin capturing information about the movement found online. With blogs communicating movement ideals and demands, social media used to coordinate demonstrations, and news-related websites portraying the movement from a dizzying variety of angles, the presence and representation of the Occupy Movement online is both hugely valuable to our understanding of the movement as a whole and constantly in flux and at risk.

The value of the collection hinges on the diversity, depth, and breadth of our seeds and websites we crawl. We are asking and encouraging anyone with websites they feel are important to archive, sites that tell a story about the movement, to pass them along and we will add them to the Occupy Movement collection. These might include movement-wide or city-specific websites, sites with images, blogs, YouTube videos, even Twitter accounts of individuals or organizations involved with the movement. No ideas or additions are too small or too large; perhaps your ideas or suggestions will be a unique part of the movement not yet represented in our collection. IA Archive-It friends and partners are already sending in seeds, which we greatly appreciate.

The web content captured in this collection will be included in the General Archive collection at http://www.archive.org/details/occupywallstreet, which has been actively collecting materials on the Occupy Movement for a few months.

Please send any seed suggestions, questions, or comments to Graham at graham@archive.org.