As might be expected, the Internet Archive has lots of data in its virtual stacks. Besides the books, movies and stored webpages, there are datasets provided from the Internet at large or from individual contributors.
But datasets are just big clumps of data unless someone does something with them. Obviously we’re keeping these around no matter what (our current goal is “forever”), but without folks tinkering, experimenting and using the datasets, they’re just piles clogging up hard drives.
So, in the name of experimentation, we’ve put together one million album cover images from a variety of sources, and put them into this item. The total size is 148 gigabytes (!) of .JPG, .GIF and .PNG images. (There is a torrent on the item, allowing you a more flexible way to download that amount of imagery.)
The albums are split somewhat arbitrarily according to filename, with .TAR (tape archive) files for the letters a, b, c, etc. The goal here is experimentation: these have not been curated or extensively quality-checked, and differently sized duplicates have not been removed. If you’re writing programs or doing analysis, these are the sorts of oddities you should be aware of.
(If you just want to play around a bit, there’s a link to a set of a mere 1200 album covers, for a total of 200 megabytes.)
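As a first look at what one of the .TAR files holds, here is a minimal sketch that tallies the image formats inside an archive. It uses only Python's standard library. The function name `count_image_types` and the extension list are our own illustration rather than anything shipped with the item, and the sketch assumes a file's format can be read off its extension, which may not hold for every file in an uncurated set.

```python
import tarfile
from collections import Counter
from pathlib import PurePosixPath

# Assumption: formats are identified by extension only (.JPG, .GIF, .PNG,
# plus the common .jpeg spelling). Files are not opened or validated.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".gif", ".png"}


def count_image_types(tar_path):
    """Return a Counter of lowercase image extensions found in a tar archive.

    Regular files with any other extension are counted under "other".
    """
    counts = Counter()
    with tarfile.open(tar_path, "r") as archive:
        # Iterating the archive reads member headers one at a time,
        # so nothing is extracted to disk.
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, etc.
            ext = PurePosixPath(member.name).suffix.lower()
            if ext in IMAGE_EXTENSIONS:
                counts[ext] += 1
            else:
                counts["other"] += 1
    return counts
```

Calling `count_image_types("a.tar")` (the filename here is a placeholder for whichever archive you downloaded) returns a `Counter` keyed by extension. Because extensions are lowercased first, `.JPG` and `.jpg` are counted together.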
We’ve included some suggestions for using the data, and some projects that might be interesting to get into, either as a hacking project or just because you’re learning computer science.
Let us know how it works for you!
What a potentially wonderful source. If you could identify the type of album (jazz, for example), you could do some serious research. Since a sleeve combines advertising art, fashion, marketing, and so on, there is a relationship between sleeve and audience, and you could analyze that relationship and how styles developed over time and across national cultures. For jazz, that would let you trace its significance and evolving status over time in different regions. It could be fascinating.