Library Leaders Forum 2015 — Exploring the Future of Digital Libraries

From October 21-23, the Internet Archive will convene our annual Library Leaders Forum 2015 in San Francisco. We’re bringing together an intimate group of leaders from the library world to explore how, together, we can build the digital libraries of the future. It’s a chance to listen to our closest partners, share ideas, launch collaborations, and share new tools.

The Forum kicks off on the evening of Wednesday, Oct 21, with our big Annual Celebration for 500+ friends of the Archive.  This year we will be highlighting the transformative work of our partners, hackers and historians alike, who are doing amazing things with the Archive’s collections.  Guests will also be able to scan a book, listen to a vintage recording, or drop a quarter into a virtual video game in a new  3-D Internet Arcade via the Oculus Rift.

View from outside “Building Libraries Together”–the Internet Archive’s Annual Celebration in Oct. 2014.

The next day, about 50 of our top partners will gather back at the Internet Archive headquarters to get to know each other and our growing Archive staff. On Day One we will share some of the strategic goals and projects the Internet Archive is pursuing, and break into Roundtables to dive deeper. The goal is to find ways to collaborate and fine-tune our plans with our partners’ needs in mind. Next, we’ve planned a series of hands-on workshops that will enable partners to test drive our new Table Top Scribe, film digitization lab, data visualization tools, and upload and download features. Researcher Kalev Leetaru will demonstrate new ways to visualize and analyze the texts of more than 8 million books. We will also break into small groups to have lunch at the staff’s favorite neighborhood restaurants, and end the evening with a picnic in the Presidio surrounded by San Francisco’s best food trucks.

On October 23rd, we plan to gather in the Archive’s Great Room to hear a keynote presentation from Founder and Digital Librarian Brewster Kahle. He’ll share his vision for bringing entire libraries digital and some of the tools we’re building to make that a reality. Several of our partners, including Tom Blake of Boston Public Library, will share the projects that are moving digital libraries forward; experts in copyright will weigh in on the policies that support their work. Silicon Valley CTO Greg Lindahl will unveil Project Visigoth–which applies modern web search technology to IA’s 3 million digital texts. Stuart Snydman, Associate Director of Digital Strategy at Stanford University Libraries, will dive into the International Image Interoperability Framework, showing us the dynamic potential to enrich the presentation of our visual works.

At the end of the day, the Library Leaders Forum is not a conference, and not your typical library meeting.  This invitation-only event is meant for top managers and library leaders to chart a path for the digital future of our organizations.  Situated in San Francisco, we are surrounded by the builders of new tools and platforms–so it is also a peek into what lies ahead.  Our goal: to listen, to share, and to build something enduring together.

NOTE:  Library partners– check your email for an invitation to register.

Posted in News | 2 Comments

2016 Political TV Ad Tracker: with Analysis & Fact-checking Citizens Can Trust

The Internet Archive is honored to receive today a Knight News Challenge grant to support our collaborative efforts to help citizens make sound decisions in the 2016 U.S. elections, in the best interests of themselves, their communities and future generations.

Experts are predicting 2016 election spending will be double, or more, that of 2012. Much of that money will be spent on TV advertising. Local stations across the country will be raking in enormous sums to air these ads. But how well will the stations educate us on the issues and offer critical analysis? If not they, then who?

To help citizens navigate their way towards informed choices amidst the flood of political messaging, we will be building on journalism partnerships to present digital library reference pages for political ads. Our journalism launch partners include Politifact and the Center for Public Integrity.

We will be capturing all TV programming in select 2016 primary election locales, front-loaded to reflect early-state candidate winnowing. We hope to apply lessons learned during the primaries to key general election battleground states in the fall. In addition to our regular TV news research library interface, we’ll be creating an online reference page for each unique-content political ad. These pages will present journalist fact-checking and other analysis. Accompanying these assessments will be information about ad sponsors, campaign financial transparency data, and dynamically updated tracking of each ad’s plays, including frequency, locale, etc.

Our 2016 Political Ad Tracker project is informed by extensive collaborative experiments conducted during the 2014 general elections in the Philadelphia region, where there were a number of hotly contested Congressional and state elections. For more on these pilot collaborations, see Philly Political Media Watch Project and Political Ads Win Over News 45 to 1 in Philly TV News 2014.

We are continuing to refine our approaches to facilitating advanced analysis of regional inventories of television political ads.  To get a sense of the degree of their granularity, explore this interactive search visualization, created by Kalev Leetaru, derived from last year’s experiments: Philly 2014 Political Ad Trends Viewer.

Another outgrowth of our political ad experiments last year was applying audio fingerprinting to algorithmically find all other instances of an ad, once a single one had been identified. We used the audfprint tool developed by Dan Ellis at the Laboratory for the Recognition and Organization of Speech and Audio at Columbia University.

The Internet Archive and Kalev Leetaru recently took ad-finding a step further and prototyped a new way of tracking “memes” on television.  For example, everyone can now chart how the President’s 2015 State of the Union address was excerpted and discussed across U.S. and select international television over the following two weeks.  You could think of it as a TV news seismometer, tracking the propagation of key news sound bites throughout complex TV news media ecosystems, including the context in which they were presented.  We expect to apply this approach to 2016 election debates, speeches, etc.

We are humbled by the challenge of getting the word out about how our Political Ad Tracker information resources can be used. As librarians, archivists, and technologists, market outreach is not our strength. We’d like your help.

We are incredibly excited about the prospect of working in concert with diverse journalists, scholars and civic organizations. Together, we hope to help balance the forces of Big Money with reason & insight, resting on sound data, and to inform and engage citizens better than ever before!

We are deeply appreciative of the Knight Foundation, Rita Allen Foundation, Democracy Fund and Hewlett Foundation for their support!

Posted in Announcements, News, Television Archive | 10 Comments

Tracking Politics on Television: Campaign Advertising and the State of the Union Going Viral

Today the GDELT Project and the Internet Archive debut two exciting new interactive visualizations of the TV News Archive, one tracing the flow of money through campaign advertising in Philadelphia in the 2014 election cycle, and the other introducing a whole new way of tracing what “goes viral” on television by charting how the President’s 2015 State of the Union address was excerpted and discussed across American and select international television over the following two weeks.

Media & Money: Political Advertising in Philly’s 2014 Races


As part of the Philly Political Media Watch Project, from September 1, 2014 through the election of November 4, 2014, 7 television stations in the Philadelphia market were monitored to identify all politically related advertisements. In all, 74 distinct political advertisements were identified, which collectively aired 13,675 times during the 65-day monitoring period, with Archive staff scoring them for the time each devoted to supporting, attacking, and defending a candidate. A combination of human review and computerized analysis was used to identify every broadcast of each of the 74 ads over the 65 days, along with the sponsor paying for that particular airing. The end result is an interactive visualization that allows you to explore the television advertising landscape of Philadelphia last fall, comparing any pair of candidates, parties, races, status, win/lost, sponsor, sponsor type, television channel, or even keywords found in the transcripts, or any combination thereof. The ability to exhaustively identify every single airing of a political advertisement during the key campaigning period and determine who paid for each broadcast offers an incredible new tool for understanding the impact of media and money in the political campaigning process.

For example, you can compare ads focusing on Tom Corbett that were paid by Tom Corbett for Governor vs those paid for by Tom Wolf for Governor. Or, compare all ads mentioning the two candidates from any sponsor.  Or ads focusing on candidates that ultimately won vs lost. Or, compare the ads run by the Philadelphia Federation of Teachers vs those run by the House Majority PAC. Or, those mentioning “school” vs “job” in the transcript of the ad. Or, simply, view the overall trends for all 13,675 advertisement airings.

A New Approach to Measuring Virality on Television: State of the Union 2015


Turning from local to national television, the second visualization explores how American and select international television excerpted and discussed the President’s January 20, 2015 State of the Union (SOTU) speech. The social media era has profoundly altered the political communications landscape, ushering in a fixation on tracking emerging political “memes” and which pieces of political discourse are “going viral” at the moment. Yet, we lack metrics for measuring what “goes viral” on television – a critical gap considering that television is still a dominant source of political news for 37% to 60% of Americans. Thus, the “State of the Union 2015: Tracking ‘Going Viral’ on Television” project was born to prototype a brand-new way of tracking “memes” on television – the ability to take a speech or other television show, select a short clip of it, and instantly see every instance of that clip that was aired anywhere across the landscape of the world’s television monitored by the Archive.

Using the audfprint tool developed by Dan Ellis at the Laboratory for the Recognition and Organization of Speech and Audio at Columbia University, the 2015 State of the Union speech was broken into sentence-long soundbites, with each soundbite scanned against all news television shows archived by the Internet Archive from the evening of the January 20, 2015 speech through February 4, 2015 (two weeks later). The non-commercial audfprint tool scans the audio track of each show, so it is not dependent on closed captioning, which is extremely noisy and entirely absent from many foreign language broadcasts.  The tool is also extremely sensitive, able to detect brief excerpts even when they are overdubbed by a commentator and/or other sound effects. In total, 13,082 news shows totaling 649 hours of programming were scanned, and excluding “gavel-to-gavel” coverage (broadcasting the entire speech from start to finish), 208 distinct shows played an excerpt from the speech over 524 broadcasts.  An interactive visualization allows you to scroll through the speech passage by passage to see how each was excerpted and discussed and you can even watch short preview clips of each mention.

What you are seeing here is a first glimpse of a whole new way of exploring television, using enormously powerful computer algorithms as a new lens through which to explore the Internet Archive’s massive archive of television news.

Posted in News, Television Archive | Comments Off on Tracking Politics on Television: Campaign Advertising and the State of the Union Going Viral

You are invited: SF Premiere of “Life on Bitcoin” Documentary


Join us Wednesday, July 22nd from 6:30-10:30 p.m. for the San Francisco Premiere of the documentary film “Life on Bitcoin,” at the Internet Archive.

The film covers the experience of newlyweds Austin and Beccy Craig who struggle for 100+ days to live entirely on the upstart currency.

Austin and Beccy will attend and conduct a Q&A session following the film.

The couple began the experiment when their credit cards, debit cards and cash were confiscated at the airport upon arrival from their honeymoon.

“It felt like learning to swim by jumping in the deep end of the pool,” said Austin Craig.

For the next three months, the Craigs tested Bitcoin (and their marriage) by relying solely on the cryptocurrency for every expense, including gas, rent, groceries, speeding tickets, and insurance. For every transaction, they had to evangelize the currency to survive. Ultimately their adventure took them on a road trip across the United States and into Europe and Asia.

“Honestly, when I first heard that Beccy and Austin were going to live on bitcoin for three months, I was pretty worried for them” said Kashmir Hill, senior editor at Hill would know the difficulty of living on bitcoin. She tried it herself for a week in May 2013 in San Francisco, and wrote about her experiences for Forbes. The challenges of just one week were clear. “I lost 5 pounds and had to move out of my house, but I survived.” she concluded in her column.

This film sheds light on the practical strengths and early limitations of bitcoin technology and mixes it with a large dose of entertainment and fun.

All are welcome – the “bitcoin-curious” as well as the long-time fan.

DATE: Wednesday, July 22
TIME: Doors open for mixing and refreshments at 6:30 PM. Seating begins at 8:15. The Archive has large windows, and the film will begin after sunset at 8:30.
LOCATION: Internet Archive, 300 Funston Ave, San Francisco, CA 94118

Buy tickets here (Pay What You Can, with a $5 minimum or the equivalent in bitcoin).

Pay with bitcoin link is available here.

Posted in Announcements, News | 12 Comments

Experimenting with One Million Album Covers

Rising to the challenge to create an image search engine using a corpus of one million album covers, Professor Trenary of Western Michigan University led a class project that found many exact matches (same file) and many near matches.

Their algorithm matched some that were not the same because it used rough shape matching, and many images were just of the CD or LP label which matched.

While not at a point of being ready for production use for the Archive, they wrote a nice report on their findings that might be useful to others.   The Internet Archive hopes to enable many more studies using the data in the collection.

Thank you to Brandon Arrendondo,  James Jenkins, Austin Jones, and Professor Trenary.

Posted in Audio Archive | Comments Off on Experimenting with One Million Album Covers

NEW at the Archive Store! MS-DOS “Game Not Over!” T-Shirt


Designed by Jason Scott, the new “Game Not Over!” T-shirt is a celebration of the over 2,000 MS-DOS games that are once again available to play on The shirt is currently available in a number of sizes at the Internet Archive store.

All proceeds go to the Internet Archive. Go to to get yours.

Posted in Cool items, Emulation, Games, News, Software Archive | 2 Comments

Experiment with One Million Album Covers

As might be expected, the Internet Archive has lots of data in its virtual stacks. Besides the books, movies and stored webpages, there are datasets provided from the Internet at large or from individual contributors.

But datasets are just big clumps of data unless someone does something with them. Obviously we’re keeping these around no matter what (our current goal is “forever”), but without folks tinkering, experimenting and using the data sets, they’re just piles clogging up hard drives.

So, in the name of experimentation, we’ve put together one million album cover images from a variety of sources, and put them into this item. The total size is 148 gigabytes (!) of .JPG, .GIF and .PNG images. (There is a torrent on the item, allowing you a more flexible way to download that amount of imagery.)

The albums are somewhat arbitrarily split according to filename, with .TAR (tape archive) files for the letter a, b, c, etc. The goal here is experimentation – these have not been curated or closely quality-checked, and differently sized duplicates have not been removed. If you’re writing programs or doing analysis, these are the sorts of oddness or strangeness you should be aware of.
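
As a starting point for that kind of analysis, here is a minimal Python sketch of how one might scan one of the .TAR files for byte-identical covers. It is not part of the item itself: the function names and the tiny in-memory demo archive are invented for illustration, and hashing only finds exact copies, so the same cover saved at a different size will not be caught.

```python
import hashlib
import io
import tarfile
from collections import defaultdict

# Extensions mentioned in the post; anything else in the archive is skipped.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".gif", ".png")


def find_exact_duplicates(tar_source):
    """Group image members of a tar archive by the SHA-256 of their bytes.

    tar_source is a seekable binary file object. Returns a list of groups,
    each a sorted list of member names whose contents are identical.
    """
    by_digest = defaultdict(list)
    with tarfile.open(fileobj=tar_source, mode="r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue
            if not member.name.lower().endswith(IMAGE_EXTENSIONS):
                continue
            data = archive.extractfile(member).read()
            by_digest[hashlib.sha256(data).hexdigest()].append(member.name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


def make_demo_tar():
    """Build a tiny in-memory tar (hypothetical names) to try the function on."""
    buffer = io.BytesIO()
    files = [
        ("a/one.jpg", b"same bytes"),
        ("a/two.jpg", b"same bytes"),
        ("a/three.png", b"different bytes"),
        ("a/notes.txt", b"same bytes"),  # not an image, so ignored
    ]
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in files:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


print(find_exact_duplicates(make_demo_tar()))
```

To run it on a real download, open the file in binary mode and pass the file object in; the file name you use depends on which archive you fetched.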

(If you just want to play around a bit, there’s a link to a set of a mere 1200 album covers, for a total of 200 megabytes.)

We’ve included some suggestions for using the data, and some projects that might be interesting to get into, either as a hacking project or just because you’re learning computer science.

Let us know how it works for you!


Posted in Announcements, News | 1 Comment

The first Netlabel Day – Join the event

The Internet Archive has a large (over 58,000 items) and growing collection of netlabels. Recently we received a message asking us to help announce a new global event, Netlabel Day. Please support it if you are part of the netlabels world.

Record Store Day was created in 2007 to celebrate record stores in the USA and the rest of the world. In that celebration, independent bands and labels release music on vinyl exclusively for that day, seizing on the revival of the format. This was the basis of Netlabel Day, a sort of distant relative of RSD, which aims to establish a new tradition of releasing digital music every 14 July from now on.

This initiative was born in Chile thanks to Manuel Silva, from M.I.S.T. Records, and it brings together more than 50 labels from all over the world. All genres are present: rock, pop, electronic, noise, ambient and many, many more, free and just for you.

We will upload every single release on, because we love this platform. We always use it and we’ve never experienced any issues with it. Every album will be available for free in WAV and FLAC via direct download, or by torrent as well.

The most important thing is to include everyone in this idea. We will close the call on June 1, so if you have a netlabel and you want to be part of this, please email us to If you are an independent artist without a label, you can release your music with us too and be heard by every participating netlabel, so just contact us from May 15 to June 1.

Everyone is invited. Be part of this madness!


Posted in Audio Archive, Event, News | 1 Comment

Thank you, Robert Miller, for 2.5 million Books for Free Public Access

I am both sad and happy that Robert Miller has accepted another position and so will be leaving the Internet Archive after 10 years of fantastic achievements. He joined to help create a mass movement of libraries bringing themselves digital by scanning books, microfilm, and other media. He has succeeded in doing this by creating positive relationships and distributed teams, working in 30 libraries in 8 countries, to help libraries go digital.

And thank you to Robert for building organizational and partnership structures that will continue to bring more collections online, long into the future. His endless energy and ability to forge long-term relationships to create processes that are both efficient and library-careful have been miraculous to behold. The future looks bright and brighter because of his work.

Working with 1000 contributing libraries, the Internet Archive has digitized and offered free public access to over 2.5 million literary works. We are now on our way to the goal of 10 million books, being served by our sites and the sites of thousands of libraries.

With thousands of libraries serving digital materials in new and different ways to their different communities, we can achieve the diverse but coordinated access and preservation opportunity of our digital age. We look forward to the next steps in the programs that have been started with gusto and relish.

Thank you, Robert. We expect more great things in coming years.

Founder, Digital Librarian

Posted in Books Archive, News | 5 Comments

Making Your DOS Programs Live Again at the Internet Archive

Since the beginning of the year, the Internet Archive has been making a large number of DOS-based games and programs run in the browser, much like our Console Living Room and Internet Arcade collections. Many thousands of people have stopped by and tried out these programs, enjoying such classics as Llamatron 2112 or Dangerous Dave. With countless examples of DOS programs spanning 30 years, there’s lots of great software to try out and experiment with. Here’s a great place to start.

If you want to just try out the software, we’re done here. Go into our stacks and have a great time!

However, some people have asked about adding DOS software they created or which they have which isn’t part of our collections, and especially how to make these programs boot in a window like our currently available programs do.

This is a quick guide to getting your DOS programs up and emulating in the browser. If any of these instructions are unclear to you, please contact the Software Curator at

Please note: these instructions are for DOS programs, not Windows programs.

First, you should register for your Internet Archive library card if you haven’t already.

Next, you should upload your DOS software as a .ZIP file. It is important that your program and any support files be inside a single .ZIP file and not uploaded separately.

When you upload, you’ll be asked to fill out all sorts of information about your program. Be sure to be as complete as possible, including the description, date of creation, who the author or authors were, and so on. You’re the curator of this software – help the world understand why they should look at it!

Set the “Collection” to Community Software.

Then, at the bottom of this upload screen, there is an “add additional metadata” option.

Add these two metadata pairs:

  • Set “emulator” to “dosbox”.
  • Set “emulator_ext” to “zip”.

Finally, and this is very important … inside the .ZIP file you uploaded is the program that starts the program running. It might be an .EXE, .BAT or .COM file.  For example, if your ZIP file has a single file in it, called LEMON.EXE, then that’s the program that “starts” your program.

  • Set “emulator_start” to this program.
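
If you would like to double-check these three values before uploading, a small script can do it. The following is only a sketch, not an official Archive tool: the function names are made up, the exact-name comparison and the extension check are assumptions, and the only facts it relies on are the metadata pairs listed above and the rule that the start program must be inside the .ZIP.

```python
import io
import zipfile

# The post says the start program might be an .EXE, .BAT or .COM file.
START_EXTENSIONS = (".exe", ".bat", ".com")


def build_emulation_metadata(zip_source, start_program):
    """Return the three metadata pairs after checking the start program.

    zip_source is a path or binary file object for the .ZIP to be uploaded.
    start_program is compared exactly as the name is stored in the .ZIP.
    """
    with zipfile.ZipFile(zip_source) as archive:
        names = archive.namelist()
    if start_program not in names:
        raise ValueError(f"{start_program!r} is not inside the .ZIP: {names}")
    if not start_program.lower().endswith(START_EXTENSIONS):
        raise ValueError(f"{start_program!r} is not an .EXE, .BAT or .COM file")
    return {
        "emulator": "dosbox",
        "emulator_ext": "zip",
        "emulator_start": start_program,
    }


def make_demo_zip():
    """Build a one-file .ZIP in memory, mirroring the LEMON.EXE example."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("LEMON.EXE", b"placeholder bytes, not a real program")
    buffer.seek(0)
    return buffer


print(build_emulation_metadata(make_demo_zip(), "LEMON.EXE"))
```

The dictionary it prints holds the same three pairs you would type into the upload form by hand.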

After double-checking your work, click on “Upload and Create your Item” and the system will upload your program to the Archive, and if all goes well, your program will be emulated in our pages after a few minutes.

Again, if you have any questions or experience any issues, contact Jason Scott, the software curator at the Archive, at

Let’s bring the DOS prompt back! And let a thousand programs bloom!


Posted in Software Archive | Comments Off on Making Your DOS Programs Live Again at the Internet Archive

Help Free PACER–Cast your Vote for Free Court Records at the Internet Archive this Friday!

Internet activist and founder of, Carl Malamud is launching a national campaign to free millions of court documents in PACER–Public Access to Court Electronic Records–the technologically backwards federal electronic system that charges Americans 10 cents per page to access court files in the public domain. This Friday, you can come by the Internet Archive “polling place” at 300 Funston Avenue, San Francisco from 8 a.m. to 5 p.m. to “cast your vote” for free court records. Carl will be on hand with inspiring postcards addressed to Chief Judge Thomas of the Ninth Circuit Court of Appeals. By sending His Honor hundreds of handwritten postcards asking him to grant a PACER fee exemption, we can save taxpayers millions of dollars, while freeing court documents crucial to understanding and interpreting the law.

This is just one prong in a multi-faceted campaign to free PACER.  Carl outlines Friday’s strategy in a memorandum of law called, “Yo, Your Honor.”  His request of us:

May 1 is Law Day, and I’m asking people to come in and write a brief postcard about why you think that access to PACER is important. More specifically, you’ll be writing a postcard to Chief Judge Thomas of the Ninth Circuit of the U.S. Court of Appeals in support of my request that the Court grant us free access to PACER for several courts in the Ninth Circuit. It would be a really big deal if the Court said yes, we’re trying to show public support in a way the judges can relate to.

You can also send your postcard directly if you can’t make it to the Internet Archive on Friday:

Clerk of the Court
Attn: Docket 15-80056
United States Courts of Appeals
James Browning Courthouse
95 7th Street
San Francisco, CA 94103


In 2008, Aaron Swartz downloaded millions of PACER documents, and worked with Malamud to make them accessible for free on the Internet Archive through the RECAP Project.  This is just one more step toward providing everyone with free access to all knowledge–the great promise of the Internet and our mission at the Internet Archive.



Posted in Announcements, News | 7 Comments

The Evolving Internet Archive


The new site

The new version of the site has been evolving over the past 6 months in response to the feedback we’ve received from thousands of our awesome users.

If you haven’t been following along, you can review a little bit of the journey through these blog posts:

Why change the site at all?  The posts above help answer that, but in brief:

  • 35% of our ~3 million daily users are on mobile/tablet devices, and the classic site is not easy to use on small formats.
  • The new tools we want to offer our users would be difficult to implement in the old site architecture.
  • The classic site was built a long time ago, using methods that are outdated.  Finding programmers who have the skills to work in that environment is becoming increasingly difficult, and the ramp up time for new employees is painful.  The redesign has given us an opportunity to start pulling the front end (what you see) apart from the back end, so they can evolve separately.

Chart of the percent of users viewing the new site: blue represents people in classic (v1), red represents people in the new version (v2).

Currently about 85% of users are in the new version. Over the next few weeks we will be asking the remaining 15% to try it out. For the time being, users will be able to exit the new version and return to the “classic” version, but the classic will not always be available or supported, so please give the new version a try and give us feedback if there are things on the site that you don’t like, can’t find, or that seem like bugs. (When you click “exit” you will have an opportunity to give us feedback.)

We have made several video tours that introduce you to the new site. I recommend starting with the site tour, below.


The original download button

In the past few months we have received more than 16,000 feedback emails from people using the new version.  The redesign team reads every single one of them.  Some just say, “I love it!” and some immediately say, “I hate it!”  But a great many of you have also taken the time to share a little more – something you missed from the old site, a question about the new tools, concern about accessibility, suggestions for how to adjust things, etc.

Download menu open by default

We took that input — along with information from user tests, interviews with some of our power users, chats with partners — and tried to identify areas of the interface that seemed to be working well, and other areas that were not.

The evolution of downloading files from items is a great example of the process we’ve been following.  The original design for item pages de-emphasized download as a feature. Our conversations with users told us that most people wanted to hit a play button, not download a file.

You could still download in the original design, of course, but you had to click a button to get options and then click again if you wanted specific files.

But when we opened the new site up to more users, we got many comments from people who either disliked the extra clicking, didn’t like leaving the page to get individual files, didn’t understand what the options represented, or couldn’t find the download options at all.

The first thing we tried was just opening up the download menu by default.  Instead of just seeing the black download button on the page, you now also saw a menu of options.  More people saw the download, but feedback made it clear that users still had issues.

What if we make it blue?  (Nope!)

We thought perhaps if we increased the visibility of the download options by turning the Download header blue that people would see it faster.  We did an A/B test with 50% of users seeing each option — neither option really won.  And the feedback about this feature continued to be negative.
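
For readers curious how a 50/50 split like this can be done, one common approach is to hash a stable visitor identifier so that each person always lands in the same group. The sketch below illustrates that general technique only; it is an assumption, not a description of the Archive's actual code, and the identifiers, experiment name, and variant names are invented.

```python
import hashlib
from collections import Counter


def assign_variant(user_id, experiment, variants=("black_header", "blue_header")):
    """Deterministically map a visitor to one variant of an experiment.

    Hashing the experiment name together with the user id means the same
    visitor always sees the same variant, and different experiments split
    the audience independently of one another.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


# Simulate 10,000 hypothetical visitors and count how the split comes out.
counts = Counter(
    assign_variant(f"visitor-{n}", "download-header-color") for n in range(10_000)
)
print(counts)
```

With two variants the counts should come out close to even, which is what lets the feedback from each half be compared fairly.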

It became clear that we needed to rethink the design of the download options altogether, trying to keep it clean-looking and easy to use while also satisfying the concerns of our most advanced users.

We set some goals for the download changes based on the feedback we had received:

  • must be able to download an individual file without leaving the item page
  • if there is only one file in a particular format, you should only need one click to download it
  • improve the ability to download groups of files (e.g. “just give me all the FLAC files”)
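
The second and third goals both come down to grouping an item's files by format. Here is a minimal sketch of that idea, using made-up file names and treating the file extension as the format; this is a simplification for illustration, and the site itself may classify formats differently.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_format(filenames):
    """Group file names by lowercase extension, e.g. 'flac' or 'jpg'."""
    groups = defaultdict(list)
    for name in filenames:
        suffix = PurePosixPath(name).suffix.lower().lstrip(".")
        groups[suffix or "(none)"].append(name)
    return dict(groups)


def one_click_formats(groups):
    """Formats with exactly one file, which could be a single-click download."""
    return sorted(fmt for fmt, names in groups.items() if len(names) == 1)


# Hypothetical contents of an audio item.
files = ["01-intro.flac", "02-song.flac", "01-intro.mp3", "02-song.mp3", "cover.jpg"]
groups = group_by_format(files)
print(groups["flac"])             # "just give me all the FLAC files"
print(one_click_formats(groups))  # formats that need only one click
```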

The current version of downloads allows you to consume individual media files without leaving the page and gives you a lot more options for downloading groups of files from an item.  Since we released the new Download Options feature, the negative feedback about this feature has dropped off almost entirely.  So we think we’re on the right track!  We have created a short video tour for the downloads feature if you want to learn more.

New Download Options feature, illustrating how to display individual files

The download changes are just one example of how much your feedback has helped us identify areas of confusion on the site and understand how to improve things.  Here are a few more examples:

  • A-Z filters available when sorting by title or creator
  • better experience for people with javascript disabled
  • fixes to improve software emulation
  • default search results to List view (instead of image-based Thumbnail view)
  • pull user page images from gravatar if available (if user has not uploaded one)

We have a lot more in store for the new site – better accessibility for people with visual impairments, tools for creating your own collections, improved playback for multimedia items, etc. As these features trickle into the site, we hope you will continue to share your questions and ideas with us – you are truly helping us to make the archive a better place for everyone.

This project receives support from the John S. and James L. Knight Foundation’s Knight News Challenge.

Posted in Announcements, Archive Version 2, News | 31 Comments

Two Grants Announced Supporting Web Archiving

We are excited to announce Internet Archive’s participation in two new grant-funded collaborative projects to advance the field of web archiving! Our Archive-It service, which works with libraries, archives, museums and others to provide the tools for institutions to create their own web archives, will partner with New York University and Old Dominion University on two separate areas of work. We thank both The Andrew W. Mellon Foundation and the Institute of Museum and Library Services (IMLS) for their recognition of the value of web archiving and their support for the continued development of tools and initiatives to expand the quality, accessibility, and extensibility of these collections. We also thank our awesome collaborative partners on these projects, New York University Libraries, NYU’s Moving Image Archiving and Preservation (MIAP) program, and Old Dominion University’s Web Science and Digital Libraries Research Group and look forward to working with them as part of our broader initiative for “Building Libraries Together.”

For the project “Archiving the Websites of Contemporary Composers,” led by NYU Libraries and funded with a grant of $480,000 from The Andrew W. Mellon Foundation, we will work with the Libraries and MIAP.  This project will archive web-based and born-digital audiovisual materials, and research and develop tools for their improved capture and discoverability. Contemporary musical works, as well as the rich secondary materials that accompany them, are increasingly migrating to the web. We outlined a number of current challenges to capturing and replaying online multimedia, such as dynamic and transient URL generation and adaptive bitrate streaming, as well as a need for continued research and development around the integration of web archives and non-web collections.

We have two specific pieces of work in the grant. First, we will build tools to improve the crawling and capture of web-based audiovisual materials, addressing the increasing complexity of streaming audiovisual materials, especially on third-party hosting and sharing platforms. This development work will build on our experience creating “Heritrix helper” tools like Umbra. Our second area of work will explore methods to integrate discovery of high-quality, non-web multimedia content held in external repositories into the Archive-It platform. Linking Archive-It collections with non-web institutional content has great potential to integrate web and non-web archives. This work will build on NYU’s creation of an API for their preservation repository, our increased use of API-based systems integration in Archive-It 5.0, and our continued work on improved content discovery for web collections. See NYU’s press release for more details.

The second recently announced grant project is being led by Old Dominion University’s Web Science and Digital Libraries Research Group, which received a $468,618 National Leadership Grant for Libraries from IMLS for the project, “Combining Social Media Storytelling With Web Archives” (grant number LG-71-15-0077). Readers not familiar with ODU’s great history of research and development around web archives are encouraged to check out projects such as WARCreate/WAIL, their work on visualizations and Archive-It, and our recent favorite, the #whatdiditlooklike tool. In this project ODU will be building tools and processes to assimilate user-focused, online storytelling methods, such as Storify, to 1) summarize existing collections and 2) bootstrap new or expand existing web archive collections. The project will provide new ways to create unique topical and thematic collections through URLs shared via social media and storytelling platforms.

We will be working with them to integrate these tools in Archive-It, conduct user testing and training, and explore other ways that storytelling and user-generated materials can help build narrative pathways into large, often diffuse, collections of web content. We are excited to work with ODU and continue our increased focus on new models of access for web archives, as many institutional web collections are now of a breadth, volume, and operational maturity to begin focusing on novel ways their web archives can be studied and better understood by users and researchers.

Thanks again to the Mellon Foundation and IMLS for supporting these cooperative efforts to advance web archiving. We are excited to work with our great partners and the broader community to keep preserving and expanding access to the rich historical and cultural record documented on the web.

Posted in Announcements, Archive-It | 3 Comments

Will We Let Congress Vote to Fast-Track Secret Trade Deals?

Yesterday, Republican Orrin Hatch and Democrat Ron Wyden introduced legislation in the US Senate that would enable Congress to fast-track approval of secret trade agreements. The timing is important because the President is currently pushing for the approval of the Trans-Pacific Partnership, an agreement negotiated in secret meetings with international lawmakers that has serious ramifications for a host of important issues, Internet privacy and intellectual property among them.

We are worried that Congress’ and the public’s ability to review, discuss, and debate proposed agreements would be significantly limited by this bill. It would also force Congress to have a strict yes/no vote on the presented agreement, with no ability to make amendments beforehand.

The impacts of these agreements and the international rules that they impose upon citizens and Internet users across the globe are too sweeping to be coordinated behind closed doors and then presented in a short window for a straight up and down vote.

There is still time for concerned individuals and organizations to resist this push, as we did with SOPA, PIPA, and the threat to net neutrality.

For more information and organized ways to take action, see the Electronic Frontier Foundation’s write-up and the Internet Vote campaign.

Posted in Announcements, News | 3 Comments

Internet Archive and CADAL Partner to Digitize 500,000 Academic Texts

The Internet Archive and the Chinese Academic Digital Associative Library (CADAL) are pleased to announce that 500,000 English-language academic books will be digitized through a partnership that leverages strengths from both organizations. This furthers an initiative begun in 2009, The China-US Million Book Digital Library Project, seeking to bring one million texts into the public domain.

“We are working together with a valuable global partner, CADAL, to create a digital library of high quality, academic, eBooks for use in China, North America and the world at large; I couldn’t be happier!” Robert Miller, General Manager of Digital Libraries for the Internet Archive, remarked on the collaboration.

The Chinese Academic Digital Associative Library (CADAL) is a consortium of over 70 Chinese university libraries. CADAL will provide access to a leading set of libraries, the technical resources to display and share the books inside China, and the staff needed for digitization. The Internet Archive will select the books and provide equipment and processing resources. Both organizations will offer access and discovery tools for both scholars and citizen-scholars. Together, CADAL and the Internet Archive are contributing to a growing, global digital library.

Chen Huang, Digital Librarian and Deputy Director of Administrator Center for CADAL, shared the vision for the project: “We are pleased to be working with the Internet Archive. Together, we have developed a program that will allow Chinese university students to have access to materials that will enhance both specific knowledge, and exposure to broad trends and ideas.”

This phase of the partnership will last about 3 years and involve teams in the US; Shenzhen, China; and Zhejiang University in Hangzhou, China.

The Internet Archive is a non-profit library with over 6 million texts online and a popular global website, with 34 million downloads a month. Its mission is “Universal Access to All Knowledge.”

Contact for more information.

The China Academic Digital Associative Library (CADAL) is a long term project of the Ministry of Education of China. The consortium aims to construct an academic digital library with high-level technology and abundant digital resources that are multidisciplinary, multilingual, and categorically diverse.

Contact for more information.

Posted in Books Archive, News | 5 Comments

Sharing Data for Better Discovery and Access

The Internet Archive and the Digital Public Library of America (DPLA) are pleased to announce a joint collaborative program to enhance sharing of collections from the Internet Archive in the DPLA.

The Internet Archive will work with interested libraries and content providers to help ensure their metadata meets DPLA’s standards and requirements. After their content is digitized, the metadata would then be ready for ingestion into the DPLA if the content provider has a current DPLA provider agreement.

The DPLA is excited to collaborate with the Internet Archive in this effort to improve metadata quality overall, by making it more consistent with DPLA requirements, including consistent rights statements. Better data means better access. In addition to providing DPLA-compliant metadata services, the Internet Archive also offers a spectrum of digital collection services, such as digitization, storage and preservation. Libraries, archives and museums who choose Internet Archive as their service provider have the added benefit of having their content made globally available through Internet Archive’s award-winning portals, and

“We are thrilled to be working with the DPLA”, states Robert Miller, Internet Archive General Manager of Digital Libraries. “With their emphasis on providing not only a portal and a platform, but also their advocacy for public access of content, they are a perfect partner for us”.

Rachel Frick, DPLA Business Development Director says, “The Internet Archive’s mission of ‘Universal Access to All Knowledge’, coupled with their end-to-end digital library solutions complements our core values.”

Program details are available upon request. Please contact:
Rachel Frick – DPLA Business Development Director,
Robert Miller – General Manager of Digital Libraries,

Posted in Announcements, Books Archive, News, Open Library | 1 Comment

You are invited to a Party for GETDecentralized–Wednesday April 1 at the Internet Archive

Help Us Lift the Fog on Decentralization!

The GETDecentralized community wants to do something fundamental: “To transform bureaucratic hierarchies into technology-driven networks” (Fred Wilson).

The Internet Archive and Jolocom invite you to GETDecentralized! An evening of conversation, celebration and community-building around new ideas in decentralization.

GETD Party will be Wednesday, April 1st at the Internet Archive in San Francisco!

Location: The Internet Archive, 300 Funston Avenue, San Francisco, CA 94118

6:00 — 7:00 pm, Reception
7:00 — 7:30 pm, Speakers (including Brewster Kahle and Markus Sabadello)
7:30 — 8:30 pm, Reception and Tours of the Internet Archive

Markus Sabadello, a long-time decentralization activist and hacker, will take us on a tour of the new technologies of decentralization. Learn what “decentralization” means and how we can all benefit from it. Markus runs his own open-source effort “Project Danube,” which is based on XRI/XDI technology and experiments with user-centric identity, personal data storage and Vendor Relationship Management.

Also, Brewster Kahle, Founder & Digital Librarian, Internet Archive, will share his ideas about “Locking the Web Open” through decentralized technologies. He’ll lead a tour of this digital universal library: 20 Petabytes of our culture’s books, films, music, software and Web pages. Hope to see you next Wednesday!

RSVP Today!

Posted in News | Comments Off on You are invited to a Party for GETDecentralized–Wednesday April 1 at the Internet Archive

Political Ads Win Over News 45 to 1 in Philly TV News 2014

[press: Columbia Journalism Review, USA Today, BloombergPolitics, Washington Post]

Study finds 842 minutes of political ads compared to 18.7 minutes of political news stories in a large sample of Philadelphia TV news programs archived by the Internet Archive in a joint project.

In the closing eight weeks of the 2014 campaign, political candidates and outside groups bombarded viewers of Philadelphia’s major TV stations with nearly 12,000 ads designed to sway voters in the Nov. 4 elections. But the stations that benefited from political advertisers’ $14 million spending spree also appear to have devoted little time to political journalism. A study of a representative sampling of newscasts on those stations put the ratio of time devoted to political advertising to time spent on substantive political news stories at 45:1.

Political Ads & Local TV News – Philly 2014, by Danilo Yanich

These are the findings of a University of Delaware team led by Associate Professor Danilo Yanich. The university’s Center for Community Research and Service researchers collaborated with the Internet Archive, The Sunlight Foundation, and the Committee of Seventy, the 100+ year-old Philadelphia-based political watchdog organization.

Our joint pilot project, Philly Political Media Watch, worked to open a library of all television news from stations based in and around Philadelphia and index the political ads presented in their newscasts. The ads were joined with information on who paid how much for them.  The Sunlight Foundation was able to unearth those financial data, which were buried in the PDF disclosures every TV station is required to submit to the Federal Communications Commission. The experimental project was supported by individual contributors and grants from the Democracy Fund and the Rita Allen Foundation.

The Philadelphia television market was chosen as a 2014 laboratory to explore the interaction between news media and political money, and to learn lessons that could be taken to scale across the nation in 2016. The Philadelphia region is the nation’s 4th largest TV market, 19% African American, and includes parts of three states. In 2014, important contests in the region included races for Pennsylvania governor, a Delaware U.S. Senate seat, two open congressional seats in New Jersey and an open state Senate seat in suburban Philadelphia.

The six major Philadelphia metro TV stations carried 8,003 political ads in their news broadcasts between September 8 and Election Day. As Yanich’s report notes, political strategists have long acknowledged that they try to place ads during or near news programming because it attracts the highest proportion of likely voters.

Here is a sample program from the Delaware study.  This 60-minute program from WCAU, an NBC affiliate, aired at 5:00pm the day before the elections.  It offered two substantive political stories: one about election day poll hours and the other about the leading candidates for governor commenting on their attack ads.  Good setup.  Questions to the incumbent elicit an unequivocal assessment of the opponent’s assertions, followed by the other candidate being asked if his ads are negative.  Seemingly timely and germane.  Quiz: Can you find WCAU’s mistake, followed sometime later by an unacknowledged correction?

Although WCAU clearly addressed important election issues, that same 60 minute program was also stuffed with 24 political ads.  Here is one, below.  Quiz: Can you spot the word “EBOLA”?  And for extra credit: which is more toxic to our Republic, this kind of ad or the disease?

Although local TV station marketing directors are more than happy to accommodate the needs of political ad buyers, the local news directors appear to take a less supportive view of their audience’s interest in politics. Yanich and his research team looked at a representative sample of the news programs (390 of 1,256) and found politics taking a back seat to other types of stories in terms both of time and placement in the broadcast. The Delaware researchers found that many of the political stories aired were blandly informational, describing candidate schedules or appearances. Isolating political stories that focused on substantive political issues, Yanich’s team found that during the broadcasts they analyzed, there were 18.7 minutes of those stories, compared to 842 minutes of political ads, a ratio of 45:1.
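
The headline ratio can be checked from the figures quoted above. This is only a back-of-the-envelope sketch using the numbers in this post (the variable names are ours, not the study's):

```python
# Figures quoted in this post from the University of Delaware study.
ad_minutes = 842.0      # minutes of political ads in the sampled newscasts
news_minutes = 18.7     # minutes of substantive political news stories
sampled_programs = 390  # news programs analyzed
total_programs = 1256   # news programs in the study period

# Ratio of ad time to substantive political news time.
ratio = ad_minutes / news_minutes

# Share of all programs that were included in the sample.
sample_share = sampled_programs / total_programs

print(f"ads : news = {ratio:.1f} : 1")
print(f"sample share = {sample_share:.1%}")
```

Rounded to a whole number, the ratio comes out at 45 to 1, and the sample covers a little under a third of the programs.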

Next Steps

With so much heat, where will citizens find the light they need to navigate through this onslaught of political messaging?

The Internet Archive has begun to welcome new collaborators to join us in tackling the challenge of creating timely information resources for the 2016 U.S. election cycle. The goal is data that individuals and civic organizations can trust when considering how to participate in some of their community’s most important decision making, and reliable information they can use to hold television stations accountable for the choices they make in balancing their obligations to serve the information needs of their communities against the allure of one of their biggest revenue sources: political advertising.

How might we better inform voters and increase civic participation before, during and after elections?



Posted in Announcements, News, Television Archive | 2 Comments

Open Source Housing for Good

This is from a talk given by Brewster Kahle, Founder and Digital Librarian of the Internet Archive, at a Commonwealth Club panel titled Open Source Housing for Good on March 9th.  [covered by KQED public radio]

Foundation Housing

Our employees are being driven from their homes by rising rents; they are commuting great distances because of the lack of affordable housing; they are living in insecurity because of the fluctuation in rent and home prices.

Internet Archive - Non-Profit Library

I believe it is becoming harder to attract and keep good people working in nonprofits, including the Internet Archive, because of this problem.

Our employees spend an average of 30-60% of their income on housing. 30-60%.

That is a lot more than the “spend less than 25% on housing” that HUD recommends. Turns out that this is not just our employees, and not just the Bay Area. According to a Harvard study, the average American renter pays 30-60% of their income on housing. Similarly, homeowners pay about the same, except for those lucky few that own their houses outright.

The Bay Area is particularly problematic because rents and house prices have been rapidly rising, which is causing dislocations or people feeling locked into apartments and jobs. Nonprofits are particularly hit because their funding does not rise and fall as fast as the market fluctuations. Further, when the market is down, it is exactly the time you want non-profit services to be strong.

So the Internet Archive, and I would say other nonprofits as well, have an existential problem: affordable and stable employee housing.

The Internet Archive and the Kahle/Austin Foundation are trying a new model to help. Foundation Housing is our name for a new housing class: permanently affordable housing for non-profit workers.

In this model, a new nonprofit, the Kahle/Austin Foundation House, has been set up to purchase apartment buildings. These rental units are then made available to employees of select nonprofits at a “debt free” rate, basically equivalent to the condominium fee and taxes. Typically, the debt makes up about 2/3 of the cost of a building and the other costs (tax+maintenance+insurance) make up about 1/3.  Since the employee does not pay the debt part, the monthly fee is now about $850-1000/month rather than the current market rent of $2700-3000.  This way, the fee to those employees is about 1/3 of the cost of market rent, and we believe more stable than market-based rents.
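
The arithmetic behind the “about 1/3” figure can be sketched as follows. This is only an illustration using the round numbers from the talk; the variable names are ours, and the real split between debt and other costs will vary by building.

```python
# Round numbers from the talk (dollars per month).
market_rent = (2700, 3000)  # current market rent range
stated_fee = (850, 1000)    # Foundation Housing monthly fee range

# If debt is about 2/3 of the cost, the non-debt part (tax + maintenance
# + insurance) is about 1/3, so a debt-free fee is roughly rent / 3.
estimated_fee = tuple(round(rent / 3) for rent in market_rent)

# What fraction of market rent the stated fee actually represents,
# comparing the low ends and the high ends of the two ranges.
fee_share = tuple(fee / rent for fee, rent in zip(stated_fee, market_rent))

print("estimated debt-free fee:", estimated_fee)
print("stated fee as share of market rent:",
      [f"{share:.0%}" for share in fee_share])
```

The estimate lands close to the stated fee range, which is consistent with the claim that residents pay about a third of market rent.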

Walking Distance To Work

Foundation Housing Residents

Currently, this is being tried with an 11-unit apartment building in San Francisco, 6 blocks from the Internet Archive. As apartments have become available through normal attrition (we do not force the existing tenants out), the Foundation House has made units available to 2 nonprofits, and there are now 3 employees living there. Having a walking commute, lower housing cost, and a nice neighborhood has been well received.

Roxanna used to commute over an hour each way from Bay View on 3 buses, and was raising her 8-year-old daughter in a building where drug dealers were actively dealing.  Now she walks 6 blocks to work, pays less, and feels safer.

Michelle is a librarian who was being evicted from her apartment; she would have left San Francisco and probably would not now be working at the Internet Archive.

And Samantha, who worried that her rent was continuously on the rise and thought she might have to leave the city in a few years, likes that the building feels more like a community and that she feels less like an anonymous number in an apartment building.

Having housing provided as part of an employee benefit is similar to faculty housing, military, monasteries, and some hospital housing. But having to leave your apartment upon leaving your job is a negative aspect of this model. We have not seen the effect of this because no one has left yet.

So we think we have a model… but how do we make it permanent, and how do we finance it? To help make it permanent, we are borrowing ideas from the free and open source world and Creative Commons licenses.  “Some Rights Reserved” rather than “All Rights Reserved”. “Share and Share Alike” rather than “Get Off My Property”.  With free-and-open-source software, the writer is giving up some of the profit potential in return for increased community participation. In the Foundation House, the supporters are giving up the ability to flip the building for a profit in return for making a permanent asset for the public good.

To finance the creation of these, we have thought of 4 ways, and are trying 3 of them already:

We built a credit union with this idea in mind, called the Internet Credit Union. It has plenty of deposits to start creating Foundation Housing, but alas, the credit union regulators (indirectly controlled by the banks) are not allowing us to make mortgages. This is a sad state of affairs for our nation’s new credit unions, but is not the subject of this talk.

We have tried the “endowment” approach with the current Foundation House, where we appealed to major donors for an endowment in the form of a building. The attraction is that it is much like an endowment, but instead of having money in a Goldman Sachs account, where they do their magic to make some return, the building-as-endowment is both a good deal financially and a way for the nonprofit to support its employees.

Beyond this, we would like to look into raising money through a low-interest bond, say for $100 million, to government and local investors, to fund the purchase of these houses, then using market based renters to pay off the bond. This way the buildings would slowly transition into debt-free Foundation Housing.   We have not tried this yet.

Lastly, and maybe most promisingly, there are people who are looking for new answers and participating in conversations like this.  A number of people in the Bay Area are starting co-working spaces and group houses.  The moment when these are being started can be a good time to set up a structure to work off debt and keep it off, then use the benefits to perpetuate a mission. While still in formation, there seems to be interest from people like Jessy Kate and others.  This could be helped by creating a Foundation Housing License that others could adopt or remix.

With about 10% of all employees in the US working in the non-profit sector, maybe we could hope for 5% of US housing to become Foundation Housing to provide stable, affordable housing for those dedicating themselves to service.

Let’s create more debt-free Foundation Housing for non-profit workers!


[Other pieces on this]

Posted in News | 2 Comments

You are Invited to a Party: Victory for the Net

The event was a success, with resulting video and press.









Dear Friend of the Open Internet,



FCC Chairman Tom Wheeler wants to do something monumental: reclassify broadband access providers under Title II of the Communications Act.

Translation: we’ve made huge progress in the fight to protect the Open Internet. And it’s time to celebrate!

The Internet Archive & Electronic Frontier Foundation invite you to VICTORY FOR THE NET! An evening of celebration, conversation, and sharing what’s next. The party will be Thursday, February 26 at the Internet Archive, 300 Funston Avenue, San Francisco, from 6-9 p.m.

The FCC still has to vote on Chairman Wheeler’s proposal and we don’t know the exact details yet. What we do know is that we’ve all worked hard to get the agency on the right track at last. We’re not done yet, but we have a lot to celebrate.

We are joining hands with our friends and co-hosts from:
Free Press, 18 Million Rising, Center for Media Justice–home of the Media Action Grassroots Network, Common Cause, Daily Kos, Demand Progress, Fight For the Future, Media Alliance, Progressive Change Campaign Committee, Public Knowledge, San Francisco Bitcoin, San Francisco Mayor’s Office of Civic Innovation, The Greenlining Institute, The Utility Reform Network and to take stock of how far we have come, and where we are headed in the movement to protect the Open Internet.

Hope to see you next Thursday! RSVP Today!

Brewster Kahle
Founder & Digital Librarian
Internet Archive

Posted in Announcements, News | 24 Comments