Reader Privacy at the Internet Archive

Reader Privacy (NYTimes article, video of announcement with Daniel Ellsberg)

The Internet Archive has extended our reader privacy protections by making the site encrypted by default. Visitors to archive.org and openlibrary.org will use https unless they explicitly choose http.

For several years, the Internet Archive has tried to avoid keeping the Internet Protocol (IP) addresses of our readers. By default, web servers and other software that interact with web users record IP addresses in their logs, which leaves a record that makes it possible to reconstruct who looked at what. The web servers on Archive.org and OpenLibrary.org were modified to encrypt IP addresses with a key that changes each day, making it very difficult to reconstruct any user's behavior. This approach still allows us to know how many people use our services (now over 3 million a day!) but restricts the collection of readers' IP addresses. (We may collect IP addresses in our main web logs, for instance, if we are diagnosing an attack on our services or if errors occur. Other systems may also collect IP addresses, though we try to limit them. For more information, please see our privacy policy.) For books checked out from our Open Library service, we record which patron has checked out the book but generally not the IP address of their computer.
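The daily-key idea can be illustrated with a small sketch. This is not the Archive's actual code: it assumes a keyed one-way hash (HMAC-SHA256) as a stand-in for the per-day encryption described above, and the class and function names (`DailyIpAnonymizer`, `count_unique_visitors`) are hypothetical. Within one day the same address always maps to the same token, so visitors can still be counted; once the key is discarded at the day boundary, old tokens can no longer be linked back to an address or to the next day's tokens.

```python
import hashlib
import hmac
import secrets
from datetime import date


class DailyIpAnonymizer:
    """Hypothetical sketch: map IP addresses to tokens under a key that rotates daily.

    HMAC-SHA256 is an assumed stand-in for the per-day encryption the post describes.
    """

    def __init__(self) -> None:
        self._day = None
        self._key = b""

    def _key_for(self, day: date) -> bytes:
        # When the day changes, draw a fresh random key and drop the old one,
        # so the previous day's tokens can no longer be recomputed.
        if day != self._day:
            self._day = day
            self._key = secrets.token_bytes(32)
        return self._key

    def token(self, ip: str, day: date) -> str:
        # Store only this digest in the log, never the raw address.
        key = self._key_for(day)
        return hmac.new(key, ip.encode("utf-8"), hashlib.sha256).hexdigest()


def count_unique_visitors(tokens) -> int:
    # Equal tokens on the same day mean the same visitor, so a set counts them.
    return len(set(tokens))


# Example day and documentation-range addresses, chosen for illustration only.
anon = DailyIpAnonymizer()
today = date(2024, 1, 1)
log = [anon.token(ip, today) for ip in ["203.0.113.7", "198.51.100.2", "203.0.113.7"]]
print(count_unique_visitors(log))  # 2
```

A keyed hash is used here rather than reversible encryption because counting only requires equality within a day; discarding the key each day is what prevents reconstruction of behavior across days.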

Today we are going further. In light of the revelations of bulk interception of web traffic as it travels over the Internet, we are now protecting reading behavior in transit by encrypting readers' choices of webpages all the way from their browsers to our website. We have done this by implementing the standard encrypted web protocol, https, and making it the default. Files can still be retrieved via http for backward compatibility, but most users will soon be using the secure protocol.
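One way to picture "https by default, http still available" is a link-rewriting rule. The sketch below is hypothetical and not how the Archive's servers are configured: it assumes a fixed host list and an `explicit_http` flag standing in for a reader's deliberate choice of http. Links to the listed sites (including subdomains such as web.archive.org) are upgraded to https, and everything else is left untouched.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical list of hosts whose links default to https.
HTTPS_DEFAULT_HOSTS = {"archive.org", "openlibrary.org"}


def default_scheme_url(url: str, explicit_http: bool = False) -> str:
    """Upgrade http links on the listed hosts to https unless the reader opted into http."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    on_listed_host = host in HTTPS_DEFAULT_HOSTS or any(
        host.endswith("." + h) for h in HTTPS_DEFAULT_HOSTS
    )
    if parts.scheme == "http" and on_listed_host and not explicit_http:
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


print(default_scheme_url("http://archive.org/details/texts"))
# https://archive.org/details/texts
print(default_scheme_url("http://archive.org/details/texts", explicit_http=True))
# http://archive.org/details/texts
```

Keeping the http path available behind an explicit choice mirrors the backward-compatibility note above, at the cost of leaving those requests unencrypted.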

Users of the Wayback Machine will similarly use the secure version by default, but can use the http version, which may help play back some complicated webpages.

This is in line with the principles of the American Library Association (ALA) and a campaign by the Electronic Frontier Foundation (EFF).

[updated in 2021, thanks to an attentive reader, to describe more specifically when IP addresses are collected and to link to our privacy policy. -brewster]

9 thoughts on “Reader Privacy at the Internet Archive”

  1. Nemo

    Fantastic! Thank you for this. At a time when even previously privacy-friendly entities are expanding their collection of users’ behaviour data, it’s awesome to see the Internet Archive leading those going in the opposite direction! I hope many will follow your lead.

  2. Pingback: With over 3 million users per day, the Internet Archive switches to HTTPS connections by default | Alternative News Alert!

  3. Pingback: With over 3 million users per day, the Internet Archive switches to HTTPS connections by default | socialwebsiteanalyzer.com

  4. Pingback: With over 3 million users per day, the Internet Archive switches to HTTPS connections by default | paschoal.net - notícias de tecologia

  5. Pingback: With over 3 million users per day, the Internet Archive switches to HTTPS connections by default

  6. Dirty Dingus Mcgee

    Thank you for supplying a great deal of Internet security and privacy for those of us who use your wonderful services. For myself, Internet privacy is my main focus, and I am always looking for sites like yours, that state publicly the importance of Internet privacy, and do their best to supply just that.

    I am beginning to develop a positive habit, that of visiting and using the Internet Archive’s sites. I find the work here worthy, and definitely worth my time in learning more about all that you offer. It is quite an education in itself to peruse your site!

    Thanks!

  7. oneyoudontknow

    And what about the safety of the submitter? It is a bit strange how easily one can browse for E-mail addresses of the submitters. Why is this kind of metadata not hidden?

  8. Pingback: With over 3 million users per day, the Internet Archive switches to HTTPS connections by default - The Headlines Now - Live News India, World, Business, Technology, Sports, Fashion, LifeStyle & Entertainment

  9. Serge Laguerre

    It is very difficult to trust any web provider because of the technology. Everybody can spy: not only government agencies but private corporations too, which do it for money, to advertise their products. The government does it to protect national security, which is somewhat understandable because they do not make money on it. The corporations do make a lot of money; they are worse than the government.
