Author Archives: Brewster Kahle

Apple Pie Potluck and Constitutional Law Teach-In — Friday Feb 17th 5:30-9PM


Initial information — more details to come:
In honor of the General Strike:

Constitutional Law Teach-in at the Internet Archive with EFF and Others

EFF and other lawyers will lead a conversation about the current issues and threats in constitutional law. Focusing on specific sections and amendments we will talk about current cases on censorship, surveillance, search and seizure, and more.

Workshops on using encryption tools, and maybe musical performances, will accompany the teach-in.
If you want to present, perform, or have other ideas, please email us.

When: Friday, February 17th 5:30pm-9pm (program 6-8)
Where: Internet Archive
300 Funston Ave. SF, CA 94118
Potluck-style: Please bring apple pie or other food
Reserve your free ticket here
Streamed via Facebook Live
Donations welcome

Lawyers Attending:

  • Cindy Cohn – Executive Director of EFF
  • Corynne McSherry – Legal Director of EFF
  • Victoria Baranetsky – First Look Media Technology Legal Fellow for the Reporters Committee for Freedom of the Press
  • Geoff King – Lecturer at UC Berkeley, and Non-Residential Fellow at Stanford Center for Internet and Society
  • Bill Fernholz – Lecturer In Residence at Berkeley Law

For those who cannot attend in person, we will stream the event on Facebook Live, so make sure you’re following us on Facebook.

Upgraded Secure Communications Applications I am Now Using

I am upgrading the security of my communications while trying to keep them easy to use. I thought I would share what I currently use in case it is helpful to copy, and I would appreciate comments.

I want end-to-end encryption so nobody can intercept what I am saying (unless they have infected my phone or computer, but that is another issue), and bonus points for making it so that it is unknown who I am communicating with and when (private metadata and traffic). Thanks to Edward Snowden, we now know that Skype, phone calls, SMS/texts, Slack and email are not private (at least by default). This is too bad since I still use these. (Slack is not end-to-end encrypted even for direct messages, which it could and should be.) So far I have only partially achieved the first step: end-to-end encryption. I am migrating to:

  • txt and sms replacement, somewhat phonecalls: Signal for point-to-point instant messaging replacing sms and skype. Free software, free of cost, and open source, works on smart phones. I have donated.
  • skype texting replacement: Signal for laptops and with a chrome-based desktop Signal app on my Mac (which is what I mostly use). It uses phone numbers as identifiers, which is kind of a pain. EFF friend called this “best of breed” for security. Small development staff.   There is a tip for updating it to have names rather than phonenumbers: go to the … menu, go to settings, at the bottom is update contacts.
  • skype video/slack audiovideo replacement: appear.in for 1-on-1 and small group video chat that is end-to-end encrypted replacing Skype for me. This does not require a download or an account. Go to the homepage, type a bunch of characters to make a meeting room, then send the resulting url to someone and they can use that throw-away meeting room. Super easy. Uses webrtc (now standard in browsers) over https, and they say it is end-to-end encrypted. They have an iphone app as well, but I don’t know about its security. This does not seem designed for super high security, but seems to be pretty good.
  • webex replacement: zoom.us for larger group video chats replacing Webex for me. Free of cost for most of my uses, easy to use (requires download, but is super easy). It says it is end-to-end encrypted, with a little lock icon when in use and encrypted.
  • Facetime occasionally on my iphone replacing cellphone calls to friends with an iphone. Apple says that it is end-to-end encrypted.
  • Thunderbird + Enigmail to sign all email, receive encrypted email, and sometimes send encrypted email, with an organizational email server (archive.org not gmail). Enigmail is moderately hard to set up; I had help in a meetup. Cost free, and I believe free and open source software. I am donating.
  • encrypted notes file (the mac Notes app) on my mac for high priority secure notes. It syncs the encrypted file with my iphone via icloud.
  • Breadwallet, bitcoin wallet on my iphone, for small amounts of bitcoin for casual purchases. Super easy and a full wallet (does not hang off a server). Love this wallet. Cost free. I invested a tiny amount of money in the company– great guys.
  • Torbrowser for private web browsing beyond Firefox’s Private browsing feature. Free and open source software, cost free. I have donated.
  • On Macintosh os/x it’s easy to turn on full disk encryption (FileVault). Go to the “Security and Privacy” setting and turn on FileVault. If you do, be sure *not* to accept its offer to store the key in iCloud. Write down the “recovery key”, and hide it somewhere away from the computer. The security of this approach is based on the security of your normal login password, so if it’s lame, change it to something that can’t be guessed or brute forced easily.  (from a commenter, Eric Blossom)
  • Web search: DuckDuckGo or StartPage.com. (from a commenter, Reinout)

Any comments or ideas are welcome. I realize I have traded off security for ease of use. I hope stronger tools get easier and I suggest we all invest in tools based on donations and development help. I wish I knew my mac and iphone were not compromised. Not sure how to do that.

I have tried ricochet as an instant messaging client that hides who I am talking to via Tor. It is easy to use, but few people I know use it, so I don’t use it often. I have tried encrypting my email using pgp via enigmail but have run into trouble with others being able to read it, so I do not encrypt email by default. As an aside, encryption is related in a funny way to content-addressable systems, which is a different subject, but this is magic and the future.

(earlier version of this post is on http://brewster.kahle.org )

Micropayments to Archive.org by using the Brave Browser (and bitcoin)

I hope Ted Nelson is proud. The Internet Archive just signed up for getting micropayments from participating Brave Browser users.  Brave Browser is an alt-browser for controlling ads, mostly, but they added a micropayments feature (beta).

You need to put in some bitcoin that will then be distributed to the sites you visit in a month. Cool! (they help you get bitcoin)

We don’t expect it will raise the money we need to make a copy of archive.org in Canada, but we are glad to participate in this program.  Thank you, Brave, and our intrepid users.

Would Like to Archive Government Web Services, not just Web Sites– Please help

Archiving .gov and .mil websites is going on now, with lots of help—but what if we could archive full government web services? This would mean keeping interactive sites that include databases and forms, available for future use even if the original website changes or is removed.

We like this idea because we would preserve how websites worked, not just what they looked like. As websites become more database driven and interactive, this would be a bigger help than the already helpful Wayback Machine.

We believe this is possible now given the increased use of virtual machines and cloud services. Webmasters are adjusting to having their systems work in an isolated environment and one that can be snapshot’d.

What we need are some webmasters who would like to try this. We think that government websites would be perfect because they tend to change as administrations change and the datasets are often public data.

If you run a website and would like to participate in this experiment or would like to help on the receiving end, please send a note to info@archive.org or reply to this post.

Archiving web services could usher in a completely new age in archiving of Internet resources.

 

 

Help Us Keep the Archive Free, Accessible, and Reader Private

The history of libraries is one of loss.  The Library of Alexandria is best known for its disappearance.

Libraries like ours are susceptible to different fault lines:

Earthquakes,

Legal regimes,

Institutional failure.

So this year, we have set a new goal: to create a copy of Internet Archive’s digital collections in another country. We are building the Internet Archive of Canada because, to quote our friends at LOCKSS, “lots of copies keep stuff safe.” This project will cost millions. So this is the one time of the year I will ask you: please make a tax-deductible donation to help make sure the Internet Archive lasts forever. (FAQ on this effort).

On November 9th in America, we woke up to a new administration promising radical change. It was a firm reminder that institutions like ours, built for the long-term, need to design for change.

For us, it means keeping our cultural materials safe, private and perpetually accessible. It means preparing for a Web that may face greater restrictions.

It means serving patrons in a world in which government surveillance is not going away; indeed it looks like it will increase.

Throughout history, libraries have fought against terrible violations of privacy—where people have been rounded up simply for what they read.  At the Internet Archive, we are fighting to protect our readers’ privacy in the digital world.

We can do this because we are independent, thanks to broad support from many of you. The Internet Archive is a non-profit library built on trust. Our mission: to give everyone access to all knowledge, forever. For free. The Internet Archive has only 150 staff but runs one of the top-250 websites in the world. Reader privacy is very important to us, so we don’t accept ads that track your behavior. But we still need to pay for the increasing costs of servers, staff and rent.

You may not know this, but your support for the Internet Archive makes more than 3 million e-books available for free to millions of Open Library patrons around the world.

Your support has fueled the work of journalists who used our Political TV Ad Archive in their fact-checking of candidates’ claims.

It keeps the Wayback Machine going, saving 300 million Web pages each week, so no one will ever be able to change the past just because there is no digital record of it. The Web needs a memory, the ability to look back.

If you find our work has been useful to you, please take a minute to donate whatever you can afford today. Help ensure the Internet Archive lasts forever.  I promise you—It will be money well spent.

[modified 2021 to remove a statement about IP address collection which was surfaced by a reader, which is not correct for we may collect IP addresses in our main web logs for instance if we are diagnosing an attack on our services or errors occur.  There are other systems that may collect IP addresses, though we try to limit them. For more information please see our privacy policy.   -brewster]

US Election Results

I am a bit shell shocked– I did not think the election would go the way it did.   I want to reassure everyone– we are safe– funding, mission, partners have no reason to change.   I find this reassuring, hopefully you do as well.

As we take the next weeks to have this sink in, I believe we will come to find we will have new responsibilities, increased roles to play, in keeping the world an open and free environment.

We are well positioned, with our mission of Universal Access to All Knowledge, to help inform the public in turbulent times, to demonstrate the power in sharing and openness.

I look forward to working with our staff, our partners, and the new partners that this creates, to see what our role should be to build the best damn library we can to serve the Maximum Public Good.

Over the next couple of weeks, please think through what we might do.  Looking forward to your ideas.

yours,

Brewster Kahle
Digital Librarian
brewster@archive.org

Election Night at the Internet Archive

The Internet Archive is informally open to our employees, their families and friends, and our community to watch the election results next Tuesday night. This is a spur-of-the-moment invitation and an experiment. If there are enough people interested, we will use the great room.

To cover the cost of pizza and soda, please purchase a $10 “ticket” on our Eventbrite.

The event will run from 6pm until the election is called — 11pm at the latest. We will limit the number of people and we reserve the right to ask anyone to leave for any reason.

If you are interested in volunteering to help that evening, please contact Salem at salem@archive.org.

Decentralized Web Server: Possible Approach with Cost and Performance Estimates

At the first Decentralized Web Summit Tim Berners-Lee asked if a content-addressable peer-to-peer server system scales to the demands of the World Wide Web. This is meant to be a partial answer to a piece of the puzzle.  For background, this might help.

Decentralized web pages will be served by users, peer-to-peer, but there can also be high-performance super-nodes which would serve as caches and archives. These super-nodes could be run by archives, like the Internet Archive, and ISPs who want to deliver pages quickly to their users. I will call such a super-node a “Decentralized Web Server” or “D-Web Server” and work through a thought experiment on how much it would cost to have one that would store many webpages and serve them up fast.

Web objects, such as text and images, in the Decentralized Web are generally retrieved based on a computed hash of the content. This is called “content addressing.” Therefore, a request for a webpage from the network will be based on its hash rather than contacting a specific server. This object can be served from any D-Web server without worrying that it will be faked because the contents will be checked to make sure it is the right content by rehashing it and checking to make sure it was right.
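The check described above can be sketched in a few lines. This is only an illustration, not how any particular Decentralized Web project does it: `ContentStore` and `fetch_verified` are hypothetical names, the store is an in-memory dictionary, and SHA-256 is assumed as the hash because that is the one mentioned later in this post.

```python
import hashlib
from typing import Dict, Optional


def content_address(data: bytes) -> str:
    """Return the hex SHA-256 digest, used here as the object's address."""
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """Hypothetical in-memory stand-in for a D-Web server's object store."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content, never chosen by the publisher.
        address = content_address(data)
        self._objects[address] = data
        return address

    def get(self, address: str) -> Optional[bytes]:
        return self._objects.get(address)


def fetch_verified(store: ContentStore, address: str) -> bytes:
    """Fetch an object and reject it unless it rehashes to the requested address."""
    data = store.get(address)
    if data is None:
        raise KeyError(address)
    if content_address(data) != address:
        raise ValueError("content does not match its address")
    return data
```

Because the reader does the rehash, it does not matter which server answered: a wrong or tampered object simply fails the comparison.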

For the purposes of this post, we will use the basic machines that the Internet Archive currently uses as a data point. These are 24-core, 250TByte disk storage (on 36 drives), 192GB RAM, 2Gbit/sec network, 4u height machines that cost about $14k. Therefore:

  • $14k for 1 D-Web server

Let’s estimate that the average compressed decentralized web object is 50KBytes (an object is a page, javascript, image, or movie: the things that make up a webpage). This is larger than the Internet Archive web crawl average, but it’s in the ballpark.

Therefore, if we use all the storage for web objects, then that would be 5 billion web objects (250TB/50KB). This would be maybe 1 million basic websites (each website would have 5 thousand web pieces which I would guess is much more than the average WordPress website, though there are of course notable websites with much more). Therefore, this is enough for a large growth in the decentralized web and it could keep all versions. Therefore:

  • Store 5 billion web objects, or 1 million websites
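For anyone who wants to rerun the storage estimate with different guesses, here is the arithmetic as a short script. It assumes decimal units (1 TB = 10^12 bytes, 1 KB = 10^3 bytes), which is what the round numbers above imply; the constant names are just for illustration.

```python
# Storage estimate for one D-Web server, using the post's figures.
# Decimal units are an assumption: 1 TB = 10**12 bytes, 1 KB = 10**3 bytes.
STORAGE_BYTES = 250 * 10**12       # 250 TB of disk
AVG_OBJECT_BYTES = 50 * 10**3      # 50 KB average compressed web object
OBJECTS_PER_SITE = 5_000           # generous guess for a basic website

objects_stored = STORAGE_BYTES // AVG_OBJECT_BYTES
websites_stored = objects_stored // OBJECTS_PER_SITE

print(f"{objects_stored:,} objects, {websites_stored:,} websites")
```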

How many requests could it answer? Answering a decentralized website request would mean to ask “do I have the requested object?” and if yes, to then serve it. If this D-Web server is one of many, then it may not have all webpages on it even though it seems we could probably store all pages for a long part of the growth of the Decentralized Web.

Let’s break it into two types: “Do we have it?” and “Here is the web object”. “Do we have it?” can be done efficiently with a Bloom Filter. It is done by taking the request, hashing it eight times and looking those bits up in RAM to see if they are set. I will not explain it further than to say an entry can take about 3 bytes of RAM and can answer questions very, very fast. Therefore, the lookup array for 5 billion objects would take 15GB, which is a small percentage of our RAM.
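A minimal Bloom filter sketch, sized the way the paragraph above suggests (8 probes, 24 bits or 3 bytes per entry), might look like the following. Two things here are assumptions rather than anything from a real D-Web server: the class and method names are made up, and the eight bit positions are derived from one SHA-256 digest by double hashing instead of running eight separate hash functions.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Answers "might we have it?" with no false negatives and rare false positives."""

    def __init__(self, expected_items: int, bits_per_item: int = 24,
                 num_hashes: int = 8) -> None:
        self.num_bits = max(8, expected_items * bits_per_item)
        self.num_hashes = num_hashes
        self._bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key: bytes) -> Iterator[int]:
        # Double hashing (an assumption): derive all probes from two 64-bit
        # values taken from a single SHA-256 digest of the key.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a nonzero step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self._bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # True means "probably stored"; False means "definitely not stored".
        return all(self._bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))
```

A "no" from the filter is always correct, so the server can turn away most requests for objects it does not hold without touching a disk. A "yes" still has to be confirmed by actually looking the object up.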

I don’t know the speed this can run, but it is probably in excess of 100k requests per second. (This paper seemed to put the number over 1 million per second.) A request is a sha256 hash, which, if recorded in binary, is 32 bytes. So 3.2MBytes/sec would be the incoming bandwidth rate, which is not a problem. Therefore:

  • 100k “Do We Have It?” requests processed per second (guess).
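The incoming bandwidth for those requests is easy to recompute. This sketch counts only the 32-byte hash itself and ignores any protocol overhead, which is an assumption on the optimistic side.

```python
# Incoming bandwidth for "Do we have it?" requests, hash bytes only.
REQUESTS_PER_SECOND = 100_000      # the guess from the post
HASH_BYTES = 32                    # a SHA-256 digest in binary
LINK_BITS_PER_SECOND = 2 * 10**9   # the machine's 2 Gbit/sec network

incoming_bytes_per_second = REQUESTS_PER_SECOND * HASH_BYTES
incoming_bits_per_second = incoming_bytes_per_second * 8
link_fraction_used = incoming_bits_per_second / LINK_BITS_PER_SECOND

print(incoming_bytes_per_second, f"{link_fraction_used:.2%}")
```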

The number of requests able to be served could depend on the bandwidth of the machine, and it could depend on the file system. If a web object is 50KB compressed, and served compressed, then with 2Gbits/second, we could serve a maximum of 5,000 per second based on bandwidth. If each hard drive can do about 200 seeks per second, and a retrieval is four seeks on average (this is an estimate), then with 36 hard drives, that would be 1,800 retrievals per second. If there were popular pages, these would stay in RAM or on an SSD, so it could be considerably faster. But assuming 1,800 per second, this would be about 700Mbits/sec which is not stretching the proposed machines. Therefore:

  • 1,800 “Here is the web object” requests processed per second maximum.
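The serving estimate comes down to taking the smaller of two limits, network and disk. Here it is as a script, again with decimal units assumed and with no allowance for caching in RAM or on an SSD:

```python
# Serving rate for "Here is the web object": the tighter of two limits.
LINK_BITS_PER_SECOND = 2 * 10**9   # 2 Gbit/sec network
OBJECT_BYTES = 50 * 10**3          # 50 KB compressed object
DRIVES = 36
SEEKS_PER_DRIVE_PER_SECOND = 200   # rough figure for a hard drive
SEEKS_PER_RETRIEVAL = 4            # the post's estimate

bandwidth_limit = LINK_BITS_PER_SECOND // (OBJECT_BYTES * 8)
disk_limit = DRIVES * SEEKS_PER_DRIVE_PER_SECOND // SEEKS_PER_RETRIEVAL
serve_rate = min(bandwidth_limit, disk_limit)

# Outgoing traffic at that rate, in Mbit/sec.
outgoing_mbits_per_second = serve_rate * OBJECT_BYTES * 8 / 10**6

print(bandwidth_limit, disk_limit, serve_rate, outgoing_mbits_per_second)
```

With these inputs the disks, not the network, are the bottleneck, and the outgoing traffic works out to 720 Mbit/sec, which the post rounds to about 700.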

How many users would this serve? To make a guess, maybe we could look at how mobile devices use web servers. At least in my family, web use is a small percentage of the total traffic, and even the sites that are used (like YouTube) are unlikely to be decentralized websites. So if a user uses 1GByte per month on web traffic, and 5% of that goes to decentralized websites, then 50MB/month per user of decentralized websites could give an estimate. If the server can serve at 700Mbits/sec, then that is 226Terabytes/month. At the 50MB usage that would be over 4 million users. Therefore:

  • Over 4 million users can be served from that single server (again, a guess.)
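The user estimate can be reproduced the same way. The 30-day month and decimal units are assumptions made here to match the round figures above, and the script assumes the server is busy around the clock, which is generous.

```python
# Users served per month from one D-Web server, using the post's guesses.
SERVE_BITS_PER_SECOND = 700 * 10**6       # about 700 Mbit/sec sustained
SECONDS_PER_MONTH = 30 * 24 * 60 * 60     # 30-day month (assumption)
USER_WEB_BYTES_PER_MONTH = 10**9          # 1 GB of web traffic per user
DECENTRALIZED_PERCENT = 5                 # share going to decentralized sites

server_bytes_per_month = SERVE_BITS_PER_SECOND // 8 * SECONDS_PER_MONTH
user_bytes_per_month = USER_WEB_BYTES_PER_MONTH * DECENTRALIZED_PERCENT // 100
users_served = server_bytes_per_month // user_bytes_per_month

print(f"{server_bytes_per_month / 10**12:.1f} TB/month, {users_served:,} users")
```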

So, by this argument, a single Decentralized Web Server can serve a million websites to 4 million users and cost $14,000. Even if it does not perform this well, this could work well for quite a while.

Obviously, we do not want just one Decentralized Web Server, but it is interesting to know that one computer could serve the whole system during early stages, and then more can be added at any time. If there were more, then the system would be more robust, could scale to larger amounts of data, could serve users faster because the content could be brought closer to users.

Performance and cost do not seem to be a problem—in fact, there may be an advantage to the decentralized web over current web server technology.

Geez, Now Internet Insurance?

We seem to make some people mad.

The Internet Archive, a non-profit library, hosts many things. Many, many things. Billions of old webpages, lots of concerts, nostalgia computer games, TV, books, old movies, contributed books, music, and video, and much more.

But some of it seems to make some people mad. China is blocking us, Russia recently stopped blocking us, and India took a crack at blocking us last year. And then there are the occasional denial-of-service attacks by who-knows-who. One recent DDoS attack was apparently claimed by some Anonymous-linked group. Another one seemed to ask for a bitcoin to turn it off. Yup, “Pay us $400 and we will put you back on the air.” Really?  (We didn’t give it to them.)

Each time this happens, it causes a bunch of engineers and managers to run around to deal with it. Thankfully, a bunch of people donated this last time, out of sympathy, I guess — thank you!

We have tried to handle these without architectural changes, but it is getting hard. This last time we had to call a vacationing engineer in the middle of his night… Zeus knows we have enough self-inflicted screwups and growing pains to deal with. But now this?

One change we could make would be to send our traffic through CloudFlare, or similar, to filter out unwelcome packets as an “Insurance against Internet attackers.” Some people go to “cloud services” that have the sysadmins filter out the zealous ones. Both of these solutions would mean that our traffic would go through someone else’s hosts, which means $, privacy loss, and general loss of the end-to-end Internet. It is like converting to Gmail because there are so many spammers on the net and Google is capable of filtering out those losers.

The Internet Archive is trying to demonstrate that an affordable, end-to-end strategy works:

  •     we protect our readers’ privacy by running our own servers, and try not to log IP addresses;
  •     we don’t want to have co-location centers that control physical access to our servers, so we build our own;
  •     we don’t like having someone else run our email servers, but we get deluged with spam;
  •     we do not want to have someone else control our IP addresses, so we have our own ASN;
  •     we want the web to be even more resilient against the censors and the rot of time, so we pioneer the Decentralized Web.

Having our traffic filtered by a third party only when we are attacked may not be so bad, but it shows it is harder and harder for normal people to run their own servers.

Let’s work together to keep the Internet a welcoming place to both large and small players without needing insurance and third-party protectors.

Optimistically yours,

Brewster Kahle

Founder and Digital Librarian

Decentralized Web Summit: Towards Reliable, Private, and Fun

[See coverage by the NYtimes, Fortune, Boing Boing, other press]

Internet Archive founder Brewster Kahle, “father of the Internet” Vint Cerf, and Sir Tim Berners-Lee, “father of the World Wide Web,” at the first Decentralized Web Summit in San Francisco.

More than 300 web architects, activists, archivists and policy makers gathered at the Internet Archive for the first Decentralized Web Summit, where I was honored to share a stage with internet pioneers Vint Cerf and Sir Tim Berners-Lee. We wanted to bring together the original “fathers of the internet and World Wide Web” with a new generation of builders to see if together we could align around, and in some cases reinvent, a Web that is more reliable, private, and fun.  Hackers came from Bangkok to Boston, London and Lisbon, New York and Berlin to answer our call to “Lock Open the Web.”

Building a web that is decentralized, where many websites are delivered through a peer-to-peer network, would lead to the web being hosted from many places, leading to more reliable access, availability of past versions, access from more places around the world, and higher performance. It can also lead to more reader privacy because it is harder to watch or control what one reads.  Integrating a payments system into a decentralized web can help people make money by publishing on the web without the need for 3rd parties.  This meeting focused on the values and the technical, policy, and deployment issues of reinventing basic infrastructure like the web.

First, in the opening welcome, Mitchell Baker, head of Mozilla, reported that Mozilla, the company that made open mainstream, is going back to the core values, focusing on what users want the Web to be.  Mitchell said Mozilla is rethinking everything, even what a browser should be in the coming age. She highlighted four principles we need to think about when building a Decentralized Web:  that the Web should be Immediate, Open, Universal and have Agency, meaning that there are policies and standards that help users mediate and control their own Web experiences. Talking about the values that need to be baked into the code turned out to be the dominant theme of the event.

 

Next, Vint Cerf, Google’s Internet Evangelist and “father of the Internet,” called for a “Self-Archiving Web” in the first keynote address.  He described a “digital dark age” when our lives online have disappeared and how a more advanced Web, one that archives itself throughout time, could help avoid that outcome.  Over the three days of events, how to actually build a Web that archives itself came to seem quite doable.  In fact, several talented groups, including IPFS and the Dat Project, demonstrated pieces of what could make a Decentralized Web real.

Tim Berners-Lee (father of the Web) opened by saying the current technology and protocols could and should evolve to incorporate what we want from our Web. He told us he created the Web to be decentralized, so that anyone could set up their own server or host their own domain. Over time the Web has become “siloized” and we have “sold our soul of privacy in order to get stuff for free.” When Tim said rethinking the HTTP specification is feasible, the possibilities for change and improvement opened up for everyone.

 

Brewster Kahle of the Internet Archive (me) ventured we wanted a Web that baked our values into the code itself: Universal Access to all Knowledge, freedom of expression, reliability, reader privacy, and fun.

To build reliable access requires serving websites from multiple places on the net. We heard proposals to build “multi-home” websites using content-addressable structures rather than contacting a single website for answers. There were demonstrations of ZeroNet, IPFS, and DAT that did this.

Protecting reader privacy is difficult when all traffic to a website can be monitored, blocked, or controlled. The security panel that included Mike Perry of Tor and Paige Peterson of MaidSafe, said that having one’s requests and retrieved documents “hopping around” rather than going straight from server to client can help ensure greater privacy. Combining this with multi-homed access seems like a good start.

We can start making a smooth transition from the current Web to leverage these ideas by using all of our current infrastructure of browsers and URLs, and not requiring people to download software. While not ideal, we can build a Decentralized Web on top of the current Web using Javascript, so each reader of the Decentralized Web is also a server of it, allowing the Web naturally to scale and reinforce itself as more readers join in. The Internet Archive has already started supporting these projects with free machines and storage.

“Polyfill” was the final bit of advice I got from Tim Berners-Lee before he left.  Polyfill, he said, is a kind of English version of Spackle, which is used to fix and patch walls. In this case, Polyfill is Javascript.  He said that almost all proposals to make a change to the Web are prototyped in javascript and then can be built in as they are debugged and demonstrated to be useful.

There we have it: let’s make polyfill additions to the existing Web to demonstrate how a Reliable, Private, and Fun Web can emerge.

Congratulations to the Internet Archive for pulling this together.
