Want to help build a distributed web?

Isn’t the web distributed now? Not really. Let me illustrate: ever IM a friend who is near you, “Hey, wanna see a cool video? Check out this URL”? Then they download the same video you just downloaded from the original server, even though it might be a long way away, rather than from your machine. This is slow, expensive, wasteful, and, well, dumb.

What if, with no browser or server config other than maybe downloading a plug-in:

  • all bigger files come from the folks near you or from the original server, whichever is faster?
  • Websites get to keep their download counts and keep their content up-to-date?
  • Websites get reduced bandwidth bills and superstar user satisfaction from speeds faster than YouTube?
  • Web users, even in remote countries, get that “I am sitting on a gig-e network in Palo Alto” feel?
  • Less money goes to monopoly phone companies?

Is this a real problem?  Yes:

  • The Internet Archive serves 2 million people each day. Egyptians and Japanese are two of our most popular user communities.
  • They download the same files over and over. There is someone with the file who is closer to them than we are.
  • The 20 gigabits/sec of bandwidth costs us a fortune.
  • Others want to serve video, but don’t because of the cost.
  • Others host on YouTube, or Amazon, or archive.org, but would rather not.

Would be great, right?   What it takes:

  • A browser plug-in, and eventually getting the browsers to do it natively.
  • When a user clicks, the browser starts downloading from the website (the site then gets the download credit).
  • The website serves a unique hash for the file, and the length of the file, in the headers, and then serves the file as normal (archive.org and other sites do this already; a rough sketch follows this list).
  • The browser looks up the hash in a “trackerless p2p” system; I think BitTorrent can be used for this.
  • If others have it via p2p, the browser gets it from those users as well, so it is no slower than getting it from the website alone.
  • After the browser downloads the file, it offers it to others via p2p.
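
As a rough sketch of the “hash and length in the headers” step, here is what a server might do, in Node.js/TypeScript. This is only an illustration under assumptions: the X-Content-Hash header name and the file path are placeholders, not what archive.org actually sends.

    // Minimal sketch: serve a file plus its length and a content hash in the headers.
    // The X-Content-Hash header name and the file path are illustrative placeholders.
    import { createServer } from "http";
    import { createReadStream, readFileSync, statSync } from "fs";
    import { createHash } from "crypto";

    const FILE = "./example-movie.ogv"; // hypothetical large file

    // Hash once at startup; a real site would precompute and cache this per file.
    const sha1 = createHash("sha1").update(readFileSync(FILE)).digest("hex");

    createServer((req, res) => {
      res.writeHead(200, {
        "Content-Type": "video/ogg",
        "Content-Length": statSync(FILE).size, // the length, so peers can verify completeness
        "X-Content-Hash": `sha1:${sha1}`,      // the hash a plug-in could look up in a p2p system
      });
      createReadStream(FILE).pipe(res);        // then serve the file as normal
    }).listen(8080);

A plug-in that sees the hash header could then go hunting for peers before, or while, the normal download proceeds.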

What do we get?

  • Less expense for website owners and operators, while keeping them in control and in the loop
  • Faster and less expensive for users
  • More sites taking control of their own stuff (no need to give your files to remote organizations)
  • Being far from the server is not as much of a penalty

Who can help?

  • People who can help debug the idea (and maybe it has already been done…)
  • Browser plug-in programmers
  • Folks knowledgeable about super-distributed, trackerless, hash-based p2p systems
  • The Internet Archive will seed all of its files for this system.
  • We need enthusiasm, a cool logo/mascot, and coffee.

Please comment on this post as a first round to see if we can debug the idea and get critical mass.

-brewster

18 thoughts on “Want to help build a distributed web?”

  1. Casse

    You’re probably thinking about something like Freenet — it’s been around for a while, but never really achieved mainstream popularity (perhaps because no one wants to accidentally cache illegal content?). I’ve always thought that, were it to take off, it’d be one of the cooler things to happen to the Internet; but if it were going to happen, it would’ve already.
    Now if you’re just talking about filesharing, not automatic caching, then take a look at Opera Unite: it’s a one-click server built into a web browser, which is pretty much as simple and fast as anything could be. In theory, people would create and consume content directly in the browser, then share it back and forth as they please. In practice … well, it’s Opera. Even with Mini’s success, few people even realize there’s a desktop app, too!

  2. tef

    To some extent, The Pirate Bay have been doing this already: links to files have been replaced with a Magnet URI (a URI with a hash digest). When a user clicks on it, the torrent client starts the download. The clients form a DHT to share and look up magnet information.

    http://torrentfreak.com/bittorrents-future-dht-pex-and-magnet-links-explained-091120/

    Unlike The Pirate Bay, IA can run peers in the swarm, as well as a tracker (the magnet links can point to it), to rely less on the DHT and peer gossip.

    http://en.wikipedia.org/wiki/Magnet_URI_scheme
    http://bittorrent.org/beps/bep_0009.html
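
    For example, a magnet link pointing at an IA-run tracker might look something like the sketch below (the info-hash and tracker address are placeholders I made up, not real Archive infrastructure):

      // Illustrative only: the info-hash and tracker URL below are placeholders.
      const infoHash = "c12fe1c06bba254a9dc9f519b335aa7c1367a88a"; // SHA-1 of the torrent's info dictionary
      const fileName = "example-movie.ogv";
      const tracker = "http://tracker.archive.example/announce";   // a tracker IA could run

      // xt = exact topic (the hash), dn = display name, tr = tracker hint
      const magnet =
        `magnet:?xt=urn:btih:${infoHash}` +
        `&dn=${encodeURIComponent(fileName)}` +
        `&tr=${encodeURIComponent(tracker)}`;

      console.log(magnet);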

    This seems to solve many of the problems you outline above; the remaining problem is how to upgrade the existing downloads/links with magnet information.

    Unfortunately, there is no established way to augment a link with a magnet link, or to upgrade in place from an HTTP download to a torrent download. Putting a link in the header would seem a good place to start: Link: <magnet:…>; rel=”alternate”; title=”magnet”. Another approach would be trying to detect whether a plugin is present, and using JavaScript to rewrite all marked links on the page with magnet URIs.

    It may just be easier to add magnet links to all download links, run a tracker & peers for now, and solve the automatic-upgrade problem later. I would wager that a significant chunk of users who are downloading from the Internet Archive are savvy enough to use BitTorrent, or at least the ones capable of installing a browser plugin.
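
    To make the rewriting approach concrete, a content script might do something like the sketch below. The window.__p2pPluginPresent flag and the data-magnet attribute are names I’m inventing for illustration; there’s no established convention for either.

      // Hypothetical sketch of the "rewrite marked links" approach.
      // Assumptions (not real APIs): the plugin sets window.__p2pPluginPresent,
      // and the site annotates its download links with a data-magnet attribute.
      function upgradeDownloadLinks(): void {
        const pluginPresent = (window as any).__p2pPluginPresent === true;
        if (!pluginPresent) {
          return; // leave plain HTTP links alone; downloads work exactly as before
        }
        document
          .querySelectorAll<HTMLAnchorElement>("a[data-magnet]")
          .forEach((link) => {
            // Swap the HTTP href for the magnet URI so the torrent client takes over
            link.href = link.dataset.magnet!;
          });
      }

      document.addEventListener("DOMContentLoaded", upgradeDownloadLinks);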

  3. Ben Moskowitz

    The folks at Wikimedia worked with P2PNext to do something very similar:

    http://trial.p2p-next.org/

    I believe that the SwarmPlayer is currently functional for Wikipedia-hosted videos.

    Would be very interesting to learn from their experience.

    And, the Ignite challenge that’s launching later this year would be a great place to explore the possibilities of systems like this: https://mozillaignite.org/blog/

  4. brewster Post author

    Thank you, Ben, we will check out Swarm.

    On Twitter iamtef suggested “put a magnet url in a Link rel=… header, use a bittorrent plugin” https://en.wikipedia.org/wiki/Magnet_URI_scheme

    On Twitter iamtef suggested that “to some extent, the pirate bay have already done this” https://torrentfreak.com/bittorrents-future-dht-pex-and-magnet-links-explained-091120/

    We thank you for those suggestions and the others that have appeared. We have met and will be trying them on for size.

    I love the web: so much has already been done; sometimes you just have to find it.

    -brewster

  5. Gordon Mohr (@gojomo)

    Indeed the p2p web would be one instance of content-centric networking. Content-centric networking is an interesting idea because it obviously dominates other architectures if people can be free and open about what they have cached/shareable/mass-reproducible. But if some things are illegal to cache, and others embarrassing, lots of possible optimizations disappear as everyone hides what they’re up to.

    A P2P web needs one kind of identifier *other* than a pure hash, and that’s an identifier that says in a totally general fashion: ‘this authority uses this name for this stream-of-bytes’. An HTTP identifier communicates: ‘this domain-name uses this /path for this following content-body’… but the P2P authorities might just be long-lived signing keys. This lets the authority update what [key-identity]/rootpage means over time, even without a single-canonical server.

    One rough sketch of how such a URI might work, inspired by the Freenet identifiers, is something I wrote up around the same time as the original magnet-URI proposal: http://zgp.org/pipermail/p2p-hackers/2002-July/000719.html

    Nowadays you might add that to a shared log like the bitcoin/namecoin blockchain (or EFF ‘sovereign keys’ append-only structure) to have durable, hard-to-erase, still-possible-to-update public associations of (key/path) -> (current-version-hash). That is, the compact authoritative naming can go into a globally-verifiable and highly-replicated log, a structure now better understood from prior projects. Meanwhile, the bulk content location/swarm-delivery falls to a DHT/P2P net well understood from BitTorrent and its ilk.
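
    To make the signing-key idea concrete, here is a rough sketch in Node.js/TypeScript. The record format is something I am inventing on the spot for illustration, not a proposal: an Ed25519 key stands in for the publishing authority, and a signed record binds a (key, path) name to the hash of the current version.

      // Illustrative sketch: a long-lived key acts as the authority, and a signed
      // record maps a (key, path) name to the hash of the current content version.
      import { generateKeyPairSync, sign, verify, createHash } from "crypto";

      const { publicKey, privateKey } = generateKeyPairSync("ed25519");

      const currentVersion = Buffer.from("<html>rootpage, version 2</html>");
      const record = Buffer.from(
        JSON.stringify({
          path: "/rootpage",
          hash: createHash("sha256").update(currentVersion).digest("hex"),
          seq: 2, // lets the key holder publish newer versions over time
        })
      );

      const signature = sign(null, record, privateKey);

      // Anyone holding the public key can verify the name-to-hash binding
      // without contacting any single canonical server.
      console.log(verify(null, record, publicKey, signature)); // true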

  6. John

    Hi,
    Would you please tell us exactly what you will be doing? Do you want to close IA? If so, how would we reach the site? We love your site.

  7. Phil Culmer

    @John,

    No, the idea is not to close the IA; it’s to supplement the site with additional technology so that people can cache the media files.
    This would then allow people to get the files from other people’s caches, reducing the load on the internet, and on IA.
    One way of doing this may work something like bittorrent.
    You could still reach the site just the same as normal, but there would be plugins that would let you get media files more quickly.

    Hope this makes things clearer.

  8. VPSLIST

    How much data are we talking about?

    As a solution for the Egyptian and Japanese communities, a torrent file and some seeding can help, but I fear not a lot of people will “donate” their bandwidth, given the mess the telecoms have our infrastructure in right now.

    I refuse to get a smartphone, because why pay an expensive amount for next-to-nothing bandwidth?

  9. Peter James Herz

    Interesting idea, though I’ve thought about and heavily researched this area since I worked at Akamai back in the mid-2000s and discovered torrents around then too. I immediately saw how CDNs love to corner this ‘distribution’ problem, either by patenting/owning multicast/p2p-assisted technology or by not acknowledging or utilizing it at all to drive up costs for their b2b clients. Akamai bought Red Swoosh years ago to research possible p2p/HTTP convergence over torrents. But even more fascinating, and what I believe will ultimately make this article a (perhaps naive) joke, is Adobe’s recent effort in this area via Flash/AIR using RTMFP, the “F” added to RTMP for ‘flow control’, which is a greatly understated word for what Flash now does via Cirrus: essentially sophisticated p2p delivery for video or any kind of binary asset, as well as peer discovery mechanisms. There’s also an open ‘rendezvous’ architecture for Cirrus called ArcusNode. Fascinating stuff, and yeah, sorry they beat ya to it. In HTML there are similar things possible with something called WebSockets, but unfortunately it is far less evolved than where Flash is, as with virtually everything else, which is why Steve Jobs wanted Flash dead: it wasn’t his, and it was ultimately more powerful and relevant than open standards or competing closed ones (the only ones I can think of are ‘apps’, Java, .Net, ECMAScript).

    1. brewster Post author

      Thank you for the informed note. Are you interested in working on this? We have not found a smooth solution to this, and would love to find one.

      -brewster

  10. ac

    > the 20gigabits/sec of bandwidth costs us a fortune

    Where is that? You should shop around various European countries (.nl, .de, .cz, .fr, .uk) and look at CDNs for popular larger files. My understanding is that bandwidth is more expensive in the US, and not trivially so.

    Then when you find a good offer, you could send it to some of the big US providers and clouds and see if they’re interested.

    I’m just saying that 20 Gbit ain’t much these days and hobbyists can afford it; the bandwidth expense should be quite minimal compared to labour.

    Of course, it would make sense to have US, European, and Asian mirrors of the site if you don’t go with a cloud provider, with automatic fallback if there are connection problems or you need to switch quickly.

    There may also be publicly funded resources available for a site with such purposes; another angle would be to see if the site could be put on a publicly funded network that has some similar operations. But it’s still worth shopping around, since these public networks may not shop around that much themselves, and if they’re going to charge you something, it’s hard to say whether it will be competitive.

  11. ac

    By the way, web.archive.org is quite unresponsive right now; it’s throwing “Internet Explorer cannot display the webpage” on maybe 40% of page load attempts. I don’t think this is solved with cheaper bandwidth, but rather with either a cloud or a CDN for large files plus distributed datacenters for small files. I’m in Europe and yet I’m getting data from the US; that’s probably not the cheapest or lowest-latency way to do things.

  12. deadlywind paintball

    Love this idea; it’s one I’ve thought about for a very long time. What I’d actually like to see is a similar distributed architecture used for a search engine as well, i.e., distributed crawlers and indexes. This would have its own full set of (massive!) challenges to implement, but should have a lot of technological overlap with the distributed and trusted caching system you describe here.

  13. deadlywind paintball

    Ah, and I almost forgot: Microsoft included something called “peer caching” in version 3.0 of BITS, which does something similar to this. It looks like version 4.0 then got rid of this feature, and the functionality is now served by something called “BranchCache”.

    I knew I’d just read about this somewhere, took a minute to recall…

  14. akb

    The best concept I’ve come across for doing this was Dijjer, a project of Ian Clarke’s (the Freenet creator). It worked basically as you described, except that instead of a browser plugin it ran as a local service. Websites wishing to make use of the p2p network would publish links with JavaScript that would check whether the service was running locally; otherwise the file just downloads from the webserver as normal.
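
    In today’s terms, the check might look something like the sketch below; the port number and endpoints are made up here, since I don’t remember Dijjer’s exact interface.

      // Sketch of the "use the local p2p service if present, else plain HTTP" fallback.
      // The port number and the /ping and /fetch endpoints are invented for illustration.
      async function resolveDownloadUrl(httpUrl: string): Promise<string> {
        try {
          // Probe the hypothetical local service, with a short timeout.
          const probe = await fetch("http://127.0.0.1:7946/ping", {
            signal: AbortSignal.timeout(300),
          });
          if (probe.ok) {
            // Ask the local service to fetch (and share) the file on our behalf.
            return `http://127.0.0.1:7946/fetch?url=${encodeURIComponent(httpUrl)}`;
          }
        } catch {
          // Service not running or too slow; fall through to the normal download.
        }
        return httpUrl;
      }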

    Unfortunately the project never gained traction and has been dormant for some time. The code is available, though I believe it never quite matured to production quality.
