Architecture for Metacommentary
Mar. 3rd, 2004 05:09 pm
I had an idea, related to (but different from) some things that Anton and I have been talking about lately.
The basic concept is: what if you could leave comments about web sites? *Any* web site. And other people could see your comments along with everybody else's. So products for sale, documentation, all the various information on the web, could be rated blog-style at their existing locations.
There are some problems, naturally, even if you can write your own browser to display the stuff automatically.
For instance: who hosts the comments? You don't want a web site to host its own comments, or unfavorable product reviews can be quietly removed by the person advertising the product! Censorship should be the right only of the original poster, and perhaps of some central rating authority, though preferably not that either.
It could be done as a web portal even now: you'd go to the portal, and browse in the normal way but through their stuff. The portal would automatically add the comments at the end and a form to post your own. However, that requires a portal and thus a central authority, which again leads to conflicts of interest.
You'd like the comments to be hosted by the commenters in some sense, both for security and to distribute the server load. But then how could you look up all the comments from all the servers for a particular web site? You could have a central server that collated all that information, but again you're back to the censorship and scalability problems, just on a smaller task.
So if you were going to store and query this stuff in a relatively peer-to-peer way (storage is easy that way, you just host your own comments), you'd need some kind of distributed query mechanism. The string you're querying on would be the URL -- any page you can bookmark, you can comment on. The returned items would be the comments, or perhaps a list of URLs/locations for them. Giving each comment its own URL would make replying and threading relatively easy, though the efficiency might suck.
But how do you do the query? How do you ask the world at large, "say, what do you think of http://www.sleazy.com/cheap/filling/horsemeat.html"?
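To make the shape of that query concrete, here is a minimal sketch of the data model described above, not a real design. Everything named here (`Comment`, `LocalIndex`, `normalize_url`, the example hosts) is hypothetical, and the in-memory dictionary only stands in for whatever distributed lookup would actually answer the question. The key is the URL being commented on, and the result is a list of comment URLs.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Canonicalize a URL so equivalent spellings share one query key.

    Assumption: lowercasing scheme and host and dropping the fragment is
    enough for a sketch; real URL normalization has many more cases.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


@dataclass(frozen=True)
class Comment:
    comment_url: str  # where the commenter hosts this comment
    target_url: str   # the page (or other comment) being commented on
    author: str
    body: str


class LocalIndex:
    """In-memory stand-in for the distributed query mechanism.

    Maps a normalized target URL to the URLs of comments about it.
    """

    def __init__(self) -> None:
        self._by_target: dict[str, list[str]] = {}

    def publish(self, comment: Comment) -> None:
        key = normalize_url(comment.target_url)
        self._by_target.setdefault(key, []).append(comment.comment_url)

    def query(self, url: str) -> list[str]:
        # Return a copy so callers cannot mutate the index.
        return list(self._by_target.get(normalize_url(url), []))
```

Because every comment has a URL of its own, a reply is just a comment whose `target_url` is another comment's `comment_url`, so threading falls out of the same query at the cost of one lookup per level.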
no subject
Date: 2004-03-03 05:37 pm (UTC)
no subject
Date: 2004-03-03 05:50 pm (UTC)
no subject
Date: 2004-03-03 05:54 pm (UTC)
As for distributed query algorithms, you might find MIT's Chord (http://www.pdos.lcs.mit.edu/chord/) work interesting.
The problems that these usually run into are more on the social end: how do you get a large enough group of people to play, so that it's interesting? And once a large enough group is interested, how do you filter out all the crap? You need some sort of community moderation system.
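For readers who haven't seen Chord, the core idea is consistent hashing: node names and keys are hashed onto the same ring, and each key belongs to the first node at or after it going clockwise. The sketch below shows only that key-to-node mapping, applied to the URL-as-key idea from the post. It assumes every node name is known locally, which is exactly what real Chord avoids by routing through finger tables in O(log N) hops. `Ring`, `ring_id`, `RING_BITS`, and the node names are made up for illustration.

```python
import hashlib
from bisect import bisect_left

# Assumption: a small ring keeps the numbers readable; Chord itself uses
# the full 160-bit SHA-1 output as the identifier space.
RING_BITS = 32


def ring_id(text: str) -> int:
    """Hash a node name or a key (here, a URL) to a position on the ring."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)


class Ring:
    """Global-view consistent-hash ring (no finger tables, no routing)."""

    def __init__(self, node_names):
        self._nodes = sorted((ring_id(name), name) for name in node_names)
        if not self._nodes:
            raise ValueError("a ring needs at least one node")

    def successor(self, key: str) -> str:
        """Return the node responsible for key: the first node whose id is
        greater than or equal to the key's id, wrapping around the ring."""
        ids = [node_id for node_id, _ in self._nodes]
        index = bisect_left(ids, ring_id(key)) % len(self._nodes)
        return self._nodes[index][1]
```

The useful property for a comment index is that when a node joins or leaves, only the keys next to it on the ring change owners; every other URL still resolves to the same node.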
pah
Date: 2004-03-03 07:43 pm (UTC)
Sadly they gave up on some short term goodness because they got all architecturally wanky about scalability (http://archive.ncsa.uiuc.edu/SDG/Software/Mosaic/Notes/annotations-and-news.html).
People shouldn't throw away compelling code before it has a chance to have scalability problems. Successful prototypes inspire scalability solutions; reflecting on architecture often results in nothing but reflection.
Re: pah
Date: 2004-03-03 07:47 pm (UTC)
However, there's some more interesting stuff toward the end of the paper, so maybe that'll handle the problems in question more gracefully.
I'm actually looking at it for annotations of stuff other than web pages, but since the problem is 90% the same, I figured I'd ask in terms of web pages. People have heard of them and know what they are.
Re: pah
Date: 2004-03-03 08:32 pm (UTC)
Re: pah
Date: 2004-03-04 12:24 am (UTC)
Anton's stuff works in several dimensions (splits the key up into an N-d entity rather than working on 1-D) and has some reliability metrics built into a cache heuristic. It's similar -- they're both multires, and both rely on probabilistic algorithms to avoid losing keys when nodes go down. Anton's version is just multidimensional, and less rigid in operation. That makes it harder to prove reliability in awful cases (which turns out not to matter in his specific problem domain), but avoids an O(log^2 N) hit every time a node connects or disconnects.
Re: pah
Date: 2004-03-04 05:47 am (UTC)
It's hard to know if this intuition is correct without understanding the details, but in more "natural" cases of multi-dimensional indexing (I'm thinking of spatial indexes), indexes with large dimensionalities have a disturbing tendency to degrade into linear lookups (basically because there's no way to guarantee that you can build a balanced tree of hypervolumes). But that fear may not be relevant at all since I really don't know what you mean by "splits the key up into an N-d entity".
More links...
Date: 2004-03-03 10:22 pm (UTC)
The odd spot is that all the basic search terms are too generic: commentary, websites, community, mark-up, layers, etc. You could see a write-up on the old back-link and annotation stuff at Kuro5hin.
It feels like a "do research before writing code" kind of topic.
Re: More links...
Date: 2004-03-15 05:04 pm (UTC)
It's certainly that. Unfortunately, it appears that pretty much everybody runs a separate annotations server. The closest to the kind of bulletproof distribution I'm looking for is NCSA's small group version, which basically boils down to "run an annotations server on your local LAN and only check that."
The closest to the right vision that I'm seeing so far is Xanadu, and their text is generally so impenetrably self-righteous that it's always hard to tell what they've thought of, and what they've actually got design notes for.
Nonetheless, I'm sifting it for nuggets of anything useful.