FedSearch - Federated network search engine

FedSearch

Jonny · @jonny

657 followers · 1847 posts · Server neuromatch.social

ok so re-reading #IPFS paper and there are a few things I think in retrospect are undesirable about the #MerkelDAG spec. it's hard to parse them out as separable ideas because they depend on one another, but the main thing I think is how it conflates the structure of a metadata graph, the content of the graph, and the notion of authorship/identity.

In (basic) IPFS, each node contains some data and some links. the data is some unspecified binary blob, the links are all references to hashes of other nodes, and then the hash of all that identifies the node. There are some abstractions like flattened trees that can represent n-depth links, but that's the gist. I'm refreshing myself, so correct me where I'm wrong.

This makes traversing the graph expensive from a naive (cacheless) state- you have to fetch each node and parse its links serially, and since there isn't a notion of authorship except when used to sign a node, you might have to do the resolution process across a lot of the network instead of being able to say "ah ok this is from this identity so I should ask their neighborhood first"

Since the links are untyped, and because of the need for serial resolution, you can't really "plan" queries and move the query logic to the "edges" (in a networking, rather than graph parlance) of the network - the network resolution logic handles all that.

This structure also makes it so you can't "talk about" a node. A node contains its links. The links are directional, so I could make some statement about a node by pointing to it, but I can't, as a third party make a link under my identity, separate from the author and content of the node, that points from some object to another. That makes the network more like a hard drive than a social space.

Further, since links aren't typed, you have to move that metadata inside the node, and since "keys" for identifying different fields in the node aren't themselves links, you can't have any notion of "schema" where a term can be reused. So there isn't really a facility for being able to do graph queries like "find me this type of data whose field has this value" which restricts a whole huge range of possibilities too long to list here. This also makes knowing what the binary data inside a node is. #IPLD and #Multiformats are intended to solve, post-hoc.

I'll stop there for now, and save what I think could be a different model for later, but I am thinking along the lines of merging with #LinkedData #Triplets , encoding the notion of authorship into links (so that links can have an "utterance" rather than "fact" ontological status), a notion of container/contained for explicit block formation and metadata separation, and formalizing the notion of orthogonal Merkel DAGs to change the points where the content addressing happens to be able to have "graph subunits" that allow for cycles at a "complete" scope but for the purposes of hashing have no cycles. very much #WIP, still at conceptual stage haven't started writing spec yet.

#LongPost #p2p #WorkingInPublic

#ipfs #merkeldag #ipld #multiformats #linkeddata #triplets #wip #longpost #p2p #workinginpublic

Last updated 1 year ago

Original post