FedSearch - Federated network search engine

musicmatze :rust: :nixos: · @musicmatze

905 followers · 3090 posts · Server social.linux.pizza

Is there an alternative to #ipfs and #ipld out there?

Essentially I want a way to build a DAG/Merkle Tree, a content-addressable storage, where I can define the node format, can store binary data and can find data from remotes via hashes...

All of that is offered by ipfs 😭 but I don't want to write go or javascript for that, I want to be able to use #rust

I am almost at the point where I think of trying #typescript and use the JS implementation of ipfs... almost.

#ipfs #ipld #rust #typescript

Last updated 1 year ago


JP · @byjp

94 followers · 366 posts · Server hachyderm.io

I am, for geeky reasons, mostly on Bluesky these days. Send me a DM if you’d like an invite.

(Geeky reasons: It uses #IPLD, and I *think* I can build a fully static #ATProto server on #IPFS, so my static blog can use ATProto for comments only in HTML and CSS (no JS!))

#ipld #atproto #ipfs

Last updated 1 year ago


Jonny · @jonny

657 followers · 1847 posts · Server neuromatch.social

ok so re-reading #IPFS paper and there are a few things I think in retrospect are undesirable about the #MerkelDAG spec. it's hard to parse them out as separable ideas because they depend on one another, but the main thing I think is how it conflates the structure of a metadata graph, the content of the graph, and the notion of authorship/identity.

In (basic) IPFS, each node contains some data and some links. the data is some unspecified binary blob, the links are all references to hashes of other nodes, and then the hash of all that identifies the node. There are some abstractions like flattened trees that can represent n-depth links, but that's the gist. I'm refreshing myself, so correct me where I'm wrong.
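
A minimal sketch of the node shape described above, assuming a made-up JSON encoding (real IPFS uses its own formats such as dag-pb) and an in-memory dict standing in for the network. The names `store`, `put`, and `get` are hypothetical and only illustrate how the hash of data plus links becomes the node's identity.

```python
import hashlib
import json

# Hypothetical stand-in for the network: hash -> serialized node bytes.
store = {}

def put(data, links=()):
    """Serialize a node (data + links) and file it under its own SHA-256 hash."""
    # Assumed encoding for illustration only: JSON with hex-encoded data.
    encoded = json.dumps(
        {"data": data.hex(), "links": list(links)}, sort_keys=True
    ).encode()
    node_id = hashlib.sha256(encoded).hexdigest()
    store[node_id] = encoded
    return node_id

def get(node_id):
    """Fetch a node and check that its bytes really hash to the requested id."""
    encoded = store[node_id]
    if hashlib.sha256(encoded).hexdigest() != node_id:
        raise ValueError("content does not match its address")
    node = json.loads(encoded)
    return bytes.fromhex(node["data"]), node["links"]

leaf_a = put(b"hello")
leaf_b = put(b"world")
root = put(b"", [leaf_a, leaf_b])  # the root's identity depends on both leaves
```

Because links are hashes, changing any leaf changes the hash of every node that points to it, which is what makes the structure a Merkle DAG.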

This makes traversing the graph expensive from a naive (cacheless) state: you have to fetch each node and parse its links serially, and since there isn't a notion of authorship except when used to sign a node, you might have to do the resolution process across a lot of the network instead of being able to say "ah ok this is from this identity so I should ask their neighborhood first".
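
The serial-resolution cost can be sketched as follows, assuming the same kind of made-up JSON node encoding and an in-memory `store` in place of the network. The `fetch` counter is a hypothetical stand-in for network round trips: a node's links are only visible after that node has been fetched, so a chain of depth four needs four dependent fetches.

```python
import hashlib
import json

store = {}   # hypothetical stand-in for the network: hash -> node bytes
fetches = 0  # counts simulated round trips

def put(data, links=()):
    encoded = json.dumps(
        {"data": data.hex(), "links": list(links)}, sort_keys=True
    ).encode()
    node_id = hashlib.sha256(encoded).hexdigest()
    store[node_id] = encoded
    return node_id

def fetch(node_id):
    """Simulate one round trip to retrieve and parse a node."""
    global fetches
    fetches += 1
    return json.loads(store[node_id])

def resolve_all(root_id):
    """Walk the DAG from a cold cache; children are unknown until the parent arrives."""
    seen = set()
    stack = [root_id]
    while stack:
        node_id = stack.pop()
        if node_id in seen:
            continue
        seen.add(node_id)
        stack.extend(fetch(node_id)["links"])
    return seen

# Build a chain: leaf <- level 0 <- level 1 <- level 2
node_id = put(b"leaf")
for depth in range(3):
    node_id = put(f"level {depth}".encode(), [node_id])

reachable = resolve_all(node_id)
```

Nothing in the root's hash says who holds the children, so in this sketch each fetch is also a fresh lookup rather than a targeted request to a known peer.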

Since the links are untyped, and because of the need for serial resolution, you can't really "plan" queries and move the query logic to the "edges" (in a networking, rather than graph parlance) of the network - the network resolution logic handles all that.

This structure also makes it so you can't "talk about" a node. A node contains its links. The links are directional, so I could make some statement about a node by pointing to it, but I can't, as a third party make a link under my identity, separate from the author and content of the node, that points from some object to another. That makes the network more like a hard drive than a social space.

Further, since links aren't typed, you have to move that metadata inside the node, and since "keys" for identifying different fields in the node aren't themselves links, you can't have any notion of "schema" where a term can be reused. So there isn't really a facility for being able to do graph queries like "find me this type of data whose field has this value" which restricts a whole huge range of possibilities too long to list here. This also makes it hard to know what the binary data inside a node is, which is what #IPLD and #Multiformats are intended to solve, post hoc.

I'll stop there for now, and save what I think could be a different model for later, but I am thinking along the lines of merging with #LinkedData #Triplets , encoding the notion of authorship into links (so that links can have an "utterance" rather than "fact" ontological status), a notion of container/contained for explicit block formation and metadata separation, and formalizing the notion of orthogonal Merkle DAGs to change the points where the content addressing happens to be able to have "graph subunits" that allow for cycles at a "complete" scope but for the purposes of hashing have no cycles. very much #WIP, still at the conceptual stage, haven't started writing the spec yet.
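
One possible reading of "authorship encoded into links" can be sketched as below. This is not the model from the post, which has no spec yet; the `Statement` class, its field names, the plain-string identities, and the `query` helper are all hypothetical. The idea shown is that a typed link lives outside the node it describes, carries its author, and can be filtered like a triple.

```python
import hashlib
import json
from dataclasses import dataclass, asdict

def content_id(data):
    """Address raw content by its SHA-256 hash."""
    return hashlib.sha256(data).hexdigest()

@dataclass(frozen=True)
class Statement:
    author: str     # hypothetical identity; a real system would use keys and signatures
    subject: str    # content id of the thing being talked about
    predicate: str  # the link type, absent from plain untyped links
    target: str     # another content id or a literal value

    def id(self):
        """A statement is itself content-addressable."""
        encoded = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(encoded).hexdigest()

dataset = content_id(b"some dataset bytes")
review = content_id(b"a review of the dataset")

# Third parties make claims about `dataset` without touching its bytes or hash.
statements = [
    Statement("alice", dataset, "hasType", "Dataset"),
    Statement("bob", review, "reviews", dataset),
    Statement("carol", dataset, "hasType", "Spreadsheet"),
]

def query(predicate=None, target=None, author=None):
    """Return statements matching every field that was given."""
    return [
        s for s in statements
        if (predicate is None or s.predicate == predicate)
        and (target is None or s.target == target)
        and (author is None or s.author == author)
    ]
```

Alice and Carol disagree about the type of the same object, and both statements coexist as utterances attributed to their authors rather than as facts baked into the node.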

#LongPost #p2p #WorkingInPublic

#ipfs #merkeldag #ipld #multiformats #linkeddata #triplets #wip #longpost #p2p #workinginpublic

Last updated 1 year ago


Jonny · @jonny

589 followers · 1530 posts · Server neuromatch.social

Gist - IPLD: Codecs and Completeness

OK I'm starting my #p2p #LinkedData reading list to get started drafting a protocol and I'm checking out #IPLD - lots of really good ideas here, and plenty to learn from. It has a bit of a different focus than what I have planned, but some stuff i like and some stuff I can learn from:

This typology of complete vs incomplete codecs - I'm learning that one of the major ways I differ in thinking from a lot of prior art is an explicit embrace of heterogeneity, vernacularism, and mess as desirable features of an expressive system rather than designed out by engineer types as an error. I think explicitly allowing for incomplete/imperfect translation between schema is super important for systems of expression, after all it's how language works! So I really liked seeing the "Incompleteness is Valid" section. I also love some of the terminology here, "topowild," "plane-mangling," "underkinded."
How IPFS gateways work - I have always wondered how this works, and it is a pretty concise access point to seeing why some of the ideas in IPLD are good ones. I like Protocol Labs general approach to bridging across protocols, and being able to access data in #IPFS from HTTP is a really important part of how IPFS gets used (eg. by libgen). I want to read more about how other protocols approach (or don't) interoperability like this.
CIDs are interesting, but ultimately I think they collapse too much information because of how they are intended to be used. It binds the metadata and data to a specific codec, which has attractive qualities, but it becomes clear that they had to do a lot of work around the fact that there is no division between metadata and bytestring/binary data for querying, selecting, etc. Having to expand blocks is not really desirable, and it also is related to one of the bigger problems I see with this approach, which makes data modeling pretty damned complicated having to deal with blocks, models, advanced data layouts, schemas, etc. You can also really see how the blockchain stuff starts to seep into the rest of the ecosystem design starting around here and in adjacent stuff like graphsync
The tricky-choices section is extremely interesting and i wish more projects had something like it. In particular I liked the discussion on why ordering by default is a good decision in graphs/maps that don't necessarily need order.
The schemas docs are pretty revealing about the direction of the project, values, design priorities, etc. In particular they are "developer"-oriented interfaces, rather than something intended for any old person out there to be able to structure data with. They share some of what I'm thinking about re: structuring existing data, but the combination of the data schema with the serialization has similar points of difficulty as with CIDs. I want to read more about these and ADLs because i don't have time rn, but they seem p subtle and worth spending time with.
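
The remark above about CIDs binding data to a codec can be illustrated with a small sketch of the CIDv1 byte layout: version, content codec, then a multihash, rendered in base32 with the `b` multibase prefix. The helper name `cid_v1` is hypothetical, and the single-byte prefixes are a simplification that works here only because all the codes used are below 0x80 (the real format uses varints). Tagging the same bytes as dag-pb is done purely to show the identifier changing, not to claim the bytes are a valid dag-pb block.

```python
import base64
import hashlib

# Multicodec table values (each fits in one byte, so no varint logic is needed).
RAW = 0x55       # raw binary
DAG_PB = 0x70    # protobuf-based DAG nodes
SHA2_256 = 0x12  # multihash code for SHA-256

def cid_v1(codec, payload):
    """Build a base32 CIDv1 string for `payload` tagged with `codec`."""
    digest = hashlib.sha256(payload).digest()
    multihash = bytes([SHA2_256, len(digest)]) + digest
    binary = bytes([0x01, codec]) + multihash  # 0x01 is the CID version
    # Multibase base32: lowercase, unpadded, prefixed with "b".
    return "b" + base64.b32encode(binary).decode().lower().rstrip("=")

payload = b"hello"
as_raw = cid_v1(RAW, payload)
as_dag_pb = cid_v1(DAG_PB, payload)
```

The two identifiers share the same digest but differ as strings, so the "same" bytes get different addresses depending on how they are meant to be decoded.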

I depart from a lot of their design decisions, and it's also clear that this is something that evolved in the process of developing IPFS (they say as much) to fill gaps as they were emerging, rather than a foundational part of the ecosystem. In particular I think the blockchain brain ties them to this notion of immutability, append-only stuff which (imperfectly) trades off with needs for privacy and careful scoping/permissions, valuing verifiability and structuredness above ease of expression, and etc. Regardless, interesting to see a bit of the way they think, particularly since they're a bunch of years ahead of me in dealing with the practicalities of implementation.

I'm gonna try and do this project in public, writing as I go on here rather than limiting to an end piece, so if u want to avoid posts like this from me in the future u can mute the #Longpost and #WorkingInPublic hashtags which will be sort of wandering like this.

#Longpost #Protocols #WorkingInPublic

#p2p #linkeddata #ipld #ipfs #longpost #workinginpublic #protocols

Last updated 2 years ago
