FedSearch - Federated network search engine

Doc Edward Morbius ⭕ · @dredmorbius

2082 followers · 14677 posts · Server toot.cat

@jonny My principles here are:

The filename should be descriptive and not simply unique.
It should be human-meaningful in some manner if at all possible.
It should scope to the collection size / namespace.

Estimates I'm aware of are that there are on the order of 100--200m books ever published, growing at ~1m per year, and a generally comparable set of scientific articles. News organisations such as Reuters, AP, and AFP produce about 1k--5k items daily, and I suspect many of those are photos or videos. Major newspapers tend to produce about 100--500 stories daily (weekday vs. weekend). You can work out ballpark maths from that.
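
A rough sketch of that ballpark maths in Python. The figures are the estimates quoted above, taken at their upper bounds. The 50-year horizon, and counting just one wire agency and one newspaper, are assumptions made purely for illustration.

```python
import math

# Estimates quoted in the post (upper bounds used throughout).
BOOKS_PUBLISHED = 200_000_000      # 100--200m books ever published
BOOKS_PER_YEAR = 1_000_000         # ~1m new books per year
ARTICLES = 200_000_000             # "generally comparable" to books
WIRE_ITEMS_PER_DAY = 5_000         # one agency such as Reuters, AP, or AFP
PAPER_STORIES_PER_DAY = 500        # one major newspaper

# Assumption for illustration only: how far ahead the namespace should last.
HORIZON_YEARS = 50


def corpus_size(years: int = HORIZON_YEARS) -> int:
    """Rough item count: books + articles + one wire agency + one newspaper."""
    books = BOOKS_PUBLISHED + BOOKS_PER_YEAR * years
    news = (WIRE_ITEMS_PER_DAY + PAPER_STORIES_PER_DAY) * 365 * years
    return books + ARTICLES + news


def bits_needed(items: int) -> int:
    """Minimum number of bits needed to give each item a distinct identifier."""
    return math.ceil(math.log2(items))


total = corpus_size()
print(f"{total:,} items need at least {bits_needed(total)} bits to tell apart")
```

The exact total matters less than the order of magnitude: hundreds of millions of items, which is the namespace a titling scheme has to scope to.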

For correspondence, the originator and recipient ("From:" and "To:") are both significant. Those might be referenced. Publishing, to a general audience, is in a sense correspondence where "From:" == Author and "To:" == World.

The filename need not be precise, exact, or an accurate presentation of contents, but USEFUL. That is, within a corpus, can I find a specific work or works of interest? In this sense, the titling scheme is an example of the principle I've developed that search is identity, in the sense that a search might produce 0, 1, or n>1 results. 0 is null, 1 is identity, and n>1 is a result set.

There are other naming and cataloguing schemes. A complete system would have correspondences between these and the conventional / human-readable titles, e.g., ISBN, LOCCS, OCLC, DOI, etc.

And yes, there are other cataloguing systems such as SuDoc (used by the US government) which are useful in their own contexts.

Author, date, content, audience, and publisher are useful search-space-reducing concepts that apply fairly generally. E.g., if I were including, say, store receipts or purchase orders, then the vendor, customer, date, location, and a summary of contents (say, largest item) would serve as a description. Computer logs tend to be time and process/service oriented, perhaps also mentioning user or network address, etc.
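
A minimal sketch of how a descriptive, human-meaningful filename might be assembled from a few of those fields. The `author_year_title.ext` layout and the helper names `slug` and `descriptive_name` are hypothetical, chosen only for illustration; the post does not prescribe a format.

```python
import re
import unicodedata


def slug(text: str, max_words: int = 6) -> str:
    """Lower-case ASCII words joined by hyphens, truncated to max_words."""
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode()
    words = re.findall(r"[a-z0-9]+", ascii_text.lower())
    return "-".join(words[:max_words])


def descriptive_name(author: str, year: int, title: str, ext: str = "pdf") -> str:
    """Hypothetical layout author_year_title.ext (an assumption, not from the post)."""
    return f"{slug(author, 2)}_{year}_{slug(title)}.{ext}"


print(descriptive_name("Paul Otlet", 1934, "Traité de documentation"))
# paul-otlet_1934_traite-de-documentation.pdf
```

The name is not guaranteed unique, which fits the principle above: it only needs to be useful for finding the work within the collection.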

Related hashtags and discussion:

#docfs #webfs #KFC #PaulOtlet #Maundenaum

Last updated 4 years ago

Doc Edward Morbius ⭕ · @dredmorbius

2082 followers · 14677 posts · Server toot.cat

@Valenoern This is the essential idea behind "docfs", which would be a document-oriented filesystem. Its networked sibling is "webfs".

"Document" here is in the sense of #PaulOtlet, of any durable record. That might be a text, image, sound, video, multimedia content, data, software, or an amalgamation or melange.

One of my key ideas is that the metadata for these documents would be part of the filesystem, extending the notion of what constitutes file-centric data. I'd like to see some form of bibliographic data presented, where available, for public and published media (books, articles, audio recordings, films).

Search is another element, and one idea for the filesystem would be as a virtual filesystem in which attributes could be supplied until a single item matching those criteria was found. "Identity is search".
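
A minimal sketch of that narrowing idea in plain Python rather than as an actual virtual filesystem. The `Document` fields and the `search` and `classify` helpers are hypothetical names for illustration. The three outcomes follow the "0 is null, 1 is identity, n>1 is a result set" framing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    # Hypothetical metadata fields; a real system would carry many more.
    author: str
    year: int
    title: str


def search(corpus, **attrs):
    """Return every document whose fields equal all of the supplied attributes."""
    return [d for d in corpus if all(getattr(d, k) == v for k, v in attrs.items())]


def classify(results):
    """0 results is null, 1 is identity, more than 1 is a result set."""
    if not results:
        return "null"
    return "identity" if len(results) == 1 else "result set"


corpus = [
    Document("otlet", 1934, "traite de documentation"),
    Document("otlet", 1935, "monde"),
    Document("schopenhauer", 1851, "parerga and paralipomena"),
]

print(classify(search(corpus, author="otlet")))             # result set
print(classify(search(corpus, author="otlet", year=1935)))  # identity
print(classify(search(corpus, author="nobody")))            # null
```

Each extra attribute shrinks the result set, and supplying attributes stops once exactly one item remains.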

For projects, some concept of structured workflows, with groups, tasks, milestones, and contributing data. For a sufficiently structured organisation, security and access controls.

I'd like the whole concept to be as commercialisation-hostile as possible, with both copyrights and payments entirely out of scope.

#docfs #webfs #kfc #maundenaum #DublinCore #metadata #bibliography #Plan9OS #Schopenhauer

Last updated 4 years ago
