FedSearch - Federated network search engine

Matthias MProve · @mprove

68 followers · 17 posts · Server hci.social

mprove.de - Beyond HyperLocal Journalism, World Publishing Expo 2015 @mprove

@julian Yes, the "father of information science".
I like the quote from The Atlantic, »…a global network of “electric telescopes”«

But, frankly, it would be an exaggeration to say I am familiar with #PaulOtlet

/cf. slide 20: https://mprove.de/script/15/beyondhyperlocal/index.html

#PaulOtlet

Last updated 3 years ago

Original post

Doc Edward Morbius ⭕ · @dredmorbius

2083 followers · 14674 posts · Server toot.cat

@Researchbuzz The proximity element is limited as I am, of course, on Altair IV, some 20 of your light years away.

That said, one of my obsessions (though not necessarily a major element of my Mastodon tooting) is information, knowledge, and document management.

The tags #kfc, #webfs, and #docfs will lead to a few of my information-management / search toots / threads.

And if you've got opinions, feelings, and/or deep intel on #PaulOtlet and his #Mundaneum I'm all ears.

@woozle

#kfc #webfs #docfs #PaulOtlet #mundaneum

Last updated 3 years ago

Original post

Doc Edward Morbius ⭕ · @dredmorbius

2082 followers · 14677 posts · Server toot.cat

@jonny My principles here are:

The filename should be descriptive and not simply unique.
It should be human-meaningful in some manner if at all possible.
It should scope to the collection size / namespace.

Estimates I'm aware of are that there are on the order of 100--200m books ever published, growing at ~1m per year, and a generally comparable set of scientific articles. News organisations such as Reuters, AP, and AFP produce about 1k--5k items daily, and I suspect many of those are photos or videos. Major newspapers tend to produce about 100--500 stories daily (weekday vs. weekend). You can work out ballpark maths from that.
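A rough sketch of that ballpark maths, using only the figures quoted above. The function names, and the use of "bits per identifier" as the yardstick for namespace size, are assumptions added for illustration, not part of the original post.

```python
import math

# Figures quoted in the post (low, high); all are rough estimates.
BOOKS_TOTAL = (100e6, 200e6)        # books ever published
WIRE_ITEMS_PER_DAY = (1e3, 5e3)     # per news agency (Reuters, AP, AFP)
PAPER_STORIES_PER_DAY = (100, 500)  # per major newspaper


def per_year(daily):
    """Scale a (low, high) daily range to a yearly range."""
    low, high = daily
    return (low * 365, high * 365)


def id_bits(n_items):
    """Minimum bits needed to give each of n_items a distinct identifier."""
    return math.ceil(math.log2(n_items))


wire_year = per_year(WIRE_ITEMS_PER_DAY)
paper_year = per_year(PAPER_STORIES_PER_DAY)

print("one wire agency, items/year:", wire_year)
print("one newspaper, stories/year:", paper_year)
print("bits to number every book:", [id_bits(n) for n in BOOKS_TOTAL])
```

On these numbers a single wire agency produces somewhere between about 0.4m and 1.8m items a year, the same order as annual book growth, and 27 to 28 bits would be enough to number every book ever published.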

For correspondence, the originator and recipient ("From:" and "To:") are both significant. Those might be referenced. Publishing, to a general audience, is in a sense correspondence where "From:" == Author and "To:" == World.

The filename need not be precise, exact, or an accurate presentation of contents, but USEFUL. That is, within a corpus, can I find a specific work or works of interest? In this sense, the titling scheme is an example of the principle I've developed that search is identity, in the sense that a search might produce 0, 1, or n>1 results. 0 is null, 1 is identity, and > 1 is a result set.
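A minimal sketch of the three cases, assuming a corpus that is just a list of filenames searched by substring. The filenames and the `search` / `classify` helpers are hypothetical, made up for illustration.

```python
def search(corpus, *terms):
    """Return filenames containing every term (case-insensitive)."""
    terms = [t.lower() for t in terms]
    return [name for name in corpus if all(t in name.lower() for t in terms)]


def classify(results):
    """Map a list of hits onto the three cases: null, identity, result set."""
    if len(results) == 0:
        return "null"
    if len(results) == 1:
        return "identity"
    return "result set"


# Hypothetical filenames, for illustration only.
corpus = [
    "otlet-1934-traite-de-documentation.pdf",
    "otlet-1935-monde.pdf",
    "bush-1945-as-we-may-think.html",
]

print(classify(search(corpus, "nelson")))         # null
print(classify(search(corpus, "otlet", "1934")))  # identity
print(classify(search(corpus, "otlet")))          # result set
```

A filename is "useful" in this sense when the terms a person would plausibly remember are enough to reach the identity case.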

There are other naming and cataloguing schemes. A complete system would have correspondences between these and the conventional / human-readable titles, e.g., ISBN, LOCCS, OCLC, DOI, etc.

And yes there are other cataloguing systems such as SuDoc (used by the US government) which are useful in their own contexts.

Author, date, content, audience, and publisher are generally useful search-space-reducing concepts that apply in most contexts. E.g., if I were including, say, store receipts or purchase orders, the vendor, customer, date, location, and a summary of contents (say, largest item) would serve as a description. Computer logs tend to be time and process/service oriented, perhaps also mentioning user or network address, etc.
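One way such a name might be assembled from those fields, as a sketch only. The field order, the separators, and the `slug` / `descriptive_name` helpers are assumptions chosen for illustration; the post does not prescribe a format.

```python
import re
import unicodedata


def slug(text, max_words=6):
    """Lowercase, strip accents and punctuation, keep the first few words."""
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    words = re.findall(r"[a-z0-9]+", ascii_text.lower())
    return "-".join(words[:max_words])


def descriptive_name(originator, date, summary, ext, publisher=None):
    """Join originator, date, and a content summary into one filename."""
    parts = [slug(originator, 2), date, slug(summary)]
    if publisher:
        parts.append(slug(publisher, 2))
    return "_".join(p for p in parts if p) + "." + ext


# A published work: author, year, title.
print(descriptive_name("Otlet", "1934", "Traité de documentation", "pdf"))

# A receipt: vendor, date, largest item (vendor name is hypothetical).
print(descriptive_name("Acme Hardware", "2021-03-14", "cordless drill", "txt"))
```

The same function covers both cases because "originator, date, summary" generalises over author/title and vendor/largest-item alike.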

Related hashtags and discussion:

#docfs #webfs #KFC #PaulOtlet #Maundenaum

#docfs #webfs #kfc #PaulOtlet #maundenaum

Last updated 4 years ago

Original post

Doc Edward Morbius ⭕ · @dredmorbius

2082 followers · 14677 posts · Server toot.cat

@vertigo If you're familiar with #PaulOtlet, "document" is pretty much any fixed record: texts, images, audio, video, multimedia, data, software.

For publishing --- looking at texts, I'm thinking along the lines of a Kolmogorov complexity or minimum requisite complexity for a given work --- how much specification is required to create a sufficiently complete representation. I'm leaning heavily to Markdown and LaTeX as primary formats. (Possibly other lightweight markup langs, e.g., AsciiDoc or reStructuredText.)

Notion of having a source from which multiple endpoints might be produced: straight text, HTML, ePub, PDF, etc.
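As a sketch of the one-source, many-endpoints idea, the snippet below builds (but does not run) one conversion command per target for a Markdown source. Using pandoc as the converter is an assumption; the post names no tool. The source filename is hypothetical.

```python
from pathlib import Path

# Output targets named in the post, mapped to pandoc writer names.
# PDF has no writer entry here: pandoc selects it from the .pdf
# extension of the output file and needs a PDF engine installed.
TARGETS = {
    "txt": "plain",
    "html": "html",
    "epub": "epub",
    "pdf": None,
}


def pandoc_commands(source):
    """Return one pandoc argument list per output target."""
    src = Path(source)
    commands = []
    for ext, writer in TARGETS.items():
        out = src.with_suffix("." + ext)
        cmd = ["pandoc", str(src), "-f", "markdown", "-o", str(out)]
        if writer:
            cmd += ["-t", writer]
        commands.append(cmd)
    return commands


for cmd in pandoc_commands("essay.md"):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run` on a machine where pandoc is installed; the source file stays the single point of editing.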

#PaulOtlet

Last updated 4 years ago

Original post

Doc Edward Morbius ⭕ · @dredmorbius

2082 followers · 14677 posts · Server toot.cat

@Valenoern This is the essential idea behind "docfs", which would be a document-oriented filesystem. Its networked sibling being "webfs".

"Document" here is in the sense of #PaulOtlet, of any durable record. That might be a text, image, sound, video, multimedia content, data, software, or an amalgamation or melange.

One of my key ideas is that the metadata for these documents would be part of the filesystem, extending the notion of what constitutes file-centric data. I'd like to see some form of bibliographic data presented, where available, for public and published media (books, articles, audio recordings, films).

Search is another element, and one idea for the filesystem would be as a virtual filesystem in which attributes could be supplied until a single item matching those criteria was found. "Identity is search".
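A toy sketch of that narrowing process, assuming documents are represented as plain metadata dictionaries. The catalogue entries, attribute names, and the `narrow` / `resolve` helpers are hypothetical; nothing here reflects an actual docfs implementation.

```python
def narrow(records, **attrs):
    """Keep only records whose metadata matches every supplied attribute."""
    return [
        rec for rec in records
        if all(rec.get(key) == value for key, value in attrs.items())
    ]


def resolve(records, criteria):
    """Apply criteria one at a time; stop once at most one record remains."""
    matches = list(records)
    used = {}
    for key, value in criteria:
        if len(matches) <= 1:
            break
        used[key] = value
        matches = narrow(matches, **{key: value})
    return used, matches


# Hypothetical catalogue, for illustration only.
catalogue = [
    {"author": "Otlet", "year": 1934, "type": "text"},
    {"author": "Otlet", "year": 1935, "type": "text"},
    {"author": "Bush", "year": 1945, "type": "text"},
]

used, matches = resolve(
    catalogue, [("type", "text"), ("author", "Otlet"), ("year", 1934)]
)
print(used)     # the attributes that were needed
print(matches)  # the single record they identify
```

In a virtual filesystem, each supplied attribute could correspond to one more path component, with the path resolving to a file once a single record is left.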

For projects, some concept of structured workflows, with groups, tasks, milestones, and contributing data. For a sufficiently structured organisation, security and access controls.

I'd like the whole concept to be as commercialisation-hostile as possible, with both copyrights and payments entirely out of scope.

#docfs #webfs #kfc #maundenaum #DublinCore #metadata #bibliography #Plan9OS #Schopenhauer

#PaulOtlet #docfs #webfs #kfc #maundenaum #dublincore #metadata #bibliography #plan9os #schopenhauer

Last updated 4 years ago

Original post

Doc Edward Morbius ⭕ · @dredmorbius

2071 followers · 14639 posts · Server toot.cat

@vortex_egg Fair question.

"Computer" itself is a misnomer. Our devices are informators, they process information (hence: information technology and data processing), and are connected to communications networks and information storage.

The information roles themselves are, generally:

Interpersonal communications. Text, image, voice, video.
Document access, in the sense of #PaulOtlet: any fixed record (text, image, voice, video, ...) not primarily interactive. May be informational or entertainment. I'm including streamed/live access here.
Commerce & business: buying, selling, and transacting business (goods, services, activities).
Government: Financial (mostly taxes/fees) and other informational transactions.
Personal, household, task/activity, financial management & organisation. Recordkeeping, planning, budgeting, designing, device/services management, etc.
Other technical/scientific data streams, e.g., utility, safety, environmental monitoring, healthcare.
Creation & expression.

@clacke @hisham_hm @kensanata

1/

#PaulOtlet

Last updated 5 years ago

Original post
