Doc Edward Morbius ⭕​ · @dredmorbius
2307 followers · 15190 posts · Server toot.cat

Pondering the Big Questions:

When did "meet-cute" become A Thing?

Ngram Viewer says ... mostly post-2010:

books.google.com/ngrams/graph?

It seems recent to me.

(Both "meet cute" and "meet-cute" plotted. I suspect the unhyphenated version will have numerous false positives as in "meet cute (girl(s)|guy(s))".)
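
A rough way to check that suspicion on raw text you have on hand (not on the Ngram Viewer's aggregated counts) might look like the sketch below. The two patterns and the `classify` helper are illustrative assumptions built from the example in the post, not anything the Ngram Viewer provides.

```python
import re

# Assumed pattern, taken from the post's own example: the unhyphenated
# phrase followed directly by girl(s) or guy(s) is treated as a false positive.
FALSE_POSITIVE = re.compile(r"\bmeet cute (girls?|guys?)\b", re.IGNORECASE)

# Matches both the hyphenated and the unhyphenated spelling.
MEET_CUTE = re.compile(r"\bmeet[- ]cute\b", re.IGNORECASE)


def classify(sentence):
    """Label a sentence as 'false positive', 'meet-cute', or 'no match'."""
    if FALSE_POSITIVE.search(sentence):
        return "false positive"
    if MEET_CUTE.search(sentence):
        return "meet-cute"
    return "no match"
```

The false-positive check runs first because the unhyphenated pattern would otherwise match those sentences too.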

#ngrams #NgramViewer

Last updated 2 years ago

Doc Edward Morbius ⭕​ · @dredmorbius
2168 followers · 14992 posts · Server toot.cat

On the changing of language usage patterns over time, homelessness is an interesting case.

I'd discovered some time back that the term broke into usage suddenly in 1980. It wasn't entirely unknown before, but the concept often appeared as a compound verb, "made homeless", rather than as a noun, "homeless (man|woman|person)", and almost always as an immediate consequence of some disaster, such as a structural fire, hurricane, flood, or earthquake. Earlier terms used to describe long-term lack of reliable housing include vagrant, itinerant, and the like (I'd need to look these up again).

Part of this seems to be due to changes in how housing was approached in the US, and especially the elimination of alternatives to single-family dwellings (e.g., rooming houses, residence hotels) in many areas. But some also seems to be a linguistic, social, and political change in usage.

Ngram: "homelessness": books.google.com/ngrams/graph?

Ngram: "homeless, vagrant, itinerant": books.google.com/ngrams/graph?

The message is that ngrams and the Google corpus are useful but also require interpretation.

#ngrams #NgramViewer #homelessness

Last updated 2 years ago

Doc Edward Morbius ⭕​ · @dredmorbius
2170 followers · 14989 posts · Server toot.cat

Google Ngrams: "white nationalist"

Apropos some recent discussions, I've been looking into this term and a number of matters related to it.

Google Ngram Viewer is a powerful, if occasionally problematic, tool for exploring language and terms used within it.

An ngram of the headline phrase of this toot ... shows an immense rise in prevalence of the term through 2019 (the most recent data in the corpus), roughly 10 times the 2010 level.

What's driving that isn't necessarily clear --- language and usage reflect both the real-world phenomena that terms describe and preferences for certain terms over others.

But it's attention-grabbing all the same. And a bit sobering.

books.google.com/ngrams/graph?

#ngrams #NgramViewer #racism

Last updated 2 years ago

claude · @mathr
287 followers · 2739 posts · Server post.lurk.org

Doing some n-gram analysis of texts, trying to see what sort of abstract structural features of language exist on a statistical kind of level.

Image is a graph of probability (log scale) against n-gram size.

The solid purple curve decreases increasingly rapidly as n increases. I think this indicates that distinct n-grams get increasingly sparse (in the space of all character combinations) as n increases.
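
One possible reading of that sparsity, sketched in Python rather than the Haskell used for the actual experiment: count the distinct character n-grams in a text and divide by the number of possible n-character strings over the text's own alphabet. The function names and the choice of alphabet are assumptions for illustration; the post doesn't say exactly how the plotted probability was defined.

```python
def distinct_ngrams(text, n):
    """Return the set of distinct character n-grams occurring in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_density(text, n):
    """Fraction of all possible n-character strings that occur in text.

    Assumption: the alphabet is just the set of characters seen in text,
    so there are len(alphabet) ** n possible n-grams.
    """
    alphabet_size = len(set(text))
    if alphabet_size == 0 or n < 1:
        return 0.0
    return len(distinct_ngrams(text, n)) / alphabet_size ** n
```

For the toy text "abab" the density halves with each step (1.0, 0.5, 0.25 for n = 1, 2, 3), because the count of distinct n-grams stays at 2 while the space of possible strings doubles.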

The dashed green curve decreases very rapidly until a minimum at n = 4, P = 10^{-26}, then increases less rapidly but at a steady rate, near 10^{-4} at n = 10. This shows that given a string of 140 characters whose (n-1)-grams are all found in the corpus, it's increasingly likely (as n increases) that all its n-grams are found in the corpus too (provided n > 3).
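
The condition behind the dashed curve ("all its n-grams are found in the corpus") can be written as a subset test. This is only a Python sketch of that predicate with made-up helper names. It doesn't reproduce how the probabilities in the graph were computed, which the post doesn't describe.

```python
def char_ngrams(s, n):
    """Return the set of character n-grams of s (empty if s is shorter than n)."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def all_ngrams_in_corpus(candidate, corpus, n):
    """True if every n-gram of candidate also occurs somewhere in corpus.

    Note: a candidate shorter than n has no n-grams, so the test is
    vacuously True in that case.
    """
    return char_ngrams(candidate, n) <= char_ngrams(corpus, n)


# Toy example: every 3-gram of "cabc" appears in the corpus, but its
# single 4-gram does not.
corpus = "abcabd"
ok_at_3 = all_ngrams_in_corpus("cabc", corpus, 3)
ok_at_4 = all_ngrams_in_corpus("cabc", corpus, 4)
```

In the toy example `ok_at_3` is True and `ok_at_4` is False, which is the kind of case the dashed curve counts against: a string that passes at n-1 but fails at n.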

I don't know what this implies about the nature of patterns of various scales in human language.

Corpus for this experiment was gutenberg.org/files/48320/4832

Runtime of my Haskell code to analyse the data set was 1m40s.

#ngrams #statistics #probability

Last updated 2 years ago

Verwechslungsgefährte · @dichotomiker
70 followers · 1654 posts · Server qoto.org

Who else is interested in #ngrams and #googletrends and uses custom tools on their #data?

#ngrams #googletrends #data

Last updated 2 years ago

Doc Edward Morbius ⭕​ · @dredmorbius
2082 followers · 14677 posts · Server toot.cat

In case you were wondering, Christmas is in fact doing just fine

If there ever was in fact a war against it, that ran from 1950--1980.

Via Google Ngram Viewer US English Corpus

books.google.com/ngrams/graph?

#christmas #ngrams #language

Last updated 3 years ago
