Coming next week: The #DataSittersClub is back with a new book on corpus-building! #DigitalHumanities
#digitalhumanities #DataSittersClub
All right, public accountability time: I hereby vow that a full draft of the much-delayed #DataSittersClub 19 on how to think through corpus-building will be finished by a week from today. 🦾 #DigitalHumanities
#digitalhumanities #DataSittersClub
In case you're wondering what Quinn's been failing at lately (besides finishing the #DataSittersClub book on corpora), the answer is #Texsolv heddles. "Just slide them on then undo the twist ties. Super easy!" 🤨 My skill at screwing things up is apparently unparalleled. #DHmakes
#DHmakes #texsolv #DataSittersClub
@thatandromeda I've done enough preliminary dabbling to be able to say conclusively that identifying sexual acts in literature is a hard problem; there's no quick and easy way to do this legitimately at scale (and one would hate to make it easier, since it's not the sort of thing put to good ends.)
The next #DataSittersClub involves corpus selection and suffice it to say, I'm sticking to pizza as the query.
@quinnanya you reference #DataSittersClub a _whole lot_ but I admit I've never seen an explanation of what it ... is. What is it???
@gekitsu @christof @mpe At least personally, I'd rather have a site like that up so we could critique it. Having it pulled down sends the message that the whole thing is illegitimate and wrong. Agreed that getting the right statement to stick is hard, but harder still is doing it when what we want to critique isn't there.
The current draft of the next #DataSittersClub ends with an impassioned section about all this. We'll see how much survives editing.
@christof @mpe There really isn't anything there that would be a copyright problem (other than, at most, the size of the snippet, but at least in the US that should be legit) and Prosecraft itself didn't use anything resembling AI. No texts were available for redistribution. The website didn't even charge membership fees or anything where authors could feel like they're losing out on profit.
Ironically, the next #DataSittersClub is "The Bad Corpus" and we'll have to talk about this.
@jose_eduardo I was thinking about this more last night. Beyond the failure angle, I think the other thing that we try to foreground with the #DataSittersClub is that fundamentally, DH is people. It's people who help us solve problems, who write the code we use, who challenge our assumptions. And still we fail! But it all happens in conversation with others.
@scott_bot @elotroalex @jtheibault My take on this kind of problem is the #DataSittersClub. Write-ups of using different methods, and as the project grows, referring back to things we've done already. Each piece is discrete and done, and we may be working on an easier-to-use entry point, especially for folks new to DH.
@jose_eduardo @gworthey Urgh, good point. 😬 Maybe an update/clean-up pass is going to be needed on a few #DataSittersClub books later this summer.
Is there any better news to wake up to than the fact that Norway has digitized All The Books and it's no problem at all to get all their Baby-Sitters Club translations? 🤩 #DigitalHumanities #DataSittersClub #corpora
#corpora #DataSittersClub #digitalhumanities
On May 4th, in a halfhearted attempt to gather data for the next #DataSittersClub book on corpora, I searched 30k young reader books for the following terms: chocolate, science, boyfriend, girlfriend, pizza, makeup, mall, prom.
In the clearer-headed light of mid-June, that has got to be the most random throwing-things-at-a-wall list of nouns I've ever come up with. #DigitalHumanities
#digitalhumanities #DataSittersClub
@xandaschofield student collaborator got a genuine #DataSittersClub meeting experience as we meandered from #TopicModeling to the challenges of interdisciplinary work to the fact that you're kidding yourself if you think #academia is mostly you and your code or books or archives-- the whole thing is people, with all the mess that comes with it. I'm super excited about this book, though, I've already learned a ton. #DigitalHumanities
#digitalhumanities #academia #topicmodeling #DataSittersClub
I am finally mentally unblocked on writing the #DataSittersClub draft on #TopicModeling. Unfortunately, that is not the book that is the hold-up on the whole series. 😬
#topicmodeling #DataSittersClub
@miriamkp I don't know how well it worked tbh, but we had a #DataSittersClub book trying to explain these things through a series of analogies like guessing Claudia's grades, and also comparing GPUs vs CPUs with a shopping spree metaphor: https://datasittersclub.github.io/site/dsc9.html#what-is-machine-learning-anyway
It's been a rainy, gloomy day around here, so I've been getting back to practicing #spinning (which gives me something other than code to get mad at) and writing up a #DataSittersClub debate from our retreat last week.
Pretty excited to be sporting a #DataSittersClub scrunchie today thanks to @quinnanya; greatly enjoyed chatting with the club at Dartmouth today and sharing in the pizza party.
@storytracer Shout-out to the #DataSittersClub, teaching #DigitalHumanities methods to laypeople using a popular 90's series. There's an overwhelming amount of data on the internet; take care of the things you love, even if it's in a huge collection, because it can get lost in it. #DHd2023
#dhd2023 #digitalhumanities #DataSittersClub
@roopikarisam I can't schedule like you but hoping to contribute nonetheless in scrunchie form! #DataSittersClub #DSCSuperSpecial
#dscsuperspecial #DataSittersClub