gaby_wald · @gaby_wald
87 followers · 16402 posts · Server framapiaf.org

Reading recommendation: "Data Cleaning (Pocket Primer)" by Oswald Campesato.

... (sed, grep, awk, and others!)

terrorgum.com/tfox/books/datac

#data #datacleaning #bash #shell #lecture

Last updated 2 years ago

In #Scientometrics, we will laugh at ourselves in the coming decades!

Scholars spent hours on #datacleaning and so-called "Fehlerlehre" to deal with all the tiny errors of bibliographic data (2% here, 8% there...). With #OpenScience and the world 'beyond the paper', the data party will go full chaos and all the efforts from the past will seem tiny!

If we add #artificialintelligence to this equation, information specialists will be degraded to search engine optimizers...

#artificialintelligence #OpenScience #datacleaning #Scientometrics

Last updated 2 years ago

Volker · @Volker
27 followers · 20 posts · Server fosstodon.org

Can anyone recommend a #python package for (residential) #address matching (spelling variations) that can handle typical German address variations?

#python #address #datascience #datacleaning #machinelearning

Last updated 2 years ago
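
To make the question above concrete, here is a minimal sketch of what matching German address spelling variations can involve, using only the Python standard library. It is not a package recommendation and not what the poster ended up using; the function names `normalize_address` and `similarity` and the sample addresses are hypothetical, and the rules cover only a few common variations (umlauts, "ß", the "Str." abbreviation, hyphens and spacing).

```python
import re
from difflib import SequenceMatcher


def normalize_address(address: str) -> str:
    """Reduce an address to a canonical form (illustrative rules only)."""
    text = address.lower().strip()
    # Transliterate umlauts and sharp s so "Straße" and "Strasse" agree.
    for src, dst in (("ä", "ae"), ("ö", "oe"), ("ü", "ue"), ("ß", "ss")):
        text = text.replace(src, dst)
    # Expand the common abbreviation "str." to "strasse".
    text = re.sub(r"str\.", "strasse", text)
    # Drop spaces, hyphens, and punctuation so layout differences vanish.
    return re.sub(r"[^a-z0-9]+", "", text)


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two addresses after normalization."""
    return SequenceMatcher(None, normalize_address(a), normalize_address(b)).ratio()


if __name__ == "__main__":
    for candidate in ("Hauptstr. 5", "Haupt-Strasse 5", "Haupstrasse 5"):
        print(candidate, round(similarity("Hauptstraße 5", candidate), 2))
```

Normalizing first and fuzzy-matching second keeps the known, rule-based variations out of the similarity score, so the score only has to absorb genuine typos. A real pipeline would also need to treat house numbers, postal codes, and city names as separate fields.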

OpenRefine · @OpenRefine
3 followers · 1 posts · Server fosstodon.org

Hello Fediverse, we are #OpenRefine, an open source power tool for working with messy data! We finally have an account here and will be sharing news about what is happening in the project and the broader community.

@pintoch

#introductions #openrefine #datacleaning #foss #opendata

Last updated 2 years ago

Andy Brice · @AndyBrice
69 followers · 103 posts · Server hachyderm.io
Karl Broman · @kbroman
1116 followers · 131 posts · Server fosstodon.org

I’m giving a talk about “Data cleaning principles” tomorrow, here at UW-Madison (though via Zoom).

kbroman.org/Talk_DataCleaning2

Here’s a video of the original, shorter version from csv,conf: youtube.com/watch?v=7Ma8WIDinD

#datacleaning #datascience

Last updated 2 years ago

Daniel E. Weeks · @StatGenDan
213 followers · 299 posts · Server fediscience.org

Feeling frustrated when trying to put together your dbGaP data submission?

@lwheinsberg@twitter.com and I are excited to announce our new R package, #dbGaPCheckup, which we designed to help YOU (and us) with this complex process! (1/6)

#datasharing #OpenScience #datacleaning #dbgap #genetics #rstats #dbgapcheckup

Last updated 2 years ago

brozu ▪️ · @brozu
71 followers · 486 posts · Server mastodon.uno
Serg Masís :verified: · @smasis
73 followers · 48 posts · Server masto.ai

Often revelations found with these methods will lead to better , , and even modeling. In my experience, it pays off! I'm by no means an expert, though. There are many topics in the book I'm eager to sink my teeth into! I will post about them as I learn :) 4/4

#datacleaning #dataprep

Last updated 3 years ago

Mark Rubin · @MarkRubin
1465 followers · 1174 posts · Server fediscience.org

Open research data:

Survey of 14 research articles published in Psychological Science finds:

“(a) provision of cleaned data without raw data, (b) provision of raw data without cleaned data, and (c) no description of, or code for, the data-cleaning process.”

Open access: doi.org/10.1177/09567976221140

Some Mastodonian authors: @cruwell, @deboraha, @maltoesermalte, @sandrajgeiger, @jmoneger, @sTeamTraen

@psychology

#datacleaning #reproducibility #metascience #OpenScience #psychology

Last updated 3 years ago

Luca 🔨 · @luca
2905 followers · 63648 posts · Server social.luca.run

What kind of date is "20.5-.4-21 10:24:55" supposed to be? DD.M-.M-DD HH:MM:SS? Of course it is 2005-04-21. Obviously. Replace all the dots with zeros, or rather try things at random until a known format comes out?

#parsethedate #datacleaning

Last updated 3 years ago
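
The first repair the post suggests, replacing every dot with a zero, can be sketched as follows. This is only an illustration under the post's own reading that the dots stand in for dropped zeros; the function name `parse_dotted_timestamp` is hypothetical, and the sketch assumes the time part is already well formed.

```python
from datetime import datetime


def parse_dotted_timestamp(raw: str) -> datetime:
    """Treat each '.' in the date part as a dropped '0', then parse the result."""
    date_part, time_part = raw.split(" ", 1)
    repaired = date_part.replace(".", "0")  # "20.5-.4-21" becomes "2005-04-21"
    # strptime raises ValueError if the repaired string is still not a valid date.
    return datetime.strptime(f"{repaired} {time_part}", "%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    print(parse_dotted_timestamp("20.5-.4-21 10:24:55"))  # 2005-04-21 10:24:55
```

Letting `strptime` fail loudly on anything that does not repair into a valid date is safer than guessing formats until one happens to parse.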

Nora · @nvdbosch
66 followers · 25 posts · Server mstdn-social.social.shrimpcam.pw

Hello, I have begun to write documentation of the R code I've used to clean historical #chinese #data so I don't have to reinvent the wheel in every new project. I also decided to make the #cookbook accessible to anyone who faces similar issues in their project. I will try to add one problem with exemplary #code every Friday (#FollowFriday). To some people, this code might be trivial; to others, it saves a lot of time and energy.

#FollowFriday #Code #datacleaning #dh #cookbook #Data #chinese

Last updated 3 years ago

Pamela Oliver · @pamelaoliver
1343 followers · 1169 posts · Server sciences.social

Wondering whether anyone is interested in an ongoing thread about the process of #datacleaning and #paperwriting in our project about #newscoverage of #blackprotest in mainstream newswires and Black newspapers. Lots of data to clean (~10K articles & ~10K events), so looking for ways to get papers along the way. We clean data by issue cluster, so I reviewed the #millionmanmarch articles & decided to see if there is a paper about that. I think I saw some interesting patterns.

#datacleaning #paperwriting #newscoverage #blackprotest #millionmanmarch

Last updated 3 years ago

Carolina · @carolina
70 followers · 7 posts · Server fosstodon.org

:rstats: No data analysis project will be good without a previous cleaning of the data, a step that usually takes considerable time. I just found this free ebook suitable for R beginners. It is very succinct, easy to understand and lays the groundwork for more complex pipelines:
bookdown.org/f_lennert/data-pr

#rstats #datacleaning #stats #r

Last updated 3 years ago

IandLoveandData · @deepsha
65 followers · 41 posts · Server fosstodon.org

US Monthly retail sales was extra fun to analyze with in

Detailed article on and using to add interactivity in visualization is here: medium.com/@menghani.deepsha/t

#tidytuesday #plotly #rstats #QuartoPub #quarto #datacleaning #plotting #crosstalk

Last updated 3 years ago

Evan Marie Carr · @EvanMarie
11 followers · 7 posts · Server sigmoid.social

🧹 APIs, JSON, and Data Cleaning

This is my most recent article, where I dive headfirst into the topic of real world data and give a thorough walk-through via a project based on the FBI's Most Wanted API.

evanmarie.com/apis-json-and-da

#DataScience #API #apis #dataengineering #pandas #Python #fbi #project #json #datacleaning

Last updated 3 years ago

Charlotte Högberg · @Ch_Hogberg
128 followers · 20 posts · Server sciences.social

Getting a message that it is the annual cleaning day of Swepub (swepub.kb.se). A shout-out to all hard-working data cleaners!

#datacleaning

Last updated 3 years ago

The Big Data Cluster · @cznbigdata
17 followers · 43 posts · Server mstdn.science

Cluster Chat with #CriticalZone Network #bigdatacluster's Dr. Byung Lee, based at the University of #Vermont.

Work described in this conversation will be presented during #AGU22.

youtu.be/i4T0nC5uUL0

#CriticalZone #bigdatacluster #vermont #agu22 #machinelearning #datacleaning #ai

Last updated 3 years ago