Reading recommendation: "Data Cleaning (Pocket Primer)" by Oswald Campesato.
#Data #DataCleaning #Bash #Shell #Lecture ... (sed, grep, awk and more!)
https://terrorgum.com/tfox/books/datacleaning_pocketprimer.pdf
In #scientometrics, we will laugh at ourselves in the coming decades!
Scholars spent hours on #datacleaning and the so-called "Fehlerlehre" (theory of errors) to deal with all the tiny errors in bibliographic data (2% here, 8% there...). With #OpenScience and the world 'beyond the paper', the data party will descend into full chaos, and all the efforts of the past will seem tiny!
If we add #ArtificialIntelligence to this equation, information specialists will be reduced to search engine optimizers...
#artificialintelligence #OpenScience #datacleaning #Scientometrics
Can anyone recommend a #python package for #address (residential) matching (spelling variations) that can handle typical German address variations? #datascience #datacleaning #machinelearning
#python #address #datascience #datacleaning #machinelearning
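As a standard-library-only starting point (not one of the dedicated packages the question asks about), a common approach is to normalize spelling variants first and then fuzzy-match with `difflib`. The transliteration table, the `str.` abbreviation rule, the 0.85 cutoff, and the helper names `normalize_address` and `best_match` are illustrative assumptions. Real German addresses need more rules, for example "Berliner Straße" vs. "Berlinerstraße" or house numbers written "12a" vs. "12 a".

```python
import difflib
import re
import unicodedata
from typing import List, Optional

# Assumed transliteration table: umlauts and ß to their ASCII spellings,
# so "Straße"/"Strasse" and "München"/"Muenchen" normalize identically.
_TRANSLIT = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def normalize_address(raw: str) -> str:
    """Map common spelling variants of a German address to one canonical form."""
    # NFC first so a decomposed "a" + combining diaeresis becomes "ä".
    s = unicodedata.normalize("NFC", raw).lower().translate(_TRANSLIT)
    # Expand the abbreviation "str" / "str." at the end of a word to "strasse".
    s = re.sub(r"str\b\.?", "strasse", s)
    # Turn everything except ASCII letters and digits into single spaces.
    s = re.sub(r"[^a-z0-9]+", " ", s)
    return s.strip()


def best_match(query: str, candidates: List[str], cutoff: float = 0.85) -> Optional[str]:
    """Return the candidate most similar to `query` after normalization, or None."""
    # If two candidates normalize to the same string, the later one wins.
    by_norm = {normalize_address(c): c for c in candidates}
    hits = difflib.get_close_matches(
        normalize_address(query), list(by_norm), n=1, cutoff=cutoff
    )
    return by_norm[hits[0]] if hits else None


if __name__ == "__main__":
    reference = ["Hauptstraße 5, 10115 Berlin", "Bahnhofstraße 12, 80335 München"]
    print(best_match("Hauptstr. 5 10115 Berlin", reference))  # → Hauptstraße 5, 10115 Berlin
```

Normalizing before matching matters because `difflib` compares raw characters. Without it, "Hauptstr. 5" and "Hauptstraße 5" would differ by several edits even though they name the same address.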
Hello Fediverse, we are #OpenRefine, an open source power tool for working with messy data! We finally have an account here and will be sharing news about what is happening in the project and the broader community.
− @pintoch
#introductions #openrefine #datacleaning #foss #opendata
Easy Data Transform v1.40.0 now available for all your #datacleaning, #datawrangling and #dataanalysis needs.
https://www.easydatatransform.com/newsletters/newsletter_25_apr_2023.html
#datacleaning #dataWrangling #dataanalysis
I’m giving a talk about “Data cleaning principles” tomorrow, here at UW-Madison (though via Zoom).
https://kbroman.org/Talk_DataCleaning2023/data_cleaning_notes.pdf
Here’s a video of the original, shorter version from csv,conf: https://www.youtube.com/watch?v=7Ma8WIDinDc
Feeling frustrated when trying to put together your dbGaP data submission?
@lwheinsberg@twitter.com and I are excited to announce our new R package, #dbGaPCheckup, which we designed to help YOU (and us) with this complex process! (1/6)
#Rstats #Genetics #dbGaP #DataCleaning #OpenScience #DataSharing
#DataCentricAI free course by #MIT
#machinelearning #data #datacleaning #datacentric #ai #artificialintelligence
Often revelations found with these methods will lead to better #datacleaning, #dataprep, and even modeling. In my experience, it pays off! I'm by no means an expert, though. There are many topics in the book I'm eager to sink my teeth into! I will post about them as I learn :) 4/4
Open research data:
Survey of 14 research articles published in Psychological Science finds:
“(a) provision of cleaned data without raw data, (b) provision of raw data without cleaned data, and (c) no description of, or code for, the data-cleaning process.”
Open access: https://doi.org/10.1177/09567976221140828
Some Mastodonian authors: @cruwell, @deboraha, @maltoesermalte, @sandrajgeiger, @jmoneger, @sTeamTraen
#Psychology
@psychology
#OpenScience
#MetaScience
#Reproducibility
#DataCleaning
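The survey names three problems: cleaned data without raw data, raw data without cleaned data, and no code for the cleaning process. One way to avoid all three is to leave the raw file untouched and write the cleaning step as a script that returns both the cleaned rows and a log of what was excluded and why. The sketch below is an illustration, not a procedure taken from the surveyed articles. It assumes hypothetical column names (`participant_id`, `rt_ms`), a hypothetical reaction-time window of 200 to 3000 ms, and toy rows.

```python
import csv
import io

# Hypothetical exclusion rule for illustration only: reaction times outside
# this window (in ms) are treated as invalid. Real thresholds belong in the
# study's preregistration or analysis plan.
RT_MIN_MS, RT_MAX_MS = 200, 3000


def clean(raw_rows):
    """Return (cleaned_rows, log) without modifying the raw rows.

    Every excluded row is logged with its reason, so the raw data, the
    cleaned data, and the cleaning steps can all be shared together.
    """
    cleaned, log = [], []
    for row in raw_rows:
        pid, rt = row["participant_id"], (row.get("rt_ms") or "").strip()
        try:
            value = float(rt)
        except ValueError:
            log.append((pid, f"unparseable rt_ms {rt!r}"))
            continue
        if not RT_MIN_MS <= value <= RT_MAX_MS:
            log.append((pid, f"rt_ms={value:g} outside [{RT_MIN_MS}, {RT_MAX_MS}]"))
            continue
        # Build a new dict so the raw row keeps its original string value.
        cleaned.append({**row, "rt_ms": value})
    return cleaned, log


# Toy raw data standing in for the untouched raw file.
RAW_CSV = """participant_id,rt_ms
p01,512
p02,
p03,4100
p04,287
"""

raw_rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
cleaned_rows, exclusion_log = clean(raw_rows)
```

Publishing `RAW_CSV` (the raw file), the output of `clean`, and the script itself together covers all three gaps the survey lists.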
What kind of date is "20.5-.4-21 10:24:55" supposed to be? DD.M-.M-DD HH:MM:SS? Of course it's 2005-04-21. Logical. Replace all the dots with zeros, or just keep guessing at random until a known format comes out?
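Here is a sketch of the "replace the dots with zeros" idea. It tries the string as-is first so that a correctly dotted German date such as `21.04.2005` is not mangled. The accepted format list and the function name `repair_timestamp` are assumptions for illustration. The repair only works if every dot really stands for a lost zero, as in the example above.

```python
from datetime import datetime
from typing import Optional

# Assumed set of formats we are willing to accept; extend as needed.
KNOWN_FORMATS = ("%Y-%m-%d %H:%M:%S", "%d.%m.%Y %H:%M:%S")


def repair_timestamp(raw: str) -> Optional[datetime]:
    """Parse `raw`; if no known format fits, replace dots with zeros and retry."""
    # Original string first, so valid dotted dates are never rewritten.
    for candidate in (raw, raw.replace(".", "0")):
        for fmt in KNOWN_FORMATS:
            try:
                return datetime.strptime(candidate, fmt)
            except ValueError:
                continue
    return None  # still unknown: flag for manual review instead of guessing


print(repair_timestamp("20.5-.4-21 10:24:55"))  # → 2005-04-21 10:24:55
```

Returning `None` instead of trying formats at random keeps unresolved values visible for manual checking.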
Hello, I have begun documenting the R code I've used to clean #Chinese historical #data, so that I don't have to reinvent the wheel in every new project. I also decided to make the #cookbook accessible to anyone who faces similar issues in their #dh project. I will try to add one #datacleaning problem with example #code every Friday (#FollowFriday). To some people this code might be trivial; for others it saves a lot of time and energy.
#FollowFriday #Code #datacleaning #dh #cookbook #Data #chinese
Wondering whether anyone is interested in an ongoing thread about the process of #DataCleaning and #PaperWriting in our project about #NewsCoverage of #BlackProtest in mainstream newswires and Black newspapers. Lots of data to clean (~10K articles & ~10K events), so looking for ways to get papers along the way. We clean data by issue cluster, so I reviewed the #MillionManMarch articles & decided to see if there is a paper about that. I think I saw some interesting patterns.
#datacleaning #paperwriting #newscoverage #blackprotest #millionmanmarch
:rstats: No data analysis project will be good without cleaning the data first, a step that usually takes considerable time. I just found this free ebook suitable for R beginners. It is very succinct, easy to understand, and lays the groundwork for more complex pipelines:
https://bookdown.org/f_lennert/data-prep_2days/
#rstats #datacleaning #stats #R
#TidyTuesday US Monthly retail sales was extra fun to analyze with #Plotly in #rstats #QuartoPub #Quarto
Detailed article on #datacleaning #plotting and using #crosstalk to add interactivity in visualization is here: https://medium.com/@menghani.deepsha/tidytuesday-retail-sales-data-analysis-with-plotly-in-r-c8ca605d4d0
#tidytuesday #plotly #rstats #QuartoPub #quarto #datacleaning #plotting #crosstalk
🧹 APIs, JSON, and Data Cleaning
This is my most recent article, where I dive headfirst into the topic of real-world data and give a thorough walk-through via a project based on the FBI's Most Wanted API.
https://www.evanmarie.com/apis-json-and-data-cleaning/
#datascience #API #APIs #dataengineering #pandas #python #FBI #project #JSON #datacleaning
Getting a message that it is the annual cleaning day of Swepub, https://swepub.kb.se A shout-out to all hard-working data cleaners! #datacleaning
Cluster Chat with #CriticalZone Network #BigDataCluster’s Dr. Byung Lee based at the University of #Vermont.
Work described in this conversation will be presented during #AGU22
#CriticalZone #bigdatacluster #vermont #agu22 #machinelearning #datacleaning #ai