Gilles San Martin · @sanmartin
8 followers · 27 posts · Server fosstodon.org

📊 I gave a seminar on best practices for data collection

Learn which data/metadata to collect systematically & tips to avoid hours of data-cleaning nightmares 😅

📽️ Watch now on YT: youtu.be/zsyTlgAG_58?feature=s

#datacollection #datascience #TidyData

Last updated 1 year ago

datamaps :rickwhoah: · @datamaps
396 followers · 753 posts · Server social.linux.pizza

@sharoz @JASPStats

SPSS supposes that you have a "dataset", a collection of "observations" that relate to a particular subject/survey/domain: one row is one observation about a (sample) unit, one column is a variable collected over all units, one cell is a single value (this is essentially what is now called #TidyData because someone wrote an article naming it, but it's been part of the profession long before that)

remember that SPSS meant Statistical Package for the Social Sciences
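
A minimal sketch of the layout described above (one row per observation, one column per variable, one value per cell), written with pandas. The table, its column names, and its values are all made up for illustration and are not taken from SPSS or from any real survey.

```python
import pandas as pd

# Hypothetical survey data (invented for illustration):
# each row is one observation about one sample unit,
# each column is one variable collected over all units,
# each cell holds a single value.
survey = pd.DataFrame(
    {
        "respondent_id": [1, 2, 3],
        "age": [34, 51, 27],
        "region": ["north", "south", "north"],
    }
)

# Rows count observations, columns count variables.
n_observations, n_variables = survey.shape

# Check that no cell packs several values together (no lists, dicts, etc.).
every_cell_is_scalar = all(
    pd.api.types.is_scalar(value) for value in survey.to_numpy().ravel()
)
```

The scalar check is only a rough heuristic: a cell such as the string "34;51" would pass it while still breaking the one-value-per-cell rule.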

#TidyData

Last updated 1 year ago

New Blog post

✅ .data - The data being passed that will be augmented by the function.

✅ .dx_col - The column containing the Principal Diagnosis for the discharge.

✅ .px_col - The column containing the Principal Coded Procedure for the discharge. It is possible that this could be blank.

✅ .drg_col - The DRG Number coded to the inpatient discharge.

Post: spsanderson.com/steveondata/po

#OpenSource #TidyData #DRG #serviceline #Px #dx #analytics #dataanalysis #Data #RStats #DataScience #Healthcare

Last updated 2 years ago

CodeRefinery · @coderefinery
54 followers · 32 posts · Server fosstodon.org

Day finished successfully, feedback was good. We resume tomorrow 9:50 EET / 8:50 CET with more #pandas (going through practical usage), #visualization (#matplotlib as a base of the ecosystem), and then disk-based data formats.

If you want something to review for tomorrow, check out #TidyData as defined in this paper - useful for anyone organizing, or not:
vita.had.co.nz/papers/tidy-dat

#PythonForSciComp #pandas #visualization #matplotlib #TidyData #python

Last updated 2 years ago

Zane Selvans · @ZaneSelvans
348 followers · 365 posts · Server social.coop

The last hard thing is trying to generalize the way we reshape the FERC data, which typically comes in a wide format (like... 500 columns sometimes) into #TidyData that's more relational.

We do have a nice way to concatenate the old DBF and new XBRL data, which also aligns all of the old data, whose row numbers changed meaning from year to year as new fields were added, split, or removed.
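
For readers unfamiliar with this kind of reshaping, here is a rough sketch of turning a wide table into a long one with pandas `melt`. It is not the code used for the FERC data; the table, column names, and values are hypothetical, and a real 500-column form would need a more careful mapping of columns to variables.

```python
import pandas as pd

# Hypothetical wide table (invented for illustration):
# one row per utility and year, one column per reported field.
wide = pd.DataFrame(
    {
        "utility_id": [101, 102],
        "report_year": [2020, 2020],
        "plant_cost": [5.0, 7.5],
        "fuel_cost": [1.2, 2.4],
    }
)

# Keep the identifying columns and stack every other column into
# (field, value) pairs, giving one row per utility, year, and field.
long = wide.melt(
    id_vars=["utility_id", "report_year"],
    var_name="field",
    value_name="value",
)
```

In the long form, adding a new reported field adds rows instead of columns, so the table schema stays stable from year to year.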

#TidyData

Last updated 2 years ago