📊 I gave a seminar on best practices for data collection
Learn which data/metadata to collect systematically & get tips to avoid hours of data-cleaning nightmares 😅
#DataCollection #DataScience #TidyData
📽️ Watch now on YT: https://youtu.be/zsyTlgAG_58?feature=shared
SPSS assumes that you have a "dataset", a collection of "observations" relating to a particular subject, survey, or domain: one row is one observation about a (sample) unit, one column is a variable collected over all units, and one cell holds a single value. This is essentially what is now called #tidydata, because someone wrote an article naming it, but it was part of the profession long before that.
Remember that SPSS originally stood for Statistical Package for the Social Sciences.
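To make the row/column/cell rule concrete, here is a minimal sketch (Python, standard library only) that checks a table, held as a list of dicts, against those three rules. The helper name `check_tidy`, its `key` argument, and the heuristic that a list/tuple/set/dict in a cell means "several values packed into one cell" are assumptions for illustration; this is not an SPSS feature. The survey rows are made up.

```python
from typing import Any, Iterable


def check_tidy(rows: list[dict[str, Any]], key: Iterable[str]) -> list[str]:
    """Hypothetical helper: list violations of the row/column/cell layout."""
    key = tuple(key)
    problems: list[str] = []
    if not rows:
        return problems
    columns = set(rows[0])
    seen: set[tuple] = set()
    for i, row in enumerate(rows):
        # One column = one variable collected over all units: same columns everywhere.
        if set(row) != columns:
            problems.append(f"row {i}: columns differ from row 0")
        # One row = one observation: the identifying key must not repeat.
        k = tuple(row.get(c) for c in key)
        if k in seen:
            problems.append(f"row {i}: duplicate observation {k}")
        seen.add(k)
        # One cell = one value: a container usually means several values were packed in.
        for col, value in row.items():
            if isinstance(value, (list, tuple, set, dict)):
                problems.append(f"row {i}: column {col!r} holds multiple values")
    return problems


# Made-up panel survey: one row per respondent per wave.
survey = [
    {"respondent": 1, "wave": 1, "age": 34, "income": 52000},
    {"respondent": 1, "wave": 2, "age": 35, "income": 54000},
    {"respondent": 2, "wave": 1, "age": 29, "income": [41000, 43000]},
]
print(check_tidy(survey, key=("respondent", "wave")))
```

The `key` argument matters: deciding what identifies "one observation" (here respondent plus wave) is a design decision to make at collection time, not something to reverse-engineer during cleaning.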
New blog post
✅ .data - The data being passed that will be augmented by the function.
✅ .dx_col - The column containing the Principal Diagnosis for the discharge.
✅ .px_col - The column containing the Principal Coded Procedure for the discharge. This may be blank.
✅ .drg_col - The column containing the DRG number coded for the inpatient discharge.
Post: https://www.spsanderson.com/steveondata/posts/weekly-rtip-healthyr-2023-01-27/
#healthcare #datascience #rstats #data #dataanalysis #analytics #dx #px #serviceline #drg #tidydata #opensource
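The parameters above describe an R function that augments discharge data with a service line. The sketch below is not that function: it is a minimal Python analogue, assuming a simple precedence of DRG lookup first, then the principal procedure when it is not blank, then the longest matching principal-diagnosis prefix. The name `augment_service_line`, the three lookup tables, the "Unassigned" default, and every code in the example are hypothetical placeholders, not real grouping rules.

```python
from typing import Any, Mapping, Optional


def augment_service_line(
    data: list[dict[str, Any]],
    dx_col: str,
    px_col: str,
    drg_col: str,
    by_drg: Mapping[str, str],
    by_px: Mapping[str, str],
    by_dx_prefix: Mapping[str, str],
    default: str = "Unassigned",
) -> list[dict[str, Any]]:
    """Return copies of the rows with an added 'service_line' column (hypothetical rules)."""
    out = []
    for row in data:
        drg = row.get(drg_col)
        px = row.get(px_col) or ""  # the principal procedure may be blank or missing
        dx = row.get(dx_col) or ""
        line: Optional[str] = by_drg.get(drg)
        if line is None and px:
            line = by_px.get(px)
        if line is None:
            # Fall back to the longest diagnosis prefix that matches.
            for prefix in sorted(by_dx_prefix, key=len, reverse=True):
                if dx.startswith(prefix):
                    line = by_dx_prefix[prefix]
                    break
        out.append({**row, "service_line": line or default})
    return out


# Placeholder codes only; real service-line logic uses published code sets.
discharges = [
    {"id": 1, "dx": "DX100", "px": "PX9", "drg": "D001"},
    {"id": 2, "dx": "DX200", "px": "", "drg": "D999"},
    {"id": 3, "dx": "DX300", "px": None, "drg": "D999"},
]
result = augment_service_line(
    discharges, dx_col="dx", px_col="px", drg_col="drg",
    by_drg={"D001": "Cardiology"},
    by_px={"PX9": "Surgery"},
    by_dx_prefix={"DX2": "Pulmonary"},
)
print([r["service_line"] for r in result])
```

As in the original function's design, the input stays tidy: one row per discharge, and the augmentation adds one new column rather than reshaping the data.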
Day finished successfully and feedback was good. We resume tomorrow 9:50 EET / 8:50 CET with more #Pandas (going through practical usage), #visualization (#matplotlib as the base of the ecosystem), and then disk-based data formats.
If you want something to review for tomorrow, check out #TidyData as defined in this paper - useful for anyone organizing data, #Python user or not:
https://vita.had.co.nz/papers/tidy-data.pdf
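One messy layout the paper describes is column headers that are values rather than variable names (for example, one column per year). The fix is to "melt" those columns into rows. In pandas this is `DataFrame.melt`; the sketch below does the same thing with the standard library so it runs anywhere. The `melt` helper and the site/year table are made up for illustration.

```python
from typing import Any, Iterable


def melt(
    rows: list[dict[str, Any]],
    id_vars: Iterable[str],
    var_name: str = "variable",
    value_name: str = "value",
) -> list[dict[str, Any]]:
    """Turn every non-identifier column into (variable, value) rows."""
    id_vars = list(id_vars)
    long_rows: list[dict[str, Any]] = []
    for row in rows:
        ids = {k: row[k] for k in id_vars}
        for col, val in row.items():
            if col not in id_vars:
                # The old column header becomes a value of the new variable column.
                long_rows.append({**ids, var_name: col, value_name: val})
    return long_rows


# Wide: the years are stored in the headers, not in a "year" variable.
wide = [
    {"site": "A", "2021": 10, "2022": 12},
    {"site": "B", "2021": 7, "2022": 9},
]
tidy = melt(wide, id_vars=["site"], var_name="year", value_name="count")
for r in tidy:
    print(r)
```

With pandas, `pd.DataFrame(wide).melt(id_vars="site", var_name="year", value_name="count")` gives the same rows, though pandas orders them by year first rather than by site.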
#PythonForSciComp #pandas #visualization #matplotlib #TidyData #python
My #r #package {TidyDensity} is on its way to #cran @ramikrispin #distributions #randomdata #tidydata
The last hard thing is generalizing how we reshape the FERC data, which typically comes in a wide format (sometimes as many as 500 columns), into more relational #TidyData.
We do have a nice way to concatenate the old DBF and new XBRL data, which also aligns all of the old data, whose row numbers changed meaning from year to year as new fields were added, split, or removed.
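One way to picture that alignment is a per-year crosswalk that translates each old row number into a stable field name before the old and new sources are stacked into one long table. This is a minimal sketch under that assumption, not the project's actual code: `DBF_ROW_MAP`, `align_dbf`, `tidy_xbrl`, the field names, row numbers, and values are all hypothetical.

```python
from typing import Any

# Hypothetical crosswalk: what each DBF row number meant in each report year.
# In 2020 a new field was inserted at row 2, so "depreciation" moved to row 3.
DBF_ROW_MAP: dict[int, dict[int, str]] = {
    2019: {1: "plant_in_service", 2: "depreciation"},
    2020: {1: "plant_in_service", 2: "construction_work", 3: "depreciation"},
}


def align_dbf(records: list[tuple[int, int, float]]) -> list[dict[str, Any]]:
    """Map (year, row_number, value) records to tidy rows with stable field names."""
    out = []
    for year, row_number, value in records:
        field = DBF_ROW_MAP.get(year, {}).get(row_number)
        if field is None:
            continue  # unmapped or removed row; a real pipeline would report it
        out.append({"report_year": year, "field": field, "value": value, "source": "dbf"})
    return out


def tidy_xbrl(facts: list[tuple[int, str, float]]) -> list[dict[str, Any]]:
    """Newer records already carry field names, so only the shape changes."""
    return [{"report_year": y, "field": f, "value": v, "source": "xbrl"} for y, f, v in facts]


dbf = [(2019, 1, 100.0), (2019, 2, 5.0), (2020, 1, 110.0), (2020, 2, 3.0), (2020, 3, 5.5)]
xbrl = [(2021, "plant_in_service", 120.0), (2021, "depreciation", 6.0)]

# Concatenate only after both sources share the same columns and field names.
combined = align_dbf(dbf) + tidy_xbrl(xbrl)
series = [(r["report_year"], r["value"]) for r in combined if r["field"] == "depreciation"]
print(series)
```

Keeping the crosswalk as data rather than as branching code means each new year's form change is one more table entry, and the result is a single long table where a field's history can be read across both sources.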