Yann :python: · @nobodyinperson
101 followers · 343 posts · Server fosstodon.org

@roadskater Compressed #csv is (in my case) actually smaller than #NetCDF4. You trade IO speed for size and data approachability.

#csv #NetCDF4

Last updated 2 years ago
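
A minimal sketch of the size trade-off mentioned in the post above, using only the Python standard library. The column names and the synthetic sensor values are hypothetical, and the sketch only compares raw against gzip-compressed CSV; it does not measure NetCDF4, so it says nothing about the author's actual numbers.

```python
import csv
import gzip
import io
import math


def make_timeseries_csv(n_rows: int) -> bytes:
    """Build a hypothetical measurement timeseries as CSV bytes."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["time_s", "temperature_c", "humidity_pct"])
    for i in range(n_rows):
        # Smooth, repetitive values, as is typical for sensor logs.
        temperature = f"{20 + 5 * math.sin(i / 100):.2f}"
        humidity = f"{50 + 10 * math.cos(i / 250):.1f}"
        writer.writerow([i, temperature, humidity])
    return buf.getvalue().encode("utf-8")


def compress_csv(raw: bytes, level: int = 6) -> bytes:
    """Gzip-compress CSV bytes in memory."""
    return gzip.compress(raw, compresslevel=level)


raw = make_timeseries_csv(10_000)
packed = compress_csv(raw)
print(f"raw: {len(raw)} B, gzip: {len(packed)} B, ratio: {len(packed) / len(raw):.2f}")

# After decompression the data is plain CSV again, readable by any tool.
rows = list(csv.reader(io.StringIO(gzip.decompress(packed).decode("utf-8"))))
```

The cost is the extra decompression step on every read, which is the IO speed being traded away.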

Yann :python: · @nobodyinperson
86 followers · 316 posts · Server fosstodon.org

@thfriedrich I benchmarked the different compression algorithms in #hdf5 once, if you're interested: gitlab.com/-/snippets/2043808

With the metric I use there (distance to optimum 'fast and small'), blosc:lz4 is the best compromise.

I still hit a wall with at some point though, I guess the compression prevented something from being done, I don't remember...

Also, #NetCDF4 is a subtype of #hdf5, so you'll feel familiar.

#hdf5 #NetCDF4

Last updated 2 years ago
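
One way a "distance to optimum" metric like the one mentioned above could look, as a rough sketch. The exact formula in the linked snippet is not shown in the thread, so the normalization (dividing by the worst time and worst size) and the Euclidean distance to the origin are assumptions. The standard-library codecs stand in for the HDF5 filters such as blosc:lz4, which are not benchmarked here.

```python
import bz2
import lzma
import math
import time
import zlib


def distance_to_optimum(measurements):
    """Map {name: (seconds, bytes)} to {name: distance}.

    Assumption: time and size are each scaled by the worst observed value,
    and the optimum 'fast and small' is the origin (0, 0).
    """
    max_t = max(t for t, _ in measurements.values())
    max_s = max(s for _, s in measurements.values())
    return {
        name: math.hypot(t / max_t, s / max_s)
        for name, (t, s) in measurements.items()
    }


def best_compromise(measurements):
    """Return the name with the smallest distance to the optimum."""
    distances = distance_to_optimum(measurements)
    return min(distances, key=distances.get)


def benchmark(data, codecs):
    """Time each compress function once and record the output size."""
    results = {}
    for name, compress in codecs.items():
        start = time.perf_counter()
        out = compress(data)
        results[name] = (time.perf_counter() - start, len(out))
    return results


# Stand-in codecs from the standard library, not the HDF5 filters.
codecs = {"zlib": zlib.compress, "bz2": bz2.compress, "lzma": lzma.compress}
sample = b"time_s,value\n" + b"".join(b"%d,%d\n" % (i, i % 97) for i in range(50_000))
results = benchmark(sample, codecs)
print(best_compromise(results), distance_to_optimum(results))
```

A single timing run is noisy, so a real comparison would repeat each measurement and use realistic data.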

Thomas Friedrich · @thfriedrich
47 followers · 84 posts · Server fosstodon.org

@nobodyinperson thanks for the insight. 🙂 I’ll have a look at #mqtt and #NetCDF4, which are both new to me. Working in science, so far I've mostly used #hdf5 for larger data, which I think is a plus for #opendata & #openscience since there are easy interfaces for most programming languages. Compression is decent, I believe.

#mqtt #NetCDF4 #hdf5 #opendata #openscience

Last updated 2 years ago

Yann :python: · @nobodyinperson
86 followers · 316 posts · Server fosstodon.org

@thfriedrich Also, I've had problems with #NetCDF4 bindings not being thread-safe, so I couldn't parallelize operations very well. With compressed CSV, you can just throw threads (or processes) at the problem. Files just work, no weird library in between 🙂

#NetCDF4

Last updated 2 years ago
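
A small sketch of the "throw threads at it" idea from the post above, using only the standard library. The file names, column names, and the per-file mean are hypothetical; the point is only that each gzip-compressed CSV file can be handled by an independent worker.

```python
import csv
import gzip
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_sample(path: Path, n_rows: int) -> None:
    """Write a hypothetical two-column timeseries as gzip-compressed CSV."""
    with gzip.open(path, "wt", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["time_s", "value"])
        for i in range(n_rows):
            writer.writerow([i, i * 0.5])


def mean_value(path: Path) -> float:
    """Read one compressed CSV file and return the mean of its 'value' column."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as fh:
        values = [float(row["value"]) for row in csv.DictReader(fh)]
    return sum(values) / len(values)


def process_all(paths, max_workers: int = 4):
    """Process every file in its own worker; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(mean_value, paths))


with tempfile.TemporaryDirectory() as tmp:
    paths = [Path(tmp) / f"device_{k}.csv.gz" for k in range(4)]
    for k, path in enumerate(paths):
        write_sample(path, 100 * (k + 1))
    means = process_all(paths)

print(means)
```

If the parsing turns out to be CPU-bound, `concurrent.futures.ProcessPoolExecutor` can be swapped in for the thread pool with the same `map` call.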

Yann :python: · @nobodyinperson
86 followers · 316 posts · Server fosstodon.org

@thfriedrich Multidimensional would definitely be #NetCDF4, but if you are on a multidimensional scale with model-size outputs, then that's a different task than handling measurement device timeseries (which is what I was referring to).

#NetCDF4

Last updated 2 years ago

Yann :python: · @nobodyinperson
86 followers · 316 posts · Server fosstodon.org

For post-processed data though, I think #NetCDF4 is the best format, with its multiple structured and indexed array data fields, which can have arbitrary metadata attached and work flawlessly with #python's #xarray package:

xarray.dev/

#NetCDF4 #python #xarray #dataanalysis

Last updated 2 years ago
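
To illustrate what "structured and indexed array data fields with metadata" can look like in xarray, here is a minimal in-memory sketch. The variable names, coordinates, units, and values are hypothetical, and it assumes `numpy` and `xarray` are installed.

```python
import numpy as np
import xarray as xr

times = np.arange(5)
stations = ["a", "b"]

ds = xr.Dataset(
    data_vars={
        # Each entry is (dimensions, data, attributes).
        "temperature": (
            ("time", "station"),
            20.0 + np.arange(10.0).reshape(5, 2),
            {"units": "degC"},
        ),
        "pressure": (("time",), 1000.0 + np.arange(5.0), {"units": "hPa"}),
    },
    coords={"time": times, "station": stations},
    attrs={"title": "hypothetical post-processed measurements"},
)

# Label-based selection through the coordinate indexes.
value = ds["temperature"].sel(station="b", time=3).item()
print(value, ds["temperature"].attrs["units"])
```

Writing this to disk would be `ds.to_netcdf("out.nc")`, which additionally needs a NetCDF backend such as `netCDF4` or `h5netcdf` to be installed.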