Tabular · @tabular
53 followers · 62 posts · Server data-folks.masto.host

PyIceberg: Python Development Setup

This video walks you through the steps required to set up the Python development environment for PyIceberg. We will set up a local instance of Spark, a REST catalog, and MinIO for querying an actual table. This makes it easy to do interactive development and test everything end to end.


youtu.be/D0HJuB0uSio

#iceberg #python #pyiceberg #tabular #minio #spark #DataLake #datalakehouse #pyarrow

Last updated 3 years ago

Tabular · @tabular
50 followers · 52 posts · Server data-folks.masto.host

Fokko Driesprong has written a very interesting new blog post on using the latest version of PyIceberg with PyArrow and DuckDB Labs to load data from an Iceberg table into PyArrow or DuckDB.

tabular.medium.com/pyiceberg-0

#pyiceberg #pyarrow #iceberg #python #spark #minio

Last updated 3 years ago

Tabular · @tabular
50 followers · 51 posts · Server data-folks.masto.host

With PyIceberg 0.2.1 now available, we thought a video that illustrates using it with PyArrow and DuckDB Labs would be in order. Thank you Fokko Driesprong for the content.

youtu.be/rYbSu9wvQmk

#pyiceberg #pyarrow #iceberg #apacheiceberg #duckdb #voltrondata #DataLake #datalakehouse

Last updated 3 years ago

Good news - Python's CSV reader supports unicode characters like 🤘 as CSV field delimiters.

Bad news is that PyArrow doesn't support it yet :(

Make PyArrow great again!

#pyarrow #python #developer #unicode #csv #bigdata

Last updated 3 years ago
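
The claim in the post above about Python's CSV reader can be tried with a minimal sketch like the following. It uses only the standard-library csv module; the sample data and the helper name read_rows are hypothetical, made up for illustration. It does not exercise PyArrow, so the "doesn't support it yet" part is taken from the post as stated.

```python
import csv
import io

# Hypothetical sample data: the 🤘 emoji (a single Unicode code point)
# is used as the field delimiter instead of a comma.
raw = "name🤘count\npyarrow🤘1\nduckdb🤘2\n"


def read_rows(text, delimiter="🤘"):
    """Parse delimited text with the standard-library csv module."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


rows = read_rows(raw)
print(rows)
```

The csv module only requires the delimiter to be a one-character string, which is why a single-code-point emoji is accepted here.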

Tabular · @tabular
44 followers · 29 posts · Server data-folks.masto.host

A hearty thank you to the PyIceberg community on the release of Apache PyIceberg 0.2.0!

This release includes a few major features, such as

* Read support using PyArrow and DuckDB

* Support for AWS Glue

Please check the updated docs (py.iceberg.apache.org/) for the details.

This release can be downloaded from: pypi.org/project/pyiceberg/0.2

And can be installed using: pip3 install pyiceberg==0.2.0

#iceberg #python #pyiceberg #duckdb #pyarrow

Last updated 3 years ago

Taras Novak 🇺🇦 · @dataSamurai
64 followers · 86 posts · Server vis.social

Hey 🤓, good news:

DuckDB v0.6.0 brings reading CSV data on par with Polars & PyArrow and loads 1.66 GB of data in 1.9s with 12 cores/24 threads when the experimental parallel CSV reader & unordered insertion are enabled.

🧐 github.com/RandomFractals/chic

🔬 ...

#datatools #ChicagoCrimes #polars #pyarrow #csv #duckdb #datanerds

Last updated 3 years ago

Today I'm doing some archeology, and once again dabbling in PyArrow to read and convert parquet to pandas so that I can do some minimal data exploration and answer questions on how the models were trained. If you store data in parquet format, PyArrow is a great resource.

arrow.apache.org/docs/python/i

#pandas #pyarrow #machinelearning

Last updated 3 years ago