CrateDB · @cratedb
72 followers · 85 posts · Server fosstodon.org

Check out our latest tutorial on how to import Parquet files into CrateDB using Python, PyArrow, SQLAlchemy, and Pandas 💡

Our Solution Engineer, Karyn Azevedo, shows you how to do it in a step-by-step tutorial👩🏻‍💻

Read the full blog post ➡️
hubs.ly/Q01M1k7d0

#cratedb #db #database #blogpost #technicalblogpost #TechContent #tutorial #python #sqlalchemy #ApacheArrow #parquet #pandas

Last updated 2 years ago

devlog

Didn't expect wasm GC to already be in origin trial. Seems I need to try whether Host Types in wasm GC for the JS API work in my case before moving to #ApacheArrow.
github.com/DrSensor/nusa/issue

I wonder if there's a language or compiler that already supports wasm GC 🤔 (I doubt Rust's js_sys and wasm_bindgen support it yet)

#nusa #webassembly #wasm #ApacheArrow

Last updated 2 years ago

François Michonneau · @fmic_
340 followers · 4 posts · Server hachyderm.io

My colleague @paleolimbot has released nanoarrow, a small C library that provides an interface to the #ApacheArrow data structures. There is also an #rstats package for zero-copy conversions with R objects.
* blog post: arrow.apache.org/blog/2023/03/
* R package: cran.r-project.org/web/package
* GitHub repo: github.com/apache/arrow-nanoar

#ApacheArrow #rstats

Last updated 2 years ago

Sharon Machlis · @smach
2007 followers · 938 posts · Server fosstodon.org

@eamon Although you can use :rstats: for data that won't fit in memory too 😀
@thomas_mock 's lightning talk at last year's Arrow conference
Video youtube.com/watch?v=LvTX1ZAZy6
Slides jthomasmock.github.io/arrow-dp

#rstats #duckdb #ApacheArrow

Last updated 2 years ago

Evan Pappas · @epappas
9 followers · 16 posts · Server sigmoid.social

ROAPI automatically spins up read-only APIs for static datasets without requiring you to write a single line of code. It builds on top of #ApacheArrow and #datafusion (and #rust).

github.com/roapi/roapi

#ApacheArrow #datafusion #rust

Last updated 3 years ago

Matt Topol · @zeroshade
66 followers · 23 posts · Server data-folks.masto.host

Hey all!

I'll be presenting at #datacouncil23, a 3-day, community-driven data science, engineering, analytics & AI event, featuring over 10 tracks and 75+ speakers!

lnkd.in/ew7DEWeJ

Come down and see me! I'll even have some signed copies of my book "In-Memory Analytics with Apache Arrow" to hand out!

Looking forward to meeting lots of interesting people and having great conversations

#datacouncil23 #AI #community #event #engineering #datascience #analytics #ApacheArrow #databases

Last updated 3 years ago

Daniel Hocking · @djhocking
282 followers · 45 posts · Server bayes.club

I was just recommending @djnavarro's posts about #ApacheArrow and #parquet in #rstats to someone and decided I'd share here too. She writes fantastic intros and explanations in entertaining posts, tutorials, and courses.

blog.djnavarro.net/posts/2021-

blog.djnavarro.net/posts/2022-

arrow-user2022.netlify.app/

#ApacheArrow #parquet #datascience #rstats

Last updated 3 years ago

Lucas Longour · @llongour
65 followers · 56 posts · Server mapstodon.space

Just discovered the Sci-Hub download log for 2017 (zenodo.org/record/1158301), so let's make a map ... but the original TSV file is very large, ~14 GB uncompressed. It's the perfect moment to try the #ApacheArrow #RStats package, and to make maps with #bertinjs.

#ApacheArrow #RStats #bertinjs #gischat

Last updated 3 years ago

Olivier Grisel · @ogrisel
1640 followers · 50 posts · Server sigmoid.social

A short technical blog post by Christian Lorentzen (scikit-learn developer) on some computational aspects of the histograms used in Gradient Boosting Trees in scikit-learn / xgboost / lightgbm:

lorentzen.ch/index.php/2022/10

#duckdb #ApacheArrow #tabmat #Sklearn #PyData #machinelearning

Last updated 3 years ago

· @emauviere
57 followers · 9 posts · Server mapstodon.space

The #apacheparquet format is going mainstream, even though it is almost 10 years old. What has made it a credible successor to #csv?
How does it relate to #ApacheArrow or #duckdb? How do you use it in #RStats or #qgis?
I shed some light on this here 👇:
icem7.fr/outils/parquet-devrai

#apacheparquet #csv #ApacheArrow #duckdb #RStats #qgis

Last updated 3 years ago

gianarb :nixos: :vim: :rust: · @gianarb
328 followers · 28 posts · Server m.gianarb.it

I am thinking about writing an #ApacheArrow with #golang cheat sheet or something similar. I want to share a few of the lessons I have learned working with it. The pipeline I wrote works kind of fine, so I think it is time to collect what I figured out.

#ApacheArrow #golang

Last updated 3 years ago

Gunnar Morling · @gunnarmorling
797 followers · 111 posts · Server mastodon.online

🗣️ "The central idea behind Flight is deceptively simple: it provides a standard protocol for transferring Arrow data over a network"

Great post by @djnavarro; Flight (and Flight SQL) is super-interesting, definitely keep an eye on it in '23.

blog.djnavarro.net/posts/2022-

#ApacheArrow

Last updated 3 years ago

Danielle Navarro · @djnavarro
3896 followers · 832 posts · Server fosstodon.org

Since Hadley has announced it on twitter I will do the honours on here, but I'll forego the pirate-speak out of common decency...

There's a new chapter on #ApacheArrow and Parquet data in R4DS. It's mostly based on my work, so please let me know if you spot any problems with the chapter and I promise to annoy Hadley with a pull request fixing it.

r4ds.hadley.nz/arrow.html

#ApacheArrow #rstats

Last updated 3 years ago

Danielle Navarro · @djnavarro
3814 followers · 734 posts · Server fosstodon.org

For reasons unknown she is blogging again. I am so sorry, but should you happen to be curious about how Dataset objects work in the #ApacheArrow package, and enjoy me being mildly irritable about... things, this post may be of some interest? :blobcatheart:

blog.djnavarro.net/unpacking-a

#ApacheArrow #rstats

Last updated 3 years ago

Kae Suarez · @kaesuarez
49 followers · 31 posts · Server hachyderm.io

Hello! I'm Kae, and this is my Hachyderm.

I want to be clear about what's going to happen here.

I am going to be making a lot of posts that are me floundering with tech. That's the point. As a person in #devrel, I think it's important to be honest about floundering now and then, and let others, especially engineers, see some pain points.

Plus, it can be entertaining, and gives me tons of draft material for my blog.

I hope I can show everyone some cool things!

#python #ApacheArrow #devrel

Last updated 3 years ago

Nic Crane · @nic_crane
74 followers · 1 posts · Server fosstodon.org

A blog post I wrote comparing type inference in the CSV readers in readr and arrow 📦

thisisnic.github.io/2022/11/21

#rstats #tidyverse #ApacheArrow

Last updated 3 years ago

Danielle Navarro · @djnavarro
3738 followers · 706 posts · Server fosstodon.org

My favourite trick for working with huge data sets in R. If your dataset is larger than memory and the query result is also larger than memory, you can still use dplyr/arrow pipelines. Example:

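library(arrow)
library(dplyr)

# open_dataset() scans file metadata only; no rows are read into memory yet
nyc_taxi <- open_dataset("nyc-taxi/")
nyc_taxi |>
  filter(payment_type == "Credit card") |>
  # group_by() columns become the partitioning of the written dataset
  group_by(year, month) |>
  write_dataset("nyc-taxi-credit")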

Input is 1.7 billion rows (70GB), output is 500 million (15GB). Takes 3-4 mins on my laptop 🙂

#rstats #ApacheArrow

Last updated 3 years ago

Matt Topol · @zeroshade
44 followers · 13 posts · Server data-folks.masto.host

@jayatid Agreed! My current job is actually working full-time on the #ApacheArrow and #parquet libraries. My book on Arrow also includes Go examples (in addition to Python and C++), but I'm still looking for any opportunities or assistance in building the community, haha.

#parquet #golang #ApacheArrow

Last updated 3 years ago