Thomas Sandmann · @thomas_sandmann
366 followers · 1147 posts · Server genomic.social

Today I learned how to use Pagès’s awesome {DelayedArray} @bioconductor package to handle gene expression data stored in parquet files. tomsing1.github.io/blog/posts/ This way, I can leverage familiar R tools and take
advantage of the language-agnostic parquet file format, querying very large gene expression datasets. Do you have experience with parquet files for biological data - please share the lessons you have learned!

#til #parquet #RStats #duckdb #Bioconductor

Last updated 1 year ago

· @emauviere
93 followers · 23 posts · Server mapstodon.space

The Parquet format and the DuckDB SQL engine are a real game changer for querying.
Insee is starting to publish in Parquet, which is good timing!

With shell.duckdb.org, a live query on the 470 MB file of geocoded voter addresses by polling station (data.gouv.fr/fr/datasets/burea).

Toulouse comes out on top for the number of addresses, with the result in 1 s. Hats off to DuckDB, it's mind-blowing 🙌

#parquet #opendata

Last updated 1 year ago

Thomas Sandmann · @thomas_sandmann
366 followers · 1147 posts · Server genomic.social

Today I learned how to store gene expression data in (multiple) parquet files, and query them as a single dataset from R with the {arrow}, {duckdb} or {sparklyr} packages. I am amazed by {duckdb}'s speed 🚀 - even on my laptop! Here's a blog post with what I learned: tomsing1.github.io/blog/posts/

#til #RStats #duckdb #parquet #spark #compBio #rnaseq

Last updated 1 year ago

Andrea Borruso · @aborruso
227 followers · 203 posts · Server mastodon.uno

The CSV format can be "big", bad, and ugly.

There are ways to publish and describe it, and tools to process it, that make it "beautiful" and, above all, "ready".

Tips & tricks, inspired by DuckDB, Apache Parquet, and OpenCoesione.

aborruso.github.io/posts/duckd

#duckdb #parquet

Last updated 1 year ago

Our tooling is mostly Python-based. We primarily use @pandas_dev for data wrangling, and @dagster for orchestration.

We publish our smaller, more relational outputs as SQLite databases and bigger skinny tables as Apache Parquet datasets.

Historically we've focused on wrangling semi-structured data like spreadsheets, VisualFoxPro DBs, XBRL, and piles of CSVs into clean tables, but recently we've also started using OCR and ML to extract tables from regulatory PDFs in bulk (millions of pages).

#python #sqlite #parquet

Last updated 1 year ago

Philippe Massicotte · @philmassicotte
64 followers · 90 posts · Server fosstodon.org

New spatial products are freely available from Meta, Microsoft, Amazon, and TomTom, all in cloud-native Parquet format!

overturemaps.org/download/

#spatial #parquet #gis

Last updated 1 year ago

Mark Wolfe · @wolfeidau
102 followers · 113 posts · Server awscommunity.social

Built out a small sample project that illustrates using Apache Arrow to write a Parquet file from JSON Lines: github.com/wolfeidau/arrow-gh- Using the GitHub archives as input has been interesting.

#golang #parquet

Last updated 1 year ago

Konstantin Stadler :verified: · @kst
444 followers · 58 posts · Server qoto.org

New pymrio release (v0.5.1) with support for Parquet file storage, a GLORIA downloader (parser coming soon), and some bugfixes. First release with LGPL licence. Available on PyPI pypi.org/project/pymrio/ and conda-forge:
anaconda.org/conda-forge/pymri

The new Parquet-based file format reduces the save/read time of the full system from around 2 min to 20 sec on a typical SSD/laptop (with half the space requirement).
Will present parts of this at the IIOA conference special session, Tuesday 27.6 at 1430.

#MRIO #exiobase #pymrio #parquet #condaforge #iioa #opensource

Last updated 1 year ago

François Michonneau · @fmic_
432 followers · 47 posts · Server hachyderm.io

A short blog post where I show how to use DuckDB to connect to a remote Parquet file hosted over HTTPS and work with it using dplyr:

francoismichonneau.net/2023/06

#duckdb #parquet #dplyr

Last updated 1 year ago

POUJOL-ROST Mathias ✅ · @poujolrost
284 followers · 10872 posts · Server mstdn.jp

[ 🔄 ]

@Mediapart 🔗 mastodon.social/users/Mediapar
-
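The Matignon "irregularities": the justice system investigates suspicions of mass wiretapping

Following a report by a gendarme, the Paris public prosecutor's office (the "parquet") has opened a preliminary investigation, entrusted to the DGSI, into misconduct within Matignon. No fewer than 300 intelligence techniques were carried out without the validation of the Prime Minister, as the law nevertheless requires.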

mediapart.fr/journal/france/28

#loi #ministre #validation #matignon #derive #dgsi #enquete #parquet #gendarme #signalement #irregularites #soupcons #justice #ecoutes

Last updated 1 year ago

Dave Mackey · @davidshq
861 followers · 1400 posts · Server hachyderm.io

Any recommendations on the best articles to read to understand Apache Parquet?

#recommendations #apache #parquet

Last updated 2 years ago

Leon Brocard · @orangeacme
214 followers · 141 posts · Server fosstodon.org

Recently I've been building reports based upon HTTP Archive data. Rather than call BigQuery, I instead export the data I'm interested in into Parquet format and then query it locally on my laptop using DuckDB. Here's how I did it: discuss.httparchive.org/t/quer

#httparchive #parquet #duckdb

Last updated 2 years ago

Tim Yocum · @tky
74 followers · 11 posts · Server hachyderm.io

Proud of the team today. We’ve launched InfluxDB 3.0 on InfluxDB Cloud. Built with Rust, leveraging Parquet, DataFusion, and Arrow, it sports unlimited cardinality, SQL, and so much more. This unlocks new capabilities for observability, IoT, and other use cases.

influxdata.com/blog/introducin

#influxdb #rust #parquet #datafusion #Arrow #observability #iot

Last updated 2 years ago

Nikola Ilic · @DataMozart
69 followers · 16 posts · Server dataplatform.social

NEW BLOG POST!

We evolve! So does the data. New flavors of data require new ways of storing it. In this article, we dive deep under the hood of the Parquet file format and explain the multiple benefits it brings to the table.

Oh, there is a bonus part on Delta Lake as well😉

data-mozart.com/parquet-file-f

#apache #parquet #deltalake

Last updated 2 years ago

Eugene Meidinger · @Sqlgene
254 followers · 569 posts · Server techhub.social

Your 5 weekly BI links - April 25th, 2023

mailchi.mp/sqlgene/your-5-week

This week's links include Meagan Longoria, Chris Webb, David Eldersveld, Nikola Ilic, and Primal Branding.

#powerbi #parquet

Last updated 2 years ago