Today I learned how to use Pagès’s awesome {DelayedArray} @bioconductor package to handle gene expression data stored in parquet files: https://tomsing1.github.io/blog/posts/parquetArray/ This way, I can leverage familiar R tools and take advantage of the language-agnostic parquet file format to query very large gene expression datasets. Do you have experience with parquet files for biological data? Please share the lessons you have learned! #til #parquet #rstats #duckdb #bioconductor
The #Parquet format and the DuckDB SQL engine are true game changers for querying #opendata.
Insee is starting to publish data in Parquet, which is great timing!
With https://shell.duckdb.org, I ran a live query against the 470 MB file of geocoded voter addresses by polling station (https://www.data.gouv.fr/fr/datasets/bureaux-de-vote-et-adresses-de-leurs-electeurs/).
Toulouse comes out on top for the number of addresses, with the result in 1 s. Hats off, DuckDB is impressive 🙌
Today I learned how to store gene expression data in (multiple) parquet files, and query them as a single dataset from R with the {arrow}, {duckdb} or {sparklyr} packages. I am amazed by {duckdb}'s speed 🚀 - even on my laptop! Here's a blog post with what I learned: https://tomsing1.github.io/blog/posts/parquet/ #TIL #RStats #duckdb #parquet #spark #compbio #rnaseq
Our tooling is mostly #Python based. We primarily use @pandas_dev for data wrangling, and @dagster for orchestration.
We publish our smaller, more relational outputs as #SQLite databases and bigger skinny tables as Apache #Parquet datasets.
Historically we've focused on wrangling semi-structured data like spreadsheets, VisualFoxPro DBs, XBRL, and piles of CSVs into clean tables, but recently we've also started using OCR and ML to extract tables from regulatory PDFs in bulk (millions of pages).
New spatial products freely available from Meta, Microsoft, Amazon, and TomTom, all in cloud-native Parquet format!
Built a small sample project that illustrates using Apache Arrow to write a Parquet file from JSON Lines: https://github.com/wolfeidau/arrow-gh-processor Using the GitHub archives as input has been interesting. #golang #parquet
New #pymrio release (v0.5.1) with support for #Parquet file storage, a GLORIA #mrio downloader (parser coming soon), and some bug fixes. First release under the LGPL licence. Available on PyPI (pypi.org/project/pymrio/) and #condaforge:
https://anaconda.org/conda-forge/pymrio
The new #parquet based file format reduces the save/read time of the full #exiobase system from around 2 min to 20 sec on a typical SSD laptop, with half the space requirement.
Will present parts of this at the #iioa conference #opensource special session, Tuesday 27.6 at 1430.
A short blog post where I show how to use #DuckDB to connect to a remote #Parquet file hosted over HTTPS and work with it using #dplyr:
https://francoismichonneau.net/2023/06/duckdb-r-remote-data/
MIGO Ascender, the first robot vacuum able to climb stairs [VIDEO]
#Ascender #MigoRobotics #robot #scale #aspirapolvere #pavimento #pulizia #polvere #parquet
https://guruhitech.com/migo-ascender-il-primo-robot-aspirapolvere-in-grado-di-salire-le-scale/
[ 🔄 ]
@Mediapart 🔗 https://mastodon.social/users/Mediapart/statuses/110445950829069600
-
Matignon's "#écoutes" (wiretaps): the #justice system investigates suspicions of massive irregularities
Following a report by a #gendarme, the Paris #parquet (public prosecutor's office) has opened a preliminary inquiry, entrusted to the #DGSI, into misconduct within #Matignon. No fewer than 300 intelligence-gathering techniques were used without the Prime Minister's sign-off, as the law nevertheless requires.
Any #recommendations on the best articles to read to understand #apache #parquet?
Recently I've been building reports based on HTTP Archive data. Rather than calling BigQuery each time, I export the data I'm interested in to Parquet format and then query it locally on my laptop using DuckDB. Here's how I did it: https://discuss.httparchive.org/t/querying-the-http-archive-with-duckdb/2568
#httpArchive #parquet #DuckDB
Top-notch floors with the Neakasa PowerScrub II 3-in-1 vacuum cleaner
#Neakasa #PowerScrub2 #aspirapolvere #pavimento #casa #pulizia #sconto #offerta #coupon #parquet
https://guruhitech.com/pavimenti-al-top-con-laspirapolvere-3-in-1-neakasa-powerscrub-ii/
Part 2 of our blog series on using #Apache #Parquet with #ClickHouse
https://clickhouse.com/blog/apache-parquet-clickhouse-local-querying-writing-internals-row-groups
Proud of the #influxdb team today. We’ve launched InfluxDB 3.0 on InfluxDB Cloud. Built with #rust, leveraging #parquet, #datafusion, and #arrow, it sports unlimited cardinality, SQL, and so much more. This unlocks new capabilities for #observability, #iot, and other use cases.
NEW BLOG POST!
We evolve, and so does the data! New flavors of data require new ways of storing it. In this article, we dive deep under the hood of the #Apache #Parquet file format and explain the multiple benefits it brings to the table.
Oh, there is a bonus part on #DeltaLake as well😉
https://data-mozart.com/parquet-file-format-everything-you-need-to-know/
Your 5 weekly BI links - April 25th, 2023
https://mailchi.mp/sqlgene/your-5-weekly-bi-links-april-25th-2023
This week's links include Meagan Longoria, Chris Webb, David Eldersveld, Nikola Ilic, and Primal Branding.