FedSearch - Federated network search engine

Chris Wensel · @cwensel

161 followers · 1145 posts · Server fosstodon.org

GitHub - cwensel/cascading: Cascading is a feature-rich API for defining and executing complex and fault-tolerant data processing flows locally or on a cluster.

So Tessellate inherits lots of support for various data formats from Cascading
https://github.com/cwensel/cascading

Even though #apacheparquet dropped Cascading support, we were able to port it over.

Now that Parquet is native to Cascading, it should be easier to add #apacheiceberg support.

This would allow #clusterless to convert data as it arrives into Iceberg continuously for use in #aws Athena or other data front-ends.

Anyone interested in a challenge?

#aws #java

#apacheparquet #ApacheIceberg #clusterless #aws #java

Last updated 2 years ago


Chris Wensel · @cwensel

150 followers · 1066 posts · Server fosstodon.org

A little more color on this announcement:
https://fosstodon.org/@cwensel/110549001614086663

First, #ApacheParquet removed #Cascading support, so I had to splice the original source into Cascading. But the original ParquetScheme didn't fully honor type information, so there is a new TypedParquetScheme that has native support for JSON and timestamps.

Second, Parquet requires the #ApacheHadoop FileSystem, which means we get the wonderful S3A implementation. But we also get a 331MB jar dependency with the AWS bundle.

#apacheparquet #cascading #apachehadoop

Last updated 2 years ago


· @nlamirault

0 followers · 9 posts · Server mastodon.cloud

Grafana Tempo 2.0 release: TraceQL and Apache Parquet

Great! With #apacheparquet... Let's try #TraceQL...
---
RT @grafana
✨ Grafana Tempo 2.0 is finally here! ✨

Among other updates, Tempo 2.0 comes with two important new features: a new Apache Parquet backend storage format, and #TraceQL, a new language designed for discovering traces.
https://grafana.com/blog/2023/02/01/new-in-grafana-tempo-2.0-apache-parquet-as-the-default-storage-format-support-for-traceql/?mdm=social
https://twitter.com/grafana/status/1620857616736845844

#apacheparquet #traceql

Last updated 3 years ago


· @emauviere

57 followers · 9 posts · Server mapstodon.space

The #ApacheParquet format is going mainstream, even though it is almost 10 years old. How has it become a credible successor to #CSV?
How does it relate to #ApacheArrow or #duckdb? How can you use it in #rstats or #QGIS?
I explain it all here 👇:
https://www.icem7.fr/outils/parquet-devrait-remplacer-le-format-csv/

#apacheparquet #csv #ApacheArrow #duckdb #RStats #qgis

Last updated 3 years ago
