Tessellate inherits support for a wide range of data formats from Cascading:
https://github.com/cwensel/cascading
Even though #apacheparquet dropped Cascading support, we were able to port it over.
Now that Parquet is native to Cascading, it should be easier to add #apacheiceberg support.
This would allow #clusterless to continuously convert data into Iceberg as it arrives, for use in #aws Athena or other data front-ends.
Anyone interested in a challenge?
#apacheparquet #ApacheIceberg #clusterless #aws #java
A little more color on this announcement:
https://fosstodon.org/@cwensel/110549001614086663
First, #ApacheParquet removed #Cascading support, so I had to splice the original source into Cascading. However, the ParquetScheme didn't fully honor type information, so there is a new TypedParquetScheme with native support for JSON and timestamps.
Second, Parquet requires the #ApacheHadoop FileSystem, which means we get the wonderful S3A implementation. But we also get a 331 MB jar dependency with the AWS bundle.
#apacheparquet #cascading #apachehadoop
Great! With #apacheparquet... Let's go try #TraceQL...
---
RT @grafana
✨ Grafana Tempo 2.0 is finally here! ✨
Among other updates, Tempo 2.0 comes with two important new features: a new Apache Parquet backend storage format, and #TraceQL, a new language designed for discovering traces.
https://grafana.com/blog/2023/02/01/new-in-grafana-tempo-2.0-apache-parquet-as-the-default-storage-format-support-for-traceql/?mdm=social
https://twitter.com/grafana/status/1620857616736845844
The #ApacheParquet format is going mainstream, even though it is almost 10 years old. What has made it a credible successor to #CSV?
How does it relate to #ApacheArrow or #duckdb? How do you use it in #rstats or #QGIS?
I explain it all here 👇:
https://www.icem7.fr/outils/parquet-devrait-remplacer-le-format-csv/
#apacheparquet #csv #ApacheArrow #duckdb #RStats #qgis