HOW LAKEHOUSE TABLE FORMAT WORKS
1. Engine reads table format metadata
2. Builds list of files with relevant data based on metadata
3. Scans those files and executes query
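The three steps above can be sketched in a few lines of Python. This is only a toy illustration, not how any particular engine or table format implements planning: the `FileEntry` record, the per-file min/max statistics, the file paths, and the in-memory `ROWS` data are all hypothetical stand-ins for real metadata and real data files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical metadata record: a data file plus min/max stats for one column."""
    path: str
    min_value: int
    max_value: int


# Toy stand-in for the contents of the data files (assumption, for illustration only).
ROWS = {
    "data/part-0.parquet": [5, 50, 100],
    "data/part-1.parquet": [101, 150, 200],
    "data/part-2.parquet": [201, 250, 300],
}


def read_metadata():
    """Step 1: the engine reads the table format metadata."""
    return [
        FileEntry("data/part-0.parquet", 1, 100),
        FileEntry("data/part-1.parquet", 101, 200),
        FileEntry("data/part-2.parquet", 201, 300),
    ]


def plan_files(metadata, lower, upper):
    """Step 2: keep only files whose [min, max] range overlaps the query range."""
    return [f for f in metadata if f.max_value >= lower and f.min_value <= upper]


def scan(files, lower, upper):
    """Step 3: scan only the planned files and apply the filter row by row."""
    return [v for f in files for v in ROWS[f.path] if lower <= v <= upper]


metadata = read_metadata()
planned = plan_files(metadata, 120, 210)
result = scan(planned, 120, 210)

print([f.path for f in planned])
print(result)
```

For the range 120 to 210, `part-0` is skipped from metadata alone and never opened, which is the point of step 2.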
#DataEngineering #DataAnalytics #BigData #DataLakehouse #ApacheIceberg #ApacheHudi #DeltaLake
How to Implement Write-Audit-Publish (WAP) - an exploration of how WAP can be done on @apacheicebergdevs, #apacheHudi, @deltalakeoss, @lakeFS, or #projectNessie
What is WAP?
Check out yesterday's blog: https://lakefs.io/blog/data-engineering-patterns-write-audit-publish/?utm_campaign=Social%20media%20activity&utm_source=Mastodon&utm_medium=social&utm_content=blog_rm-wap1
#apachehudi #projectnessie #dataengineering #data #writeauditpublish #datadon
Get a detailed overview of #DeltaLake, #ApacheHudi, and #ApacheIceberg as we discuss their data storage, processing capabilities, and deployment options: https://dzone.com/articles/delta-hudi-and-iceberg-the-data-lakehouse-trifecta
#deltalake #apachehudi #apacheiceberg #analytics #spark
This blog from Onehouse about #ApacheHudi is interesting.
What caught my eye was the chart showing which organisations and companies contribute to the #opensource projects. We all know that Databricks (DB) dominates Delta Lake (DL). I wonder whether the balance on the other two will hold over time, or whether Onehouse and Tabular (circled) will start to grow.
My Medium adventure enters a new phase: the first post for a Medium-held publication, Plumbers of Data Science, just got published :)
It's also more technical than my previous writings. The point is to introduce Apache Hudi in a softer way than the official documentation does at the moment. So, if you're interested in starting with Hudi, look no further :)
#apachehudi #apachespark #dataengineering
https://medium.com/plumbersofdatascience/apache-hudi-copy-on-write-explained-563f1d23d34f
AWS Glue now supports all three table formats, #ApacheHudi, #ApacheIceberg, and #DeltaLake: https://aws.amazon.com/about-aws/whats-new/2022/11/aws-glue-apache-spark-native-data-lake-frameworks-apache-hudi-iceberg-delta-lake/
#apachehudi #apacheiceberg #deltalake #datadon