Tabular · @tabular
78 followers · 179 posts · Server data-folks.masto.host

A new Tabular Solutions episode is now available with Tabular co-founder Jason Reid. He shows Shawn Gordon how easy it is to set up a Google Colab notebook and use Apache Spark to read and write data in Tabular-managed tables.

youtu.be/VI5dtq-pCN8

#ApacheSpark #apacheiceberg #iceberg #DataLake #datalakehouse #dataengineering #googlecolab

Last updated 1 year ago

dazfuller :rickwhoah: · @dazfuller
102 followers · 1206 posts · Server mstdn.social

Took some time today to rewrite a local build script I have for my data source. I use it for running tests locally against multiple Spark versions and building jar files to deploy to a test instance.

The current script is all in PowerShell, but I spend very little time in there anymore, so I rewrote it as a Nushell script, and it’s so much cleaner and nicer to read

#nushell #powershell #databricks #Excel #ApacheSpark

Last updated 1 year ago

Tabular · @tabular
71 followers · 152 posts · Server data-folks.masto.host

Our latest Tabular Bits shows you how to secure any compute engine against your Tabular-managed tables. We use Apache Spark in the example to show how simple it is to change access privileges from Tabular, with immediate effect. It takes less than 3 minutes to find out how this works. Secure your data, not your compute, with Tabular.

youtu.be/DmZzn6Jl1IY

#apacheiceberg #ApacheSpark #datasecurity #dataengineering #DataLake #datalakehouse

Last updated 1 year ago

Tabular · @tabular
71 followers · 146 posts · Server data-folks.masto.host

We have a new interactive demo available that walks you through the steps involved in securing compute engines against your Tabular-managed tables. The concept is significant: a single, unified security layer is applied at the data layer, providing security for tools that don't inherently have it, like Apache Spark. Secure the data, not the compute.

app.storylane.io/share/7vw3snw

#apacheiceberg #ApacheSpark #dataengineering #datasecurity #DataLake #datalakehouse

Last updated 1 year ago

Danica Fine · @thedanicafine
143 followers · 92 posts · Server data-folks.masto.host
Tabular · @tabular
68 followers · 131 posts · Server data-folks.masto.host

The end of May brings the May edition of the Community News. There is a lot of great content from the community once again, with the release of Iceberg 1.3 and significant updates to PyIceberg, with the 0.4.0 release right around the corner. Important support was added for Spark 3.4 and Flink 1.17. There is also significant news from the vendor community and great blog posts from folks like Anuj Syal and Marin Aglić Čuvić.
tabular.io/blog/iceberg-202305

#apacheiceberg #ApacheSpark #apacheFlink

Last updated 1 year ago

dazfuller :rickwhoah: · @dazfuller
96 followers · 993 posts · Server mstdn.social

Been busy working on updating our data source reader to support Spark 3.4

FileSourceOptions and SparkPath threw up some changes needed, but that prompted a rewrite of the options class, which was needed anyway. Unit tests are all passing now for Spark 3.0.1 up to 3.4.0, which is good. Now for some manual testing on Databricks and Azure Synapse

github.com/elastacloud/spark-e

#AzureSynapse #databricks #Excel #ApacheSpark

Last updated 1 year ago

dazfuller :rickwhoah: · @dazfuller
95 followers · 981 posts · Server mstdn.social

So my day has involved implementing a new feature into my Excel data source. And turning an old pallet into a new planter

#Gardening #ApacheSpark

Last updated 1 year ago

New video: Your first Spark SQL application.

No Python, no Scala, just SQL.

youtu.be/RuGm2SmxCWk

#ApacheSpark #sql #dataengineering

Last updated 1 year ago

dazfuller :rickwhoah: · @dazfuller
95 followers · 981 posts · Server mstdn.social

Sometimes it feels very lonely writing code using

But it’s still ducking awesome

#scala #ApacheSpark

Last updated 2 years ago

The next step in Apache Spark DataKickstart is available: how to set up Databricks Community Edition. A simple way to get an environment to practice writing Spark code.

youtube.com/watch?v=Onwt8Twq3f

#ApacheSpark #databricks #datakickstart

Last updated 2 years ago

Tabular · @tabular
62 followers · 91 posts · Server data-folks.masto.host

The @ApacheIceberg newsletter for March is here. Lots of big news with version 1.2, , and vendor support.
tabular.substack.com/p/iceberg

#ApacheSpark #apacheFlink

Last updated 2 years ago

I am creating Apache Spark DataKickstart - free online training. First video released, trying to release a new part to the course every week or so to teach as efficiently as I can. Check it out here: youtu.be/0kQ7Iq_lG-k

#ApacheSpark

Last updated 2 years ago

Wojtek · @WojtekWalczak
0 followers · 1 posts · Server awscommunity.social

My Medium adventure enters a new phase: the first post for a Medium-held publication, Plumbers of Data Science, just got published :)

It's also more technical than my previous writings. The point is to introduce Apache Hudi in a softer way than the official documentation does at the moment. So, if you're interested in starting with Hudi, look no further :)

medium.com/plumbersofdatascien

#apachehudi #ApacheSpark #dataengineering

Last updated 2 years ago

Holden · @holden
543 followers · 66 posts · Server tech.lgbt

RT @jaceklaskowski
Trying to get a better grip on aggregation execution in Spark SQL and wondering what to google for to learn how to describe the topic in a more academic style.

Used "introduction aggregation" with and without "spark" and found some resources.

Any other recs? 🙏

#ApacheSpark #sparksql

Last updated 2 years ago

Kit Menke · @kitmenke
4 followers · 5 posts · Server data-folks.masto.host

Come learn about building near-realtime data pipelines with Databricks in a presentation from Scott Crawford at the STL Big Data I.D.E.A. meetup on Wednesday, December 7, at 5:30 PM (Central time, GMT-6). Bring your questions about Spark, Delta Lake, and Streaming! Hope to see you there. meetup.com/st-louis-big-data-i

#bigdata #ApacheSpark #databricks #Streaming #dataengineering

Last updated 2 years ago

Quite an interesting way to index multiple tools in the same space - including search engine hits, and LinkedIn supply & demand. Crude, but paints a broad picture that's not unuseful gradientflow.com/the-stream-pr

#apachekafka #apacheFlink #ApacheSpark #data #streamprocessing

Last updated 2 years ago

Zach Wilson · @zach
795 followers · 150 posts · Server data-folks.masto.host

Hey Mastodon! I created a "content directory" for all my content across Linkedin, Twitter, and YouTube.

I spent about 60 hours engineering this over the last 2 weeks.

You can see my content split into categories like , , etc.

I'll be adding a search function soon too so you can find exactly what you're looking for from the mountain of content I've created over the last two years!

Check it out here!

eczachly.com/about

#apacheairflow #ApacheSpark #dataengineering

Last updated 2 years ago
