FedSearch - Federated network search engine

Yaro Ivanovsky · @YaroIvanovsky

0 followers · 3 posts · Server mas.to

DataBricks Introduces English as a New Programming Language for #ApacheSpark

https://analyticsindiamag.com/databricks-introduces-english-as-a-new-programming-language-for-apache-spark/

#apachespark

Last updated 2 years ago

Original post

Dylan Van Assche · @dylanvanassche

684 followers · 1016 posts · Server fosstodon.org

Open media

Claus Stadler is presenting their work behind SANSA: 'Scaling RML and SPARQL-based Knowledge Graph Construction with Apache Spark' now at the Knowledge Graph Construction Workshop!

#ESWC2023 #KGCW2023 #RML #SPARQL #ApacheSpark @eswc_conf @aksw

#eswc2023 #kgcw2023 #rml #sparql #apachespark

Last updated 2 years ago

Original post

Delta Lake · @deltalakeoss

46 followers · 57 posts · Server social.lfx.dev

Open media

📣 We are excited to announce the release of Delta Lake 2.4.0 on Apache Spark 3.4. Similar to Apache Spark™, we have released Maven artifacts for both Scala 2.12 and Scala 2.13! 🎉

Documentation: https://lnkd.in/eTD9ua_6
Python artifacts: https://lnkd.in/e65AeChW

⭐ View the release notes: https://lnkd.in/er2PDhjJ

#deltalake #opensource #oss #data #apachespark

Last updated 2 years ago

Original post

Tech news from Canada · @TechNews

454 followers · 12806 posts · Server mastodon.roitsystems.ca

Ars Technica: “A really big deal”—Dolly is a free, open source, ChatGPT-style AI model https://arstechnica.com/?p=1931693 #Tech #arstechnica #IT #Technology #largelanguagemodels #machinelearning #textsynthesis #ApacheSpark #Databricks #EleutherAI #finetuning #Biz&IT #pythia #Dolly #LLaMA #meta #AI

#Tech #arstechnica #it #technology #largelanguagemodels #machinelearning #textsynthesis #apachespark #databricks #eleutherai #finetuning #biz #pythia #dolly #llama #meta #ai

Last updated 3 years ago

Original post

IT News · @itnewsbot

3096 followers · 256330 posts · Server schleuss.online

“A really big deal”—Dolly is a free, open source, ChatGPT-style AI model - Enlarge (credit: Databricks)

On Wednesday, Databricks released... - https://arstechnica.com/?p=1931693 #largelanguagemodels #machinelearning #textsynthesis #apachespark #databricks #eleutherai #finetuning #biz⁢ #pythia #dolly #llama #meta #ai

#ai #meta #llama #dolly #pythia #biz #finetuning #eleutherai #databricks #apachespark #textsynthesis #machinelearning #largelanguagemodels

Last updated 3 years ago

Original post

Delta Lake · @deltalakeoss

38 followers · 37 posts · Server social.lfx.dev

Check out the latest installment of #LastWeekInAByte on #DeltaLake which includes:

✔️ The fully packed delta-spark 2.3 release (it's not always about #rustlang and #python ... just a lot of the time) #apachespark

✔️ A great post by Will Girten on #deltasharing and #streaming

✔️ A shout to The Linux Foundation #finos #legend project

✔️ Our new #Slack archive via Linen, &..

✔️ A great blog by Nick Karpov on #AWS #Lambda and #DeltaLake

https://youtu.be/nILXJ-0aTPY

#opensource #linuxfoundation

#lastweekinabyte #deltalake #rustlang #python #apachespark #deltasharing #streaming #finos #legend #slack #aws #lambda #opensource #linuxfoundation

Last updated 3 years ago

Original post

Carlos Peña · @capemo

30 followers · 200 posts · Server infosec.exchange

Yesterday we tried to upload 10M rows to an #Azure Table using #ApacheSpark and #Databricks. We hit 333K rows per minute at our best.

Wondering if anyone here has done something similar?

From what I read, the max transactions per second on an Azure Table is 20K so… I guess we can try to speed it up a bit further.

#azure #apachespark #databricks

Last updated 3 years ago

Original post

Poda Black · @PodaBlack

4 followers · 27 posts · Server zbrx.org

Data Science Frameworks: This field involves the use of statistics, scientific methods, and algorithms to extract knowledge. Popular data science frameworks include #TensorFlow, #PyTorch, #ApacheSpark, and #NumPy, with #Python being the dominant programming language.

#tensorflow #pytorch #apachespark #numpy #python

Last updated 3 years ago

Original post

Carlos Peña · @capemo

30 followers · 195 posts · Server infosec.exchange

Wish someday we could get desktop computers instead of laptops for work.

I use #ApacheSpark on a daily basis and whenever I start running some functional tests the fans don’t like it. Yesterday the CPU was at 90°C. :blob_cry:

#apachespark

Last updated 3 years ago

Original post

Carlos Peña · @capemo

26 followers · 163 posts · Server infosec.exchange

Refactored some code today. Reduced execution time from ~13h to 5m.

#ApacheSpark is amazing.

#apachespark

Last updated 3 years ago

Original post

Jacek Laskowski · @jaceklaskowski

98 followers · 18 posts · Server fosstodon.org

@oleg Mostly source code (so I can learn even more at the same time). Worked great with #ApacheSpark #DeltaLake #ApacheKafka (as they all are written in #Scala). Thinking of #Dask as it's close to Spark but written in #Python I'd like to know better. HTH

#apachespark #DeltaLake #apachekafka #scala #dask #python

Last updated 3 years ago

Original post

Jacek Laskowski · @jaceklaskowski

98 followers · 18 posts · Server fosstodon-org.social.shrimpcam.pw

@oleg Mostly source code (so I can learn even more at the same time). Worked great with #ApacheSpark #DeltaLake #ApacheKafka (as they all are written in #Scala). Thinking of #Dask as it's close to Spark but written in #Python I'd like to know better. HTH

#apachespark #DeltaLake #apachekafka #scala #dask #python

Last updated 3 years ago

Original post

Wojtek Walczak · @wojtekwalczak

26 followers · 40 posts · Server mastodon.social

My Medium adventure enters a new phase: the first post for a Medium-held publication, Plumbers of Data Science, just got published :)

It's also more technical than my previous writings. The point is to introduce Apache Hudi in a softer way than the official documentation does at the moment. So, if you're interested in starting with Hudi, look no further :)

#apachehudi #apachespark #dataengineering

https://medium.com/plumbersofdatascience/apache-hudi-copy-on-write-explained-563f1d23d34f

#ApacheHudi #apachespark #dataengineering

Last updated 3 years ago

Original post

Paul King · @paulk

13 followers · 3 posts · Server foojay.social

Open media

Data science & Groovy using #ApacheBeam #ApacheCamel #ApacheCommons @ApacheGroovy #ApacheIgnite #ApacheMXNet #ApacheOpennlp #ApacheSpark #ApacheWayang #Datumbox #deepjavalibrary #DeepNetts #EclipseDL4J #graalvm #gradle #stanfordnlp #TensorFlow #Tribuo #smile #tablesaw #opencsv

https://www.javaadvent.com/2022/12/groovy-and-data-science.html

Covers data manipulation, regression, clustering, classification, natural language processing & object detection #jsr381 #visrec #ai #ml #groovylang #neuralnets #deeplearning

#apachebeam #apachecamel #apachecommons #apacheignite #apachemxnet #apacheopennlp #apachespark #apachewayang #datumbox #deepjavalibrary #deepnetts #eclipsedl4j #graalvm #gradle #stanfordnlp #tensorflow #tribuo #smile #tablesaw #opencsv #jsr381 #visrec #ai #ml #groovylang #neuralnets #deeplearning

Last updated 3 years ago

Original post

Paul King · @paulk

13 followers · 4 posts · Server foojay.social

Open media

Data science & Groovy using #ApacheBeam #ApacheCamel #ApacheCommons @ApacheGroovy #ApacheIgnite #ApacheMXNet #ApacheOpennlp #ApacheSpark #ApacheWayang #Datumbox #deepjavalibrary #DeepNetts #EclipseDL4J #graalvm #gradle #stanfordnlp #TensorFlow #Tribuo #smile #tablesaw #opencsv

https://www.javaadvent.com/2022/12/groovy-and-data-science.html

Covers data manipulation, regression, clustering, classification, natural language processing & object detection #jsr381 #visrec #ai #ml #groovylang #neuralnets #deeplearning

#apachebeam #apachecamel #apachecommons #apacheignite #apachemxnet #apacheopennlp #apachespark #apachewayang #datumbox #deepjavalibrary #deepnetts #eclipsedl4j #graalvm #gradle #stanfordnlp #tensorflow #tribuo #smile #tablesaw #opencsv #jsr381 #visrec #ai #ml #groovylang #neuralnets #deeplearning

Last updated 3 years ago

Original post

Alex Ott · @alexott

1 followers · 7 posts · Server infosec.exchange

Another post at company blog: Build Reliable and Cost Effective Streaming Data Pipelines With Delta Live Tables’ Enhanced Autoscaling

Autoscaling is important for handling of cybersecurity data that are often spiky by its nature. For this blog post I specially selected Zeek logs as an example to demonstrate that it’s possible to build cost efficient data ingestion pipelines.

https://www.databricks.com/blog/2022/12/08/build-reliable-and-cost-effective-streaming-data-pipelines.html

#databricks #zeek #deltalivetables #cybersecurity #apachespark

Last updated 3 years ago

Original post

IT News · @itnewsbot

2274 followers · 240364 posts · Server schleuss.online

AWS Glue upgrades Spark engines, backs Ray framework - AWS Glue, a serverless data integration service provided by Amazon Web Services, showc... - https://www.infoworld.com/article/3681339/aws-glue-upgrades-spark-engines-backs-ray-framework.html#tk.rss_all #amazonwebservices #dataintegration #cloudcomputing #apachespark #python

#python #apachespark #cloudcomputing #dataintegration #amazonwebservices

Last updated 3 years ago

Original post

Ambarish Ganguly · @ambarish

1 followers · 7 posts · Server hachyderm.io

This is a video on - Joins , Null Values and Built In Functions - 8th video of the playlist Apache Spark Developer Associate [ Databricks ]

The material is from the Databricks Academy. Please do subscribe , comment and lets learn together as a community.
#azuredatabricks #apachespark #apachesparkdeveloper #certifications

Objectives
✅ Apply built-in functions to generate data for new columns
✅ Apply DataFrame NA functions to handle null values
✅ Join DataFrames

https://youtu.be/oF3_AbndFcY

#azuredatabricks #apachespark #apachesparkdeveloper #certifications

Last updated 3 years ago

Original post