Kata · @PandaKata
2 followers · 50 posts · Server data-folks.masto.host

After a good three months of #dezoomcamp, I can only recommend this free course to everyone. The team puts together a fantastic program, and the community is great: everyone is very helpful, so that all participants can get the most out of the course. THANK YOU

#dezoomcamp

Last updated 2 years ago

Kata · @PandaKata

Additional advantages of peer reviewing:
- an opportunity to network and collaborate with others in the same field or discipline
- staying current with the latest research and developments in the field
- learning about new approaches and techniques

#dezoomcamp

Kata · @PandaKata

Peer reviewing other people's projects can be a valuable learning experience for several reasons:
- it helps you identify strengths and weaknesses in your own work by comparison
- it exposes you to new perspectives and ideas that you may not have considered before
- it develops your critical thinking skills, since you have to evaluate someone else's work

#dezoomcamp

Kata · @PandaKata

The peer review at #dezoomcamp is a great opportunity to find inspiration for future projects. Read more about it here:
github.com/DataTalksClub/data-

#dezoomcamp

Kata · @PandaKata

I got three really exciting projects. The first one I'm grading is a sentiment analysis of tweets about the Corona crisis.

#dezoomcamp

Kata · @PandaKata

Yesterday was the deadline for our #dezoomcamp projects. Today the peer review finally starts: everyone is assigned three repositories and has to rate them according to given criteria.

#dezoomcamp

Kata · @PandaKata

To orchestrate my Prefect flow, I decided to try another technology: GitHub Actions, a CI/CD service that automates workflows such as building, testing, and deploying code. You can write custom workflows as YAML files or reuse workflows shared by the community. This is a great step-by-step guide:
medium.com/the-prefect-blog/sc

#dezoomcamp

Kata · @PandaKata

The last part of my #dezoomcamp project is the visualization. Since my tables are in BigQuery, I decided to use Google Looker Studio, which also connects to many other data sources: lookerstudio.google.com/

#dezoomcamp

Kata · @PandaKata

In dbt, staging and production are two separate environments that serve different purposes. Staging is where you prepare data for production and test and validate it. Production is where end users access and interact with the data. In short: staging is for data preparation; production is for delivering value to users.

#dezoomcamp

Kata · @PandaKata

I also found out that you can use dbt to partition tables. Partitioning improves query performance and manageability by dividing large tables into smaller parts. Queries that filter on the partitioning column only have to read the matching partitions, which reduces the amount of data scanned.
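To make the "less data scanned" point concrete, here is a toy, pure-Python model of partition pruning: rows are bucketed by day, and a date-range query only reads the buckets that can match. It is not how BigQuery or dbt work internally; the function names and sample rows are hypothetical. In a dbt-bigquery model, the partitioning itself is declared in the model's `config()` through the `partition_by` setting (with keys such as `field` and `data_type`).

```python
from collections import defaultdict
from datetime import date


def partition_by_day(rows, key="date"):
    """Group rows into one bucket per day, like a table partitioned on a DATE column."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return partitions


def query_range(partitions, start, end):
    """Return (matching rows, number of rows scanned) for start <= day <= end.

    Only partitions whose day falls in the range are read (partition pruning);
    without partitions, every row in the table would have to be scanned.
    """
    result, scanned = [], 0
    for day, bucket in partitions.items():
        if start <= day <= end:
            scanned += len(bucket)
            result.extend(bucket)
    return result, scanned


# Hypothetical sample: three days with two rows each (six rows in total).
rows = [{"date": date(2023, 3, d), "new_cases": n} for d in (1, 2, 3) for n in (10, 20)]
matches, scanned = query_range(partition_by_day(rows), date(2023, 3, 2), date(2023, 3, 2))
print(f"scanned {scanned} of {len(rows)} rows")  # scanned 2 of 6 rows
```

The benefit grows with the table: the cost of a filtered query depends on the size of the matching partitions, not on the size of the whole table.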

#dezoomcamp

Kata · @PandaKata

After loading my data into Google BigQuery, I transform it with dbt. For beginners, I recommend the resources provided on the dbt website:
courses.getdbt.com/courses/fun

#dezoomcamp

Kata · @PandaKata

A first draft of the architecture for my #dezoomcamp project:

#dezoomcamp

Kata · @PandaKata

It took me a while to fully grasp the concept of scheduling: if you schedule a flow in Prefect Cloud, you still have to keep an agent running to actually execute it. A possible setup is described here: medium.com/@danilo.drobac/7-a-
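The split between scheduling and execution can be illustrated with a toy model in plain Python. The classes and functions below are hypothetical and are not Prefect's API: the scheduler only creates run records in a work queue, and nothing executes until a separately running agent polls that queue.

```python
from dataclasses import dataclass, field


@dataclass
class FlowRun:
    name: str
    state: str = "Scheduled"


@dataclass
class WorkQueue:
    runs: list = field(default_factory=list)


def schedule_runs(queue, flow_name, count):
    """What the server-side scheduler does: create run records, but run no code."""
    for i in range(count):
        queue.runs.append(FlowRun(f"{flow_name}-{i}"))


def agent_poll(queue, execute):
    """What an agent does: pick up scheduled runs and actually execute them."""
    for run in queue.runs:
        if run.state == "Scheduled":
            run.state = "Running"
            execute(run)
            run.state = "Completed"


# Hypothetical flow name, for illustration only.
work_queue = WorkQueue()
schedule_runs(work_queue, "etl-web-to-gcs", 2)
print([r.state for r in work_queue.runs])  # ['Scheduled', 'Scheduled'] (no agent yet)
agent_poll(work_queue, execute=lambda run: None)
print([r.state for r in work_queue.runs])  # ['Completed', 'Completed']
```

The practical consequence is the one described above: the agent has to run somewhere that stays up (for example a VM), otherwise scheduled runs just pile up without being executed.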

#dezoomcamp

Kata · @PandaKata

You can work with Prefect locally or in the cloud. I went with the cloud version, since I sometimes have problems accessing the local Orion UI.
docs.prefect.io/ui/cloud/

#dezoomcamp

Kata · @PandaKata

For orchestration, I decided to go with Prefect. Every day at 6:00, my flow loads the data from the .csv file into Google Cloud Storage, and 15 minutes later it loads the data from Cloud Storage into BigQuery. I used Prefect Blocks for that:
docs.prefect.io/concepts/block
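A minimal sketch of the timing, assuming naive local times: two daily schedules, one at 06:00 for the GCS upload and one at 06:15 for the BigQuery load (in cron notation `0 6 * * *` and `15 6 * * *`). `next_daily_run` is a hypothetical helper for illustration, not a Prefect function; Prefect computes schedule times itself.

```python
from datetime import datetime, time, timedelta

UPLOAD_TO_GCS = time(6, 0)      # cron "0 6 * * *"
LOAD_TO_BIGQUERY = time(6, 15)  # cron "15 6 * * *", 15 minutes later


def next_daily_run(now: datetime, at: time) -> datetime:
    """Next run of a daily schedule at wall-clock time `at`, strictly after `now`.

    Assumes naive (timezone-less) local datetimes.
    """
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


now = datetime(2023, 3, 10, 6, 5)  # hypothetical: 06:05, just after the upload ran
print(next_daily_run(now, UPLOAD_TO_GCS))     # 2023-03-11 06:00:00
print(next_daily_run(now, LOAD_TO_BIGQUERY))  # 2023-03-10 06:15:00
```

A fixed 15-minute gap assumes the upload always finishes in time. An alternative design is to run both steps as tasks inside one flow, so the BigQuery load starts only after the upload has succeeded.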

#dezoomcamp

Kata · @PandaKata

I want to design a clear diagram of the pipeline for my #dezoomcamp capstone project. For this, I decided to use Miro: miro.com/de/

#dezoomcamp

Kata · @PandaKata

The first step of my #dezoomcamp capstone project will be to create a new virtual machine on Google Cloud Platform and to document this step right away, so that I can make sure everything is reproducible.

#dezoomcamp

Kata · @PandaKata

Today is the first day that I am working on my #dezoomcamp capstone project. I decided to work with COVID-19 data from Our World in Data: github.com/owid/covid-19-data/

#dezoomcamp

Kata · @PandaKata

Tests are an important factor in data quality. I learned in the workshop that PipeRider lets you test aspects of your data such as:
- do certain columns exist?
- are their values unique?
- is the datatype correct?
- are there null values?
- what is the range of the column?
- is the schema consistent?
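The checks in this list can be sketched in plain Python over a list of dict rows. This is a hand-rolled illustration of the same kinds of test, not PipeRider's API; the `spec` format, the `run_checks` name, and the sample rows are hypothetical.

```python
def run_checks(rows, spec):
    """Run simple data tests over a list of dict rows; return failure messages.

    `spec` maps column name -> rules, e.g.
    {"id": {"type": int, "unique": True, "not_null": True, "min": 0}}.
    """
    failures = []
    # Is the schema consistent? Every row should have the same set of columns.
    if len({frozenset(row) for row in rows}) > 1:
        failures.append("schema: rows have different columns")
    for col, rules in spec.items():
        # Do certain columns exist?
        if any(col not in row for row in rows):
            failures.append(f"{col}: missing column")
            continue
        values = [row[col] for row in rows]
        non_null = [v for v in values if v is not None]
        # Are there null values?
        if rules.get("not_null") and len(non_null) < len(values):
            failures.append(f"{col}: contains nulls")
        # Are the values unique?
        if rules.get("unique") and len(set(non_null)) < len(non_null):
            failures.append(f"{col}: duplicate values")
        # Is the datatype correct?
        expected = rules.get("type")
        if expected is not None and not all(isinstance(v, expected) for v in non_null):
            failures.append(f"{col}: wrong datatype")
            continue  # skip range checks on mistyped values
        # What is the range of the column?
        if "min" in rules and any(v < rules["min"] for v in non_null):
            failures.append(f"{col}: value below {rules['min']}")
        if "max" in rules and any(v > rules["max"] for v in non_null):
            failures.append(f"{col}: value above {rules['max']}")
    return failures


# Hypothetical rows: a duplicated id, a null, and a negative case count.
rows = [{"id": 1, "new_cases": 10}, {"id": 2, "new_cases": None}, {"id": 2, "new_cases": -5}]
spec = {
    "id": {"type": int, "unique": True, "not_null": True},
    "new_cases": {"type": int, "not_null": True, "min": 0},
    "country": {},
}
print(run_checks(rows, spec))
# ['id: duplicate values', 'new_cases: contains nulls', 'new_cases: value below 0', 'country: missing column']
```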

#dezoomcamp

Kata · @PandaKata

PipeRider is open source, which is a big plus. You can connect it to your data source at any point in your pipeline, generate a profile, and test that profile against your assertions.

blog.piperider.io/data-reliabi
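The profile-then-assert workflow can be sketched in two steps: summarise each column into a profile, then check the profile's metrics against expected bounds. PipeRider defines assertions in its own format; the `profile` and `assert_profile` functions, the `(low, high)` bounds format, and the sample rows here are hypothetical.

```python
def profile(rows):
    """Summarise each column of a list of dict rows into simple metrics."""
    columns = {}
    for row in rows:
        for col, value in row.items():
            columns.setdefault(col, []).append(value)
    summary = {}
    for col, values in columns.items():
        non_null = [v for v in values if v is not None]
        summary[col] = {
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
            "min": min(non_null) if non_null else None,
            "max": max(non_null) if non_null else None,
        }
    return {"row_count": len(rows), "columns": summary}


def assert_profile(prof, assertions):
    """Check profile metrics against (low, high) bounds; None means unbounded."""
    failed = []
    for col, metrics in assertions.items():
        stats = prof["columns"].get(col)
        if stats is None:
            failed.append(f"{col}: not in profile")
            continue
        for metric, (low, high) in metrics.items():
            value = stats[metric]
            too_low = low is not None and (value is None or value < low)
            too_high = high is not None and (value is None or value > high)
            if too_low or too_high:
                failed.append(f"{col}.{metric}={value!r} outside [{low}, {high}]")
    return failed


# Hypothetical data; the same profile could be taken at any stage of a pipeline.
rows = [{"id": 1, "new_cases": 10}, {"id": 2, "new_cases": None}, {"id": 3, "new_cases": 7}]
assertions = {
    "id": {"nulls": (0, 0), "distinct": (3, None)},
    "new_cases": {"nulls": (0, 0), "min": (0, None)},
}
print(assert_profile(profile(rows), assertions))  # ['new_cases.nulls=1 outside [0, 0]']
```

Asserting on the profile rather than on individual rows keeps the checks cheap to rerun, because the profile is a small summary that can be recomputed wherever the tool is connected in the pipeline.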

#dezoomcamp