Rui Li of Bilibili Group has written a very informative blog post on how Bilibili built an OLAP #DataLakehouse with #ApacheIceberg. They have over 1,000 #Iceberg tables holding more than 10 PB of data, which grows by 75 TB a day. #Trino serves over 200,000 queries a day in their system, with an average response time of 5 seconds. It's a pretty impressive setup.
#datalakehouse #apacheiceberg #iceberg #trino
Have you read Ryan Blue's tutorial yet? It shows you how to use Trino with #ApacheIceberg for data warehousing.
#apacheiceberg #trino #DataLake #datawarehouse #dataengineering
Explore the latest advances in leading #opensource projects and industry technologies, including #DeltaLake, #MLflow, #PyTorch, dbt, Presto/Trino, DuckDB and much more, at #DataAISummit, June 26-29!
⭐ Use code ETLINUX400 (expires June 2) to save $400 on the regular price of the full conference pass.
Register here ➡️ https://dbricks.co/3lvO1hz
#opensource #deltalake #mlflow #pytorch #dataaisummit #trino #oss #presto #duckdb #sanfrancisco
God bless the authors (and the companies that sponsor them) who put such useful books into the public domain!
It may seem like there is nothing new to read, but having the information collected in a coherent, structured form does a great job of organizing the knowledge in your head! #trino #book #reading
https://www.amazon.com/Trino-Definitive-Guide-Storage-Environment/dp/109813723X/ref=mp_s_a_1_1
Do you use #Trino and wonder how to try it out with Tabular? Wonder no longer. This 2-minute video walks you through our Trino wizard to get you quickly connected.
#dataengineering #datalake #datalakehouse #iceberg #apacheiceberg
#trino #dataengineering #DataLake #datalakehouse #iceberg #apacheiceberg
Data Mesh is an architecture for decentralized data storage that lets domain teams use the storage technology of their choice. Microsoft provides Trino, a highly parallel and scalable query engine, as a managed service on Azure HDInsight for Data Mesh architectures. https://techcommunity.microsoft.com/t5/analytics-on-azure-blog/data-mesh-architecture-using-hdinsight-trino/ba-p/3733341 #DataMesh #Trino #AzureHDInsight
#datamesh #trino #azurehdinsight
The folks at #Trino have a clever video showing a solution with #MinIO and #Iceberg.
https://youtu.be/yaxPEWRpEzc
Great talk from SK Telecom at the recent #trino summit about their journey from #hive to #iceberg.
https://trino.io/blog/2022/12/19/trino-summit-2022-sk-telecom-recap.html
"On November 21, 2022, #AWS announced its upstream contributions to #opensource #Trino, which improves query performance when accessing CSV and JSON data formats."
https://aws.amazon.com/blogs/storage/run-queries-up-to-9x-faster-using-trino-with-amazon-s3-select-on-amazon-emr/
We've been using #Trino at #Backblaze both to experiment with #BackblazeB2 as #datalake storage, and to query our #DriveStats data set. I wrote up our experience in a blog post: https://www.backblaze.com/blog/querying-a-decade-of-drive-stats-data/
#trino #backblaze #BackblazeB2 #datalake #DriveStats
Now that #federated is becoming mainstream, I'm telling everyone about #trino, the coolest federated query engine. #data
Hi fellow tooters👋 I'm Monica. As I participate in one of the most interesting social experiments ever and migrate from one social platform to another, I thought I'd introduce myself.
I'm a former data engineer who turned developer advocate at #starburst this year. Yay #trino.
Dog mom. Reality TV aficionado. Office mate to @emiller
Ecclesia quo vadis? The Vatican, one and triune. D'Anna writes - Formiche.net #ecclesia #vadis #vaticano #trino #scrive #danna #formichenet #4agosto https://parliamodi.news/article/aHR0cHM6Ly9mb3JtaWNoZS5uZXQvMjAyMi8wOC92YXRpY2Fuby1wYXBhLWZyYW5jZXNjby1kaW1pc3Npb25pLw==
#4agosto #formichenet #danna #Scrive #trino #Vaticano #vadis #ecclesia
#dwh #bigdata #datalake #prestodb #trino #clickhouse #disworks
Two approaches to a Data Warehouse, with teams of 2-3 and of 120 IT people:
- https://habr.com/ru/post/593809/
- https://habr.com/ru/company/mediascope/blog/593685/
My stack handles the same data volume (11 billion/month) as the second company, though not as deeply, on a $200/month server with $0 spent on software:
- Data mart instead of BI: Zeppelin + R/Python
- Lake: file.gz + S3
- ETL: dataiku dss
- Processing: NiFi
- DB: clickhouse
- Document/schemaless: ArangoDB
- Querying across the different databases: Trino?
#dwh #bigdata #datalake #prestodb #trino #clickhouse #disworks