Did you know that Apache Hadoop is a framework for the distributed processing of large data sets across clusters of computers? It is a key technology in the big data world. 🐘🌐 #ApacheHadoop #CuriosidadesTecnológicas
Subscribe to Código ergo sum
https://achirinos.substack.com/
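To make "distributed processing" a bit more concrete: Hadoop's classic programming model is MapReduce, and the toy sketch below imitates its three phases (map, shuffle, reduce) in a single Python process. It is only an illustration of the idea, not Hadoop's API; the function names and the sample input splits are made up for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse the values collected for one key into a single count."""
    return key, sum(values)


def word_count(splits):
    # Each split stands in for a block of data that a separate node would process.
    mapped = chain.from_iterable(
        map_phase(line) for split in splits for line in split
    )
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: two "splits", each a list of lines.
splits = [["big data big clusters"], ["data on clusters of computers"]]
counts = word_count(splits)
print(counts)
```

On a real cluster the map and reduce calls run in parallel on different machines and the shuffle moves data over the network; here everything runs sequentially so the data flow is easy to follow.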
For those a little familiar with Cascading, it was originally designed to run on #ApacheHadoop, and then #ApacheTez, but it also has a local planner.
This lets developers create non-clustered data applications without the Hadoop or Tez dependencies or runtime.
I've been using the local planner in production for over 5 years now.
Parquet does require the Hadoop libraries, but that is OK: a shim between the libraries allows Parquet and S3AFileSystem to be used locally.
A little more color on this announcement:
https://fosstodon.org/@cwensel/110549001614086663
First, #ApacheParquet removed #Cascading support, so I had to splice the original source into Cascading. But the ParquetScheme didn't honor type information fully. So there is a new TypedParquetScheme that has native support for JSON and Timestamps.
Second, Parquet requires the #ApacheHadoop FileSystem, which means we get the wonderful S3A implementation. But we also get a 331MB jar dependency with the AWS bundle.