Bootstrap a Data Lakehouse in an Afternoon
https://towardsdatascience.com/bootstrap-a-data-lakehouse-in-an-afternoon/ (towardsdatascience.com)
A data lakehouse can be bootstrapped on AWS using Apache Iceberg tables stored on S3, with AWS Glue as the metadata catalog and Amazon Athena as the query engine. The process involves creating an Iceberg table and running DML operations (INSERT, UPDATE, DELETE, and MERGE) directly through Athena. The guide also demonstrates key Iceberg features: time-travel queries, and table maintenance with OPTIMIZE and VACUUM. It then shows how to inspect the same tables locally with DuckDB and how to insert additional data with Glue/Spark, so more than one engine works against the same tables. The result is database-like capabilities on object storage, including ACID transactions and schema evolution.
0 points • by will22 • 2 days ago
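For readers who want a feel for the Athena side of the workflow summarized above, here is a minimal sketch, not the article's own code. It only builds the SQL strings for a small walkthrough (create, DML, time travel, maintenance). The database, table, and bucket names are hypothetical placeholders, the two-column schema is invented for illustration, and the optional `run_on_athena` helper assumes `boto3` is installed, AWS credentials are configured, and the Glue database already exists.

```python
import time
from typing import List


def iceberg_statements(database: str, table: str, location: str) -> List[str]:
    """Build Athena SQL for a tiny Iceberg walkthrough (illustrative schema)."""
    fq = f"{database}.{table}"
    return [
        # Athena creates an Iceberg table when table_type is set to ICEBERG.
        f"CREATE TABLE {fq} (id int, name string) "
        f"LOCATION '{location}' "
        "TBLPROPERTIES ('table_type' = 'ICEBERG')",
        # Row-level DML, run directly from Athena.
        f"INSERT INTO {fq} VALUES (1, 'alice'), (2, 'bob')",
        f"UPDATE {fq} SET name = 'alicia' WHERE id = 1",
        f"DELETE FROM {fq} WHERE id = 2",
        f"MERGE INTO {fq} AS t "
        "USING (SELECT 3 AS id, 'carol' AS name) AS s ON t.id = s.id "
        "WHEN MATCHED THEN UPDATE SET name = s.name "
        "WHEN NOT MATCHED THEN INSERT (id, name) VALUES (s.id, s.name)",
        # Time travel: only succeeds if a snapshot at least this old exists.
        f"SELECT * FROM {fq} "
        "FOR TIMESTAMP AS OF (current_timestamp - interval '1' hour)",
        # Maintenance: compact small files, then expire old snapshots.
        f"OPTIMIZE {fq} REWRITE DATA USING BIN_PACK",
        f"VACUUM {fq}",
    ]


def run_on_athena(statements: List[str], database: str, output_location: str) -> None:
    """Optionally execute the statements in order; needs boto3 and AWS access."""
    import boto3  # third-party; imported lazily so the builder works without it

    client = boto3.client("athena")
    for sql in statements:
        query_id = client.start_query_execution(
            QueryString=sql,
            QueryExecutionContext={"Database": database},
            ResultConfiguration={"OutputLocation": output_location},
        )["QueryExecutionId"]
        # Poll until Athena reports a terminal state for this query.
        while True:
            status = client.get_query_execution(QueryExecutionId=query_id)
            state = status["QueryExecution"]["Status"]["State"]
            if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
                break
            time.sleep(1)
        if state != "SUCCEEDED":
            raise RuntimeError(f"Query ended in state {state}: {sql}")


if __name__ == "__main__":
    # Hypothetical names; replace with your own database, table, and bucket.
    for stmt in iceberg_statements("demo_db", "customers", "s3://example-bucket/customers/"):
        print(stmt + ";\n")
```

Keeping the SQL generation separate from execution means the statements can be reviewed, or pasted into the Athena console, before anything touches an AWS account.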