Open source security data lake for threat hunting, detection & response, and cybersecurity analytics at petabyte scale on AWS
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
Fastest open-source tool for replicating databases to a data lake in open table formats like Apache Iceberg. ⚡ Efficient, quick and scalable data ingestion for real-time analytics. Supporting Postgres,...
Apache Kafka® compatible broker with S3, PostgreSQL, Apache Iceberg and Delta Lake
Use SQL to build ELT pipelines on a data lakehouse.
Lakehouse storage system benchmark
Sample data lakehouse deployed in Docker containers using Apache Iceberg, MinIO, Trino and a Hive Metastore. Can be used for local testing.
The Control Plane for Apache Iceberg
Jupyter notebooks and an AWS CloudFormation template that show how Hudi, Iceberg, and Delta Lake work
📡 Real-time data pipeline with Kafka, Flink, Iceberg, Trino, MinIO, and Superset. Ideal for learning data systems.
Icebird: JavaScript Iceberg Client
Stream CDC into an Amazon S3 data lake in Apache Iceberg table format with AWS Glue Streaming and DMS
DAIVI is a reference solution with IaC modules to accelerate development of Data, Analytics, AI and Visualization applications on AWS using the next-generation Amazon SageMaker Unified Studio. The goa...
Spark data pipeline that processes movie ratings data.
An open-source, community-driven REST catalog for Apache Iceberg!
Sample code to collect Apache Iceberg metrics for table monitoring
This repo contains examples of high throughput ingestion using Apache Spark and Apache Iceberg. These examples cover IoT and CDC scenarios using best practices. The code can be deployed into any Spark...
Streaming ETL job cases in AWS Glue to integrate Iceberg and create an in-place updatable data lake on Amazon S3
A sample implementation of streaming writes to an Iceberg table on GCS using Flink, with reads using Trino