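Trino is a distributed SQL query engine for big data (formerly PrestoSQL).
StarRocks is a next-generation, blazing-fast MPP (Massively Parallel Processing) database for all analytics scenarios. Its vision is to make data analytics simpler and more agile for users. Users can run fast analytics with StarRocks across many data analysis scenarios, without complex preprocessing.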
#Data Warehouse#Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop....
Upserts, Deletes And Incremental Processing on Big Data.
lakeFS - Data version control for your data lake | Git for data
A one-stop, easily extensible real-time computing platform built on Apache Flink for developing, operating, and maintaining FlinkSQL and SQL jobs.
LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.
World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
The LeoFS Storage System
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
A free-to-use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
DuckDB-powered data lake analytics from Postgres
Apache Kafka® compatible broker with S3, PostgreSQL, Apache Iceberg and Delta Lake
Open Control Plane for Tables in Data Lakehouse
#Awesome#A curated list of open source tools used in analytics platforms and data engineering ecosystem
Use SQL to build ELT pipelines on a data lakehouse.
GigAPI is an infinite timeseries lakehouse for real-time data and sub-second queries, powered by DuckDB OLAP + Parquet Query Engine, Compactor w/ Cloud-Native Storage. Drop-in FDAP alternative ⭐