GitHub Chinese Community
#datalake

sinaptik-ai / pandas-ai

#LLM# Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.

llm, pandas, artificial-intelligence, data-analysis, data-science, gpt-4, CSV, data, SQL, database, datalake, data-visualization, text-to-sql
Python 20.61 k
1 month ago
trinodb / trino

Trino is a distributed SQL query engine for big data (formerly PrestoSQL).

Java, presto, hive, hadoop, big-data, SQL, prestodb, database, distributed-systems, distributed-database, data-science, datalake, jdbc, query-engine, trino, analytics, delta-lake, iceberg
Java 11.43 k
1 hour ago
StarRocks / starrocks

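StarRocks is a next-generation, high-speed MPP (Massively Parallel Processing) database for all analytics scenarios. Its vision is to make data analysis simpler and more agile. Without complex preprocessing, users can rely on StarRocks for very fast analytics across many kinds of data analysis workloads.

database, olap, SQL, analytics, big-data, realtime-database, vectorized, distributed-database, real-time-analytics, mpp, join, star-schema, real-time-updates, delta-lake, hudi, iceberg, lakehouse, datalake, lakehouse-platform, cloudnative
Java 10.13 k
1 day ago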
activeloopai / deeplake

#Data warehouse# Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop....

datasets, deep-learning, machine-learning, data-science, PyTorch, Tensorflow, Python, artificial-intelligence, mlops, computer-vision, cv, image-processing, datalake, langchain, llm, large-language-models, vector-database, vector-search, multi-modal
Python 8.66 k
5 days ago
apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.

hudi, apachehudi, datalake, bigdata, apachespark, incremental-processing, stream-processing, data-integration, apacheflink
Java 5.84 k
1 day ago
treeverse / lakeFS

lakeFS - Data version control for your data lake | Git for data

data-engineering, data-versioning, Go, object-storage, data-lake, aws-s3, data-quality, azure-blob-storage, google-cloud-storage, git-for-data, Apache Spark, hadoop-filesystem, datalake, data-version-control, azure-storage
Go 4.72 k
10 hours ago
DataLinkDC / dinky

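An easily extensible, one-stop real-time computing platform, built on top of Apache Flink, for developing and operating FlinkSQL and SQL jobs.

flink, flinksql, real-time-computing-platform, flinkcdc, olap, SQL, datalake, datawarehouse
Java 3.46 k
25 days ago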
lakesoul-io / LakeSoul

LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.

lakesoul, datalake, lakehouse, Apache Spark, flink, streaming, big-data, PostgreSQL, Rust, SQL, huggingface, Python, PyTorch, arrow, datafusion, vectorized, velox
Java 2.81 k
3 days ago
apache / gravitino

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.

datalake, lakehouse, metadata, federated-query, stratosphere, metalake, skycomputing, data-catalog, ai-catalog, model-catalog, opendatacatalog
Java 1.64 k
4 hours ago
leo-project / leofs

The LeoFS Storage System

Erlang, distributed-storage, distributed-file-system, s3-storage, s3, nfs, datalake
Erlang 1.57 k
5 years ago
zinggAI / zingg

Scalable identity resolution, entity resolution, data mastering and deduplication using ML

fuzzymatch, fuzzy-matching, Entity resolution, dedupe, masterdata, dataengineering, data-science, Apache Spark, machine-learning, dataquality, analytics, datalake, master-data-management, customer-data-platform, databricks, snowflake, cdp, mdm
Java 1.04 k
3 days ago
apache / amoro

Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.

bigdata, datalake, lakehouse
Java 987
2 days ago
leesf / hudi-resources

A collection of Apache Hudi resources.

hudi, apachehudi, apache, datalake, bigdata, stream-processing, incremental-processing, data-integration
553
7 days ago
Datavault-UK / automate-dv

A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)

dbt, etl, snowflake, dataengineering, datawarehouse, datalake, elt, SQL, metadata
541
3 months ago
paradedb / pg_analytics

DuckDB-powered data lake analytics from Postgres

analytics, arrow, columnar, datafusion, lakehouse, parquet, PostgreSQL, duckdb, olap, big-data, database, datalake, deltalake, iceberg, object-storage, SQL, lakehouse-platform
Rust 522
3 months ago
tansu-io / tansu

Apache Kafka® compatible broker with S3, PostgreSQL, Apache Iceberg and Delta Lake

built-with-rust, PostgreSQL, s3, apache-iceberg, apache-kafka, apache-arrow, datafusion, parquet, delta-lake, datalake
Rust 387
5 days ago
linkedin / openhouse

Open Control Plane for Tables in Data Lakehouse

big-data, catalog, datalake, datalakehouse, declarative, iceberg, management, tables
Java 354
2 days ago
pracdata / awesome-open-source-data-engineering

#Awesome# A curated list of open source tools used in analytics platforms and the data engineering ecosystem

Awesome Lists, data-analytics, data-engineering, data-platform, database, self-hosted, mlops, data, data-integration, datalake, lakehouse, workflow-engine, analytics, data-warehouse, observability, data-pipeline, etl
345
3 months ago
cuebook / cuelake

Use SQL to build ELT pipelines on a data lakehouse.

apache-iceberg, delta, lakehouse, datalake, data-lake, elt, etl, data-engineering, data-integration, data-ingestion, Apache Spark, spark-sql, data-transfer, pipelines, data-pipeline, zeppelin-notebook, SQL
JavaScript 287
3 years ago
gigapi / gigapi

GigAPI is an infinite timeseries lakehouse for real-time data and sub-second queries, powered by DuckDB OLAP + Parquet Query Engine, Compactor w/ Cloud-Native Storage. Drop-in FDAP alternative ⭐

API, duckdb, Go, olap, parquet, s3, database, REST API, SQL, clickhouse-server, datalake, query-engine, data-lake, lakehouse
Go 271
4 days ago