GitHub Chinese Community

#data-lake

treeverse / lakeFS

lakeFS - Data version control for your data lake | Git for data

data-engineering, data-versioning, Go, object-storage, data-lake, aws-s3, data-quality, azure-blob-storage, google-cloud-storage, git-for-data, Apache Spark, hadoop-filesystem, datalake, data-version-control, azure-storage
Go 4.72 k
14 hours ago
dlt-hub / dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

data, Python, data-engineering, data-lake, data-loading, data-warehouse, elt, extract, load, transform
Python 3.72 k
4 days ago
apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

Apache Spark, hive, SQL, thrift, jdbc, spark-sql, data-lake, hadoop, Kubernetes, Hacktoberfest
Scala 2.2 k
3 days ago
bytedance / bitsail

BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every d...

flink, big-data, data-integration, data-lake, data-pipeline, data-synchronization, high-performance, real-time
Java 1.67 k
1 year ago
san089 / Udacity-Data-Engineering-Projects

Few projects related to Data Engineering including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.

data, data-engineering, data-engineering-pipeline, etl-pipeline, cassandra-database, postgresql-database, data-modeling, data-warehouse, data-lake, airflow, cluster, Apache Cassandra, infrastructure, PostgreSQL, Amazon Web Services, aws-ec2, aws-sdk, aws-s3, cloudformation
Python 1.65 k
3 years ago
san089 / goodreads_etl_pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

etl-pipeline, etl-framework, Apache Spark, apache-airflow, airflow, redshift, emr-cluster, livy, s3, data-lake, scheduler, data-migration, data-engineering, data-engineering-pipeline, Python, etl-job
Python 1.39 k
5 years ago
Teradata / kylo

Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is license...

Apache Spark, nifi, data-lake, teradata, hadoop
Java 1.11 k
2 years ago
alanchn31 / Data-Engineering-Projects

Personal Data Engineering Projects

data-lake, data-engineering, data-warehouse, Apache Cassandra, MongoDB, scrapy, Apache Spark, airflow, PostgreSQL, star-schema, data-modeling
Jupyter Notebook 941
2 years ago
lakekeeper / lakekeeper

Lakekeeper is an Apache-Licensed, secure, fast and easy to use Apache Iceberg REST Catalog written in Rust.

catalog, data-lake, iceberg, lakehouse, Rust
Rust 722
4 days ago
Canner / vulcan-sql

Data API Framework for AI Agents and Data Apps

api-builder, data-lake, data-warehouse, database, SQL, analytics, reporting, Spreadsheet, BigQuery, duckdb, PostgreSQL, snowflake, restful-api, TypeScript, clickhouse, ksqldb, artificial intelligence, ai-agent
TypeScript 684
1 year ago
uber / marmaray

Generic Data Ingestion & Dispersal Library for Hadoop

hadoop, data-lake, avro-schema, Apache Spark
Java 479
2 years ago
aws-solutions-library-samples / data-lakes-on-aws

Enterprise-grade, production-hardened, serverless data lake on AWS

Serverless Framework, data-lake, analytics, Amazon Web Services, etl, data-engineering, lake-formation, Infrastructure as code, best-practices
Python 454
3 months ago
kaiwaehner / hivemq-mqtt-tensorflow-kafka-realtime-iot-machine-learning-training-inference

Real Time Big Data / IoT Machine Learning (Model Training and Inference) with HiveMQ (MQTT), TensorFlow IO and Apache Kafka - no additional data store like S3, HDFS or Spark required

kafka, hivemq, MQTT, kafka-streams, kafka-connect, ksql, Tensorflow, gRPC, Java, Python, data-lake, confluent, ksqldb, Terraform, Google Cloud, Kubernetes, cloud, MongoDB
Jupyter Notebook 414
5 years ago
Canner / wren-engine

#Large language model# 🤖 The Semantic Engine for Model Context Protocol (MCP) Clients and AI Agents 🔥

business-intelligence, data, data analysis, data-analytics, data-lake, data-warehouse, SQL, semantic, semantic-layer, large language model, Hacktoberfest, agent, agentic-ai, artificial intelligence, mcp, mcp-server
Java 343
4 days ago
cuebook / cuelake

Use SQL to build ELT pipelines on a data lakehouse.

apache-iceberg, delta, lakehouse, datalake, data-lake, elt, etl, data-engineering, data-integration, data-ingestion, Apache Spark, spark-sql, data-transfer, pipelines, data-pipeline, zeppelin-notebook, SQL
JavaScript 287
3 years ago
gigapi / gigapi

GigAPI is an infinite timeseries lakehouse for real-time data and sub-second queries, powered by DuckDB OLAP + Parquet Query Engine, Compactor w/ Cloud-Native Storage. Drop-in FDAP alternative ⭐

API, duckdb, Go, olap, parquet, s3, database, REST API, SQL, clickhouse-server, datalake, query-engine, data-lake, lakehouse
Go 271
4 days ago
maxi-k / btrblocks

BtrBlocks: Efficient Columnar Compression for Data Lakes (SIGMOD 2023 Paper)

compression, data-lake, database, research
C++ 243
2 months ago
awslabs / amazon-s3-find-and-forget

Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)

data-lake, amazon-s3, s3, gdpr, Amazon Web Services, parquet, ccpa, big-data, privacy, data
Python 242
6 days ago
Azure / usql

U-SQL Examples and Issue Tracking

big-data, Azure, data-lake
C# 234
2 years ago
garystafford / tickit-data-lake-demo

Resources for video demonstrations and blog posts related to DataOps on AWS

DataOps, Amazon Web Services, DevOps, airflow, redshift, data-lake
Python 177
3 years ago