data-pipeline

apache / shardingsphere

#MySQL high-availability middleware# ShardingSphere is a database middleware for sharding tables and databases, consisting of JDBC, Proxy, and Sidecar.

database, distributed-database, distributed-sql-database, SQL, shard, database-cluster, MySQL, PostgreSQL, encrypt, bigdata, data-encryption, data-pipeline, database-middleware, distributed-transaction, read-write-splitting, database-gateway
Java 20.29 k
1 day ago
airbytehq / airbyte

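Airbyte is an open-source EL(T) platform that helps users sync data from applications, APIs, and databases to data warehouses.

data, pipeline, data analysis, data-engineering, Java, Python, etl, change-data-capture, data-collection, data-integration, elt, BigQuery, redshift, snowflake, data-pipeline, sql-server, MySQL, PostgreSQL, s3, self-hosted
Python 18.43 k
15 hours ago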
debezium / debezium

Debezium provides a low-latency streaming platform for change data capture (CDC).

change-data-capture, kafka-connect, apache-kafka, debezium, cdc, database, kafka, kafka-producer, event-streaming, data-pipeline
Java 11.49 k
3 days ago
snowplow / snowplow

The leader in Customer Data Infrastructure

analytics, data, data-pipeline, data-collection, product-analytics
Scala 6.93 k
11 days ago
apache / flink-cdc

Flink CDC Connector is a set of data source connectors for Apache Flink.

change-data-capture, cdc, batch, data-integration, data-pipeline, distributed, elt, etl, flink, kafka, MySQL, paimon, PostgreSQL, real-time, schema-evolution
Java 6.1 k
2 days ago
modelscope / data-juicer

#Large language models# Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

data analysis, data science, large-language-models, large language models, data visualization, instruction-tuning, pre-training, multi-modal, synthetic-data, data, data-pipeline, data-processing, foundation-models
Python 4.58 k
2 days ago
rudderlabs / rudder-server

Privacy and Security focused Segment-alternative, in Golang and React

privacy, warehouse-management, data-warehouse, customer-data-platform, data-integration, data-synchronization, etl, BigQuery, redshift, snowflake, data-pipeline, elt, data-engineering, cdp, event-streaming
Go 4.21 k
3 days ago
adilkhash / Data-Engineering-HowTo

A list of useful resources to learn Data Engineering from scratch

distributed-systems, data-engineering, data-pipeline, cloud-providers, Scala
3.81 k
1 year ago
superstreamlabs / memphis

Memphis.dev is a highly scalable and effortless data streaming platform

data, data-stream-processing, data-streaming, Kubernetes, messaging-queue, data-engineering, data-pipeline, Go, enrichment, message-broker, message-bus, message-queue, microservices, schema-registry
Go 3.31 k
1 year ago
bruin-data / ingestr

ingestr is a CLI tool to copy data between any databases with a single command seamlessly.

BigQuery, copy-database, data-ingestion, data-integration, data-pipeline, duckdb, ingestion-pipeline, sql-server, PostgreSQL, snowflake
Python 2.97 k
3 days ago
whylabs / whylogs

#Computer science# An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collect...

ai-pipelines, approximate-statistics, statistical-properties, data-quality, calculate-statistics, Python, Logging, mlops, DataOps, ml-pipelines, data-pipeline, dataset, machine learning, data science, analytics, constraints
Jupyter Notebook 2.72 k
5 months ago
elementary-data / elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

data-lineage, data-governance, data-warehouse, snowflake, BigQuery, data analysis, data-pipelines, data-pipeline, lineage, data-reliability, data-observability, DataOps, dbt, redshift
HTML 2.08 k
3 days ago
reugn / go-streams

A lightweight stream processing library for Go

stream-processing, pipeline, etl, kafka, data-stream, kafka-streams, streams, Redis, Apache Pulsar, data-pipeline, streaming-data, stream-processor, WebSocket, nats-streaming, streaming-api, windowing, low-code, workflow
Go 2.05 k
1 month ago
pydoit / doit

CLI task management & automation tool

Python, build-tool, build-automation, task-runner, build-system, workflow-management, data-pipeline, workflow, workflow-automation, data science, Hacktoberfest, command-line interface
Python 1.94 k
1 year ago
bytedance / bitsail

BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every d...

flink, big-data, data-integration, data-lake, data-pipeline, data-synchronization, high-performance, real-time
Java 1.67 k
1 year ago
Multiwoven / multiwoven

🔥🔥🔥 Open source composable CDP - alternative to hightouch and census.

data-engineering, reverse-etl, data-pipeline, data-activation, etl, React, Ruby, self-hosted, Open Source, dbt, BigQuery, data-warehouse, databricks, postresql, redshift, snowflake, TypeScript, Hacktoberfest, cdp, customer-data-platform
Ruby 1.61 k
4 days ago
GoogleCloudPlatform / data-science-on-gcp

#Computer science# Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017

data analysis, data visualization, cloud-computing, machine learning, data-pipeline, data-processing, data science, data-engineering
Jupyter Notebook 1.39 k
3 months ago
damklis / DataEngineeringProject

#Web crawler# Example end-to-end data engineering project.

big-data, scraping, MongoDB, elasticsearch, data-engineering, kafka, kafka-connect, debezium, django-rest-framework, Redis, airflow, minio, s3, Python, data-pipeline, Hacktoberfest
Python 1.29 k
3 years ago
superlinked / superlinked

#Natural language processing# Superlinked is a Python framework for AI Engineers building high-performance search & recommendation applications that combine structured and unstructured data.

embeddings, etl, vector-search, data-pipeline, deep learning, information-retrieval, large language models, machine learning, mlops, natural language processing, Python, retrieval, retrieval-augmented-generation, semantic-search, vectorization, vector-database
Jupyter Notebook 1.15 k
4 days ago
datazip-inc / olake

Fastest open-source tool for replicating Databases to Data Lake in Open Table Formats like Apache Iceberg. ⚡ Efficient, quick and scalable data ingestion for real-time analytics. Supporting Postgres,...

cdc, change-data-capture, data-pipeline, database, elt, lakehouse, replication, apache-iceberg, parquet, s3
Go 889
3 days ago