GitHub Chinese Community

#data-pipelines

apache / airflow

#Computer Science# Apache Airflow is a platform for scheduling, orchestrating, and monitoring workflows.

airflow, apache, apache-airflow, Python, scheduler, workflow, automation, dag, data-engineering, data-integration, data-orchestrator, data-pipelines, data science, elt, etl, machine learning, mlops, orchestration, workflow-engine, workflow-orchestration
Python 40.56 k
2 hours ago
pathwaycom / pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

batch-processing, kafka, pathway, Python, streaming, machine learning, real-time, data-analytics, data-pipelines, data-processing, dataflow, etl, etl-framework, iot-analytics, Rust, stream-processing, time-series-analysis
Python 26.78 k
2 days ago
apache / dolphinscheduler

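A distributed, easily extensible, visual DAG workflow task scheduling system. It aims to untangle the intricate dependencies in data processing workflows so that the scheduler works out of the box.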

workflow-schedule, azkaban, airflow, task-scheduler, job-scheduler, cloud-native, data-pipelines, orchestration, workflow, workflow-orchestration, powerful-data-pipelines
Java 13.58 k
3 days ago
dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.

data-pipelines, dagster, workflow, data science, workflow-automation, Python, scheduler, data-orchestrator, etl, analytics, data-engineering, mlops, orchestration, data-integration, metadata
Python 13.38 k
2 days ago
Unstructured-IO / unstructured

#Natural Language Processing# Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to...

deep learning, document-parsing, machine learning, natural language processing, OCR, information-retrieval, data-pipelines, preprocessing, pdf-to-text, pdf, pdf-to-json, document-image-analysis, donut, document-image-processing, document-parser, docx, langchain, large language models
HTML 11.49 k
2 days ago
mage-ai / mage-ai

#Computer Science# 🧙 Build, run, and manage data pipelines for integrating and transforming data.

machine learning, artificial intelligence, data, data-engineering, data science, Python, elt, etl, pipelines, data-pipelines, orchestration, data-integration, SQL, Apache Spark, dbt, pipeline, reverse-etl, transformation
Python 8.37 k
2 days ago
infinyon / fluvio

🦀 event stream processing for developers to collect and transform data in motion to power responsive data intensive applications.

cloud-native, streaming, Rust, real-time, Serverless, stateful, stream-processing, WebAssembly, data-integration, data-flow, distributed-systems, event-driven-architecture, stream-processing-engine, data-pipelines, streaming-data, streaming-data-pipelines, streaming-data-processing, data-analytics, streaming-analytics
Rust 4.94 k
6 days ago
orchest / orchest

#Editor# Build data pipelines, the easy way 🛠️

data science, machine learning, pipelines, ide, Jupyter Notebook, cloud, self-hosted, jupyterlab, notebooks, Docker, Python, data-pipelines, deployment, Kubernetes, airflow, dag, etl, etl-pipeline
TypeScript 4.13 k
2 years ago
StructuredLabs / preswald

#Large Language Models# Preswald is a WASM packager for Python-based interactive data apps: bundle full complex data workflows, particularly visualizations, into single files, runnable completely in-browser, using Pyodide, D...

artificial intelligence, analytics, data, analytics-engineering, copilot, data-applications, data-infrastructure, data-pipelines, data-sdk, data visualization, gpt, large language models, Open Source, Python, schema-management, Visual Studio Code
Python 4.12 k
4 days ago
Netflix / maestro

#Computer Science# Maestro: Netflix’s Workflow Orchestrator

analytics, automation, batch-processing, dag, data-engineering, DataOps, data-orchestrator, data-pipelines, data science, elt, etl, Java, machine learning, mlops, orchestration, scheduler, workflow, workflow-engine, workflow-orchestration
Java 3.48 k
3 days ago
ucbepic / docetl

#Large Language Models# A system for agentic LLM-powered data processing and ETL

data, etl, large language models, Python, data-pipelines, elt, workflow, agents, semantic-data, llm-data, document-processing
Python 2.14 k
4 days ago
meltano / meltano

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

DataOps, elt, Open Source, data, pipelines, extract-data, connectors, integration, tap, loaders, data-pipelines, data-engineering
Python 2.1 k
5 days ago
elementary-data / elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

data-lineage, data-governance, data-warehouse, snowflake, BigQuery, data analysis, data-pipelines, data-pipeline, lineage, data-reliability, data-observability, DataOps, dbt, redshift
HTML 2.08 k
3 days ago
data-engineering-community / data-engineering-wiki

The best place to learn data engineering. Built and maintained by the data engineering community.

data, database, data-engineering, data-engineer, SQL, data-modeling, data-pipelines, etl
CSS 1.69 k
15 days ago
combust / mleap

MLeap: Deploy ML Pipelines to Production

scikit-learn, Apache Spark, data-pipelines, transformers, Tensorflow, Scala, Python
Scala 1.52 k
7 months ago
pyper-dev / pyper

Concurrent Python made simple

asyncio, concurrency, Python, threading, data-pipelines, data-processing, multiprocessing, parallel-computing, data, data-collection, data-engineering
Python 1.43 k
4 months ago
feldera / feldera

The Feldera Incremental Computation Engine

database, Rust, SQL, streaming, incremental-computation, data-analytics, data-pipelines, incremental-view-maintenance, ivm, materialized-views, real-time
Rust 1.42 k
3 days ago
opendatadiscovery / odd-platform

The first open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.

Open Source, data-platform, metadata, metadata-management, data-pipelines, data-engineering, observability, data-catalog, data-discovery, data-lineage, bigdata, alerting, lineage, data-profiling, data-exploration, data-governance, data-quality, data science, data-observability
Java 1.33 k
4 months ago
fmind / mlops-python-package

#Computer Science# Kickstart your MLOps initiative with a flexible, robust, and productive Python package.

mlops, Python, automation, data-pipelines, data science, machine learning, mlflow, pydantic
Jupyter Notebook 1.29 k
7 days ago
yobix-ai / extractous

#Natural Language Processing# Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.

extraction, pdf, tika, unstructured, unstructured-data, data-pipelines, docx, etl, etl-pipelines, large language models, machine learning, natural language processing, OCR, pdf-parser, rag, Rust
Rust 1.14 k
6 months ago