#Computer Science#Apache Airflow is a platform for scheduling, orchestrating, and monitoring workflows.
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
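A distributed, easily extensible, visual DAG workflow task scheduling system. It aims to untangle the complex dependencies in data processing pipelines so that the scheduler works out of the box.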
An orchestration platform for the development, production, and observation of data assets.
#Natural Language Processing#Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to...
#Computer Science#🧙 Build, run, and manage data pipelines for integrating and transforming data.
🦀 Event stream processing for developers to collect and transform data in motion to power responsive, data-intensive applications.
#Editor#Build data pipelines, the easy way 🛠️
#Large Language Models#Preswald is a WASM packager for Python-based interactive data apps: bundle full complex data workflows, particularly visualizations, into single files, runnable completely in-browser, using Pyodide, D...
#Computer Science#Maestro: Netflix’s Workflow Orchestrator
#Large Language Models#A system for agentic LLM-powered data processing and ETL
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
The best place to learn data engineering. Built and maintained by the data engineering community.
MLeap: Deploy ML Pipelines to Production
Concurrent Python made simple
The Feldera Incremental Computation Engine
The first open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
#Computer Science#Kickstart your MLOps initiative with a flexible, robust, and productive Python package.
#Natural Language Processing#Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.