GitHub Chinese Community

#etl-pipeline

Zipstack / unstract

No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents

etl-pipeline, llm-platform, unstructured-data
Python 5.34 k
2 days ago
orchest / orchest

#Editor# Build data pipelines, the easy way 🛠️

data science, machine learning, pipelines, ide, Jupyter Notebook, cloud, self-hosted, jupyterlab, notebooks, Docker, Python, data-pipelines, deployment, Kubernetes, airflow, dag, etl, etl-pipeline
TypeScript 4.13 k
2 years ago
apache / streampark

StreamX was created to make stream processing easier. It aims to be a one-stop big data platform with a unified stream-batch and lakehouse solution.

streaming, streampark, apache, development-framework, easy-to-use, etl-pipeline, operation-platform
Java 4.09 k
7 days ago
apache / hamilton

#Computer Science# Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.

data science, Python, dag, data-engineering, dataframe, etl, etl-framework, etl-pipeline, feature-engineering, machine learning, pandas, software engineering, data analysis, lineage, llmops, mlops, orchestration, Hacktoberfest, rag
Jupyter Notebook 2.15 k
6 days ago
AlexIoannides / pyspark-example-project

Implementing best practices for PySpark ETL jobs and applications.

pyspark, etl-job, Python, data-engineering, Apache Spark, data science, etl, etl-pipeline
Python 1.93 k
2 years ago
san089 / Udacity-Data-Engineering-Projects

A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing, and Data Lake development.

data, data-engineering, data-engineering-pipeline, etl-pipeline, cassandra-database, postgresql-database, data-modeling, data-warehouse, data-lake, airflow, cluster, Apache Cassandra, infrastructure, PostgreSQL, Amazon Web Services, aws-ec2, aws-sdk, aws-s3, cloudformation
Python 1.65 k
3 years ago
san089 / goodreads_etl_pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

etl-pipeline, etl-framework, Apache Spark, apache-airflow, airflow, redshift, emr-cluster, livy, s3, data-lake, scheduler, data-migration, data-engineering, data-engineering-pipeline, Python, etl-job
Python 1.39 k
5 years ago
JSv4 / OpenContracts

#Large Language Model# Enterprise-grade and API-first LLM workspace for unstructured documents, including data extraction, redaction, rights management, prompt playground, and more!

agent, agentic-ai, etl, etl-pipeline, large language model, unstructured-data, vector-database, prompt-engineering
Python 878
6 days ago
stitchfix / hamilton

#Computer Science# A scalable general-purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton

Python, pandas, dag, data science, data-engineering, NumPy, software engineering, etl-framework, etl-pipeline, etl, feature-engineering, dataframe, data-platform, machine learning
Python 860
2 years ago
techascent / tech.ml.dataset

#Computer Science# A Clojure high-performance data processing system

Clojure, dataframe, CSV, xlsx, datascience, machine learning, dataset, etl-pipeline, Java
Clojure 706
1 month ago
SorellaLabs / brontes

A blazingly fast general-purpose blockchain analytics engine specialized in systematic MEV detection

Ethereum, evm, mev, etl-pipeline, Rust
Rust 617
2 months ago
Pravko-Solutions / FlashLearn

#Large Language Model# Integrate LLMs into any pipeline: fit/predict pattern, JSON-driven flows, and built-in concurrency support.

artificial intelligence, ai-agents, concurrency, large language model, llm-agent, Python, agentic-ai-development, ai-agents-framework, etl-pipeline
Python 595
3 months ago
YotpoLtd / metorikku

A simplified, lightweight ETL Framework based on Apache Spark

big-data, Apache Spark, Scala, etl-framework, distributed-computing, SQL, etl, etl-pipeline
Scala 586
1 year ago
unbody-io / unbody

#Large Language Model# The Supabase of the AI era. A modular, open-source backend for building AI-native software, designed for knowledge, not static data.

agentic-ai, ai-native, backend, chatbot, data-ingestion, developer-tools, etl-pipeline, generative-ai, knowledge-base, large language model, rag, vector-database
TypeScript 298
10 days ago
ebonnal / streamable

concurrent & fluent interface for (async) iterables

data-engineering, etl-pipeline, etl, reverse-etl, collections, streams, fluent-interface, immutability, lazy-evaluation, method-chaining, visitor-pattern, data, Python, asyncio, concurrent-data-structure, multiprocessing, multithreading
Python 269
4 days ago
airscholar / e2e-data-engineering

An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All compone...

apache-airflow, apache-kafka, Apache Spark, big-data, Apache Cassandra, containerization, data-engineering, data-pipeline, data-processing, Docker, etl-pipeline, PostgreSQL, real-time-analytics
Python 255
4 months ago
DataWithBaraa / sql-data-warehouse-project

A comprehensive guide to building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics.

data analysis, data-analytics, data-cleaning, data-engineering, data science, data-warehouse, data-warehousing, datalake, datascience, datawarehouse, etl, etl-job, etl-pipeline, SQL, sql-query, sql-server
TSQL 198
2 months ago
SETL-Framework / setl

#Computer Science# A simple Spark-powered ETL framework that just works 🍺

Apache Spark, etl, framework, Scala, pipeline, data-transformation, data science, data-engineering, data analysis, modularization, dataset, big-data, etl-pipeline, machine learning
Scala 181
1 month ago
jitsucom / bulker

Service for bulk-loading data to databases with automatic schema management (Redshift, Snowflake, BigQuery, ClickHouse, Postgres, MySQL)

data-engineering, datawarehouse, etl, etl-pipeline, ingestion, pipeline
Go 178
4 days ago
data-engineering-community / data-engineering-project-template

This is a template you can use for your next data engineering portfolio project.

data-engineering, SQL, Python, data, data-warehouse, etl, etl-pipeline
176
4 years ago