GitHub Chinese Community

data-engineering-pipeline

san089 / Udacity-Data-Engineering-Projects

A few projects related to data engineering, including data modeling, infrastructure setup on the cloud, data warehousing, and data lake development.

data, data-engineering, data-engineering-pipeline, etl-pipeline, cassandra-database, postgresql-database, data-modeling, data-warehouse, data-lake, airflow, cluster, Apache Cassandra, infrastructure, PostgreSQL, Amazon Web Services, aws-ec2, aws-sdk, aws-s3, cloudformation
Python 1.65 k
3 years ago
san089 / goodreads_etl_pipeline

An end-to-end GoodReads data pipeline for building a data lake, data warehouse, and analytics platform.

etl-pipeline, etl-framework, Apache Spark, apache-airflow, airflow, redshift, emr-cluster, livy, s3, data-lake, scheduler, data-migration, data-engineering, data-engineering-pipeline, Python, etl-job
Python 1.39 k
5 years ago
vmware / versatile-data-kit

One framework to develop, deploy and operate data workflows with Python and SQL.

data science, data-engineering, SQL, trino, data-lineage, etl, elt, data-pipelines, data-engineer, data-warehouse, analytics, snowflake, DataOps, data-engineering-pipeline, Python, datapipeline, data structures, database
Python 450
20 days ago
alanchn31 / Movalytics-Data-Warehouse

Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow

Docker, airflow, Apache Spark, SQL, Python, redshift, data-engineering-pipeline, pyspark, data-engineering, movie-database, aws-s3, analytics, data-modelling, udacity
Python 146
5 years ago
anna-geller / dataflow-ops

Project demonstrating how to automate Prefect 2.0 deployments to AWS ECS Fargate

automation, Amazon Web Services, dataflow, orchestration, prefect, Serverless, analytics, analytics-engineering, CI/CD, data, data-engineering, data science, Infrastructure as code, observability, pipeline, Python, data-engineering-pipeline
Python 113
2 years ago
AhmetFurkanDEMIR / Data-Engineering-Project-with-HDFS-and-Kafka

Data Engineering Project with Hadoop HDFS and Kafka

data, data-engineer, data-engineering, data-engineering-pipeline, Docker, Docker Compose, hadoop, hadoop-filesystem, hadoop-hdfs, hdfs, kafka, kafka-consumer, kafka-producer, kafka-ui, Python
Python 112
2 years ago
anna-geller / prefect-deployment-patterns

Code examples showing flow deployment to various types of infrastructure

automation, Amazon Web Services, data, data-engineering, data-engineering-pipeline, data science, dataflow, orchestration, pipeline, prefect, Python, Serverless, serverless-framework
Python 107
2 years ago
immu0001 / Udacity-Data-Engineer-nanodegree

Classwork projects and homework done through the Udacity Data Engineering Nanodegree

Apache Spark, data analysis, big-data, etl, data-pipelines, data science, data-lake-analytics, s3-bucket, emr-cluster, redshift, data-engineering-pipeline
Jupyter Notebook 74
2 years ago
anki-code / xontrib-pipeliner

Let your pipe lines flow thru the Python code in xonsh.

Xonsh, Xontrib, pipe, pipeline, pipelines, Python, Shell, data-engineering, data-engineering-pipeline
Python 59
1 year ago
anna-geller / prefect-aws-lambda

Deploy a Prefect flow to a serverless AWS Lambda function

Amazon Web Services, aws-lambda, data-engineering, data-engineering-pipeline, dataflow, event-driven, event-driven-architecture, lambda, Python, Serverless, serverless-framework, automation, CI/CD, data science, pipeline
Python 35
3 years ago
mikeroyal / Apache-Spark-Guide

#Awesome# Apache Spark Guide

Apache Spark, spark-streaming, data-engineering, pyspark, machine learning, big-data, data science, data-engineering-pipeline, Awesome Lists
Python 31
3 years ago
kishlayjeet / Stock-Market-Real-Time-Data-Pipeline-with-Apache-Kafka-and-Cassandra

An end-to-end real-time stock market data pipeline with Python, AWS EC2, Apache Kafka, and Cassandra. Data is processed on AWS EC2 with Apache Kafka and stored in a local Cassandra database.

apache-kafka, Apache Cassandra, kafka, pipeline, Python, stock-market, Amazon Web Services, aws-ec2, data-engineering, data-engineering-pipeline, data-pipeline, ec2, etl, etl-pipeline, kafka-streams, real-time
Python 25
2 years ago
InosRahul / f1-data-pipeline

F1 Data Pipeline

BigQuery, data-engineering-pipeline, dbt, gcs, prefect, Python, Terraform
Python 23
2 years ago
VeraZab / nyc-stats

Analysis of 311 Service Requests for the City of NYC (from 2010 to 2023). Tech: Prefect Cloud, dbt Core, BigQuery, Compute Engine, Cloud Run, Artifact Registry, Terraform, Docker

Google Cloud, prefect, Terraform, dbt, data-engineering-pipeline, CI/CD, Docker
Python 20
2 years ago
longNguyen010203 / Youtube-Recommend-Master-ETL-Pipeline

A data engineering project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, dbt, Polars, and Docker. Data comes from Kaggle and the YouTube API.

dagster, etl-pipeline, minio, Apache Spark, dbt, Docker, Docker Compose, Dockerfile, MySQL, PostgreSQL, data-engineering, data-engineering-pipeline, pyspark, processing, Streamlit, polars, YouTube, youtube-api, metabase
Jupyter Notebook 20
7 months ago
sanjeevai / disaster-response-pipeline

ETL pipeline combined with supervised learning and grid search to classify text messages sent during a disaster event

etl-pipeline, supervised-learning, data-engineering-pipeline
Python 17
6 years ago
DarkStarStrix / DataVolt

Reusable data engineering toolkit. My personal data infrastructure.

data-engineering, data-engineering-pipeline, data-loading, performance, preprocessing, infrastructure
Jupyter Notebook 17
7 days ago
Alero-Awani / Batch-data-engineering-project

A batch data pipeline that retrieves data from a user purchase table and a movie review table and transforms it into a user behaviour metric table.

airflow, aws-s3, data-engineering-pipeline, Docker, pipeline, pyspark, SQL, Terraform
HCL 16
3 years ago
NitinDatta8 / realtime-data-streaming

End-to-end data engineering pipeline with various technologies to ingest real time data.

apache-airflow, apache-kafka, Apache Spark, big-data, Apache Cassandra, data-engineering, data-engineering-pipeline, data-processing, Docker, etl-pipeline, PostgreSQL
Python 14
2 years ago
brunocampos01 / predicting-retail-churn-with-azure-ml-studio

#Computer Science# Challenge to job: Data Scientist

Azure, machine-learning-studio, challenge, Python, pandas, machine learning, API, data-engineering, data-engineering-pipeline, powerbi, cheat-sheets, data science
Python 14
3 years ago