GitHub Chinese Community

#emr-cluster

san089 / goodreads_etl_pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

etl-pipeline, etl-framework, Apache Spark, apache-airflow, airflow, redshift, emr-cluster, livy, s3, data-lake, scheduler, data-migration, data-engineering, data-engineering-pipeline, Python, etl-job
Python 1.39k
5 years ago
RubensZimbres / Repo-2019

BERT, AWS RDS, AWS Forecast, EMR Spark Cluster, Hive, Serverless, Google Assistant + Raspberry Pi, Infrared, Google Cloud Platform Natural Language, Anomaly detection, Tensorflow, Mathematics

aws-rds, anomaly-detection, keras-tensorflow, sql-server, Raspberry Pi, Tensorflow, mathematica, pyspark, emr-cluster, bert-model, bert
Jupyter Notebook 139
4 years ago
aws-samples / aws-dbs-refarch-datalake

Reference Architectures for Datalakes on AWS

data-lake, data-analytics, emr-cluster, glue, data-catalog, data-transformation
HTML 79
5 years ago
immu0001 / Udacity-Data-Engineer-nanodegree

Classwork projects and homework completed for the Udacity Data Engineering Nanodegree

Apache Spark, data analysis, big-data, etl, data-pipelines, data science, data-lake-analytics, s3-bucket, emr-cluster, redshift, data-engineering-pipeline
Jupyter Notebook 74
2 years ago
cloudposse / terraform-aws-emr-cluster

Terraform module to provision an Elastic MapReduce (EMR) cluster on AWS

hcl2, emr, emr-cluster, Terraform, terraform-aws, hadoop, hive, presto, Apache Spark
HCL 74
4 days ago
dacort / demo-code

Bits of code I use during live demos

emr-cluster, aws-cloudformation
Jupyter Notebook 31
6 months ago
Wittline / pyspark-on-aws-emr

The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can use quickly. Just focus on writing pyspark code.

Amazon Web Services, emr-cluster, Python, Apache Spark, pyspark, big-data-analytics, big-data, dataengineering, ec2-spot, ec2-spot-instances
Python 27
3 years ago
camposvinicius / aws-etl

This is an ETL application on AWS with general open sales and customer data that you can find here: https://github.com/camposvinicius/data/blob/main/AdventureWorks.zip, it's a zipped file with some .c...

Apache Spark, Kubernetes, airflow, Amazon Web Services, argocd, athena, emr, pyspark, gluecatalog, database, PostgreSQL, rds, etl, pipeline, data, data-engineer, emr-cluster
Smarty 17
3 years ago
dhiraa / spark-tpcds

Apache Spark TPC-DS benchmark setup with EMR launch setup

apache, spark-sql, benchmarking, emr-cluster, Amazon Web Services
Smarty 17
3 years ago
minhky2185 / healthcare_data_pipeline

An end-to-end data pipeline for building Data Lake and supporting report using Apache Spark.

analytics, big-data, data, data-engineering, data-engineering-pipeline, data-lake, emr-cluster, MySQL, PostgreSQL, powerbi, s3, Apache Spark, visualization
Python 12
2 years ago
maelfabien / Cassandra-GDELT-Queries

A Cassandra Architecture for GDELT Database 🌍

big-data, Scala, Apache Spark, Amazon Web Services, zeppelin, architecture, Apache Cassandra, emr-cluster
Shell 11
6 years ago
Signiant / dynamodb-emr-exporter

Uses EMR clusters to export dynamoDB tables to S3 and generates import steps

emr-cluster, Docker, Amazon Web Services, dynamodb
Shell 11
3 years ago
xianwill / spark-boilerplate

A boilerplate for spark projects with docker support for local development and scripts for emr support.

Apache Spark, template, Docker, emr, emr-cluster
Scala 9
8 years ago
anthonywong611 / Batch-ETL-with-AWS-EMR-and-MWAA

Create a data pipeline on AWS to execute batch processing in a Spark cluster provisioned by Amazon EMR. ETL using managed airflow: extracts data from S3, transform data using spark, load transformed d...

aws-cloudformation, emr-cluster, s3-bucket, airflow, batch-processing
Python 9
4 years ago
airscholar / EMR-for-data-engineers

This project demonstrates the use of Amazon Elastic MapReduce (EMR) for processing large datasets using Apache Spark. It includes a Spark script for ETL (Extract, Transform, Load) operations, AWS com...

Apache Spark, Amazon Web Services, aws-s3, emr-cluster
Python 7
2 years ago
sjmiller8182 / Warehousing-Stock-Tweet-Data

A large-scale data framework that will enable us to store and analyze financial market data and drive future predictions for investment.

big-data, hive, Amazon Web Services, stock-prices, tweets, hadoop, emr-cluster, X (Twitter), data-warehouse, Python
TSQL 7
5 years ago
bdoepf / aws-emr-prometheus

Amazon Web Services, prometheus, emr, emr-cluster, Apache Spark, apache-flink
HCL 4
4 years ago
Data-Bishop / Team5-BuildItAll-Data-Platform

This repository contains the codebase for the BuildItAll Big Data Processing Platform, a case study project designed to manage large daily data for a hypothetical Belgian client.

airflow, Amazon Web Services, big-data, emr-cluster, pyspark, Terraform
HCL 4
1 month ago
tawounfouet / data-scientist-ocr-x-centralsupelec

#Computer Science# Experience with time-series analysis and forecasting models, large data sets, model development and visualisation, statistics.

API, bigdata, CI/CD, cloud, cloud-computing, clustering, deep learning, emr-cluster, machine learning, pipeline, s3-bucket, Apache Spark, supervised-learning, unsupervised-learning, Amazon Web Services
Jupyter Notebook 3
1 year ago
HarshadRanganathan / aws-emr-launcher

Generic Python library for provisioning EMR clusters from YAML config files (Configuration as Code)

Amazon Web Services, emr-cluster
Python 3
3 years ago