mapreduce · GitHub Topics

donnemartin / data-science-ipython-notebooks

#Computer Science# Python data science study notes: deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and Linux commands

Python machine-learning deep-learning data-science big-data Amazon Web Services Tensorflow theano caffe scikit-learn kaggle Apache Spark mapreduce hadoop matplotlib pandas NumPy SciPy Keras

Python 28.53 k

1 year ago

heibaiying / BigData-Notes

A beginner's guide to big data ⭐

hadoop hdfs Yarn mapreduce hive Apache Spark storm hbase Scala kafka zookeeper flume azkaban sqoop phoenix bigdata big-data

Java 16.65 k

2 years ago

PowerJob / PowerJob

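A next-generation distributed task scheduling and computing framework. It supports CRON, API, fixed-rate, and fixed-delay scheduling strategies, and provides workflows to orchestrate tasks and resolve their dependencies.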

scheduler workflow distributed mapreduce Java cron job job-scheduler

Java 7.59 k

2 days ago

douban / dpark

A Python clone of Spark, a MapReduce-like framework in Python

bigdata mapreduce dpark stream-processing Apache Spark Python

Python 2.68 k

5 years ago

collabH / bigdata-growth

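A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.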

flink kafka hive mapreduce Apache Spark olap hadoop hbase debezium hdfs bigdata hudi

Shell 1.67 k

25 days ago

water8394 / BigData-Interview

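#Interview# 🎯 🌟 [Big data interview questions] A collection of big data interview questions gathered from the web, together with the author's own answer summaries. Currently covers the Hadoop, Hive, Spark, Flink, HBase, Kafka, and ZooKeeper frameworks.

bigdata Apache Spark kafka hbase flink hadoop hdfs mapreduce Yarn interview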

1.64 k

4 years ago

mahmoudparsian / data-algorithms-book

#Computer Science# MapReduce, Spark, Java, and Scala for Data Algorithms Book

hadoop-mapreduce Java distributed-computing Scala mapreduce Python machine-learning pyspark Apache Spark design-patterns

Java 1.08 k

1 year ago

microsoft / Mobius

C# and F# language binding and extensions to Apache Spark

Apache Spark dataframe dataset streaming C# spark-streaming F# bigdata mapreduce

C# 939

2 years ago

happyer / distributed-computing

distributed_computing includes MapReduce, a key-value store, etc.

raft mapreduce consistency

Go 843

5 years ago

cdapio / cdap

An open source framework for building data analytic applications.

unified integration platform dataset mapreduce Apache Spark spark-streaming Java cdap Python middleware

Java 784

3 days ago

bcongdon / corral

🐎 A serverless MapReduce framework written for AWS Lambda

aws-lambda mapreduce Serverless

Go 694

4 years ago

sunnyandgood / BigData

💎🔥 Big data study notes

hadoop hive hbase hdfs zookeeper sqoop mapreduce flume MySQL Linux Shell

Java 681

6 years ago

grailbio / bigslice

A serverless cluster computing system for the Go programming language

cluster computing Go mapreduce bigdata machine-learning etl

Go 555

2 years ago

apache / uniffle

Uniffle is a high-performance, general-purpose Remote Shuffle Service.

mapreduce shuffle Apache Spark remote-shuffle-service RSS tez

Java 427

3 days ago

CamDavidsonPilon / tdigest

t-Digest data structure in Python. Useful for percentiles and quantiles, including in distributed environments like PySpark

Python estimate pyspark distributed-computing mapreduce

Python 400

2 years ago

cubefs / compass

Compass is a task diagnosis platform for bigdata

bigdata Apache Spark hadoop flink mapreduce scheduler SQL airflow dolphinscheduler

Java 398

10 months ago

RedisGears / RedisGears

Dynamic execution framework for your Redis data

Redis mapreduce stream-processing analytics

Rust 379

17 days ago

cwensel / cascading

Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing flows locally or on a cluster.

hadoop Java mapreduce tez

Java 352

5 months ago

datawhalechina / juicy-bigdata

🎉🎉🐳 Datawhale Introduction to Big Data Processing tutorial | The opening course in the big data technology track 🎉🎉

bigdata hadoop hive hbase hdfs Apache Spark mapreduce

Python 332

16 days ago

DigitalPebble / behemoth

#Natural Language Processing# Behemoth is an open source platform for large-scale document analysis based on Apache Hadoop.

hadoop Java natural-language-processing mapreduce

Java 283

7 years ago