Apache Spark

Apache Spark is an open-source, distributed, general-purpose cluster-computing framework.

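Whereas Hadoop MapReduce writes intermediate data to disk after each job completes, Spark uses in-memory computing and can analyze data in memory before it is ever written to disk. When running programs in memory, Spark can be up to 100 times faster than Hadoop MapReduce.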

Created by Matei Zaharia

Released May 26, 2014

Repository
apache/spark
Website
spark.apache.org
Wikipedia
Wikipedia article

Related topics

Scala

apache / spark

Apache Spark - a unified analytics engine for big data processing

Python, Scala, R, Java, big-data, jdbc, SQL, Apache Spark
Scala 41.31 k
21 hours ago

DataTalksClub / data-engineering-zoomcamp

A free data engineering video course spanning 9 weeks

data-engineering, kafka, Apache Spark, dbt, Docker, kestra
Jupyter Notebook 31.01 k
2 months ago

donnemartin / data-science-ipython-notebooks

#Computer Science# Python data science study notebooks: deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, core Python, AWS, and Linux commands

Python, machine learning, deep learning, data science, big-data, Amazon Web Services, Tensorflow, theano, caffe, scikit-learn, kaggle, Apache Spark, mapreduce, hadoop, matplotlib, pandas, NumPy, SciPy, Keras
Python 28.28 k
1 year ago

getredash / redash

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

redash, Python, visualization, analytics, bi, redshift, BigQuery, athena, MySQL, PostgreSQL, dashboard, JavaScript, business-intelligence, databricks, Apache Spark, spark-sql, Hacktoberfest
Python 27.41 k
11 days ago

yeasy / docker_practice

Docker: From Getting Started to Practice

Docker, book, cloud-computing, container, Kubernetes, swarm, mesos, Apache Spark, DevOps, Linux
Go 25.44 k
6 months ago

mlflow / mlflow

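#Computer Science# MLflow is an open-source framework designed to manage the entire machine learning lifecycle. It can train and serve models on different platforms, letting you use the same set of tools regardless of whether your experiments run locally on your computer, on a remote compute target, on a virtual machine...

machine learning, artificial intelligence, mlflow, Apache Spark, model-management
Python 20.83 k
3 days ago
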
heibaiying / BigData-Notes

A beginner's guide to big data ⭐

hadoop, hdfs, Yarn, mapreduce, hive, Apache Spark, storm, hbase, Scala, kafka, zookeeper, flume, azkaban, sqoop, phoenix, bigdata, big-data
Java 16.49 k
1 year ago

FavioVazquez / ds-cheatsheets

#Cheatsheets# Cheatsheets for data science

datascience, Python, R, Apache Spark, programming, Jupyter Notebook, cheatsheet
15.44 k
1 year ago

GaiZhenbiao / ChuanhuChatGPT

Chuanhu ChatGPT: a lightweight, easy-to-use web GUI for multiple LLMs, including ChatGPT, ChatGLM, and LLaMA

chatbot, ChatGPT API, chatglm, claude, ernie, gemini, gemma, llama, midjourney, minimax, moss, ollama, qwen, Apache Spark, stablelm
Python 15.41 k
3 months ago

zhisheng17 / flink-learning

Flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink basics, concepts, internals, hands-on practice, performance tuning, and source code analysis. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, and Table API & SQL, as well as large-scale production Flink project cases (PV/UV, log storage, real-time deduplication of tens of billions of records, ...

flink, kafka, elasticsearch, Apache Spark, Redis, MySQL, rocketmq, hbase, rabbitmq, stream-processing, streaming, clickhouse, loki, influxdb, opentsdb
Java 14.82 k
3 months ago

horovod / horovod

#Computer Science# Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

Tensorflow, uber, machine learning, mpi, baidu, deep learning, Keras, PyTorch, mxnet, Apache Spark, ray
Python 14.51 k
2 months ago

aalansehaiyang / technology-talk

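[Big Tech Interview Column] A technical guide for Java programmers, with interview questions, system architecture, career tips, mainstream middleware, and more, to help you become a better version of yourself!

Java, Spring, Spring Boot, dubbo, kafka, Git, hbase, mycat, Apache Spark, ECMAScript
14.5 k
2 years ago
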
deeplearning4j / deeplearning4j

Deeplearning4j is an open-source deep learning library written for Java and the JVM, and a computing framework with broad support for a variety of deep learning algorithms.

Java, gpu, deep learning, neural-nets, deeplearning4j, dl4j, hadoop, Apache Spark, IntelliJ IDEA, artificial intelligence, Python, Scala, Clojure, linear-algebra, matrix-library
Java 14.01 k
1 day ago

apache / doris

Doris is an MPP database, open-sourced by Baidu, that supports fast analysis of massive volumes of data.

olap, database, hadoop, hive, hudi, iceberg, real-time, SQL, BigQuery, dbt, delta-lake, elt, etl, lakehouse, query-engine, redshift, snowflake, Apache Spark
Java 13.81 k
1 day ago

wangzhiwubigdata / God-Of-BigData

Focused on big data learning and interview preparation; start your journey to big data mastery. Flink/Spark/Hadoop/Hbase/Hive...

flink, Apache Spark, hadoop, hdfs, hive, hbase, kafka, zookeeper, bigdata, flume, azkaban
10.14 k
2 years ago

mage-ai / mage-ai

#Computer Science# 🧙 Build, run, and manage data pipelines for integrating and transforming data.

machine learning, artificial intelligence, data, data-engineering, data science, Python, elt, etl, pipelines, data-pipelines, orchestration, data-integration, SQL, Apache Spark, dbt, pipeline, reverse-etl, transformation
Python 8.37 k
2 days ago

delta-io / delta

Delta Lake is an open-source storage framework for building a Lakehouse architecture with compute engines such as Spark, PrestoDB, Flink, Trino, and Hive, and with APIs for Scala, Java, Rust, Ruby, and Python.

Apache Spark, acid, big-data, analytics, delta-lake
Scala 8.08 k
19 hours ago

tobymao / sqlglot

Python SQL Parser and Transpiler

transpiler, SQL, Python, Parser, optimizer, BigQuery, duckdb, hive, MySQL, PostgreSQL, presto, snowflake, Apache Spark, SQLite, trino, tsql, clickhouse, redshift, databricks
Python 7.86 k
2 days ago

h2oai / h2o-3

#Computer Science# H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Me...

h2o, machine learning, data science, deep learning, big-data, ensemble-learning, gbm, random-forest, naive-bayes, pca, Open Source, distributed, Java, Python, R, hadoop, Apache Spark, gpu, automl
Jupyter Notebook 7.2 k
17 hours ago

Alluxio / alluxio

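As a data orchestration layer, Alluxio brings speed and agility to big data and AI workloads while reducing costs, enabling users to migrate to newer storage solutions such as object storage.

alluxio, memory-speed, hadoop, Apache Spark, presto, Tensorflow, data analysis, data-orchestration, virtual-distributed-filesystem
Java 7 k
2 months ago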