GitHub Chinese Community

pyspark

ibis-project / ibis

the portable Python dataframe library

Python, impala, pandas, database, clickhouse, PostgreSQL, SQLite, MySQL, datafusion, SQL, pyspark, duckdb, BigQuery, sql-server, polars, snowflake, trino
Python 5.85 k
3 days ago
microsoft / SynapseML

#Computer Science# Simple and Distributed Machine Learning

Apache Spark, pyspark, Azure, Scala, Microsoft, machine learning, databricks, cognitive-services, lightgbm, HTTP, model-deployment, deep learning, artificial intelligence, data science, synapse, big-data, onnx, OpenCV
Scala 5.14 k
3 days ago
JohnSnowLabs / spark-nlp

#Natural Language Processing# State of the Art Natural Language Processing

natural language processing, Apache Spark, pyspark, named-entity-recognition, sentiment-analysis, lemmatizer, spell-checker, entity-extraction, part-of-speech-tagger, bert, transformers, Tensorflow, language-detection, machine-translation, text-classification, large language models, question-answering, llamacpp, onnx
Scala 3.99 k
3 days ago
apache / linkis

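Linkis builds a computation middleware layer between upper-layer applications and underlying engines. Through the standard interfaces Linkis provides, such as REST, WebSocket, and JDBC, upper-layer applications can easily connect to underlying engines such as Spark, Presto, and Flink, while also gaining cross-engine context sharing and unified governance and orchestration of computation tasks and engines.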

SQL, Apache Spark, hive, pyspark, livy, linkis, engine, storage, resource-manager, application-manager, scriptis, REST API, thrift-server, jdbc, presto, impala
Java 3.37 k
18 days ago
AlexIoannides / pyspark-example-project

Implementing best practices for PySpark ETL jobs and applications.

pyspark, etl-job, Python, data-engineering, Apache Spark, data science, etl, etl-pipeline
Python 1.93 k
2 years ago
uber / petastorm

#Computer Science# Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, a...

Tensorflow, PyTorch, deep learning, machine learning, pyspark, parquet
Python 1.84 k
2 years ago
awesome-spark / awesome-spark

A curated list of awesome Apache Spark packages and resources.

Apache Spark, pyspark, Awesome Lists
Shell 1.8 k
8 months ago
jadianes / spark-py-notebooks

#Computer Science# Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Apache Spark, Python, pyspark, data analysis, Jupyter Notebook, notebook, IPython, data science, machine learning, big-data, bigdata
Jupyter Notebook 1.65 k
1 year ago
hi-primus / optimus

#Computer Science# 🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

Apache Spark, pyspark, data-wrangling, bigdata, data science, data-cleansing, data-transformation, machine learning, data-profiling, data-extraction, data-exploration, data analysis, data-preparation, cudf, dask, data-cleaning
Python 1.51 k
6 months ago
ptyadana / SQL-Data-Analysis-and-Visualization-Projects

SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and pySpark.

SQL, MySQL, exercises, data analysis, PostgreSQL, SQLite, tableau, challenges, sql-queries, Python, pyspark, Apache Spark
Jupyter Notebook 1.46 k
3 years ago
jupyter-incubator / sparkmagic

Jupyter magics and kernels for working with remote Spark clusters

Apache Spark, Kernel, cluster, livy, magic, sql-query, pandas-dataframe, Jupyter Notebook, pyspark, kerberos, notebook
Python 1.36 k
19 days ago
logicalclocks / hopsworks

#Computer Science# Hopsworks - Data-Intensive AI platform with a Feature Store

feature-store, Amazon Web Services, Azure, data science, feature-engineering, feature-management, Google Cloud, governance, kserve, machine learning, mlops, model-serving, pyspark, Python, Serverless
Java 1.23 k
4 months ago
mahmoudparsian / pyspark-tutorial

PySpark-Tutorial provides basic algorithms using PySpark

big-data, big-data-analytics, pyspark, Apache Spark, dataframes
Jupyter Notebook 1.23 k
21 days ago
narwhals-dev / narwhals

Lightweight and extensible compatibility layer between dataframe libraries!

cudf, pandas, polars, dask, duckdb, pyspark
Python 1.11 k
3 days ago
mahmoudparsian / data-algorithms-book

#Computer Science# MapReduce, Spark, Java, and Scala for Data Algorithms Book

hadoop-mapreduce, Java, distributed-computing, Scala, mapreduce, Python, machine learning, pyspark, Apache Spark, design-patterns
Java 1.08 k
8 months ago
h2oai / sparkling-water

#Computer Science# Sparkling Water provides H2O functionality inside a Spark cluster

h2o, Apache Spark, machine learning, integration, big-data, pyspark, Scala
Scala 974
7 months ago
WeBankFinTech / Scriptis

#Editor# Scriptis is for interactive data analysis with script development (SQL, Pyspark, HiveQL), task submission (Spark, Hive), UDF, function, resource management and intelligent diagnosis.

hue, zeppelin, Apache Spark, hive, SQL, pyspark, Scala, ide, hql, linkis
Vue 811
6 months ago
lyhue1991 / eat_pyspark_in_10_days

pyspark 🍒🥭 is delicious, just eat it! 😋😋

Apache Spark, pyspark
Python 805
3 years ago
HariSekhon / DevOps-Python-tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML...

cloudformation, Python, hbase, JSON, avro, parquet, Apache Spark, pyspark, travis-ci, elasticsearch, solr, hadoop, hdfs, dockerhub, Docker, Linux, Amazon Web Services, DevOps, Google Cloud, gcf
Python 798
2 months ago
kuwala-io / kuwala

#Web Crawler# Kuwala is the no-code data platform for BI analysts and engineers enabling you to build powerful analytics workflows. We are set out to bring state-of-the-art data engineering tools you love, such as ...

data, data-integration, data science, Open Data, spatial-analysis, elt, Open Source, scraping, dbt, PostgreSQL, pyspark, Python, Jupyter Notebook, population, React, no-code, react-flow
JavaScript 795
3 years ago