dask · GitHub Topics

dask / dask

Parallel computing with task scheduling

dask Python pydata NumPy pandas scikit-learn SciPy

Python 13.37 k

1 day ago

rapidsai / cudf

cuDF - GPU DataFrame Library

gpu rapids cudf arrow CUDA pandas dataframe dask data-analysis data-science pydata C++ Python

C++ 9.09 k

1 day ago

stumpy-dev / stumpy

STUMPY is a powerful and scalable Python library for modern time series analysis

data-science time-series-analysis dask numba Python anomaly-detection pattern-matching pydata matrix-profile motif-discovery time-series-segmentation

Python 3.96 k

18 days ago

pydata / xarray

N-D labeled arrays and datasets in Python

Python netcdf NumPy pandas xarray dask

Python 3.92 k

1 day ago

mars-project / mars

#Computer Science# Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.

Python NumPy tensor pandas machine-learning scikit-learn Tensorflow PyTorch xgboost lightgbm ray dataframe dask

Python 2.73 k

2 years ago

jmcarpenter2 / swifter

A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner

pandas pandas-dataframe parallel-computing parallelization dask modin

Python 2.62 k

1 year ago

fugue-project / fugue

#Computer Science# A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.

Apache Spark dask machine-learning distributed-systems distributed-computing distributed SQL pandas

Python 2.1 k

4 months ago

dask / distributed

A distributed task scheduler for Dask

pydata dask distributed-computing Python Hacktoberfest

Python 1.64 k

12 hours ago

hi-primus / optimus

#Computer Science# 🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

Apache Spark pyspark data-wrangling bigdata data-science data-cleansing data-transformation machine-learning data-profiling data-extraction data-exploration data-analysis data-preparation cudf dask data-cleaning

Python 1.52 k

8 months ago

narwhals-dev / narwhals

Lightweight and extensible compatibility layer between dataframe libraries!

cudf pandas polars dask duckdb pyspark

Python 1.21 k

8 hours ago

itamarst / eliot

Eliot: the logging system that tells you *why* it happened

Python Logging logging-library tracing causality twisted elasticsearch asyncio scientific-computing dask NumPy

Python 1.15 k

5 months ago

pytroll / satpy

Python package for earth-observing satellite data processing

Python satellite weather Hacktoberfest dask xarray closember

Python 1.13 k

2 days ago

Nixtla / mlforecast

#Computer Science# Scalable machine 🤖 learning for time series forecasting.

forecast forecasting machine-learning lightgbm xgboost dask Python time-series

Python 1.05 k

3 days ago

capitalone / datacompy

Pandas, Polars, Spark, and Snowpark DataFrame comparison for humans and more!

Python pandas Apache Spark data data-science compare dataframes NumPy pyspark dask polars snowflake

Python 586

6 days ago

ranaroussi / pystore

Fast data store for Pandas time-series data

datastore dask parquet pandas timeseries database dataframe

Python 585

9 days ago

polyaxon / traceml

Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.

pandas dataframes data-science Apache Spark dask plotly statistics matplotlib data-profiling data-visualization data-exploration DataOps mlops data-quality data-quality-checks explainable-ai PyTorch Tensorflow tracking

Python 518

2 months ago

dask-contrib / dask-sql

Distributed SQL Engine in Python using Dask

sql-server SQL dask Python distributed machine-learning

Python 407

1 year ago

pytroll / pyresample

Geospatial image resampling in Python

Python NumPy resampling kd-tree Hacktoberfest dask xarray closember

Python 370

20 days ago

Ouranosinc / xclim

Library of derived climate variables, i.e. climate indicators, based on xarray.

Python xarray dask

Python 364

2 days ago

DataCanvasIO / HyperGBM

A full pipeline AutoML tool for tabular data

automl gbm xgboost lightgbm catboost semi-supervised-learning datacleaning preprocessing ensemble-learning tabular-data distributed-training dask gpu-acceleration rapidsai scikit-learn

Python 354

3 months ago