GitHub Chinese Community


data-preparation

hi-primus / optimus

#Computer Science# 🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

Tags: Apache Spark, pyspark, data-wrangling, bigdata, Data Science, data-cleansing, data-transformation, Machine Learning, data-profiling, data-extraction, data-exploration, Data Analysis, data-preparation, cudf, dask, data-cleaning
Python · 1.51k stars
Updated 6 months ago
skrub-data / skrub

#Computer Science# Machine learning with dataframes

Tags: Machine Learning, Data Science, data-cleaning, data, data-preparation, data-preprocessing, Data Analysis, data-wrangling, dataframe, dataframes
Python · 1.41k stars
Updated 4 days ago
amphi-ai / amphi-etl

Visual Data Preparation and Transformation. Low-Code Python-based ETL.

Tags: data, data-pipelines, etl, structured-data, unstructured-data, Data Analysis, Data Science, data-preparation
TypeScript · 1.07k stars
Updated 20 days ago
NVIDIA / NeMo-Curator

#LLM# Scalable data preprocessing and curation toolkit for LLMs

Tags: data-curation, LLM, data, data-prep, data-preparation, data-processing, data-quality, datacuration, datarecipes, Entity resolution, fine-tuning, large-language-models, large-scale-data-processing, llmapps, Python
Jupyter Notebook · 946 stars
Updated 2 days ago
data-prep-kit / data-prep-kit

#LLM# Open-source project for data preparation for GenAI applications

Tags: data-preparation, finetuning, LLM, llmapps, data, data-prep, data-preprocessing, data-preprocessing-pipelines, datacuration, large-language-models, large-scale-data-processing, Python, ray, Apache Spark, datarecipes, Code quality, Entity resolution, Malware
HTML · 698 stars
Updated 4 days ago
developmentseed / label-maker

#Computer Science# Data Preparation for Satellite Machine Learning

Tags: satellite-imagery, data-preparation, Deep Learning, Computer Vision, remote-sensing, Keras
Python · 467 stars
Updated 2 years ago
PacktWorkshops / The-Data-Science-Workshop

#Computer Science# A New, Interactive Approach to Learning Data Science

Tags: datascience, Python, regression, random-forest, Machine Learning, data-preparation, feature-engineering, dimensionality-reduction
Jupyter Notebook · 229 stars
Updated 3 years ago
pablo14 / data-science-live-book

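#Learning & Skill Development# An open-source book to learn data science, data analysis and machine learning, suitable for all ages!

Tags: Machine Learning, data-preparation, Data Science, big-data, Data Analysis, learning, predictive-modeling, Visualization, Statistics, analytics
TeX · 222 stars
Updated 1 year ago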
hi-primus / bumblebee

#Data Warehouse# 🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex)

Tags: data-profiling, data-cleaning, GUI, data-preparation, Python, dask, optimus, gpu, cudf, Datasets
Vue · 141 stars
Updated 2 years ago
whwu95 / MVFNet

[AAAI 2021] MVFNet: Multi-View Fusion Network for Efficient Video Recognition

Tags: data-preparation, model-zoo, video-understanding, temporal-modeling
Python · 134 stars
Updated 3 years ago
asavinov / prosto

Prosto is a data processing toolkit that radically changes how data is processed by relying heavily on functions and operations with functions, offering an alternative to map-reduce and join-groupby.

Tags: workflow, data-processing, map-reduce, Apache Spark, pandas, Python, feature-engineering, Data Science, data-wrangling, data-preprocessing, data-preparation, business-intelligence, olap
Python · 91 stars
Updated 4 years ago
sbcgua / mockup_loader

ABAP unit testing framework: prepare test data in Excel, reuse it in ABAP code

Tags: data-preparation, ABAP, sap, Test automation, Unit testing, Hacktoberfest, Testing
ABAP · 68 stars
Updated 5 days ago
ruchikaverma-iitg / MoNuSAC

This repository contains my implementations of the algorithms which MoNuSAC participants could use for data preparation to train their models at ISBI 2020.

Tags: data-preparation
Jupyter Notebook · 64 stars
Updated 4 years ago
Sriram-PR / doc-scraper

#LLM# Go web crawler to scrape documentation sites and convert content to clean Markdown for LLM ingestion (RAG, training data).

Tags: data-preparation, LLM, web-scraper
Go · 48 stars
Updated 1 month ago
soumyadip007 / Data-Science-Using-Python-University-Course-Module

“Data science” is just about as broad of a term as they come. It may be easiest to describe what it is by listing its more concrete components: Data exploration & analysis. Included here: Pandas; NumP...

Tags: Data Science, Python, Jupyter Notebook, knn, data-processing, data-preprocessing, data-preparation, Data Visualization, NumPy, plotting
Jupyter Notebook · 45 stars
Updated 5 years ago
ashish-kamboj / Market-Mix-Modeling

Market Mix Modelling for an eCommerce firm to estimate the impact of various marketing levers on sales

Tags: R, linear-regression, eda, Data Visualization, predictive-modeling, data-preparation, feature-engineering, feature-selection
R · 41 stars
Updated 4 years ago
Kukuster / SumStatsRehab

GWAS summary statistics files QC tool

Tags: summary-statistics, data-preparation, data-preprocessing, data-prep, Bioinformatics, computational-biology
Python · 40 stars
Updated 6 months ago
ELToulemonde / dataPreparation

Data preparation for data science projects.

Tags: data-preparation, Data Science, R, data-preprocessing, speed
R · 31 stars
Updated 2 years ago
umich-dbgroup / foofah

Foofah: programming-by-example data transformation program synthesizer

Tags: data-transformation, data-wrangling, data-preparation, data-cleaning
CSS · 28 stars
Updated 7 years ago
neuro-ml / reskit

A library for creating and curating reproducible pipelines for scientific and industrial machine learning

Tags: Python, pipeline, data-preparation, scikit-learn, reproducible-research
Jupyter Notebook · 27 stars
Updated 8 years ago