GitHub Chinese Community

© 2025 GitHub Chinese Community

#data-cleaning

cleanlab / cleanlab

#Data warehouse# The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

weak-supervision, data-cleaning, data-quality, data-science, noisy-labels, data-centric-ai, out-of-distribution-detection, outlier-detection, active-learning, data-labeling, data-profiling, data-validation, labeling, data-curation, annotation, DataOps, dataquality, large-language-models, datasets, exploratory-data-analysis
Python · 10.61k stars · updated 12 days ago

voxel51 / fiftyone

#Computer science# Refine high-quality datasets and visual AI models

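machine-learning, artificial-intelligence, deep-learning, computer-vision, developer-tools, data-science, Python, active-learning, data-centric-ai, data-cleaning, data-curation, data-quality, image-classification, object-detection, unstructured-data, vector-search, visualization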
Python · 9.59k stars · updated 13 hours ago

johnkerl / miller

Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

data-processing, data-cleaning, CSV, csv-format, streaming-data, streaming-algorithms, tsv, JSON, json-data, data-reduction, statistics, statistical-analysis, DevOps, devops-tools, tabular-data, command-line-interface, command-line-tools
Go · 9.32k stars · updated 2 days ago

unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library

pandas, validation, schema, dataframes, Testing, pandas-dataframe, data-validation, data-cleaning, assertions, hypothesis-testing, data-processing
Python · 3.85k stars · updated 4 days ago

justmarkham / pandas-videos

Jupyter notebook and datasets from the pandas video series

data-science, Jupyter Notebook, Python, pandas, tutorial, data-analysis, data-cleaning
Jupyter Notebook · 2.21k stars · updated 1 year ago

justmarkham / DAT8

#Natural language processing# General Assembly's 2015 Data Science course in Washington, DC

data-science, machine-learning, scikit-learn, data-analysis, pandas, Jupyter Notebook, Python, course, linear-regression, logistic-regression, naive-bayes, natural-language-processing, decision-trees, ensemble-learning, clustering, Regular expression, web-scraping, data-visualization, data-cleaning
Jupyter Notebook · 1.61k stars · updated 1 year ago

hi-primus / optimus

#Computer science# 🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

Apache Spark, pyspark, data-wrangling, bigdata, data-science, data-cleansing, data-transformation, machine-learning, data-profiling, data-extraction, data-exploration, data-analysis, data-preparation, cudf, dask, data-cleaning
Python · 1.51k stars · updated 6 months ago

sfirke / janitor

simple tools for data cleaning in R

data-cleaning, data-science, data-analysis, R, pivot-tables, excel, tidyverse
R · 1.41k stars · updated 6 months ago

skrub-data / skrub

#Computer science# Machine learning with dataframes

machine-learning, data-science, data-cleaning, data, data-preparation, data-preprocessing, data-analysis, data-wrangling, dataframe, dataframes
Python · 1.41k stars · updated 4 days ago

data-forge / data-forge-ts

The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.

data-wrangling, data-forge, data, data-analysis, JavaScript, Node.js, linq, pandas, visualization, data-visualization, data-management, data-manipulation, data-cleaning, data-cleansing, CSV, JSON
TypeScript · 1.36k stars · updated 2 months ago

ECNU-ICALK / EduChat

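#Large language models# An open-source educational chat model from ICALK, East China Normal University. An open-source Chinese-English educational dialogue LLM (general-purpose base model, GPU deployment, data cleaning). Credits: LLaMA, MOSS, BELLE, Ziya, vLLM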

belle, chinese-nlp, data-cleaning, teaching, llama, large-language-models, moss, open-models
Jupyter Notebook · 799 stars · updated 1 month ago

akanz1 / klib

Easy to use Python library of customized functions for cleaning and analyzing data.

data-science, data-analysis, data-visualization, Python, feature-selection, data-cleaning, data-preprocessing
Python · 514 stars · updated 1 month ago

schema-inspector / schema-inspector

Schema-Inspector is a simple JavaScript object sanitization and validation module.

JavaScript, validation, Sanitization, data-cleaning
JavaScript · 504 stars · updated 6 months ago

encord-team / encord-active

#Computer science# The toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling.

computer-vision, data, data-science, data-validation, deep-learning, machine-learning, mlops, Python, active-learning, annotations, data-centric, data-cleaning, data-quality, label-errors, noisy-labels, object-detection
Python · 449 stars · updated 24 days ago

data-cleaning / validate

Professional data validation for the R environment

R, validation, data-cleaning
R · 421 stars · updated 2 months ago

Desbordante / desbordante-core

Desbordante is a high-performance data profiler that is capable of discovering many different patterns in data using various algorithms. It also allows users to run data cleaning scenarios using these algor...

data-analytics, data-cleaning, data-cleansing, data-engineering, data-exploration, data-mining, data-profiling, data-science, data-wrangling, data-preprocessing, feature-selection, feature-engineering, feature-extraction, Spreadsheet, tabular-data, anomaly-detection, exploratory-data-analysis, knowledge-discovery
C++ · 405 stars · updated 11 days ago

jim-schwoebel / voicebook

#Computer science# 🗣️ A book and repo to get you started programming voice computing applications in Python (10 chapters and 200+ scripts).

voice, voice-assistant, voice-recognition, transcription, data, data-cleaning, visualization, generation, voice-activity-detection, voice-control, Server, security, encryption-decryption, Python, machine-learning, wake-word-detection
Python · 382 stars · updated 3 years ago

msamogh / nonechucks

#Computer science# Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!

PyTorch, data-processing, data-preprocessing, data-pipeline, data-cleaning, preprocessing, machine-learning, torch
Python · 377 stars · updated 3 years ago

rasgointelligence / feature-engineering-tutorials

#Computer science# Data Science Feature Engineering and Selection Tutorials

notebook, tutorial, Python, pandas, data-science, machine-learning, scikit-learn, feature-engineering, feature-selection, features, xgboost, pandas-profiling, Jupyter Notebook, exploratory-data-analysis, data-cleaning
Jupyter Notebook · 286 stars · updated 4 months ago

ajaymache / data-analysis-using-python

Exploratory data analysis 📊 using Python 🐍 of a used-car 🚘 database taken from Kaggle

data-science, data-analysis, data-visualization, data-cleaning, data-cleansing, data-wrangling, data-analytics, eda, exploratory-data-analysis, kaggle-competition
Jupyter Notebook · 225 stars · updated 6 years ago