data-centric-ai

cleanlab / cleanlab

#Data warehouse# The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

weak-supervision, data-cleaning, data-quality, data science, noisy-labels, data-centric-ai, out-of-distribution-detection, outlier-detection, active-learning, data-labeling, data-profiling, data-validation, labeling, data-curation, annotation, DataOps, dataquality, large language models, datasets, exploratory-data-analysis
Python 10.61 k
12 days ago
voxel51 / fiftyone

#Computer science# Refine high-quality datasets and visual AI models

machine learning, artificial intelligence, deep learning, computer vision, developer-tools, data science, Python, active-learning, data-centric-ai, data-cleaning, data-curation, data-quality, image-classification, object-detection, unstructured-data, vector-search, visualization
Python 9.59 k
14 hours ago
Docta-ai / docta

A Doctor for your data

data, data-centric-ai, data-centric-machine-learning, data-curation, data-diagnosis, language-model, rlhf
Python 3.31 k
5 months ago
code-kern-ai / refinery

#Natural language processing# The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.

annotations, data-centric-ai, data-labeling, deep learning, labeling, labeling-tool, machine learning, natural language processing, neural-search, text-annotation, transformers, Python, human-in-the-loop, spaCy, artificial intelligence, data science, text-classification, active-learning, supervised-learning
Python 1.45 k
6 months ago
Renumics / spotlight

#Computer science# Interactively explore unstructured datasets from your dataframe.

data-centric-ai, data-curation, data visualization, computer vision, machine learning, audio, exploratory-data-analysis, Image, timeseries, Video, meshes, unstructured-data, Hacktoberfest
TypeScript 1.18 k
4 days ago
HazyResearch / data-centric-ai

#Computer science# Resources for Data Centric AI

machine learning, artificial intelligence, data-centric-ai
TeX 1.12 k
2 years ago
daochenzha / data-centric-AI

#Computer science# A curated, but incomplete, list of data-centric AI resources.

artificial intelligence, data-centric-ai, machine learning, data-curation, data-centric, data-centric-machine-learning, data science, data-quality, data-engineering
1.11 k
1 year ago
cleanlab / cleanvision

#Computer science# Automatically find issues in image datasets and practice data-centric computer vision.

computer vision, data-centric-ai, data-exploration, data-quality, data-validation, deep learning, exploratory-data-analysis, image-analysis, image-classification, image-generation, image-quality, image-segmentation, data-profiling, data science
Python 1.09 k
2 months ago
Renumics / awesome-open-data-centric-ai

#Natural language processing# Curated list of open source tooling for data-centric AI on unstructured data.

Awesome Lists, data-centric-ai, data-curation, data-versioning, data visualization, explainable-ai, active-learning, feature-vector, robust-machine-learning, bias-detection, computer vision, data-drift, deep learning, natural language processing, noisy-labels, outlier-detection, synthetic-data, uncertainty-estimation, machine learning
718
2 years ago
dcai-course / dcai-lab

#Computer science# Lab assignments for Introduction to Data-Centric AI, MIT IAP 2024 👩🏽‍💻

course, data-centric-ai, data science, deep learning, homework, lab, machine learning
Jupyter Notebook 458
4 months ago
gszfwsb / NCFM

Official PyTorch implementation of the paper "Dataset Distillation with Neural Characteristic Function: A Minmax Perspective" (NCFM) in CVPR 2025 (Highlight).

synthetic-data, computer vision, data-centric-ai
Python 368
8 days ago
GAIR-NLP / ProX

#Large language models# [ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale

data-centric-ai, data-quality, large language models, pre-training, llama, mistral
Python 249
11 days ago
JieyuZ2 / wrench

#Natural language processing# [NeurIPS 2021] WRENCH: Weak supeRvision bENCHmark

weak-supervision, data-centric-ai, benchmark-framework, machine learning, dataset, natural language processing, sequence-labeling, deep learning
Python 223
1 year ago
yueyu1030 / AttrPrompt

#Natural language processing# [NeurIPS 2023] Code for the paper `Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias`.

data-centric-ai, large-language-models, natural language processing, pretrained-language-model, text-classification, zero-shot-learning
Python 150
2 years ago
aai-institute / pyDVL

#Computer science# pyDVL is a library of stable implementations of algorithms for data valuation and influence function computation.

machine learning, game-theory, data-quality, robust-machine-learning, data-centric-ai, data-cleaning
Python 130
1 month ago
dcai-course / dcai-course

#Computer science# Introduction to Data-Centric AI, MIT IAP 2023 🤖

course, data-centric-ai, data science, deep learning, machine learning
CSS 100
4 months ago
opendataval / opendataval

#Computer science# OpenDataVal: a Unified Benchmark for Data Valuation in Python (NeurIPS 2023)

machine learning, Python, research, statistics, data-centric-ai, data-cleaning, game-theory
Python 98
4 months ago
OFA-Sys / DiverseEvol

#Natural language processing# Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning

data-centric-ai, instruction-tuning, large-language-models, natural language processing, efficiency
Python 81
2 years ago
SJTU-DMTai / awesome-ml-data-quality-papers

#Computer science# Papers about training data quality management for ML models.

data-management, data-profiling, data-quality, machine learning, data-centric-ai
77
23 days ago
NextBrain-ai / nbsynthetic

nbsynthetic is a simple and robust tabular synthetic data generation library for small and medium-sized datasets.

data-augmentation, data-centric-ai, Generative Adversarial Network, synthetic-data, synthetic-dataset-generation
Jupyter Notebook 68
2 years ago