GitHub Chinese Community
#data-curation

cleanlab / cleanlab

#Data warehouse# The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

weak-supervision, data-cleaning, data-quality, data science, noisy-labels, data-centric-ai, out-of-distribution-detection, outlier-detection, active-learning, data-labeling, data-profiling, data-validation, labeling, data-curation, annotation, DataOps, dataquality, large language models, datasets, exploratory-data-analysis
Python 10.61 k
12 days ago
voxel51 / fiftyone

#Computer science# Refine high-quality datasets and visual AI models

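machine learning, artificial intelligence, deep learning, machine vision, developer-tools, data science, Python, active-learning, data-centric-ai, data-cleaning, data-curation, data-quality, image-classification, object-detection, unstructured-data, vector-search, visualization
Python 9.59 k
14 hours ago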
Docta-ai / docta

A Doctor for your data

data, data-centric-ai, data-centric-machine-learning, data-curation, data-diagnosis, language-model, rlhf
Python 3.31 k
5 months ago
visual-layer / fastdup

#Computer science# fastdup is a powerful, free tool designed to rapidly generate valuable insights from image and video datasets. It helps enhance the quality of both images and labels, while significantly reducing data...

data-curation, dataset, deep learning, machine learning, object-detection, outlier-detection, Python, visual-search, data-augmentation, image-classification, Image, image processing, visualization-tools, image-analysis, visualization
Python 1.69 k
5 months ago
Renumics / spotlight

#Computer science# Interactively explore unstructured datasets from your dataframe.

data-centric-ai, data-curation, data visualization, machine vision, machine learning, audio, exploratory-data-analysis, Image, timeseries, Video, meshes, unstructured-data, Hacktoberfest
TypeScript 1.18 k
4 days ago
daochenzha / data-centric-AI

#Computer science# A curated, but incomplete, list of data-centric AI resources.

artificial intelligence, data-centric-ai, machine learning, data-curation, data-centric, data-centric-machine-learning, data science, data-quality, data-engineering
1.11 k
1 year ago
NVIDIA / NeMo-Curator

#Large language models# Scalable data preprocessing and curation toolkit for LLMs

data-curation, large language models, data, data-prep, data-preparation, data-processing, data-quality, datacuration, datarecipes, Entity resolution, fine-tuning, large-language-models, large-scale-data-processing, llmapps, Python
Jupyter Notebook 946
2 days ago
Renumics / awesome-open-data-centric-ai

#Natural language processing# Curated list of open source tooling for data-centric AI on unstructured data.

Awesome Lists, data-centric-ai, data-curation, data-versioning, data visualization, explainable-ai, active-learning, feature-vector, robust-machine-learning, bias-detection, machine vision, data-drift, deep learning, natural language processing, noisy-labels, outlier-detection, synthetic-data, uncertainty-estimation, machine learning
718
2 years ago
UCSC-REAL / DS2

[ICLR 2025] Improving Data Efficiency via Curating LLM-Driven Rating Systems

instruction-tuning, large-language-models, data-curation
Python 96
3 months ago
getmetamapper / metamapper

Metamapper is a data discovery and documentation platform for improving how teams understand and interact with their data.

data-catalog, data-discovery, data-warehouse, Python, Django, data-curation, metadata
Python 79
4 days ago
Renumics / sliceguard

#Computer science# A library for detecting problematic data segments in structured and unstructured data with a few lines of code.

data analysis, data-cleaning, data-curation, data-exploration, data science, data visualization, deep learning, exploratory-data-analysis, machine learning, Python, visualization, eda
Python 64
1 year ago
LaureBerti / Learn2Clean

Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning

reinforcement-learning, data-cleaning, automated, data-curation, data-preprocessing
Python 51
2 years ago
whythawk / data-as-a-science

Lesson guide and textbook for "Data as a Science" course.

data science, syllabus, jupyter-notebooks, data-curation, data analysis
Jupyter Notebook 47
4 years ago
x-CK-x / Dataset-Curation-Tool

A tool for downloading from public image boards (which allow scraping) / preview your images & tags / edit your images & tags. Additional tabs for downloading other desired code repositories as well a...

captioning-videos, data-curation, downloader, tagging
Python 36
5 months ago
Digital-Dermatology / SelfClean

#Computer science# 🧼🔎 A holistic self-supervised data cleaning strategy to detect irrelevant samples, near duplicates and label errors (NeurIPS'24).

data-centric-ai, data-cleaning, self-supervised-learning, deep learning, machine learning, data-curation
Python 34
3 months ago
cleanlab / cleanlab-studio

#Natural language processing# Client interface to Cleanlab Studio and the Trustworthy Language Model

annotations, automl, machine vision, data-centric-ai, data-labeling, data-profiling, data-quality, data science, data-validation, machine learning, natural language processing, noisy-labels, data-cleaning, outlier-detection, data-curation, image-classification, text-classification, structured-data, model-deployment, large language models
Python 32
4 months ago
brainlife / ezbids

A web service for semi-automated conversion of raw imaging data to BIDS

mri, neuroimaging, Web, bids, dicom, interoperability, brain-imaging, data-curation
Vue 31
2 months ago
iwangjian / TopDial

Target-oriented Proactive Dialogue Systems with Personalization: Problem Formulation and Dataset Curation (EMNLP 2023)

data-curation, dialogue-systems, personalization
Python 30
1 year ago
PennLINC / CuBIDS

Curation of BIDS (CuBIDS): A sanity-preserving software package for processing BIDS datasets.

neuroimaging, python-package, data-curation, neuroscience, neuroscience-methods
Python 25
7 days ago
mcsorkun / AqSolDB

AqSolDB: A curated aqueous solubility dataset containing 9,982 unique compounds.

data-curation, dataset, cheminformatics
Python 22
5 years ago