GitHub Chinese Community

synthetic-dataset-generation

argilla-io / distilabel

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

artificial intelligence, huggingface, large language models, openai, Python, rlhf, synthetic-data, synthetic-dataset-generation
Python 2.75 k
6 days ago
Eladlev / AutoPrompt

A framework for prompt tuning using Intent-based Prompt Calibration

prompt-engineering, prompt-tuning, synthetic-dataset-generation
Python 2.59 k
2 months ago
bespokelabsai / curator

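#Natural Language Processing# Synthetic data curation for post-training and structured data extraction

synthetic-data, agents, large language models, prompt, Python, synthetic-dataset-generation, deep learning, fine-tuning, instruction-tuning, machine learning, natural language processing
Python 1.4 k
3 days ago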
datadreamer-dev / DataDreamer

#Natural Language Processing# DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤

deep learning, machine learning, natural language processing, nlp-library, Python, PyTorch, transformers, alignment, fine-tuning, gpt, instruction-tuning, large language models, llmops, openai, synthetic-data, synthetic-dataset-generation
Python 1.02 k
4 months ago
Unity-Technologies / com.unity.perception

#Computer Science# Perception toolkit for sim2real training and validation in Unity

perception, object-detection, detection, machine vision, deep learning, synthetic-dataset-generation, domain-randomization, pose-estimation, machine learning, segmentation
C# 965
7 months ago
BatsResearch / bonito

#Large Language Models# A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.

large language models, synthetic-data, synthetic-dataset-generation, zero-shot-learning, domain-adaptation, gpt, task-adaptation
Python 778
4 months ago
magpie-align / magpie

#Natural Language Processing# [ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data generation pipeline!

alignment, llama2, llama3, large language models, natural language processing, Bukkit, phi3, qwen2, synthetic-data, synthetic-dataset-generation, dataset, gemma
Python 712
3 months ago
nicolas-hbt / pygraft

#Computer Science# Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips

data-generator, graph-generator, knowledge-base, knowledge-graph, ontology, schema, Semantic Web, ontology-generation, Python, synthetic-data, synthetic-dataset-generation, contributions-welcome, owl, RDF (Resource Description Framework), artificial intelligence, benchmarking, linked-data, machine learning, semantics
Python 686
1 year ago
paulbricman / thisrepositorydoesnotexist

A curated list of awesome projects which use Machine Learning to generate synthetic content.

synthetic-data, synthetic-dataset-generation, Generative Adversarial Network
585
2 years ago
NVIDIA / Dataset_Synthesizer

#Computer Science# NVIDIA Deep learning Dataset Synthesizer (NDDS)

machine vision, deep learning, synthetic-dataset-generation, domain-randomization, pose-estimation, object-detection
C++ 582
5 years ago
sparkfish / augraphy

#Computer Science# Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes

data-augmentation, deep neural networks, training-data, machine learning, data-pipeline, image processing, synthetic-data, synthetic-dataset-generation, machine vision
Python 425
3 months ago
StacklokLabs / promptwright

#Computer Science# Generate large synthetic data using an LLM

artificial intelligence, data science, dataset, huggingface, machine learning, synthetic-data, synthetic-dataset-generation
Python 424
5 days ago
remyxai / VQASynth

Compose multimodal datasets 🎹

multimodal-datasets, multimodal-deep-learning, synthetic-dataset-generation
Python 403
5 days ago
Unity-Technologies / SynthDet

#Computer Science# SynthDet - An end-to-end object detection pipeline using synthetic data

object-detection, detection, machine vision, deep learning, synthetic-data, synthetic-dataset-generation, domain-randomization, pose-estimation, machine learning
C# 381
6 months ago
zhenzhiwang / HumanVid

[NeurIPS D&B Track 2024] Official implementation of HumanVid

image-to-video-generation, synthetic-dataset-generation, unreal-engine-5, video-generation
Python 316
1 month ago
Unity-Technologies / PeopleSansPeople

#Computer Science# Unity's privacy-preserving human-centric synthetic data generator

machine vision, deep learning, synthetic-data, synthetic-data-generation, synthetic-dataset-generation, human-pose-estimation, human-activity-recognition, perception, Unity, labeling, object-detection, pose-estimation, transfer-learning
C# 312
1 year ago
tabularis-ai / be_great

#Computer Science# A novel approach for synthesizing tabular data using pretrained large language models

data-generation, deep learning, tabular-data, transformers, synthetic-data, synthetic-dataset-generation
Python 312
1 month ago
tirthajyoti / pydbgen

Random dataframe and database table generator

database, random-generation, pandas-dataframe, sqlite3, SQLite, Python, synthetic-data, data science, Generator, data-generation, fake-data, synthetic-dataset-generation
Python 309
4 years ago
fjxmlzn / DoppelGANger

#Data Warehouse# [IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions

privacy, time-series, timeseries, Generative Adversarial Network, gans, synthetic-data, synthetic-dataset-generation, synthetic-data-generation, datasets
Python 307
2 years ago
davanstrien / awesome-synthetic-datasets

#Data Warehouse# awesome synthetic (text) datasets

artificial intelligence, datasets, large language models, synthetic-data, Awesome Lists, synthetic-dataset-generation
Jupyter Notebook 282
8 months ago