GitHub Chinese Community
#dataset-generation

Kiln-AI / Kiln

#Computer Science# The easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets.

Artificial Intelligence, chain-of-thought, collaboration, dataset-generation, fine-tuning, Machine Learning, macOS, ollama, openai, prompt, prompt-engineering, Python, rlhf, synthetic-data, Windows, evals, evaluation
Python 3.86 k
10 hours ago
e-p-armstrong / augmentoolkit

Create Custom LLMs

Artificial Intelligence, dataset-generation, finetuning-llms
Python 1.65 k
1 day ago
nfstream / nfstream

#Computer Science# NFStream: a Flexible Network Data Analysis Framework.

Data Science, Data Analysis, data-mining, network-analysis, network-security, network-monitoring, Cybersecurity, Machine Learning, Artificial Intelligence, dataset-generation, deep-packet-inspection, netflow, traffic-analysis, pcap, packet-capture, packet-analyser, Python, ndpi
Python 1.16 k
1 year ago
aitorzip / DeepGTAV

#Computer Science# A plugin for GTAV that transforms it into a vision-based self-driving car research environment.

dataset-generation, reinforcement-learning, gtav, Deep Learning, self-driving-car
C++ 1.14 k
5 years ago
rodrigopivi / Chatito

#Natural Language Processing# 🎯🗯 Dataset generation for AI chatbots, NLP tasks, named entity recognition or text classification models using a simple DSL!

nlu, dataset, Natural Language Processing, text-classification, named-entity-recognition, nlg, dataset-generation, Chatbot, chatbots
TypeScript 884
2 years ago
aqeelanwar / MaskTheFace

#Face Recognition# Convert face dataset to masked dataset

face-recognition, dataset-generation, hacktoberfest2020, dataset
Python 595
2 years ago
DIYer22 / bpycv

#Computer Science# Computer vision utils for Blender (generate instance annotation, depth and 6D pose with one line of code)

blender, Machine Vision, Deep Learning, instance-segmentation, depth, dataset-generation, 6dof-pose, blender-python
Python 484
1 year ago
HeegyuKim / open-korean-instructions

A collection of open Korean instruction datasets for training language models.

dataset, dataset-generation, instructions, language-model
Python 423
3 months ago
remyxai / VQASynth

Compose multimodal datasets 🎹

dataset-generation, multimodal-datasets, multimodal-deep-learning, synthetic-dataset-generation
Python 421
20 days ago
SimGus / Chatette

#Natural Language Processing# A powerful dataset generator for Rasa NLU, inspired by Chatito

Chatbot, dataset-generation, Python, chatbots, Natural Language Processing, rasanlubotkit, Command-Line Interface, sentenceParsing, nlg
Python 320
4 years ago
fjxmlzn / DoppelGANger

#Data Warehouse# [IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions

dataset-generation, Privacy, time-series, timeseries, Generative Adversarial Network, gans, synthetic-data, synthetic-dataset-generation, synthetic-data-generation, Dataset
Python 306
2 years ago
radi-cho / datasetGPT

A command-line interface to generate textual and conversational datasets with LLMs.

Command-Line Interface, dataset-generation, large-language-models, Python
Python 299
2 years ago
facebookresearch / stopes

#Computer Science# A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB team.

dataset, dataset-generation, Machine Learning, nmt, translation, machine-translation
Python 280
5 months ago
davidmartinrius / speech-dataset-generator

🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing.

audio-analysis, audio-processing, dataset-generation, speech-recognition, speech-to-text, text-to-speech
Python 245
1 year ago
ylogx / aesthetics

#Data Warehouse# Image Aesthetics Toolkit - includes Fisher Vector implementation, AVA (Image Aesthetic Visual Analysis) dataset and fast multi-threaded downloader

Image Processing, Image, aesthetic, ava, live, dataset, Dataset, dataset-generation, dataset-creation
Python 226
2 years ago
jabberjabberjabber / ImageIndexer

Creates an index of images, queries a local LLM and adds tags to the image metadata

Artificial Intelligence, dataset-generation, exif-metadata, exiftool, image-classification, Image Processing, image-recognition, keywords, large-language-models, llamacpp, local, multimodal, tags
Python 222
12 days ago
google / imageinwords

Data release for the ImageInWords (IIW) paper.

evaluation, image-captioning, image-to-text, dataset, dataset-generation
JavaScript 215
7 months ago
firmai / datagene

DataGene - Identify How Similar TS Datasets Are to One Another (by @firmai)

dataset-generation, synthetic-data, synthetic-dataset-generation, encoding, finance, Data Structures, model-checking, Testing, similarity-measures
Jupyter Notebook 204
3 years ago
pprp / voc2007_for_yolo_torch

👊 Prepare VOC format datasets for ultralytics/yolov3 & yolov5

voc, dataset-generation, yolov3
Python 198
2 years ago
ZhangYuanhan-AI / Bamboo

[IJCV] Bamboo: 4 times larger than ImageNet; 2 times larger than Object365; built by active learning.

active-learning, dataset-generation, pre-training
Python 177
1 year ago