dataset-generation

#Computer Science# The easiest tool for fine-tuning LLMs, generating synthetic data, and collaborating on datasets.

artificial-intelligence chain-of-thought collaboration dataset-generation fine-tuning machine-learning macOS ollama openai prompt prompt-engineering Python rlhf synthetic-data Windows evals evaluation evaluation-framework mcp

Python 4.28 k

11 hours ago

e-p-armstrong / augmentoolkit

Create Custom LLMs

artificial-intelligence dataset-generation finetuning-llms

Python 1.75 k

19 days ago

nfstream / nfstream

#Computer Science# NFStream: a Flexible Network Data Analysis Framework.

data-science data-analysis data-mining network-analysis network-security network-monitoring Cybersecurity machine-learning artificial-intelligence dataset-generation deep-packet-inspection netflow traffic-analysis pcap packet-capture packet-analyser Python ndpi

Python 1.17 k

1 year ago

aitorzip / DeepGTAV

#Computer Science# A plugin for GTAV that transforms it into a vision-based self-driving car research environment.

dataset-generation reinforcement-learning gtav deep-learning self-driving-car

C++ 1.16 k

6 years ago

rodrigopivi / Chatito

#Natural Language Processing# 🎯🗯 Dataset generation for AI chatbots, NLP tasks, named entity recognition or text classification models using a simple DSL!

nlu dataset natural-language-processing text-classification named-entity-recognition nlg dataset-generation chatbot chatbots

TypeScript 885

2 years ago

aqeelanwar / MaskTheFace

#Face Recognition# Convert a face dataset to a masked-face dataset

face-recognition dataset-generation hacktoberfest2020 dataset

Python 602

2 years ago

DIYer22 / bpycv

#Computer Science# Computer vision utils for Blender (generate instance annotation, depth and 6D pose with one line of code)

blender computer-vision deep-learning instance-segmentation depth dataset-generation 6dof-pose blender-python

Python 491

2 months ago

remyxai / VQASynth

Compose multimodal datasets 🎹

dataset-generation multimodal-datasets multimodal-deep-learning synthetic-dataset-generation

Python 484

2 months ago

HeegyuKim / open-korean-instructions

A collection of open Korean instruction datasets for training language models.

dataset dataset-generation instructions language-model

Python 435

6 months ago

SimGus / Chatette

#Natural Language Processing# A powerful dataset generator for Rasa NLU, inspired by Chatito

chatbot dataset-generation Python chatbots natural-language-processing rasa nlu botkit command-line-interface sentence Parsing nlg

Python 319

4 years ago

fjxmlzn / DoppelGANger

#Data Warehouse# [IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions

dataset-generation privacy time-series timeseries Generative Adversarial Network gans synthetic-data synthetic-dataset-generation synthetic-data-generation dataset

Python 307

2 years ago

radi-cho / datasetGPT

A command-line interface to generate textual and conversational datasets with LLMs.

command-line-interface dataset-generation large-language-models Python

Python 297

2 years ago

facebookresearch / stopes

#Computer Science# A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB team.

dataset dataset-generation machine-learning nmt translation machine-translation

Python 281

9 months ago

jabberjabberjabber / ImageIndexer

Creates an index of images, queries a local LLM, and adds tags to the image metadata

artificial-intelligence dataset-generation exif-metadata exiftool image-classification image-processing image-recognition keywords large-language-models llamacpp local multimodal tags

Python 269

3 months ago

davidmartinrius / speech-dataset-generator

🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing.

audio-analysis audio-processing dataset-generation speech-recognition speech-to-text text-to-speech

Python 251

1 year ago

ylogx / aesthetics

#Data Warehouse# Image Aesthetics Toolkit: includes a Fisher Vector implementation, the AVA (Image Aesthetic Visual Analysis) dataset, and a fast multi-threaded downloader

image-processing Image aesthetic ava live dataset dataset dataset-generation dataset-creation

Python 226

2 years ago

google / imageinwords

Data release for the ImageInWords (IIW) paper.

evaluation image-captioning image-to-text dataset dataset-generation

JavaScript 220

1 year ago

firmai / datagene

DataGene - Identify How Similar TS Datasets Are to One Another (by @firmai)

dataset-generation synthetic-data synthetic-dataset-generation encoding finance data-structures model-checking Testing similarity-measures

Jupyter Notebook 206

4 years ago

pprp / voc2007_for_yolo_torch

👊 Prepare VOC format datasets for ultralytics/yolov3 & yolov5

voc dataset-generation yolov3

Python 197

2 years ago

ZhangYuanhan-AI / Bamboo

[IJCV] Bamboo: 4 times larger than ImageNet; 2 times larger than Object365; built by active learning.

active-learning dataset-generation pre-training

Python 181

2 years ago
