#Computer Science#The easiest tool for fine-tuning LLMs, generating synthetic data, and collaborating on datasets.
#Computer Science#NFStream: a Flexible Network Data Analysis Framework.
#Computer Science#A plugin for GTAV that transforms it into a vision-based self-driving car research environment.
#Natural Language Processing#🎯🗯 Dataset generation for AI chatbots, NLP tasks, named entity recognition or text classification models using a simple DSL!
#Face Recognition#Convert a face dataset to a masked-face dataset
#Computer Science#Computer vision utils for Blender (generate instance annotation, depth and 6D pose with one line of code)
A collection of public Korean instruction datasets for training language models.
Compose multimodal datasets 🎹
#Data Warehouse#[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
A command-line interface to generate textual and conversational datasets with LLMs.
#Computer Science#A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB team.
🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing.
#Data Warehouse#Image Aesthetics Toolkit - includes Fisher Vector implementation, AVA (Image Aesthetic Visual Analysis) dataset and fast multi-threaded downloader
Creates an index of images, queries a local LLM, and adds tags to the image metadata
Data release for the ImageInWords (IIW) paper.
DataGene - Identify How Similar TS Datasets Are to One Another (by @firmai)
👊 Prepare VOC format datasets for ultralytics/yolov3 & yolov5
[IJCV] Bamboo: 4 times larger than ImageNet; 2 times larger than Object365; Built by active learning.