captioning · GitHub Topics

facebookresearch / mmf

#Computer Science# A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

PyTorch vqa pretrained-models multimodal deep-learning captioning dialog textvqa hateful-memes multi-tasking

Python 5.59 k

5 months ago

roboflow / maestro

Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

captioning fine-tuning florence-2 multimodal objectdetection paligemma phi-3-vision transformers vision-and-language vqa qwen2-vl

Python 2.63 k

2 days ago

fpgaminer / joycaption

JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.

captioning vlm

Jupyter Notebook 826

1 month ago

ltguo19 / VSUA-Captioning

#Natural Language Processing# Code for "Aligning Linguistic Words and Visual Semantic Units for Image Captioning", ACM MM 2019

captioning language-generation deep-learning PyTorch natural-language-processing

Python 258

6 years ago

DavidHuji / CapDec

CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)

captioning clip gpt-2 multimodal-deep-learning zero-shot-learning

Python 199

2 years ago

Labbeti / aac-datasets

#Data Warehouse# Audio Captioning datasets for PyTorch.

PyTorch audio caption datasets captioning dataset deep-learning

Python 120

2 months ago

HaydenFaulkner / Tennis

#Computer Science# A Tennis dataset and models for event detection & commentary generation

machine-learning computer-vision dataset fine-grained captioning Video mxnet gluon

Python 108

3 months ago

mitvis / vistext

VisText is a benchmark dataset for semantically rich chart captioning.

captioning charts dataset t5

Jupyter Notebook 95

1 month ago

drethage / fully-convolutional-point-network

#Computer Science# Fully-Convolutional Point Networks for Large-Scale Point Clouds

computer-vision 3D semantic-segmentation deep-learning deep-neural-networks point-clouds captioning Point cloud meshes

Python 86

6 years ago

Mauville / MedCLIP

#Computer Science# Medical image captioning using OpenAI's CLIP

deep-learning clip captioning machine-learning Medical imaging

Jupyter Notebook 85

3 years ago

audio-captioning / clotho-dataset

#Natural Language Processing# Python code for handling the Clotho dataset.

audio deep-learning natural-language-processing captioning

Python 84

5 years ago

wangleihitcs / MedicalReportGeneration

A Base Tensorflow Project for Medical Report Generation

Tensorflow captioning

Python 70

6 years ago

ParitoshParmar / MTL-AQA

What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]

multitask-learning video-understanding video-processing video-captioning PyTorch action-recognition representation-learning lstm captioning

Python 69

4 months ago

aimagelab / pacscore

[CVPR 2023 & IJCV 2025] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation

captioning captioning-videos computer-vision cvpr cvpr2023 vision-and-language

Python 64

2 months ago

TheShadow29 / VidSitu

#Natural Language Processing# [CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)

vision vision-and-language grounding natural-language-processing Video srl captioning-videos captioning

Python 61

4 years ago

42lux / CaptainCaption

A Gradio-based image captioning tool that uses the GPT-4-Vision API to generate detailed descriptions of images.

captioning gpt-4-vision gradio openai-api tagging

Python 60

10 months ago

Labbeti / aac-metrics

Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.

audio captioning monitoring text

Python 58

2 months ago

lucidrains / AoA-pytorch

A PyTorch implementation of the Attention on Attention module (both self and guided variants), for Visual Question Answering

attention attention-mechanism vqa visual-question-answering captioning

Python 43

5 years ago

DavidMChan / caption-by-committee

#Large Language Models# Using LLMs and pre-trained caption models for super-human performance on image captioning.

artificial-intelligence captioning ChatGPT deep-learning Image machine-learning Python

Python 42

2 years ago

audio-captioning / dcase-2020-baseline

#Computer Science# Audio captioning baseline system for DCASE 2020 challenge.

captioning deep-learning deep-neural-networks machine-learning signal-processing

Python 38

2 years ago