image-captioning · GitHub Topics

#Computer Science# LAVIS - A One-stop Library for Language-Vision Intelligence

deep-learning deep-learning-library image-captioning salesforce vision-and-language vision-framework vision-language-pretraining vision-language-transformer visual-question-anwsering multimodal-datasets multimodal-deep-learning

Jupyter Notebook 10.89 k

10 months ago

salesforce / BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

vision-language vision-and-language-pre-training image-text-retrieval image-captioning visual-question-answering vision-language-transformer

Jupyter Notebook 5.48 k

1 year ago

OpenGVLab / InternGPT

#Large Language Model# InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. Now it supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, ...

ChatGPT foundation-model gpt gpt-4 gradio husky image-captioning langchain large-language-model multimodal vqa llama vicuna video-generation sam segment-anything click draggan

Python 3.22 k

1 year ago

sgrvinod / a-PyTorch-Tutorial-to-Image-Captioning

Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning

PyTorch pytorch-tutorial show-attend-and-tell image-captioning encoder-decoder attention-mechanism computer-vision mscoco

Python 2.87 k

3 years ago

OFA-Sys / OFA

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

multimodal pretraining image-captioning text-to-image-synthesis visual-question-answering referring-expression-comprehension vision-language pretrained-models prompt prompt-tuning chinese

Python 2.53 k

1 year ago

ttengwang / Caption-Anything

#Large Language Model# Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/space...

ChatGPT controllable-generation segment-anything controllable-image-captioning image-captioning

Python 1.76 k

2 years ago

peteanderson80 / bottom-up-attention

Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome

vqa visual-question-answering faster-rcnn caffe image-captioning mscoco

Jupyter Notebook 1.45 k

3 years ago

imaginary-cloud / CameraManager

#iOS# Simple Swift class to provide all the configurations you need to create a custom camera view in your app

Swift iOS camera video-recording image-captioning cocoapods swift-package-manager carthage qrcode-reader

Swift 1.39 k

1 year ago

NVlabs / prismer

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

image-captioning language-model multi-modal-learning multi-task-learning vision-language-model vision-and-language vqa

Python 1.31 k

2 years ago

jhc13 / taggui

Tag manager and captioner for image datasets

image-captioning pyside6 stable-diffusion llava cogvlm florence-2

Python 1.11 k

4 months ago

microsoft / Oscar

Oscar and VinVL

vision-and-language pre-training image-captioning vqa oscar

Python 1.05 k

2 years ago

ruotianluo / self-critical.pytorch

Unofficial pytorch implementation for Self-critical Sequence Training for Image Captioning. and others.

image-captioning

Python 1.01 k

2 years ago

YehLi / xmodaler

X-modaler is a versatile and high-performance codebase for cross-modal analytics(e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense r...

image-captioning video-captioning vision-and-language pretraining cross-modal-retrieval visual-question-answering tden

Python 968

3 years ago

yunjey / show-attend-and-tell

TensorFlow Implementation of "Show, Attend and Tell"

Tensorflow image-captioning show-attend-and-tell attention-mechanism

Jupyter Notebook 906

7 years ago

cuixing158 / Awesome-CV-MasterHub

🔥 🔥 🔥 A paper list of some recent Computer Vision(CV) works

Awesome Lists image-captioning image-classification image-denoising image-enhancement image-generation keypoint-detection object-detection panoptic-segmentation pose-estimation video-generation video-understanding vision-transformer paper-list image-segmentation low-level-vision

639

2 days ago

SkalskiP / awesome-foundation-and-multimodal-models

#Natural Language Processing# 👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]

blip clip foundational-models grounding-dino llava multimodal segment-anything computer-vision natural-language-processing open-vocabulary-detection open-vocabulary-segmentation image-captioning

Python 634

2 years ago

kuanghuei / SCAN

#Computer Science# PyTorch source code for "Stacked Cross Attention for Image-Text Matching" (ECCV 2018)

cross-modal image-captioning neural-network deep-learning PyTorch computer-vision

Python 568

2 years ago

kdexd / virtex

[CVPR 2021] VirTex: Learning Visual Representations from Textual Annotations

image-captioning coco-dataset pretrained-models model-zoo cvpr2021

Python 564

23 days ago

aimagelab / meshed-memory-transformer

Meshed-Memory Transformer for Image Captioning. CVPR 2020

image-captioning transformer PyTorch cvpr2020

Python 543

3 years ago

gokayfem / ComfyUI_VLM_nodes

#Large Language Model# Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation

nodes comfyui custom-nodes llava large-language-model image-captioning mllm vlm

Python 527

7 months ago