[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
#Large Language Model# (AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
Automate Fashion Image Captioning using BLIP-2. Automatically generates descriptions of clothes on shopping websites, which can help customers without fashion knowledge better understand the features ... (see the BLIP-2 captioning sketch after this list)
#Computer Science# Implementation of Qformer from BLIP2 in Zeta Lego blocks.
Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
[ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives
The Multimodal Model for Vietnamese Visual Question Answering (ViVQA)
CLIP Interrogator, fully in HuggingFace Transformers 🤗, with LongCLIP & CLIP's own words and/or *your* own words! (a minimal CLIP word-scoring sketch follows this list)
Modifying LAVIS' BLIP2 Q-former with models pretrained on Japanese datasets.
#Large Language Model# This repository profiles, extracts, visualizes, and reuses generative AI weights, with the goal of building more accurate AI models and auditing/scanning weights at rest to identify knowledge domains for ...
Caption images across your datasets with state-of-the-art models from Hugging Face and Replicate!
#Data Warehouse# Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost
Finetuning Large Visual Models on Visual Question Answering
Caption generator using LAVIS and Argos Translate
Exploring Visual Question Answering with the Gemini LLM, where the input image can be given as a URL or as a local file in any format (see the Gemini VQA sketch after this list)
#Large Language Model# An offline AI-powered video analysis tool with object detection (YOLO), image captioning (BLIP), speech transcription (Whisper), audio event detection (PANNs), and AI-generated summaries (LLMs via Oll...
Too lazy to organize my desktop, make gpt + BLIP-2 do it /s
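
Several entries above (fashion captioning, dataset captioning, the offline video analyzer) revolve around BLIP-2 style image captioning. Below is a minimal sketch of that pattern, assuming the Hugging Face Transformers `Blip2Processor` / `Blip2ForConditionalGeneration` API and the public `Salesforce/blip2-opt-2.7b` checkpoint; the image URL and generation settings are illustrative, not taken from any specific repository listed here.

```python
# BLIP-2 captioning sketch (model name, URL, and settings are assumptions).
import requests
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

MODEL_ID = "Salesforce/blip2-opt-2.7b"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

processor = Blip2Processor.from_pretrained(MODEL_ID)
model = Blip2ForConditionalGeneration.from_pretrained(MODEL_ID, torch_dtype=dtype).to(device)

# Placeholder product image; any RGB image works.
url = "https://example.com/dress.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Unconditional captioning; a text prompt could also be passed for guided captions.
inputs = processor(images=image, return_tensors="pt").to(device, dtype)
generated_ids = model.generate(**inputs, max_new_tokens=40)
caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(caption)
```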
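The CLIP Interrogator entry ranks candidate words or phrases against an image by CLIP similarity. A minimal sketch of that scoring step, assuming the Hugging Face Transformers CLIP API; the candidate vocabulary and checkpoint name are placeholders, and the real project uses far larger word lists plus LongCLIP variants.

```python
# Score candidate phrases against an image with CLIP (vocabulary is illustrative).
import requests
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

MODEL_ID = "openai/clip-vit-large-patch14"
model = CLIPModel.from_pretrained(MODEL_ID)
processor = CLIPProcessor.from_pretrained(MODEL_ID)

candidates = ["a watercolor painting", "a photograph", "digital art", "an oil painting"]
image = Image.open(requests.get("https://example.com/art.jpg", stream=True).raw).convert("RGB")

inputs = processor(text=candidates, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image has shape (1, num_candidates): image-text similarity scores.
probs = outputs.logits_per_image.softmax(dim=-1)[0]
for phrase, p in sorted(zip(candidates, probs.tolist()), key=lambda x: -x[1]):
    print(f"{p:.3f}  {phrase}")
```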
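The Gemini VQA entry answers questions about an image fetched from a URL. A minimal sketch, assuming the `google-generativeai` Python SDK, a `GOOGLE_API_KEY` environment variable, and the `gemini-1.5-flash` model name; all of these are assumptions, and the repository may use a different client or model.

```python
# Visual question answering with Gemini over an image given by URL (sketch only;
# API key handling, model name, and the question are assumptions).
import io
import os
import requests
from PIL import Image
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

# Fetch the image from a URL; a local path works the same way via Image.open(path).
url = "https://example.com/photo.jpg"
image = Image.open(io.BytesIO(requests.get(url, timeout=30).content))

question = "How many people are in this picture, and what are they doing?"
response = model.generate_content([question, image])
print(response.text)
```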