image-understanding · GitHub Topics

[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

image-understanding large-language-models video-understanding vision-language-model

Python 932

1 year ago

PKU-YuanGroup / UniWorld-V1

UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation

diffusion high-level-feature image-editing image-understanding low-level-vision text-to-image-generation unify unify-ai vlm

Python 701

1 month ago

yohasebe / openai-chat-api-workflow

🎩 An Alfred 5 Workflow for using OpenAI Chat API to interact with GPT models 🤖💬 It also allows image generation/editing/understanding 🖼️, speech-to-text conversion 🎤, and text-to-speech synthesis...

alfred openai workflow artificial-intelligence gpt dall-e image-generation speech-to-text Whisper text-to-speech chatbot image-understanding

317

1 month ago

suprosanna / relationformer

A Unified Framework for Image-to-Graph Generation. Paper accepted @ ECCV22.

image-understanding road-network scene-graph transformer

133

2 years ago

DmitryRyumin / WACV-2024-Papers

#Face Recognition# WACV 2024 Papers: Discover cutting-edge research from WACV 2024, the leading computer vision conference. Stay updated on the latest in computer vision and deep learning, with code included. ⭐ support ...

3d-computer-vision adversarial-attacks autonomous-driving biometrics computer-vision dataset face-recognition generative-models gesture-recognition image-recognition image-understanding low-level machine-learning Robotics video-recognition vision-transformer visualization

Python 96

1 year ago

KyanChen / DynamicVis

This is the implementation of the paper "DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding"

computer-vision foundation-models image-understanding remote-sensing change-detection image-segmentation object-detection image-retrieval instance-segmentation

Python 70

3 months ago

KleinYuan / image2text

#Natural Language Processing# A deep learning project to tell a story with an image or a video.

deep-learning real-time artificial-intelligence neural-network image-understanding natural-language-processing word2vec cnn Tensorflow convolutional-neural-networks machine-learning theano lasagne

Python 43

8 years ago

The-Martyr / Awesome-Multimodal-Reasoning

#Large Language Models# Latest Advances on (RL-based) Multimodal Reasoning and Generation in Multimodal Large Language Models

chain-of-thought cot large-language-models llm mllm video-understanding multimodal-learning reinforcement-learning rl o1 image-generation image-understanding video-generation

5 days ago

sopermanspace / Unity_OpenAI

This GitHub repository shows how to integrate OpenAI's GPT-3 language model and the ChatGPT API into a Unity project. It can be a useful way to add natural language processing capabilities to your applicat...

openai Unity chatgpt3 gpt-3 chatbot gpt4 artificial-intelligence game-development integration openai-chatgpt image-understanding text-to-speech

C# 36

2 years ago

wangqingbaidu / CV-Datasets

#Data Warehouse# Collection of open datasets in computer vision.

computer-vision dataset image-understanding video-understanding

7 years ago

ddw2AIGROUP2CQUPT / HumanVLM

HumanVLM (LLaVA-based): Foundation for Human-Scene Vision-Language Model (Journal of Information Fusion 2025)

human image-understanding vision-language-model

Python 11

8 months ago

back-kh / CVIU78101

📘 [Teaching] Class CVIU78101: Introduction to Computer Vision for Image Understanding Course

computer-vision course image-understanding

Jupyter Notebook 11

1 month ago

kimtth / rag-multimodal-semantic-chunking

🖼️📄E2E Multi-modal Document Preprocessing for Search Indexing with Azure Document Intelligence

chunking image-understanding workshop

Python 5

2 months ago

UjjwalSaini07 / OllamaMulti-RAG

#Large Language Models# OllamaMulti-RAG 🚀 is a multimodal AI chat app combining Whisper AI for audio, LLaVA for images, and Chroma DB for PDFs, enhanced with Ollama and OpenAI API. 📄 Built for AI enthusiasts, it welcomes c...

ai-chatbot chat-application image-understanding langchain ollama openai pdf-processing rag vector-database Whisper llm trend trending-topics

Python 2

10 days ago

gasparyanartur / brain-image-implementation

#Computer Science# A reimplementation of the paper "Human-Aligned Image Models Improve Visual Decoding from the Brain"

deep-learning image-understanding research

Jupyter Notebook 1

6 days ago

Dulyaaa / IUP_Labs

🏷 This repository contains the lab sheets for the Image Understanding & Processing (SE4130) module in Year 4, Semester 1.

OpenCV NumPy matplotlib Python image-processing image-understanding

Jupyter Notebook 0

3 years ago

chrisputzu / annuncio-hackathon-aria-allegro

#Large Language Models# Annuncio generates product advertisements from user inputs, utilizing Aria for descriptions, Allegro for promotional videos, and hashtags for social media discoverability.

artificial-intelligence aria content-creation e-commerce genai Hackathon image-understanding llm video-generation

Python 0

10 months ago