[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
🎩 An Alfred 5 Workflow for using OpenAI Chat API to interact with GPT models 🤖💬 It also allows image generation/editing/understanding 🖼️, speech-to-text conversion 🎤, and text-to-speech synthesis...
A Unified Framework for Image-to-Graph Generation. Paper accepted @ ECCV22.
#Face Recognition#WACV 2024 Papers: Discover cutting-edge research from WACV 2024, the leading computer vision conference. Stay updated on the latest in computer vision and deep learning, with code included. ⭐ support ...
This is the implementation of the paper "DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding"
#Natural Language Processing#A deep learning project to tell a story with an image or a video.
This GitHub repository shows how to integrate the OpenAI GPT-3 language model and the ChatGPT API into a Unity project. It can be a useful way to add natural language processing capabilities to your applicat...
#Data Warehouse#Collection of open datasets in computer vision.
#Large Language Models#Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal Large Language Models
HumanVLM (LLaVA-based): Foundation for Human-Scene Vision-Language Model (Journal of Information Fusion 2025)
#Computer Science#A reimplementation of the paper Human-Aligned Image Models Improve Visual Decoding from the Brain
🖼️📄E2E Multi-modal Document Preprocessing for Search Indexing with Azure Document Intelligence
🏷 This repository contains the lab sheets for the Image Understanding & Processing (SE4130) module in Year 4, Semester 1.
#Large Language Models#Annuncio generates product advertisements from user inputs, utilizing Aria for descriptions, Allegro for promotional videos, and hashtags for social media discoverability.
2022-1 Image Understanding Assignments & Projects