GitHub Chinese Community
#visual-language-learning

haotian-liu / LLaVA

#Large language models# LLaVA is a large language and vision assistant with GPT-4V-level capabilities.

gpt-4, chatbot, ChatGPT, llama, multimodal, llava, foundation-models, instruction-tuning, multi-modality, visual-language-learning, llama-2, llama2, vision-language-model
Python 22.78 k
10 months ago
NExT-GPT / NExT-GPT

#Large language models# Code and models for the ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model

ChatGPT, foundation-models, gpt-4, instruction-tuning, large-language-models, large language model, multi-modal-chatgpt, multimodal, visual-language-learning, mllm
Python 3.51 k
1 month ago
EvolvingLMMs-Lab / Otter

#Large language models# 🦦 Otter, a multi-modal model based on OpenFlamingo (an open-source version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

gpt-4, visual-language-learning, artificial-inteligence, deep learning, foundation-models, multi-modality, machine learning, ChatGPT, instruction-tuning, large-scale-models, embodied-ai
Python 3.25 k
1 year ago
InternLM / InternLM-XComposer

#Large language models# InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

ChatGPT, visual-language-learning, multi-modality, foundation, gpt-4, instruction-tuning, mllm, multimodal, vision-language-model, language-model, large language model, large-vision-language-model, vision-transformer, gpt
Python 2.84 k
20 days ago
xiaoachen98 / Open-LLaVA-NeXT

#Large language models# An open-source implementation for training LLaVA-NeXT.

chatbot, ChatGPT, gpt-4, gpt4o, large-multimodal-models, llama, llama3, llava, multi-modality, multimodal, vision-language-model, visual-language-learning
Python 398
8 months ago
RLHF-V / RLHF-V

[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

chatbot, gpt-4, llama, multi-modality, multimodal, visual-language-learning
Python 280
9 months ago
mlpc-ucsd / BLIVA

#Large language models# (AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions

blip2, chatbot, instruction-tuning, llama, large language model, multimodal, visual-language-learning, lora
Python 260
1 year ago
thomas-yanxin / KarmaVLM

🧘🏻‍♂️ KarmaVLM (相生): A family of highly efficient and powerful visual language models.

llama2, llava, qwen2, vlm, vision-language-model, visual-language-learning
Python 88
1 year ago
AdrianBZG / llama-multimodal-vqa

#Large language models# Multimodal Instruction Tuning for Llama 3

chatbot, ChatGPT, gpt-4, huggingface, instruction-tuning, language-models, llama, llama2, llama3, multimodal, visual-language-learning, visual-question-answering, vqa
Python 49
1 year ago
xinyanghuang7 / Basic-Visual-Language-Model

Build a simple, basic multimodal large model from scratch. 🤖

large-language-models, visual-language-learning, visual-language-models
Python 41
1 year ago
Skyline-9 / Shotluck-Holmes

#Natural language processing# [ACM MMGR '24] 🔍 Shotluck Holmes: A family of small-scale LLVMs for shot-level video understanding

large language model, natural language processing, Python, video-captioning, multi-modality, vision-language-model, visual-language-learning
Python 11
8 months ago
ashleykleynhans / llava-docker

#Large language models# Docker image for LLaVA: Large Language and Vision Assistant

artificial intelligence, chatbot, ChatGPT, Docker, Docker Image, foundation-models, gpt-4, instruction-tuning, llama, llama-2, llama2, llava, large language model, multimodal, runpod, vision-language-model, visual-language-learning
Shell 2
1 month ago
MuhammadAliS / CLIP

PyTorch implementation of OpenAI's CLIP model for image classification, visual search, and visual question answering (VQA).

deep neural networks, huggingface, pytorch-implementation, transformers, visual-language-learning, visual-question-answering
Jupyter Notebook 2
9 months ago
ecoxial2007 / EffVideoQA

Efficient Video Question Answering

machine vision, video-question-answering, visual-language-learning
Python 1
2 years ago