GitHub Chinese Community

©2025 GitHub Chinese Community

#multi-modality

haotian-liu / LLaVA

#LLM# LLaVA is a large language-and-vision assistant with GPT-4V-level capabilities

gpt-4, chatbot, ChatGPT, llama, multimodal, llava, foundation-models, instruction-tuning, multi-modality, visual-language-learning, llama-2, llama2, vision-language-model
Python · 22.78k stars · updated 10 months ago
BradyFU / Awesome-Multimodal-Large-Language-Models

✨✨Latest Advances on Multimodal Large Language Models

instruction-tuning, instruction-following, large-vision-language-model, visual-instruction-tuning, multi-modality, in-context-learning, large-language-models, large-vision-language-models, multimodal-chain-of-thought, multimodal-in-context-learning, multimodal-large-language-models, chain-of-thought
15.53k stars · updated 2 days ago
jina-ai / clip-as-service

#Computer Science# 🏄 Scalable embedding, reasoning, and ranking for images and sentences with CLIP

bert, sentence-encoding, deep-learning, clip-model, clip-as-service, bert-as-service, cross-modal-retrieval, multi-modality, neural-search, openai, PyTorch, onnx, cross-modality
Python · 12.68k stars · updated 1 year ago
kyegomez / swarms

#LLM# The enterprise-grade, production-ready multi-agent orchestration framework. Website: https://swarms.ai

artificial-intelligence, attention-mechanism, gpt4, langchain, machine-learning, multi-modal-imaging, multi-modality, multimodal, swarms, transformer-models, agents, prompt-engineering, prompt-toolkit, prompting, tree-of-thoughts, ChatGPT, gpt4all, huggingface, langchain-python
Python · 4.93k stars · updated 1 day ago
lucidrains / deep-daze

#Computer Science# A simple command-line tool for text-to-image generation using OpenAI's CLIP and Siren (an implicit neural representation network). The technique was originally created by https://twitter.com/advadnoun

artificial-intelligence, deep-learning, transformers, siren, implicit-neural-representation, text-to-image, multi-modality
Python · 4.36k stars · updated 3 years ago
EvolvingLMMs-Lab / Otter

#LLM# 🦦 Otter, a multi-modal model based on OpenFlamingo (an open-source version of DeepMind's Flamingo), trained on MIMIC-IT and showing improved instruction-following and in-context learning abilities.

gpt-4, visual-language-learning, artificial-inteligence, deep-learning, foundation-models, multi-modality, machine-learning, ChatGPT, instruction-tuning, large-scale-models, embodied-ai
Python · 3.25k stars · updated 1 year ago
InternLM / InternLM-XComposer

#LLM# InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

ChatGPT, visual-language-learning, multi-modality, foundation, gpt-4, instruction-tuning, mllm, multimodal, vision-language-model, language-model, LLM, large-vision-language-model, vision-transformer, gpt
Python · 2.84k stars · updated 20 days ago
DLR-RM / 3DObjectTracking

Algorithms and Publications on 3D Object Tracking

pose-estimation, machine-vision, Bukkit, cvpr2022, real-time, object-tracking, multi-modality, rgbd, tracking
C++ · 893 stars · updated 25 days ago
OpenBMB / VisRAG

Parsing-free RAG supported by VLMs

rag, retrieval, retrieval-augmented-generation, vision-language-model, multi-modal, multi-modality, document-retrieval, document-understanding
Python · 732 stars · updated 4 months ago
OpenGVLab / Multi-Modality-Arena

#LLM# Chatbot Arena meets multi-modality! Multi-Modality Arena lets you benchmark vision-language models side by side, using images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP...

chat, chatbot, ChatGPT, gradio, large-language-models, LLM, vqa, multi-modality, vision-language-model
Python · 529 stars · updated 1 year ago
kyegomez / Gemini

#Computer Science# An open-source implementation of Gemini, Google's model that will "eclipse ChatGPT"

artificial-intelligence, gemini, gpt4, machine-learning, multi-modality, multimodla
Python · 461 stars · updated 13 days ago
researchmm / MM-Diffusion

[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

audio-generation, content-creation, diffusion-models, multi-modality, video-generation
Python · 429 stars · updated 1 year ago
yuanze-lin / Olympus

#LLM# [CVPR 2025 Highlight] Official code for "Olympus: A Universal Task Router for Computer Vision Tasks"

LLM, multimodal, chatbot, ChatGPT, deep-learning, foundation-models, instruction-tuning, llava, multi-modality, PyTorch, vision-language-model
Python · 427 stars · updated 16 days ago
ziqihuangg / Collaborative-Diffusion

[CVPR 2023] Collaborative Diffusion

diffusion-models, face-generation, image-editing, image-generation, latent-diffusion-models, multi-modality, aigc, gen-ai, stable-diffusion
Python · 425 stars · updated 2 years ago
xiaoachen98 / Open-LLaVA-NeXT

#LLM# An open-source implementation for training LLaVA-NeXT.

chatbot, ChatGPT, gpt-4, gpt4o, large-multimodal-models, llama, llama3, llava, multi-modality, multimodal, vision-language-model, visual-language-learning
Python · 398 stars · updated 8 months ago
LSXI7 / MINIMA

[CVPR 2025] MINIMA: Modality Invariant Image Matching

image-matching, multi-modality
Python · 398 stars · updated 18 days ago
kyegomez / Sophia

#LLM# An effortless plug-and-play optimizer that cuts model training costs by 50%. A new optimizer that is 2x faster than Adam on LLMs.

artificial-intelligence, ChatGPT, deep-learning, multi-modality, neural-network, optimizer
Python · 381 stars · updated 1 year ago
dvlab-research / VisionZip

Official repository for VisionZip (CVPR 2025)

efficiency, multi-modality, vision-language-model, vlms
Python · 287 stars · updated 20 days ago
RLHF-V / RLHF-V

[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

chatbot, gpt-4, llama, multi-modality, multimodal, visual-language-learning
Python · 280 stars · updated 9 months ago
DerrickWang005 / CRIS.pytorch

An official PyTorch implementation of the CRIS paper

contrastive-learning, multi-modality
Python · 272 stars · updated 1 year ago