GitHub Chinese Community

#multimodal-models

uncbiag / Awesome-Foundation-Models

A curated list of foundation models for vision and language tasks

foundation-models, vision-transformer, large-language-models, transformer-models, multimodal-models
1.06k
1 month ago
AIDC-AI / Awesome-Unified-Multimodal-Models

Awesome Unified Multimodal Models

multimodal-large-language-models, text-to-image-generation, multimodal-models, vision-language-model
499
1 month ago
YingqingHe / Awesome-LLMs-meet-Multimodal-Generation

#Large language models# 🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).

aigc, large-language-models, large-vision-language-models, multimodal-generation, multimodal-large-language-models, multimodal-models, multimodality, text-to-3d, text-to-audio, text-to-image, text-to-speech, text-to-video, large language models, mllm
HTML 494
4 months ago
zli12321 / Vision-Language-Models-Overview

A most Frontend Collection and survey of vision-language model papers, and models GitHub repository. Continuous updates.

blip2, claude, clip, deepseek, gemini-pro, gpt-4v, llava, multimodal-models, reinforcement-learning, world-models
300
1 day ago
thaoshibe / awesome-personalized-lmms

#Awesome# A curated list of Awesome Personalized Large Multimodal Models resources

Awesome Lists, large-language-models, large-multimodal-models, multimodal-models, personalization, personalized-generation
31
2 months ago
AmitPeleg / CLIC

Implementation of the paper "Advancing Compositional Awareness in CLIP with Efficient Fine-Tuning", arXiv, 2025

clip, compositionality, multimodal-models, retrieval
Python 7
1 month ago
pokarats / LAP-final-project

Multimodal Bi-Transformers (MMBT) in Biomedical Text/Image Classification

bert, image-classification, text-classification, transfer-learning, biomedical-image-processing, transformer, attention-mechanism, multimodal-models, huggingface-transformers
Jupyter Notebook 3
4 years ago
antonio-f / Phi-3-Vision

#Computer science# Phi-3-Vision model test - running locally

artificial intelligence, computer vision, hugging-face, Jupyter Notebook, large language models, machine learning, multimodal-learning, multimodal-models, phi-3-vision, image-to-text
Jupyter Notebook 0
1 year ago
sitamgithub-MSIT / videollama3-litserve

#Computer science# Leverage VideoLLaMA 3's capabilities using LitServe.

artificial intelligence, deep learning, FastAPI, multimodal-models, Python, PyTorch, transformers, video-understanding
Python 0
5 months ago
sitamgithub-MSIT / gemma3-litserve

#Computer science# Leverage Gemma 3's capabilities using LitServe.

artificial intelligence, deep learning, FastAPI, gemma3, multilingual, multimodal-models, Python, transformers
Python 0
4 months ago
RubenCasal / owl_vit_detector

NanoOWL Detection System enables real-time open-vocabulary object detection in ROS 2 using a TensorRT-optimized OWL-ViT model. Describe objects in natural language and detect them instantly on panoram...

computer vision, multimodal-models, natural-language, object-detection, transformers
C++ 0
3 months ago