GitHub Chinese Community
#visual-language-models

THUDM / CogVLM

A state-of-the-art-level open visual language model | Multimodal pretrained model

cross-modality, language-model, multi-modal, pretrained-models, visual-language-models
Python 6.58 k
1 year ago
camel-ai / crab

🦀️ CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents. https://crab.camel-ai.org/

language-model-agent, large-language-models, multi-agent-systems, visual-language-models
Python 344
12 days ago
MiniMax-AI / One-RL-to-See-Them-All

The official repo of One RL to See Them All: Visual Triple Unified Reinforcement Learning

rl, vlm, visual-language-models
Python 232
15 days ago
bilel-bj / ROSGPT_Vision

#Large Language Models# Commanding robots using only Language Models' prompts

prompt-engineering, Robotics, ros2, ChatGPT, language-models, language-models-are-next, large-language-models, large language models, visual-language-models
Python 99
4 months ago
hk-zh / language-conditioned-robot-manipulation-models

https://arxiv.org/abs/2312.10807

foundation-models, imitation-learning, reinforcement-learning, visual-language-models, robot-manipulation
72
6 months ago
xinyanghuang7 / Basic-Visual-Language-Model

Build a simple basic multimodal large model from scratch 🤖

large-language-models, visual-language-learning, visual-language-models
Python 41
1 year ago
tianyu-z / VCR

#Computer Science# Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details.

benchmark, deep learning, visual-language-models
Python 32
4 months ago
AlignGPT-VL / AlignGPT

Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability"

large-language-models, multimodal-large-language-models, visual-language-models
Python 32
1 year ago
jaisidhsingh / CoN-CLIP

#Computer Science# Implementation of the "Learn No to Say Yes Better" paper.

compositionality, deep learning, image-text-matching, multimodal, PyTorch, visual-language-models
Python 31
19 days ago
kesimeg / awesome-turkish-language-models

#Awesome# A curated list of Turkish AI models, datasets, papers

large-language-models, large language models, speech, turkish, visual-language-models, vlm, Awesome Lists
30
23 days ago
BioMedIA-MBZUAI / FetalCLIP

Official repository of FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis

artificial intelligence, foundation-models, Medical imaging, visual-language-models
Python 29
3 months ago
Sid2697 / HOI-Ref

Code implementation for paper titled "HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision"

dataset, large-language-models, visual-language-models, vlm
Python 27
1 year ago
amathislab / wildclip

Scene and animal attribute retrieval from camera trap data with domain-adapted vision-language models

behavior, clip, machine vision, visual-language-models
Python 22
1 year ago
sduzpf / UAP_VLP

Universal Adversarial Perturbations for Vision-Language Pre-trained Models

adversarial-attacks, deep neural networks, visual-language-models
Python 14
3 months ago
csebuetnlp / IllusionVQA

This repository contains the data and code of the paper titled "IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models"

visual-language-models, vqa
Jupyter Notebook 14
2 months ago
declare-lab / Sealing

[NAACL 2024] Official Implementation of paper "Self-Adaptive Sampling for Efficient Video Question Answering on Image--Text Models"

multimodality, video-understanding, video-question-answering, visual-language-models
Python 12
1 year ago
CristianoPatricio / concept-based-interpretability-VLM

#Computer Science# Code for the paper "Towards Concept-based Interpretability of Skin Lesion Diagnosis using Vision-Language Models", ISBI 2024 (Oral).

clip, deep learning, explainable-ai, interpretability, Medical imaging, visual-language-models
Jupyter Notebook 11
1 year ago
GraphPKU / CoI

#Large Language Models# Chain of Images for Intuitively Reasoning

chatbot, ChatGPT, gpt4v, llama, llava, multimodal, visual-language-models
Python 9
2 years ago
NxtGenLegend / TreeHacks-ZoneOut

#Natural Language Processing# #3 Winner of Best Use of Zoom API at Stanford TreeHacks 2024! An AI-powered meeting assistant that captures video, audio and textual context from Zoom calls using multimodal RAG.

ai-assistant, artificial intelligence, audio-processing, conversational-ai, Hackathon, JavaScript, machine learning, meeting-assistant, natural language processing, rag, speech-recognition, video-analysis, visual-language-models, vlm, WebSocket, zoom-api
JavaScript 7
4 months ago
ArthurBabkin / Parimate

#Natural Language Processing# A Telegram bot for validating audio and video content using CV models, SR models, and VLMs, with deepfake detection leveraging metadata analysis.

machine vision, deepfake-detection, face-recognition, liveness-detection, mvp, PostgreSQL, speech-recognition, Telegram, visual-language-models, audio-processing, natural language processing
Python 6
1 month ago