GitHub Chinese Community
#vlms

oumi-ai / oumi

Easily fine-tune, evaluate and deploy Qwen3, DeepSeek-R1, Llama 4 or any open source LLM / VLM!

dpo, evaluation, fine-tuning, inference, llama, large language models, sft, vlms
Python 8.34 k
12 hours ago
NanoNets / docext

#Natural language processing# An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)

document, document-analysis, extraction, large language models, machine learning, natural language processing, OCR, rag, unstructured-data, vlms, onprem, document-data-extraction, ocr-onpremise, llm-ocr, onprem-ocr, onprem-vision, onpremise, table-extraction, document-information-extraction, ocr-benchmark
Python 1.57 k
1 month ago
yueliu1999 / Awesome-Jailbreak-on-LLMs

#Large language models# Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods on LLMs. It contains papers, code, datasets, evaluations, and analyses.

artificial intelligence, jailbreak, large language models, privacy, safety, security, vlm, vlms
821
8 days ago
dvlab-research / VisionZip

Official repository for VisionZip (CVPR 2025)

efficiency, multi-modality, vision-language-model, vlms
Python 324
10 days ago
tianyi-lab / HallusionBench

#Large language models# [CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

benchmark, vlms, gpt-4, gpt-4v, llava, benchmarks, hallucination, large language models, lmm, large-language-models, large-vision-language-models
Python 293
9 months ago
cequence-io / openai-scala-client

#Large language models# Scala client for OpenAI API and other major LLM providers

ChatGPT, openai, Scala, gemini-ai, groq-api, large language models, nlp-library, vertex-ai-gemini-api, vlms, aws-bedrock, anthropic, gemini
Scala 230
2 months ago
Beckschen / ViTamin

[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"

vlms
Python 207
1 year ago
Alpha-Innovator / OmniCaptioner

Official Repository of OmniCaptioner

deepseek-r1, multi-modal, vlms
Python 155
3 months ago
MCG-NJU / AWT

[NeurIPS 2024] AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation

clip, machine vision, video-understanding, vlms, zero-shot-learning, transfer-learning
Python 103
10 months ago
aim-uofa / SegAgent

[CVPR2025] SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories

agent, segment-anything, vlms
60
5 months ago
TUM-AVS / FM-AD-Survey

This repository collects research papers of large Foundation Models for Scenario Generation and Analysis in Autonomous Driving. The repository will be continuously updated to track the latest update.

diffusion-models, large language models, vlms, world-models
59
9 days ago
foundation-multimodal-models / CAL

[NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment

vlms
Python 58
10 months ago
video-db / ocr-benchmark

Benchmarking Vision-Language Models on OCR tasks in Dynamic Video Environments

arxiv, benchmark, easyocr, OCR, rapidocr, research-paper, vlms
Python 43
6 months ago
mbzuai-oryx / KITAB-Bench

[ACL 2025 🔥] A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding

arabic, benchmark, layout-detection, OCR, pdf-to-text, table-detection, vlms, vqa
Python 42
2 months ago
Mamadou-Keita / VLM-DETECT

[ICASSP 2024] The official repo for Harnessing the Power of Large Vision Language Models for Synthetic Image Detection

deepfake-detection, diffusion-models, large language models, text-to-image-generation, vlms
Python 31
6 months ago
ShenzheZhu / JailDAM

[COLM 2025] JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model

artificial intelligence, aisecurity, vlms
15
20 days ago
ThomasVonWu / Awesome-VLMs-Strawberry

#Large language models# A collection of VLM papers, blogs, and projects, with a focus on VLMs in Autonomous Driving and related reasoning techniques.

large language models, multimodal-learning, vision-language-transformer, vlms
10
8 months ago
FSoft-AI4Code / VisualCoder

[NAACL 2025] Guiding Large Language Models in Code Execution with Fine-grained Multimodal Chain-of-Thought Reasoning

ai4code, cfg, vlms
Jupyter Notebook 10
6 months ago
iBz-04 / reeltek

A small VLM that sees everything

huggingface, llamacpp, llm-inference, Python, vision-language-model, vlms, gpu-acceleration, OCR
HTML 8
2 months ago
kyegomez / VLM-Mamba

We introduce VLM-Mamba, the first Vision-Language Model built entirely on State Space Models (SSMs), specifically leveraging the Mamba architecture.

artificial intelligence, attention, mamba, machine learning, PyTorch, ssm, transformers, vision-language-model, vision-transformer, vlms
Python 7
10 days ago