GitHub Chinese Community

#vlms

https://static.github-zh.com/github_avatars/oumi-ai?size=40
oumi-ai / oumi

Easily fine-tune, evaluate and deploy Qwen3, DeepSeek-R1, Llama 4 or any open source LLM / VLM!

dpo, evaluation, fine-tuning, inference, llama, large language models, sft, vlms
Python 8.18 k
2 days ago
https://static.github-zh.com/github_avatars/yueliu1999?size=40
yueliu1999 / Awesome-Jailbreak-on-LLMs

#Large Language Models# Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods on LLMs. It contains papers, codes, datasets, evaluations, and analyses.

artificial intelligence, jailbreak, large language models, privacy, safety, security, vlm, vlms
733
5 days ago
https://static.github-zh.com/github_avatars/NanoNets?size=40
NanoNets / docext

#Natural Language Processing# An on-premises, OCR-free unstructured data extraction and benchmarking toolkit. (https://idp-leaderboard.org/)

document, document-analysis, extraction, large language models, machine learning, natural language processing, OCR, rag, unstructured-data, vlms, table-extraction
Python 612
5 days ago
https://static.github-zh.com/github_avatars/dvlab-research?size=40
dvlab-research / VisionZip

Official repository for VisionZip (CVPR 2025)

efficiency, multi-modality, vision-language-model, vlms
Python 287
20 days ago
https://static.github-zh.com/github_avatars/tianyi-lab?size=40
tianyi-lab / HallusionBench

#Large Language Models# [CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

benchmark, vlms, gpt-4, gpt-4v, llava, benchmarks, hallucination, large language models, lmm, large-language-models, large-vision-language-models
Python 284
7 months ago
https://static.github-zh.com/github_avatars/cequence-io?size=40
cequence-io / openai-scala-client

#Large Language Models# Scala client for OpenAI API and other major LLM providers

ChatGPT, openai, Scala, gemini-ai, groq-api, large language models, nlp-library, vertex-ai-gemini-api, vlms, aws-bedrock, anthropic, gemini
Scala 224
23 days ago
https://static.github-zh.com/github_avatars/Beckschen?size=40
Beckschen / ViTamin

[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"

vlms
Python 206
1 year ago
https://static.github-zh.com/github_avatars/MCG-NJU?size=40
MCG-NJU / AWT

[NeurIPS 2024] AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation

clip, machine vision, video-understanding, vlms, zero-shot-learning, transfer-learning
Python 101
8 months ago
https://static.github-zh.com/github_avatars/foundation-multimodal-models?size=40
foundation-multimodal-models / CAL

[NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment

vlms
Python 56
9 months ago
https://static.github-zh.com/github_avatars/aim-uofa?size=40
aim-uofa / SegAgent

[CVPR2025] SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories

agent, segment-anything, vlms
51
3 months ago
https://static.github-zh.com/github_avatars/video-db?size=40
video-db / ocr-benchmark

Benchmarking Vision-Language Models on OCR tasks in Dynamic Video Environments

arxiv, benchmark, easyocr, OCR, rapidocr, research-paper, vlms
Python 41
4 months ago
https://static.github-zh.com/github_avatars/mbzuai-oryx?size=40
mbzuai-oryx / KITAB-Bench

[ACL 2025 🔥] A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding

arabic, benchmark, layout-detection, OCR, pdf-to-text, table-detection, vlms, vqa
Python 39
22 days ago
https://static.github-zh.com/github_avatars/Mamadou-Keita?size=40
Mamadou-Keita / VLM-DETECT

[ICASSP 2024] The official repo for Harnessing the Power of Large Vision Language Models for Synthetic Image Detection

deepfake-detection, diffusion-models, large language models, text-to-image-generation, vlms
Python 30
5 months ago
https://static.github-zh.com/github_avatars/ShenzheZhu?size=40
ShenzheZhu / JailDAM

JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model

artificial intelligence, aisecurity, vlms
12
21 days ago
https://static.github-zh.com/github_avatars/ThomasVonWu?size=40
ThomasVonWu / Awesome-VLMs-Strawberry

#Large Language Models# A collection of VLMs papers, blogs, and projects, with a focus on VLMs in Autonomous Driving and related reasoning techniques.

large language models, multimodal-learning, vision-language-transformer, vlms
10
7 months ago
https://static.github-zh.com/github_avatars/FSoft-AI4Code?size=40
FSoft-AI4Code / VisualCoder

[NAACL 2025] Guiding Large Language Models in Code Execution with Fine-grained Multimodal Chain-of-Thought Reasoning

ai4code, cfg, vlms
Jupyter Notebook 9
4 months ago
https://static.github-zh.com/github_avatars/TUM-AVS?size=40
TUM-AVS / FM-AD-Survey

This repository collects research papers of large Foundation Models for Scenario Generation and Analysis in Autonomous Driving. The repository will be continuously updated to track the latest update.

diffusion-models, large language models, vlms, world-models
7
5 days ago
https://static.github-zh.com/github_avatars/logic-OT?size=40
logic-OT / BobVLM

#Natural Language Processing# BobVLM – A 1.5B multimodal model built from scratch and pre-trained on a single P100 GPU capable of image descriptions and moderate question answering. 🤗🎉

deep learning, experiment, gpu, huggingface, huggingface-transformers, Library, large language models, multimodal, natural language processing, vision-transformer, vlms
Python 6
4 months ago
https://static.github-zh.com/github_avatars/Imageomics?size=40
Imageomics / VLM4Bio

Code for VLM4Bio, a benchmark dataset of scientific question-answer pairs used to evaluate pretrained VLMs for trait discovery from biological images.

benchmarks, biology, butterfly, cv, fish, image-classification, image-recognition, traits, vlms
Python 6
8 months ago
https://static.github-zh.com/github_avatars/PGSmall?size=40
PGSmall / clip-pgs

Official code for CVPR2025 "Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection"

clip, masked-image-modeling, vision-language-pretraining, vlms
Python 5
3 months ago