vision-language-transformer

salesforce / LAVIS

#Computer Science# LAVIS - A One-stop Library for Language-Vision Intelligence

deep learning, deep-learning-library, image-captioning, salesforce, vision-and-language, vision-framework, vision-language-pretraining, vision-language-transformer, visual-question-anwsering, multimodal-datasets, multimodal-deep-learning
Jupyter Notebook 10.63 k
7 months ago
IDEA-Research / GroundingDINO

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

object-detection, open-world, open-world-detection, vision-language, vision-language-transformer
Python 8.24 k
10 months ago
salesforce / BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

vision-language, vision-and-language-pre-training, image-text-retrieval, image-captioning, visual-question-answering, vision-language-transformer
Jupyter Notebook 5.31 k
10 months ago
AlibabaResearch / AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

artificial intelligence, documentai, multimodal, multimodal-deep-learning, OCR, machine vision, vision-language-transformer, end-to-end-ocr, scene-text-detection, scene-text-detection-recognition, scene-text-recognition, text-detection, text-recognition, vision-language, document, document-analysis, document-recognition, document-understanding, document-intelligence, vision-language-model
C++ 1.73 k
2 months ago
henghuiding / ReLA

[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation

multimodal-learning, referring-expression-comprehension, referring-expression-segmentation, vision-language-transformer, cvpr2023
Python 702
2 years ago
shenyunhang / APE

[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception

image-segmentation, object-detection, open-world, referring-expression-comprehension, vision-language-transformer
Python 569
1 year ago
henghuiding / Vision-Language-Transformer

[ICCV2021 & TPAMI2023] Vision-Language Transformer and Query Generation for Referring Segmentation

vision-language, transformer, Tensorflow, Keras, iccv2021, vision-language-transformer
Python 358
3 years ago
haoliuhl / instructrl

#Computer Science# Instruction Following Agents with Multimodal Transformers

flax, instruction-following, instructions, jax, machine learning, reinforcement-learning, transformer, vision-language-transformer
Python 52
3 years ago
sMamooler / CLIP_Explainability

#Computer Science# Code for studying the explainability of OpenAI's CLIP

machine vision, machine learning, model-explainability, vision-language-transformer
Jupyter Notebook 31
3 years ago
yiren-jian / BLIText

[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training

multimodal-deep-learning, vision-language-pretraining, vision-language-transformer
Python 25
2 years ago
unitaryai / VTC

VTC: Improving Video-Text Retrieval with User Comments

multimodal-deep-learning, video-understanding, vision-language-pretraining, vision-language-transformer, comments
Python 11
3 months ago
deepmancer / vlm-toolbox

#Computer Science# Vision-Language Models Toolbox: Your all-in-one solution for multimodal research and experimentation

clip, deep learning, deep-learning-library, multimodal-datasets, multimodal-deep-learning, multimodal-learning, prompt-tuning, vision-and-language, vision-framework, vision-language-transformer, zero-shot-classification, PyTorch, transformers
Jupyter Notebook 10
4 months ago
ThomasVonWu / Awesome-VLMs-Strawberry

#Large Language Models# A collection of VLM papers, blogs, and projects, with a focus on VLMs in autonomous driving and related reasoning techniques.

large language models, multimodal-learning, vision-language-transformer, vlms
10
7 months ago
akusayudodograu / Agentic-RAG-Story-Generation-with-Multimodal-GenAI

Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling

agentic-ai, agentic-rag, agentic-workflow, generative-ai, multimodal, multimodal-deep-learning, multimodal-large-language-models, multimodal-learning, vision-language, vision-language-model, vision-language-transformer
9
9 days ago
marialymperaiou / knowledge-enhanced-multimodal-learning

A list of research papers on knowledge-enhanced multimodal learning

image-text-matching, image-text-retrieval, knowledge-graph, multimodal-deep-learning, vision-and-language, vision-and-language-pre-training, vision-language-transformer, visual-commonsense-reasoning, visual-question-answering, multi-task-learning
7
3 years ago
fork123aniket / Multi-Round-VLM-powered-Multimodal-Conversational-AI-Navigation-Bot

Streamlit App Combining Vision, Language, and Audio AI Models

conversational-agent, conversational-ai, internvl, multimodal, multimodal-deep-learning, multimodal-large-language-models, multimodal-learning, vision-language, vision-language-model, vision-language-transformer, generative-ai
Python 3
5 months ago
fork123aniket / Agentic-RAG-Story-Generation-with-Multimodal-GenAI

Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling

agentic-ai, agentic-rag, agentic-workflow, generative-ai, multimodal, multimodal-deep-learning, multimodal-large-language-models, multimodal-learning, vision-language, vision-language-model, vision-language-transformer
Python 2
5 months ago
PrateekJannu / Vision-GPT

#Large Language Models# Coding a multimodal vision model like GPT-4o from scratch, inspired by @hkproj and PaliGemma

gemini, Google, large language models, large-language-models, Open Source, transformer-architecture, transformer-models, vision-language-model, vision-language-transformer, vision-transformer, artificial intelligence, gpt-4o, machine learning
Python 1
7 months ago
aurooj / VLM_SS

Mini-batch selective sampling for knowledge adaption of VLMs for mammography.

Medical imaging, multimodal-learning, vision-and-language, vision-language-transformer
Jupyter Notebook 1
8 months ago
atharva-naik / MMML-TermProject-VizWiz-VQA-Challenge

#Natural Language Processing# VizWiz Challenge Term Project for Multimodal Machine Learning @ CMU (11777)

carnegie-mellon-university, machine vision, image processing, natural language processing, Open Source, open-source-project, OpenCV, PyTorch, question-answering, vision-language, vision-language-transformer, visual-question-answering
Python 0
2 years ago