GitHub Chinese Community
# image-captioning

salesforce / LAVIS

#Computer Science# LAVIS - A One-stop Library for Language-Vision Intelligence

deep learning, deep-learning-library, image-captioning, salesforce, vision-and-language, vision-framework, vision-language-pretraining, vision-language-transformer, visual-question-anwsering, multimodal-datasets, multimodal-deep-learning
Jupyter Notebook 10.63 k
7 months ago
salesforce / BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

vision-language, vision-and-language-pre-training, image-text-retrieval, image-captioning, visual-question-answering, vision-language-transformer
Jupyter Notebook 5.31 k
10 months ago
OpenGVLab / InternGPT

#Large Language Models# InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. Now it supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, ...

ChatGPT, foundation-model, gpt, gpt-4, gradio, husky, image-captioning, langchain, large language models, multimodal, vqa, llama, vicuna, video-generation, sam, segment-anything, click, draggan
Python 3.21 k
10 months ago
sgrvinod / a-PyTorch-Tutorial-to-Image-Captioning

Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning

PyTorch, pytorch-tutorial, show-attend-and-tell, image-captioning, encoder-decoder, attention-mechanism, computer vision, mscoco
Python 2.85 k
3 years ago
OFA-Sys / OFA

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

multimodal, pretraining, image-captioning, text-to-image-synthesis, visual-question-answering, referring-expression-comprehension, vision-language, pretrained-models, prompt, prompt-tuning, Chinese
Python 2.5 k
1 year ago
ttengwang / Caption-Anything

#Large Language Models# Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/space...

ChatGPT, controllable-generation, segment-anything, controllable-image-captioning, image-captioning
Python 1.75 k
2 years ago
peteanderson80 / bottom-up-attention

Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome

vqa, visual-question-answering, faster-rcnn, caffe, image-captioning, mscoco
Jupyter Notebook 1.45 k
2 years ago
imaginary-cloud / CameraManager

#iOS# Simple Swift class to provide all the configurations you need to create custom camera view in your app

Swift, iOS, camera, video-recording, image-captioning, cocoapods, swift-package-manager, carthage, qrcode-reader
Swift 1.38 k
1 year ago
NVlabs / prismer

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

image-captioning, language-model, multi-modal-learning, multi-task-learning, vision-language-model, vision-and-language, vqa
Python 1.31 k
1 year ago
microsoft / Oscar

Oscar and VinVL

vision-and-language, pre-training, image-captioning, vqa, oscar
Python 1.05 k
2 years ago
jhc13 / taggui

Tag manager and captioner for image datasets

image-captioning, pyside6, stable-diffusion, llava, cogvlm, florence-2
Python 1.02 k
1 month ago
ruotianluo / self-critical.pytorch

Unofficial pytorch implementation for Self-critical Sequence Training for Image Captioning. and others.

image-captioning
Python 1 k
2 years ago
YehLi / xmodaler

X-modaler is a versatile and high-performance codebase for cross-modal analytics(e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense r...

image-captioning, video-captioning, vision-and-language, pretraining, cross-modal-retrieval, visual-question-answering, tden
Python 969
2 years ago
yunjey / show-attend-and-tell

TensorFlow Implementation of "Show, Attend and Tell"

Tensorflow, image-captioning, show-attend-and-tell, attention-mechanism
Jupyter Notebook 907
7 years ago
SkalskiP / awesome-foundation-and-multimodal-models

#Natural Language Processing# 👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]

blip, clip, foundational-models, grounding-dino, llava, multimodal, segment-anything, computer vision, natural language processing, open-vocabulary-detection, open-vocabulary-segmentation, image-captioning
Python 621
1 year ago
kuanghuei / SCAN

#Computer Science# PyTorch source code for "Stacked Cross Attention for Image-Text Matching" (ECCV 2018)

cross-modal, image-captioning, neural networks, deep learning, PyTorch, computer vision
Python 565
2 years ago
kdexd / virtex

[CVPR 2021] VirTex: Learning Visual Representations from Textual Annotations

image-captioning, coco-dataset, pretrained-models, model-zoo, cvpr2021
Python 563
1 year ago
aimagelab / meshed-memory-transformer

Meshed-Memory Transformer for Image Captioning. CVPR 2020

image-captioning, transformer, PyTorch, cvpr2020
Python 538
2 years ago
subho406 / OmniNet

#Natural Language Processing# Official Pytorch implementation of "OmniNet: A unified architecture for multi-modal multi-task learning" | Authors: Subhojeet Pramanik, Priyanka Agrawal, Aman Hussain

machine learning, deep learning, neural networks, artificial intelligence, transformer, natural language processing, image-captioning, video-recognition, multitask-learning, multimodal-learning
Python 512
5 years ago
gokayfem / ComfyUI_VLM_nodes

#Large Language Models# Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation

nodes, comfyui, custom-nodes, llava, large language models, image-captioning, mllm, vlm
Python 496
4 months ago