GitHub Chinese Community

image-text-retrieval

OpenGVLab / InternVL

#Large Language Models# [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model that approaches GPT-4o performance.

image-classification, image-text-retrieval, large language models, semantic-segmentation, video-classification, vision-language-model, vit-22b, vit-6b, multi-modal, gpt, gpt-4v, gpt-4o
Python 8.33 k
17 days ago
salesforce / BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

vision-language, vision-and-language-pre-training, image-text-retrieval, image-captioning, visual-question-answering, vision-language-transformer
Jupyter Notebook 5.31 k
10 months ago
OFA-Sys / Chinese-CLIP

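#Natural Language Processing# A Chinese version of the CLIP model, trained on large-scale Chinese data (~200 million image-text pairs). It is intended to help users quickly compute image-text features and similarity, perform cross-modal retrieval, and run zero-shot image classification in Chinese-language settings.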

Chinese, computer vision, multi-modal-learning, natural language processing, PyTorch, vision-and-language-pre-training, image-text-retrieval, clip, pretrained-models, vision-language, deep learning, multi-modal, contrastive-loss, transformers, coreml-models
Python 5.28 k
10 months ago
slavabarkov / tidy

#Natural Language Processing# Offline semantic Text-to-Image and Image-to-Image search on Android, powered by a quantized state-of-the-art vision-language pretrained CLIP model and the ONNX Runtime inference engine

Android, clip, computer vision, deep learning, image-retrieval, Kotlin, natural language processing, onnx, quantization, image-text-retrieval, cross-modal-retrieval, image-text-matching, image-search, semantic-search
Kotlin 442
1 year ago
Paranioar / Awesome_Matching_Pretraining_Transfering

#Awesome#The Paper List of Large Multi-Modality Model (Perception, Generation, Unification), Parameter-Efficient Finetuning, Vision-Language Pretraining, Conventional Image-Text Matching for Preliminary Insigh...

cross-modal-retrieval, tutorial, Awesome Lists, image-text-matching, image-text-retrieval, large-language-models, large-vision-language-models, multimodal-pretraining, parameter-efficient-fine-tuning, vision-and-language, multimodal-large-language-models, large language models, text-to-image-generation, text-to-image-synthesis, text-to-video-generation
423
6 months ago
greyovo / PicQuery

#Android# 🔍 Search local images with natural language on Android, powered by OpenAI's CLIP model.

Android, clip, image-text-retrieval, material-design-3, openai, Jetpack Compose
Kotlin 413
4 months ago
Paranioar / SGRAF

[AAAI2021] The code of “Similarity Reasoning and Filtration for Image-Text Matching”

cross-modal-retrieval, image-text-matching, image-retrieval, image-text-retrieval, text-matching, aaai
Python 215
1 year ago
chuhaojin / Text2Poster-ICASSP-22

#Computer Science# Official implementation of the ICASSP-2022 paper "Text2Poster: Laying Out Stylized Texts on Retrieved Images"

aigc, deep learning, multimodal-generation, image processing, image-retrieval, artificial-neural-networks, PyTorch, object-detection, image-text-retrieval
Python 211
1 year ago
alipay / Ant-Multi-Modal-Framework

Research Code for Multimodal-Cognition Team in Ant Group

image-text-retrieval, multimodal-learning, video-editing
Python 150
1 month ago
howard-hou / BagFormer

PyTorch code for BagFormer: Better Cross-Modal Retrieval via bag-wise interaction

cross-modal-retrieval, image-text-retrieval, vision-language
Python 99
2 years ago
X-PLUG / mPLUG

mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022)

image-captioning, image-text-retrieval, multimodal, pretraining, PyTorch, transformer, vqa
Python 92
2 years ago
hpc203 / Chinese-CLIP-opencv-onnxrun

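Deploys Chinese CLIP with OpenCV and onnxruntime for text-to-image search: given a sentence describing the desired picture, it retrieves the matching images from a gallery. Includes both C++ and Python versions of the program.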

clip, image-text-retrieval, opencv-dnn, multimodal-large-language-models
C++ 75
1 year ago
MILVLG / rosita

ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration

vision-and-language, vqa, pre-training, image-text-retrieval, referring-expression-comprehension
Python 56
2 years ago
cobanov / image-captioning

Image captioning using Python and BLIP

image-captioning, blip, image-text-retrieval, vision-language
Python 47
2 years ago
eric-ai-lab / ComCLIP

Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"

blip2, causality, clip, compositionality, image-text-matching, image-text-retrieval, vision-and-language
Python 35
10 months ago
eric-ai-lab / CPL

Official implementation of our EMNLP 2022 paper "CPL: Counterfactual Prompt Learning for Vision and Language Models"

causal-inference, image-classification, image-text-retrieval, prompt-tuning, vision-and-language, vqa
Python 34
3 years ago
Paranioar / RCAR

[TIP2023] The code of “Plug-and-Play Regulators for Image-Text Matching”

cross-modal-retrieval, image-text-matching, image-retrieval, image-text-retrieval, text-matching, tip
Python 33
1 year ago
ytaek-oh / fsc-clip

[EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality

image-text-retrieval, zero-shot-classification, compositionality
Python 16
8 months ago
alipay / PC2-NoiseofWeb

Noise of Web (NoW) is a challenging noisy correspondence learning (NCL) benchmark containing 100K image-text pairs for robust image-text matching/retrieval models.

benchmark, cross-modal-retrieval, dataset, image-text-matching, image-text-retrieval, multimodal-learning
Python 12
7 months ago
frank-chris / ImageTextRetrieval

In this work, we implement different cross-modal learning schemes such as Siamese Network, Correlational Network and Deep Cross-Modal Projection Learning model and study their performance. We also pro...

image-text-retrieval, cross-modal-retrieval, cross-modal-learning, PyTorch, Tensorflow, Flask
Jupyter Notebook 11
4 years ago