GitHub Chinese Community

vision-language

IDEA-Research / GroundingDINO

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

object-detection, open-world, open-world-detection, vision-language, vision-language-transformer
Python 8.58 k
1 year ago
OFA-Sys / Chinese-CLIP

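#Natural Language Processing# A Chinese version of the CLIP model, trained on large-scale Chinese data (~200 million image-text pairs). It is intended to help users quickly perform image-text feature and similarity computation, cross-modal retrieval, and zero-shot image classification in Chinese-language settings.

chinese, computer-vision, multi-modal-learning, nlp, PyTorch, vision-and-language-pre-training, image-text-retrieval, clip, pretrained-models, vision-language, deep-learning, multi-modal, contrastive-loss, transformers, coreml-models
Python 5.43 k
1 year ago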
salesforce / BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

vision-language, vision-and-language-pre-training, image-text-retrieval, image-captioning, visual-question-answering, vision-language-transformer
Jupyter Notebook 5.41 k
1 year ago
marqo-ai / marqo

#Search# Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai

deep-learning, information-retrieval, machine-learning, vector-search, tensor-search, clip, multi-modal, search-engine, transformers, vision-language, semantic-search, visual-search, nlp, hnsw, knn, Hacktoberfest, ChatGPT, gpt, large-language-models
Python 4.91 k
12 hours ago
OFA-Sys / OFA

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

multimodal, pretraining, image-captioning, text-to-image-synthesis, visual-question-answering, referring-expression-comprehension, vision-language, pretrained-models, prompt, prompt-tuning, chinese
Python 2.51 k
1 year ago
AlibabaResearch / AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

artificial-intelligence, documentai, multimodal, multimodal-deep-learning, OCR, computer-vision, vision-language-transformer, end-to-end-ocr, scene-text-detection, scene-text-detection-recognition, scene-text-recognition, text-detection, text-recognition, vision-language, document, document-analysis, document-recognition, document-understanding, document-intelligence, vision-language-model
C++ 1.75 k
4 months ago
mbzuai-oryx / Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for ...

chatbot, clip, gpt-4, llama, llava, vicuna, vision-language, vision-language-pretraining
Python 1.41 k
4 months ago
llm-jp / awesome-japanese-llm

#Large Language Models# 日本語LLMまとめ (Japanese LLM roundup) - Overview of Japanese LLMs

language-model, language-models, llm, large-language-models, japanese, japanese-language, vision-and-language, foundation-models, multimodal, vision-language, vision-language-model, generative-ai, generative-model, generative-models
TypeScript 1.21 k
18 days ago
OpenDriveLab / DriveLM

#Large Language Models# [ECCV 2024 Oral] DriveLM: Driving with Graph Visual Question Answering

autonomous-driving, large-language-models, vision-language, chain-of-thought, graph-of-thoughts, llm, prompting, tree-of-thoughts, prompt-engineering
HTML 1.11 k
1 month ago
OFA-Sys / ONE-PEACE

A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

foundation-models, multimodal, representation-learning, vision-language, audio-language, vision-and-language, vision-transformer, contrastive-loss
Python 1.05 k
10 months ago
2U1 / Qwen2-VL-Finetune

An open-source implementation for fine-tuning the Qwen2-VL and Qwen2.5-VL series by Alibaba Cloud.

chatbot, multimodal, qwen2-vl, vision-language, vision-language-model, qwen2-5
Python 1.01 k
6 days ago
google-research / pix2seq

#Computer Science# Pix2Seq codebase: multi-tasks with generative modeling (autoregressive and diffusion)

object-detection, computer-vision, vision-language, deep-learning, tensorflow2
Jupyter Notebook 920
2 years ago
TinyLLaVA / TinyLLaVA_Factory

#Natural Language Processing# A Framework of Small-scale Large Multimodal Models

large-multimodal-models, llama, llava, nlp, transformers, vision-language
Python 863
3 months ago
mbzuai-oryx / LLaVA-pp

#Large Language Models# 🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)

conversation, llama3, llava, llm, lmms, phi3, vision-language, llama-3-llava, llama-3-vision, llama3-llava, phi-3-vision, phi3-vision
Python 839
1 year ago
SunzeY / AlphaCLIP

#Computer Science# [CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

deep-learning, machine-learning, vision-language, vision-language-model, vision-transformer, vision-and-language
Jupyter Notebook 833
11 days ago
Algolzw / daclip-uir

#Computer Science# [ICLR 2024] Controlling Vision-Language Models for Universal Image Restoration. 5th place in the NTIRE 2024 Restore Any Image Model in the Wild Challenge.

diffusion-models, image-restoration, prompt, vision-language, image-deblurring, image-denoising, image-deraining, low-level-vision, PyTorch, deep-learning
Python 775
1 year ago
longzw1997 / Open-GroundingDino

A third-party implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection".

object-detection, open-world, open-world-detection, vision-language
Python 656
5 days ago
mees / calvin

#Natural Language Processing# CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks

nlp, Robotics, deep-learning, grounding, vision-language, manipulation, computer-vision, PyTorch, vision, vision-and-language
Python 635
24 days ago
AILab-CVC / SEED

Official implementation of SEED-LLaMA (ICLR 2024).

foundation-model, multimodal, vision-language
Python 618
10 months ago
cliport / cliport

#Natural Language Processing# CLIPort: What and Where Pathways for Robotic Manipulation

clip, Robotics, vision, deep-learning, nlp, grounding, vision-language, manipulation, PyTorch, rearrangement, computer-vision
Jupyter Notebook 504
2 years ago