#video-captioning

YehLi / xmodaler

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense r...

image-captioning, video-captioning, vision-and-language, pretraining, cross-modal-retrieval, visual-question-answering, tden
Python 969
2 years ago
xiadingZ / video-caption.pytorch

#Computer Science# PyTorch implementation of video captioning

PyTorch, video-captioning, deep-learning
Python 399
6 years ago
scopeInfinity / Video2Description

Video to Text: Natural language description generator for some given video. [Video Captioning]

deep-neural-networks, cnn-keras, image-captioning, video-captioning, video-processing, audio-processing
Python 348
3 years ago
xid32 / NAACL_2025_TWM

We introduce temporal working memory (TWM), which aims to enhance the temporal modeling capabilities of Multimodal foundation models (MFMs). This plug-and-play module can be easily integrated into exi...

multimodal-large-language-models, audio-visual-learning, question-answering, video-captioning
Python 309
5 months ago
tomchang25 / whisper-auto-transcribe

#Computer Science# Auto-transcription tool based on Whisper

asr, text-to-speech, deep-learning, speech-recognition, speech-to-text, language-model, PyTorch, speech-processing, voice-activity-detection, gradio, gradio-interface, video-captioning
Python 225
2 years ago
antoyang / VidChapters

[NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale

multimodal-learning, pre-training, video-captioning, video-understanding, vision-and-language
Jupyter Notebook 190
2 years ago
jayleicn / recurrent-transformer

[ACL 2020] PyTorch code for MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning

PyTorch, video-captioning
Jupyter Notebook 171
5 years ago
vijayvee / video-captioning

This repository contains the code for a video captioning system inspired by Sequence to Sequence -- Video to Text. This system takes as input a video and generates a caption in English describing the ...

video-captioning, Tensorflow, sequence-to-sequence, multimodal-deep-learning, seq2seq
Python 166
6 years ago
JasonYao81000 / MLDS2018SPRING

Machine Learning and having it Deep and Structured (MLDS) in 2018 spring

ntu, seq2seq, sequence-to-sequence, Generative Adversarial Network, reinforcement-learning, policy-gradient, deep-q-network, actor-critic, chatbot, video-captioning, image-generation, text-to-image, 2018Spring
Python 145
6 years ago
bytedance / Shot2Story

A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.

benchmark, dataset, large-language-models, video-language-pretraining, video-question-answering, vision-language, video-captioning, research
Python 136
5 months ago
jpthu17 / EMCL

[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

cross-modal-retrieval, neurips, video-captioning, video-question-answering
Python 134
1 year ago
jssprz / video_captioning_datasets

Summary about Video-to-Text datasets. This repository is part of the review paper *Bridging Vision and Language from the Video-to-Text Perspective: A Comprehensive Review*

video-captioning, vision-and-language, code-review, state-of-the-art
Jupyter Notebook 123
2 years ago
terry-r123 / Awesome-Captioning

A curated list of multimodal captioning research (including image captioning, video captioning, and text captioning)

image-captioning, video-captioning
110
3 years ago
jayleicn / TVCaption

[ECCV 2020] PyTorch code of MMT (a multimodal transformer captioning model) on TVCaption dataset

video-captioning, dataset, PyTorch
Python 90
2 years ago
Kamino666 / Video-Captioning-Transformer

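This is a deep learning model for video captioning, implemented in PyTorch with a Transformer architecture. The video captioning task takes a video as input and outputs one sentence describing the content of the whole video (assuming the video is short and can be described in a single sentence). The main goal of this repo is to help visually impaired people enjoy online videos and perceive their surroundings, and to promote the development of "accessible video".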

PyTorch, transformer, video-captioning
Python 89
3 years ago
nasib-ullah / video-captioning-models-in-Pytorch

#Computer Science# A PyTorch implementation of state-of-the-art video captioning models from 2015-2019 on the MSVD and MSRVTT datasets.

video-captioning, deep-learning, sequence-to-sequence, PyTorch, pytorch-implementation, Video
Python 70
2 years ago
ParitoshParmar / MTL-AQA

What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]

multitask-learning, video-understanding, video-processing, video-captioning, PyTorch, action-recognition, representation-learning, lstm, captioning
Python 68
1 month ago
UARK-AICV / VLTinT

[AAAI 2023 Oral] VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning

aaai2023, transformer-architecture, video-captioning, vision-language, PyTorch
Jupyter Notebook 66
1 year ago
amazon-science / crossmodal-contrastive-learning

#Natural Language Processing# CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations, ICCV 2021

multi-modality, Video, video-captioning, computer-vision, natural-language-processing, transformers, contrastive-learning
Python 63
3 years ago
jacobswan1 / Video2Commonsense

Video captioning baseline models on Video2Commonsense Dataset.

video-captioning
Python 56
4 years ago