GitHub Chinese Community

#video-understanding

open-mmlab / mmaction2

#Computer Science# OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark

action-recognition, temporal-action-localization, PyTorch, video-understanding, tsn, i3d, slowfast, ava, spatial-temporal-action-detection, benchmark, tsm, non-local, deep learning, openmmlab, video-classification
Python 4.71 k
1 year ago

jinwchoi / awesome-action-recognition

#Awesome# A curated list of action recognition and related area resources

Awesome Lists, action-recognition, action-detection, activity-recognition, video-understanding, video-recognition, video-processing, object-recognition, pose-estimation
3.92 k
2 years ago

OpenGVLab / Ask-Anything

#Large Language Models# [CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

captioning-videos, ChatGPT, gradio, langchain, video-question-answering, video-understanding, stablelm, chat, Video, big-model, foundation-models, large-language-models
Python 3.28 k
6 months ago

mit-han-lab / temporal-shift-module

[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding

acceleration, low-latency, temporal-modeling, video-understanding, efficient-model, nvidia-jetson-nano, tsm
Python 2.13 k
1 year ago

OpenGVLab / InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

foundation-models, video-understanding, vision-transformer, action-recognition, multimodal, temporal-action-localization, video-question-answering, zero-shot-classification, benchmark, contrastive-learning, self-supervised, instruction-tuning, video-clip
Python 1.98 k
1 month ago

open-mmlab / mmaction

An open-source toolbox for action understanding based on PyTorch

action-recognition, action-detection, video-understanding, PyTorch, temporal-action-detection, temporal-action-localization, spatial-temporal-action-detection
Python 1.87 k
3 years ago

PaddlePaddle / PaddleVideo

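Built on a modular design, it provides a rich set of video algorithm implementations along with industrial-grade video algorithm optimization and applications, including action localization and recognition, behavior analysis, smart cover generation, video annotation, and video tagging for industries such as security, sports, internet, and media. It covers action recognition and video classification, action localization, action detection, and multimodal text-video retrieval.

video-recognition, tsm, slowfast, tsn, bmn, action-recognition, youtube-8m, kinetics400, video-understanding, activitynet, action-detection, temporal-action-detection, ava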
Python 1.63 k
6 months ago

yjxiong / temporal-segment-networks

Code & Models for Temporal Segment Networks (TSN) in ECCV 2016

action-recognition, video-understanding
Python 1.56 k
5 years ago

MCG-NJU / VideoMAE

[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

self-supervised-learning, action-recognition, video-understanding, transformer, vision-transformer, PyTorch, video-analysis, neurips-2022
Python 1.55 k
2 years ago

bytedance / SALMONN

SALMONN family: A suite of advanced multi-modal LLMs

audio, audio-processing, large-language-models, multi-modal, speech, speech-recognition, bytedance, tsinghua-university, music, iclr2024, research, Video, video-understanding
Python 1.29 k
24 days ago

TheShadow29 / awesome-grounding

#Natural Language Processing# awesome grounding: A curated list of research papers in visual grounding

machine vision, natural language processing, grounding, Awesome Lists, papers, arxiv, video-understanding, captioning-videos, embodied-agent, multimodal-deep-learning, language-grounding, Bukkit
1.09 k
8 days ago

yjxiong / tsn-pytorch

#Computer Science# Temporal Segment Networks (TSN) in PyTorch

action-recognition, deep learning, video-understanding, PyTorch
Python 1.08 k
6 years ago

zai-org / GLM-4.1V-Thinking

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning.

image2text, video-understanding, vlm, reasoning
Python 950
11 days ago

PKU-YuanGroup / Chat-UniVi

[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

image-understanding, large-language-models, video-understanding, vision-language-model
Python 944
9 months ago

OpenGVLab / VideoMAEv2

[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking

cvpr2023, foundation-model, self-supervised-learning, video-understanding, action-detection, action-recognition, temporal-action-detection
Python 668
10 months ago

yjxiong / action-detection

temporal action detection with SSN

action-recognition, action-detection, video-understanding
Python 645
6 years ago

Vision-CAIR / MiniGPT4-video

Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding

video-question-answering, video-understanding
Python 627
8 months ago

cuixing158 / Awesome-CV-MasterHub

🔥 🔥 🔥 A paper list of some recent Computer Vision(CV) works

Awesome Lists, image-captioning, image-classification, image-denoising, image-enhancement, image-generation, keypoint-detection, object-detection, panoptic-segmentation, pose-estimation, video-generation, video-understanding, vision-transformer, paper-list, image-segmentation, low-level-vision
552
1 day ago

aws-samples / swift-chat

#Android# A lightning-fast, cross-platform AI chat application built with React Native.

Android, Amazon Web Services, iOS, macOS, React Native, image-generation, bedrock-client, mobile-app, video-understanding, swift-chat, swiftchat, amazon-bedrock, amazon-nova, chat, ollama, ollama-client, deepseek, claude-4-sonnet, gpt-4
TypeScript 545
4 days ago

henghuiding / MeViS

[ICCV 2023] MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions

multimodal-learning, referring-expression-comprehension, referring-expression-segmentation, referring-video-object-segmentation, video-understanding
Python 530
1 year ago