#Computer Science#OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
#Awesome#A curated list of action recognition and related area resources
#Large Language Models#[CVPR 2024 Highlight][VideoChatGPT] ChatGPT with video understanding! Also supports many more LMs, such as miniGPT4, StableLM, and MOSS.
[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
An open-source toolbox for action understanding based on PyTorch
Built on a modular design, it provides rich video algorithm implementations and industrial-grade video algorithm optimization and applications, including action localization and recognition, behavior analysis, smart thumbnails, video annotation, and video tagging for industries such as security, sports, internet, and media; it covers techniques including action recognition and video classification, action localization, action detection, and multimodal text-video retrieval.
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Code & Models for Temporal Segment Networks (TSN) in ECCV 2016
SALMONN family: A suite of advanced multi-modal LLMs
#Natural Language Processing#awesome grounding: A curated list of research papers in visual grounding
#Computer Science#Temporal Segment Networks (TSN) in PyTorch
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
Temporal action detection with SSN
🔥 A paper list of recent Computer Vision (CV) works
Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding
[ICCV 2023] MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions
#Data Warehouse#A collection of recent video understanding datasets, under construction!