The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
#Vector search engine# A vector similarity search library that provides efficient similarity search and clustering for dense vectors
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
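Fairseq is a Seq2seq modeling toolkit written in Python for training custom models for translation, summarization, language modeling, and other text generation tasks
Open-Sora: a fully open-source, efficient reproduction of Sora-like video generation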
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.
[CVPR 2024] Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework.
High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese and Korean.
Official repository for Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis; ICLR 2024 Spotlight; Official code
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing, etc.
OCR, layout analysis, reading order, table recognition in 90+ languages
The #1 open-source voice interface for desktop, mobile, and ESP32 chips.
#Computer science# Unified Training of Universal Time Series Forecasting Transformers
Code of [CVPR 2024] "Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling"
High-Fidelity Lip-Syncing with Wav2Lip and Real-ESRGAN
#Natural language processing# DAMO-ConvAI: The official repository containing the codebase for Alibaba DAMO Conversational AI.
DeepSeek-VL: Towards Real-World Vision-Language Understanding