# speech-representation

s3prl/s3prl
Python 2.45 k
3 months ago

[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling

Python 1.19 k
7 months ago

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Python 947
9 months ago

A Survey of Spoken Dialogue Models (60 pages)

308
10 months ago

#Large language models# A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability.

Python 98
3 months ago

LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT

Python 74
3 years ago

Ultra-low bitrate speech codec (0.27-1 kbps) with cross-modal alignment and real-time capabilities

Python 64
22 days ago

This is the code for the paper "XY-Tokenizer: Mitigating the Semantic-Acoustic Conflict in Low-Bitrate Speech Codecs". Demos, technical insights and experimental results are presented on

Python 62
2 months ago

Materials for the 音学シンポジウム2025 (Ongaku Symposium 2025) tutorial "Introduction to Multimodal Large Language Models"

Jupyter Notebook 59
3 months ago

A mini, simple, and fast end-to-end automatic speech recognition toolkit.

Jupyter Notebook 53
3 years ago

DUSTED: Spoken-Term Discovery using Discrete Speech Units

Jupyter Notebook 17
1 year ago

#Computer science# Causal Speech Enhancement Based on a Two-Branch Nested U-Net Architecture Using Self-Supervised Speech Embeddings

Python 17
3 months ago

Semi-supervised spoken language understanding (SLU) via self-supervised speech and language model pretraining

Python 12
4 years ago