# ml-efficiency

MoDM is a cache-aware, hybrid serving system that accelerates image generation by dynamically combining small and large diffusion models for efficient, high-quality output.

Python 3
2 months ago

#Large Language Models# (Unofficial) Building Hugging Face SmolLM, a blazingly fast and small language model, with a PyTorch implementation of grouped query attention (GQA)

Python 1
9 months ago