FlashInfer: Kernel Library for LLM Serving
2023-07-22
2025-09-10T06:10:38Z
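For orientation, a decode-attention call through FlashInfer's Python bindings might look like the sketch below. It assumes the `flashinfer` package's `single_decode_with_kv_cache` entry point and the shapes used for single-request decoding (query `[num_qo_heads, head_dim]`, KV cache `[kv_len, num_kv_heads, head_dim]`); treat it as a sketch, not a verified example.

```python
# Hedged sketch: assumes flashinfer exposes single_decode_with_kv_cache
# for single-request decode attention on CUDA, half precision.
import torch
import flashinfer

num_qo_heads, num_kv_heads, head_dim, kv_len = 32, 32, 128, 2048
q = torch.randn(num_qo_heads, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")
v = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")

# Attention of one new query token against its cached keys/values.
o = flashinfer.single_decode_with_kv_cache(q, k, v)  # -> [num_qo_heads, head_dim]
print(o.shape)
```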
This project aims to reproduce Sora (OpenAI's T2V model), and we hope the open-source community will contribute to it.
Open-Sora: a fully open-source, efficient solution for reproducing Sora-style video generation
#Large Language Models# Code examples and resources for DBRX, a large language model developed by Databricks
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
Devika is an Agentic AI Software Engineer that can understand high-level human instructions, break them down into steps, research relevant information, and write code to achieve the given objective. D...
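As a purely illustrative toy (none of these names come from Devika's codebase), the plan → research → write-code loop described above could be modeled roughly like this:

```python
# Toy agent loop with hypothetical helpers; not Devika's actual API.
from dataclasses import dataclass, field

@dataclass
class ToyAgent:
    objective: str
    notes: list = field(default_factory=list)

    def plan(self):
        # A real agent would ask an LLM to decompose the objective into steps.
        return [("research", self.objective), ("code", self.objective)]

    def research(self, topic):
        # Placeholder for web search / documentation lookup.
        self.notes.append(f"notes on {topic}")

    def write_code(self, task):
        # Placeholder for LLM-driven code generation, conditioned on the notes.
        return f"# solution for: {task} (informed by {len(self.notes)} note(s))"

    def run(self):
        outputs = []
        for kind, payload in self.plan():
            if kind == "research":
                self.research(payload)
            else:
                outputs.append(self.write_code(payload))
        return "\n".join(outputs)

print(ToyAgent("build a URL shortener").run())
```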
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
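The core idea, projecting each 2-D gradient into a low-rank subspace before running the optimizer, can be sketched as follows. This is a simplified illustration (plain momentum instead of GaLore's Adam statistics, made-up hyperparameters), not the library's API.

```python
import torch

def galore_step(param, grad, state, rank=4, update_proj_gap=200, lr=1e-3, beta=0.9):
    """One GaLore-style update for a 2-D weight, illustrative only: project the
    gradient into a rank-`rank` subspace, keep optimizer state there, project back."""
    step = state.setdefault("step", 0)
    # Periodically refresh the projection basis from an SVD of the current gradient.
    if "P" not in state or step % update_proj_gap == 0:
        U, _, _ = torch.linalg.svd(grad, full_matrices=False)
        state["P"] = U[:, :rank]                       # [m, r] orthonormal columns
    P = state["P"]
    g_low = P.T @ grad                                 # [r, n] projected gradient
    # Momentum kept in the low-rank space, so optimizer state is [r, n], not [m, n].
    mom = state.setdefault("mom", torch.zeros_like(g_low))
    mom.mul_(beta).add_(g_low)
    param.data.add_(P @ mom, alpha=-lr)                # map the update back to [m, n]
    state["step"] = step + 1

# Usage on a toy weight:
W = torch.nn.Parameter(torch.randn(256, 128))
loss = (W @ torch.randn(128)).pow(2).sum()
loss.backward()
galore_step(W, W.grad, state={})
```

The memory saving comes from storing optimizer state of shape [rank, n] instead of the full [m, n].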
Training LLMs with QLoRA + FSDP
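A minimal QLoRA setup with Hugging Face transformers + peft might look like the sketch below; the model id and LoRA hyperparameters are placeholders, and the FSDP sharding step (the repo's actual focus) is only noted in a comment because wrapping a 4-bit model needs that repo's specific recipe.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization of the frozen base weights (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",        # placeholder model id
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)

# Trainable low-rank adapters on the attention projections (the "LoRA" part).
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Sharding the quantized base model across GPUs with FSDP needs extra care;
# in practice it is driven by an Accelerate/FSDP config rather than manual wrapping.
```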
[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
Fast and memory-efficient exact attention
Flash Attention in ~100 lines of CUDA (forward pass only)
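Both of the last two entries implement the same core trick: tile the keys/values and maintain a running softmax (row max and normalizer) so the full attention matrix is never materialized. A single-head PyTorch sketch of that online-softmax recurrence, illustrative only (no masking, no dropout; real kernels do this per tile in GPU SRAM):

```python
import torch

def tiled_attention(q, k, v, block_size=64):
    seq_len, head_dim = q.shape
    scale = head_dim ** -0.5
    out = torch.zeros_like(q)                        # running (unnormalized) output
    row_max = torch.full((seq_len,), float("-inf"))  # running row-wise max of scores
    row_sum = torch.zeros(seq_len)                   # running softmax normalizer
    for start in range(0, seq_len, block_size):
        k_blk, v_blk = k[start:start + block_size], v[start:start + block_size]
        scores = (q @ k_blk.T) * scale               # [seq_len, block]
        new_max = torch.maximum(row_max, scores.max(dim=-1).values)
        correction = torch.exp(row_max - new_max)    # rescale the old accumulators
        p = torch.exp(scores - new_max[:, None])
        out = out * correction[:, None] + p @ v_blk
        row_sum = row_sum * correction + p.sum(dim=-1)
        row_max = new_max
    return out / row_sum[:, None]

q, k, v = (torch.randn(512, 64) for _ in range(3))
ref = torch.softmax((q @ k.T) * 64 ** -0.5, dim=-1) @ v
assert torch.allclose(tiled_attention(q, k, v), ref, atol=1e-4)
```

The result matches the naive `softmax(qkᵀ)·v` up to floating-point error, which is why this family of kernels is "exact" attention rather than an approximation.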
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-L...
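Assuming the description refers to the high-level `tensorrt_llm.LLM` / `SamplingParams` API found in recent releases, usage might look roughly like this sketch; the checkpoint name and sampling parameters are placeholders, and exact names may differ by version.

```python
# Hedged sketch of TensorRT-LLM's high-level Python API; names and return
# structure are assumed from recent releases and may differ in your install.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")   # HF checkpoint, built into a TensorRT engine
params = SamplingParams(max_tokens=64, temperature=0.8)

for output in llm.generate(["Summarize paged KV caching in one sentence."], params):
    print(output.outputs[0].text)
```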