kv-cache

HDT3213 / godis

A Redis server and distributed cluster implemented in Go.

Tags: kv-cache, Go, redis-server, Redis, godis, cluster, redis-cluster
Go · 3.73k stars
Updated 1 month ago

LMCache / LMCache

#LLM# Supercharge Your LLM with the Fastest KV Cache Layer

Tags: amd, CUDA, inference, kv-cache, LLM, PyTorch, rocm, vllm, fast, speed
Python · 3.72k stars
Updated 21 hours ago

MemTensor / MemOS

#LLM# MemOS (Preview) | Intelligence Begins with Memory

Tags: agent, kv-cache, language-model, LLM, lora, memory, Neo4j, tree, llm-memory, long-term-memory, memory-management, rag, retrieval-augmented-generation
Python · 2.08k stars
Updated 2 days ago

Zefan-Cai / KVCache-Factory

#LLM# Unified KV Cache Compression Methods for Auto-Regressive Models

Tags: kv-cache, LLM
Python · 1.22k stars
Updated 7 months ago

harleyszhang / llm_note

#LLM# LLM notes covering model inference, transformer model structure, and LLM framework code analysis.

Tags: LLM, llm-inference, vllm, cuda-programming, kv-cache, transformer-models
Python · 805 stars
Updated 15 hours ago

therealoliver / Deepdive-llama3-from-scratch

Implement Llama 3 inference step by step: grasp the core concepts, work through the derivations, and write the code. (A minimal KV-cache decoding sketch follows this entry.)

Tags: inference, kv-cache, llama, LLM, attention, attention-mechanism, gpt, language-model, mask, parsing, transformer
Jupyter Notebook · 602 stars
Updated 5 months ago

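For readers following a from-scratch walkthrough like the one above, a minimal sketch of what a KV cache does during autoregressive decoding may help. It is an illustrative toy (single attention head, NumPy, made-up dimensions), not code from that repository.

    # Toy single-head attention decode loop with a KV cache (NumPy).
    # Illustrative only; dimensions and projections are made up.
    import numpy as np

    d = 8                      # head dimension (assumed)
    rng = np.random.default_rng(0)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

    k_cache, v_cache = [], []  # grows by one entry per generated token

    def decode_step(x):
        """x: embedding of the newest token, shape (d,)."""
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        k_cache.append(k)      # cache K/V so earlier tokens are never re-projected
        v_cache.append(v)
        K = np.stack(k_cache)  # (t, d): keys of all tokens seen so far
        V = np.stack(v_cache)
        scores = K @ q / np.sqrt(d)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ V     # attention output for the newest token

    for _ in range(5):         # pretend we generate 5 tokens
        out = decode_step(rng.standard_normal(d))
    print("cached tokens:", len(k_cache), "output shape:", out.shape)
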
NVIDIA / kvpress

#LLM# LLM KV cache compression made easy

Tags: LLM, inference, kv-cache, long-context, Python, PyTorch, transformers, large-language-models
Python · 560 stars
Updated 2 days ago

FMInference / H2O

[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models. (A loose eviction sketch in this spirit follows this entry.)

Tags: gpt-3, high-throughput, kv-cache, large-language-models, sparsity
Python · 462 stars
Updated 1 year ago

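The heavy-hitter idea named in the title is to keep only the cache entries whose tokens have received most of the accumulated attention mass, plus the most recent tokens. The sketch below is a loose, hedged illustration of that kind of score-based eviction, not the authors' implementation; the budget and window sizes are assumptions.

    # Loose illustration of heavy-hitter-style KV cache eviction (NumPy).
    # Keeps the entries with the largest accumulated attention mass plus the
    # most recent tokens; a sketch, not the H2O reference code.
    import numpy as np

    def evict(k_cache, v_cache, attn_history, budget=64, keep_recent=16):
        """k_cache, v_cache: (t, d); attn_history: (t,) accumulated attention per token."""
        t = k_cache.shape[0]
        if t <= budget:
            return k_cache, v_cache, attn_history
        recent = np.arange(t - keep_recent, t)                     # always keep newest tokens
        older = np.arange(t - keep_recent)
        n_heavy = budget - keep_recent
        heavy = older[np.argsort(attn_history[older])[-n_heavy:]]  # top accumulated scores
        keep = np.sort(np.concatenate([heavy, recent]))
        return k_cache[keep], v_cache[keep], attn_history[keep]

    rng = np.random.default_rng(0)
    k, v = rng.standard_normal((200, 8)), rng.standard_normal((200, 8))
    scores = rng.random(200)
    k2, v2, s2 = evict(k, v, scores)
    print(k2.shape)  # (64, 8): cache trimmed to the budget
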
dipampaul17 / KVSplit

#LLM# Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit keys & 4-bit values, reducing memory by 59% with <1% quality loss. (An illustrative quantization sketch follows this entry.)

Tags: apple-silicon, generative-ai, kv-cache, llama-cpp, LLM, m1, m3, memory-optimization, metal, optimization, quantization
Python · 356 stars
Updated 2 months ago

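KVSplit itself works through llama.cpp on Apple Silicon; the sketch below is only a back-of-the-envelope illustration of the differentiated-precision idea (keys at 8 bits, values at 4 bits) and the resulting memory ratio. The helper names, scaling scheme, and sizes are assumptions for illustration, not the project's implementation.

    # Back-of-the-envelope sketch of differentiated-precision KV quantization
    # (8-bit keys, 4-bit values) with per-row symmetric scaling. Illustrative only.
    import numpy as np

    def quantize(x, bits):
        qmax = 2 ** (bits - 1) - 1
        scale = np.abs(x).max(axis=-1, keepdims=True) / qmax + 1e-12
        q = np.clip(np.round(x / scale), -qmax, qmax)
        return q, scale

    def dequantize(q, scale):
        return q * scale

    rng = np.random.default_rng(0)
    K = rng.standard_normal((1024, 128)).astype(np.float32)   # fake cached keys
    V = rng.standard_normal((1024, 128)).astype(np.float32)   # fake cached values

    qK, sK = quantize(K, bits=8)   # keys kept at higher precision
    qV, sV = quantize(V, bits=4)   # values tolerate coarser precision

    fp16_bytes = (K.size + V.size) * 2
    mixed_bytes = K.size * 1 + V.size * 0.5       # 1 byte per key element, 0.5 per value element
    print(f"memory vs fp16: {mixed_bytes / fp16_bytes:.0%}")
    print("mean abs error  K:", np.abs(K - dequantize(qK, sK)).mean(),
          " V:", np.abs(V - dequantize(qV, sV)).mean())
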
Zefan-Cai / Awesome-LLM-KV-Cache

#LLM# Awesome-LLM-KV-Cache: a curated list of 📙 awesome LLM KV cache papers with code.

Tags: kv-cache, LLM
342 stars
Updated 5 months ago

raymin0223 / mixture_of_recursions

#LLM# Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation

Tags: kv-cache, LLM, router
Python · 332 stars
Updated 3 days ago

NVIDIA-Merlin / HierarchicalKV

HierarchicalKV is part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. Its key capability is storing key-value feature embeddings on high-bandwidth GPU memory and in host memory. (A toy two-tier sketch follows this entry.)

Tags: CUDA, gpu, hashtable, recommender-system, key-value-store, kv-cache
Cuda · 163 stars
Updated 2 days ago

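HierarchicalKV is a CUDA library; as a language-agnostic illustration of the hierarchical idea (a small fast tier backed by a larger slow tier, with eviction and promotion between them), here is a toy Python sketch. The LRU policy and tier sizes are assumptions, not the library's actual design.

    # Toy two-tier key-value store: a small "fast" tier (think GPU HBM) backed by
    # a larger "slow" tier (think host memory), with LRU eviction from the fast tier.
    from collections import OrderedDict

    class TwoTierKV:
        def __init__(self, fast_capacity=4):
            self.fast = OrderedDict()   # hot entries, bounded size
            self.slow = {}              # everything else
            self.fast_capacity = fast_capacity

        def put(self, key, value):
            self.fast[key] = value
            self.fast.move_to_end(key)
            if len(self.fast) > self.fast_capacity:
                old_key, old_val = self.fast.popitem(last=False)  # evict LRU entry
                self.slow[old_key] = old_val                      # demote to slow tier

        def get(self, key):
            if key in self.fast:
                self.fast.move_to_end(key)        # refresh recency
                return self.fast[key]
            if key in self.slow:
                value = self.slow.pop(key)        # promote on access
                self.put(key, value)
                return value
            return None

    kv = TwoTierKV()
    for i in range(10):
        kv.put(f"emb:{i}", [float(i)])
    print(kv.get("emb:0"), len(kv.fast), len(kv.slow))   # emb:0 promoted back to fast
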
itsnamgyu / block-transformer

#LLM# Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)

Tags: kv-cache, LLM, llm-inference
Python · 160 stars
Updated 4 months ago

kddubey / cappr

Completion After Prompt Probability: make your LLM make a choice. (A conceptual scoring sketch follows this entry.)

Tags: text-classification, zero-shot, huggingface, prompt-engineering, llamacpp, probability, llm-inference, kv-cache
Python · 80 stars
Updated 9 months ago

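The underlying idea, as the description states, is to score each candidate completion by its probability after the prompt and pick the most probable one. The sketch below illustrates that scoring loop with a stand-in log-probability function; it is not cappr's API, and the toy scorer is purely for demonstration.

    # Conceptual "completion after prompt probability" sketch: sum per-token
    # log-probs of each candidate completion and pick the argmax.
    # The log-prob function here is a stand-in, not cappr's API.
    import math

    def completion_logprob(prompt, completion, token_logprob):
        total, context = 0.0, prompt
        for tok in completion.split():
            total += token_logprob(context, tok)   # log P(tok | context)
            context += " " + tok
        return total

    def choose(prompt, candidates, token_logprob):
        return max(candidates, key=lambda c: completion_logprob(prompt, c, token_logprob))

    # Toy stand-in "model": slightly favors tokens already present in the context.
    def toy_token_logprob(context, tok):
        return math.log(0.5) if tok.lower() in context.lower() else math.log(0.1)

    print(choose("to be or not to be", ["to be", "a sandwich"], toy_token_logprob))
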
aju22 / LLaMA2

#NLP# This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT) variant. The implementation focuses on the model architecture ...

Tags: attention, gpt, kv-cache, llama, llama2, LLM, NLP, transformer
Python · 69 stars
Updated 2 years ago

hkproj / pytorch-llama-notes

Notes about the LLaMA 2 model

Tags: attention-is-all-you-need, kv-cache, llama2, study-notes
Python · 66 stars
Updated 2 years ago

DRSY / EasyKV

#LLM# Easy control for Key-Value Constrained Generative LLM Inference (https://arxiv.org/abs/2402.06262)

Tags: kv-cache, LLM
Python · 63 stars
Updated 1 year ago

MaxBelitsky / cache-steering

#LLM# KV Cache Steering for Inducing Reasoning in Small Language Models

Tags: kv-cache, large-language-models, LLM, reasoning
Python · 35 stars
Updated 8 days ago

NoakLiu / PiKV

PiKV: MoE KV Cache Management System [Efficient ML System]

Tags: moe, parallel-computing, kv-cache, management-system, mixture-of-experts
Python · 14 stars
Updated 1 month ago

DongmingShenDS / Mistral_From_Scratch

Mistral and Mixtral (MoE) implemented from scratch. (A rolling-buffer KV cache sketch follows this entry.)

Tags: kv-cache, large-language-models, mistral-7b, mixtral-8x7b, mixture-of-experts
Python · 9 stars
Updated 1 year ago

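Mistral's published architecture pairs sliding-window attention with a rolling-buffer KV cache in which position i is written to slot i mod W, so cache memory stays bounded by the window size. Below is a minimal NumPy sketch of such a buffer with made-up sizes; it is not taken from this repository's code.

    # Minimal rolling-buffer KV cache as used with sliding-window attention:
    # position i overwrites slot i % window, so memory stays bounded at `window`.
    import numpy as np

    class RollingKVCache:
        def __init__(self, window=4, d=8):
            self.window = window
            self.k = np.zeros((window, d))
            self.v = np.zeros((window, d))
            self.pos = 0                         # number of tokens seen so far

        def append(self, k, v):
            slot = self.pos % self.window        # oldest entry gets overwritten
            self.k[slot], self.v[slot] = k, v
            self.pos += 1

        def view(self):
            """Return cached K/V in chronological order (at most `window` entries)."""
            n = min(self.pos, self.window)
            order = [(self.pos - n + i) % self.window for i in range(n)]
            return self.k[order], self.v[order]

    cache = RollingKVCache()
    rng = np.random.default_rng(0)
    for _ in range(10):
        cache.append(rng.standard_normal(8), rng.standard_normal(8))
    K, V = cache.view()
    print(K.shape)  # (4, 8): only the last `window` tokens are kept
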