GitHub Chinese Community

#kv-cache

HDT3213 / godis

A Redis server and distributed cluster implemented in Go

kv-cache, Go, redis-server, Redis, godis, cluster, redis-cluster
Go 3.7k
14 days ago

Zefan-Cai / KVCache-Factory

#LLM# Unified KV Cache Compression Methods for Auto-Regressive Models

kv-cache, llm
Python 1.12k
5 months ago

harleyszhang / llm_note

#LLM# LLM notes, including model inference, transformer model structure, and LLM framework code analysis notes.

llm, llm-inference, vllm, cuda-programming, kv-cache, transformer-models
Python 778
5 days ago

therealoliver / Deepdive-llama3-from-scratch

Implement llama3 inference step by step: grasp the core concepts, master the derivations, and write the code.

inference, kv-cache, llama, llm, attention, attention-mechanism, gpt, language-model, mask, Parsing, transformer
Jupyter Notebook 586
4 months ago

NVIDIA / kvpress

#LLM# LLM KV cache compression made easy

llm, inference, kv-cache, long-context, Python, PyTorch, transformers, large-language-models
Python 499
6 days ago

FMInference / H2O

[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.

gpt-3, high-throughput, kv-cache, large-language-models, sparsity
Python 449
10 months ago

dipampaul17 / KVSplit

#LLM# Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit keys & 4-bit values, reducing memory by 59% with <1% quality lo...

apple-silicon, generative-ai, kv-cache, llama-cpp, llm, m1, m3, memory-optimization, metal, optimization, quantization
Python 351
1 month ago

Zefan-Cai / Awesome-LLM-KV-Cache

#LLM# Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.

kv-cache, llm
313
3 months ago

itsnamgyu / block-transformer

#LLM# Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)

kv-cache, llm, llm-inference
Python 157
2 months ago

NVIDIA-Merlin / HierarchicalKV

HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of HierarchicalKV is to store key-value feature-embeddings on hig...

CUDA, gpu, hashtable, recommender-system, key-value-store, kv-cache
Cuda 149
22 days ago

kddubey / cappr

Completion After Prompt Probability. Make your LLM make a choice

text-classification, zero-shot, huggingface, prompt-engineering, llamacpp, probability, llm-inference, kv-cache
Python 78
7 months ago

aju22 / LLaMA2

#NLP# This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT) variant. The implementation focuses on the model architecture ...

attention, gpt, kv-cache, llama, llama2, llm, nlp, transformer
Python 68
2 years ago

DRSY / EasyKV

#LLM# Easy control for Key-Value Constrained Generative LLM Inference (https://arxiv.org/abs/2402.06262)

kv-cache, llm
Python 63
1 year ago

hkproj / pytorch-llama-notes

Notes about LLaMA 2 model

attention-is-all-you-need, kv-cache, llama2, study-notes
Python 61
2 years ago

DongmingShenDS / Mistral_From_Scratch

Mistral and Mixtral (MoE) from scratch

kv-cache, large-language-models, mistral-7b, mixtral-8x7b, mixture-of-experts
Python 7
1 year ago

mehdihosseinimoghadam / AVA-Mistral-7B

#NLP# Fine-Tuned Mistral 7B Persian Large Language Model (LLM) / Persian Mistral 7B

ava, deep-learning, kv-cache, large-language-models, llm, mistral, mistral-7b, nlp
Jupyter Notebook 6
3 months ago

reshalfahsi / image-captioning-mobilenet-llama3

#NLP# Image Captioning With MobileNet-LLaMA 3

image-captioning, llama3, mobilenetv3, PyTorch, pytorch-lightning, kv-cache, cnn, transformer, nlp
Jupyter Notebook 5
1 year ago

s-chh / PyTorch-Scratch-LLM

#LLM# Simple and easy to understand PyTorch implementation of Large Language Model (LLM) GPT and LLAMA from scratch with detailed steps. Implemented: Byte-Pair Tokenizer, Rotational Positional Embedding (R...

llm, mixture-of-experts, moe, kv-cache
Python 4
7 months ago

glisses / Efficient-Effective-KV-Cache-Replacement-Policy-for-LLMs

#LLM# SCAC strategy for efficient and effective KV cache eviction in LLMs

kv-cache, llm
Python 2
3 months ago

jaameypr / keyvalue-caching

Java-based caching solution designed to temporarily store key-value pairs with a specified time-to-live (TTL) duration.

caching, Java, java-17, keyvalue, keyvaluestore, kv-cache, Maven
Java 2
5 months ago