flash-attention

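FlashAttention computes exact attention in tiles, carrying a running row maximum and softmax normalizer (the "online softmax") so the full L×L score matrix never materializes in GPU memory. A minimal plain-PyTorch sketch of that recurrence, for exposition only; the repositories below ship the real fused kernels:

```python
import torch

def flash_attn_sketch(q, k, v, tile=128):
    # q, k, v: (L, d). Process K/V in tiles of `tile` rows.
    scale = q.shape[-1] ** -0.5
    o = torch.zeros_like(q)
    m = torch.full((q.shape[0], 1), float("-inf"))  # running row max
    s = torch.zeros(q.shape[0], 1)                  # running softmax denominator
    for j in range(0, k.shape[0], tile):
        scores = scale * q @ k[j:j + tile].T        # one (L, tile) tile of QK^T
        m_new = torch.maximum(m, scores.max(dim=-1, keepdim=True).values)
        p = (scores - m_new).exp()
        r = (m - m_new).exp()                       # rescale factor for old stats
        s = r * s + p.sum(dim=-1, keepdim=True)
        o = r * o + p @ v[j:j + tile]
        m = m_new
    return o / s

q = k = v = torch.randn(512, 64)
ref = torch.softmax((q @ k.T) * 64 ** -0.5, dim=-1) @ v
assert torch.allclose(flash_attn_sketch(q, k, v), ref, atol=1e-4)
```
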
QwenLM / Qwen

#NLP# Qwen-7B (Tongyi Qianwen-7B) is the 7-billion-parameter model in the Tongyi Qianwen large language model series developed by Alibaba Cloud

chinese large-language-models nlp flash-attention llm pretrained-models
Python 18.89 k
8 days ago
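A hedged usage sketch for this entry: loading Qwen-7B-Chat through Hugging Face transformers. Qwen ships its own modeling code (hence trust_remote_code=True), which picks up flash-attn automatically when it is installed; exact flags vary by release, so treat this as a starting point rather than the project's canonical recipe:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat",
    device_map="auto",
    trust_remote_code=True,  # Qwen bundles custom modeling code
).eval()

inputs = tokenizer("What is flash attention?", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```
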
ymcui / Chinese-LLaMA-Alpaca-2

#NLP# Phase 2 of the Chinese LLaMA-2 & Alpaca-2 large model project, plus 64K ultra-long-context models (Chinese LLaMA-2 & Alpaca-2 LLMs with 64K long context models)

alpaca llama llm llama-2 large-language-models nlp alpaca-2 flash-attention llama2 alpaca2 Yarn rlhf
Python 7.17 k
16 days ago
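A sketch of serving a long-context LLaMA-2 variant like the 64K models this project publishes. The hub id below is illustrative only (check the project README for real model names), and the project's 64K models use YaRN; the rope_scaling shown here is the stock transformers NTK-style dynamic option, named plainly as a stand-in:

```python
from transformers import AutoModelForCausalLM

# Illustrative model id (assumption, not verified); rope_scaling stretches
# the base RoPE context. Newer transformers also accepts a "yarn" rope type
# for supported architectures.
model = AutoModelForCausalLM.from_pretrained(
    "hfl/chinese-alpaca-2-7b-64k",
    rope_scaling={"type": "dynamic", "factor": 8.0},
    torch_dtype="auto",
    device_map="auto",
)
```
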
InternLM / InternLM

#LLM# Official release of the InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).

chatbot gpt llm long-context rlhf fine-tuning-llm chinese flash-attention pretrained-models
Python 7.01 k
7 days ago
xlite-dev / LeetCUDA

📚 LeetCUDA: modern CUDA learning notes with PyTorch for beginners 🐑; 200+ CUDA kernels, Tensor Cores, HGEMM, FA-2 MMA. 🎉

CUDA cuda-kernels flash-attention cuda-library cuda-cpp
Cuda 5.79 k
1 day ago
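Before writing kernels like the ones collected here, it helps to confirm what stock PyTorch already dispatches to. Assuming PyTorch ≥ 2.3 and a CUDA GPU, this pins scaled_dot_product_attention to the FlashAttention backend and raises an error if that backend cannot handle the inputs:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

q, k, v = (torch.randn(1, 8, 2048, 64, device="cuda", dtype=torch.float16)
           for _ in range(3))
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):  # restrict dispatch to flash
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 2048, 64])
```
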
xlite-dev / Awesome-LLM-Inference

📚 A curated list of awesome LLM/VLM inference papers with code: Flash-Attention, Paged-Attention, WINT8/4, parallelism, etc. 🎉

flash-attention tensorrt-llm vllm llm-inference deepseek deepseek-v3 deepseek-r1 qwen3
Python 4.32 k
1 day ago
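One recurring idea from this list, Paged-Attention, sketched minimally: the KV cache lives in fixed-size physical blocks, and a per-sequence block table maps logical positions onto them, so cache memory need not be contiguous. This is a conceptual sketch, not any library's API:

```python
import torch

block_size, num_blocks, d = 16, 64, 128
kv_pool = torch.randn(num_blocks, block_size, d)  # shared pool of physical blocks
block_table = torch.tensor([7, 2, 41])            # this sequence's logical -> physical map
seq_len = 40                                      # tokens actually cached (< 3 * 16)

# Gather the logical K cache for one sequence from scattered blocks.
k_cache = kv_pool[block_table].reshape(-1, d)[:seq_len]
print(k_cache.shape)  # torch.Size([40, 128])
```
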
MoonshotAI / MoBA

#LLM# MoBA: Mixture of Block Attention for Long-Context LLMs

flash-attention llm llm-serving llm-training moe PyTorch transformer
Python 1.85 k
4 months ago
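A rough paraphrase of MoBA's gating step, not the repo's API: keys are mean-pooled per block, each query ranks blocks by affinity with those summaries, and attention then runs only over each query's top-k blocks:

```python
import torch

B, L, d, blk, topk = 1, 1024, 64, 128, 3
q, k = torch.randn(B, L, d), torch.randn(B, L, d)

k_pooled = k.view(B, L // blk, blk, d).mean(dim=2)  # (B, 8, d) block summaries
gate = torch.einsum("bld,bnd->bln", q, k_pooled)    # query-to-block affinity
keep = gate.topk(topk, dim=-1).indices              # 3 of 8 blocks per query
print(keep.shape)  # torch.Size([1, 1024, 3]) -> sparse attention over kept blocks
```
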
InternLM / InternEvo

InternEvo is an open-source, lightweight training framework that aims to support model pre-training without extensive dependencies.

gemma internlm internlm2 llama3 llava llm-framework llm-training multi-modal pipeline-parallelism flash-attention PyTorch
Python 400
9 days ago
DAMO-NLP-SG / Inf-CLIP

[CVPR 2025 Highlight] The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss". A super memory-efficient CLIP training scheme.

contrastive-learning flash-attention memory-efficient clip
Python 262
6 months ago
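The tiling idea behind Inf-CL, sketched coarsely. The paper tiles both dimensions with an online softmax; this row-chunked, image-to-text-only version just shows why the full N×N similarity matrix never needs to exist at once:

```python
import torch
import torch.nn.functional as F

def chunked_clip_loss(img, txt, chunk=256, scale=100.0):
    """One direction (image -> text) of the CLIP loss, tiled over rows."""
    n = img.shape[0]
    loss = img.new_zeros(())
    for s in range(0, n, chunk):
        logits = scale * img[s:s + chunk] @ txt.T   # only a (chunk, n) tile lives at once
        labels = torch.arange(s, s + logits.shape[0], device=img.device)
        loss = loss + F.cross_entropy(logits, labels, reduction="sum")
    return loss / n

img = F.normalize(torch.randn(1024, 512), dim=-1)  # L2-normalized embeddings
txt = F.normalize(torch.randn(1024, 512), dim=-1)
print(chunked_clip_loss(img, txt))
```
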
xlite-dev / ffpa-attn

⚡️ FFPA: extends FlashAttention-2 with Split-D, achieving ~O(1) SRAM complexity for large headdim; 1.8x–3x↑ vs SDPA. 🎉

attention CUDA flash-attention mlsys deepseek deepseek-r1 deepseek-v3
Cuda 193
3 months ago
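The Split-D observation, shown numerically rather than with ffpa's kernels: QK^T is a reduction over head_dim, so it can be accumulated chunk by chunk along d, which is what lets per-tile SRAM stay roughly constant even for large headdim:

```python
import torch

L, d, chunk = 256, 512, 128
q, k = torch.randn(L, d), torch.randn(L, d)

# Accumulate partial QK^T over head_dim chunks; the result is exact.
scores = sum(q[:, s:s + chunk] @ k[:, s:s + chunk].T for s in range(0, d, chunk))
assert torch.allclose(scores, q @ k.T, atol=1e-3)
```
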
alexzhang13 / flashattention2-custom-mask

#Computer Science# Triton implementation of FlashAttention-2 that adds custom masks.

attention attention-mechanism cuda-kernels deep-learning flash-attention triton
Python 127
1 year ago
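What "custom mask" buys, shown with stock PyTorch SDPA rather than this repo's Triton kernels: any additive bias tensor where -inf removes a key and 0 keeps it, beyond the built-in causal pattern:

```python
import torch
import torch.nn.functional as F

L = 8
q = k = v = torch.randn(1, 2, L, 16)
mask = torch.zeros(L, L)
mask[:, 4:] = float("-inf")   # hide the last four keys from every query
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
print(out.shape)  # torch.Size([1, 2, 8, 16])
```
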
CoinCheung / gdGPT

#NLP# Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with DeepSpeed pipeline mode. Faster than ZeRO/ZeRO++/FSDP.

deepspeed llm pipeline nlp PyTorch bloom flash-attention baichuan2-7b mixtral-8x7b llama2
Python 97
1 year ago
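A hedged sketch of the generic DeepSpeed pipeline API this project builds on (not gdGPT's own wrappers); it only runs under a deepspeed distributed launch:

```python
import torch.nn as nn
from deepspeed.pipe import PipelineModule

# Express the network as a flat layer list; DeepSpeed partitions it
# into `num_stages` pipeline stages across ranks.
layers = [nn.Linear(512, 512) for _ in range(8)] + [nn.Linear(512, 10)]
model = PipelineModule(layers=layers, num_stages=2,
                       loss_fn=nn.CrossEntropyLoss())
# Under a deepspeed launcher: engine, *_ = deepspeed.initialize(model=model, ...)
# then engine.train_batch(data_iter) drives the pipeline schedule.
```
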
Bruce-Lee-LY / decoding_attention

#LLM# Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA, using CUDA cores for the decoding stage of LLM inference.

CUDA gpu inference llm mha Nvidia flash-attention
C++ 40
2 months ago
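Decode-stage GQA in plain PyTorch for reference (this repo does the same with hand-written CUDA): a single new query token attends over the cached sequence, with each KV head shared by a group of query heads:

```python
import torch
import torch.nn.functional as F

Hq, Hkv, T, d = 8, 2, 512, 64
q = torch.randn(1, Hq, 1, d)    # one new token's queries at decode time
k = torch.randn(1, Hkv, T, d)   # cached keys: 2 KV heads serve 8 Q heads
v = torch.randn(1, Hkv, T, d)

k = k.repeat_interleave(Hq // Hkv, dim=1)   # broadcast KV heads to groups
v = v.repeat_interleave(Hq // Hkv, dim=1)
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 8, 1, 64])
```
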
Bruce-Lee-LY / flash_attention_inference

#LLM# Performance of the C++ interfaces of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios.

CUDA flash-attention gpu inference llm Nvidia cutlass mha
C++ 39
5 months ago
kklemon / FlashPerceiver

#NLP# Fast and memory-efficient PyTorch implementation of the Perceiver with FlashAttention.

attention-mechanism deep-learning flash-attention nlp transformer
Python 26
9 months ago
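The Perceiver pattern this repo accelerates, in stock PyTorch rather than FlashPerceiver's API: a small set of learned latents cross-attends to a long input, so cost scales with latents × length rather than length²:

```python
import torch
import torch.nn.functional as F

B, L, M, d = 2, 4096, 64, 128
inputs = torch.randn(B, 1, L, d)    # long sequence as keys/values (1 head)
latents = torch.randn(B, 1, M, d)   # small set of learned query latents

out = F.scaled_dot_product_attention(latents, inputs, inputs)
print(out.shape)  # torch.Size([2, 1, 64, 128]): 4096 tokens -> 64 slots
```
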
RulinShao / FastCkpt

Python package for rematerialization-aware gradient checkpointing

flash-attention
Python 25
2 years ago
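For contrast with FastCkpt's selective rematerialization, the stock PyTorch mechanism it refines: torch.utils.checkpoint frees a block's activations after forward and recomputes them during backward, trading compute for memory:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

block = nn.Sequential(nn.Linear(256, 1024), nn.GELU(), nn.Linear(1024, 256))
x = torch.randn(8, 256, requires_grad=True)

# Activations inside `block` are recomputed during backward.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
```
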
erfanzar / jax-flash-attn2

A flexible and efficient implementation of Flash Attention 2.0 for JAX, supporting multiple backends (GPU/TPU/CPU) and platforms (Triton/Pallas/JAX).

flash-attention jax
Python 24
5 months ago
Naman-ntc / FastCode

Utilities for efficient fine-tuning, inference, and evaluation of code-generation models

code-generation efficient finetuning inference transformers flash-attention
Python 21
2 years ago
kyegomez / FlashMHA

A simple PyTorch implementation of flash multi-head attention

ai artificial-neural-networks attention attention-mechanisms gpt4 transformer flash-attention
Jupyter Notebook 20
1 year ago
pxl-th / NNop.jl

Flash Attention & friends in pure Julia

gpgpu gpu julia amdgpu CUDA flash-attention
Julia 11
2 months ago
AI-DarwinLabs / amd-mi300-ml-stack

#Computer Science# 🚀 Automated deployment stack for AMD MI300 GPUs with optimized ML/DL frameworks and HPC-ready configurations

conda deep-learning deepspeed flash-attention gpu-computing hpc machine-learning slurm rocm
Shell 11
8 months ago