flash-attention

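FlashAttention, the topic these repositories share, computes exact softmax attention in tiles so the full (T × T) score matrix never has to sit in GPU memory. As a minimal sketch of using it through PyTorch's built-in SDPA dispatcher, assuming PyTorch ≥ 2.3 and a CUDA GPU (shapes and sizes below are illustrative):

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# (batch, heads, seq_len, head_dim); flash kernels require fp16/bf16 on CUDA.
q, k, v = (torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
           for _ in range(3))

# Restrict the SDPA dispatcher to the FlashAttention backend for this call.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```

The repositories below are implementations, ports, and applications of this kernel family.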
QwenLM / Qwen

#NLP# Qwen-7B (Tongyi Qianwen 7B) is the 7-billion-parameter model in the Tongyi Qianwen large model series developed by Alibaba Cloud.

chinese · large-language-models · nlp · flash-attention · llm · pretrained-models
Python 18.49k
2 months ago
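For context, Hugging Face transformers exposes a generic switch for the FlashAttention-2 kernels that model repos like this one advertise. A hedged sketch, assuming transformers plus the flash-attn package and a supported GPU; the model id is just the one from this entry, and older trust_remote_code models such as first-generation Qwen may gate flash attention through their own custom flag instead:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen-7B-Chat"  # illustrative; any flash-attn-capable model works
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,               # flash-attn kernels need fp16/bf16
    attn_implementation="flash_attention_2",  # transformers' generic switch
    device_map="auto",
    trust_remote_code=True,
)
```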
ymcui / Chinese-LLaMA-Alpaca-2

#NLP# Chinese LLaMA-2 & Alpaca-2 large models, phase two of the project, plus 64K long-context models (Chinese LLaMA-2 & Alpaca-2 LLMs with 64K long context models).

alpaca · llama · llm · llama-2 · large-language-models · nlp · alpaca-2 · flash-attention · llama2 · alpaca2 · yarn · rlhf
Python 7.16k
9 months ago
InternLM / InternLM

#LLM# Official release of the InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).

chatbot · gpt · llm · long-context · rlhf · fine-tuning-llm · chinese · flash-attention · pretrained-models
Python 6.94k
4 months ago
xlite-dev / LeetCUDA

📚 LeetCUDA: 200+ CUDA/Tensor Core kernels, HGEMM, FA-2 MMA.

cuda · cuda-kernels · flash-attention · cuda-library · cuda-cpp
Cuda 4.76k
5 days ago
xlite-dev / Awesome-LLM-Inference

📚 A curated list of awesome LLM inference papers with code.

flash-attention · tensorrt-llm · vllm · llm-inference · deepseek · deepseek-v3 · deepseek-r1 · qwen3
Python 4.12k
7 days ago
MoonshotAI / MoBA

#LLM# MoBA: Mixture of Block Attention for Long-Context LLMs

flash-attention · llm · llm-serving · llm-training · moe · pytorch · transformer
Python 1.8k
2 months ago
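Not MoBA's actual code, but a minimal PyTorch sketch of the block-attention gating idea the description names: represent each key block by its mean, score blocks per query, and keep only the top-k blocks (the function and parameter names are mine):

```python
import torch

def topk_block_mask(q, k, block_size=64, topk=4):
    # q: (B, H, Tq, D); k: (B, H, Tk, D) with Tk divisible by block_size.
    B, H, Tq, D = q.shape
    nb = k.shape[2] // block_size
    # Represent each key block by its mean vector: (B, H, nb, D).
    block_repr = k.reshape(B, H, nb, block_size, D).mean(dim=3)
    # Score every query against every block: (B, H, Tq, nb).
    scores = torch.einsum("bhqd,bhnd->bhqn", q, block_repr)
    keep = torch.zeros_like(scores, dtype=torch.bool)
    keep.scatter_(-1, scores.topk(topk, dim=-1).indices, True)
    # Expand block-level decisions to a per-token mask: (B, H, Tq, Tk).
    return keep.repeat_interleave(block_size, dim=-1)
```

The boolean mask can be passed to F.scaled_dot_product_attention as attn_mask; MoBA itself fuses the gating with flash-attention kernels rather than materializing a dense mask.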
InternLM / InternEvo

InternEvo is an open-source, lightweight training framework that aims to support model pre-training without extensive dependencies.

gemma · internlm · internlm2 · llama3 · llava · llm-framework · llm-training · multi-modal · pipeline-parallelism · flash-attention · pytorch
Python 392
4 days ago
DAMO-NLP-SG / Inf-CLIP

[CVPR 2025 Highlight] The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss". A super memory-efficient CLIP training scheme.

contrastive-learning · flash-attention · memory-efficient · clip
Python 250
5 months ago
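This is not Inf-CL's tiled algorithm, only a sketch of the row-chunking idea behind it: never materialize the full B × B similarity matrix, and checkpoint each slice so its activations are rematerialized in backward. The function and its arguments are illustrative names, and only the image-to-text direction is shown:

```python
import torch
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint

def chunked_clip_loss(img, txt, chunk=256):
    # img, txt: (B, D) L2-normalized embeddings; matching pairs share an index.
    B = img.shape[0]
    labels = torch.arange(B, device=img.device)

    def row_block(rows, lbl):
        # Only a (chunk, B) slice of the similarity matrix exists at a time.
        return F.cross_entropy(rows @ txt.t(), lbl, reduction="sum")

    loss = img.new_zeros(())
    for s in range(0, B, chunk):
        # Checkpointing frees each slice after forward and recomputes it in
        # backward, keeping peak memory O(chunk * B) instead of O(B^2).
        loss = loss + checkpoint(row_block, img[s:s + chunk],
                                 labels[s:s + chunk], use_reentrant=False)
    return loss / B
```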
xlite-dev / ffpa-attn

📚 FFPA: extends FA-2 with Split-D for large head dims; roughly 2x faster than SDPA.

attention · cuda · flash-attention · mlsys · deepseek · deepseek-r1 · deepseek-v3
Cuda 186
1 month ago
alexzhang13 / flashattention2-custom-mask

#Computer Science# Triton implementation of FlashAttention-2 that adds custom masks.

attention · attention-mechanism · cuda-kernels · deep-learning · flash-attention · triton
Python 119
10 months ago
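For comparison, PyTorch's stock SDPA accepts arbitrary boolean masks, but its built-in flash backend only handles the causal case and falls back to a slower backend otherwise; that gap is what a Triton custom-mask kernel like this repo addresses. A small sketch of a non-standard mask (causal plus four globally visible tokens, chosen arbitrarily):

```python
import torch
import torch.nn.functional as F

B, H, T, D = 2, 8, 256, 64
q, k, v = (torch.randn(B, H, T, D) for _ in range(3))

# Boolean attn_mask: True means "this query may attend to this key".
mask = torch.ones(T, T, dtype=torch.bool).tril()  # causal...
mask[:, :4] = True                                # ...plus 4 global tokens

out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```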
CoinCheung / gdGPT

#NLP# Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with DeepSpeed pipeline mode; faster than ZeRO/ZeRO++/FSDP.

deepspeed · llm · pipeline · nlp · pytorch · bloom · flash-attention · baichuan2-7b · mixtral-8x7b · llama2
Python 96
1 year ago
Bruce-Lee-LY / flash_attention_inference

#LLM# Performance of the C++ interfaces of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios.

cuda · flash-attention · gpu · inference · llm · nvidia · cutlass · mha
C++ 38
4 months ago
Bruce-Lee-LY / decoding_attention

#LLM# Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA, using CUDA cores for the decoding stage of LLM inference.

cuda · gpu · inference · llm · mha · nvidia · flash-attention
C++ 37
4 days ago
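As background for the MHA/MQA/GQA distinction in this entry: grouped-query attention shares each KV head across a group of query heads, shrinking the KV cache during decoding. A minimal PyTorch sketch of the semantics, assuming the head counts divide evenly (the function name is mine):

```python
import torch
import torch.nn.functional as F

def gqa(q, k, v):
    # q: (B, Hq, T, D); k, v: (B, Hkv, T, D) with Hq a multiple of Hkv.
    groups = q.shape[1] // k.shape[1]
    # Replicate each KV head across its query-head group, then attend as usual.
    k = k.repeat_interleave(groups, dim=1)
    v = v.repeat_interleave(groups, dim=1)
    return F.scaled_dot_product_attention(q, k, v)

out = gqa(torch.randn(1, 32, 128, 64),   # 32 query heads
          torch.randn(1, 8, 128, 64),    # 8 KV heads -> groups of 4
          torch.randn(1, 8, 128, 64))
```

Dedicated decoding kernels avoid the explicit replication; this sketch only shows what is computed.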
kklemon / FlashPerceiver

#NLP# Fast and memory-efficient PyTorch implementation of the Perceiver with FlashAttention.

attention-mechanism · deep-learning · flash-attention · nlp · transformer
Python 26
7 months ago
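The Perceiver's core trick, sketched below in plain PyTorch: a small learned latent array queries the long input through cross-attention, so the score matrix is (L × T) rather than (T × T). Sizes and names here are illustrative, not the repo's API:

```python
import torch
import torch.nn.functional as F

B, T, D, L, H = 2, 4096, 256, 64, 8
x = torch.randn(B, T, D)                        # long input sequence
latents = torch.randn(1, L, D).expand(B, L, D)  # small (learned) latent array

def split_heads(t):
    # (B, S, D) -> (B, H, S, D // H)
    return t.reshape(t.shape[0], t.shape[1], H, D // H).transpose(1, 2)

# Latents are the queries: cost grows linearly in T instead of quadratically.
out = F.scaled_dot_product_attention(
    split_heads(latents), split_heads(x), split_heads(x))
```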
RulinShao / FastCkpt

Python package for rematerialization-aware gradient checkpointing

flash-attention
Python 25
2 years ago
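Plain gradient checkpointing, which this package refines, trades compute for memory: activations inside a checkpointed segment are freed after forward and rematerialized during backward. A minimal PyTorch sketch (the Block module is a stand-in); FastCkpt's "rematerialization-aware" angle is avoiding recompute that flash-attention's backward pass already performs:

```python
import torch
from torch.utils.checkpoint import checkpoint

class Block(torch.nn.Module):
    def __init__(self, d):
        super().__init__()
        self.ff = torch.nn.Sequential(
            torch.nn.Linear(d, 4 * d), torch.nn.GELU(), torch.nn.Linear(4 * d, d))

    def forward(self, x):
        return x + self.ff(x)

block = Block(256)
x = torch.randn(8, 128, 256, requires_grad=True)
# Intermediate activations inside `block` are not stored; they are recomputed
# when backward reaches this segment.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
```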
erfanzar / jax-flash-attn2

A flexible and efficient implementation of Flash Attention 2.0 for JAX, supporting multiple backends (GPU/TPU/CPU) and platforms (Triton/Pallas/JAX).

flash-attention · jax
Python 24
3 months ago
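For reference, the quadratic-memory attention that such a JAX kernel replaces fits in a few lines; a flash implementation returns the same values without ever materializing the scores array (shapes below are illustrative):

```python
import jax
import jax.numpy as jnp

def reference_attention(q, k, v):
    # q, k, v: (batch, heads, seq, dim). Plain O(T^2)-memory attention;
    # a flash kernel computes the same result tile by tile.
    scores = jnp.einsum("bhqd,bhkd->bhqk", q, k) / jnp.sqrt(q.shape[-1])
    return jnp.einsum("bhqk,bhkd->bhqd", jax.nn.softmax(scores, axis=-1), v)

q, k, v = (jnp.ones((1, 4, 128, 64)) for _ in range(3))
out = reference_attention(q, k, v)
```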
Naman-ntc / FastCode

Utilities for efficient fine-tuning, inference and evaluation of code generation models

code-generation · efficient · finetuning · inference · transformers · flash-attention
Python 21
2 years ago
kyegomez / FlashMHA

A simple PyTorch implementation of flash multi-head attention.

ai · artificial-neural-networks · attention · attention-mechanisms · gpt4 · transformer · flash-attention
Jupyter Notebook 21
1 year ago
AI-DarwinLabs / amd-mi300-ml-stack

#Computer Science# 🚀 Automated deployment stack for AMD MI300 GPUs with optimized ML/DL frameworks and HPC-ready configurations.

conda · deep-learning · deepspeed · flash-attention · gpu-computing · hpc · machine-learning · slurm · rocm
Shell 11
7 months ago
pxl-th / NNop.jl

Flash Attention & friends in pure Julia

gpgpu · gpu · julia · amdgpu · cuda · flash-attention
Julia 10
1 month ago