#Large Language Model# Use PEFT or full-parameter training to run CPT/SFT/DPO/GRPO on 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, GLM4, Mistral, Yi1.5, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, Int...
Megatron was a Telegram file-management bot that helped many users, especially movie channel managers, upload their files to Telegram simply by providing a link. The project initially started...
Large-scale 4D-parallelism pre-training for 🤗 transformers with Mixture of Experts *(still work in progress)*
#Large Language Model# A LLaMA1/LLaMA2 Megatron implementation.
#Computer Science# Tiny-Megatron, a minimalistic re-implementation of the Megatron library
#Large Language Model# Wrapped Megatron: As User-Friendly as HuggingFace, As Powerful as Megatron-LM
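
Several of the entries above revolve around parameter-efficient fine-tuning (PEFT) of large language models. As a rough illustration of what a LoRA-based SFT run can look like with the Hugging Face `transformers`, `peft`, and `datasets` libraries (the model name, dataset path, and hyperparameters below are placeholders, not taken from any of the listed projects):

```python
# Minimal LoRA SFT sketch (illustrative; not the API of any repo listed above).
# Assumes transformers, peft, and datasets are installed and train.jsonl has a "text" field.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "Qwen/Qwen2.5-0.5B"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Attach LoRA adapters: the base weights stay frozen, only small low-rank matrices train.
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Tokenize a small instruction-style dataset for causal-LM training.
dataset = load_dataset("json", data_files="train.jsonl", split="train")
def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=512)
dataset = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=1,
                           num_train_epochs=1, logging_steps=10),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Because LoRA trains only the low-rank adapter weights, this style of fine-tuning fits on far more modest hardware than full-parameter training, which is the trade-off these toolkits expose.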