SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
LLM quantization (compression) toolkit with hardware acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPUs via HF, vLLM, and SGLang.
#Large Language Models#Large Language Models for All, 🦙 Cult and More, Stay in touch!
#Large Language Models#Run any Large Language Model behind a unified API
#Large Language Models#🪶 Lightweight OpenAI drop-in replacement for Kubernetes
#Large Language Models#A guide on how to use GPTQ models with LangChain
#Large Language Models#An OpenAI-compatible API that integrates LLM, Embedding, and Reranker
#Large Language Models#Run GGUF LLM models in the latest versions of TextGen-webui and koboldcpp
Private self-improvement coaching with open-source LLMs
#Large Language Models#ChatSakura: an open-source multilingual conversational model
#Large Language Models#This repository is for profiling, extracting, visualizing, and reusing generative AI weights, to hopefully build more accurate AI models and audit/scan weights at rest to identify knowledge domains for ...
#Large Language Models#A.L.I.C.E (Artificial Labile Intelligence Cybernated Existence): a REST API for an AI companion, for building more complex systems
#Large Language Models#This project will develop a NEPSE chatbot using an open-source LLM, incorporating sentence transformers, a vector database, and reranking.
#Natural Language Processing#Code for the NAACL paper "When Quantization Affects Confidence of Large Language Models?"
#Natural Language Processing#Effortlessly quantize, benchmark, and publish Hugging Face models with cross-platform support for CPU/GPU. Reduce model size by 75% while maintaining performance.
#Large Language Models#LLM quantization techniques: absmax, zero-point, GPTQ, and GGUF
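The absmax and zero-point techniques named in the last entry can be sketched in a few lines of NumPy. This is a minimal illustration only; the function names and sample tensor are my own, not taken from any of the repositories listed above:

```python
import numpy as np

def absmax_quantize(x: np.ndarray):
    """Symmetric INT8 quantization: scale so that max(|x|) maps to 127."""
    scale = 127.0 / np.max(np.abs(x))
    q = np.round(x * scale).astype(np.int8)
    return q, scale

def zeropoint_quantize(x: np.ndarray):
    """Asymmetric INT8 quantization: map [min, max] onto the full [-128, 127] range."""
    x_min, x_max = x.min(), x.max()
    scale = 255.0 / (x_max - x_min)
    zero_point = int(round(-x_min * scale)) - 128
    q = np.clip(np.round(x * scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize_absmax(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original values."""
    return q.astype(np.float32) / scale

# Tiny example tensor (hypothetical weights)
x = np.array([-1.2, 0.0, 0.5, 2.4], dtype=np.float32)
q, s = absmax_quantize(x)
x_hat = dequantize_absmax(q, s)  # close to x, up to rounding error
```

Absmax keeps zero exactly representable (useful for weights centered on zero), while the zero-point variant uses the full INT8 range for skewed distributions, at the cost of storing an extra offset per tensor or group.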