SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
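The low-bit quantization these toolkits implement can be illustrated with a minimal, dependency-free sketch of symmetric per-tensor INT8 quantization (function names are illustrative, not any library's API):

```python
def quantize_int8(values):
    # Symmetric per-tensor quantization: map floats onto the integer range [-127, 127]
    # using a single scale derived from the largest absolute value.
    amax = max(abs(v) for v in values)
    scale = amax / 127.0 if amax > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    # Recover approximate floats; per-element error is at most scale / 2.
    return [v * scale for v in q]

weights = [0.02, -1.5, 0.7, 3.0, -0.001]
q, scale = quantize_int8(weights)
approx = dequantize_int8(q, scale)
```

Real toolkits refine this idea with per-channel or per-group scales, zero-points for asymmetric ranges, and calibration data, but the round-trip above is the core operation.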
Production-ready LLM compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
Advanced quantization algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA, and HPU. Seamlessly integrated with Torchao, Transformers, and vLLM. Export your models effortlessly to autogpt...
#Large Language Models#Large Language Models for All, 🦙 Cult and More, Stay in touch!
#Large Language Models#Run any Large Language Model behind a unified API
#Large Language Models#🪶 Lightweight OpenAI drop-in replacement for Kubernetes
#Large Language Models#A guide on how to use GPTQ models with LangChain
Private self-improvement coaching with open-source LLMs
#Large Language Models#An OpenAI-compatible API that integrates LLM, Embedding, and Reranker.
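"OpenAI-compatible" for servers like this one means accepting the same JSON request bodies as the OpenAI API. A minimal sketch of the request shapes for the chat and embeddings routes (the model names are placeholders; only the field layout follows the published API):

```python
import json

def chat_request(model, user_message):
    # Minimal /v1/chat/completions request body, following the OpenAI API shape.
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

def embeddings_request(model, texts):
    # Minimal /v1/embeddings request body; "input" may be a string or a list of strings.
    return {"model": model, "input": texts}

body = chat_request("my-local-model", "Hello")
payload = json.dumps(body)  # POST this to the server's /v1/chat/completions route
```

Because the shapes match, existing OpenAI client libraries can usually be pointed at such a server by changing only the base URL.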
#Large Language Models#ChatSakura: an open-source multilingual conversational model.
#Large Language Models#Run GGUF LLM models in the latest version of TextGen-webui
#Large Language Models#This repository is for profiling, extracting, visualizing, and reusing generative AI weights, with the aim of building more accurate AI models and auditing/scanning weights at rest to identify knowledge domains for ...
#Large Language Models#A.L.I.C.E (Artificial Labile Intelligence Cybernated Existence): a REST API for an AI companion, intended for building more complex systems
#Large Language Models#This project will develop a NEPSE chatbot using an open-source LLM, incorporating sentence transformers, a vector database, and reranking.
#NLP#Code for the NAACL paper "When Quantization Affects Confidence of Large Language Models?"
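The "confidence" studied in work like this is commonly the probability a model assigns to its chosen token, obtained by applying softmax to the raw logits. A dependency-free sketch of that computation (illustrative only, not the paper's code):

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_confidence(logits):
    # Confidence = probability assigned to the argmax (greedy) token.
    return max(softmax(logits))

conf = token_confidence([2.0, 1.0, 0.1])
```

Quantization perturbs the logits, so even when the argmax token is unchanged, this confidence value can shift, which is the effect such papers measure.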
#NLP#Effortlessly quantize, benchmark, and publish Hugging Face models, with cross-platform support for CPU/GPU. Reduce model size by 75% while maintaining performance.
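The 75% figure follows directly from bit widths: moving weights from 16-bit to 4-bit storage cuts their size by a factor of four. A quick back-of-the-envelope check (the 7B parameter count is a placeholder, and quantization metadata such as scales is ignored):

```python
def model_size_bytes(n_params, bits_per_weight):
    # Weight storage only; real checkpoints add a small overhead for
    # per-group scales and zero-points.
    return n_params * bits_per_weight / 8

n = 7_000_000_000                  # e.g. a 7B-parameter model
fp16 = model_size_bytes(n, 16)     # 14 GB
int4 = model_size_bytes(n, 4)      # 3.5 GB
reduction = 1 - int4 / fp16
print(f"{reduction:.0%}")  # → 75%
```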