An innovative library for efficient LLM inference via low-bit quantization
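Low-bit quantization typically maps float weights onto a small integer grid with a per-tensor (or per-channel) scale. A minimal NumPy sketch of symmetric int4 quantization, not this library's actual API:

```python
import numpy as np

def quantize_int4(w: np.ndarray):
    # Symmetric per-tensor int4: map floats onto the integer grid [-8, 7].
    scale = np.max(np.abs(w)) / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover an approximation of the original weights.
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int4(w)
w_hat = dequantize(q, s)
```

Because rounding moves each value by at most half a quantization step, the reconstruction error is bounded by `s / 2`; real inference kernels keep the int4 weights packed and fold the scale into the matmul.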
Flux diffusion model implementation using a quantized fp8 matmul; the remaining layers use faster half-precision accumulation, which is ~2x faster on consumer devices.
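The usual recipe for an fp8 matmul is to scale each operand into the fp8 dynamic range, multiply in low precision, then undo the scales. A hedged NumPy sketch of that idea (float16 stands in for fp8, which NumPy lacks; this is not the repository's kernel):

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite value of the fp8 E4M3 format

def scaled_low_precision_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Per-tensor scales map each operand into the fp8 dynamic range.
    sa = np.max(np.abs(a)) / E4M3_MAX
    sb = np.max(np.abs(b)) / E4M3_MAX
    # float16 storage simulates the low-precision operands here.
    a_q = (a / sa).astype(np.float16)
    b_q = (b / sb).astype(np.float16)
    # Accumulate in float32, then fold both scales back in.
    out = a_q.astype(np.float32) @ b_q.astype(np.float32)
    return out * (sa * sb)

a = np.random.randn(8, 16).astype(np.float32)
b = np.random.randn(16, 8).astype(np.float32)
approx = scaled_low_precision_matmul(a, b)
```

On hardware with fp8 tensor cores the quantized operands halve memory traffic, which is where the claimed ~2x speedup comes from.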
#LLM# JAX Scalify: end-to-end scaled arithmetic
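The core idea behind scaled arithmetic is to carry every tensor as a (data, scale) pair, keeping the data near unit magnitude while scales propagate through operations. A minimal Python sketch of the concept, with a hypothetical `ScaledArray` type (Scalify's actual JAX API differs):

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class ScaledArray:
    # Represents the tensor data * scale, with data kept near unit magnitude.
    data: np.ndarray
    scale: float

    def to_array(self) -> np.ndarray:
        return self.data * self.scale

def scaled_matmul(x: ScaledArray, w: ScaledArray) -> ScaledArray:
    # Only the well-conditioned data tensors are multiplied; the scales
    # combine separately, so intermediates stay in a numerically safe range.
    return ScaledArray(x.data @ w.data, x.scale * w.scale)

x = ScaledArray(np.ones((2, 3)), 1e-4)
w = ScaledArray(np.ones((3, 2)), 1e-3)
y = scaled_matmul(x, w)
```

Keeping data near unit scale is what makes end-to-end training in narrow formats such as float8 feasible: the dynamic range lives in the scale, not in the stored values.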
#Computer Science# A modular, accelerator-ready ML framework that speaks float8/16/32/64, imports models via ONNX, and trains Transformer-class networks entirely in Go.