prima.cpp: Speeding up 70B-scale LLM inference on low-resource everyday home clusters
#Computer Science# [ACL 2025] Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models
Multi-agent workflows with Llama3: A private on-device multi-agent framework