prima.cpp: Speeding up 70B-scale LLM inference on low-resource everyday home clusters
Multi-agent workflows with Llama3: A private on-device multi-agent framework