DeepEval is an LLM evaluation framework designed for evaluating and testing large language model systems. It is similar to Pytest, but specializes in unit testing LLM outputs.
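A minimal sketch of that Pytest-style workflow, assuming DeepEval's documented `LLMTestCase`, `AnswerRelevancyMetric`, and `assert_test` API; the hard-coded output and the 0.7 threshold are placeholder examples, not part of the original description:

```python
# Minimal DeepEval-style unit test for an LLM output.
from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

def test_refund_answer_relevancy():
    test_case = LLMTestCase(
        input="What if these shoes don't fit?",
        # In practice, actual_output would come from your LLM application.
        actual_output="You can return them within 30 days for a full refund.",
    )
    metric = AnswerRelevancyMetric(threshold=0.7)
    # Fails the test, like a Pytest assertion, if the relevancy score is below 0.7.
    assert_test(test_case, [metric])
```

The test runs like any other Pytest test, so it slots into an existing test suite.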
The one-stop repository for large language model (LLM) unlearning. Supports TOFU, MUSE, WMDP, and many unlearning methods. All features (benchmarks, methods, evaluations, models, etc.) are easily extensible.
#LLM#LangFair is a Python library for conducting use-case-level LLM bias and fairness assessments
#LLM#[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
Run a prompt against all, or some, of your models running on Ollama. Creates web pages with the output, performance statistics and model info. All in a single Bash shell script.
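A rough Python sketch of the same workflow (the repo itself is a single Bash script), assuming Ollama's local REST API at http://localhost:11434 with its /api/tags and /api/generate endpoints; the prompt text is an arbitrary example:

```python
# Send one prompt to every locally installed Ollama model and print each
# response with basic timing stats (Python sketch; the repo uses Bash).
import requests

OLLAMA = "http://localhost:11434"
PROMPT = "Summarize what an LLM evaluation framework does in one sentence."

# /api/tags lists the models currently available to the local Ollama server.
models = [m["name"] for m in requests.get(f"{OLLAMA}/api/tags").json()["models"]]

for name in models:
    resp = requests.post(
        f"{OLLAMA}/api/generate",
        json={"model": name, "prompt": PROMPT, "stream": False},
    ).json()
    # Ollama reports total_duration in nanoseconds and eval_count in tokens.
    seconds = resp.get("total_duration", 0) / 1e9
    print(f"=== {name}: {seconds:.1f}s, {resp.get('eval_count', 0)} tokens ===")
    print(resp.get("response", ""))
```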
#LLM#Create an evaluation framework for your LLM-based app. Incorporate it into your test suite. Lay the monitoring foundation.
#Data Warehouse#Estimates a confidence measure that outputs generated by large language models are not hallucinations.
This repo contains a Streamlit application that provides a user-friendly interface for evaluating large language models (LLMs) using the beyondllm package.
#LLM#Tools for systematic large language model evaluations
Evaluates LLM responses and measures their accuracy.