# llm-evaluation-metrics

confident-ai / deepeval

DeepEval is an evaluation framework for large language models, designed for evaluating and testing LLM systems. It is similar to Pytest, but focused on unit testing LLM outputs.

evaluation-metrics evaluation-framework llm-evaluation llm-evaluation-framework llm-evaluation-metrics
Python 9.66k
1 day ago
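Since the listing only names the framework, here is a minimal sketch of the Pytest-style usage described above, based on DeepEval's documented test-case and metric pattern; the example input, output, and 0.7 threshold are assumptions for illustration.

```python
# test_llm_outputs.py -- minimal DeepEval-style unit test (sketch; assumes
# deepeval is installed and a judge LLM, e.g. an OpenAI key, is configured).
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def test_answer_relevancy():
    # In a real suite, actual_output would come from your LLM application.
    test_case = LLMTestCase(
        input="What if these shoes don't fit?",
        actual_output="We offer a 30-day full refund at no extra cost.",
    )
    metric = AnswerRelevancyMetric(threshold=0.7)  # threshold is an assumption
    assert_test(test_case, [metric])
```

Such a file can be run with `deepeval test run test_llm_outputs.py` or with plain `pytest`.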
locuslab / open-unlearning

The one-stop repository for large language model (LLM) unlearning. Supports TOFU, MUSE, WMDP, and many unlearning methods. All features: benchmarks, methods, evaluations, models etc. are easily extens...

privacy-protection benchmarks llm-evaluation-metrics LLM Open Source
Python 334
12 days ago
cvs-health / langfair

#LLM# LangFair is a Python library for conducting use-case-level LLM bias and fairness assessments

AI bias bias-detection fairness fairness-ai fairness-ml fairness-testing large-language-models LLM responsible-ai Python ai-safety llm-evaluation llm-evaluation-framework llm-evaluation-metrics
Python 222
3 days ago
zhuohaoyu / KIEval

#LLM# [ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models

explainable-ai LLM llm-evaluation llm-evaluation-framework llm-evaluation-metrics machine-learning
Python 37
1 year ago
attogram / ollama-multirun

Run a prompt against all, or some, of the models you have installed on Ollama. Creates web pages with the output, performance statistics, and model info, all in a single Bash shell script.

AI ollama llm-evaluation ollama-interface Bash ollama-app llm-evaluation-metrics llm-eval static-site-generator
Shell 8
5 days ago
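The project itself is a single Bash script; purely as an illustration of the same "one prompt, many models" idea (not the project's code), the Python sketch below runs one prompt against every locally installed model via the `ollama` CLI and saves each response. The prompt text and output layout are assumptions.

```python
# multirun_sketch.py -- hedged illustration of running one prompt on many Ollama models.
# Assumes the `ollama` CLI is installed and at least one model has been pulled.
import subprocess
from pathlib import Path

PROMPT = "Explain retrieval-augmented generation in two sentences."
OUT_DIR = Path("multirun_output")
OUT_DIR.mkdir(exist_ok=True)

# `ollama list` prints a header row followed by one line per installed model.
listing = subprocess.run(["ollama", "list"], capture_output=True, text=True, check=True)
models = [line.split()[0] for line in listing.stdout.splitlines()[1:] if line.strip()]

for model in models:
    # `ollama run MODEL PROMPT` prints the model's response to stdout.
    result = subprocess.run(["ollama", "run", model, PROMPT],
                            capture_output=True, text=True)
    (OUT_DIR / f"{model.replace(':', '_')}.txt").write_text(result.stdout)
    print(f"{model}: wrote {len(result.stdout)} characters")
```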
pyladiesams / eval-llm-based-apps-jan2025

#LLM# Create an evaluation framework for your LLM-based app. Incorporate it into your test suite. Lay the monitoring foundation.

LLM llmops workshop llm-eval llm-evaluation-framework llm-evaluation-metrics llm-monitoring
Jupyter Notebook 7
3 months ago
ronniross / llm-confidence-scorer

#Dataset# A measure of estimated confidence in the non-hallucinatory nature of outputs generated by large language models.

LLM llm-evaluation llm-evaluation-framework llm-evaluation-metrics llm-training dataset
Python 3
12 days ago
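The listing does not say how the score is computed; as a generic illustration of one common approach (not necessarily this repository's method), the sketch below turns per-token log-probabilities into a single confidence score by exponentiating their mean.

```python
# confidence_sketch.py -- a generic token-logprob confidence score, shown only as
# an illustration of the concept; llm-confidence-scorer's actual method may differ.
import math

def confidence_from_logprobs(token_logprobs: list[float]) -> float:
    """Geometric-mean token probability in [0, 1]; higher means more confident."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))

# Example: log-probabilities such as those many LLM APIs return per generated token.
print(confidence_from_logprobs([-0.05, -0.20, -0.01, -0.60]))  # ~0.81
```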
ritwickbhargav80 / quick-llm-model-evaluations

This repo contains a Streamlit application that provides a user-friendly interface for evaluating large language models (LLMs) using the beyondllm package.

llm-evaluation-metrics LLM retrieval-augmented-generation Streamlit
Python 0
1 year ago
nhsengland / evalsense

#LLM# Tools for systematic large language model evaluations

llm-evaluation llm-evaluation-metrics LLM evaluation-framework evaluation-metrics llm-evaluation-framework
Python 0
14 days ago
Pavansomisetty21 / GEval-Metrics-Analyzing-the-Reliability-of-LLM-Responses

In this repository we use G-Eval metrics to evaluate LLM responses and analyze their reliability and accuracy.

llm-evaluation-metrics
Python 0
24 days ago
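G-Eval metrics use an LLM judge that scores outputs against natural-language criteria. As a hedged sketch of how such a metric is commonly set up with the deepeval library (not necessarily this repository's exact code), the criteria string and example texts below are assumptions.

```python
# geval_sketch.py -- defining a G-Eval-style correctness metric with deepeval
# (illustrative only; assumes deepeval is installed and a judge LLM is configured).
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCase, LLMTestCaseParams

correctness = GEval(
    name="Correctness",
    criteria="Judge whether the actual output is factually consistent with the expected output.",
    evaluation_params=[
        LLMTestCaseParams.INPUT,
        LLMTestCaseParams.ACTUAL_OUTPUT,
        LLMTestCaseParams.EXPECTED_OUTPUT,
    ],
)

test_case = LLMTestCase(
    input="When was the transistor invented?",
    actual_output="The transistor was invented at Bell Labs in 1947.",
    expected_output="1947, at Bell Labs.",
)

correctness.measure(test_case)
print(correctness.score, correctness.reason)
```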