
llm-evaluation-framework

confident-ai / deepeval

The LLM Evaluation Framework

Tags: evaluation-metrics, evaluation-framework, llm-evaluation, llm-evaluation-framework, llm-evaluation-metrics
Python · 8k stars · updated 3 days ago
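
For orientation, here is a minimal sketch of a deepeval check. It assumes `pip install deepeval` and a configured OpenAI API key (deepeval's built-in metrics use an LLM judge); the question/answer strings are invented placeholders.

```python
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# One test case: the input your app received and the output it produced
# (both strings are made-up placeholders).
test_case = LLMTestCase(
    input="What is your return policy?",
    actual_output="You can return any item within 30 days of purchase.",
)

# Scores how relevant the output is to the input; fails below the threshold.
metric = AnswerRelevancyMetric(threshold=0.7)

# Runs the metric over the test case and prints a pass/fail report.
evaluate(test_cases=[test_case], metrics=[metric])
```
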
https://static.github-zh.com/github_avatars/promptfoo?size=40
promptfoo / promptfoo

Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command ...

Tags: LLM, prompt-engineering, prompts, llmops, prompt-testing, testing, rag, evaluation, evaluation-framework, llm-eval, llm-evaluation, llm-evaluation-framework, continuous-integration, CI/CD, pentesting, red-teaming, vulnerability-scanners
TypeScript · 7.2k stars · updated 17 hours ago
msoedov / agentic_security

Agentic LLM Vulnerability Scanner / AI red teaming kit 🧪

Tags: llm-security, ai-red-team, llm-evaluation, llm-evaluation-framework, prompt-testing, agent-framework
Python · 1.47k stars · updated 5 days ago
JinjieNi / MixEval

The official evaluation suite and dynamic data release for MixEval.

Tags: benchmark, evaluation, evaluation-framework, foundation-models, LLM, large-language-models, large-multimodal-models, llm-evaluation, llm-evaluation-framework, llm-inference
Python · 242 stars · updated 7 months ago
cvs-health / langfair

LangFair is a Python library for conducting use-case-level LLM bias and fairness assessments.

Tags: artificial-intelligence, bias, bias-detection, fairness, fairness-ai, fairness-ml, fairness-testing, large-language-models, LLM, responsible-ai, Python, ai-safety, llm-evaluation, llm-evaluation-framework, llm-evaluation-metrics
Python · 215 stars · updated 4 days ago
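
To make "use-case-level bias assessment" concrete, here is a generic counterfactual probe in plain Python. This illustrates the technique only and is not LangFair's API; `generate`, `score`, and the prompt template are stand-ins you would supply.

```python
from typing import Callable

def counterfactual_gap(
    generate: Callable[[str], str],  # your LLM call (stand-in)
    score: Callable[[str], float],   # any scalar scorer, e.g. sentiment (stand-in)
    template: str,                   # prompt with a {group} placeholder
    group_a: str,
    group_b: str,
) -> float:
    """Score gap between outputs for two prompts that differ only in group."""
    output_a = generate(template.format(group=group_a))
    output_b = generate(template.format(group=group_b))
    return abs(score(output_a) - score(output_b))

# Hypothetical usage: a consistently large gap across many templates
# signals counterfactual bias for this use case.
# gap = counterfactual_gap(
#     my_llm, my_sentiment_scorer,
#     "Write a short reference letter for a {group} engineer.",
#     "male", "female",
# )
```
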
parea-ai / parea-sdk-py

Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)

Tags: LLM, llm-evaluation, llm-tools, llmops, llm-eval, llm-evaluation-framework, prompt-engineering, generative-ai, good-first-issue, monitoring
Python · 78 stars · updated 4 months ago
Addepto / contextcheck

MIT-licensed framework for testing LLMs, RAGs, and chatbots. Configurable via YAML and integrable into CI pipelines for automated testing.

Tags: LLM, llm-evaluation, rag, testing, chatbot-framework, open-source, ai-chat, ai-testing-tool, large-language-models, continuous-integration, llm-evaluation-framework
Python · 74 stars · updated 6 months ago
multinear / multinear

Develop reliable AI apps.

Tags: evaluation, LLM, reliability, llm-eval, llm-evaluation, llm-evaluation-framework
Svelte · 39 stars · updated 2 months ago
flexpa / llm-fhir-eval

Benchmarking Large Language Models for FHIR.

Tags: evals, fhir, LLM, llm-evaluation-framework
TypeScript · 37 stars · updated 16 days ago
zhuohaoyu / KIEval

[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models.

Tags: explainable-ai, LLM, llm-evaluation, llm-evaluation-framework, llm-evaluation-metrics, machine-learning
Python · 36 stars · updated 1 year ago
aws-samples / fm-leaderboarder

FM-Leaderboard-er lets you create a leaderboard to find the best LLM/prompt for your own business use case, based on your data, tasks, and prompts.

Tags: llm-evaluation, llm-evaluation-framework
Python · 18 stars · updated 8 months ago
honeyhiveai / realign

Realign is a testing and simulation framework for AI applications.

Tags: artificial-intelligence, alignment, evaluation, LLM, prompt-engineering, red-teaming, simulation, llm-eval, llm-evaluation, llm-evaluation-framework, llmops, rag
Python · 16 stars · updated 6 months ago
Networks-Learning / prediction-powered-ranking

Code for "Prediction-Powered Ranking of Large Language Models", NeurIPS 2024.

Tags: llm-eval, llm-evaluation, llm-evaluation-framework, ranking-algorithm
Jupyter Notebook · 9 stars · updated 8 months ago
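
For context, prediction-powered ranking builds on the prediction-powered inference (PPI) estimator: many cheap LLM-judge labels are debiased using a small set of human labels. A minimal sketch of that base estimator follows (general PPI, not this repository's code; all numbers are invented).

```python
import numpy as np

def ppi_mean(y_labeled, yhat_labeled, yhat_unlabeled):
    """Prediction-powered estimate of E[Y]: the mean of cheap predictions on a
    large unlabeled set, plus a bias correction measured on a small gold set.
    yhat_labeled must be the predictor's output on the same items as y_labeled."""
    y = np.asarray(y_labeled, dtype=float)
    f = np.asarray(yhat_labeled, dtype=float)
    f_u = np.asarray(yhat_unlabeled, dtype=float)
    return f_u.mean() + (y - f).mean()

# Invented example: estimate a model's pairwise win rate from 50 human-judged
# comparisons plus 5,000 LLM-judged ones.
rng = np.random.default_rng(0)
human = rng.binomial(1, 0.62, size=50)          # gold labels Y
judge_small = rng.binomial(1, 0.58, size=50)    # judge's calls on the gold items
judge_large = rng.binomial(1, 0.57, size=5000)  # judge's calls on unlabeled items
print(ppi_mean(human, judge_small, judge_large))
```
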
pyladiesams / eval-llm-based-apps-jan2025

Create an evaluation framework for your LLM-based app. Incorporate it into your test suite. Lay the monitoring foundation.

Tags: LLM, llmops, workshop, llm-eval, llm-evaluation-framework, llm-evaluation-metrics, llm-monitoring
Jupyter Notebook · 7 stars · updated 1 month ago
yukinagae / genkitx-promptfoo

Community plugin for Genkit to use Promptfoo.

Tags: artificial-intelligence, evaluation, evaluation-framework, Firebase, genkit, LLM, llm-eval, llm-evaluation, llm-evaluation-framework, llmops, plugin, prompt, prompt-testing, prompts, testing
TypeScript · 4 stars · updated 5 months ago
parea-ai / parea-sdk-ts

TypeScript SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)

Tags: LLM, llm-evaluation, llm-evaluation-framework, llm-tools, llm-eval, prompt-engineering
TypeScript · 4 stars · updated 5 months ago
stair-lab / melt

Multilingual Evaluation Toolkits

Tags: llm-evaluation-framework, multilingual
Python · 4 stars · updated 7 months ago
yuzu-ai / ShinRakuda

Shin Rakuda is a comprehensive framework for evaluating and benchmarking Japanese large language models, offering researchers and developers a flexible toolkit for assessing LLM performance across div...

Tags: LLM, llm-eval, llm-evaluation, llm-evaluation-framework, japanese
Python · 3 stars · updated 9 months ago
ronniross / llm-confidence-scorer

A set of auxiliary systems designed to provide a measure of estimated confidence for the outputs generated by large language models.

Tags: LLM, llm-evaluation, llm-evaluation-framework, llm-evaluation-metrics, llm-training, dataset
Python · 2 stars · updated 21 days ago
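
As background on what such a confidence measure can look like, one common proxy is the mean token log-probability of the generated answer, exponentiated to land in [0, 1]. A generic sketch (not this repository's code; the log-prob values are made up):

```python
import math

def mean_logprob_confidence(token_logprobs: list[float]) -> float:
    """Exponentiated average log-probability, i.e. the geometric mean of the
    per-token probabilities; values near 1.0 indicate high model confidence."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))

# Made-up per-token log-probs, e.g. as returned by an LLM API with logprobs enabled:
print(mean_logprob_confidence([-0.05, -0.20, -0.10, -0.60]))  # ~0.79
```
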
jaaack-wang / multi-problem-eval-llm

Evaluating LLMs with Multiple Problems at once: A New Paradigm for Probing LLM Capabilities.

Tags: explainable-ai, large-language-models, LLM, llm-eval, llm-evaluation-framework, llm-prompting
Jupyter Notebook · 2 stars · updated 1 year ago