GitHub Chinese Community
©2025 GitHub Chinese Community Forum

# evaluation-framework
EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.

Topics: evaluation-framework, language-model, transformer
Python · 9.27k stars · updated 4 days ago
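The "few-shot evaluation" that harnesses like this automate boils down to: build a prompt from k labeled demonstrations, append the test input, and score the model's completion against the gold label. A minimal sketch of that loop (none of lm-evaluation-harness's actual APIs appear here; `toy_model` and the task data are invented for illustration):

```python
def build_few_shot_prompt(examples, query):
    """Concatenate k demonstration pairs, then the unanswered test query."""
    lines = [f"Q: {q}\nA: {a}" for q, a in examples]
    lines.append(f"Q: {query}\nA:")
    return "\n\n".join(lines)

def evaluate_accuracy(model, examples, test_set):
    """Exact-match accuracy of `model` over (query, gold_answer) pairs."""
    correct = 0
    for query, gold in test_set:
        prompt = build_few_shot_prompt(examples, query)
        correct += model(prompt).strip() == gold
    return correct / len(test_set)

# Hypothetical stand-in for a real language model: classifies the last
# query by a keyword, so the example runs without any model weights.
def toy_model(prompt):
    last_query = prompt.rsplit("Q:", 1)[1]
    return "positive" if "love" in last_query else "negative"

demos = [("I love this.", "positive"), ("Terrible film.", "negative")]
tests = [("I love the cast.", "positive"), ("Dull and slow.", "negative")]
print(evaluate_accuracy(toy_model, demos, tests))  # 1.0
```

Real harnesses add the hard parts this sketch omits: batched inference, log-likelihood scoring of answer choices, and standardized task definitions.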
confident-ai / deepeval

The LLM Evaluation Framework

Topics: evaluation-metrics, evaluation-framework, llm-evaluation, llm-evaluation-framework, llm-evaluation-metrics
Python · 8k stars · updated 3 days ago
promptfoo / promptfoo

Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command ...

Topics: llm, prompt-engineering, prompts, llmops, prompt-testing, testing, rag, evaluation, evaluation-framework, llm-eval, llm-evaluation, llm-evaluation-framework, continuous-integration, ci-cd, pentesting, red-teaming, vulnerability-scanners
TypeScript · 7.2k stars · updated 17 hours ago
huggingface / lighteval

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends.

Topics: evaluation, evaluation-framework, evaluation-metrics, huggingface
Python · 1.62k stars · updated 2 days ago
MaurizioFD / RecSys2019_DeepLearning_Evaluation

The repository for our RecSys 2019 article "Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches" and several follow-up studies.

Topics: recommender-system, recommendation-system, recommendation-algorithms, deep-learning, evaluation-framework, neural-network, collaborative-filtering, content-based-recommendation, hybrid-recommender-system, reproducibility, reproducible-research, knn, matrix-factorization
Python · 986 stars · updated 2 years ago
relari-ai / continuous-eval

Data-driven evaluation for LLM-powered applications.

Topics: evaluation-framework, evaluation-metrics, information-retrieval, llm-evaluation, llmops, rag, retrieval-augmented-generation
Python · 497 stars · updated 5 months ago
ServiceNow / AgentLab

AgentLab: An open-source framework for developing, testing, and benchmarking web agents on diverse tasks, designed for scalability and reproducibility.

Topics: agents, benchmark, evaluation-framework, llm, llm-agents, prompting, agentlab
Python · 346 stars · updated 3 days ago
TonicAI / tonic_validate

Metrics to evaluate the quality of responses from your Retrieval-Augmented Generation (RAG) applications.

Topics: evaluation-metrics, large-language-models, llm, llmops, rag, retrieval-augmented-generation, evaluation-framework
Python · 304 stars · updated 1 month ago
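Response-quality metrics for RAG applications often start from simple lexical overlap between the generated answer and a reference answer. A generic sketch of SQuAD-style token F1 — one metric commonly found in such suites, not tonic_validate's actual API:

```python
from collections import Counter

def token_f1(prediction, reference):
    """Token-level F1 between a generated answer and a reference answer:
    the harmonic mean of token precision and token recall."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)  # multiset overlap
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("the capital of France is Paris",
               "Paris is the capital of France"))  # 1.0 (word order ignored)
```

Lexical overlap misses paraphrases, which is why the frameworks above layer on embedding-similarity and LLM-as-judge metrics.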
athina-ai / athina-evals

Python SDK for running evaluations on LLM-generated responses.

Topics: evaluation, evaluation-framework, evaluation-metrics, llm-eval, llm-evaluation, llm-ops, llmops
Python · 281 stars · updated 9 days ago
aiverify-foundation / moonshot

Moonshot: a simple and modular tool to evaluate and red-team any LLM application.

Topics: benchmarking, evaluation-framework, llm, red-teaming
Python · 246 stars · updated 7 days ago
JinjieNi / MixEval

The official evaluation suite and dynamic data release for MixEval.

Topics: benchmark, evaluation, evaluation-framework, foundation-models, llm, large-language-models, large-multimodal-models, llm-evaluation, llm-evaluation-framework, llm-inference
Python · 242 stars · updated 7 months ago
diningphil / PyDGN

A research library for automating experiments on Deep Graph Networks.

Topics: evaluation-framework
Python · 223 stars · updated 9 months ago
zeno-ml / zeno

AI data management and evaluation platform.

Topics: data-science, machine-learning, python, artificial-intelligence, evaluation, evaluation-framework
Svelte · 215 stars · updated 2 years ago
lartpang / PySODEvalToolkit

PySODEvalToolkit: a Python-based evaluation toolbox for salient object detection and camouflaged object detection.

Topics: python, monitoring, metrics-visualization, saliency, saliency-detection, salient-object-detection, latex, evaluation, evaluation-metrics, evaluation-framework, evaluator, camouflaged-object-detection
Python · 179 stars · updated 9 months ago
symflower / eval-dev-quality

DevQualityEval: an evaluation benchmark 📈 and framework to compare and evolve the quality of code generation by LLMs.

Topics: evaluation, evaluation-framework, llm, software-quality, software-engineering
Go · 176 stars · updated 1 month ago
bijington / expressive

Expressive is a cross-platform expression parsing and evaluation framework. The cross-platform nature is achieved by compiling for .NET Standard, so it will run on practically any platform.

Topics: evaluation, evaluation-framework, parsing, cross-platform, expression-evaluator, expression-parser, netstandard, xamarin, hacktoberfest
C# · 171 stars · updated 8 months ago
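Expression evaluators like this typically parse the input into a syntax tree, then walk it against a whitelist of operators, resolving variables from a caller-supplied dictionary. A minimal sketch of the same idea in Python (Expressive itself is C#/.NET; `eval_expr` here is a hypothetical illustration, not its API):

```python
import ast
import operator

# Whitelisted binary operators; anything outside this table is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def eval_expr(expression, variables=None):
    """Parse `expression` into an AST and walk it, resolving names from
    `variables`. No eval(), so arbitrary code cannot execute."""
    variables = variables or {}

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return variables[node.id]  # KeyError for undefined variables
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return walk(ast.parse(expression, mode="eval"))

print(eval_expr("2 * x + 1", {"x": 20.5}))  # 42.0
```

Reusing the host language's parser keeps the sketch short; a standalone framework would ship its own tokenizer and grammar so the expression syntax is identical on every platform.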
empirical-run / empirical

Test and evaluate LLMs and model configurations across all the scenarios that matter for your application.

Topics: evaluation-framework, llm-inference, llmops, llm, test-automation, testing
TypeScript · 157 stars · updated 10 months ago
microsoft / eureka-ml-insights

A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings.

Topics: artificial-intelligence, evaluation-framework, llm, machine-learning, mllm
Python · 152 stars · updated 3 days ago
AI21Labs / lm-evaluation

Evaluation suite for large-scale language models.

Topics: language-model, evaluation-framework
Python · 125 stars · updated 4 years ago
nlp-uoregon / mlmm-evaluation

Multilingual Large Language Models Evaluation Benchmark.

Topics: dataset, evaluation, evaluation-framework, language-model, large-language-models, multilingual, nlp
Python · 122 stars · updated 10 months ago