GitHub 中文社区
©2025 GitHub中文社区

# evaluation-metrics
confident-ai / deepeval

DeepEval is an LLM evaluation framework, designed for evaluating and testing large language model systems. It is similar to Pytest, but focused on unit testing LLM outputs.

Tags: evaluation-metrics, evaluation-framework, llm-evaluation, llm-evaluation-framework, llm-evaluation-metrics
Python · 9.66k stars · updated 5 hours ago
AgentOps-AI / agentops

#LLM# Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks including CrewAI, Agno, OpenAI Agents SDK, Langchain, Autogen, AG2, and Ca...

Tags: agent, agentops, AI, evals, evaluation-metrics, LLM, anthropic, autogen, cost-estimation, crewai, groq, langchain, mistral, ollama, openai, agents-sdk, openai-agents
Python · 4.72k stars · updated 1 day ago
datawhalechina / tiny-universe

"A White-Box Guide to Building Large Models": a Tiny-Universe built entirely by hand from scratch.

Tags: rag, agent, diffusion, evaluation-metrics, llama, qwen, transformers
Jupyter Notebook · 3.42k stars · updated 3 months ago
xinshuoweng / AB3DMOT

#Computer Science# (IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"

Tags: computer-vision, machine-learning, Robotics, tracking, 3d-tracking, multi-object-tracking, real-time, evaluation-metrics, evaluation, 3d-multi-object-tracking, kitti
Python · 1.77k stars · updated 1 year ago
huggingface / lighteval

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends

Tags: evaluation, evaluation-framework, evaluation-metrics, huggingface
Python · 1.76k stars · updated 3 days ago
huggingface / evaluation-guidebook

#LLM# Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!

Tags: evaluation, evaluation-metrics, guidebook, large-language-models, LLM, machine-learning, tutorial
Jupyter Notebook · 1.49k stars · updated 7 months ago
google-research / rliable

#Computer Science# [NeurIPS'21 Outstanding Paper] Library for reliable evaluation on RL and ML benchmarks, even with only a handful of seeds.

Tags: reinforcement-learning, benchmarking, evaluation-metrics, machine-learning, Google, rl
Jupyter Notebook · 837 stars · updated 1 year ago
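One of the robust aggregate statistics this line of work popularized is the interquartile mean (IQM): discard the bottom and top 25% of runs and average the middle 50%, so a few outlier seeds cannot dominate the aggregate score. A minimal pure-stdlib sketch of that idea (not rliable's actual API, which also provides stratified bootstrap confidence intervals):

```python
# Interquartile mean (IQM) sketch: average the middle 50% of per-run
# scores, dropping the lowest and highest quartiles. Illustrative only.

def iqm(scores):
    """Interquartile mean of a list of per-run scores."""
    s = sorted(scores)
    n = len(s)
    lo, hi = n // 4, n - n // 4  # indices bounding the middle 50%
    middle = s[lo:hi]
    return sum(middle) / len(middle)

# A single outlier run barely moves the IQM, unlike the plain mean:
print(iqm([0, 10, 10, 10, 10, 10, 10, 100]))  # -> 10.0
```

Compared with the median, the IQM still uses half the runs, so it wastes less data while remaining resistant to a handful of bad (or lucky) seeds.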
MIND-Lab / OCTIS

#NLP# OCTIS: Comparing Topic Models is Simple! A Python package to optimize and evaluate topic models (accepted at the EACL 2021 demo track)

Tags: topic-modeling, evaluation-metrics, NLP, bayesian-optimization, hyperparameter-optimization, hyperparameter-tuning, hyperparameter-search, topic-models, nlproc, nlp-library
Python · 775 stars · updated 1 year ago
jitsi / jiwer

Evaluate your speech-to-text system with similarity measures such as word error rate (WER)

Tags: automatic-speech-recognition, Python, speech-to-text, evaluation-metrics
Python · 762 stars · updated 5 months ago
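Word error rate is defined as WER = (S + D + I) / N: substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words, computed via word-level edit distance. A self-contained sketch of that computation (illustrative only; the jiwer library additionally handles text normalization and related measures like MER and WIL):

```python
# Word error rate (WER) via word-level Levenshtein distance.
# WER = (substitutions + deletions + insertions) / reference length.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the quick brown fox", "the quick brown dog"))  # -> 0.25
```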
Unbabel / COMET

#NLP# A Neural Framework for MT Evaluation

Tags: machine-translation, evaluation-metrics, NLP, machine-learning, AI
Python · 637 stars · updated 1 month ago
nekhtiari / image-similarity-measures

#Computer Science# 📈 Implementation of eight evaluation metrics to assess the similarity between two images. The eight metrics are as follows: RMSE, PSNR, SSIM, ISSM, FSIM, SRE, SAM, and UIQ.

Tags: Image, monitoring, machine-learning, evaluation-metrics, p1, processing
Python · 620 stars · updated 1 year ago
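The two simplest metrics in that list, RMSE and PSNR, can be sketched in a few lines for 8-bit images represented as flat lists of pixel values (an assumption for illustration; the repository itself operates on real image arrays and also implements SSIM, FSIM, and the rest):

```python
import math

# RMSE: root mean squared pixel difference (lower is more similar).
# PSNR: peak signal-to-noise ratio in dB (higher is more similar),
#       PSNR = 20 * log10(MAX / RMSE) with MAX = 255 for 8-bit images.

def rmse(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def psnr(a, b, max_val=255.0):
    e = rmse(a, b)
    if e == 0:
        return float("inf")  # identical images: infinite PSNR
    return 20 * math.log10(max_val / e)
```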
AmenRa / ranx

⚡️ A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍

Tags: numba, Python, evaluation, evaluation-metrics, information-retrieval, recommender-systems, metasearch, comparison
Python · 579 stars · updated 9 days ago
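A representative ranking metric such a library evaluates is mean reciprocal rank (MRR): for each query, score 1/rank of the first relevant result (0 if none appears), then average over queries. A pure-stdlib sketch of the metric itself, not ranx's qrels/run API:

```python
# Mean reciprocal rank (MRR) sketch.

def mrr(rankings, relevant):
    """rankings: one ranked list of doc ids per query.
    relevant: one set of relevant doc ids per query."""
    total = 0.0
    for ranked, rel in zip(rankings, relevant):
        for pos, doc in enumerate(ranked, start=1):
            if doc in rel:
                total += 1.0 / pos  # first relevant hit only
                break
    return total / len(rankings)

# Query 1: first relevant doc at rank 2 (1/2); query 2: no hit (0).
print(mrr([["d1", "d2"], ["d3", "d1"]], [{"d2"}, {"d9"}]))  # -> 0.25
```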
relari-ai / continuous-eval

Data-Driven Evaluation for LLM-Powered Applications

Tags: evaluation-framework, evaluation-metrics, information-retrieval, llm-evaluation, llmops, rag, retrieval-augmented-generation
Python · 501 stars · updated 6 months ago
proycon / pynlpl

#NLP# PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks su...

Tags: NLP, Python, computational-linguistics, Library, folia, machine-learning, search-algorithms, evaluation-metrics, text-processing, nlp-library
Python · 476 stars · updated 2 years ago
JokerJohn / Cloud_Map_Evaluation

[RAL'25 & IROS'25] MapEval: Towards Unified, Robust and Efficient SLAM Map Evaluation Framework.

Tags: open3d, slam, evaluation-metrics, lidar-point-cloud, Robotics
C++ · 385 stars · updated 16 days ago
v-iashin / SpecVQGAN

Source code for "Taming Visually Guided Sound Generation" (Oral at BMVC 2021)

Tags: transformer, vqvae, generative-adversarial-network, PyTorch, audio-generation, melgan, multi-modal, video-understanding, evaluation-metrics, audio, Video
Jupyter Notebook · 363 stars · updated 1 year ago
ziqihuangg / Awesome-Evaluation-of-Visual-Generation

A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems

Tags: awesome-lists, benchmark, evaluation, evaluation-metrics, generative-models, image-generation, video-generation
330 stars · updated 6 days ago
TonicAI / tonic_validate

#LLM# Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications.

Tags: evaluation-metrics, large-language-models, LLM, llmops, rag, retrieval-augmented-generation, evaluation-framework
Python · 315 stars · updated 21 days ago
salesforce / factCC

Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper

Tags: evaluation-metrics
Python · 303 stars · updated 3 months ago
athina-ai / athina-evals

Python SDK for running evaluations on LLM generated responses

Tags: evaluation, evaluation-framework, evaluation-metrics, llm-eval, llm-evaluation, llm-ops, llmops
Python · 289 stars · updated 2 months ago