
Topic: evaluation-metrics

confident-ai / deepeval

The LLM Evaluation Framework (see the usage sketch after this entry)

evaluation-metrics, evaluation-framework, llm-evaluation, llm-evaluation-framework, llm-evaluation-metrics
Python · 8.01k stars
3 days ago
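As a rough illustration of how a deepeval-style check is wired up, the snippet below follows the test-case-plus-metric pattern from the project's documentation; the exact class names, the example question, and the 0.7 threshold are assumptions rather than verified against the current release, and the relevancy metric calls an LLM judge (an OpenAI key by default).

```python
# Hedged sketch of deepeval's test-case + metric pattern (assumed API;
# AnswerRelevancyMetric uses an LLM judge, so an OPENAI_API_KEY is expected).
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    input="What is the boiling point of water at sea level?",        # hypothetical prompt
    actual_output="Water boils at 100 degrees Celsius at sea level.",  # model answer to score
)
metric = AnswerRelevancyMetric(threshold=0.7)  # pass/fail cut-off (illustrative value)

# Runs the metric against the test case and prints a pass/fail report.
evaluate(test_cases=[test_case], metrics=[metric])
```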
AgentOps-AI / agentops

#LLM# Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks including OpenAI Agents SDK, CrewAI, Langchain, Autogen, AG2, and CamelAI

agent, agentops, artificial-intelligence, evals, evaluation-metrics, llm, anthropic, autogen, cost-estimation, crewai, groq, langchain, mistral, ollama, openai, agents-sdk, openai-agents
Python · 4.54k stars
3 days ago
datawhalechina / tiny-universe

"The White-Box Guide to Building Large Models" (《大模型白盒子构建指南》): a Tiny-Universe built entirely by hand from scratch

rag, agent, diffusion, evaluation-metrics, llama, qwen, transformers
Jupyter Notebook · 3.06k stars
2 months ago
xinshuoweng / AB3DMOT

#Computer Science# (IROS 2020, ECCVW 2020) Official Python implementation of "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"

computer-vision, machine-learning, robotics, tracking, 3d-tracking, multi-object-tracking, real-time, evaluation-metrics, evaluation, 3d-multi-object-tracking, kitti
Python · 1.75k stars
1 year ago
huggingface / lighteval

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends

evaluation, evaluation-framework, evaluation-metrics, huggingface
Python · 1.62k stars
2 days ago
huggingface / evaluation-guidebook

#LLM# Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!

evaluation, evaluation-metrics, guidebook, large-language-models, llm, machine-learning, tutorial
Jupyter Notebook · 1.41k stars
5 months ago
google-research / rliable

#Computer Science# [NeurIPS'21 Outstanding Paper] Library for reliable evaluation on RL and ML benchmarks, even with only a handful of seeds.

reinforcement-learning, benchmarking, evaluation-metrics, machine-learning, google, rl
Jupyter Notebook · 832 stars
10 months ago
MIND-Lab / OCTIS

#NLP# OCTIS: Comparing Topic Models is Simple! A Python package to optimize and evaluate topic models (accepted at the EACL 2021 demo track)

topic-modeling, evaluation-metrics, nlp, bayesian-optimization, hyperparameter-optimization, hyperparameter-tuning, hyperparameter-search, topic-models, nlproc, nlp-library
Python · 771 stars
1 year ago
jitsi / jiwer

Evaluate your speech-to-text system with similarity measures such as word error rate (WER); a minimal usage sketch follows this entry.

automatic-speech-recognition, python, speech-to-text, evaluation-metrics
Python · 735 stars
4 months ago
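For the jiwer entry above, a minimal sketch of scoring one hypothesis against a reference: jiwer.wer and jiwer.cer are the library's documented entry points, while the example sentences are made up.

```python
import jiwer

reference = "the quick brown fox jumps over the lazy dog"   # ground-truth transcript
hypothesis = "the quick brown fox jumped over a lazy dog"   # ASR system output

# WER = (substitutions + deletions + insertions) / number of reference words
print("WER:", jiwer.wer(reference, hypothesis))
# CER applies the same edit-distance computation at the character level
print("CER:", jiwer.cer(reference, hypothesis))
```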
nekhtiari / image-similarity-measures

#Computer Science# 📈 Implementation of eight evaluation metrics to assess the similarity between two images: RMSE, PSNR, SSIM, ISSM, FSIM, SRE, SAM, and UIQ (see the sketch after this entry).

image, monitoring, machine-learning, evaluation-metrics, image-processing
Python · 614 stars
10 months ago
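To make two of the listed metrics concrete, here is a small NumPy-only sketch of RMSE and PSNR; it mirrors what such a package computes but deliberately avoids guessing the package's own function names or module layout.

```python
import numpy as np

def rmse(a: np.ndarray, b: np.ndarray) -> float:
    """Root-mean-square error between two same-shaped images (lower = more similar)."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))

def psnr(a: np.ndarray, b: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB (higher = more similar)."""
    err = rmse(a, b)
    return float("inf") if err == 0 else float(20.0 * np.log10(max_val / err))

# Synthetic example: a random 8-bit image and a slightly perturbed copy.
img1 = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
noise = np.random.randint(-10, 11, img1.shape)
img2 = np.clip(img1.astype(int) + noise, 0, 255).astype(np.uint8)

print(f"RMSE = {rmse(img1, img2):.2f}, PSNR = {psnr(img1, img2):.2f} dB")
```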
Unbabel / COMET

#NLP# A Neural Framework for MT Evaluation (see the usage sketch after this entry)

machine-translation, evaluation-metrics, nlp, machine-learning, artificial-intelligence
Python · 609 stars
6 days ago
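COMET's documentation describes a download-then-predict flow roughly like the sketch below; the checkpoint name (Unbabel/wmt22-comet-da), the src/mt/ref field names, and the system_score attribute follow that documented pattern but should be treated as assumptions about the current release.

```python
# Hedged sketch of scoring a machine translation with COMET (assumed API).
from comet import download_model, load_from_checkpoint

model_path = download_model("Unbabel/wmt22-comet-da")  # reference-based COMET checkpoint
model = load_from_checkpoint(model_path)

data = [{
    "src": "Dem Feuer konnte Einhalt geboten werden",  # source sentence
    "mt":  "The fire could be stopped",                 # system translation
    "ref": "They were able to control the fire.",       # human reference
}]
output = model.predict(data, batch_size=8, gpus=0)
print(output.system_score)  # corpus-level score; output.scores holds per-segment scores
```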
AmenRa / ranx

⚡️ A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍 (see the sketch after this entry)

numba, python, evaluation, evaluation-metrics, information-retrieval, recommender-systems, metasearch, comparison
Python · 561 stars
1 year ago
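A minimal ranx sketch, under the assumption that Qrels/Run accept plain nested dicts and evaluate takes a list of metric names, as in the project's README; the query and document IDs are made up.

```python
from ranx import Qrels, Run, evaluate

# Relevance judgements: query id -> {doc id: graded relevance}
qrels = Qrels({"q_1": {"doc_a": 1, "doc_c": 2}})

# System output: query id -> {doc id: retrieval score}
run = Run({"q_1": {"doc_b": 0.9, "doc_a": 0.8, "doc_c": 0.3}})

# Returns a dict mapping each requested metric to its mean over queries.
print(evaluate(qrels, run, ["ndcg@3", "map", "mrr"]))
```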
relari-ai / continuous-eval

Data-Driven Evaluation for LLM-Powered Applications

evaluation-framework, evaluation-metrics, information-retrieval, llm-evaluation, llmops, rag, retrieval-augmented-generation
Python · 497 stars
5 months ago
proycon / pynlpl

#NLP# PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks su...

nlp, python, computational-linguistics, library, machine-learning, search-algorithms, evaluation-metrics, text-processing, nlp-library
Python · 477 stars
2 years ago
JokerJohn / Cloud_Map_Evaluation

[RA-L 2025] MapEval: Towards Unified, Robust and Efficient SLAM Map Evaluation Framework.

open3d, point-cloud, slam, evaluation-metrics, lidar-point-cloud, robotics
C++ · 373 stars
17 days ago
v-iashin / SpecVQGAN

Source code for "Taming Visually Guided Sound Generation" (Oral at BMVC 2021)

transformer, vqvae, generative-adversarial-network, pytorch, audio-generation, melgan, multi-modal, video-understanding, evaluation-metrics, audio, video
Jupyter Notebook · 362 stars
1 year ago
ziqihuangg / Awesome-Evaluation-of-Visual-Generation

A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems

awesome-lists, benchmark, evaluation, evaluation-metrics, generative-models, image-generation, video-generation
307 stars
6 days ago
TonicAI / tonic_validate

#LLM# Metrics to evaluate the quality of responses from your Retrieval-Augmented Generation (RAG) applications.

evaluation-metrics, large-language-models, llm, llmops, rag, retrieval-augmented-generation, evaluation-framework
Python · 304 stars
1 month ago
salesforce / factCC

Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper

evaluation-metrics
Python · 297 stars
1 month ago
athina-ai / athina-evals

Python SDK for running evaluations on LLM-generated responses

evaluation, evaluation-framework, evaluation-metrics, llm-eval, llm-evaluation, llm-ops, llmops
Python · 281 stars
9 days ago