
evals

mastra-ai / mastra

#LLM# The TypeScript AI agent framework. ⚡ Assistants, RAG, observability. Supports any LLM: GPT-4, Claude, Gemini, Llama.

agents, AI, chatbots, JavaScript, LLM, Next, Node.js, React, TypeScript, workflows, evals, mcp, tts
TypeScript 15.4k
10 hours ago
Arize-ai / phoenix

#Data Warehouse# AI Observability & Evaluation

llmops, ai-monitoring, ai-observability, llm-eval, datasets, agents, LLM, prompt-engineering, anthropic, evals, llm-evaluation, openai, langchain, llamaindex, smolagents
Jupyter Notebook 6.49k
7 hours ago
AgentOps-AI / agentops

#LLM# Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks including CrewAI, Agno, OpenAI Agents SDK, Langchain, Autogen, AG2, and Ca...

agent, agentops, AI, evals, evaluation-metrics, LLM, anthropic, autogen, cost-estimation, crewai, groq, langchain, mistral, ollama, openai, agents-sdk, openai-agents
Python 4.72k
2 days ago
Kiln-AI / Kiln

#Computer Science# The easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets.

AI, chain-of-thought, collaboration, dataset-generation, fine-tuning, machine-learning, macOS, ollama, openai, prompt, prompt-engineering, Python, rlhf, synthetic-data, Windows, evals, evaluation
Python 4.01k
11 hours ago
truera / trulens

#Computer Science# Evaluation and Tracking for LLM Experiments and AI Agents

machine-learning, neural-networks, explainable-ml, llmops, ai-monitoring, ai-observability, evals, llm-evaluation, LLM, ai-agents, llm-eval, agentops
Python 2.68k
2 days ago
lmnr-ai / lmnr

Laminar: an open-source, all-in-one platform for engineering AI products. Create a data flywheel for your AI app. Traces, Evals, Datasets, Labels. YC S24.

aiops, developer-tools, observability, agents, AI, Rust, analytics, llm-evaluation, llm-observability, monitoring, Open Source, self-hosted, ai-observability, llmops, evals, evaluation, TypeScript, ts
TypeScript 2.2k
20 hours ago
superlinear-ai / raglite

#LLM# 🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with DuckDB or PostgreSQL

LLM, Markdown, pdf, rag, retrieval-augmented-generation, SQLite, vector-search, pgvector, PostgreSQL, reranking, late-chunking, late-interaction, colbert, evals, query-adapter, chainlit, duckdb
Python 1.04k
2 months ago
mattpocock / evalite

Evaluate your LLM-powered apps with TypeScript

AI, evals, TypeScript
TypeScript 770
6 days ago
keshik6 / HourVideo

[NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding

gemini-pro, gpt-4, multimodal-large-language-models, navigation, perception, summarization, reasoning, evals
Jupyter Notebook 153
19 days ago
microsoft / promptpex

#LLM# Test Generation for Prompts

ChatGPT, gpt-4o, LLM, prompt-engineering, Testing, evals
TeX 110
8 days ago
METR / vivaria

Vivaria is METR's tool for running evaluations and conducting agent elicitation research.

AI, evals
TypeScript 100
2 days ago
mclenhard / mcp-evals

A Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring. This helps ensure your MCP server's tools are working correctly and perfor...

AI, evals, mcp
TypeScript 75
1 month ago
dustalov / evalica

#LLM# Evalica, your favourite evaluation toolkit

evals, evaluation, Library, LLM, pyo3, Python, Rust, ranking, rating, statistics, leaderboard, Hacktoberfest
Python 54
1 month ago
Kylejeong2 / mcpvals

An MCP Evaluation Library

evals, mcp
TypeScript 41
18 days ago
flexpa / llm-fhir-eval

#LLM# Benchmarking Large Language Models for FHIR

evals, fhir, LLM, llm-evaluation-framework
TypeScript 38
9 days ago
AIAnytime / rag-evaluator

A library for evaluating Retrieval-Augmented Generation (RAG) systems using traditional methods.

eval, evals, rag
Python 37
1 year ago
google / curie

#大语言模型#Code release for "CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning", ICLR 2025

data, evals, LLM, science
Jupyter Notebook 26
3 months ago
maragudk / gai

#LLM# Go Artificial Intelligence (GAI) helps you work with foundational models, large language models, and other AI models.

AI, Go, LLM, eval, evals, embeddings
Go 26
1 month ago
NirantK / rag-to-riches

evals, rag, search
Jupyter Notebook 26
4 months ago
The-Swarm-Corporation / StatisticalModelEvaluator

An implementation of Anthropic's paper and essay "A statistical approach to model evaluations"

agents, AI, evals, LLM, machine-learning, multiagent
Python 16
10 days ago