GitHub 中文社区
©2025 GitHub中文社区论坛
Topic: evals
mastra-ai / mastra

#LLM# The TypeScript AI agent framework. ⚡ Assistants, RAG, observability. Supports any LLM: GPT-4, Claude, Gemini, Llama.

agents, AI, chatbots, JavaScript, LLM, Next, Node.js, React, TypeScript, workflows, evals, mcp, tts
TypeScript · 14.18k stars · 2 days ago
Arize-ai / phoenix

#Data Warehouse# AI Observability & Evaluation

llmops, ai-monitoring, ai-observability, llm-eval, datasets, agents, LLM, prompt-engineering, anthropic, evals, llm-evaluation, openai, langchain, llamaindex, smolagents
Jupyter Notebook · 5.97k stars · 1 day ago
AgentOps-AI / agentops

#LLM# Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks, including the OpenAI Agents SDK, CrewAI, Langchain, Autogen, AG2, and CamelAI.

agent, agentops, AI, evals, evaluation-metrics, LLM, anthropic, autogen, cost-estimation, crewai, groq, langchain, mistral, ollama, openai, agents-sdk, openai-agents
Python · 4.54k stars · 3 days ago
Kiln-AI / Kiln

#Computer Science# The easiest tool for fine-tuning LLMs, synthetic data generation, and collaborating on datasets.

AI, chain-of-thought, collaboration, fine-tuning, machine learning, macOS, ollama, openai, prompt, prompt-engineering, Python, rlhf, synthetic-data, Windows, evals, evaluation
Python · 3.74k stars · 8 hours ago
lmnr-ai / lmnr

Laminar - an open-source, all-in-one platform for engineering AI products. Create a data flywheel for your AI app. Traces, Evals, Datasets, Labels. YC S24.

aiops, developer-tools, observability, agents, AI, Rust, analytics, llm-evaluation, llm-observability, monitoring, Open Source, self-hosted, ai-observability, llmops, evals, evaluation, TypeScript, ts
TypeScript · 2.07k stars · 2 days ago
superlinear-ai / raglite

#LLM# 🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with DuckDB or PostgreSQL.

LLM, Markdown, pdf, rag, retrieval-augmented-generation, SQLite, vector-search, pgvector, PostgreSQL, reranking, late-chunking, late-interaction, colbert, evals, query-adapter, chainlit, duckdb
Python · 1.01k stars · 4 days ago
mattpocock / evalite

Test your LLM-powered apps with TypeScript. No API key required.

AI, evals, TypeScript
TypeScript · 577 stars · 2 months ago
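Tools in this vein ("no API key required") rely on a common pattern: run each test case through the system under test and score the output deterministically, so no model call is needed for grading. A minimal, framework-agnostic sketch in Python — all names here are hypothetical illustrations, not any listed library's actual API:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Case:
    input: str
    expected: str

def exact_match(output: str, expected: str) -> float:
    """Deterministic scorer: 1.0 on exact match (ignoring edge whitespace), else 0.0."""
    return 1.0 if output.strip() == expected.strip() else 0.0

def run_eval(task: Callable[[str], str], cases: list[Case],
             scorer: Callable[[str, str], float] = exact_match) -> float:
    """Run every case through `task`, score each output, and return the mean score."""
    scores = [scorer(task(c.input), c.expected) for c in cases]
    return sum(scores) / len(scores)

# Stand-in for an LLM-backed function; a real suite would call the app under test.
def toy_task(prompt: str) -> str:
    return prompt.upper()

cases = [Case("hi", "HI"), Case("ok", "OK"), Case("no", "nope")]
print(run_eval(toy_task, cases))  # mean score: 2 of 3 cases match
```

Real frameworks layer dataset loading, caching, and reporting on top, but the core loop is this small.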
keshik6 / HourVideo

[NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding

gemini-pro, gpt-4, multimodal-large-language-models, navigation, perception, summarization, reasoning, evals
Jupyter Notebook · 148 stars · 3 months ago
microsoft / promptpex

#LLM# Test Generation for Prompts

ChatGPT, gpt-4o, LLM, prompt-engineering, Testing, evals
TeX · 100 stars · 5 days ago
METR / vivaria

Vivaria is METR's tool for running evaluations and conducting agent elicitation research.

AI, evals
TypeScript · 94 stars · 5 days ago
mclenhard / mcp-evals

A Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring. This helps ensure your MCP server's tools are working correctly and perfor...

AI, evals, mcp
TypeScript · 55 stars · 1 month ago
flexpa / llm-fhir-eval

#LLM# Benchmarking Large Language Models for FHIR

evals, fhir, LLM, llm-evaluation-framework
TypeScript · 37 stars · 16 days ago
dustalov / evalica

#LLM# Evalica, your favourite evaluation toolkit

evals, evaluation, Library, LLM, pyo3, Python, Rust, ranking, rating, statistics, leaderboard, Hacktoberfest
Python · 37 stars · 14 days ago
AIAnytime / rag-evaluator

A library for evaluating Retrieval-Augmented Generation (RAG) systems (the traditional way).

eval, evals, rag
Python · 36 stars · 10 months ago
NirantK / rag-to-riches

evals, rag, search
Jupyter Notebook · 26 stars · 2 months ago
maragudk / gai

#LLM# Go Artificial Intelligence (GAI) helps you work with foundational models, large language models, and other AI models.

AI, Go, LLM, eval, evals, embeddings
Go · 25 stars · 5 days ago
google / curie

#LLM# Code release for "CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning", ICLR 2025

data, evals, LLM, science
Jupyter Notebook · 23 stars · 2 months ago
The-Swarm-Corporation / StatisticalModelEvaluator

An implementation of Anthropic's paper and essay "A statistical approach to model evaluations"

agents, AI, evals, LLM, machine learning, multiagent
Python · 16 stars · 2 months ago
root-signals / rs-python-sdk

#LLM# Root Signals Python SDK

evaluation, LLM, llm-as-a-judge, observability, evals
Python · 12 stars · 3 days ago
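Several entries in this list are tagged llm-as-a-judge: a second model grades the first model's output against a rubric, instead of a hand-written metric. A hedged sketch of the pattern with the judge stubbed out so it runs offline — the stub, rubric, and all names are hypothetical, not any listed SDK's API:

```python
from typing import Callable

RUBRIC = "Score 1 if the answer is polite and complete, else 0."

def judge_with(model: Callable[[str], str], rubric: str, answer: str) -> int:
    """Ask a judge model to grade `answer` against `rubric`; expect a bare 0 or 1 back."""
    prompt = f"{rubric}\nAnswer to grade: {answer}\nReply with 0 or 1 only."
    reply = model(prompt).strip()
    return 1 if reply == "1" else 0

# Stubbed judge so the sketch is self-contained; in practice this would be an LLM API call.
def stub_judge(prompt: str) -> str:
    return "1" if "thank" in prompt.lower() else "0"

print(judge_with(stub_judge, RUBRIC, "Thanks, here is the report."))  # → 1
print(judge_with(stub_judge, RUBRIC, "no"))                           # → 0
```

The real engineering in these toolkits is around this core: prompt templates for the rubric, parsing unreliable judge replies, and aggregating scores across a dataset.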
openlayer-ai / templates

Our curated collection of templates. Use these patterns to set up your AI projects for evaluation with Openlayer.

AI, evals, Example
Python · 8 stars · 25 days ago