GitHub 中文社区
回车: Github搜索    Shift+回车: Google搜索
论坛
排行榜
趋势
登录

©2025 GitHub中文社区论坛GitHub官网网站地图GitHub官方翻译

  • X iconGitHub on X
  • Facebook iconGitHub on Facebook
  • Linkedin iconGitHub on LinkedIn
  • YouTube iconGitHub on YouTube
  • Twitch iconGitHub on Twitch
  • TikTok iconGitHub on TikTok
  • GitHub markGitHub’s organization on GitHub
集合主题趋势排行榜
#

llm-evaluation

Website
Wikipedia
langfuse/langfuse
https://static.github-zh.com/github_avatars/langfuse?size=40
langfuse / langfuse

#大语言模型#🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23

analytics大语言模型llmopslarge-language-modelsopenai自托管ycombinator监控observabilityOpen Sourcelangchainllama-indexevaluationprompt-engineeringprompt-managementplaygroundllm-evaluationllm-observabilityautogen
TypeScript 12.61 k
2 小时前
https://static.github-zh.com/github_avatars/comet-ml?size=40
comet-ml / opik

#大语言模型#Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.

Open Sourcelangchainopenaiplaygroundprompt-engineeringllama-index大语言模型llm-evaluationllm-observabilityllmops
Python 9.76 k
7 小时前
https://static.github-zh.com/github_avatars/confident-ai?size=40
confident-ai / deepeval

The LLM Evaluation Framework

evaluation-metricsevaluation-frameworkllm-evaluationllm-evaluation-frameworkllm-evaluation-metrics
Python 8.01 k
3 天前
https://static.github-zh.com/github_avatars/promptfoo?size=40
promptfoo / promptfoo

#大语言模型#Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command ...

大语言模型prompt-engineeringpromptsllmopsprompt-testingTestingragevaluationevaluation-frameworkllm-evalllm-evaluationllm-evaluation-framework持续集成CI/CDpentestingred-teamingvulnerability-scanners
TypeScript 7.2 k
18 小时前
Arize-ai/phoenix
https://static.github-zh.com/github_avatars/Arize-ai?size=40
Arize-ai / phoenix

#数据仓库#AI Observability & Evaluation

llmopsai-monitoringai-observabilityllm-eval数据集agents大语言模型prompt-engineeringanthropicevalsllm-evaluationopenailangchainllamaindexsmolagents
Jupyter Notebook 5.97 k
1 天前
Giskard-AI/giskard
https://static.github-zh.com/github_avatars/Giskard-AI?size=40
Giskard-AI / giskard

#大语言模型#🐢 Open-Source Evaluation & Testing for AI & LLM systems

mlopsml-validationml-testingllmopsresponsible-aifairness-aillm-evalllm-evaluationrag-evaluationai-securityllm-securityai-red-teamred-team-tools大语言模型
Python 4.62 k
4 天前
https://static.github-zh.com/github_avatars/NVIDIA?size=40
NVIDIA / garak

the LLM vulnerability scanner

人工智能llm-evaluationllm-securitysecurity-scannersvulnerability-assessment
Python 4.58 k
3 天前
Marker-Inc-Korea/AutoRAG
https://static.github-zh.com/github_avatars/Marker-Inc-Korea?size=40
Marker-Inc-Korea / AutoRAG

#大语言模型#AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

analysisautomlbenchmarkingdocument-parserembeddingsevaluation大语言模型llm-evaluationllm-opsOpen SourceopsoptimizationpipelinePythonqaragrag-evaluationretrieval-augmented-generation
Python 4.03 k
1 个月前
https://static.github-zh.com/github_avatars/Helicone?size=40
Helicone / helicone

#大语言模型#🧊 Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 🍓

large-language-modelsprompt-engineeringagent-monitoringanalyticsevaluationgptlangchainllama-index大语言模型llm-costllm-evaluationllm-observabilityllmops监控Open Sourceopenaiplaygroundprompt-managementycombinator
TypeScript 3.92 k
19 小时前
https://static.github-zh.com/github_avatars/PacktPublishing?size=40
PacktPublishing / LLM-Engineers-Handbook

#大语言模型#The LLM's practical guide: From the fundamentals to deploying advanced LLM and RAG apps to AWS using LLMOps best practices

genai大语言模型llmopsmlopsragAmazon Web Servicesfine-tuning-llmllm-evaluationml-system-design
Python 3.48 k
3 个月前
Agenta-AI/agenta
https://static.github-zh.com/github_avatars/Agenta-AI?size=40
Agenta-AI / agenta

The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place.

llm-toolsprompt-engineeringprompt-managementllm-evaluationllm-frameworkrag-evaluationllm-observabilityllm-as-a-judgellm-monitoringllm-platformllm-playgroundllmops-platform
Python 2.83 k
2 天前
https://static.github-zh.com/github_avatars/lmnr-ai?size=40
lmnr-ai / lmnr

Laminar - open-source all-in-one platform for engineering AI products. Create data flywheel for your AI app. Traces, Evals, Datasets, Labels. YC S24.

aiopsdeveloper-toolsobservabilityagents人工智能Rustanalyticsllm-evaluationllm-observability监控Open Source自托管ai-observabilityllmopsevalsevaluationTypeScriptts
TypeScript 2.07 k
2 天前
msoedov/agentic_security
https://static.github-zh.com/github_avatars/msoedov?size=40
msoedov / agentic_security

Agentic LLM Vulnerability Scanner / AI red teaming kit 🧪

llm-securityai-red-teamllm-evaluationllm-evaluation-frameworkprompt-testingagent-framework
Python 1.47 k
5 天前
https://static.github-zh.com/github_avatars/microsoft?size=40
microsoft / prompty

Prompty makes it easy to create, manage, debug, and evaluate LLM prompts for your AI applications. Prompty is an asset class and format for LLM prompts designed to enhance observability, understandab...

generative-aillm-evaluation大语言模型promptengineering
Python 924
13 天前
https://static.github-zh.com/github_avatars/cvs-health?size=40
cvs-health / uqlm

#大语言模型#UQLM: Uncertainty Quantification for Language Models, is a Python package for UQ-based LLM hallucination detection

ai-safetyhallucination大语言模型llm-evaluationuncertainty-estimationuncertainty-quantification
Python 692
2 天前
https://static.github-zh.com/github_avatars/cyberark?size=40
cyberark / FuzzyAI

#大语言模型#A powerful tool for automated LLM fuzzing. It is designed to help developers and security researchers identify and mitigate potential jailbreaks in their LLM APIs.

jailbreakjailbreaking大语言模型人工智能安全Fuzzing/Fuzz testingllm-evaluationllm-securityai-red-team
Jupyter Notebook 599
12 天前
https://static.github-zh.com/github_avatars/onejune2018?size=40
onejune2018 / Awesome-LLM-Eval

#自然语言处理#Awesome-LLM-Eval: a curated list of tools, datasets/benchmark, demos, leaderboard, papers, docs and models, mainly for Evaluation on LLMs. 一个由工具、基准/数据、演示、排行榜和大模型等组成的精选列表,主要面向基础大模型评测,旨在探求生成式AI的技术边界.

benchmarkbertchatglmChatGPTdatasetevaluationgpt3大语言模型leaderboard机器学习自然语言处理openaillamallm-evaluationqwenrag
540
8 个月前
https://static.github-zh.com/github_avatars/relari-ai?size=40
relari-ai / continuous-eval

Data-Driven Evaluation for LLM-Powered Applications

evaluation-frameworkevaluation-metricsinformation-retrievalllm-evaluationllmopsragretrieval-augmented-generation
Python 497
5 个月前
https://static.github-zh.com/github_avatars/ValueByte-AI?size=40
ValueByte-AI / Awesome-LLM-in-Social-Science

Awesome papers involving LLMs in Social Science.

large-language-modelsllm-agent大语言模型simulation-environmentalignmenteconomicsllm-evaluationpolicypsychologysocial-network
494
1 个月前
https://static.github-zh.com/github_avatars/palico-ai?size=40
palico-ai / palico-ai

#大语言模型#Build, Improve Performance, and Productionize your LLM Application with an Integrated Framework

langchainlangchain-jsllamaindex大语言模型llm-agentllm-evaluationllm-frameworkragJavaScriptTypeScriptllm-observabilityautogenanthropicopenai人工智能Node.jsfull-stackDocker
TypeScript 340
7 个月前
loading...