GitHub 中文社区

©2025 GitHub中文社区 · Forum · GitHub official site · Sitemap · GitHub official translations

# llm-eval
promptfoo / promptfoo

#LLM# Test your prompts, agents, and RAGs. AI red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with comma...

Tags: llm · prompt-engineering · prompts · llmops · prompt-testing · testing · rag · evaluation · evaluation-framework · llm-eval · llm-evaluation · llm-evaluation-framework · continuous-integration · ci-cd · pentesting · red-teaming · vulnerability-scanners
TypeScript · 8.37k stars · updated 11 hours ago
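The "simple declarative configs" in promptfoo's description are YAML files. As a rough sketch of the general shape — the provider IDs and assertion here are illustrative assumptions, so check the promptfoo documentation for the exact schema:

```yaml
# promptfooconfig.yaml — illustrative sketch, not tied to a specific promptfoo version
prompts:
  - "Summarize in one sentence: {{text}}"

providers:
  # Provider IDs are assumptions; promptfoo supports many vendors.
  - openai:gpt-4o-mini
  - anthropic:claude-3-5-haiku-latest

tests:
  - vars:
      text: "LLM evaluation compares model outputs against expected behavior."
    assert:
      - type: contains
        value: "evaluation"
```

Running something like `npx promptfoo eval` against such a file compares each provider's output side by side under the listed assertions.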
Arize-ai / phoenix

#Data Warehouse# AI Observability & Evaluation

Tags: llmops · ai-monitoring · ai-observability · llm-eval · dataset · agents · llm · prompt-engineering · anthropic · evals · llm-evaluation · openai · langchain · llamaindex · smolagents
Jupyter Notebook · 6.95k stars · updated 16 hours ago
Giskard-AI / giskard-oss

#LLM# 🐢 Open-source evaluation & testing library for LLM agents

Tags: mlops · ml-validation · ml-testing · llmops · responsible-ai · fairness-ai · llm-eval · llm-evaluation · rag-evaluation · ai-security · llm-security · ai-red-team · red-team-tools · llm
Python · 4.86k stars · updated 3 days ago
truera / trulens

#Computer Science# Evaluation and tracking for LLM experiments and AI agents

Tags: machine-learning · neural-networks · explainable-ml · llmops · ai-monitoring · ai-observability · evals · llm-evaluation · llm · ai-agents · llm-eval · agentops
Python · 2.77k stars · updated 2 days ago
iterative / datachain

#LLM# ETL, analytics, and versioning for unstructured data

Tags: ai · cv · data-wrangling · llm · llm-eval · multimodal · data-analytics · embeddings · mlops · machine-learning
Python · 2.65k stars · updated 8 hours ago
uptrain-ai / uptrain

#Computer Science# UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform ro...

Tags: machine-learning · experimentation · llm-prompting · llmops · monitoring · prompt-engineering · evaluation · llm-eval
Python · 2.32k stars · updated 1 year ago
AI-QL / tuui

#LLM# A desktop MCP client designed as a tool unitary utility integration, accelerating AI adoption through the Model Context Protocol (MCP) and enabling cross-vendor LLM API orchestration.

Tags: agent · agentic-ai · ai · deepseek · llm · mcp · openai-api · qwen · mcp-client · mcp-host · model-context-protocol · ai-playground · llm-eval · prompt · testing · anthropic · claude
TypeScript · 1.07k stars · updated 1 day ago
athina-ai / athina-evals

Python SDK for running evaluations on LLM-generated responses

Tags: evaluation · evaluation-framework · evaluation-metrics · llm-eval · llm-evaluation · llm-ops · llmops
Python · 292 stars · updated 3 months ago
Re-Align / just-eval

#LLM# A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.

Tags: evaluation · gpt4 · llm · llm-eval · llm-evaluation
Python · 87 stars · updated 2 years ago
parea-ai / parea-sdk-py

#LLM# Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)

Tags: llm · llm-evaluation · llm-tools · llmops · llm-eval · llm-evaluation-framework · prompt-engineering · generative-ai · good-first-issue · monitoring
Python · 78 stars · updated 7 months ago
kuk / rulm-sbs2

A benchmark comparing Russian ChatGPT analogues: Saiga, YandexGPT, Gigachat

Tags: llm-eval
Jupyter Notebook · 60 stars · updated 2 years ago
grigio / llm-eval-simple

#LLM# llm-eval-simple is a simple LLM evaluation framework with intermediate actions and prompt pattern selection

Tags: llm · llm-eval
Python · 46 stars · updated 6 days ago
multinear / multinear

#LLM# Develop reliable AI apps

Tags: evaluation · llm · reliability · llm-eval · llm-evaluation · llm-evaluation-framework
Python · 44 stars · updated 12 days ago
whitecircle-ai / circle-guard-bench

#LLM# First-of-its-kind AI benchmark for evaluating the protection capabilities of large language model (LLM) guard systems (guardrails and safeguards)

Tags: ai · benchmark · llm · large-language-models · llm-eval · llm-evaluation · guardrails · benchmarking · guardrail · jailbreak · llm-as-a-judge · llm-security
Python · 41 stars · updated 2 months ago
Auto-Playground / ragrank

#LLM# 🎯 A free LLM evaluation toolkit that helps you assess factual accuracy, context understanding, tone, and more, so you can see how good your LLM applications are.

Tags: evaluation · language-model · llm · llm-eval · llmops · machine-learning · prompt-engineering · rag
Python · 40 stars · updated 3 months ago
alan-turing-institute / prompto

#NLP# An open-source library for asynchronous querying of LLM endpoints

Tags: hut23 · large-language-models · llm-eval · llm-evaluation · llm · transformers · deep-learning · machine-learning · nlp · python · transformer
Python · 32 stars · updated 2 months ago
genia-dev / vibraniumdome

#LLM# LLM Security Platform.

Tags: adversarial-attacks · chatgpt · llm · openai · prompt-injection · security · llm-agent · llm-security · llmops · prompt-engineering · prompts · llm-framework · llm-inference · llm-serving · llm-evaluation · llm-eval
Python · 22 stars · updated 1 year ago
Supahands / llm-comparison-backend

#LLM# An open-source project that lets you compare two LLMs head to head with a given prompt. This section covers the project's backend, which allows LLM APIs to be incorporated ...

Tags: ai · chatgpt · llm · llm-eval
Python · 21 stars · updated 2 months ago
honeyhiveai / realign

Realign is a testing and simulation framework for AI applications.

Tags: ai · alignment · evaluation · llm · prompt-engineering · red-teaming · simulation · llm-eval · llm-evaluation · llm-evaluation-framework · llmops · rag
Python · 17 stars · updated 9 months ago
attogram / ollama-multirun

Run a prompt against all, or some, of your models running on Ollama. Creates web pages with the output, performance statistics, and model info. All in a single Bash shell script.

Tags: ai · ollama · llm-evaluation · ollama-interface · bash · ollama-app · llm-evaluation-metrics · llm-eval · static-site-generator
Shell · 10 stars · updated 15 days ago