llm-eval

promptfoo / promptfoo

#LLM# Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command ...

LLM, prompt-engineering, prompts, llmops, prompt-testing, Testing, rag, evaluation, evaluation-framework, llm-eval, llm-evaluation, llm-evaluation-framework, continuous integration, CI/CD, pentesting, red-teaming, vulnerability-scanners
TypeScript 7.2 k
17 hours ago
Arize-ai / phoenix

#Data warehouse# AI Observability & Evaluation

llmops, ai-monitoring, ai-observability, llm-eval, datasets, agents, LLM, prompt-engineering, anthropic, evals, llm-evaluation, openai, langchain, llamaindex, smolagents
Jupyter Notebook 5.97 k
1 day ago
Giskard-AI / giskard

#LLM# 🐢 Open-Source Evaluation & Testing for AI & LLM systems

mlops, ml-validation, ml-testing, llmops, responsible-ai, fairness-ai, llm-eval, llm-evaluation, rag-evaluation, ai-security, llm-security, ai-red-team, red-team-tools, LLM
Python 4.62 k
4 days ago
iterative / datachain

#LLM# ETL, Analytics, Versioning for Unstructured Data

artificial intelligence, cv, data-wrangling, LLM, llm-eval, multimodal, data-analytics, embeddings, mlops, machine learning
Python 2.58 k
3 days ago
uptrain-ai / uptrain

#Computer science# UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform ro...

machine learning, experimentation, llm-prompting, llmops, monitoring, prompt-engineering, evaluation, llm-eval
Python 2.28 k
10 months ago
athina-ai / athina-evals

Python SDK for running evaluations on LLM generated responses

evaluation, evaluation-framework, evaluation-metrics, llm-eval, llm-evaluation, llm-ops, llmops
Python 281
9 days ago
Re-Align / just-eval

#LLM# A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.

evaluation, gpt4, LLM, llm-eval, llm-evaluation
Python 85
1 year ago
parea-ai / parea-sdk-py

#LLM# Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)

LLM, llm-evaluation, llm-tools, llmops, llm-eval, llm-evaluation-framework, prompt-engineering, generative-ai, good-first-issue, monitoring
Python 78
4 months ago
kuk / rulm-sbs2

A benchmark comparing Russian analogues of ChatGPT: Saiga, YandexGPT, Gigachat

llm-eval
Jupyter Notebook 61
2 years ago
multinear / multinear

#LLM# Develop reliable AI apps

evaluation, LLM, reliability, llm-eval, llm-evaluation, llm-evaluation-framework
Svelte 39
2 months ago
whitecircle-ai / circle-guard-bench

#LLM# First-of-its-kind AI benchmark for evaluating the protection capabilities of large language model (LLM) guard systems (guardrails and safeguards)

artificial intelligence, benchmark, LLM, large-language-models, llm-eval, llm-evaluation, guardrails, benchmarking, guardrail, jailbreak, llm-as-a-judge, llm-security
Python 38
10 days ago
Auto-Playground / ragrank

#LLM# 🎯 Your free LLM evaluation toolkit helps you assess the accuracy of facts, how well it understands context, its tone, and more. This helps you see how good your LLM applications are.

evaluation, language-model, LLM, llm-eval, llmops, machine learning, prompt-engineering, rag
Python 38
5 months ago
alan-turing-institute / prompto

#Natural language processing# An open source library for asynchronous querying of LLM endpoints

hut23, large-language-models, llm-eval, llm-evaluation, LLM, transformers, deep learning, machine learning, natural language processing, Python, transformer
Python 28
23 days ago
Supahands / llm-comparison-backend

#LLM# This is an open-source project that lets you compare two LLMs head to head on a given prompt. This section covers the backend of the project, allowing LLM APIs to be incorporated ...

artificial intelligence, ChatGPT, LLM, llm-eval
Python 21
1 month ago
genia-dev / vibraniumdome

#LLM# LLM Security Platform.

adversarial-attacks, ChatGPT, LLM, openai, prompt-injection, security, llm-agent, llm-security, llmops, prompt-engineering, prompts, llm-framework, llm-inference, llm-serving, llm-evaluation, llm-eval
Python 17
8 months ago
honeyhiveai / realign

Realign is a testing and simulation framework for AI applications.

artificial intelligence, alignment, evaluation, LLM, prompt-engineering, red-teaming, Simulation, llm-eval, llm-evaluation, llm-evaluation-framework, llmops, rag
Python 16
6 months ago
Networks-Learning / prediction-powered-ranking

Code for "Prediction-Powered Ranking of Large Language Models", NeurIPS 2024.

llm-eval, llm-evaluation, llm-evaluation-framework, ranking-algorithm
Jupyter Notebook 9
8 months ago
amplifying-ai / ai-product-bench

dataset, evals, llm-eval
HTML 7
19 days ago
pyladiesams / eval-llm-based-apps-jan2025

#LLM# Create an evaluation framework for your LLM-based app. Incorporate it into your test suite. Lay the monitoring foundation.

LLM, llmops, workshop, llm-eval, llm-evaluation-framework, llm-evaluation-metrics, llm-monitoring
Jupyter Notebook 7
1 month ago
prompt-foundry / python-sdk

#LLM# The prompt engineering, prompt management, and prompt evaluation tool for Python

LLM, llm-eval, llm-evaluation, open-ai, prompt-engineering, prompt-management, Python
Python 7
9 months ago