#Large Language Models# Test your prompts, agents, and RAGs. AI red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.
#Data Warehouse# AI Observability & Evaluation
#Large Language Models# 🐢 Open-Source Evaluation & Testing library for LLM Agents
#Computer Science# Evaluation and Tracking for LLM Experiments and AI Agents
#Large Language Models# ETL, Analytics, Versioning for Unstructured Data
#Computer Science# UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases, and give insights on how to resolve them.
#Large Language Models# A desktop MCP client designed as a unified tool-integration utility, accelerating AI adoption through the Model Context Protocol (MCP) and enabling cross-vendor LLM API orchestration.
Python SDK for running evaluations on LLM-generated responses
#Large Language Models# A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
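The general pattern behind GPT-based, multi-aspect evaluation is to ask a strong model to grade an answer along several named dimensions. Below is a minimal, illustrative Python sketch of that idea using the OpenAI client; the aspect list, prompt wording, and judge model are assumptions, not this project's actual API.

```python
# Minimal sketch of multi-aspect LLM-as-judge scoring (aspects and prompt are assumed).
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ASPECTS = ["relevance", "coherence", "fluency"]  # assumed aspect set

def judge(question: str, answer: str) -> dict:
    """Ask a GPT model to score an answer 1-5 on each aspect and return the scores."""
    prompt = (
        "Rate the following answer on " + ", ".join(ASPECTS) + " from 1 (poor) to 5 (excellent).\n"
        'Reply only with JSON, e.g. {"relevance": 4, "coherence": 5, "fluency": 5}.\n'
        f"Question: {question}\nAnswer: {answer}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable judge model works here
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return json.loads(resp.choices[0].message.content)

print(judge("What is the capital of France?", "Paris is the capital of France."))
```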
#Large Language Models# Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
#Large Language Models# llm-eval-simple is a simple LLM evaluation framework with intermediate actions and prompt pattern selection
#Large Language Models# Develop reliable AI apps
#Large Language Models# First-of-its-kind AI benchmark for evaluating the protection capabilities of large language model (LLM) guard systems (guardrails and safeguards)
#Large Language Models# 🎯 A free LLM evaluation toolkit that helps you assess factual accuracy, context understanding, tone, and more, so you can see how well your LLM applications perform.
#Natural Language Processing# An open source library for asynchronous querying of LLM endpoints
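Asynchronous querying of LLM endpoints usually boils down to fanning requests out with asyncio rather than awaiting them one at a time. The sketch below shows the concept with aiohttp against an OpenAI-compatible chat endpoint; the URL, model name, and payload shape are placeholders, not this library's interface.

```python
# Conceptual sketch of concurrent fan-out to an OpenAI-compatible endpoint (URL assumed).
import asyncio
import aiohttp

API_URL = "http://localhost:8000/v1/chat/completions"  # placeholder endpoint

async def query(session: aiohttp.ClientSession, prompt: str) -> str:
    """Send one chat request and return the model's reply text."""
    payload = {"model": "my-model", "messages": [{"role": "user", "content": prompt}]}
    async with session.post(API_URL, json=payload) as resp:
        data = await resp.json()
        return data["choices"][0]["message"]["content"]

async def main(prompts: list[str]) -> list[str]:
    async with aiohttp.ClientSession() as session:
        # All requests are issued concurrently instead of sequentially.
        return await asyncio.gather(*(query(session, p) for p in prompts))

if __name__ == "__main__":
    print(asyncio.run(main(["Hello!", "Summarize asyncio in one line."])))
```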
Realign is a testing and simulation framework for AI applications.
Run a prompt against all, or some, of your models running on Ollama. Creates web pages with the output, performance statistics and model info. All in a single Bash shell script.
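For comparison, here is a rough Python equivalent of that idea: pull the local model list from Ollama's HTTP API, run one prompt against each model, and report throughput. The endpoint paths and response fields follow Ollama's documented /api/tags and /api/generate routes; the prompt itself is just an example.

```python
# Rough sketch: run one prompt against every local Ollama model and print timing stats.
import requests

OLLAMA = "http://localhost:11434"
PROMPT = "Explain retrieval-augmented generation in two sentences."

# List the locally available models.
models = [m["name"] for m in requests.get(f"{OLLAMA}/api/tags").json()["models"]]

for model in models:
    r = requests.post(
        f"{OLLAMA}/api/generate",
        json={"model": model, "prompt": PROMPT, "stream": False},
    ).json()
    # eval_duration is reported in nanoseconds.
    tokens_per_s = r["eval_count"] / (r["eval_duration"] / 1e9)
    print(f"### {model}\n{r['response']}\n({tokens_per_s:.1f} tokens/s)\n")
```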