#Computer Science#MLflow is an open-source framework designed to manage the entire machine learning lifecycle. It can train and serve models on different platforms, letting you use the same set of tools whether your experiments run locally, on a remote compute target, or on a virtual machine.
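As a hedged illustration of the lifecycle tracking the description refers to, a minimal MLflow sketch might look like the following; the run name, parameter, and metric values are invented for the example:

```python
import mlflow

# Log a run to the local ./mlruns store (the default tracking backend);
# the same calls work unchanged against a remote tracking server.
with mlflow.start_run(run_name="demo"):
    mlflow.log_param("learning_rate", 0.01)  # hypothetical hyperparameter
    mlflow.log_metric("accuracy", 0.93)      # hypothetical result
```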
#Large Language Models#🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
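As a sketch of the OpenAI SDK integration mentioned above, Langfuse documents a drop-in import that traces calls automatically; this assumes `LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY`, and `OPENAI_API_KEY` are set in the environment, and the model name is just a placeholder:

```python
# Drop-in replacement for `import openai`; requests are traced to Langfuse.
from langfuse.openai import openai

completion = openai.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Summarize MLflow in one line."}],
)
print(completion.choices[0].message.content)
```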
#Large Language Models#Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.
DeepEval is an evaluation framework for large language models, designed for evaluating and testing LLM systems. It is similar to Pytest, but focused on unit testing LLM outputs.
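To make the Pytest analogy concrete, here is a minimal sketch following DeepEval's documented `LLMTestCase`/`assert_test` pattern; the input and output strings are invented for illustration, and the metric runs an LLM judge under the hood (so an API key is required):

```python
from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

def test_answer_relevancy():
    # Hypothetical app output; in practice this comes from your LLM pipeline.
    test_case = LLMTestCase(
        input="What does MLflow manage?",
        actual_output="MLflow manages the end-to-end machine learning lifecycle.",
    )
    # Fails the test if judged relevancy falls below the threshold.
    assert_test(test_case, [AnswerRelevancyMetric(threshold=0.7)])
```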
#Large Language Models#Test your prompts, agents, and RAGs. AI red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.
#Data Warehouse#AI Observability & Evaluation
The LLM vulnerability scanner
#Large Language Models#🐢 Open-Source Evaluation & Testing for AI & LLM systems
#Large Language Models#🧊 Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 🍓
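The "one line of code" here is typically a changed base URL. A hedged sketch against the OpenAI SDK, assuming a `HELICONE_API_KEY` environment variable, might look like:

```python
import os
from openai import OpenAI

# Route OpenAI traffic through Helicone's proxy so requests are logged;
# only the base_url (plus an auth header) differs from a stock client.
client = OpenAI(
    base_url="https://oai.helicone.ai/v1",
    default_headers={"Helicone-Auth": f"Bearer {os.environ['HELICONE_API_KEY']}"},
)
```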
#Large Language Models#AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
#Large Language Models#A practical guide to LLMs: from the fundamentals to deploying advanced LLM and RAG apps to AWS using LLMOps best practices
The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place.
#Computer Science#Evaluation and Tracking for LLM Experiments and AI Agents
Laminar - open-source all-in-one platform for engineering AI products. Create a data flywheel for your AI app. Traces, Evals, Datasets, Labels. YC S24.
Agentic LLM Vulnerability Scanner / AI red teaming kit 🧪
Prompty makes it easy to create, manage, debug, and evaluate LLM prompts for your AI applications. Prompty is an asset class and format for LLM prompts designed to enhance observability, understandability, and portability.
#Large Language Models#UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM hallucination detection.
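Black-box uncertainty quantification of this kind is commonly built on sampling-based consistency checks. The snippet below is not UQLM's API; it is a hypothetical sketch of the underlying idea, where `generate` stands in for any LLM call:

```python
from collections import Counter

def consistency_score(generate, prompt: str, n: int = 5) -> float:
    """Hypothetical black-box uncertainty score: sample n answers and
    measure disagreement. Low agreement across samples suggests a higher
    chance of hallucination, which UQ-based detectors exploit."""
    answers = [generate(prompt) for _ in range(n)]
    modal_count = Counter(answers).most_common(1)[0][1]
    return 1.0 - modal_count / n  # 0.0 = fully consistent, near 1.0 = unstable
```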
#Large Language Models#A powerful tool for automated LLM fuzzing. It is designed to help developers and security researchers identify and mitigate potential jailbreaks in their LLM APIs.
#Large Language Models#The open-source post-building layer for agents. Our traces + evals power agent post-training (RL, SFT), monitoring, and regression testing.
#Natural Language Processing#Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for evaluation of LLMs, aiming to explore the technical boundaries of generative AI.