#LLM#The TypeScript AI agent framework. ⚡ Assistants, RAG, observability. Supports any LLM: GPT-4, Claude, Gemini, Llama.
#Data Warehouse#AI Observability & Evaluation
#LLM#Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks, including OpenAI Agents SDK, CrewAI, Langchain, Autogen, AG2, and CamelAI.
#Computer Science#The easiest tool for fine-tuning LLMs, generating synthetic data, and collaborating on datasets.
Laminar - an open-source, all-in-one platform for engineering AI products. Create a data flywheel for your AI app. Traces, Evals, Datasets, Labels. YC S24.
#LLM#🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with DuckDB or PostgreSQL
Test your LLM-powered apps with TypeScript. No API key required.
[NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding
#LLM#Test Generation for Prompts
Vivaria is METR's tool for running evaluations and conducting agent elicitation research.
A Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring. This helps ensure your MCP server's tools are working correctly and perfor...
#LLM#Benchmarking Large Language Models for FHIR
#LLM#Evalica, your favourite evaluation toolkit
A library for evaluating Retrieval-Augmented Generation (RAG) systems using traditional methods.
#LLM#Go Artificial Intelligence (GAI) helps you work with foundational models, large language models, and other AI models.
An implementation of Anthropic's paper and essay on "A statistical approach to model evaluations" (see the sketch after this list).
#LLM#Root Signals Python SDK
Our curated collection of templates. Use these patterns to set up your AI projects for evaluation with Openlayer.
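To illustrate the idea behind the statistical-evaluations entry above, here is a minimal Python sketch (not that repository's code) of reporting an eval score with a standard error and comparing two models with a paired analysis on the same questions; the function names and the synthetic scores are illustrative assumptions.

```python
# Minimal sketch of a statistical treatment of eval scores: report means
# with standard errors, and compare two models via paired per-question
# differences rather than raw accuracy gaps. Names and data are illustrative.
import numpy as np

def score_with_se(per_question_scores: np.ndarray) -> tuple[float, float]:
    """Mean eval score and its standard error (sample std / sqrt(n))."""
    n = len(per_question_scores)
    mean = per_question_scores.mean()
    se = per_question_scores.std(ddof=1) / np.sqrt(n)
    return mean, se

def paired_difference(scores_a: np.ndarray, scores_b: np.ndarray) -> tuple[float, float]:
    """Mean A-B difference on the same questions, with its standard error.
    Pairing removes shared per-question difficulty from the comparison."""
    return score_with_se(scores_a - scores_b)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical per-question correctness (1 = correct) for two models.
    model_a = rng.binomial(1, 0.72, size=500).astype(float)
    model_b = rng.binomial(1, 0.68, size=500).astype(float)
    acc_a, se_a = score_with_se(model_a)
    diff, se_diff = paired_difference(model_a, model_b)
    print(f"Model A accuracy: {acc_a:.3f} ± {1.96 * se_a:.3f} (95% CI)")
    print(f"A - B paired difference: {diff:.3f} ± {1.96 * se_diff:.3f}")
```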