#LLM#The TypeScript AI agent framework. ⚡ Assistants, RAG, observability. Supports any LLM: GPT-4, Claude, Gemini, Llama.
#Data Warehouse#AI Observability & Evaluation
#LLM#Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks including CrewAI, Agno, OpenAI Agents SDK, Langchain, Autogen, AG2, and Ca...
#Computer Science#The easiest tool for fine-tuning LLMs, synthetic data generation, and collaborating on datasets.
#Computer Science#Evaluation and Tracking for LLM Experiments and AI Agents
Laminar - open-source all-in-one platform for engineering AI products. Create a data flywheel for your AI app. Traces, Evals, Datasets, Labels. YC S24.
#LLM#🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with DuckDB or PostgreSQL
[NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding
#LLM#Test Generation for Prompts
Vivaria is METR's tool for running evaluations and conducting agent elicitation research.
A Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring. This helps ensure your MCP server's tools are working correctly and perfor...
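For context, "LLM-based scoring" here generally means sending a tool's metadata and a sample invocation result to a judge model and asking for a graded verdict. Below is a minimal sketch of that idea in Python (the package itself is a Node.js project; the judge model, prompt wording, and 1-5 rubric are assumptions for illustration, not the package's actual API):

```python
from openai import OpenAI  # assumed judge backend; any chat-completion API would do

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def judge_tool(name: str, description: str, sample_output: str) -> str:
    """Ask a judge model to grade one MCP tool invocation on a 1-5 rubric."""
    prompt = (
        "You are evaluating an MCP tool implementation.\n"
        f"Tool name: {name}\n"
        f"Tool description: {description}\n"
        f"Sample output: {sample_output}\n"
        "Score 1-5 for correctness and usefulness, then give a one-line reason."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


print(judge_tool(
    "get_weather",
    "Returns current weather for a city",
    '{"city": "Oslo", "temp_c": 4, "conditions": "rain"}',
))
```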
#LLM#Evalica, your favourite evaluation toolkit
#LLM#Benchmarking Large Language Models for FHIR
A library for evaluating Retrieval-Augmented Generation (RAG) systems the traditional way.
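One reading of "the traditional way" is classic information-retrieval metrics computed over retrieved document IDs rather than LLM-judged scores. A minimal, self-contained sketch of two such metrics (the document IDs below are made up for illustration):

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of relevant documents that appear in the top-k retrieved IDs."""
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def mrr(retrieved: list[str], relevant: set[str]) -> float:
    """Reciprocal rank of the first relevant document (0.0 if none is retrieved)."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


retrieved = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d1", "d4"}
print(recall_at_k(retrieved, relevant, k=5))  # 1.0 (both relevant docs in top 5)
print(mrr(retrieved, relevant))               # 0.333... (first hit at rank 3)
```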
#LLM#Go Artificial Intelligence (GAI) helps you work with foundational models, large language models, and other AI models.
An implementation of Anthropic's paper and essay "A Statistical Approach to Model Evaluations"
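The paper's central recommendation is to report eval scores with standard errors rather than bare point estimates. A minimal sketch of that idea, assuming one 0/1 score per eval question and a normal (CLT) approximation for the confidence interval; this is an illustration, not the repository's own code:

```python
import math


def mean_and_ci(scores: list[float], z: float = 1.96) -> tuple[float, tuple[float, float]]:
    """Mean eval score with a normal-approximation 95% confidence interval.

    `scores` is assumed to hold one score per eval question (e.g. 0/1 correctness).
    """
    n = len(scores)
    mean = sum(scores) / n
    # Sample variance -> standard error of the mean (CLT approximation).
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    sem = math.sqrt(var / n)
    return mean, (mean - z * sem, mean + z * sem)


acc, (lo, hi) = mean_and_ci([1, 0, 1, 1, 0, 1, 1, 1])
print(f"accuracy = {acc:.2f}, 95% CI ≈ [{lo:.2f}, {hi:.2f}]")
```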