#LLM#The TypeScript AI agent framework. ⚡ Assistants, RAG, observability. Supports any LLM: GPT-4, Claude, Gemini, Llama.
#Data Warehouse#AI Observability & Evaluation
#LLM#Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks including CrewAI, Agno, OpenAI Agents SDK, Langchain, Autogen, AG2, and Ca...
#Computer Science#The easiest tool for fine-tuning LLMs, synthetic data generation, and collaborating on datasets.
#Computer Science#Evaluation and Tracking for LLM Experiments and AI Agents
Laminar - open-source all-in-one platform for engineering AI products. Create a data flywheel for your AI app. Traces, Evals, Datasets, Labels. YC S24.
#LLM#🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with DuckDB or PostgreSQL
[NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding
#LLM#Test Generation for Prompts
Vivaria is METR's tool for running evaluations and conducting agent elicitation research.
A Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring. This helps ensure your MCP server's tools are working correctly and perfor...
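For context, "LLM-based scoring" here generally means sending a tool's metadata and a sample invocation result to a judge model and asking for a graded verdict. Below is a minimal sketch of that idea in Python (the package itself is a Node.js project; the judge model, prompt wording, and 1-5 rubric are assumptions for illustration, not the package's actual API):

```python
from openai import OpenAI  # assumed judge backend; any chat-completion API would do

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def judge_tool(name: str, description: str, sample_output: str) -> str:
    """Ask a judge model to grade one MCP tool invocation on a 1-5 rubric."""
    prompt = (
        "You are evaluating an MCP tool implementation.\n"
        f"Tool name: {name}\n"
        f"Tool description: {description}\n"
        f"Sample output: {sample_output}\n"
        "Score 1-5 for correctness and usefulness, then give a one-line reason."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


print(judge_tool(
    "get_weather",
    "Returns current weather for a city",
    '{"city": "Oslo", "temp_c": 4, "conditions": "rain"}',
))
```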
#LLM#Evalica, your favourite evaluation toolkit
#LLM#Benchmarking Large Language Models for FHIR
A library for evaluating Retrieval-Augmented Generation (RAG) systems the traditional way.
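One reading of "the traditional way" is classic information-retrieval metrics computed over retrieved document IDs rather than LLM-judged scores. A minimal, self-contained sketch of two such metrics (the document IDs below are made up for illustration):

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of relevant documents that appear in the top-k retrieved IDs."""
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def mrr(retrieved: list[str], relevant: set[str]) -> float:
    """Reciprocal rank of the first relevant document (0.0 if none is retrieved)."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


retrieved = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d1", "d4"}
print(recall_at_k(retrieved, relevant, k=5))  # 1.0 (both relevant docs in top 5)
print(mrr(retrieved, relevant))               # 0.333... (first hit at rank 3)
```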
#LLM#Go Artificial Intelligence (GAI) helps you work with foundational models, large language models, and other AI models.
An implementation of Anthropic's paper and essay "A Statistical Approach to Model Evaluations"
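The paper's central recommendation is to report eval scores with standard errors rather than bare point estimates. A minimal sketch of that idea, assuming one 0/1 score per eval question and a normal (CLT) approximation for the confidence interval; this is an illustration, not the repository's own code:

```python
import math


def mean_and_ci(scores: list[float], z: float = 1.96) -> tuple[float, tuple[float, float]]:
    """Mean eval score with a normal-approximation 95% confidence interval.

    `scores` is assumed to hold one score per eval question (e.g. 0/1 correctness).
    """
    n = len(scores)
    mean = sum(scores) / n
    # Sample variance -> standard error of the mean (CLT approximation).
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    sem = math.sqrt(var / n)
    return mean, (mean - z * sem, mean + z * sem)


acc, (lo, hi) = mean_and_ci([1, 0, 1, 1, 0, 1, 1, 1])
print(f"accuracy = {acc:.2f}, 95% CI ≈ [{lo:.2f}, {hi:.2f}]")
```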