GitHub 中文社区 (GitHub Chinese Community)

©2025 GitHub 中文社区论坛

Topic: evaluation

mlflow / mlflow

#Computer Science# MLflow is an open-source framework for managing the entire machine-learning lifecycle. It lets you train and serve models across platforms with the same set of tools, whether an experiment runs on your local machine, a remote compute target, or a virtual machine.

machine-learning, artificial-intelligence, mlflow, Apache Spark, model-management, agentops, agents, evaluation, langchain, llm-evaluation, llmops, observability, Open Source, openai, prompt-engineering, mlops
Python 21.4k
4 hours ago
langfuse / langfuse

#LLM# 🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23

analytics, LLM, llmops, large-language-models, openai, self-hosted, ycombinator, monitoring, observability, Open Source, langchain, llama-index, evaluation, prompt-engineering, prompt-management, playground, llm-evaluation, llm-observability, autogen
TypeScript 14.47k
1 hour ago
mrgloom / awesome-semantic-segmentation

:metal: awesome-semantic-segmentation

semantic-segmentation, benchmark, evaluation, deep-learning
10.7k
4 years ago
explodinggradients / ragas

#LLM# Supercharge Your LLM Application Evaluations 🚀

LLM, llmops, evaluation
Python 10.14k
2 days ago
oumi-ai / oumi

Easily fine-tune, evaluate and deploy Qwen3, DeepSeek-R1, Llama 4 or any open source LLM / VLM!

dpo, evaluation, fine-tuning, inference, llama, LLM, sft, vlms
Python 8.34k
11 hours ago
promptfoo / promptfoo

#LLM# Test your prompts, agents, and RAGs. AI red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with comma...

LLM, prompt-engineering, prompts, llmops, prompt-testing, Testing, rag, evaluation, evaluation-framework, llm-eval, llm-evaluation, llm-evaluation-framework, continuous-integration, CI/CD, pentesting, red-teaming, vulnerability-scanners
TypeScript 7.8k
7 hours ago
open-compass / opencompass

#LLM# OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) over 100+ datasets.

evaluation, benchmark, LLM, ChatGPT, llama2, openai, llama3
Python 5.77k
1 day ago
Helicone / helicone

#LLM# 🧊 Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 🍓

large-language-models, prompt-engineering, agent-monitoring, analytics, evaluation, gpt, langchain, llama-index, LLM, llm-cost, llm-evaluation, llm-observability, llmops, monitoring, Open Source, openai, playground, prompt-management, ycombinator
TypeScript 4.25k
10 hours ago
coze-dev / coze-loop

Next-generation AI agent optimization platform: Cozeloop addresses challenges in AI agent development by providing full-lifecycle management capabilities from development, debugging, and evaluation to...

agent, artificial-intelligence, agentops, evaluation, langchain, llmops, monitoring, observability, Open Source, openai, playground, prompt-management, llm-observability, coze
Go 4.24k
1 day ago
Marker-Inc-Korea / AutoRAG

#LLM# AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

analysis, automl, benchmarking, document-parser, embeddings, evaluation, LLM, llm-evaluation, llm-ops, Open Source, ops, optimization, pipeline, Python, qa, rag, rag-evaluation, retrieval-augmented-generation
Python 4.15k
1 month ago
Kiln-AI / Kiln

#Computer Science# The easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets.

artificial-intelligence, chain-of-thought, collaboration, dataset-generation, fine-tuning, machine-learning, macOS, ollama, openai, prompt, prompt-engineering, Python, rlhf, synthetic-data, Windows, evals, evaluation
Python 4.01k
10 hours ago
MichaelGrupp / evo

Python package for the evaluation of odometry and SLAM

slam, odometry, evaluation, monitoring, Robotics, trajectory, benchmark, ros, kitti, tum, mapping, ros2, trajectory-analysis
Python 3.9k
1 day ago
Knetic / govaluate

Arbitrary expression evaluation for golang

Go, evaluation, Parsing, expression
Go 3.89k
4 months ago
sdiehl / write-you-a-haskell

Building a modern functional compiler from first principles. (http://dev.stephendiehl.com/fun/)

compiler, book, evaluation, lambda-calculus, type, type-checking, type-system, functional-programming, functional-language, type-inference, type-theory, intermediate-representation
Haskell 3.4k
5 years ago
CLUEbenchmark / SuperCLUE

#LLM# SuperCLUE: A comprehensive benchmark for general-purpose Chinese foundation models | A Benchmark for Foundation Models in Chinese

ChatGPT, Chinese, evaluation, foundation-models, gpt-4
3.23k
3 months ago
viebel / klipse

Klipse is a JavaScript plugin for embedding interactive code snippets in tech blogs.

Clojure, ClojureScript, JavaScript, Ruby, scheme, prolog, React, codemirror-editor, evaluation, Python, brainfuck, Lua, OCaml, Reason, Common Lisp
HTML 3.13k
10 months ago
zzw922cn / Automatic_Speech_Recognition

#Computer Science# End-to-end Automatic Speech Recognition for Mandarin and English in Tensorflow

automatic-speech-recognition, Tensorflow, timit-dataset, feature-vector, phonemes, data-preprocessing, rnn, audio, deep-learning, lstm, end-to-end, cnn, evaluation, Bukkit, speech-recognition, chinese-speech-recognition
Python 2.84k
2 years ago
EvolvingLMMs-Lab / lmms-eval

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

agi, evaluation, large-language-models, multimodal
Python 2.82k
1 day ago
open-compass / VLMEvalKit

#LLM# Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

gpt-4v, large-language-models, llava, multi-modal, openai, vqa, LLM, openai-api, qwen, gpt, computer-vision, PyTorch, gpt4, ChatGPT, clip, vit, evaluation, claude, gemini
Python 2.82k
1 day ago
ianarawjo / ChainForge

An open-source visual programming environment for battle-testing prompts to LLMs.

artificial-intelligence, evaluation, large-language-models, llmops, LLM, prompt-engineering
TypeScript 2.69k
3 days ago