# evaluation

mlflow/mlflow

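#Computer Science# MLflow is an open-source framework designed to manage the entire machine learning lifecycle. It can train and serve models on different platforms, letting you use the same set of tools regardless of whether experiments run locally on your computer, on a remote compute target, on a virtual machine...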

Python 22.09 k
1 hour ago
langfuse/langfuse

#Large Language Models# 🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23

TypeScript 16.25 k
7 hours ago
https://static.github-zh.com/github_avatars/explodinggradients?size=40
Python 10.78 k
10 hours ago
https://static.github-zh.com/github_avatars/oumi-ai?size=40

Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM!

Python 8.47 k
2 days ago
https://static.github-zh.com/github_avatars/promptfoo?size=40

#Large Language Models# Test your prompts, agents, and RAGs. AI Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with comma...

TypeScript 8.4 k
1 hour ago
https://static.github-zh.com/github_avatars/open-compass?size=40

#Large Language Models# OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Python 6.06 k
3 days ago
https://static.github-zh.com/github_avatars/coze-dev?size=40

Next-generation AI Agent Optimization Platform: Cozeloop addresses challenges in AI agent development by providing full-lifecycle management capabilities from development, debugging, and evaluation to...

Go 4.91 k
15 hours ago
MichaelGrupp/evo
Python 3.94 k
2 months ago
https://static.github-zh.com/github_avatars/Knetic?size=40

Arbitrary expression evaluation for golang

Go 3.91 k
6 months ago
https://static.github-zh.com/github_avatars/CLUEbenchmark?size=40

#Large Language Models# SuperCLUE: a comprehensive benchmark for Chinese general-purpose large models | A Benchmark for Foundation Models in Chinese

3.26 k
10 days ago
https://static.github-zh.com/github_avatars/viebel?size=40
HTML 3.14 k
1 year ago
https://static.github-zh.com/github_avatars/EvolvingLMMs-Lab?size=40

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

Python 3.09 k
1 day ago
https://static.github-zh.com/github_avatars/open-compass?size=40
Python 3.06 k
20 hours ago