GitHub 中文社区 (GitHub Chinese Community)

©2025 GitHub 中文社区论坛

Topic: evaluation

mlflow / mlflow

#Computer Science# MLflow is an open-source framework for managing the entire machine-learning lifecycle. It lets you train and serve models across platforms with the same set of tools, whether an experiment runs on your local machine, a remote compute target, or a virtual machine.

machine-learning, artificial-intelligence, mlflow, Apache Spark, model-management, agentops, agents, evaluation, langchain, llm-evaluation, llmops, observability, Open Source, openai, prompt-engineering, mlops
Python 21.4k
4 hours ago
langfuse / langfuse

#LLM# 🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23

analytics, LLM, llmops, large-language-models, openai, self-hosted, ycombinator, monitoring, observability, Open Source, langchain, llama-index, evaluation, prompt-engineering, prompt-management, playground, llm-evaluation, llm-observability, autogen
TypeScript 14.47k
1 hour ago
mrgloom / awesome-semantic-segmentation

:metal: awesome-semantic-segmentation

semantic-segmentation, benchmark, evaluation, deep-learning
10.7k
4 years ago
explodinggradients / ragas

#LLM# Supercharge Your LLM Application Evaluations 🚀

LLM, llmops, evaluation
Python 10.14k
2 days ago
oumi-ai / oumi

Easily fine-tune, evaluate and deploy Qwen3, DeepSeek-R1, Llama 4 or any open source LLM / VLM!

dpo, evaluation, fine-tuning, inference, llama, LLM, sft, vlms
Python 8.34k
11 hours ago
promptfoo / promptfoo

#LLM# Test your prompts, agents, and RAGs. AI red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with comma...

LLM, prompt-engineering, prompts, llmops, prompt-testing, Testing, rag, evaluation, evaluation-framework, llm-eval, llm-evaluation, llm-evaluation-framework, continuous-integration, CI/CD, pentesting, red-teaming, vulnerability-scanners
TypeScript 7.8k
7 hours ago
open-compass / opencompass

#LLM# OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) over 100+ datasets.

evaluation, benchmark, LLM, ChatGPT, llama2, openai, llama3
Python 5.77k
1 day ago
Helicone / helicone

#LLM# 🧊 Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 🍓

large-language-models, prompt-engineering, agent-monitoring, analytics, evaluation, gpt, langchain, llama-index, LLM, llm-cost, llm-evaluation, llm-observability, llmops, monitoring, Open Source, openai, playground, prompt-management, ycombinator
TypeScript 4.25k
10 hours ago
coze-dev / coze-loop

Next-generation AI agent optimization platform: Cozeloop addresses challenges in AI agent development by providing full-lifecycle management capabilities from development, debugging, and evaluation to...

agent, artificial-intelligence, agentops, evaluation, langchain, llmops, monitoring, observability, Open Source, openai, playground, prompt-management, llm-observability, coze
Go 4.24k
1 day ago
Marker-Inc-Korea / AutoRAG

#LLM# AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

analysis, automl, benchmarking, document-parser, embeddings, evaluation, LLM, llm-evaluation, llm-ops, Open Source, ops, optimization, pipeline, Python, qa, rag, rag-evaluation, retrieval-augmented-generation
Python 4.15k
1 month ago
Kiln-AI / Kiln

#Computer Science# The easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets.

artificial-intelligence, chain-of-thought, collaboration, dataset-generation, fine-tuning, machine-learning, macOS, ollama, openai, prompt, prompt-engineering, Python, rlhf, synthetic-data, Windows, evals, evaluation
Python 4.01k
10 hours ago
MichaelGrupp / evo

Python package for the evaluation of odometry and SLAM

slam, odometry, evaluation, monitoring, Robotics, trajectory, benchmark, ros, kitti, tum, mapping, ros2, trajectory-analysis
Python 3.9k
1 day ago
Knetic / govaluate

Arbitrary expression evaluation for golang

Go, evaluation, Parsing, expression
Go 3.89k
4 months ago
sdiehl / write-you-a-haskell

Building a modern functional compiler from first principles. (http://dev.stephendiehl.com/fun/)

compiler, book, evaluation, lambda-calculus, type, type-checking, type-system, functional-programming, functional-language, type-inference, type-theory, intermediate-representation
Haskell 3.4k
5 years ago
CLUEbenchmark / SuperCLUE

#LLM# SuperCLUE: A comprehensive benchmark for general-purpose Chinese foundation models | A Benchmark for Foundation Models in Chinese

ChatGPT, Chinese, evaluation, foundation-models, gpt-4
3.23k
3 months ago
viebel / klipse

Klipse is a JavaScript plugin for embedding interactive code snippets in tech blogs.

Clojure, ClojureScript, JavaScript, Ruby, scheme, prolog, React, codemirror-editor, evaluation, Python, brainfuck, Lua, OCaml, Reason, Common Lisp
HTML 3.13k
10 months ago
zzw922cn / Automatic_Speech_Recognition

#Computer Science# End-to-end Automatic Speech Recognition for Mandarin and English in Tensorflow

automatic-speech-recognition, Tensorflow, timit-dataset, feature-vector, phonemes, data-preprocessing, rnn, audio, deep-learning, lstm, end-to-end, cnn, evaluation, Bukkit, speech-recognition, chinese-speech-recognition
Python 2.84k
2 years ago
EvolvingLMMs-Lab / lmms-eval

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

agi, evaluation, large-language-models, multimodal
Python 2.82k
1 day ago
open-compass / VLMEvalKit

#LLM# Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

gpt-4v, large-language-models, llava, multi-modal, openai, vqa, LLM, openai-api, qwen, gpt, computer-vision, PyTorch, gpt4, ChatGPT, clip, vit, evaluation, claude, gemini
Python 2.82k
1 day ago
ianarawjo / ChainForge

An open-source visual programming environment for battle-testing prompts to LLMs.

artificial-intelligence, evaluation, large-language-models, llmops, LLM, prompt-engineering
TypeScript 2.69k
3 days ago