# evaluation
langfuse / langfuse

#LLM# 🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23

analytics, LLM, llmops, large-language-models, openai, self-hosted, ycombinator, monitoring, observability, Open Source, langchain, llama-index, evaluation, prompt-engineering, prompt-management, playground, llm-evaluation, llm-observability, autogen
TypeScript 12.61 k
2 hours ago
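
A minimal sketch of logging one LLM call to Langfuse from Python, assuming the v2 low-level SDK (`langfuse.trace` / `trace.generation`); the v3 SDK moves to an OpenTelemetry-based API, so method names may differ. Keys and the model name are placeholders.

```python
from langfuse import Langfuse

# Placeholder credentials; real projects read these from the environment.
langfuse = Langfuse(
    public_key="pk-lf-...",
    secret_key="sk-lf-...",
    host="https://cloud.langfuse.com",
)

# One trace per user request, with a nested generation for the model call.
trace = langfuse.trace(name="rag-query", user_id="user-123")
generation = trace.generation(
    name="answer",
    model="gpt-4o-mini",
    input=[{"role": "user", "content": "What is LLM observability?"}],
)
generation.end(output="Observability means capturing traces, costs and evals ...")

langfuse.flush()  # the SDK batches events; flush before the process exits
```
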
mrgloom / awesome-semantic-segmentation

:metal: awesome-semantic-segmentation

semantic-segmentation, benchmark, evaluation, deep learning
10.69 k
4 years ago
explodinggradients / ragas

#LLM# Supercharge Your LLM Application Evaluations 🚀

LLM, llmops, evaluation
Python 9.51 k
3 days ago
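
A minimal sketch of scoring a small RAG sample with Ragas, assuming the pre-1.0 Dataset-based `evaluate()` interface (newer releases use `EvaluationDataset`/`SingleTurnSample`, so imports may differ). The sample row is made up, and the judge LLM expects an API key (e.g. OPENAI_API_KEY) in the environment.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

# One hypothetical question/answer/context row in the column layout Ragas expects.
data = {
    "question": ["What does the ragas library do?"],
    "answer": ["Ragas scores RAG pipelines with metrics such as faithfulness."],
    "contexts": [["Ragas provides metrics for evaluating retrieval-augmented generation."]],
    "ground_truth": ["Ragas evaluates RAG pipelines."],
}
dataset = Dataset.from_dict(data)

# Each metric calls an LLM judge under the hood.
result = evaluate(dataset, metrics=[faithfulness, answer_relevancy])
print(result)  # e.g. {'faithfulness': 1.0, 'answer_relevancy': 0.98}
```
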
oumi-ai / oumi

Easily fine-tune, evaluate and deploy Qwen3, DeepSeek-R1, Llama 4 or any open source LLM / VLM!

dpo, evaluation, fine-tuning, inference, llama, LLM, sft, vlms
Python 8.18 k
2 days ago
promptfoo / promptfoo

#LLM# Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command ...

LLM, prompt-engineering, prompts, llmops, prompt-testing, Testing, rag, evaluation, evaluation-framework, llm-eval, llm-evaluation, llm-evaluation-framework, continuous integration, CI/CD, pentesting, red-teaming, vulnerability-scanners
TypeScript 7.2 k
18 hours ago
open-compass / opencompass

#LLM# OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, Llama2, Qwen, GLM, Claude, etc.) over 100+ datasets.

evaluation, benchmark, LLM, ChatGPT, llama2, openai, llama3
Python 5.51 k
3 days ago
Marker-Inc-Korea / AutoRAG

#LLM# AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

analysis, automl, benchmarking, document-parser, embeddings, evaluation, LLM, llm-evaluation, llm-ops, Open Source, ops, optimization, pipeline, Python, qa, rag, rag-evaluation, retrieval-augmented-generation
Python 4.03 k
1 month ago
Helicone / helicone

#LLM# 🧊 Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 🍓

large-language-models, prompt-engineering, agent-monitoring, analytics, evaluation, gpt, langchain, llama-index, LLM, llm-cost, llm-evaluation, llm-observability, llmops, monitoring, Open Source, openai, playground, prompt-management, ycombinator
TypeScript 3.92 k
20 hours ago
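
A minimal sketch of Helicone's "one line of code" proxy setup, assuming the OpenAI Python SDK (>= 1.0) and the documented oai.helicone.ai gateway; the header name and base URL follow Helicone's docs but may change, and the model name is a placeholder.

```python
import os
from openai import OpenAI

# Routing requests through Helicone's gateway is what gets them logged;
# the extra header authenticates against your Helicone project.
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="https://oai.helicone.ai/v1",
    default_headers={"Helicone-Auth": f"Bearer {os.environ['HELICONE_API_KEY']}"},
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response.choices[0].message.content)
```
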
Knetic / govaluate

Arbitrary expression evaluation for golang

Go, evaluation, Parsing, expression
Go 3.88 k
3 months ago
MichaelGrupp / evo

Python package for the evaluation of odometry and SLAM

slam, odometry, evaluation, monitoring, Robotics, trajectory, benchmark, ros, kitti, tum, mapping, ros2, trajectory-analysis
Python 3.82 k
7 days ago
Kiln-AI / Kiln

#Computer Science# The easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets.

AI, chain-of-thought, collaboration, fine-tuning, machine learning, macOS, ollama, openai, prompt, prompt-engineering, Python, rlhf, synthetic-data, Windows, evals, evaluation
Python 3.74 k
8 hours ago
sdiehl / write-you-a-haskell

Building a modern functional compiler from first principles. (http://dev.stephendiehl.com/fun/)

compiler, book, evaluation, lambda-calculus, type, type-checking, type-system, functional programming, functional-language, type-inference, type-theory, intermediate-representation
Haskell 3.39 k
4 years ago
CLUEbenchmark / SuperCLUE

#LLM# SuperCLUE: A comprehensive benchmark for general-purpose Chinese large models | A Benchmark for Foundation Models in Chinese

ChatGPT, Chinese, evaluation, foundation-models, gpt-4
3.2 k
2 months ago
viebel / klipse

Klipse is a JavaScript plugin for embedding interactive code snippets in tech blogs.

Clojure, ClojureScript, JavaScript, Ruby, scheme, prolog, React, codemirror-editor, evaluation, Python, brainfuck, Lua, OCaml, Reason, Common Lisp
HTML 3.13 k
8 months ago
zzw922cn / Automatic_Speech_Recognition

#Computer Science# End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow

automatic-speech-recognition, Tensorflow, timit-dataset, feature-vector, phonemes, data-preprocessing, rnn, audio, deep learning, lstm, end-to-end, cnn, evaluation, Bukkit, speech-recognition, chinese-speech-recognition
Python 2.84 k
2 years ago
ianarawjo / ChainForge

An open-source visual programming environment for battle-testing prompts to LLMs.

AI, evaluation, large-language-models, llmops, LLM, prompt-engineering
TypeScript 2.65 k
1 month ago
microsoft / promptbench

#LLM# A unified evaluation framework for large language models

adversarial-attacks, ChatGPT, evaluation, large-language-models, robustness, prompt, prompt-engineering, benchmark
Python 2.63 k
16 days ago
EvolvingLMMs-Lab / lmms-eval

Accelerating the development of large multimodal models (LMMs) with a one-click evaluation module, lmms-eval.

agi, evaluation, large-language-models, multimodal
Python 2.62 k
1 day ago
open-compass / VLMEvalKit

#LLM# Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

gpt-4v, large-language-models, llava, multi-modal, openai, vqa, LLM, openai-api, qwen, gpt, computer vision, PyTorch, gpt4, ChatGPT, clip, vit, evaluation, claude, gemini
Python 2.52 k
3 days ago
uptrain-ai / uptrain

#Computer Science# UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform ro...

machine learning, experimentation, llm-prompting, llmops, monitoring, prompt-engineering, evaluation, llm-eval
Python 2.28 k
10 months ago