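#Computer Science#MLflow is an open-source framework designed to manage the entire machine learning lifecycle. It can train and serve models on different platforms, letting you use the same set of tools regardless of whether experiments run locally on your computer, on a remote compute target, on a virtual machine...
#Large Language Models#🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23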
🤘 awesome-semantic-segmentation
#Large Language Models#Supercharge Your LLM Application Evaluations 🚀
Easily fine-tune, evaluate and deploy Qwen3, DeepSeek-R1, Llama 4 or any open source LLM / VLM!
#Large Language Models#Test your prompts, agents, and RAGs. AI Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with comma...
#Large Language Models#OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
#Large Language Models#🧊 Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 🍓
Next-generation AI Agent Optimization Platform: Cozeloop addresses challenges in AI agent development by providing full-lifecycle management capabilities from development, debugging, and evaluation to...
#Large Language Models#AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
#Computer Science#The easiest tool for fine-tuning LLMs, synthetic data generation, and collaborating on datasets.
Python package for the evaluation of odometry and SLAM
Building a modern functional compiler from first principles. (http://dev.stephendiehl.com/fun/)
#Large Language Models#SuperCLUE: A Comprehensive Benchmark for General-Purpose Foundation Models in Chinese
Klipse is a JavaScript plugin for embedding interactive code snippets in tech blogs.
#Computer Science#End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
#Large Language Models#Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
An open-source visual programming environment for battle-testing prompts to LLMs.