#自然语言处理#为 Jax、PyTorch 和 TensorFlow 打造的先进的自然语言处理
Agent TARS 是一个通用的多模态 AI Agent Stack,它将 GUI Agent 和 Vision 的强大功能带入你的终端、计算机、浏览器和产品中。UI-TARS Desktop 是一个桌面应用程序,基于 UI-TARS 模型提供原生的 GUI Agent。
#大语言模型#SGLang is a fast serving framework for large language models and vision language models.
#计算机科学#A collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-edge models like YOLO11, RT-DETR, SAM 2, ...
#大语言模型#The official repository for ERNIE 4.5 and ERNIEKit – its industrial-grade development toolkit based on PaddlePaddle.
#大语言模型#Effortless data labeling with AI support from Segment Anything and other awesome models.
#大语言模型#Solve Visual Understanding with Reinforced VLMs
#大语言模型#On device AI inference in minutes—now for MLX & GGUF and Qualcomm NPU, with Android and iOS coming soon.
#大语言模型#StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language modeling architecture, StarVector processes both visual and textu...
#大语言模型#The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention
#大语言模型#Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI (Kunlun Inc.), specializing in vision-language reasoning.
#大语言模型#Build multimodal language agents for fast prototype and production
#大语言模型#An AI-powered file management tool that ensures privacy by organizing local texts, images. Using Llama3.2 3B and Llava v1.6 models with the Nexa SDK, it intuitively scans, restructures, and organizes ...
#大语言模型#The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvment, and skill curation, ...
#自然语言处理#[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
LLM Agent Framework in ComfyUI includes MCP sever, Omost,GPT-sovits, ChatTTS,GOT-OCR2.0, and FLUX prompt nodes,access to Feishu,discord,and adapts to all llms with similar openai / aisuite interfaces,...
#大语言模型#A streamlined and customizable framework for efficient large model evaluation and performance benchmarking
#自然语言处理#A reading list for large models safety, security, and privacy (including Awesome LLM Security, Safety, etc.).
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning