# vlm

https://static.github-zh.com/github_avatars/bytedance?size=40

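Agent TARS is a general-purpose multimodal AI Agent stack that brings the power of GUI Agent and Vision to your terminal, computer, browser, and product. UI-TARS Desktop is a desktop application that provides a native GUI Agent based on the UI-TARS model.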

TypeScript 18.75 k
1 hour ago
https://static.github-zh.com/github_avatars/roboflow?size=40

#Computer Science# A collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-edge models like YOLO11, RT-DETR, SAM 2, ...

Jupyter Notebook 8.4 k
20 days ago
https://static.github-zh.com/github_avatars/PaddlePaddle?size=40

#Large Language Model# The official repository for ERNIE 4.5 and ERNIEKit, its industrial-grade development toolkit based on PaddlePaddle.

Python 7.48 k
3 days ago
https://static.github-zh.com/github_avatars/om-ai-lab?size=40
Python 5.54 k
17 days ago
https://static.github-zh.com/github_avatars/NexaAI?size=40

#Large Language Model# On-device AI inference in minutes, now for MLX & GGUF and Qualcomm NPU, with Android and iOS coming soon.

Go 4.78 k
2 days ago
https://static.github-zh.com/github_avatars/joanrod?size=40

#Large Language Model# StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language modeling architecture, StarVector processes both visual and textu...

Python 4.03 k
5 months ago
https://static.github-zh.com/github_avatars/MiniMax-AI?size=40

#Large Language Model# The official repo of MiniMax-Text-01 and MiniMax-VL-01, a large language model and a vision-language model based on Linear Attention

Python 3.14 k
2 months ago
https://static.github-zh.com/github_avatars/SkyworkAI?size=40

#Large Language Model# Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI (Kunlun Inc.), specializing in vision-language reasoning.

Python 2.94 k
1 month ago
https://static.github-zh.com/github_avatars/QiuYannnn?size=40

#Large Language Model# An AI-powered file management tool that ensures privacy by organizing local texts and images. Using Llama3.2 3B and Llava v1.6 models with the Nexa SDK, it intuitively scans, restructures, and organizes ...

Python 2.55 k
1 year ago
https://static.github-zh.com/github_avatars/BAAI-Agents?size=40

#Large Language Model# The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, ...

Python 2.27 k
10 months ago
https://static.github-zh.com/github_avatars/heshengtao?size=40

LLM Agent Framework in ComfyUI includes MCP server, Omost, GPT-sovits, ChatTTS, GOT-OCR2.0, and FLUX prompt nodes, access to Feishu, Discord, and adapts to all LLMs with similar OpenAI / aisuite interfaces,...

Python 1.91 k
8 days ago
https://static.github-zh.com/github_avatars/modelscope?size=40

#Large Language Model# A streamlined and customizable framework for efficient large model evaluation and performance benchmarking

Python 1.66 k
3 days ago
https://static.github-zh.com/github_avatars/ThuCCSLab?size=40
1.66 k
1 day ago
https://static.github-zh.com/github_avatars/zai-org?size=40

GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Python 1.63 k
4 days ago
https://static.github-zh.com/github_avatars/coderonion?size=40

#Data Warehouse# 🚀🚀🚀 A collection of awesome public YOLO object detection series projects and the related object detection datasets.

1.59 k
4 months ago