# multimodal-ai

https://static.github-zh.com/github_avatars/duixcom?size=40

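Heygem is a fully offline video synthesis tool designed for Windows. It can accurately clone your appearance and voice to create a digital version of you. You can drive the virtual avatar with text and speech to produce videos. No internet connection is required, so your privacy is protected while you get a convenient and efficient digital experience.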

C 11.42 k
17 days ago
https://static.github-zh.com/github_avatars/NeuralNodeOne814?size=40

LocalineAI brings powerful AI capabilities directly to your Windows terminal while keeping your data completely private and secure. No cloud dependencies, no data sharing - just pure AI power at your ...

290
5 months ago
https://static.github-zh.com/github_avatars/CyberLinkGamma314?size=40

LocalineAI brings powerful AI capabilities directly to your Windows terminal while keeping your data completely private and secure. No cloud dependencies, no data sharing - just pure AI power at your ...

289
5 months ago
https://static.github-zh.com/github_avatars/BinarySyncBeta995?size=40

LocalineAI brings powerful AI capabilities directly to your Windows terminal while keeping your data completely private and secure. No cloud dependencies, no data sharing - just pure AI power at your ...

287
5 months ago
https://static.github-zh.com/github_avatars/NanoNetGamma531?size=40

LocalineAI brings powerful AI capabilities directly to your Windows terminal while keeping your data completely private and secure. No cloud dependencies, no data sharing - just pure AI power at your ...

268
5 months ago
https://static.github-zh.com/github_avatars/Denis2054?size=40

This GitHub repository contains the complete code for building Business-Ready Generative AI Systems (GenAISys) from scratch. It guides you through architecting and implementing advanced AI controllers...

Jupyter Notebook 103
2 months ago
https://static.github-zh.com/github_avatars/seehiong?size=40

A web app that dynamically generates playable 'Spot the Difference' games from a single text prompt using a multimodal pipeline with Google's Gemini and Imagen models.

TypeScript 32
1 month ago
https://static.github-zh.com/github_avatars/Livyatan-melvillei?size=40

AI-powered tool to turn long videos into short, viral-ready clips. Combines transcription, speaker diarization, scene detection & 9:16 resizing — perfect for creators & smart automation.

Python 25
2 months ago
https://static.github-zh.com/github_avatars/sinanuozdemir?size=40
Jupyter Notebook 25
9 months ago
https://static.github-zh.com/github_avatars/microsoft?size=40

Enterprise-ready solution leveraging multimodal Generative AI (Gen AI) to enhance existing or new applications beyond text—implementing RAG, image classification, video analysis, and advanced image em...

HCL 16
1 month ago
https://static.github-zh.com/github_avatars/alperensumeroglu?size=40

AI-powered tool to turn long videos into short, viral-ready clips. Combines transcription, speaker diarization, scene detection & 9:16 resizing — perfect for creators & smart automation.

Python 15
6 months ago
https://static.github-zh.com/github_avatars/kiranbaby14?size=40

🎭 Real-time voice-controlled 3D avatar with multimodal AI - speak naturally and watch your AI companion respond with perfect lip-sync

TypeScript 10
3 months ago
https://static.github-zh.com/github_avatars/NxtGenLegend?size=40

#Natural Language Processing# #3 Winner of Best Use of Zoom API at Stanford TreeHacks 2025! An AI-powered meeting assistant that captures video, audio, and textual context from Zoom calls using multimodal RAG.

JavaScript 7
8 months ago
https://static.github-zh.com/github_avatars/Sh1nr1?size=40

#Large Language Models# Mai is an emotionally intelligent, voice-enabled AI assistant built with FastAPI, Together.ai LLMs, memory persistence via ChromaDB, and real-time sentiment analysis. Designed to feel alive, empatheti...

Python 6
4 months ago
https://static.github-zh.com/github_avatars/VectorInstitute?size=40

#Natural Language Processing# VLDBench: A large-scale benchmark for evaluating Vision-Language Models (VLMs) and Large Language Models (LLMs) on multimodal disinformation detection.

Python 6
4 months ago
https://static.github-zh.com/github_avatars/debanjan06?size=40

#Computer Science# AI Framework for Remote Sensing Image Analysis using RAG - 88%+ accuracy, multi-modal queries, ChatGPT-like interface

Python 5
3 months ago
https://static.github-zh.com/github_avatars/UjjwalSaini07?size=40

#Large Language Models# OllamaMulti-RAG 🚀 is a multimodal AI chat app combining Whisper AI for audio, LLaVA for images, and Chroma DB for PDFs, enhanced with Ollama and OpenAI API. 📄 Built for AI enthusiasts, it welcomes c...

Python 2
1 month ago