multimodality · GitHub Topics

#Computer Science#A simple command-line tool for text-to-image generation, using OpenAI's CLIP and a BigGAN. The technique was originally created by https://twitter.com/advadnoun

artificial-intelligence deep-learning text-to-image Generative Adversarial Network multimodality

Python 2.57 k

3 years ago

BAAI-Agents / Cradle

#Large Language Models#The Cradle framework is a first attempt at General Computer Control (GCC). Cradle helps agents ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, ...

ai-agent ai-agents-framework computer-control cradle gcc generative-ai grounding large-language-models llm lmm multimodality vision-language-model vlm artificial-intelligence

Python 2.22 k

9 months ago

hymie122 / RAG-Survey

#Large Language Models#Collecting awesome papers on RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in the paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".

aigc rag survey diffusion-models llm multimodality

1.69 k

1 year ago

AIDC-AI / Ovis

A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.

chatbot llama3 multimodal multimodal-large-language-models multimodality qwen vision-language-model

Python 998

2 hours ago

ArrowLuo / CLIP4Clip

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

multimodal-learning multimodality multimodal search ranking retrieval-model retrieval activitynet clip

Python 974

1 year ago

PreferredAI / cornac

A Comparative Framework for Multimodal Recommender Systems

recommender-system recommendation-algorithms recommendation-engine matrix-factorization collaborative-filtering multimodal-learning recommendation-system multimodality

Python 970

3 months ago

fnzhan / Generative-AI

[TPAMI 2023] Multimodal Image Synthesis and Editing: The Generative AI Era

aigc diffusion-model gans multimodality

TeX 758

2 years ago

aimclub / FEDOT

#Computer Science#Automated modeling and machine learning framework FEDOT

automl machine-learning evolutionary-algorithms automated-machine-learning hyperparameter-optimization parameter-tuning automation multimodality

Python 682

10 days ago

VITA-MLLM / Woodpecker

#Large Language Models#✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models

hallucination hallucinations large-language-models llm mllm multimodal-large-language-models multimodality

Python 640

7 months ago

jshilong / GPT4RoI

#Large Language Models#GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

gpt llm multimodality roi computer-vision

Python 539

2 months ago

microsoft / LLM2CLIP

LLM2CLIP makes SOTA pretrained CLIP models more SOTA than ever.

clip multimodality

Python 532

1 month ago

YingqingHe / Awesome-LLMs-meet-Multimodal-Generation

#Large Language Models#🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D, and audio).

aigc large-language-models large-vision-language-models multimodal-generation multimodal-large-language-models multimodal-models multimodality text-to-3d text-to-audio text-to-image text-to-speech text-to-video llm mllm

HTML 494

4 months ago

zengyan-97 / X-VLM

X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)

multimodality vision-and-language

Python 481

3 years ago

MMMU-Benchmark / MMMU

#Natural Language Processing#This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

computer-vision deep-learning deep-neural-networks evaluation foundation-models large-language-models large-multimodal-models llm machine-learning multimodal multimodal-deep-learning multimodal-learning multimodality natural-language-processing question-answering STEM visual-question-answering

Python 470

2 months ago

afiaka87 / clip-guided-diffusion

#Computer Science#A CLI tool/Python module for generating images from text using guided diffusion and CLIP from OpenAI.

multimodal image-generation text-to-image-synthesis text-to-image openai deep-learning artificial-intelligence diffusion multimodality

Python 462

3 years ago

HazyResearch / fonduer

#Computer Science#A knowledge base construction engine for richly formatted data

multimodality machine-learning

Python 410

4 years ago

kyegomez / Med-PaLM

#Computer Science#Towards Generalist Biomedical AI

biomedical deep-learning gpt4 multimodal multimodal-deep-learning multimodality Open Source

Python 409

1 year ago

lium-lst / nmtpytorch

#Computer Science#Sequence-to-Sequence Framework in PyTorch

deep-learning PyTorch seq2seq nmt neural-machine-translation asr speech-recognition multimodality cnn

Jupyter Notebook 391

3 years ago

OmicsML / dance

#Computer Science#DANCE: a deep learning library and benchmark platform for single-cell analysis

Bioinformatics data-science deep-learning graph-neural-networks machine-learning multimodality Python benchmark computational-biology

Python 371

8 days ago

kyegomez / CM3Leon

An open-source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multimodal AI that uses just a decoder to generate both text and images

attention attention-is-all-you-need dalle multimodal multimodal-learning multimodality

Python 362

2 years ago