Vision-Augmented Retrieval and Generation (VARAG) - a vision-first RAG engine
#Large Language Models#Multimodal RAG to search and interact locally with technical documents of any kind
Official code release for ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity (published at ICLR 2022)
Evaluation code and datasets for the ACL 2024 paper, VISTA: Visualized Text Embedding for Universal Multi-Modal Retrieval. The original code and model can be accessed at FlagEmbedding.
#Computer Science#This repository contains the dataset and source files to reproduce the results in the publication Müller-Budack et al. 2021: "Multimodal news analytics using measures of cross-modal entity and context...
Explores early-fusion and late-fusion approaches for multimodal medical image retrieval (see the fusion sketch after this list)
[CVPR 2025] Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval
A Survey of Multimodal Retrieval-Augmented Generation
Formalizing Multimedia Recommendation through Multimodal Deep Learning, accepted in ACM Transactions on Recommender Systems.
The official code of "Beyond Walking: A Large-Scale Image-Text Benchmark for Text-based Person Anomaly Search"
Multimodal retrieval in art with context embeddings.
Official Implementation of GENIUS: A Generative Framework for Universal Multimodal Search, CVPR 2025
A list of research papers on knowledge-enhanced multimodal learning
A generalized self-supervised training paradigm for unimodal and multimodal alignment and fusion.
The code used to train and run inference with MMDocIR
Mini-batch selective sampling for knowledge adaptation of VLMs for mammography.
#Computer Science#Evaluating dense model-based approaches for multimodal medical case retrieval.
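
The early-fusion vs. late-fusion distinction mentioned in the medical image retrieval entry above is concrete enough to sketch. The snippet below is a minimal illustration with random embeddings, not code from any repository listed here; the shapes, the `normalize` helper, and the modality weight `alpha` are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical precomputed embeddings: one query and N candidates,
# each with an image-modality and a text-modality vector.
N, d_img, d_txt = 5, 512, 384
q_img, q_txt = rng.normal(size=d_img), rng.normal(size=d_txt)
c_img, c_txt = rng.normal(size=(N, d_img)), rng.normal(size=(N, d_txt))

def normalize(x):
    # L2-normalize along the last axis so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

q_img, q_txt = normalize(q_img), normalize(q_txt)
c_img, c_txt = normalize(c_img), normalize(c_txt)

# Early fusion: concatenate modality embeddings into a single vector per item,
# then rank candidates by one cosine similarity in the joint space.
q_early = normalize(np.concatenate([q_img, q_txt]))
c_early = normalize(np.concatenate([c_img, c_txt], axis=1))
early_scores = c_early @ q_early

# Late fusion: score each modality independently, then combine the
# per-modality similarities (here a weighted sum) before ranking.
alpha = 0.5  # illustrative modality weight, not from any listed work
late_scores = alpha * (c_img @ q_img) + (1 - alpha) * (c_txt @ q_txt)

print("early-fusion ranking:", np.argsort(-early_scores))
print("late-fusion ranking: ", np.argsort(-late_scores))
```

The practical trade-off: early fusion lets a downstream model exploit cross-modal interactions within one representation, while late fusion keeps per-modality indexes independent, which is often simpler to deploy in retrieval systems.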