Vision-Augmented Retrieval and Generation (VARAG) - a vision-first RAG engine
Multimodal RAG for searching and interacting locally with technical documents of any kind
Official code release for ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity (published at ICLR 2022)
Evaluation code and datasets for the ACL 2024 paper, VISTA: Visualized Text Embedding for Universal Multi-Modal Retrieval. The original code and model can be accessed at FlagEmbedding.
Official Implementation of GENIUS: A Generative Framework for Universal Multimodal Search, CVPR 2025
This repository contains the dataset and source files to reproduce the results in the publication Müller-Budack et al. 2021: "Multimodal news analytics using measures of cross-modal entity and context...
[CVPR 2025] Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval
Explores early-fusion and late-fusion approaches for multimodal medical image retrieval
A Survey of Multimodal Retrieval-Augmented Generation
The official code of "Beyond Walking: A Large-Scale Image-Text Benchmark for Text-based Person Anomaly Search"
Formalizing Multimedia Recommendation through Multimodal Deep Learning, accepted in ACM Transactions on Recommender Systems.
Multimodal retrieval in art with context embeddings.
A list of research papers on knowledge-enhanced multimodal learning
The code used to train and run inference with MMDocIR
Official Implementation of "Composed Object Retrieval: Object-level Retrieval via Composed Expressions"
A generalized self-supervised training paradigm for unimodal and multimodal alignment and fusion.
Mini-batch selective sampling for knowledge adaptation of VLMs for mammography.
Evaluating dense model-based approaches for multimodal medical case retrieval.